bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[BasicBlockUtils] Generalize DeleteDeadBlock to deal with multiple dead blocks	Max Kazantsev	2019-01-14	1	-36/+47
	Utility function `DeleteDeadBlock` expects that all predecessors of a block being deleted are already deleted, with the exception of a single-block loop. This makes it hard to use for deletion of a set of blocks that may contain cyclic dependencies. There is no correct order of invocations of this function that does not produce dangling pointers on already deleted blocks. This patch introduces a generalized version of this function `DeleteDeadBlocks` that allows us to remove multiple blocks at once, even if there are cycles among them. The only requirement is that no block being deleted should have a predecessor that is not being deleted. The logic of `DeleteDeadBlocks` is as follows: for each block, create relevant DT updates, remove all instructions (replace with undef if needed), and replace the terminator with unreachable; then apply the DT updates; then, for each block, delete the block. Therefore, `DeleteDeadBlock` becomes a particular case of the general algorithm called for a single block. Differential Revision: https://reviews.llvm.org/D56120 Reviewed By: skatkov llvm-svn: 351045
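The phases described in this commit message can be modelled outside LLVM. The following is a minimal Python sketch, not the LLVM implementation: the CFG is assumed to be a plain dictionary from block name to successor list, the returned edge list stands in for the dominator tree (DT) updates, and the name `delete_dead_blocks` is hypothetical.

```python
def delete_dead_blocks(successors, dead):
    """Remove every block in `dead` from a CFG given as {block: [successor, ...]}.

    Returns the list of (from, to) edge deletions, standing in for DT updates.
    """
    dead = set(dead)

    # Precondition from the commit message: no block being deleted may have
    # a predecessor that is not itself being deleted.
    for block, targets in successors.items():
        if block not in dead and dead.intersection(targets):
            raise ValueError(f"live block {block!r} still branches to a dead block")

    # Phase 1: for each block, record its outgoing edges as pending updates and
    # drop them (the analogue of replacing the terminator with unreachable).
    updates = []
    for block in sorted(dead):
        updates.extend((block, target) for target in successors[block])
        successors[block] = []

    # Phase 2: the collected updates would be applied to the dominator tree here.

    # Phase 3: only now erase the blocks; no remaining edge refers to them.
    for block in dead:
        del successors[block]
    return updates


# "a" and "b" form a dead cycle that no single-block deletion order can handle.
cfg = {"entry": ["exit"], "a": ["b"], "b": ["a", "exit"], "exit": []}
updates = delete_dead_blocks(cfg, {"a", "b"})
print(updates)
print(cfg)
```

Because all edges are cut before any block is erased, the cycle between `a` and `b` never leaves a reference to an already deleted block.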
*	Add support for prefix-only CLI options	Thomas Preud'homme	2019-01-14	1	-5/+14
	Summary: Add support for options that always prefix their value, giving an error if the value is in the next argument or if the option is given a value assignment (i.e. opt=val). This is the desired behavior for the -D option of FileCheck for instance. Copyright: - Linaro (changes in version 2 of revision D55940) - GraphCore (changes in later versions and introduced when creating D56549) Reviewers: jdenny Subscribers: llvm-commits, probinson, kristina, hiraditya, JonChesterfield Differential Revision: https://reviews.llvm.org/D56549 llvm-svn: 351038
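The behavior described above can be illustrated with a small Python sketch. It is an assumption-laden model of the rule stated in the commit message, not the LLVM CommandLine code; the function `parse_prefix_only` and its error messages are hypothetical.

```python
def parse_prefix_only(arg, name="-D"):
    """Return the value glued to a prefix-only option, or None if `arg` is another option."""
    if not arg.startswith(name):
        return None
    value = arg[len(name):]
    if not value:
        # The value sits in the next argument (or is missing): rejected.
        raise ValueError(f"{name}: value must directly follow the option")
    if value.startswith("="):
        # The opt=val assignment form is rejected as well.
        raise ValueError(f"{name}: '=' assignment is not accepted")
    return value


print(parse_prefix_only("-DVAR=1"))
```

Only the glued form `-DVAR=1` is accepted; both `-D VAR=1` (seen by the parser as a bare `-D`) and `-D=VAR` raise an error in this model.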
*	[X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select ↵	Craig Topper	2019-01-14	2	-6/+13
	in IR instead. Fixes PR40259 llvm-svn: 351035
*	[X86] Update type profile for DBPSADBW to indicate the immediate is an i8 ↵	Craig Topper	2019-01-14	1	-1/+1
	not just any int. Removes some type checks from X86GenDAGISel.inc llvm-svn: 351033
*	[X86] Remove unused intrinsic handlers. NFC	Craig Topper	2019-01-14	2	-39/+2
	llvm-svn: 351032
*	[X86] Remove FPCLASS intrinsic handler. Use INTR_TYPE_2OP instead. NFC	Craig Topper	2019-01-14	2	-14/+7
	llvm-svn: 351031
*	[X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a ↵	Craig Topper	2019-01-14	3	-42/+19
	vXi1 vector. The input mask can be represented with an AND in IR. Fixes PR40258 llvm-svn: 351028
*	[DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)	Simon Pilgrim	2019-01-13	1	-0/+4
	NOTE: We need more powerful signed overflow detection in computeOverflowKind llvm-svn: 351026
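Why the fold named in this commit's subject is sound can be checked with a small model. This Python sketch covers only the unsigned case and assumes the overflow analysis is reduced to known upper bounds on each operand; the helper names are hypothetical and do not correspond to the DAGCombiner code.

```python
def uadd_sat(x, y, bits):
    """Unsigned saturating add: clamp the sum to the largest `bits`-wide value."""
    return min(x + y, (1 << bits) - 1)


def uadd_never_overflows(max_x, max_y, bits):
    """True if operands bounded by max_x and max_y can never exceed `bits` bits."""
    return max_x + max_y <= (1 << bits) - 1


def fold_uadd_sat(x, y, max_x, max_y, bits):
    """Use a plain wrapping add when overflow is provably impossible."""
    if uadd_never_overflows(max_x, max_y, bits):
        return (x + y) & ((1 << bits) - 1)
    return uadd_sat(x, y, bits)


# Operands known to fit in 4 bits can never overflow an 8-bit add.
print(fold_uadd_sat(15, 15, 15, 15, 8))
# With no useful bound the saturating form must be kept.
print(fold_uadd_sat(200, 100, 255, 255, 8))
```

When the bounds rule out overflow, the clamp in `uadd_sat` is never taken, so the plain add gives the same result for every operand pair.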
*	Fix unused variable warning. NFCI.	Simon Pilgrim	2019-01-13	1	-1/+0
	llvm-svn: 351025
*	[DAGCombiner] Some very basic add/sub saturation combines.	Simon Pilgrim	2019-01-13	1	-0/+64
	Handle combines with zero and constant canonicalization for adds. llvm-svn: 351024
*	[LegalizeDAG] Remove 'NeedInvert' code from expansion of BR_CC. Replace with ↵	Craig Topper	2019-01-13	1	-4/+1
	an assert. I accidentally triggered this code while doing some experiments and it doesn't look like it could possibly work. It calls 'getNOT' on a node that should be a CondCode. I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC? llvm-svn: 351022
*	[X86] Rename overly verbose method; NFC	Nikita Popov	2019-01-13	3	-8/+5
	As suggested on D56636. llvm-svn: 351021
*	[X86] Add more ISD nodes to handle masked versions of ↵	Craig Topper	2019-01-13	5	-12/+152
	VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zero the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877 llvm-svn: 351018
*	[X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128 which ↵	Craig Topper	2019-01-13	5	-18/+89
	only produces 2 result elements and zeroes the upper elements. We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this. Fixes another case from PR34877. llvm-svn: 351017
*	Give helper classes/functions local linkage. NFC.	Benjamin Kramer	2019-01-12	8	-4/+14
	llvm-svn: 351016
*	[X86] More aggressive shuffle mask widening in combineExtractWithShuffle	Simon Pilgrim	2019-01-12	1	-0/+9
	Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through. llvm-svn: 351013
*	[LoopVectorizer] give more advice in remark about failure to vectorize call	Sanjay Patel	2019-01-12	1	-3/+23
	Something like this is requested by: https://bugs.llvm.org/show_bug.cgi?id=40265 ...and it seems like a common enough case that we should acknowledge it. Differential Revision: https://reviews.llvm.org/D56551 llvm-svn: 351010
*	[DAGCombiner] fold insert_subvector of insert_subvector	Sanjay Patel	2019-01-12	1	-0/+8
	This pattern: t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0> t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0> ...shows up in PR33758: https://bugs.llvm.org/show_bug.cgi?id=33758 ...although this patch doesn't make any difference to the final result on that yet. In the affected tests here, it looks like it just makes RA wiggle. But we might as well squash this to prevent it interfering with other pattern-matching. Differential Revision: https://reviews.llvm.org/D56604 llvm-svn: 351008
*	Use getShiftAmountTy for shift amounts.	Simon Pilgrim	2019-01-12	1	-1/+2
	llvm-svn: 351005
*	[ORC][MIPS] Fill delay-slot after `jr` instruction	Simon Atanasyan	2019-01-12	1	-5/+5
	The MIPS `jr` instruction uses a delay slot. To avoid executing an arbitrary instruction we should either fill the delay slot with a `nop` instruction or swap the `jr` instruction and the logically preceding instruction. This fix implements the second method to generate slightly more efficient code. llvm-svn: 351001
*	[ORC][MIPS] Setup t9 register and call function through this register	Simon Atanasyan	2019-01-12	1	-11/+8
	MIPS ABI states that every function must be called through jalr $t9. In other words, a function expects that the t9 register points to the beginning of its code. A function uses this register to calculate the offset to the Global Offset Table and save it to the `gp` register. ``` lui $gp, %hi(_gp_disp) addiu $gp, %lo(_gp_disp) addu $gp, $gp, $t9 ``` If `t9` and as a result `$gp` point to the wrong place, the following code loads an incorrect value from GOT and passes control to invalid code. ``` lw $v0,%call16(foo)($gp) jalr $t9 ``` OrcMips32 and OrcMips64 writeResolverCode methods pass control to the resolved address, but do not set up `$t9` before the call. The `t9` holds the address of the beginning of the `resolver` code, so any attempts to call routines via GOT fail. This change fixes the problem. The `OrcLazy/hidden-visibility.ll` test starts to pass correctly. Before the change it fails on MIPS because the `exitOnLazyCallThroughFailure` called from the resolver code could not call the libc routine `exit` via GOT. Differential Revision: http://reviews.llvm.org/D56058 llvm-svn: 351000
*	[X86] Improve vXi64 ISD::ABS codegen with SSE41+	Simon Pilgrim	2019-01-12	1	-0/+9
	Make use of vblendvpd to select on the signbit Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350999
*	[X86][AARCH64] Improve ISD::ABS support	Simon Pilgrim	2019-01-12	4	-6/+56
	This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998
*	Reapply "[DemandedBits] Use SetVector for Worklist"	Nikita Popov	2019-01-12	1	-7/+6
	DemandedBits currently uses a simple vector for the worklist, which means that instructions may be inserted multiple times into it. Especially in combination with the deep lattice, this may cause instructions to be recomputed very often. To avoid this, switch to a SetVector. Reapplying with a smaller number of inline elements in the SmallSetVector, to avoid running into the SmallDenseMap issue described in D56455. Differential Revision: https://reviews.llvm.org/D56362 llvm-svn: 350997
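The difference between a plain vector and a set-backed vector as a worklist can be sketched in a few lines. This Python class is an illustrative model only: it mimics the `insert` and `pop_back_val` operations of LLVM's SetVector but is not the LLVM data structure, and the instruction names are made up.

```python
class SetVector:
    """Insertion-ordered container that ignores duplicates of pending items."""

    def __init__(self):
        self._items = []
        self._present = set()

    def insert(self, value):
        """Append `value` unless it is already pending; report whether it was added."""
        if value in self._present:
            return False
        self._present.add(value)
        self._items.append(value)
        return True

    def pop_back_val(self):
        """Remove and return the most recently inserted pending item."""
        value = self._items.pop()
        self._present.remove(value)
        return value

    def __len__(self):
        return len(self._items)


worklist = SetVector()
for inst in ["add", "shl", "add", "add"]:
    worklist.insert(inst)
print(len(worklist))
```

With a plain list the same loop would queue `add` three times, so it would be recomputed three times; here it is pending only once, and it can be queued again after it has been popped.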
*	[X86] Remove X86ISD::SELECT as it's no longer used by any of our intrinsic ↵	Craig Topper	2019-01-12	3	-3/+1
	lowering. llvm-svn: 350995
*	[X86] Add ISD node for masked version of CVTPS2PH.	Craig Topper	2019-01-12	5	-22/+59
	The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do. This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there. Fixes another instruction for PR34877. llvm-svn: 350994
*	[RISCV] Introduce codegen patterns for RV64M-only instructions	Alex Bradbury	2019-01-12	2	-5/+52
	As discussed on llvm-dev <http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have to be careful when trying to select the *w RV64M instructions. i32 is not a legal type for RV64 in the RISC-V backend, so operations have been promoted by the time they reach instruction selection. Information about whether the operation was originally a 32-bit operation has been lost, and it's easy to write incorrect patterns. Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being selected (and so save instructions to sext/zext the input operands). Differential Revision: https://reviews.llvm.org/D53230 llvm-svn: 350993
*	[RISCV] Add patterns for RV64I SLLW/SRLW/SRAW instructions	Alex Bradbury	2019-01-12	2	-1/+97
	This restores support for selecting the SLLW/SRLW/SRAW instructions, which was removed in rL348067 as the previous patterns made some unsafe assumptions. Also see the related llvm-dev discussion <http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html> Ultimately I didn't introduce a custom SelectionDAG node, but instead added a DAG combine that inserts an AssertZext i5 on the shift amount for an i32 variable-length shift and also added an ANY_EXTEND DAG-combine which will instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the opportunity to safely select SLLW/SRLW/SRAW. There are obviously different ways of addressing this (a number discussed in the llvm-dev thread), so I'd welcome further feedback and comments. Note that there are now some cases in test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is selected even though sra/srl/sll could be used without any extra instructions. Given both are semantically equivalent, there doesn't seem to be a good reason to prefer one vs the other. Given that would require more logic to still select sra/srl/sll in those cases, I've left it preferring the *w variants. Differential Revision: https://reviews.llvm.org/D56264 llvm-svn: 350992
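For readers unfamiliar with the *w shifts, their semantics can be modelled in Python. This is a sketch of the RV64I instruction behavior (operate on the low 32 bits, use only the low 5 bits of the shift amount, sign-extend the 32-bit result), with registers represented as unsigned 64-bit integers; the function names are illustrative and unrelated to the backend patterns in the patch.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def sext32(value):
    """Sign-extend the low 32 bits of `value` to an unsigned 64-bit pattern."""
    value &= MASK32
    return (value | ~MASK32) & MASK64 if value & 0x80000000 else value


def sllw(rs1, rs2):
    """Shift the low 32 bits left; only the low 5 bits of rs2 are used."""
    return sext32((rs1 & MASK32) << (rs2 & 31))


def srlw(rs1, rs2):
    """Logical right shift of the low 32 bits, result sign-extended."""
    return sext32((rs1 & MASK32) >> (rs2 & 31))


def sraw(rs1, rs2):
    """Arithmetic right shift of the low 32 bits, result sign-extended."""
    value = rs1 & MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret the low word as a negative number
    return (value >> (rs2 & 31)) & MASK64


print(hex(sllw(1, 31)))
print(hex(sllw(1, 32)))
```

The `rs2 & 31` masking is the property that the AssertZext i5 on the shift amount relies on: a shift amount of 32 behaves like 0 for the *w forms, which is one reason patterns that ignore the original operation width can be unsafe.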
*	[X86] Remove unnecessary code from getMaskNode.	Craig Topper	2019-01-12	1	-5/+1
	We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics, and was really a bug in our lowering of them. llvm-svn: 350991
*	[X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not ↵	Craig Topper	2019-01-12	2	-7/+11
	avx512dq, use v16i1 as the intermediate mask type instead of v8i1. We still use i8 for the load/store type. So we need to convert to/from i16 around the mask type. By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned. llvm-svn: 350989
*	[X86] Change some patterns that select MOVZX16rm8 to instead select ↵	Craig Topper	2019-01-12	1	-3/+6
	MOVZX32rm8 and extract the subregister. This should be a shorter encoding and is consistent with what we do for zext i8->i16 llvm-svn: 350988
*	[ARM] Fix typo	Evandro Menezes	2019-01-12	1	-1/+0
	Fix typo in r350952. llvm-svn: 350986
*	[X86] Add ISD nodes for masked truncate so we can properly represent when ↵	Craig Topper	2019-01-12	5	-150/+305
	the output has more elements than the input due to needing to be 128 bits. We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask. This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that. This fixes some of PR34877 llvm-svn: 350985
*	[AArch64] Improve Exynos predicates	Evandro Menezes	2019-01-11	1	-3/+12
	Expand the predicate using shifted arithmetic and logic instructions to also consider the respective non-shifted instructions. llvm-svn: 350976
*	[ConstantFolding] Fold undef for integer intrinsics	Nikita Popov	2019-01-11	1	-63/+114
	This fixes https://bugs.llvm.org/show_bug.cgi?id=40110. This implements handling of undef operands for integer intrinsics in ConstantFolding, in particular for the bitcounting intrinsics (ctpop, cttz, ctlz), the with.overflow intrinsics, the saturating math intrinsics and the funnel shift intrinsics. The undef behavior follows what InstSimplify does for the general case of non-constant operands. For the bitcount intrinsics (where InstSimplify doesn't do undef handling -- there cannot be a combination of an undef + non-constant operand) I'm using a 0 result if the intrinsic is defined for zero and undef otherwise. Differential Revision: https://reviews.llvm.org/D55950 llvm-svn: 350971
*	[X86] Fix incomplete handling of register-assigned variables in parsing.	Nirav Dave	2019-01-11	1	-185/+205
	Teach x86 assembly operand parsing to distinguish between assembler variables assigned to named registers and those assigned to immediate values. Reviewers: rnk, nickdesaulniers, void Subscribers: hiraditya, jyknight, llvm-commits Differential Revision: https://reviews.llvm.org/D56287 llvm-svn: 350966
*	[AArch64] Add pipeline model for Exynos M4	Evandro Menezes	2019-01-11	2	-1/+1006
	Add the scheduling and cost model for Exynos M4. llvm-svn: 350960
*	[AArch64] Create feature set for Exynos M4	Evandro Menezes	2019-01-11	2	-1/+23
	Complete the feature set for Exynos M4 and update test cases. llvm-svn: 350953
*	[Legalizer] Use correct ValueType of SELECT_CC node during Float promotion	Pirama Arumuga Nainar	2019-01-11	1	-3/+3
	Summary: When legalizing the result of a SELECT_CC node by promoting the floating-point type, use the promoted-to type rather than the original type. Fix PR40273. Reviewers: efriedma, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56566 llvm-svn: 350951
*	[LTO] Record whether LTOUnit splitting is enabled in index	Teresa Johnson	2019-01-11	7	-10/+127
	Summary: Records in the module summary index whether the bitcode was compiled with the option necessary to enable splitting the LTO unit (e.g. -fsanitize=cfi, -fwhole-program-vtables, or -fsplit-lto-unit). The information is passed down to the ModuleSummaryIndex builder via a new module flag "EnableSplitLTOUnit", which is propagated onto a flag on the summary index. This is then used during the LTO link to check whether all linked summaries were built with the same value of this flag. If not, an error is issued when we detect a situation requiring whole program visibility of the class hierarchy. This is the case when both of the following conditions are met: 1) We are performing LowerTypeTests or Whole Program Devirtualization. 2) There are type tests or type checked loads in the code. Note I have also changed the ThinLTOBitcodeWriter to also gate the module splitting on the value of this flag. Reviewers: pcc Subscribers: ormris, mehdi_amini, Prazek, inglorion, eraman, steven_wu, dexonsmith, arphaman, dang, llvm-commits Differential Revision: https://reviews.llvm.org/D53890 llvm-svn: 350948
*	[MergeFunc] Erase unused duplicate functions if they are discardable	Vedant Kumar	2019-01-11	1	-1/+1
	MergeFunc only deletes unused duplicate functions if they have local linkage, but it should be safe to relax this to any "discardable if unused" linkage type. Differential Revision: https://reviews.llvm.org/D56574 llvm-svn: 350939
*	[MergeFunc] Use Instruction::getFunction as a cleanup, NFC	Vedant Kumar	2019-01-11	1	-2/+2
	llvm-svn: 350938
*	[Jump Threading] Unfold a select insn that feeds a switch via a phi node	Ehsan Amiri	2019-01-11	1	-28/+70
	Currently when a select has a constant value in one branch and the select feeds a conditional branch (via a compare/phi and compare) we unfold the select statement. This results in threading the conditional branch later on. A similar opportunity exists when a select (with a constant in one branch) feeds a switch (via a phi node). The patch unfolds the select under this condition. A testcase is provided. llvm-svn: 350931
*	[x86] allow insert/extract when matching horizontal ops	Sanjay Patel	2019-01-11	1	-3/+13
	Previously, we limited this transform to cases where the extraction into the build vector happens from vectors of the same type as the build vector, but that's not required. There's a slight potential regression seen in the AVX512 result for phadd -- we're using the 256-bit flavor of the instruction now even though the 128-bit subset is sufficient. The same problem could already be seen in the AVX2 result. Follow-up patches will attempt to narrow that back down. llvm-svn: 350928
*	Revert "[SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI."	Martin Storsjo	2019-01-11	1	-42/+60
	This reverts commit r350841, as it actually had functional changes and broke compilation. See PR40290. llvm-svn: 350921
*	[X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. ↵	Craig Topper	2019-01-11	2	-23/+34
	Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector. We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions. Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector. This fixes some of the regressions from D350800. llvm-svn: 350918
*	[WebAssembly] Fix stack pointer store check in RegStackify	Heejin Ahn	2019-01-10	1	-13/+5
	Summary: We now use the __stack_pointer global and global.get/global.set instructions. This fixes the checking routine for stack_pointer writes accordingly. This also fixes the existing __stack_pointer test in reg-stackify.ll: That test used to pass not because of __stack_pointer clashes but because the function `stackpointer_callee` was not marked as `readnone`, so it was assumed to possibly write to memory arbitrarily, and the `global.set` instruction was marked as `mayStore` in the .td definition, so they were identified as intervening writes. After we added `readnone` to its attribute, this test fails without this patch. Reviewers: dschuff, sunfish Subscribers: jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D56094 llvm-svn: 350906
*	[MSP430] Minor fixes/improvements for assembler/disassembler	Anton Korobeynikov	2019-01-10	3	-2/+18
	* Teach AsmParser to recognize @rn in destination operand as 0(rn). * Do not allow the Disassembler to decode instructions whose size exceeds the number of input bytes. * Fix UB in MSP430MCCodeEmitter. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56547 llvm-svn: 350903
*	[MSP430] Add missing instruction forms	Anton Korobeynikov	2019-01-10	1	-9/+115
	* Add missing mm, [r|m]n, [r|m]p instruction forms. * Fix bit16mc instruction. Patch by Kristina Bessonova! Differential Revision: https://reviews.llvm.org/D56546 llvm-svn: 350902
*	[WebAssembly] Add unimplemented-simd128 subtarget feature	Thomas Lively	2019-01-10	9	-35/+62
	Summary: This is a third attempt, but this time we have vetted it on Windows first. The previous errors were due to an uninitialized class member. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D56560 llvm-svn: 350901