path: root/llvm/test/CodeGen/X86
Commit message | Author | Date | Files | Lines (-/+)
* [X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign mask
  Simon Pilgrim | 2019-02-04 | 6 files | -48/+10
  For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK), we can use X directly.
  Differential Revision: https://reviews.llvm.org/D57667
  llvm-svn: 353051
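  An illustrative LLVM IR sketch (hand-written, not a test from the commit) of the kind of pattern this targets: the compare against zero only feeds a select whose lowering (BLENDV) reads just the sign bit, so the compare can be replaced by X itself.

    define <4 x float> @blend_on_signbit(<4 x i32> %x, <4 x float> %a, <4 x float> %b) {
      ; PCMPGT(0, X) computes "X is negative"; only the sign bit of the result is consumed
      %neg = icmp slt <4 x i32> %x, zeroinitializer
      %r = select <4 x i1> %neg, <4 x float> %a, <4 x float> %b
      ret <4 x float> %r
    }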
* [DAGCombine] Add ADD(SUB,SUB) combines
  Simon Pilgrim | 2019-02-04 | 2 files | -27/+10
  Noticed while investigating PR40483; this fixes the basic test case from the bug, but not a more general case. We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.
  llvm-svn: 353044
* [AsmPrinter] Remove hidden flag -print-schedule.
  Andrea Di Biagio | 2019-02-04 | 48 files | -129220/+2425
  This patch removes hidden codegen flag -print-schedule, effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

  Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs".

  These days, we can use llvm-mca to test scheduling models, so there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca.

  Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86-specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means most (if not all) existing -print-schedule tests for X86 are redundant.

  When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() originally had to work for both MCInst and MachineInstr. The original implementation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally: we can always query MCSchedModel to get the latency and throughput, and more importantly, the "sched-info" string should not have been generated by the subtarget.

  Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change.

  Differential Revision: https://reviews.llvm.org/D57244
  llvm-svn: 353043
* [X86] Add a couple of missed ADD combine tests
  Simon Pilgrim | 2019-02-04 | 1 file | -0/+42
  Noticed while investigating PR40483.
  llvm-svn: 353042
* Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."
  Craig Topper | 2019-02-04 | 1 file | -86/+42
  We now print ST0 as 'st' when generating the clobber list for MS inline assembly in clang. This matches what the gcc reg name list expects.

  Original commit message:
  This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post-RA scheduler. Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler.

  Differential Revision: https://reviews.llvm.org/D57298
  llvm-svn: 353016
* [X86] Print %st(0) as %st when it's implicit to the instruction. Continue printing it as %st(0) when it's encoded in the instruction.
  Craig Topper | 2019-02-04 | 11 files | -229/+229
  This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.
  llvm-svn: 353015
* [X86] Regenerate test to drop 'End function' comments and some other regex updates.
  Craig Topper | 2019-02-04 | 1 file | -12/+2
  llvm-svn: 353014
* Revert r352985 "[X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly"
  Craig Topper | 2019-02-04 | 16 files | -318/+328
  Looking into gcc and objdump behavior more, this was overly aggressive. If the register is encoded in the instruction we should print %st(0); if it's implicit we should print %st. I'll be making a more directed change in a future patch.
  llvm-svn: 353013
* [NFC] Make vector types legal in UREM test
  Simon Pilgrim | 2019-02-03 | 1 file | -198/+60
  As discussed in D50222, this changes the vector types in tests required for that revision to ones legal for X86.
  Patch by @hermord (Dmytro Shynkevych)
  Differential Revision: https://reviews.llvm.org/D56372
  llvm-svn: 353004
* [CGP] adjust target constraints for forming uaddo
  Sanjay Patel | 2019-02-03 | 1 file | -7/+6
  There are 2 changes visible here:
  1. There's no reason to limit this transform based on number of condition registers. That diff allows PPC to produce slightly better (dot-instructions should be generally good) code. Note: someone that cares about PPC codegen might want to look closer at that output because it seems like we could still improve this.
  2. We (probably?) should not bother trying to form uaddo (or other overflow ops) when there's no target support for such an op. This goes beyond checking whether the op is expanded because both PPC and AArch64 show better codegen for standard types regardless of whether the op is legal/custom.
  llvm-svn: 353001
* [X86][AVX] Support shuffle combining for VBROADCAST with smaller vector sources
  Simon Pilgrim | 2019-02-03 | 2 files | -17/+14
  getTargetShuffleMask can only do this safely if we're extracting the lowest subvector from a vector of the same result type.
  llvm-svn: 352999
* [PatternMatch] add special-case uaddo matching for increment-by-one (2nd try)
  Sanjay Patel | 2019-02-03 | 2 files | -13/+13
  This is the most important uaddo problem mentioned in PR31754:
  https://bugs.llvm.org/show_bug.cgi?id=31754
  ...but that was overcome in x86 codegen with D57637. That patch also corrects the inc vs. add regressions seen with the previous attempt at this.

  Still, we want to make this matcher complete, so we can potentially canonicalize the pattern even if it's an 'add 1' operation. Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo.

  There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks. I'm not sure if the Hexagon diff means the test is no longer testing what it intended to test, but that should be solvable in a follow-up.

  Differential Revision: https://reviews.llvm.org/D57516
  llvm-svn: 352998
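  A small illustrative IR function (hand-written, not one of the patch's tests) of the increment-by-one form the matcher now recognizes as an unsigned add-with-overflow check: x + 1 wraps exactly when the result is zero.

    define i1 @uaddo_by_one(i64 %x, i64* %p) {
      %a = add i64 %x, 1
      ; x + 1 overflows exactly when the sum wraps to 0 (i.e. x == -1)
      %ov = icmp eq i64 %a, 0
      store i64 %a, i64* %p
      ret i1 %ov
    }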
* [X86][AVX] Support shuffle combining for VPMOVZX with smaller vector sources
  Simon Pilgrim | 2019-02-03 | 1 file | -8/+19
  llvm-svn: 352997
* [X86][AVX] More aggressively simplify BROADCAST source operand
  Simon Pilgrim | 2019-02-03 | 5 files | -46/+17
  Aim to use the scalar source or lowest 128-bit vector directly. We're still missing some VZMOVL_LOAD combines.
  llvm-svn: 352994
* [x86] add CGP uaddo test with weird type; NFC
  Sanjay Patel | 2019-02-03 | 1 file | -0/+19
  There's probably no reason to try this transform for an obviously unsupported op.
  llvm-svn: 352993
* [X86] Print %st(0) as %st to match what gcc inline asm uses as the clobber name to make MS inline asm work correctly
  Craig Topper | 2019-02-03 | 16 files | -328/+318
  Summary:
  When calculating clobbers for MS-style inline assembly we fail if the asm clobbers the stack top, because we print st(0) and try to pass it through the gcc register name check. This was found when I attempted to make emms/femms clobber all ST registers. If you use emms/femms in MS inline asm we would try to use st(0) as the clobber name, but clang would think that wasn't a valid clobber name.

  This also matches what objdump disassembly prints. It's also what is printed by gcc -S.

  Reviewers: RKSimon, rnk, efriedma, spatel, andreadb, lebedev.ri
  Reviewed By: rnk
  Subscribers: eraman, gbedwell, lebedev.ri, llvm-commits
  Differential Revision: https://reviews.llvm.org/D57621
  llvm-svn: 352985
* [X86] Lower ISD::UADDO to use the Z flag instead of C flag when the RHS is a constant 1 to encourage INC formation.
  Craig Topper | 2019-02-03 | 5 files | -45/+67
  Summary:
  Add an additional combine to combineCarryThroughADD to reverse it back to the C flag to avoid regressions. I believe this catches the cases that D57547 got.

  Reviewers: RKSimon, spatel
  Reviewed By: spatel
  Subscribers: javed.absar, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D57637
  llvm-svn: 352984
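  A hedged sketch of the shape involved (my own example, not from the patch): an unsigned add-with-overflow by the constant 1, where the overflow result can be read from ZF after an inc rather than from CF after an add.

    declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32)

    define i1 @inc_overflow(i32 %x) {
      %s = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %x, i32 1)
      ; x + 1 overflows iff the result is 0, so ZF after an 'inc' carries the answer
      %ov = extractvalue { i32, i1 } %s, 1
      ret i1 %ov
    }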
* [X86] Add another test case for PR40539. NFC
  Craig Topper | 2019-02-02 | 1 file | -0/+37
  llvm-svn: 352967
* [X86][AVX] Enable INSERT_SUBVECTOR(SRC0, SHUFFLE(SRC1)) shuffle combining
  Simon Pilgrim | 2019-02-02 | 10 files | -240/+398
  Push the insert_subvector up through the shuffle operands to help find more cross-lane shuffles.

  This exposes a couple of minor issues that will be fixed shortly:
  - Missed broadcast folds - we have a mixture of vzext_load lengths that need cleaning up.
  - combine-sdiv.ll - AVX1 SimplifyDemandedVectorElts failure (hits max depth due to a couple of extra bitcasts).

  llvm-svn: 352963
* [X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mapping
  Simon Pilgrim | 2019-02-01 | 26 files | -298/+257
  Noticed in D57514.
  Differential Revision: https://reviews.llvm.org/D57519
  llvm-svn: 352922
* [DWARF v5] Fix DWARF emitter and consumer to produce/expect a uleb for a location description's length.
  Wolfgang Pieb | 2019-02-01 | 1 file | -3/+3
  Reviewer: davide, JDevliegere
  Differential Revision: https://reviews.llvm.org/D57550
  llvm-svn: 352889
* [X86][SSE] Use PSLLDQ/PSRLDQ to mask out zeroable ends of a shuffle
  Simon Pilgrim | 2019-02-01 | 4 files | -168/+148
  As suggested on PR40318, this patch uses PSLLDQ/PSRLDQ to lower shuffles to zero out the ends of a vector, leaving a sequential inner section.

  For pre-SSSE3 we do this for shuffles with zeros at either end (requiring up to 3 shifts), but once PSHUFB is available I've limited this to shuffles with a single zeroable end (2 shifts).

  Differential Revision: https://reviews.llvm.org/D56784
  llvm-svn: 352883
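  An illustrative shuffle (a hand-written sketch, not one of the patch's tests) whose mask keeps a sequential run of source elements with zeroable lanes at both ends, the kind of mask that can be lowered as a pair of byte shifts:

    define <16 x i8> @zero_ends(<16 x i8> %x) {
      ; lanes 0-1 and 14-15 come from the zero vector; lanes 2-13 are the
      ; sequential run x[0..11], reachable via PSLLDQ then PSRLDQ
      %r = shufflevector <16 x i8> %x, <16 x i8> zeroinitializer,
           <16 x i32> <i32 16, i32 16, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5,
                       i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 16, i32 16>
      ret <16 x i8> %r
    }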
* [TargetLowering] try harder to determine undef elements of vector binops
  Sanjay Patel | 2019-02-01 | 1 file | -23/+0
  This might be the start of tracking all vector element constants generally if we take it to its logical conclusion, but let's stop here and make sure this is correct/beneficial so far.

  The affected tests currently require a convoluted path before they get simplified because we don't call SimplifyDemandedVectorElts() from binops directly and don't modify the binop operands directly in SimplifyDemandedVectorElts(). That's why the tests all have a trailing shuffle to induce a chain reaction of transforms. So something like this is happening:

  1. Improve the knowledge of undefs in the binop via a SimplifyDemandedVectorElts() call that originates from a shuffle.
  2. Transfer that undef knowledge back to the shuffle mask user as more undef lanes.
  3. Combine the modified shuffle by calling SimplifyDemandedVectorElts() again.
  4. Translate the improved shuffle mask as undemanded lanes of build vector constants, causing those to become full undef constants.
  5. Simplify the binop now that it has a full undef operand.

  As we can see from the unchanged 'and' and 'or' tests, tracking undefs alone isn't a full solution. We would need to track zero and all-ones constants to improve those opcodes. We'd probably need to track NaN for FP ops too (assuming we don't have fast-math-flags set).

  Differential Revision: https://reviews.llvm.org/D57066
  llvm-svn: 352880
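  A rough sketch (my own construction, assuming the shape described above) of a test with a trailing shuffle that only demands some lanes of a binop with a constant operand, which is what lets the chain of SimplifyDemandedVectorElts() calls kick in:

    define <4 x i32> @binop_then_shuffle(<4 x i32> %x) {
      %b = add <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
      ; the shuffle only demands lanes 0 and 1, so lanes 2 and 3 of the
      ; constant operand (and of %x) are undemanded
      %r = shufflevector <4 x i32> %b, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
      ret <4 x i32> %r
    }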
* [X86][AVX] Combine INSERT_SUBVECTOR(SRC0, BITCAST(SHUFFLE(EXTRACT_SUBVECTOR(SRC1)))
  Simon Pilgrim | 2019-02-01 | 1 file | -9/+16
  Enable peeking through one-use bitcasts to the subvector shuffle.
  This still depends on the subvector being the same scalar size, but D57514 has already helped with the more tricky patterns.
  llvm-svn: 352879
* [X86][BdVer2] Transfer delays from the integer to the floating point unit.
  Roman Lebedev | 2019-02-01 | 4 files | -16/+16
  Summary:
  I'm unable to find this number in the "AMD SOG for family 15h". llvm-exegesis measures the latencies of these instructions as `2`, which matches the latencies specified in "AMD SOG for family 15h".

  However, if we look at Agner, Microarchitecture, "AMD Bulldozer, Piledriver, Steamroller and Excavator pipeline", "Data delay between different execution domains", the int->ivec transfer is listed as `8`..`10`cy of additional latency.

  Also, Agner's "Instruction tables", for Piledriver, lists their latencies as `12`, which is consistent with `2cy` from exegesis / AMD SOG + `10cy` transfer delay.

  An additional data point comes from the fact that Agner's "Instruction tables", for Jaguar, lists their latencies as `8`; and "AMD SOG for family 16h" does state the `+6cy` int->ivec delay, which is consistent with an instruction latency of `1` or `2`.

  Reviewers: andreadb, RKSimon, craig.topper
  Reviewed By: andreadb
  Subscribers: gbedwell, courbet, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D57300
  llvm-svn: 352861
* [x86] adjust test to show both add/inc options; NFC
  Sanjay Patel | 2019-02-01 | 1 file | -2/+4
  If we're optimizing for size, that overrides the subtarget feature, so we would always produce 'inc' if we matched this pattern.
  llvm-svn: 352821
* [x86] add test for missed opportunity to use 'inc'; NFC
  Sanjay Patel | 2019-01-31 | 1 file | -0/+43
  Another pattern exposed in D57516.
  llvm-svn: 352820
* [x86] add test for missed opportunity to use 'inc'; NFC
  Sanjay Patel | 2019-01-31 | 1 file | -0/+30
  llvm-svn: 352805
* [DAGCombine] Avoid CombineZExtLogicopShiftLoad if there is free ZEXT
  Guozhi Wei | 2019-01-31 | 1 file | -0/+20
  This patch fixes pr39098.

  For the attached test case, CombineZExtLogicopShiftLoad can optimize it to

    t25: i64 = Constant<1099511627775>
    t35: i64 = Constant<0>
    t0: ch = EntryToken
    t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
    t58: i64 = srl t57, Constant:i8<1>
    t60: i64 = and t58, Constant:i64<524287>
    t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64

  But later visitANDLike transforms it to

    t25: i64 = Constant<1099511627775>
    t35: i64 = Constant<0>
    t0: ch = EntryToken
    t57: i64,ch = load<(load 4 from `i40* undef`, align 8), zext from i32> t0, undef:i64, undef:i64
    t61: i32 = truncate t57
    t63: i32 = srl t61, Constant:i8<1>
    t64: i32 = and t63, Constant:i32<524287>
    t65: i64 = zero_extend t64
    t58: i64 = srl t57, Constant:i8<1>
    t60: i64 = and t58, Constant:i64<524287>
    t29: ch = store<(store 5 into `i40* undef`, align 8), trunc to i40> t57:1, t60, undef:i64, undef:i64

  And it triggers CombineZExtLogicopShiftLoad again, causing an infinite loop.

  Both forms should generate the same instructions, and the CombineZExtLogicopShiftLoad-generated IR looks cleaner. But it looks more difficult to prevent visitANDLike from doing the transform, so I prevent CombineZExtLogicopShiftLoad from doing the transform if the ZExt is free.

  Differential Revision: https://reviews.llvm.org/D57491
  llvm-svn: 352792
* [Intrinsic] Expand SMULFIX to MUL, MULH[US], or [US]MUL_LOHI on vector arguments
  Leonard Chan | 2019-01-31 | 1 file | -72/+28
  For zero scale SMULFIX, expand into MUL, which produces better code for X86. For vector arguments, expand into MUL if SMULFIX is provided with a zero scale. Otherwise, expand into MULH[US] or [US]MUL_LOHI.
  Differential Revision: https://reviews.llvm.org/D56987
  llvm-svn: 352783
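  For reference, a hedged IR sketch (hand-written, not from the patch) of a vector fixed-point multiply with a zero scale, which under this change can legalize to a plain MUL:

    declare <4 x i32> @llvm.smul.fix.v4i32(<4 x i32>, <4 x i32>, i32)

    define <4 x i32> @fixmul_scale0(<4 x i32> %a, <4 x i32> %b) {
      ; a scale of 0 means no post-shift is needed, so a plain multiply suffices
      %r = call <4 x i32> @llvm.smul.fix.v4i32(<4 x i32> %a, <4 x i32> %b, i32 0)
      ret <4 x i32> %r
    }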
* Revert "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."Craig Topper2019-01-311-42/+86
| | | | | | This is causing a failure in chromium llvm-svn: 352782
* [X86][AVX] Fold concat(broadcast(x),broadcast(x)) -> broadcast(x)
  Simon Pilgrim | 2019-01-31 | 2 files | -76/+25
  Differential Revision: https://reviews.llvm.org/D57514
  llvm-svn: 352774
* [X86][AVX] insert_subvector(bitcast(v), bitcast(s), c1) -> bitcast(insert_subvector(v,s,c2))
  Simon Pilgrim | 2019-01-31 | 4 files | -98/+67
  Similar to what we already do in DAGCombiner, but this version also handles bitcasts from types with different scalar sizes, which x86 is better at handling.
  Differential Revision: https://reviews.llvm.org/D57514
  llvm-svn: 352773
* revert r352766: [PatternMatch] add special-case uaddo matching for increment-by-one
  Sanjay Patel | 2019-01-31 | 1 file | -1/+3
  Missed some regression test updates when testing this.
  llvm-svn: 352769
* [PatternMatch] add special-case uaddo matching for increment-by-one
  Sanjay Patel | 2019-01-31 | 1 file | -3/+1
  This is the most important uaddo problem mentioned in PR31754:
  https://bugs.llvm.org/show_bug.cgi?id=31754
  We were failing to match the canonicalized pattern when it's an 'add 1' operation.

  Pattern matching, however, shouldn't assume that we have canonicalized IR, so we match 4 commuted variants of uaddo.

  There's also a test with a crazy type to show that the existing CGP transform based on this matcher is not limited by target legality checks, but that's a different problem.

  Differential Revision: https://reviews.llvm.org/D57516
  llvm-svn: 352766
* [X86][AVX] Fold broadcast(bitcast(src)) -> bitcast(broadcast(src))
  Simon Pilgrim | 2019-01-31 | 3 files | -12/+8
  llvm-svn: 352751
* [X86][AVX] Add PR34394 subvector broadcast test cases
  Simon Pilgrim | 2019-01-31 | 1 file | -10/+131
  Tidy up check-prefixes at the same time.
  llvm-svn: 352749
* [X86] combineExtractWithShuffle - more aggressively peek through bitcasts
  Simon Pilgrim | 2019-01-31 | 1 file | -29/+14
  Fixes regression introduced by rL352743.
  llvm-svn: 352745
* [X86][AVX] Enable AVX1 broadcasts in shuffle combining
  Simon Pilgrim | 2019-01-31 | 7 files | -17/+10
  Enables 32/64-bit scalar load broadcasts on AVX1 targets.
  The extractelement-load.ll regression will be fixed shortly in a followup commit.
  llvm-svn: 352743
* [X86][AVX] Fold vt1 concat_vectors(vt2 undef, vt2 broadcast(x)) --> vt1 broadcast(x)
  Simon Pilgrim | 2019-01-31 | 1 file | -8/+4
  If we're not inserting the broadcast into the lowest subvector then we can avoid the insertion by just performing a larger broadcast.
  Avoids a regression when we enable AVX1 broadcasts in shuffle combining.
  llvm-svn: 352742
* [SelectionDAG] Codesize: don't expand SHIFT to SHIFT_PARTS
  Sjoerd Meijer | 2019-01-31 | 1 file | -0/+134
  And instead just generate a libcall. My motivating example on ARM was a simple:

    shl i64 %A, %B

  for which the code bloat is quite significant. For other targets that also accept __int128/i128, such as AArch64 and X86, it is also beneficial for these cases to generate a libcall when optimising for minsize. On these 64-bit targets, the 64-bit shifts are of course unaffected because the SHIFT/SHIFT_PARTS lowering operation action is not set to custom/expand.
  Differential Revision: https://reviews.llvm.org/D57386
  llvm-svn: 352736
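  A hedged sketch (my own, not the commit's test) of the 128-bit case on a 64-bit target: with minsize, expanding this through SHIFT_PARTS is bulky, and the change lets it become a runtime-library call instead (typically __ashlti3 for a 128-bit left shift).

    define i128 @shl128(i128 %a, i128 %b) minsize {
      ; under minsize this can now be emitted as a libcall rather than expanded inline
      %r = shl i128 %a, %b
      ret i128 %r
    }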
* GlobalISel: Fix creating MMOs with align 0
  Matt Arsenault | 2019-01-31 | 1 file | -73/+73
  llvm-svn: 352712
* [X86] Add a 32-bit command line to avx512-intrinsics.ll. Move all 64-bit mode only intrinsics to avx512-intrinsics-x86_64.ll.
  Craig Topper | 2019-01-31 | 2 files | -2265/+4569
  Most of the other intrinsic tests have 32-bit command lines.
  llvm-svn: 352708
* [X86] Add test case for pr40539. NFC
  Craig Topper | 2019-01-31 | 1 file | -0/+36
  llvm-svn: 352697
* MIR: Reject non-power-of-4 alignments in MMO parsing
  Matt Arsenault | 2019-01-30 | 10 files | -104/+104
  llvm-svn: 352686
* [DAGCombiner] sub X, 0/1 --> add X, 0/-1
  Sanjay Patel | 2019-01-30 | 2 files | -8/+6
  This extends the existing transform for:
    add X, 0/1 --> sub X, 0/-1
  ...to allow the sibling subtraction fold.
  This pattern could regress with the proposed change in D57401.
  llvm-svn: 352680
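  A hedged illustration of one reading of the fold (my own example): when the subtrahend is known to be 0 or 1, e.g. a zero-extended i1, the subtraction can instead be expressed as adding 0 or -1, the sign-extended i1.

    define i32 @sub_zext_bool(i32 %x, i1 %c) {
      %e = zext i1 %c to i32   ; %e is 0 or 1
      %r = sub i32 %x, %e      ; candidate for: add %x, (sext i1 %c), i.e. add of 0 or -1
      ret i32 %r
    }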
* [AArch64][x86] add tests for add/sub signbits fold; NFC
  Sanjay Patel | 2019-01-30 | 1 file | -0/+34
  As discussed/shown in D57401, we are missing a fold for subtract of 0/1 --> add 0/-1.
  llvm-svn: 352678
* Add a 'dynamic' parameter to the objectsize intrinsic
  Erik Pilkington | 2019-01-30 | 1 file | -2/+2
  This is meant to be used with clang's __builtin_dynamic_object_size. When 'true' is passed to this parameter, the intrinsic has the potential to be folded into instructions that will be evaluated at run time. When 'false', the objectsize intrinsic behaviour is unchanged.
  rdar://32212419
  Differential revision: https://reviews.llvm.org/D56761
  llvm-svn: 352664
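  A sketch of a call using the new flag (my own example; the argument order shown here, min / nullunknown / dynamic, is my assumption about the post-change signature, not quoted from the commit):

    declare i64 @llvm.objectsize.i64.p0i8(i8*, i1, i1, i1)

    define i64 @dyn_size(i8* %p) {
      ; the last i1 is the new 'dynamic' parameter: true permits a run-time computation
      %s = call i64 @llvm.objectsize.i64.p0i8(i8* %p, i1 false, i1 true, i1 true)
      ret i64 %s
    }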
* [X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7.
  Craig Topper | 2019-01-30 | 1 file | -86/+42
  This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post-RA scheduler.
  Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler.
  Differential Revision: https://reviews.llvm.org/D57298
  llvm-svn: 352660
* [X86][AVX] Prefer to combine shuffle to broadcasts whenever possible
  Simon Pilgrim | 2019-01-30 | 1 file | -11/+23
  This is the first step towards improving broadcast support on AVX1 targets.
  llvm-svn: 352634