bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[X86][SSE] Dropped -mcpu from vector average tests	Simon Pilgrim	2017-06-23	1	-645/+686
	Use triple and attribute only for consistency llvm-svn: 306104
*	[X86][SSE] Dropped -mcpu from scalar math tests	Simon Pilgrim	2017-06-23	1	-6/+4
	Use triple and attribute only for consistency llvm-svn: 306097
*	[X86][SSE] Dropped -mcpu from insertps tests	Simon Pilgrim	2017-06-23	1	-3/+3
	Use triple and attribute only for consistency llvm-svn: 306092
*	[x86] add/sub (X==0) --> sbb(cmp X, 1)	Sanjay Patel	2017-06-22	1	-6/+3
	This is very similar to the transform in: https://reviews.llvm.org/rL306040 ...but in this case, we use cmp X, 1 to set the carry bit as needed.

	Again, we can show that all of these are logically equivalent (although InstCombine currently canonicalizes to a form not seen here), and if we believe IACA, then this is the smallest/fastest code. Eg, with SNB:

	| Num Of |         Ports pressure in cycles         |    |
	|  Uops  | 0 - DV |  1  | 2 - D | 3 - D |  4  |  5  |    |
	----------------------------------------------------------
	|   1    |  1.0   |     |       |       |     |     |    | cmp edi, 0x1
	|   2    |        | 1.0 |       |       |     | 1.0 | CP | sbb eax, eax

	The larger motivation is to clean up all select-of-constants combining/lowering because we're missing some common cases.

	llvm-svn: 306072
*	Supported lowerInterleavedStore() in X86InterleavedAccess.	Farhana Aleen	2017-06-22	1	-60/+32
	Reviewers: RKSimon, DavidKreitzer
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D32658
	llvm-svn: 306068
*	[x86] add more tests for select --> sbb transform; NFC	Sanjay Patel	2017-06-22	1	-4/+61
	These are siblings of the tests added with r306032. llvm-svn: 306064
*	[AVX-512] Remove and autoupgrade the masked integer compare intrinsics	Craig Topper	2017-06-22	8	-2704/+4230
	Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today.
	Reviewers: spatel, RKSimon, zvi, igorb
	Reviewed By: RKSimon
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D34389
	llvm-svn: 306047
*	[x86] add/sub (X==0) --> sbb(neg X)	Sanjay Patel	2017-06-22	1	-9/+6
	Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480), lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb' codegen in 1 out of the 4 tests. This is a step towards smoothing that out.

	First, show that all of these IR forms are equivalent: http://rise4fun.com/Alive/mx

	Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge (later Intel and AMD chips are similar based on Agner's tables):

	This is the "obvious" x86 codegen (what gcc appears to produce currently):

	| Num Of |         Ports pressure in cycles         |    |
	|  Uops  | 0 - DV |  1  | 2 - D | 3 - D |  4  |  5  |    |
	----------------------------------------------------------
	|   1*   |        |     |       |       |     |     |    | xor eax, eax
	|   1    |  1.0   |     |       |       |     |     | CP | test edi, edi
	|   1    |        |     |       |       |     | 1.0 | CP | setnz al
	|   1    |        | 1.0 |       |       |     |     | CP | neg eax

	This is the adc version:

	|   1*   |        |     |       |       |     |     |    | xor eax, eax
	|   1    |  1.0   |     |       |       |     |     | CP | cmp edi, 0x1
	|   2    |        | 1.0 |       |       |     | 1.0 | CP | adc eax, 0xffffffff

	And this is sbb:

	|   1    |  1.0   |     |       |       |     |     |    | neg edi
	|   2    |        | 1.0 |       |       |     | 1.0 | CP | sbb eax, eax

	If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be clearly better than the alternatives going forward.

	llvm-svn: 306040
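The three instruction sequences compared in r306040 can be checked against each other with a small model. The sketch below is not part of the commit; it assumes 32-bit registers and models only the carry-flag behavior of `cmp`, `neg`, `adc` and `sbb` that these sequences rely on. The function names are made up for illustration.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers (assumption for this sketch)


def obvious(x):
    """xor eax, eax; test edi, edi; setnz al; neg eax"""
    al = 1 if (x & MASK) != 0 else 0
    return (-al) & MASK


def adc_version(x):
    """xor eax, eax; cmp edi, 1; adc eax, 0xffffffff"""
    # cmp sets CF on unsigned borrow, i.e. when edi < 1, i.e. when edi == 0.
    cf = 1 if (x & MASK) < 1 else 0
    return (0 + 0xFFFFFFFF + cf) & MASK


def sbb_version(x):
    """neg edi; sbb eax, eax"""
    # neg sets CF unless its operand is zero.
    cf = 1 if (x & MASK) != 0 else 0
    eax = 0x12345678  # arbitrary: the old value cancels out in eax - eax - CF
    return (eax - eax - cf) & MASK


if __name__ == "__main__":
    for x in (0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
        print(hex(x), hex(obvious(x)), hex(adc_version(x)), hex(sbb_version(x)))
```

All three return 0 for a zero input and an all-ones value otherwise; the difference between them is only in instruction count and port pressure, which is what the IACA tables measure.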
*	[x86] add tests for select --> sbb transform; NFC	Sanjay Patel	2017-06-22	1	-0/+62
	llvm-svn: 306032
*	[X86] Add support for "probe-stack" attribute	whitequark	2017-06-22	2	-0/+50
	This commit adds prologue code emission for stack probe function calls.
	Reviewed By: majnemer
	Differential Revision: https://reviews.llvm.org/D34387
	llvm-svn: 306010
*	[GlobalISel][X86] Support vector type G_INSERT legalization/selection.	Igor Breger	2017-06-22	4	-0/+543
	Summary: Support vector type G_INSERT legalization/selection. Split from https://reviews.llvm.org/D33665
	Reviewers: qcolombet, t.p.northover, zvi, guyblank
	Reviewed By: guyblank
	Subscribers: guyblank, rovka, llvm-commits, kristof.beyls
	Differential Revision: https://reviews.llvm.org/D33956
	llvm-svn: 305989
*	AVX-512: Lowering Masked Gather intrinsic - fixed a bug	Elena Demikhovsky	2017-06-22	1	-8/+61
	Masked gather for vector length 2 is lowered incorrectly for element type i32. The type <2 x i32> was automatically extended to <2 x i64> and we generated VPGATHERQQ instead of VPGATHERQD. The type <2 x float> is extended to <4 x float>, so there is no bug for this type, but the sequence may be more optimal. In this patch I'm fixing the <2 x i32> bug and optimizing the <2 x float> sequence for GATHERs only. The same fix should be done for Scatters as well.
	Differential revision: https://reviews.llvm.org/D34343
	llvm-svn: 305987
*	[Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86)	Davide Italiano	2017-06-21	1	-0/+2
	Patch by Fedor Sergeev.
	Differential Revision: https://reviews.llvm.org/D33868
	llvm-svn: 305948
*	[X86][SSE] Dropped -mcpu from 256-bit vector shuffle tests	Simon Pilgrim	2017-06-21	4	-20/+12
	Use triple and attribute only for consistency llvm-svn: 305916
*	[X86][SSE] Dropped -mcpu from 128-bit vector shuffle tests	Simon Pilgrim	2017-06-21	4	-38/+26
	Use triple and attribute only for consistency llvm-svn: 305913
*	[X86][SSE] Regenerate merge store tests	Simon Pilgrim	2017-06-21	1	-15/+17
	llvm-svn: 305910
*	[X86][SSE] Dropped -mcpu from vector blend shuffle tests and regenerate	Simon Pilgrim	2017-06-21	1	-54/+20
	Use triple and attribute only for consistency llvm-svn: 305909
*	[X86][SSE] Dropped -mcpu from vector shuffle tests	Simon Pilgrim	2017-06-21	4	-14/+24
	Use triple and attribute only for consistency llvm-svn: 305908
*	[X86][SSE] Dropped -mcpu from vector zero extend tests	Simon Pilgrim	2017-06-21	1	-7/+5
	Use triple and attribute only for consistency llvm-svn: 305907
*	[X86][SSE] Dropped -mcpu from variable shuffle tests	Simon Pilgrim	2017-06-21	2	-8/+7
	Use triple and attribute only for consistency llvm-svn: 305906
*	[X86][AVX] Add AVX1 shuffle truncation tests	Simon Pilgrim	2017-06-21	1	-107/+234
	llvm-svn: 305905
*	[X86][SSE] Add SSE2/SSE42 shuffle truncation tests	Simon Pilgrim	2017-06-21	1	-0/+156
	llvm-svn: 305904
*	[X86] Rerun the update_llc_test_checks tool on test. NFC.	Zvi Rackover	2017-06-21	1	-0/+8
	llvm-svn: 305897
*	[DAGCombiner] Add another combine from build vector to shuffle	Guy Blank	2017-06-21	1	-36/+12
	Add support for combining a build vector to a shuffle when the build vector is of extracted elements from 2 vectors (vec1, vec2) where vec2 is 2 times smaller than vec1. llvm-svn: 305883
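A rough model of the idea behind r305883, under assumptions the commit message does not spell out: the smaller vector is padded with undefined lanes to the width of the larger one, and the build vector is then expressed as a single two-input shuffle. The helper names and the `UNDEF` placeholder are hypothetical and are not taken from DAGCombiner.

```python
UNDEF = None  # stand-in for an undefined lane (assumption for this sketch)


def build_vector(lanes):
    """Build a vector lane by lane from (source_vector, index) extractions."""
    return [src[idx] for src, idx in lanes]


def shuffle(a, b, mask):
    """Two-input shuffle: each mask entry indexes the concatenation a + b."""
    both = a + b
    return [both[m] for m in mask]


def build_vector_as_shuffle(vec1, vec2, picks):
    """picks is a list of (which, index) pairs with which in {1, 2}.

    vec2 is assumed to be half as long as vec1; it is padded with UNDEF
    so that both shuffle inputs have the same number of lanes.
    """
    wide2 = vec2 + [UNDEF] * (len(vec1) - len(vec2))
    mask = [idx if which == 1 else len(vec1) + idx for which, idx in picks]
    return shuffle(vec1, wide2, mask)


if __name__ == "__main__":
    v1, v2 = [10, 11, 12, 13], [20, 21]
    picks = [(1, 0), (2, 1), (1, 3), (2, 0)]
    print(build_vector([(v1 if w == 1 else v2, i) for w, i in picks]))
    print(build_vector_as_shuffle(v1, v2, picks))
```

Both calls produce the same lanes, which is the property that lets a per-lane extract-and-insert sequence be replaced by one shuffle.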
*	[XRay] Reduce synthetic references emitted by XRay	Dean Michael Berris	2017-06-21	4	-23/+19
	Summary: When we're building with XRay instrumentation, we use a trick that preserves references from the function to a function sled index. This index table lives in a separate section, and without this trick the linker is free to garbage-collect this section and all the segments it refers to. Until we're able to tell the linkers to preserve these sections, we use this reference trick to keep around both the index and the entries in the instrumentation map.

	Before this change we emitted both a synthetic reference to the label in the instrumentation map, and to the entry in the function map index. This change removes the first synthetic reference and only emits one synthetic reference to the index -- the index entry has the references to the labels in the instrumentation map, so the linker will still preserve those if the function itself is preserved.

	This reduces the size of the synthetic references we emit from 16 bytes to just 8 bytes on x86_64, and similarly on other platforms.

	Reviewers: dblaikie
	Subscribers: javed.absar, kpw, pelikan, llvm-commits
	Differential Revision: https://reviews.llvm.org/D34340
	llvm-svn: 305880
*	[ImplicitNullChecks] Uphold an invariant in areMemoryOpsAliased	Serguei Katkov	2017-06-21	1	-0/+293
	Right now areMemoryOpsAliased has an assertion justified as:

	  MMO1 should have a value due it comes from operation we'd like to use as implicit null check.
	  assert(MMO1->getValue() && "MMO1 should have a Value!");

	However, it is possible for that invariant to not be upheld in the following situation (conceptually):

	  Null check %RAX
	  NotNullSucc:
	    %RAX = LEA %RSP, 16 // I0
	    %RDX = MOV64rm %RAX // I1

	With the current code, we will have an early exit from ImplicitNullChecks::isSuitableMemoryOp on I0 with SR_Unsuitable. However, I1 will look plausible (since it loads from %RAX) and will go ahead and call areMemoryOpsAliased(I1, I0). This will cause us to fail the assert mentioned above since I1 does not load from an IR level value and thus is allowed to have a non-Value base address.

	The fix is to bail out earlier whenever we see an unsuitable instruction overwrite PointerReg. This would guarantee that when we call areMemoryOpsAliased, we're guaranteed to be looking at an instruction that loads from or stores to an IR level value.

	Original Patch Author: sanjoy
	Reviewers: sanjoy, mkazantsev, reames
	Reviewed By: sanjoy
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D34385
	llvm-svn: 305879
*	[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes	Sanjay Patel	2017-06-20	1	-23/+85
	There are a couple of potential improvements as seen in the IR and asm:
	1. We're unnecessarily extending to a larger type to compare values.
	2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand)
	llvm-svn: 305802
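For readers unfamiliar with this kind of expansion, here is a hedged sketch of what replacing a small fixed-size memcmp() with scalar compares can look like. It assumes the common approach of loading each buffer as one integer in big-endian byte order (a load plus byte swap on little-endian x86) and comparing the two values as unsigned numbers. The exact IR that CodeGenPrepare emits is not shown in the commit message, and `expanded_memcmp` is a made-up name.

```python
def expanded_memcmp(a: bytes, b: bytes, n: int) -> int:
    """Model memcmp(a, b, n) for n in {2, 4, 8} using one integer compare."""
    if n not in (2, 4, 8):
        raise ValueError("this sketch only models the 2/4/8 byte sizes")
    # Big-endian interpretation makes the first differing byte the most
    # significant difference, which matches memcmp's byte-wise ordering.
    x = int.from_bytes(a[:n], "big")
    y = int.from_bytes(b[:n], "big")
    if x == y:
        return 0
    # Corresponds to the (select cond, 1, -1) form mentioned in the message.
    return -1 if x < y else 1


if __name__ == "__main__":
    print(expanded_memcmp(b"\x01\x02\x03\x04", b"\x01\x02\x03\x05", 4))
    print(expanded_memcmp(b"\xff\x00", b"\x00\xff", 2))
```

memcmp() only guarantees the sign of its result, so returning exactly -1, 0 or 1 is one valid choice among many.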
*	[X86][SSE] Relax 0/-1 vector element insertion to work for any vector with >=16bit elements	Simon Pilgrim	2017-06-20	2	-31/+7
	Shuffle lowering/combining now does a good job for 256/512-bit vectors - we don't need to prevent this llvm-svn: 305801
*	Fixed test name. NFCI.	Simon Pilgrim	2017-06-20	1	-7/+7
	llvm-svn: 305787
*	[GlobalISel][X86] Get correct RegClass for given RegBank.	Igor Breger	2017-06-20	1	-6/+18
	Summary: In some cases RegClass depends on target feature. High (16-31) vector registers exist only if AVX512f is available. Split from https://reviews.llvm.org/D33665
	Reviewers: qcolombet, t.p.northover, zvi, guyblank
	Reviewed By: t.p.northover, guyblank
	Subscribers: guyblank, rovka, llvm-commits, kristof.beyls
	Differential Revision: https://reviews.llvm.org/D33952
	Conflicts: test/CodeGen/X86/GlobalISel/select-memop-scalar.mir
	llvm-svn: 305784
*	[GlobalISel] combine not symmetric merge/unmerge nodes.	Igor Breger	2017-06-20	1	-6/+6
	Summary: In some cases legalization ends up with not symmetric merge/unmerge nodes. Transform it to merge/unmerge nodes.
	Reviewers: t.p.northover, qcolombet, zvi
	Reviewed By: t.p.northover
	Subscribers: rovka, kristof.beyls, guyblank, llvm-commits
	Differential Revision: https://reviews.llvm.org/D33626
	llvm-svn: 305783
*	[GlobalISel][X86] add legalizer mir tests. NFC	Igor Breger	2017-06-20	2	-73/+205
	llvm-svn: 305781
*	Fix machine instruction in test case	Sanjoy Das	2017-06-19	1	-3/+3
	The AND64rm instruction used in the test case was incorrect. Since the first input register to AND64rm is tied to the output register, they must be the same. Thanks to Jesper Antonsson for pointing this out! llvm-svn: 305756
*	Revert r304824 "Fix PR23384 (part 3 of 3)"	Hans Wennborg	2017-06-19	7	-67/+67
	This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514.

	> Summary:
	> The patch makes instruction count the highest priority for
	> LSR solution for X86 (previously registers had highest priority).
	>
	> Reviewers: qcolombet
	>
	> Differential Revision: http://reviews.llvm.org/D30562
	>
	> From: Evgeny Stupachenko <evstupac@gmail.com>

	llvm-svn: 305720
*	[GlobalISel][X86] Fold FI/G_GEP into LDR/STR instruction addressing mode.	Igor Breger	2017-06-19	7	-171/+206
	Summary: Implement some of the simplest addressing modes. It should help to test ABI.
	Reviewers: zvi, guyblank
	Reviewed By: guyblank
	Subscribers: rovka, llvm-commits, kristof.beyls
	Differential Revision: https://reviews.llvm.org/D33888
	llvm-svn: 305691
*	[X86] Simplify vector-shuffle-v48 test. NFC.	Guy Blank	2017-06-19	1	-42/+39
	llvm-svn: 305670
*	[x86] specify triples and auto-generate complete checks; NFC	Sanjay Patel	2017-06-18	3	-80/+174
	llvm-svn: 305656
*	[x86] specify triples and auto-generate complete checks; NFC	Sanjay Patel	2017-06-18	4	-122/+244
	llvm-svn: 305655
*	[x86] specify triple and auto-generate checks; NFC	Sanjay Patel	2017-06-18	3	-86/+182
	llvm-svn: 305654
*	[x86] adjust test constants to maintain coverage; NFC	Sanjay Patel	2017-06-18	4	-33/+33
	Increment (add 1) could be transformed to sub -1, and we'd lose coverage for these patterns. llvm-svn: 305646
*	[x86] adjust test constants to maintain coverage; NFC	Sanjay Patel	2017-06-18	1	-7/+7
	Increment (add 1) could be transformed to sub -1, and we'd lose coverage for these patterns. llvm-svn: 305645
*	[x86] adjust test constants to maintain coverage; NFC	Sanjay Patel	2017-06-18	1	-10/+10
	Increment (add 1) could be transformed to sub -1, and we'd lose coverage for these patterns. llvm-svn: 305644
*	RegScavenging: Add scavengeRegisterBackwards()	Matthias Braun	2017-06-17	1	-17/+12
	Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse to place spills as the very first instruction of a basic block and thus artificially increase pressure (test in test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)

	This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags.

	This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode.

	Differential Revision: http://reviews.llvm.org/D21885
	llvm-svn: 305625
*	[SelectionDAG] Update Loop info after splitting critical edges.	Davide Italiano	2017-06-17	1	-0/+27
	The analysis is expected to be preserved by SelectionDAG. llvm-svn: 305621
*	Revert "RegScavenging: Add scavengeRegisterBackwards()"	Matthias Braun	2017-06-16	1	-12/+17
	Revert because of reports of some PPC input starting to spill when it was predicted that it wouldn't and no spillslot was reserved. This reverts commit r305516. llvm-svn: 305566
*	[Atomics] Rename and change prototype for atomic memcpy intrinsic	Daniel Neilson	2017-06-16	1	-24/+21
	Summary: Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

	This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy that is unordered atomic.

	Reviewers: reames, sanjoy, efriedma
	Reviewed By: reames
	Subscribers: mzolotukhin, anna, llvm-commits, skatkov
	Differential Revision: https://reviews.llvm.org/D33240
	llvm-svn: 305558
*	RegScavenging: Add scavengeRegisterBackwards()	Matthias Braun	2017-06-15	1	-17/+12
	Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64 problems reported in the stage2 build last time, which I cannot reproduce right now.

	This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags.

	This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode.

	Differential Revision: http://reviews.llvm.org/D21885
	llvm-svn: 305516
*	ISel: Fix FastISel of swifterror values	Arnold Schwaighofer	2017-06-15	1	-0/+108
	The code assumed that we process instructions in basic block order. FastISel processes instructions in reverse basic block order. We need to pre-assign virtual registers before selecting otherwise we get def-use relationships wrong. This only affects code with swifterror registers. rdar://32659327 llvm-svn: 305484
*	[X86][AVX2] Fix issue in lowerV8I16GeneralSingleInputVectorShuffle that was assuming v8i16 vectors	Simon Pilgrim	2017-06-15	1	-0/+18
	We can use this with v16i16/v32i16 as well. Found during fuzz testing. llvm-svn: 305472
*	Revert r305465: [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions).	Simon Pilgrim	2017-06-15	4	-13490/+50
	This is causing windows buildbot failures llvm-svn: 305470