bcm5719-llvm - Project Ortega BCM5719 LLVM

Commit message	Author	Age	Files	Lines
*	[ARM] GlobalISel: Allow i8 and i16 adds	Diana Picus	2016-12-19	3	-5/+122
Teach the instruction selector and legalizer that it's ok to have adds with 8 or 16-bit integers. This is the second part of https://reviews.llvm.org/D27704 llvm-svn: 290105
*	[ARM] GlobalISel: Select i8 and i16 copies	Diana Picus	2016-12-19	1	-3/+60
Teach the instruction selector that it's ok to copy small values from physical registers. First part of https://reviews.llvm.org/D27704 llvm-svn: 290104
*	[ARM] GlobalISel: Lower more than 4 arguments	Diana Picus	2016-12-19	2	-0/+28
This adds support for lowering more than 4 arguments (although still i32 only). It uses the handleAssignments / ValueHandler infrastructure extracted from the AArch64 backend in r288658. Differential Revision: https://reviews.llvm.org/D27195 llvm-svn: 290098
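To make "more than 4 arguments" concrete, here is a minimal sketch assuming a simplified AAPCS model for i32-only calls: the first four arguments go in r0-r3 and each further one takes a 4-byte stack slot starting at SP+0 at the call. `ArgLoc` and `assignI32Args` are hypothetical illustrations, not LLVM's handleAssignments / ValueHandler API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model, not LLVM's CallLowering/ValueHandler API: where each
// i32 argument lands under a simplified AAPCS (core registers only, no
// 64-bit or floating-point arguments).
struct ArgLoc {
  bool inReg;         // true: passed in a core register
  unsigned reg;       // register number (0 means r0 ... 3 means r3) when inReg
  unsigned stackOff;  // byte offset from SP at the call site when !inReg
};

std::vector<ArgLoc> assignI32Args(unsigned numArgs) {
  std::vector<ArgLoc> locs;
  for (unsigned i = 0; i < numArgs; ++i) {
    if (i < 4)
      locs.push_back({true, i, 0});             // arguments 0-3 go in r0-r3
    else
      locs.push_back({false, 0, 4 * (i - 4)});  // the rest take 4-byte stack slots
  }
  return locs;
}
```

For a six-argument function this places the fifth and sixth arguments at SP+0 and SP+4; on the callee side, those incoming slots are presumably what the G_FRAME_INDEX + G_LOAD selection from the first part of D27195 reads.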
*	[ARM] GlobalISel: Support loading from the stack	Diana Picus	2016-12-19	2	-0/+62
Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit scalars only). This will be useful for functions that need to pass arguments on the stack. First part of https://reviews.llvm.org/D27195. llvm-svn: 290096
*	[XRay] Fix assertion failure on empty machine basic blocks (PR 31424)	Dean Michael Berris	2016-12-19	2	-0/+36
The original version of the code in XRayInstrumentation.cpp assumed that functions may not have empty machine basic blocks (or that the first one couldn't be). This change addresses that by special-casing that specific situation. We provide two .mir test-cases to make sure we're handling this appropriately. Fixes llvm.org/PR31424. Reviewers: chandlerc Subscribers: varno, llvm-commits Differential Revision: https://reviews.llvm.org/D27913 llvm-svn: 290091
*	Revert r289955 and r289962. This is causing lots of ASAN failures for us.	Daniel Jasper	2016-12-18	1	-41/+0
Not sure whether it causes an ASAN false positive, whether it actually leads to incorrect code, or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066
*	[X86][SSE] Add support for combining target shuffles to SHUFPS.	Simon Pilgrim	2016-12-18	11	-113/+93
As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064
*	[X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations if they are available.	Craig Topper	2016-12-18	4	-45/+58
This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060
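These FP logic nodes can be turned into integer operations because FABS, FNEG, FNABS and FCOPYSIGN only manipulate the sign bit. A minimal scalar sketch, assuming IEEE-754 binary32 with the sign in bit 31; the helper names are hypothetical and are not LLVM functions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret float bits without undefined behavior (std::bit_cast needs C++20).
static uint32_t toBits(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }
static float fromBits(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }

const uint32_t kSignMask = 0x80000000u;

// FABS: clear the sign bit (an AND with the inverted sign mask).
float fabsBits(float x) { return fromBits(toBits(x) & ~kSignMask); }
// FNEG: flip the sign bit (an XOR with the sign mask).
float fnegBits(float x) { return fromBits(toBits(x) ^ kSignMask); }
// FNABS: force the sign bit on (an OR with the sign mask).
float fnabsBits(float x) { return fromBits(toBits(x) | kSignMask); }
// FCOPYSIGN: magnitude bits of x combined with the sign bit of y.
float copysignBits(float x, float y) {
  return fromBits((toBits(x) & ~kSignMask) | (toBits(y) & kSignMask));
}
```

Since none of these touch the exponent or mantissa arithmetically, an integer AND/OR/XOR on the same bits gives identical results, which is why only the execution domain, not correctness, is at stake.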
*	[AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available.	Craig Topper	2016-12-18	1	-1/+25
This can give the register allocator more registers to use. llvm-svn: 290057
*	[AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops for scalars.	Craig Topper	2016-12-18	1	-1/+1
I missed this in r290049. llvm-svn: 290055
*	AMDGPU: Fix broken check prefix in test	Matt Arsenault	2016-12-17	1	-10/+7
llvm-svn: 290050
*	[AVX-512] Use EVEX encoded logic operations for scalar types when they are available.	Craig Topper	2016-12-17	1	-4/+4
This gives the register allocator more registers to work with. llvm-svn: 290049
*	[AVX-512] Update scalar logic test to show missed opportunity to use EVEX encoded logic instructions to get more registers to use.	Craig Topper	2016-12-17	1	-19/+40
llvm-svn: 290048
*	Revert "AArch64CollectLOH: Rewrite as block-local analysis."	Matthias Braun	2016-12-17	4	-194/+9
It is still breaking Chrome. http://llvm.org/PR31361 This reverts commit r290026. llvm-svn: 290047
*	Move test to correct directory	Matthias Braun	2016-12-17	1	-0/+0
See also test/CodeGen/MIR/README llvm-svn: 290032
*	AArch64CollectLOH: Rewrite as block-local analysis.	Matthias Braun	2016-12-17	4	-9/+194
Re-apply r288561: Liveness tracking should be correct now after r290014. Previously this pass was using up to 5% compile time in some cases, which is a bit much for what it is doing. The pass featured a full-blown data-flow analysis which in the default configuration was restricted to a single block. This rewrites the pass under the assumption that we only ever work on a single block. This is done in a single pass maintaining a state machine per general purpose register to catch LOH patterns. Differential Revision: https://reviews.llvm.org/D27329 llvm-svn: 290026
*	AArch64: Enable post-ra liveness updates	Matthias Braun	2016-12-16	2	-6/+6
Differential Revision: https://reviews.llvm.org/D27559 llvm-svn: 290014
*	[CodeGenPrep] Skip merging empty case blocks	Jun Bum Lim	2016-12-16	4	-8/+9
This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block, and unit test failures in AVR and WebAssembly. Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289988
*	Revert "[IR] Remove the DIExpression field from DIGlobalVariable."	Adrian Prantl	2016-12-16	15	-140/+140
This reverts commit 289920 (again). I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable has no DIExpression. Unfortunately it is not possible to safely upgrade these variables without adding a flag to the bitcode record indicating which version they are. My plan of record is to roll the planned follow-up patch that adds a unit: field to DIGlobalVariable into this patch before recommitting. This way we only need one Bitcode upgrade for both changes (with a version flag in the bitcode record to safely distinguish the record formats). Sorry for the churn! llvm-svn: 289982
*	[ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.	Eli Friedman	2016-12-16	2	-6/+223
Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Recommitting with fix to avoid forming vld1dup if the type of the load doesn't match the type of the vdup (see https://llvm.org/bugs/show_bug.cgi?id=31404). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289972
*	Revert "[CodeGenPrep] Skip merging empty case blocks"	Jun Bum Lim	2016-12-16	2	-6/+5
This reverts commit r289951. llvm-svn: 289960
*	[X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, -C), COND) (PR31367)	Hans Wennborg	2016-12-16	1	-0/+41
atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does. This targets a pattern that occurs frequently with reference counting pointers:
    void decrement(long volatile *ptr) {
      if (_InterlockedDecrement(ptr) == 0)
        release();
    }
Clang would previously compile it (for 32-bit at -Os) as:
    00000000 <?decrement@@YAXPCJ@Z>:
       0:  8b 44 24 04        mov    0x4(%esp),%eax
       4:  31 c9              xor    %ecx,%ecx
       6:  49                 dec    %ecx
       7:  f0 0f c1 08        lock xadd %ecx,(%eax)
       b:  83 f9 01           cmp    $0x1,%ecx
       e:  0f 84 00 00 00 00  je     14 <?decrement@@YAXPCJ@Z+0x14>
      14:  c3                 ret
and with this patch it becomes:
    00000000 <?decrement@@YAXPCJ@Z>:
       0:  8b 44 24 04        mov    0x4(%esp),%eax
       4:  f0 ff 08           lock decl (%eax)
       7:  0f 84 00 00 00 00  je     d <?decrement@@YAXPCJ@Z+0xd>
       d:  c3                 ret
(Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.) Differential Revision: https://reviews.llvm.org/D27781 llvm-svn: 289955
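The fold works because the fetch-and-add yields the old value, so "old value == C" is the same test as "new value == 0" when adding -C. A portable sketch of the two equivalent source forms for C = 1, assuming `std::atomic<long>` in place of the MSVC `_InterlockedDecrement` intrinsic; the function names are hypothetical. It only checks that the forms agree semantically; per the commit message, both should now lower to `lock decl` plus `je` on 32-bit x86.

```cpp
#include <atomic>
#include <cassert>

// Form written in the source: test the value *after* the decrement.
bool releasedViaNewValue(std::atomic<long>& rc) { return --rc == 0; }

// Form the DAG sees: atomic_load_add(x, -1) returns the value *before* the
// addition, which is then compared with 1, i.e.
// (setcc (cmp (atomic_load_add x, -C) C), COND) with C = 1.
bool releasedViaOldValue(std::atomic<long>& rc) { return rc.fetch_add(-1) == 1; }
```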
*	[CodeGenPrep] Skip merging empty case blocks	Jun Bum Lim	2016-12-16	2	-5/+6
This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block. Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289951
*	[X86][AVX512] use a single shufps for 512-bit vectors when it can save instructions	Simon Pilgrim	2016-12-16	1	-8/+3
This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946
*	[X86][AVX512] Add tests showing missed opportunity to efficiently lower v16i32 to VSHUFPS (PR27885)	Simon Pilgrim	2016-12-16	1	-0/+32
llvm-svn: 289945
*	[ARM] GlobalISel: Select add i32, i32	Diana Picus	2016-12-16	5	-0/+125
Add the minimal support necessary to select a function that returns the sum of two i32 values. This includes some support for argument/return lowering of i32 values through registers, as well as the handling of copy and add instructions throughout the GlobalISel pipeline. Differential Revision: https://reviews.llvm.org/D26677 llvm-svn: 289940
*	[X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain.	Simon Pilgrim	2016-12-16	1	-6/+2
We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead. llvm-svn: 289937
*	[AVR] Add a test for 64-bit left shifts	Dylan McKay	2016-12-16	1	-0/+8
llvm-svn: 289936
*	Extra coverage tests to demonstrate fixes in D72618 and D26855	Andrew V. Tischenko	2016-12-16	2	-0/+334
llvm-svn: 289931
*	Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and tail-call eligibility)	Chandler Carruth	2016-12-16	2	-133/+4
This patch appears to result in trampolines in vtables being miscompiled when they in turn tail call a method. I've posted some preliminary details about the failure on the thread for this commit and talked to Hal. He was comfortable going ahead and reverting until we sort out what is wrong. llvm-svn: 289928
*	Revert 279703, it caused PR31404.	Nico Weber	2016-12-16	2	-163/+6
llvm-svn: 289923
*	[IR] Remove the DIExpression field from DIGlobalVariable.	Adrian Prantl	2016-12-16	15	-140/+140
This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariable holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGlobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289920
*	Revert patch series introducing the DAG combine to match a load-by-bytes idiom.	Chandler Carruth	2016-12-16	3	-1193/+0
r289538: Match load by bytes idiom and fold it into a single load
r289540: Fix a buildbot failure introduced by r289538
r289545: Use more detailed assertion messages in the code ...
r289646: Add a couple of assertions to the load combine code ...
This DAG combine has a bad crash in it that is quite hard to trigger sadly -- it relies on sneaking code with UB through the SDAG build and into this particular combine. I've responded to the original commit with a test case that reproduces it. However, the code also has other problems that will require substantial changes to address and so I'm going ahead and reverting it for now. This should unblock us and perhaps others that are hitting the crash in the wild and will let a fresh patch with updated approach come in cleanly afterward. Sorry for any trouble or disruption! llvm-svn: 289916
*	Revert "[IR] Remove the DIExpression field from DIGlobalVariable."	Adrian Prantl	2016-12-16	14	-137/+138
This reverts commit 289902 while investigating bot breakage. llvm-svn: 289906
*	[IR] Remove the DIExpression field from DIGlobalVariable.	Adrian Prantl	2016-12-16	14	-138/+137
This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariable holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGlobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289902
*	[PPC] corrections in two testcases	Ehsan Amiri	2016-12-16	1	-14/+14
Removing sensitivity to scheduling (by using CHECK-DAG instead of CHECK) and some other minor corrections. In preparation to commit Power9 processor model. llvm-svn: 289900
*	[IRTranslator] Merge the entry and ABI lowering blocks.	Quentin Colombet	2016-12-15	3	-39/+7
The IRTranslator uses an additional block before the LLVM-IR entry block to perform all the ABI lowering and the constant hoisting. Thus, this block is the actual entry block and it falls through to the LLVM-IR entry block. However, with such representation, we end up with two basic blocks that are not maximal. Therefore, this patch adds a bit of canonicalization by merging the LLVM-IR entry block and the ABI lowering/constant hoisting block into one block, making the resulting block more likely to be maximal (indeed the LLVM-IR entry block might not have been maximal). llvm-svn: 289891
*	Don't combine splats with other shuffles.	Eli Friedman	2016-12-15	2	-29/+24
We sometimes end up creating shuffles which are worse than the obvious translation of the IR. Fixes https://llvm.org/bugs/show_bug.cgi?id=31301 . Differential Revision: https://reviews.llvm.org/D27793 llvm-svn: 289882
*	AMDGPU: Select branch on undef to uniform scc branch	Matt Arsenault	2016-12-15	6	-13/+14
llvm-svn: 289877
*	Don't combine a shuffle of two BUILD_VECTORs with duplicate elements.	Eli Friedman	2016-12-15	4	-173/+118
Targets can't handle this case well in general; we often transform a shuffle of two cheap BUILD_VECTORs to element-by-element insertion, which is very inefficient. Fixes https://llvm.org/bugs/show_bug.cgi?id=31364 . Partially fixes https://llvm.org/bugs/show_bug.cgi?id=31301. Differential Revision: https://reviews.llvm.org/D27787 llvm-svn: 289874
*	[PPC] Use CHECK-DAG instead of CHECK in the testcase	Ehsan Amiri	2016-12-15	1	-15/+15
This test is currently sensitive to scheduling. Using CHECK-DAG allows us to preserve the main purpose of the test and remove this sensitivity. In preparation to commit Power9 processor model. llvm-svn: 289869
*	AMDGPU: Fix asserting on returned tail calls	Matt Arsenault	2016-12-15	1	-0/+14
llvm-svn: 289868
*	[x86] use a single shufps for 256-bit vectors when it can save instructions	Sanjay Patel	2016-12-15	2	-41/+18
This is the 256-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289846
*	[x86] use a single shufps when it can save instructions	Sanjay Patel	2016-12-15	22	-1187/+770
This is a tiny patch with a big pile of test changes. This partially fixes PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 My motivating case looks like this:
    - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
    - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
    - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
    + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
And this happens several times in the diffs. For chips with domain-crossing penalties, the instruction count and size reduction should usually overcome any potential domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so using shufps is a pure win. So the test case diffs all appear to be improvements except one test in vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate zero elements and one test in combine-sra.ll where multiple uses prevent the expected shuffle combining. Differential Revision: https://reviews.llvm.org/D27692 llvm-svn: 289837
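The motivating case can be checked lane by lane. A minimal scalar model, assuming each xmm register is four 32-bit lanes and reading the vpblendw word selection [0,1,2,3],[4,5,6,7] as dwords 0-1 from the first source and dwords 2-3 from the second; `pshufd`, `blendLowHigh` and `shufps` here are hypothetical helpers, not the real intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;  // one 128-bit xmm register as 4 dword lanes

// pshufd: every result lane may pick any lane of a single source.
V4 pshufd(const V4& a, int i0, int i1, int i2, int i3) {
  return {a[i0], a[i1], a[i2], a[i3]};
}

// pblendw taking words 0-3 from a and words 4-7 from b is, at dword
// granularity, lanes 0-1 from a and lanes 2-3 from b.
V4 blendLowHigh(const V4& a, const V4& b) { return {a[0], a[1], b[2], b[3]}; }

// shufps: the two low result lanes come from a, the two high lanes from b.
V4 shufps(const V4& a, const V4& b, int i0, int i1, int i2, int i3) {
  return {a[i0], a[i1], b[i2], b[i3]};
}
```

With x0 = {10,11,12,13} and x1 = {20,21,22,23}, both `blendLowHigh(pshufd(x0,0,2,2,3), pshufd(x1,0,1,0,2))` and `shufps(x0,x1,0,2,0,2)` produce {10,12,20,22}. shufps fits here because the low half of the result draws only from one source and the high half only from the other.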
*	[X86][SSE] Fix domains for scalar store instructions	Simon Pilgrim	2016-12-15	3	-7/+7
As discussed on D27692 llvm-svn: 289834
*	[lanai] Simplify small section check in LowerGlobalAddress and treat ldata sections specially.	Jacques Pienaar	2016-12-15	1	-0/+14
Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation). llvm-svn: 289832
*	[X86][SSE] Fix domains for VZEXT_LOAD type instructions	Simon Pilgrim	2016-12-15	47	-202/+190
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions. Differential Revision: https://reviews.llvm.org/D27684 llvm-svn: 289825
*	Fix for regression after Global Load Scalarization patch	Alexander Timofeev	2016-12-15	1	-0/+11
llvm-svn: 289822
*	Revert "[TESTS] Initial commit of tests, by Andrew Tischenko"	Alexey Bataev	2016-12-15	2	-350/+0
This reverts commit ee709f8988653a0334fbf100cdbbdd83a3933347. llvm-svn: 289814
*	[TESTS] Initial commit of tests, by Andrew Tischenko	Alexey Bataev	2016-12-15	2	-0/+350
llvm-svn: 289807