bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Added a template for building target specific memory node in DAG.	Elena Demikhovsky	2016-12-21	4	-44/+42
\| \| \| \| \| \| \| \| \| \|	I added an API for creating a target-specific memory node in the DAG. Today, all memory nodes are common to all targets, and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node: truncation-with-saturation store, float-to-half store. In this patch I added truncation-with-saturation nodes and use them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern. Differential Revision: https://reviews.llvm.org/D27899 llvm-svn: 290250
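A minimal sketch of the idea of one templated factory that can build any target's memory-node subclass while still reusing (CSE-ing) identical nodes. `MiniDAG`, `MemNode`, `TruncSStoreNode` and `getTargetMemNode` are hypothetical stand-ins, not the SelectionDAG API; the real DAG also folds value types, memory operands and flags into its CSE key.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base class for memory nodes (opcode + address operand id).
struct MemNode {
  unsigned Opcode;
  unsigned Ptr;
  MemNode(unsigned Opc, unsigned P) : Opcode(Opc), Ptr(P) {}
  virtual ~MemNode() = default;
};

// Hypothetical target-specific node, e.g. a truncate-with-saturation store.
struct TruncSStoreNode : MemNode {
  using MemNode::MemNode;
};

class MiniDAG {
  std::vector<std::unique_ptr<MemNode>> Nodes;               // owns every node
  std::map<std::pair<unsigned, unsigned>, MemNode *> CSEMap;  // (opcode, ptr) -> node

public:
  // One template builds any MemNode subclass. The opcode is assumed to
  // determine the subclass, so the static_cast on a CSE hit is safe.
  template <typename NodeT>
  NodeT *getTargetMemNode(unsigned Opc, unsigned Ptr) {
    auto Key = std::make_pair(Opc, Ptr);
    auto It = CSEMap.find(Key);
    if (It != CSEMap.end())
      return static_cast<NodeT *>(It->second);  // reuse the identical node
    auto N = std::make_unique<NodeT>(Opc, Ptr);
    NodeT *Raw = N.get();
    Nodes.push_back(std::move(N));
    CSEMap[Key] = Raw;
    return Raw;
  }

  std::size_t size() const { return Nodes.size(); }
};
```

Keeping the factory generic means a target only defines its node subclass and opcode; the uniquing logic lives in one place.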
*	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support	Oren Ben Simhon	2016-12-21	1	-1/+1
\| \| \| \| \| \|	Fixing failing test. llvm-svn: 290246
*	[X86] Vectorcall Calling Convention - Adding CodeGen Complete Support	Oren Ben Simhon	2016-12-21	1	-6/+136
\| \| \| \| \| \| \| \| \| \| \| \| \|	The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible. vectorcall uses more registers for arguments than fastcall or the default x64 calling convention. The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above. The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it. This submission also includes additional lit tests to better cover HVA corner cases. Differential Revision: https://reviews.llvm.org/D27392 llvm-svn: 290240
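The two-pass register assignment behind HVA handling can be modelled roughly as follows. This is a simplified sketch under stated assumptions: six vector registers (XMM0-XMM5), plain vector arguments assigned left to right in a first pass, and each HVA assigned in a second pass to the lowest free registers only if the whole aggregate fits (otherwise passed by reference). It ignores the positional rules of the x64 variant, integer arguments and stack layout; `Arg` and `assignVectorcall` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified argument: a plain vector value (IsHVA == false, NumElts == 1)
// or an HVA with 1-4 identical vector members.
struct Arg {
  bool IsHVA;
  int NumElts;
};

// For each argument, the XMM register numbers it occupies. An empty list
// means "not in registers" (stack for plain vectors, by reference for HVAs).
std::vector<std::vector<int>> assignVectorcall(const std::vector<Arg> &Args) {
  constexpr int NumRegs = 6;  // XMM0-XMM5
  std::array<bool, NumRegs> Used{};
  std::vector<std::vector<int>> Out(Args.size());

  // Pass 1: plain vector arguments, left to right.
  int Next = 0;
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (Args[I].IsHVA || Next >= NumRegs)
      continue;
    Out[I].push_back(Next);
    Used[Next++] = true;
  }

  // Pass 2: each HVA takes the lowest free registers, but only if all fit.
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (!Args[I].IsHVA)
      continue;
    std::vector<int> Free;
    for (int R = 0; R < NumRegs; ++R)
      if (!Used[R])
        Free.push_back(R);
    if (static_cast<int>(Free.size()) < Args[I].NumElts)
      continue;  // does not fit: passed by reference
    for (int K = 0; K < Args[I].NumElts; ++K) {
      Out[I].push_back(Free[K]);
      Used[Free[K]] = true;
    }
  }
  return Out;
}
```

For example, with arguments (vector, HVA of 2, vector) the plain vectors take XMM0 and XMM1 first, and the HVA then lands in XMM2 and XMM3.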
*	remove pretty-print test that requires debug	Sebastian Pop	2016-12-21	1	-5/+0
\| \| \| \| \| \| \|	There is no need to test the pretty printer. Remove the bogus test to make the build bots happy. llvm-svn: 290234
*	machine combiner: fix pretty printer	Sebastian Pop	2016-12-21	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \|	We used to print UNKNOWN instructions when the instruction to be printed was not yet inserted in any BB: in that case the pretty printer could not compute a TII, as the instruction did not yet belong to any BB or function. This patch explicitly passes the TII to the pretty printer. Differential Revision: https://reviews.llvm.org/D27645 llvm-svn: 290228
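The fix amounts to letting the caller supply the target instruction info instead of deriving it from the instruction's parent. A hypothetical sketch of that shape; `InstrInfo`, `Function`, `Instr` and `printInstr` are illustrative types, not LLVM's printing API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: instruction info, a parent function, an instruction.
struct InstrInfo {
  std::string getName(unsigned Opc) const {
    return Opc == 1 ? "ADD" : "OP" + std::to_string(Opc);
  }
};
struct Function {
  const InstrInfo *TII;
};
struct Instr {
  unsigned Opcode;
  const Function *Parent;  // nullptr until the instruction is inserted
};

// Prefer an explicitly passed TII; otherwise try to reach one via the parent.
std::string printInstr(const Instr &MI, const InstrInfo *TII = nullptr) {
  if (!TII && MI.Parent)
    TII = MI.Parent->TII;
  return TII ? TII->getName(MI.Opcode) : std::string("UNKNOWN");
}
```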
*	[ARM] Implement isExtractSubvectorCheap.	Eli Friedman	2016-12-20	4	-51/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	See https://reviews.llvm.org/D6678 for the history of isExtractSubvectorCheap. Essentially the same considerations apply to ARM. This temporarily breaks the formation of vpadd/vpaddl in certain cases; AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles. See https://reviews.llvm.org/D27779 for followup fix. Differential Revision: https://reviews.llvm.org/D27774 llvm-svn: 290198
*	[ARM] Generate checks for shuffle tests using update_llc_test_checks.py.	Eli Friedman	2016-12-20	3	-143/+542
\| \| \| \|	llvm-svn: 290196
*	AMDGPU: Allow 16-bit types in inline asm constraints	Matt Arsenault	2016-12-20	1	-0/+41
\| \| \| \|	llvm-svn: 290193
*	AMDGPU: Run fp combine tests on VI	Matt Arsenault	2016-12-20	3	-135/+171
\| \| \| \|	llvm-svn: 290192
*	AMDGPU: Don't add same instruction multiple times to worklist	Matt Arsenault	2016-12-20	1	-0/+14
\| \| \| \| \| \| \| \| \|	When the instruction is processed the first time, it may be deleted, resulting in crashes. While the new test adds the same user to the worklist twice, this particular case doesn't crash, but I'm not sure why. llvm-svn: 290191
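A common way to avoid queuing the same instruction twice is to pair the worklist with a set of everything ever queued, so each item is visited at most once even if processing it deletes it. The sketch below uses `int` ids in place of instruction pointers; `UniqueWorklist` is a hypothetical name, and this is not necessarily how the AMDGPU patch implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

template <typename T>
class UniqueWorklist {
  std::vector<T> Stack;          // items still to process (LIFO)
  std::unordered_set<T> Queued;  // every item ever queued

public:
  // Returns false (and does nothing) if V was already queued once.
  bool insert(const T &V) {
    if (!Queued.insert(V).second)
      return false;
    Stack.push_back(V);
    return true;
  }
  bool empty() const { return Stack.empty(); }
  std::size_t size() const { return Stack.size(); }
  T pop() {
    T V = Stack.back();
    Stack.pop_back();
    return V;  // V stays in Queued, so it can never be re-added
  }
};
```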
*	AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*	Tom Stellard	2016-12-20	2	-3/+17
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle, mareko Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27834 llvm-svn: 290184
*	AMDGPU/SI: Add a MachineMemOperand to MIMG instructions	Tom Stellard	2016-12-20	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without a MachineMemOperand, the scheduler was assuming MIMG instructions were ordered memory references, so no loads or stores could be reordered across them. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27536 llvm-svn: 290179
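Why a missing MachineMemOperand blocks reordering can be seen in a conservative dependence check: with no description of the access, the only safe answer is "may alias". A hypothetical sketch (`MemOperand`, `MemInstr`, `mayAlias` are illustrative; it assumes each operand names a distinct, identified underlying object, which real alias analysis must prove).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical memory operand: an identified underlying object plus a byte range.
struct MemOperand {
  int Object;      // id of the underlying object (assumed identified)
  int64_t Offset;  // byte offset into that object
  int64_t Size;    // access size in bytes
};

struct MemInstr {
  bool MayAccessMemory;
  const MemOperand *MMO;  // nullptr = no memory operand attached
};

// Conservative dependence check in the spirit of a scheduler's alias query.
bool mayAlias(const MemInstr &A, const MemInstr &B) {
  if (!A.MayAccessMemory || !B.MayAccessMemory)
    return false;
  if (!A.MMO || !B.MMO)
    return true;  // nothing known: must keep the original order
  if (A.MMO->Object != B.MMO->Object)
    return false;  // distinct identified objects cannot overlap
  // Same object: overlap iff the byte ranges intersect.
  return A.MMO->Offset < B.MMO->Offset + B.MMO->Size &&
         B.MMO->Offset < A.MMO->Offset + A.MMO->Size;
}
```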
*	[IR] Remove the DIExpression field from DIGlobalVariable.	Adrian Prantl	2016-12-20	15	-140/+140
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariable holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGlobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades and a change to the Bitcode record for DIGlobalVariable, which makes upgrading the old format unambiguous even for variables without DIExpressions. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 290153
*	Add ARM support to update_llc_test_checks.py	Eli Friedman	2016-12-19	1	-34/+64
\| \| \| \| \| \| \| \| \| \|	Just the minimal support to get it working at the moment. Includes checks for test/CodeGen/ARM/vzip.ll as an example. Differential Revision: https://reviews.llvm.org/D27829 llvm-svn: 290144
*	[AMDGPU] When unifying metadata, add operands to named metadata individually	Konstantin Zhuravlyov	2016-12-19	1	-6/+11
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D27725 llvm-svn: 290114
*	[ARM] GlobalISel: Add more checks to test	Diana Picus	2016-12-19	1	-0/+4
\| \| \| \|	llvm-svn: 290108
*	[ARM] GlobalISel: Minor style fixup in test	Diana Picus	2016-12-19	1	-3/+3
\| \| \| \|	llvm-svn: 290107
*	[ARM] GlobalISel: Lower i8 and i16 register args	Diana Picus	2016-12-19	2	-8/+52
\| \| \| \| \| \| \| \| \| \| \|	This allows lowering i8 and i16 arguments if they can fit in the registers. Note that the lowering is incomplete - ABI extensions are handled in a subsequent patch. (Last part of) Differential Revision: https://reviews.llvm.org/D27704 llvm-svn: 290106
*	[ARM] GlobalISel: Allow i8 and i16 adds	Diana Picus	2016-12-19	3	-5/+122
\| \| \| \| \| \| \| \| \|	Teach the instruction selector and legalizer that it's ok to have adds with 8 or 16-bit integers. This is the second part of https://reviews.llvm.org/D27704 llvm-svn: 290105
*	[ARM] GlobalISel: Select i8 and i16 copies	Diana Picus	2016-12-19	1	-3/+60
\| \| \| \| \| \| \| \| \|	Teach the instruction selector that it's ok to copy small values from physical registers. First part of https://reviews.llvm.org/D27704 llvm-svn: 290104
*	[ARM] GlobalISel: Lower more than 4 arguments	Diana Picus	2016-12-19	2	-0/+28
\| \| \| \| \| \| \| \| \| \|	This adds support for lowering more than 4 arguments (although still i32 only). It uses the handleAssignments / ValueHandler infrastructure extracted from the AArch64 backend in r288658. Differential Revision: https://reviews.llvm.org/D27195 llvm-svn: 290098
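Under the AAPCS the first four 32-bit integer arguments are passed in r0-r3 and the rest go on the stack in 4-byte slots, which is what lowering "more than 4 arguments" has to handle. A hypothetical sketch of that assignment for i32-only arguments (`ArgLoc` and `assignI32Args` are illustrative; alignment, splitting and variadic rules are ignored).

```cpp
#include <cassert>
#include <vector>

// Where an i32 argument lives at the call: register rN or a stack slot.
struct ArgLoc {
  bool InReg;
  int RegOrOffset;  // register number (0-3) or byte offset from SP
};

std::vector<ArgLoc> assignI32Args(int NumArgs) {
  std::vector<ArgLoc> Locs;
  for (int I = 0; I < NumArgs; ++I) {
    if (I < 4)
      Locs.push_back({true, I});             // r0-r3
    else
      Locs.push_back({false, 4 * (I - 4)});  // [sp], [sp, #4], ...
  }
  return Locs;
}
```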
*	[ARM] GlobalISel: Support loading from the stack	Diana Picus	2016-12-19	2	-0/+62
\| \| \| \| \| \| \| \| \| \|	Add support for selecting simple G_LOAD and G_FRAME_INDEX instructions (32-bit scalars only). This will be useful for functions that need to pass arguments on the stack. First part of https://reviews.llvm.org/D27195. llvm-svn: 290096
*	[XRay] Fix assertion failure on empty machine basic blocks (PR 31424)	Dean Michael Berris	2016-12-19	2	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The original version of the code in XRayInstrumentation.cpp assumed that functions may not have empty machine basic blocks (or that the first one couldn't be). This change addresses that by special-casing that specific situation. We provide two .mir test-cases to make sure we're handling this appropriately. Fixes llvm.org/PR31424. Reviewers: chandlerc Subscribers: varno, llvm-commits Differential Revision: https://reviews.llvm.org/D27913 llvm-svn: 290091
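One way to cope with an empty entry block is to look for the first block that actually contains an instruction before deciding where the entry sled goes. A hypothetical sketch (`Block` holds opaque instruction ids and `firstNonEmptyBlock` is illustrative); the actual special-casing in XRayInstrumentation.cpp may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical basic block: just a list of instruction ids.
struct Block {
  std::vector<int> Instrs;
};

// Index of the first block with at least one instruction, or -1 if none.
int firstNonEmptyBlock(const std::vector<Block> &Fn) {
  for (std::size_t I = 0; I < Fn.size(); ++I)
    if (!Fn[I].Instrs.empty())
      return static_cast<int>(I);
  return -1;
}
```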
*	Revert r289955 and r289962. This is causing lots of ASAN failures for us.	Daniel Jasper	2016-12-18	1	-41/+0
\| \| \| \| \| \| \| \|	Not sure whether it causes an ASAN false positive or whether it actually leads to incorrect code or whether it even exposes bad code. Hans, I'll get you instructions to reproduce this. llvm-svn: 290066
*	[X86][SSE] Add support for combining target shuffles to SHUFPS.	Simon Pilgrim	2016-12-18	11	-113/+93
\| \| \| \| \| \|	As discussed on D27692, the next step will be to allow cross-domain shuffles once the combined shuffle depth passes a certain point. llvm-svn: 290064
*	[X86][SSE][AVX-512] Convert FAND/FOR/FXOR/FANDN nodes to integer operations ↵	Craig Topper	2016-12-18	4	-45/+58
\| \| \| \| \| \| \| \| \| \| \| \|	if they are available. This will allow a bunch of patterns to be removed. These nodes are only emitted for lowering FABS/FNEG/FNABS/FCOPYSIGN. Ideally we just wouldn't create these nodes if SSE2 or higher is available, but it was simple to just convert them in DAG combine. For SSE2, AVX, and AVX512 with DQI this is no functional change as the execution domain fixing pass ensures the right domain is selected regardless of the ISD opcode. For AVX-512 without DQI we end up using integer instructions since the floating point versions aren't available. But we were already doing that for any logical operations in code that didn't come from FABS/FNEG/FNABS/FCOPYSIGN so this seems no worse. And we get the benefit of being able to fold broadcasts now. llvm-svn: 290060
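The integer forms are valid because FABS, FNEG, FNABS and FCOPYSIGN only touch the IEEE-754 sign bit. A scalar sketch with 32-bit floats (helper names are hypothetical; the patch itself works on vector DAG nodes):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static uint32_t toBits(float F) { uint32_t B; std::memcpy(&B, &F, sizeof B); return B; }
static float fromBits(uint32_t B) { float F; std::memcpy(&F, &B, sizeof F); return F; }

constexpr uint32_t SignMask = 0x80000000u;

// FABS: clear the sign bit (AND with ~sign).
float fabsViaAnd(float X) { return fromBits(toBits(X) & ~SignMask); }
// FNEG: flip the sign bit (XOR with sign).
float fnegViaXor(float X) { return fromBits(toBits(X) ^ SignMask); }
// FNABS: force the sign bit on (OR with sign).
float fnabsViaOr(float X) { return fromBits(toBits(X) | SignMask); }
// FCOPYSIGN: magnitude bits of Mag combined with the sign bit of Sgn.
float copysignViaBits(float Mag, float Sgn) {
  return fromBits((toBits(Mag) & ~SignMask) | (toBits(Sgn) & SignMask));
}
```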
*	[AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when ↵	Craig Topper	2016-12-18	1	-1/+25
\| \| \| \| \| \| \| \|	DQI and VLX instructions are available. This can give the register allocator more registers to use. llvm-svn: 290057
*	[AVX-512] Make sure VLX is also enabled before using EVEX encoded logic ops ↵	Craig Topper	2016-12-18	1	-1/+1
\| \| \| \| \| \|	for scalars. I missed this in r290049. llvm-svn: 290055
*	AMDGPU: Fix broken check prefix in test	Matt Arsenault	2016-12-17	1	-10/+7
\| \| \| \|	llvm-svn: 290050
*	[AVX-512] Use EVEX encoded logic operations for scalar types when they are ↵	Craig Topper	2016-12-17	1	-4/+4
\| \| \| \| \| \|	available. This gives the register allocator more registers to work with. llvm-svn: 290049
*	[AVX-512] Update scalar logic test to show missed opportunity to use EVEX ↵	Craig Topper	2016-12-17	1	-19/+40
\| \| \| \| \| \|	encoded logic instructions to get more registers to use. llvm-svn: 290048
*	Revert "AArch64CollectLOH: Rewrite as block-local analysis."	Matthias Braun	2016-12-17	4	-194/+9
\| \| \| \| \| \| \| \|	It is still breaking Chrome. http://llvm.org/PR31361 This reverts commit r290026. llvm-svn: 290047
*	Move test to correct directory	Matthias Braun	2016-12-17	1	-0/+0
\| \| \| \| \| \|	See also test/CodeGen/MIR/README llvm-svn: 290032
*	AArch64CollectLOH: Rewrite as block-local analysis.	Matthias Braun	2016-12-17	4	-9/+194
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-apply r288561: Liveness tracking should be correct now after r290014. Previously this pass was using up to 5% compile time in some cases which is a bit much for what it is doing. The pass featured a full blown data-flow analysis which in the default configuration was restricted to a single block. This rewrites the pass under the assumption that we only ever work on a single block. This is done in a single pass maintaining a state machine per general purpose register to catch LOH patterns. Differential Revision: https://reviews.llvm.org/D27329 llvm-svn: 290026
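The "state machine per general purpose register" approach can be sketched for a single hint kind: an ADRP whose result feeds exactly one ADD of the page offset. This is a hypothetical simplification (`Op`, `Inst`, `findAdrpAddHints` are made up; each instruction has at most one def and one use; registers are x0-x30; no register is assumed live out of the block, where a real pass would consult liveness), not the actual AArch64CollectLOH code, which tracks many more LOH kinds. The single-use requirement is an assumption about what makes the linker rewrite safe.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

enum class Op { Adrp, AddPageOff, Other };

// Hypothetical instruction: at most one defined and one used register (-1 = none).
struct Inst {
  Op Kind;
  int Def;
  int Use;
};

// Per-register state: the ADRP that defined it and the single ADD that used it.
struct RegState {
  int AdrpIdx = -1;
  int AddIdx = -1;
  bool Invalid = false;  // any other use breaks the pattern
};

std::vector<std::pair<int, int>> findAdrpAddHints(const std::vector<Inst> &Block) {
  std::array<RegState, 31> State{};  // x0..x30
  std::vector<std::pair<int, int>> Hints;

  // The value in register R dies: emit a hint if the pattern completed cleanly.
  auto Flush = [&](int R) {
    const RegState &S = State[R];
    if (S.AdrpIdx >= 0 && S.AddIdx >= 0 && !S.Invalid)
      Hints.emplace_back(S.AdrpIdx, S.AddIdx);
    State[R] = RegState{};
  };

  for (int I = 0; I < static_cast<int>(Block.size()); ++I) {
    const Inst &MI = Block[I];
    if (MI.Use >= 0 && State[MI.Use].AdrpIdx >= 0) {
      RegState &S = State[MI.Use];
      if (MI.Kind == Op::AddPageOff && S.AddIdx < 0)
        S.AddIdx = I;     // first (and hopefully only) use
      else
        S.Invalid = true; // second or non-ADD use
    }
    if (MI.Def >= 0) {
      Flush(MI.Def);  // the old value is overwritten here
      if (MI.Kind == Op::Adrp)
        State[MI.Def].AdrpIdx = I;
    }
  }
  for (int R = 0; R < 31; ++R)
    Flush(R);  // end of block (assumes nothing is live out)
  return Hints;
}
```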
*	AArch64: Enable post-ra liveness updates	Matthias Braun	2016-12-16	2	-6/+6
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D27559 llvm-svn: 290014
*	[CodeGenPrep] Skip merging empty case blocks	Jun Bum Lim	2016-12-16	4	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly. Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289988
*	Revert "[IR] Remove the DIExpression field from DIGlobalVariable."	Adrian Prantl	2016-12-16	15	-140/+140
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 289920 (again). I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable has no DIExpression. Unfortunately it is not possible to safely upgrade these variables without adding a flag to the bitcode record indicating which version they are. My plan of record is to roll the planned follow-up patch that adds a unit: field to DIGlobalVariable into this patch before recommitting. This way we only need one Bitcode upgrade for both changes (with a version flag in the bitcode record to safely distinguish the record formats). Sorry for the churn! llvm-svn: 289982
*	[ARM] Add ARMISD::VLD1DUP to match vld1_dup more consistently.	Eli Friedman	2016-12-16	2	-6/+223
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, there are substantial problems forming vld1_dup even if the VDUP survives legalization. The lack of an actual node leads to terrible results: not only can we not form post-increment vld1_dup instructions, but we form scalar pre-increment and post-increment loads which force the loaded value into a GPR. This patch fixes that by combining the vdup+load into an ARMISD node before DAGCombine messes it up. Also includes a crash fix for vld2_dup (see testcase @vld2dupi8_postinc_variable). Recommitting with a fix to avoid forming vld1dup if the type of the load doesn't match the type of the vdup (see https://llvm.org/bugs/show_bug.cgi?id=31404). Differential Revision: https://reviews.llvm.org/D27694 llvm-svn: 289972
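For reference, the semantics that a post-incrementing vld1_dup provides can be modelled in plain C++: one scalar load, broadcast to every lane, then the pointer advances by one element. `vld1DupPostInc` is a hypothetical helper, not a NEON intrinsic; the hardware's register-increment variant is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One load, replicated into all N lanes, with post-increment of the pointer.
template <typename T, std::size_t N>
std::array<T, N> vld1DupPostInc(const T *&Ptr) {
  std::array<T, N> R;
  R.fill(*Ptr);  // single scalar load, broadcast
  ++Ptr;         // advance by one element (immediate post-increment form)
  return R;
}
```

Keeping the load and the duplicate together as one node is what lets a backend pick this single instruction instead of a scalar load into a GPR followed by a separate VDUP.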
*	Revert "[CodeGenPrep] Skip merging empty case blocks"	Jun Bum Lim	2016-12-16	2	-6/+5
\| \| \| \| \| \|	This reverts commit r289951. llvm-svn: 289960
*	[X86] Fold (setcc (cmp (atomic_load_add x, -C) C), COND) to (setcc (LADD x, ↵	Hans Wennborg	2016-12-16	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-C), COND) (PR31367) atomic_load_add returns the value before addition, but sets EFLAGS based on the result of the addition. That means it's setting the flags based on effectively subtracting C from the value at x, which is also what the outer cmp does. This targets a pattern that occurs frequently with reference counting pointers: void decrement(long volatile *ptr) { if (_InterlockedDecrement(ptr) == 0) release(); } Clang would previously compile it (for 32-bit at -Os) as: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: 31 c9 xor %ecx,%ecx 6: 49 dec %ecx 7: f0 0f c1 08 lock xadd %ecx,(%eax) b: 83 f9 01 cmp $0x1,%ecx e: 0f 84 00 00 00 00 je 14 <?decrement@@YAXPCJ@Z+0x14> 14: c3 ret and with this patch it becomes: 00000000 <?decrement@@YAXPCJ@Z>: 0: 8b 44 24 04 mov 0x4(%esp),%eax 4: f0 ff 08 lock decl (%eax) 7: 0f 84 00 00 00 00 je d <?decrement@@YAXPCJ@Z+0xd> d: c3 ret (Equivalent variants with _InterlockedExchangeAdd, std::atomic<>'s fetch_add or pre-decrement operator generate the same code.) Differential Revision: https://reviews.llvm.org/D27781 llvm-svn: 289955
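The identity behind this fold: `fetch_add(-C)` returns the old value, so comparing it against `C` is the same test as checking whether the new value is zero, which is what the flags of the locked add or decrement already report. A small C++ sketch with `std::atomic` (function names are hypothetical; the machine code produced for each form depends on the compiler).

```cpp
#include <atomic>
#include <cassert>

// Old-value form: fetch_add returns the value *before* the addition.
bool droppedLastRefOld(std::atomic<long> &Refs) {
  return Refs.fetch_add(-1) == 1;  // old == 1  <=>  new == 0
}

// New-value form: pre-decrement yields the value *after* the subtraction.
bool droppedLastRefNew(std::atomic<long> &Refs) {
  return --Refs == 0;
}
```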
*	[CodeGenPrep] Skip merging empty case blocks	Jun Bum Lim	2016-12-16	2	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a recommit of r287553 after fixing the invalid loop info after eliminating an empty block. Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targeting. Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D22696 llvm-svn: 289951
*	[X86][AVX512] use a single shufps for 512-bit vectors when it can save ↵	Simon Pilgrim	2016-12-16	1	-8/+3
\| \| \| \| \| \| \| \| \| \| \| \|	instructions This is the 512-bit counterpart to the 128-bit transform checked in here: https://reviews.llvm.org/rL289837 This patch is based on the draft by @sroland (Roland Scheidegger) that is attached to PR27885: https://llvm.org/bugs/show_bug.cgi?id=27885 llvm-svn: 289946
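To see when a single SHUFPS can replace a longer sequence, here is a reference model of 512-bit VSHUFPS (the same imm8 applied in each 128-bit lane: two elements from the first source, then two from the second) plus a simplified matcher that checks whether a 16-element shuffle mask fits that pattern. Both are hypothetical helpers for illustration; they ignore undef mask elements and commuted operands, which a real lowering would likely also need to handle.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V16 = std::array<float, 16>;

// Reference model: per 128-bit lane, R = {A[i0], A[i1], B[i2], B[i3]}.
V16 shufps512(const V16 &A, const V16 &B, uint8_t Imm) {
  V16 R{};
  for (int L = 0; L < 4; ++L) {
    int Base = 4 * L;
    R[Base + 0] = A[Base + ((Imm >> 0) & 3)];
    R[Base + 1] = A[Base + ((Imm >> 2) & 3)];
    R[Base + 2] = B[Base + ((Imm >> 4) & 3)];
    R[Base + 3] = B[Base + ((Imm >> 6) & 3)];
  }
  return R;
}

// Mask entries 0-15 select from A, 16-31 from B. Returns the imm8 if the
// mask is a single VSHUFPS of (A, B), otherwise -1.
int matchShufps512(const std::array<int, 16> &Mask) {
  int Sel[4] = {-1, -1, -1, -1};
  for (int L = 0; L < 4; ++L) {
    for (int E = 0; E < 4; ++E) {
      int Src = (E < 2) ? 0 : 16;               // first two from A, last two from B
      int Idx = Mask[4 * L + E] - Src - 4 * L;  // element within this lane
      if (Idx < 0 || Idx > 3)
        return -1;
      if (Sel[E] == -1)
        Sel[E] = Idx;
      else if (Sel[E] != Idx)
        return -1;  // one imm8 must fit every lane
    }
  }
  return Sel[0] | (Sel[1] << 2) | (Sel[2] << 4) | (Sel[3] << 6);
}
```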
*	[X86][AVX512] Add tests showing missed opportunity to efficiently lower ↵	Simon Pilgrim	2016-12-16	1	-0/+32
\| \| \| \| \| \|	v16i32 to VSHUFPS (PR27885) llvm-svn: 289945
*	[ARM] GlobalISel: Select add i32, i32	Diana Picus	2016-12-16	5	-0/+125
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add the minimal support necessary to select a function that returns the sum of two i32 values. This includes some support for argument/return lowering of i32 values through registers, as well as the handling of copy and add instructions throughout the GlobalISel pipeline. Differential Revision: https://reviews.llvm.org/D26677 llvm-svn: 289940
*	[X86][SSE] Combine shuffles to MOVSS/MOVSD whatever the domain.	Simon Pilgrim	2016-12-16	1	-6/+2
\| \| \| \| \| \|	We already do the same thing in shuffle lowering; but don't do it if we have SSE41 (PBLEND) instead. llvm-svn: 289937
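A register-form MOVSS takes the low element from the second source and keeps the upper three from the first, which is the same result as a BLENDPS with immediate 0x1; that equivalence is why a blend can be preferred when SSE4.1 is available. A hypothetical reference model (`movss`, `blendps`, `isMovssMask` are illustrative helpers, not LLVM functions):

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;

// MOVSS (register form): low element from B, upper three from A.
V4 movss(const V4 &A, const V4 &B) { return {B[0], A[1], A[2], A[3]}; }

// BLENDPS: bit i of Imm selects B[i], otherwise A[i].
V4 blendps(const V4 &A, const V4 &B, unsigned Imm) {
  V4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = ((Imm >> I) & 1) ? B[I] : A[I];
  return R;
}

// Shuffle mask over concat(A, B): 0-3 select from A, 4-7 from B.
bool isMovssMask(const std::array<int, 4> &M) {
  return M[0] == 4 && M[1] == 1 && M[2] == 2 && M[3] == 3;
}
```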
*	[AVR] Add a test for 64-bit left shifts	Dylan McKay	2016-12-16	1	-0/+8
\| \| \| \|	llvm-svn: 289936
*	Extra coverage tests to demonstrate fixes in D72618 and D26855	Andrew V. Tischenko	2016-12-16	2	-0/+334
\| \| \| \|	llvm-svn: 289931
*	Revert r289638: [PowerPC] Fix logic dealing with nop after calls (and ↵	Chandler Carruth	2016-12-16	2	-133/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	tail-call eligibility) This patch appears to result in trampolines in vtables being miscompiled when they in turn tail call a method. I've posted some preliminary details about the failure on the thread for this commit and talked to Hal. He was comfortable going ahead and reverting until we sort out what is wrong. llvm-svn: 289928
*	Revert 279703, it caused PR31404.	Nico Weber	2016-12-16	2	-163/+6
\| \| \| \|	llvm-svn: 289923
*	[IR] Remove the DIExpression field from DIGlobalVariable.	Adrian Prantl	2016-12-16	15	-140/+140
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression. Currently, DIGlobalVariable holds a DIExpression. This is not the best way to model this: (1) The DIGlobalVariable should describe the source level variable, not how to get to its location. (2) It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGlobalVariable. (3) It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_LLVM_fragment-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons. This reapplies r289902 with additional testcase upgrades. <rdar://problem/29250149> https://llvm.org/bugs/show_bug.cgi?id=31013 Differential Revision: https://reviews.llvm.org/D26769 llvm-svn: 289920