path: root/llvm/test/CodeGen
...
* [SystemZ] Add support for llvm.thread.pointer intrinsic. (Marcin Koscielnicki, 2016-04-20; 1 file, -0/+14)
  Differential Revision: http://reviews.llvm.org/D19054
  llvm-svn: 266844
* [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC. (Mandeep Singh Grang, 2016-04-19; 50 files, -53/+53)
  Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests.
  Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier
  Subscribers: mcrosier, dsanders
  Differential Revision: http://reviews.llvm.org/D19279
  llvm-svn: 266834
* ARM: fix assertion failure on -O0 cmpxchg. (Tim Northover, 2016-04-19; 1 file, -0/+21)
  Because lowering of CMP_SWAP_64 occurs during type legalization, there can be i64 types
  produced by more than just a BUILD_PAIR or similar. My initial tests used just incoming
  function args.
  llvm-svn: 266828
* Add IntrWrite[Arg]Mem intrinsic property (Nicolai Haehnle, 2016-04-19; 2 files, -8/+8)
  Summary: This property is used to mark an intrinsic that only writes to memory, but
  neither reads from memory nor has other side effects. An example where this is useful
  is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store
  instruction that goes through a special buffer descriptor rather than through a plain
  pointer. With this property, the intrinsic should still be handled as having side
  effects at the LLVM IR level, but machine scheduling can make smarter decisions.
  Reviewers: tstellarAMD, arsenm, joker.eph, reames
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18291
  llvm-svn: 266826
* AMDGPU: Guard VOPC instructions against incorrect commute (Nicolai Haehnle, 2016-04-19; 1 file, -0/+49)
  Summary: The added testcase, which triggered this, was derived from a shader-db case
  via bugpoint. A separate question is why scalar branching wasn't used.
  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19208
  llvm-svn: 266825
* [Hexagon] Fix operand swapping in HexagonPeephole (Krzysztof Parzyszek, 2016-04-19; 1 file, -0/+30)
  Also, disable zero- and size-extend optimizations for now.
  llvm-svn: 266821
* [AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic. (Marcin Koscielnicki, 2016-04-19; 2 files, -4/+4)
  Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the
  thread pointer. I have a pending patch that does the same for SystemZ (D19054), and
  there are many more targets that could benefit from one. This patch merges the ARM and
  AArch64 intrinsics into a single target independent one that will also be used by
  subsequent targets.
  Differential Revision: http://reviews.llvm.org/D19098
  llvm-svn: 266818
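For reference, a minimal LLVM IR sketch of a call to the merged intrinsic, written in the
typed-pointer syntax of this era (not taken from the commit; the function name @get_tp is
illustrative):

    ; the target-independent form replaces llvm.aarch64.thread.pointer / llvm.arm.thread.pointer
    declare i8* @llvm.thread.pointer()

    define i8* @get_tp() {
      %tp = call i8* @llvm.thread.pointer()
      ret i8* %tp
    }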
* [Hexagon] Fix printing the address operand of S2_storerinewabs (Krzysztof Parzyszek, 2016-04-19; 2 files, -1/+17)
  llvm-svn: 266811
* [PPC, SSP] Support PowerPC Linux stack protection. (Tim Shen, 2016-04-19; 1 file, -3/+14)
  llvm-svn: 266809
* [SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD (Tim Shen, 2016-04-19; 8 files, -15/+71)
  With this change, ideally an IR pass can always generate an llvm.stackguard call to get
  the stack guard; but for now there are still IR-form stack guard customizations around
  (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD.

  There is a behavior change: stack guard values are not CSEed anymore, since we should
  never reuse the value in case it has been spilled (and corrupted). See
  ssp-guard-spill.ll. This also changes the stack size and codegen in X86 and AArch64
  test cases.

  Ideally we'd like to know whether the guard created in llvm.stackprotector() gets
  spilled or not. If the value is spilled, discard the value and reload the stack guard;
  otherwise reuse the value. This can be done by teaching the register allocator how to
  rematerialize LOAD_STACK_GUARD and forcing a rematerialization (which seems hard), or
  by checking for spilling in expandPostRAPseudo. It only makes sense when the stack
  guard is a global variable, which requires more instructions to load. Anyway, this
  seems to go out of the scope of the current patch.
  llvm-svn: 266806
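For orientation, a minimal IR sketch of the new intrinsic (not from the patch; the
function name @load_guard is made up):

    declare i8* @llvm.stackguard()

    define i8* @load_guard() {
      ; lowered to LOAD_STACK_GUARD by the backend
      %g = call i8* @llvm.stackguard()
      ret i8* %g
    }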
* [lanai] Add lowering for SETCCE i32. (Jacques Pienaar, 2016-04-19; 2 files, -1/+109)
  * Add lowering for SETCCE i32.
  * Add test to check that lowering of i64 compares uses SETCCE expansion (outside of EQ and NE).
  * Fix select.ll test and immediate form selection for RI operations.
  llvm-svn: 266802
* [X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles (Simon Pilgrim, 2016-04-19; 5 files, -32/+48)
  Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where
  VINSERTI128/VINSERTF128 can not.
  Differential Revision: http://reviews.llvm.org/D19228
  llvm-svn: 266728
* Introduce a "patchable-function" function attribute (Sanjoy Das, 2016-04-19; 1 file, -0/+43)
  Summary: The `"patchable-function"` attribute can be used by an LLVM client to
  influence LLVM's code generation in ways that make the generated code easily patchable
  at runtime (for instance, to redirect control). Right now only one patchability scheme
  is supported, `"prologue-short-redirect"`, but this can be expanded in the future.
  Reviewers: joker.eph, rnk, echristo, dberris
  Subscribers: joker.eph, echristo, mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19046
  llvm-svn: 266715
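In IR the attribute is attached as a string function attribute; a minimal sketch (the
function name is illustrative, not from the test):

    define void @f() "patchable-function"="prologue-short-redirect" {
      ret void
    }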
* ARM: use a pseudo-instruction for cmpxchg at -O0. (Tim Northover, 2016-04-18; 1 file, -0/+81)
  The fast register-allocator cannot cope with inter-block dependencies without spilling.
  This is fine for ldrex/strex loops coming from atomicrmw instructions where any value
  produced within a block is dead by the end, but not for cmpxchg. So we lower a cmpxchg
  at -O0 via a pseudo-inst that gets expanded after regalloc.

  Fortunately this is at -O0 so we don't have to care about performance. This simplifies
  the various axes of expansion considerably: we assume a strong seq_cst operation and
  ensure ordering via the always-present DMB instructions rather than v8 acquire/release
  instructions.

  Should fix the 32-bit part of PR25526.
  llvm-svn: 266679
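For context, the input this -O0 path has to lower is an ordinary cmpxchg; a hedged IR
sketch (names are illustrative, not the committed test):

    define i32 @cmpxchg_example(i32* %addr, i32 %desired, i32 %new) {
      ; cmpxchg yields { value, success } — the pseudo-inst expansion happens after regalloc
      %pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst
      %old = extractvalue { i32, i1 } %pair, 0
      ret i32 %old
    }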
* [X86][SSE] Test case for PR2585 (Simon Pilgrim, 2016-04-18; 1 file, -0/+32)
  llvm-svn: 266669
* [X86][AVX] Added extra memory folding tests for D19228 (Simon Pilgrim, 2016-04-18; 1 file, -0/+25)
  llvm-svn: 266662
* [X86][AVX] Added zero+blend vs vperm2f128 optsize test cases (PR22984) (Simon Pilgrim, 2016-04-18; 1 file, -0/+69)
  We should be trying to use vperm2f128 instead of zero+blend (if we're the only user of
  the zero?) when optsize is enabled.
  llvm-svn: 266632
* [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt (Konstantin Zhuravlyov, 2016-04-18; 1 file, -0/+75)
  Also,
  - Skip pass if machine module does not have debug info
  - Minor comment changes
  - Added test
  Differential Revision: http://reviews.llvm.org/D19079
  llvm-svn: 266626
* [X86][AVX] Renamed vperm2f128 test to make it quicker to review (Simon Pilgrim, 2016-04-18; 1 file, -23/+23)
  Missed one the first time round...
  llvm-svn: 266623
* [X86][AVX] Renamed vperm2f128 tests to make it quicker to review (Simon Pilgrim, 2016-04-18; 1 file, -79/+79)
  llvm-svn: 266621
* [PowerPC] add comment to test (Strahinja Petrovic, 2016-04-18; 1 file, -0/+2)
  Added comment in test for soft-float operations on ppc architecture. Test commit.
  llvm-svn: 266600
* [X86][SSE] Added 16i8 -> 8i64 sext test (Simon Pilgrim, 2016-04-17; 1 file, -1/+125)
  Shows poor codegen for AVX2
  llvm-svn: 266560
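One plausible shape for such a test — a hedged sketch, not the committed file; the
function name and the choice of the low eight lanes are assumptions:

    define <8 x i64> @sext_16i8_to_8i64(<16 x i8> %a) {
      ; take the low 8 bytes, then sign-extend each to i64
      %lo = shufflevector <16 x i8> %a, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
      %ext = sext <8 x i8> %lo to <8 x i64>
      ret <8 x i64> %ext
    }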
* [AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled. (Craig Topper, 2016-04-17; 1 file, -0/+122)
  llvm-svn: 266554
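The IR in question is just a 64-bit-element vector multiply; a minimal sketch (function
name is illustrative). With avx512dq plus avx512vl the 128/256-bit forms of vpmullq are
available, otherwise the multiply must be expanded:

    define <2 x i64> @mul_v2i64(<2 x i64> %a, <2 x i64> %b) {
      %r = mul <2 x i64> %a, %b
      ret <2 x i64> %r
    }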
* [X86][AVX] Add shuffle combine tests for MOVDDUP/MOVSHDUP/MOVSLDUP (Simon Pilgrim, 2016-04-16; 2 files, -0/+121)
  128, 256 and 512 bit implementations (some not yet supported by combineX86ShuffleChain)
  llvm-svn: 266535
* [X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support (Simon Pilgrim, 2016-04-16; 1 file, -3/+16)
  Added additional test that peeks through bitcast to v16i8 mask
  llvm-svn: 266533
* [X86][XOP] More VPPERM shuffle mask decode tests (Simon Pilgrim, 2016-04-16; 1 file, -0/+104)
  As requested by D18441
  llvm-svn: 266531
* AMDGPU: Enable LocalStackSlotAllocation pass (Matt Arsenault, 2016-04-16; 2 files, -20/+72)
  This resolves more frame indexes early and folds the immediate offsets into the scratch
  mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as
  emitting add 0s and repeatedly initializing the same register to 0 when spilling.
  llvm-svn: 266508
* AMDGPU: Use s_addk_i32 / s_mulk_i32 (Matt Arsenault, 2016-04-16; 5 files, -6/+140)
  llvm-svn: 266506
* Don't skip splitSeparateComponents in eliminateDeadDefs for HoistSpillHelper::hoistAllSpills. (Wei Mi, 2016-04-15; 1 file, -1/+1)
  Because HoistSpillHelper::hoistAllSpills is called in postOptimization, before this
  patch we didn't want LiveRangeEdit::eliminateDeadDefs to call splitSeparateComponents
  and generate unassigned new vregs. However, skipping splitSeparateComponents makes
  verify-machineinstrs unhappy, so I removed the early return and use
  HoistSpillHelper::LRE_DidCloneVirtReg to assign a physreg/stackslot to those new vregs.
  In addition, some code was reorganized to make it possible for class HoistSpillHelper
  to privately inherit from LiveRangeEdit::Delegate. This is to be consistent with class
  RAGreedy and class RegisterCoalescer.
  Differential Revision: http://reviews.llvm.org/D19142
  llvm-svn: 266489
* Switch lowering: don't add incoming PHI values from skipped bit test MBB's (PR27135) (Hans Wennborg, 2016-04-15; 1 file, -2/+32)
  After r245976, LLVM will skip the last bit test case if it knows it will always be
  true. However, we would still erroneously update PHI nodes with incoming values from
  the MBB that would perform the final bit test, causing -verify-machineinstrs to fail.
  llvm-svn: 266479
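For context, a hedged sketch of the general kind of IR that switch lowering may turn into
a bit-test sequence (case values and names are assumptions, not the PR27135 reproducer);
the bug was in how PHIs were updated when the final bit-test block is proven redundant
and skipped:

    define i32 @bittest_switch(i32 %x) {
    entry:
      switch i32 %x, label %default [
        i32 1, label %hit
        i32 3, label %hit
        i32 5, label %hit
        i32 7, label %hit
      ]
    hit:
      br label %exit
    default:
      br label %exit
    exit:
      %res = phi i32 [ 1, %hit ], [ 0, %default ]
      ret i32 %res
    }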
* Let the DISubprogram in this test point to the right compile unit. (Adrian Prantl, 2016-04-15; 1 file, -1/+1)
  llvm-svn: 266468
* Update testcase to new debug metadata format. (Adrian Prantl, 2016-04-15; 1 file, -3/+2)
  llvm-svn: 266467
* [AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth(). (Chad Rosier, 2016-04-15; 2 files, -1/+35)
  This improves AA in the MI scheduler when reasoning about paired instructions.
  Phabricator Revision: http://reviews.llvm.org/D17098
  PR26358
  llvm-svn: 266462
* [SystemZ] Fix large tests broken by conditional returns. (Marcin Koscielnicki, 2016-04-15; 9 files, -0/+27)
  These were broken by D17339.
  Differential Revision: http://reviews.llvm.org/D19158
  llvm-svn: 266454
* Fix test to require Asserts since it uses debug output. (Geoff Berry, 2016-04-15; 1 file, -0/+1)
  llvm-svn: 266448
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram. (Adrian Prantl, 2016-04-15; 75 files, -285/+223)
  Currently each Function points to a DISubprogram and DISubprogram has a scope field.
  For member functions the scope is a DICompositeType. DIScopes point to the
  DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition:
  true) are not part of the type hierarchy and cannot be uniqued. This change removes the
  subprograms list from DICompileUnit and instead adds a pointer to the owning compile
  unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded
  DISubprograms and their transitively referenced debug info.

  Motivation
  ----------
  Materializing DISubprograms is currently the most expensive operation when doing a
  ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode
  block (or the same block as the function body) so we can avoid having to expensively
  deserialize all DISubprograms together with the global metadata. If a function has been
  inlined into another subprogram we need to store a reference to the block containing
  the inlined subprogram.

  Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates
  LLVM IR testcases to the new format.

  http://reviews.llvm.org/D19034
  <rdar://problem/25256815>
  llvm-svn: 266446
* llvm/test/CodeGen/AArch64/arm64-csldst-mmo.ll requires +Asserts. (NAKAMURA Takumi, 2016-04-15; 1 file, -0/+1)
  llvm-svn: 266443
* [AArch64] Add MMOs to callee-save load/store instructions. (Geoff Berry, 2016-04-15; 1 file, -0/+23)
  Summary: Without MMOs, the callee-save load/store instructions were treated as volatile
  by the MI post-RA scheduler and AArch64LoadStoreOptimizer.
  Reviewers: t.p.northover, mcrosier
  Subscribers: aemerson, rengolin, mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D17661
  llvm-svn: 266439
* Fix typing on generated LXV2DX/STXV2DX instructions (Nirav Dave, 2016-04-15; 1 file, -0/+26)
  [PPC] Previously when casting generic loads to LXV2DX/ST instructions we would leave
  the original load return type in place allowing for an assertion failure when we merge
  two equivalent LXV2DX nodes with different types. This fixes PR27350.
  Reviewers: nemanjai
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D19133
  llvm-svn: 266438
* [MachineScheduler] Add support for store clustering (Jun Bum Lim, 2016-04-15; 4 files, -18/+167)
  Perform store clustering just like load clustering. This change adds
  StoreClusterMutation to the machine scheduler. To control StoreClusterMutation,
  enableClusterStores() was added in TargetInstrInfo.h. This is enabled only on AArch64
  for now. This change also adds support for unscaled stores, which were not handled in
  getMemOpBaseRegImmOfs().
  llvm-svn: 266437
* AMDGPU/SI: Fix regression with no-return atomics (Nicolai Haehnle, 2016-04-15; 1 file, -0/+9)
  Summary: In the added test-case, the atomic instruction feeds into a non-machine
  CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes
  here.
  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19043
  llvm-svn: 266433
* Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target. (Justin Lebar, 2016-04-15; 1 file, -0/+24)
  llvm-svn: 266403
* AMDGPU: Include LDS size in printed comment (Matt Arsenault, 2016-04-14; 1 file, -4/+10)
  llvm-svn: 266382
* AMDGPU: Run SIFoldOperands after PeepholeOptimizer (Matt Arsenault, 2016-04-14; 15 files, -45/+49)
  PeepholeOptimizer cleans up redundant copies, which makes the operand folding more
  effective.

  shader-db stats:

  Totals:
  SGPRS: 34200 -> 34336 (0.40 %)
  VGPRS: 22118 -> 21655 (-2.09 %)
  Code Size: 632144 -> 633460 (0.21 %) bytes
  LDS: 11 -> 11 (0.00 %) blocks
  Scratch: 10240 -> 11264 (10.00 %) bytes per wave
  Max Waves: 8822 -> 8918 (1.09 %)
  Wait states: 0 -> 0 (0.00 %)

  Totals from affected shaders:
  SGPRS: 7704 -> 7840 (1.77 %)
  VGPRS: 5169 -> 4706 (-8.96 %)
  Code Size: 234444 -> 235760 (0.56 %) bytes
  LDS: 2 -> 2 (0.00 %) blocks
  Scratch: 0 -> 1024 (0.00 %) bytes per wave
  Max Waves: 1188 -> 1284 (8.08 %)
  Wait states: 0 -> 0 (0.00 %)

  Increases:
  SGPRS: 35 (0.01 %)
  VGPRS: 1 (0.00 %)
  Code Size: 59 (0.02 %)
  LDS: 0 (0.00 %)
  Scratch: 1 (0.00 %)
  Max Waves: 48 (0.02 %)
  Wait states: 0 (0.00 %)

  Decreases:
  SGPRS: 26 (0.01 %)
  VGPRS: 54 (0.02 %)
  Code Size: 68 (0.03 %)
  LDS: 0 (0.00 %)
  Scratch: 0 (0.00 %)
  Max Waves: 4 (0.00 %)
  Wait states: 0 (0.00 %)
  llvm-svn: 266378
* AMDGPU: Fold bitcasts of scalar constants to vectors (Matt Arsenault, 2016-04-14; 4 files, -50/+49)
  This cleans up some messes since the individual scalar components can be CSEed.
  llvm-svn: 266376
* AMDGPU: Add skeleton GlobalIsel implementation (Tom Stellard, 2016-04-14; 1 file, -0/+12)
  Summary: This adds the necessary target code to be able to run the IR translator.
  Lowering function arguments and returns is a nop and there is no support for
  RegBankSelect.
  Reviewers: arsenm, qcolombet
  Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19077
  llvm-svn: 266356
* [lanai] Add custom lowering for SRL_PARTS i32. (Jacques Pienaar, 2016-04-14; 1 file, -0/+12)
  llvm-svn: 266349
* [DivergenceAnalysis] Treat PHI with incoming undef as constant (Nicolai Haehnle, 2016-04-14; 1 file, -0/+41)
  Summary: If a PHI has an incoming undef, we can pretend that it is equal to one
  non-undef, non-self incoming value. This is particularly relevant in combination with
  the StructurizeCFG pass, which introduces PHI nodes with undefs. Previously, this led
  to branch conditions that were uniform before StructurizeCFG becoming non-uniform
  afterwards, which confused the SIAnnotateControlFlow pass.

  This fixes a crash when Mesa radeonsi compiles a shader from
  dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex
  Reviewers: arsenm, tstellarAMD, jingyue
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D19013
  llvm-svn: 266347
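A hedged IR sketch of the pattern described above (function and value names are made up):
the PHI has an undef incoming value, so with this change it can be treated as equal to
its only non-undef, non-self incoming value for divergence purposes.

    define i32 @phi_with_undef(i1 %cond, i32 %x) {
    entry:
      br i1 %cond, label %then, label %merge
    then:
      br label %merge
    merge:
      ; %p can be treated as %x: its other incoming value is undef
      %p = phi i32 [ undef, %entry ], [ %x, %then ]
      ret i32 %p
    }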
* AMDGPU: Remove SIFixSGPRLiveRanges pass (Nicolai Haehnle, 2016-04-14; 1 file, -1/+1)
  Summary: This pass is unnecessary and overly conservative. It was motivated by
  situations like

    def %vreg0:SGPR_32
    ...
    if-block:
      ..
      def %vreg1:SGPR_32
      ...
    else-block:
      ...
      use %vreg0:SGPR_32
      ...

  and similar situations with uses after the non-uniform control flow, where we are not
  allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the
  original, thread/workitem-based CFG, it looks like the live ranges of these registers
  do not overlap.

  However, by the time register allocation runs, we have moved to a wave-based CFG that
  accurately represents the fact that the wave may run through both the if- and the
  else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the
  SIFixSGPRLiveRanges pass.

  In addition to proving this change correct, I have tested it with Piglit and a small
  number of other tests.
  Reviewers: arsenm, tstellarAMD
  Subscribers: MatzeB, arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19041
  llvm-svn: 266345
* AArch64: expand cmpxchg after regalloc at -O0. (Tim Northover, 2016-04-14; 1 file, -0/+75)
  FastRegAlloc works only at the basic-block level and spills all live-out registers.
  Unfortunately for a stack-based cmpxchg near the spill slots, this can perpetually
  clear the exclusive monitor, which means the cmpxchg will never succeed.

  I believe the only way to handle this within LLVM is by expanding the loop
  post-regalloc. We don't want this in general because it severely limits the
  optimisations that can be done, so we limit this to -O0 compilations.

  It's an ugly hack, and about the one good point in the whole mess is that we can treat
  all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without
  affecting correctness.

  Should fix PR25526.
  llvm-svn: 266339