bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Date	Files	Lines
*	[AMDGPU] Fix typo in GCNSchedStrategy	Valery Pykhtin	2017-01-26	1	-8/+3
	Differential revision: https://reviews.llvm.org/D28980 llvm-svn: 293171
*	AMDGPU: Fold fneg into round instructions	Matt Arsenault	2017-01-26	3	-11/+99
	llvm-svn: 293127
*	AMDGPU: Set call_convention bit in kernel_code_t	Matt Arsenault	2017-01-25	1	-0/+2
	According to the documentation, this is supposed to be -1 if indirect calls are not supported. llvm-svn: 293081
*	AMDGPU: Check nsz instead of unsafe math	Matt Arsenault	2017-01-25	2	-3/+3
	llvm-svn: 293028
*	DAG: Recognize no-signed-zeros-fp-math attribute	Matt Arsenault	2017-01-25	2	-0/+80
	clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024
*	DAGCombiner: Allow negating ConstantFP after legalize	Matt Arsenault	2017-01-25	1	-2/+1
	llvm-svn: 293019
*	AMDGPU: Implement early ifcvt target hooks.	Matt Arsenault	2017-01-25	4	-2/+567
	Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The pass's cost check is based on critical-path length for out-of-order CPUs, which is not what we want, so it skips many cases we would like it to convert. llvm-svn: 293016
*	AMDGPU: Remove spurious out branches after a kill	Matt Arsenault	2017-01-24	1	-0/+40
	A sequence like this:

	  v_cmpx_le_f32_e32 vcc, 0, v0
	  s_branch BB0_30
	  s_cbranch_execnz BB0_30
	; BB#29:
	  exp null off, off, off, off done vm
	  s_endpgm
	BB0_30: ; %endif110

	is likely wrong. The s_branch instruction unconditionally jumps to BB0_30, so the skip block (exp done + s_endpgm) inserted to perform the kill is never executed. This results in a GPU hang with Star Ruler 2.

	The s_branch instruction is added during the "Control Flow Optimizer" pass, which seems to reorganize the basic blocks, while we assume that SI_KILL_TERMINATOR is always the last instruction in a basic block. Thus, after inserting a skip block we just move on to the next BB without looking at the instructions after the kill, and the s_branch is never removed. Instead, we should remove the unconditional out branches and let s_cbranch_execnz skip the two skip-block instructions when the exec mask is non-zero.

	This patch fixes the GPU hang and doesn't introduce any regressions with "make check".

	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019
	Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
	llvm-svn: 292985
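The Python sketch below models only the idea of the fix; it is not LLVM's skip-insertion code. It assumes a basic block is a list of assembly strings and uses the pseudo-instruction name SI_KILL_TERMINATOR from the message to mark the kill. Any unconditional s_branch after the kill is dropped, so the conditional s_cbranch_execnz alone decides whether the skip block runs.

```python
# Hypothetical model: a basic block is a list of assembly strings, and
# "SI_KILL_TERMINATOR" marks the kill before it is lowered.
def strip_out_branches_after_kill(block):
    """Return a copy of block without unconditional s_branch instructions
    that follow a kill, so the conditional skip branch decides control flow."""
    result = []
    after_kill = False
    for inst in block:
        opcode = inst.split()[0]
        if opcode == "SI_KILL_TERMINATOR":
            after_kill = True
        elif after_kill and opcode == "s_branch":
            continue  # this branch would always jump over the skip block
        result.append(inst)
    return result


block = [
    "SI_KILL_TERMINATOR v0",
    "s_branch BB0_30",
    "s_cbranch_execnz BB0_30",
]
print(strip_out_branches_after_kill(block))
# ['SI_KILL_TERMINATOR v0', 's_cbranch_execnz BB0_30']
```

Branches that appear before the kill are left alone, since they do not bypass the skip block.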
*	Enable FeatureFlatForGlobal on Volcanic Islands	Matt Arsenault	2017-01-24	273	-377/+374
	This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982
*	AMDGPU/SI: Give up in promote alloca when a pointer may be captured.	Changpeng Fang	2017-01-24	1	-0/+47
	Differential Revision: http://reviews.llvm.org/D28970 Reviewer: Matt llvm-svn: 292966
*	[AMDGPU] Add VGPR copies post regalloc fix pass	Stanislav Mekhanoshin	2017-01-24	1	-0/+44
	Regalloc creates COPY instructions that do not formally use exec, even though they are lowered to VALU moves. As a result, v_mov instructions can end up moved after an exec mask modification. One pass that does this is SIOptimizeExecMasking, but other passes could potentially do it too. This patch adds a pass immediately after regalloc that adds an implicit exec use operand to every VGPR copy instruction. Differential Revision: https://reviews.llvm.org/D28874 llvm-svn: 292956
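A toy Python model of the post-regalloc fix-up follows. The Inst class and the "v<N>" register naming are assumptions made for illustration, not LLVM's MachineInstr API. The point is that once a VGPR copy carries an implicit use of exec, any pass that moves an exec write sees a dependency on the copy.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Inst:
    # Hypothetical, simplified machine instruction: opcode, one def, implicit uses.
    opcode: str
    dst: str
    implicit_uses: list = field(default_factory=list)


VGPR = re.compile(r"v\d+")  # assumed naming: v0, v1, ... are VGPRs (vcc is not)


def add_exec_use_to_vgpr_copies(insts):
    """Give every COPY into a VGPR an implicit use of exec (at most once)."""
    for inst in insts:
        if inst.opcode == "COPY" and VGPR.fullmatch(inst.dst) and "exec" not in inst.implicit_uses:
            inst.implicit_uses.append("exec")
    return insts


prog = [Inst("COPY", "v1"), Inst("COPY", "s0"), Inst("COPY", "vcc"), Inst("S_MOV_B64", "exec")]
add_exec_use_to_vgpr_copies(prog)
print([i.implicit_uses for i in prog])
# [['exec'], [], [], []]
```

Running the fix-up twice is harmless because the operand is only added when it is missing.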
*	AMDGPU: Add trap handler support.	Wei Ding	2017-01-24	1	-5/+4
	llvm-svn: 292893
*	AMDGPU: Custom lower more vector operations	Matt Arsenault	2017-01-23	4	-18/+513
	This avoids stack usage. llvm-svn: 292846
*	DAG: Don't fold vector extract into load if target doesn't want to	Matt Arsenault	2017-01-23	1	-0/+31
	Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU when dynamically indexing a vector. llvm-svn: 292842
*	AMDGPU: Combine fp16/fp64 subtarget features	Matt Arsenault	2017-01-23	10	-45/+90
	The same control register controls both, and they are set to the same defaults. Keep the old names around as aliases. llvm-svn: 292837
*	DAG: Allow legalization of fcanonicalize vector types	Matt Arsenault	2017-01-23	1	-0/+214
	llvm-svn: 292814
*	Fix some broken CHECK lines.	Benjamin Kramer	2017-01-22	2	-3/+3
	The colon is important. llvm-svn: 292761
*	AMDGPU/R600: Serialize vector trunc stores to private AS	Jan Vesely	2017-01-20	1	-15/+15
	Add DUMMY_CHAIN SDNode to denote stores of interest. Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915 Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411 Differential Revision: https://reviews.llvm.org/D27964 llvm-svn: 292651
*	[test] Remove a unwanted match for `XFAIL:`.	Greg Parker	2017-01-20	1	-1/+1
	llvm-svn: 292567
*	[AMDGPU] Prevent spills before exec mask is restored	Stanislav Mekhanoshin	2017-01-20	1	-0/+78
	The inline spiller can decide to move a spill as early as possible in the basic block. It skips phis and labels, but we also need to make sure it skips the instructions in the basic block prologue that restore the exec mask. Added an isPositionLike callback in TargetInstrInfo to detect instructions that should be skipped in addition to the usual phis, labels, etc. Differential Revision: https://reviews.llvm.org/D27997 llvm-svn: 292554
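The insertion-point logic can be sketched in Python as below. This is a hypothetical model, not the inline spiller: instructions are strings, and the callback passed as is_position_like stands in for the TargetInstrInfo hook named in the message. It also assumes, for illustration, that the exec restore looks like an s_or_b64 writing exec.

```python
def spill_insert_index(block, is_position_like):
    """Index of the first instruction a spill may be placed before:
    skip PHIs and labels, plus whatever is_position_like accepts."""
    i = 0
    while i < len(block):
        inst = block[i]
        if inst.startswith(("PHI", "LABEL")) or is_position_like(inst):
            i += 1
        else:
            break
    return i


def restores_exec(inst):
    # Assumption: the prologue restores exec with an s_or_b64 that writes exec.
    return inst.startswith("s_or_b64 exec,")


block = [
    "PHI %1",
    "s_or_b64 exec, exec, s[0:1]",
    "v_add_f32 v0, v1, v2",
]
print(spill_insert_index(block, restores_exec))    # 2: after the exec restore
print(spill_insert_index(block, lambda _: False))  # 1: before it, the buggy position
```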
*	AMDGPU: Disable some fneg combines unless nsz	Matt Arsenault	2017-01-19	2	-41/+106
	For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either. llvm-svn: 292473
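The signed-zero argument can be checked directly with IEEE doubles in Python. The snippet is only an illustration of the arithmetic, using math.copysign to read the sign bit: the fadd rewrite flips the sign of a zero result when x == -y, while negating one fmul operand always flips the product's sign bit, so that fold is safe.

```python
import math


def sign_bit(x: float) -> int:
    """1 if the IEEE sign bit of x is set (including -0.0), else 0."""
    return 1 if math.copysign(1.0, x) < 0 else 0


x, y = 1.5, -1.5  # x == -y
neg_of_sum = -(x + y)      # -(+0.0) -> -0.0
sum_of_negs = (-x) + (-y)  # -1.5 + 1.5 -> +0.0
print(sign_bit(neg_of_sum), sign_bit(sum_of_negs))  # 1 0

# For fmul, negating one operand always flips the sign bit of the product,
# so -(a * b) and (-a) * b agree even when the result is zero.
for a, b in [(0.0, -2.0), (-0.0, 3.0), (1.5, -1.5), (2.0, 0.0)]:
    assert sign_bit(-(a * b)) == sign_bit((-a) * b)
```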
*	AMDGPU: Remove modifiers from v_div_scale_*	Matt Arsenault	2017-01-19	2	-3/+5
	They seem to produce nonsense results when used. This should be applied to the release branch. llvm-svn: 292472
*	[AMDGPU] Do not allow register coalescer to create big superregs	Stanislav Mekhanoshin	2017-01-18	2	-9/+80
	Limit the register coalescer by not allowing it to artificially increase the size of registers beyond a dword. Such super-registers are really register sequences, not distinct HW registers. With more super-registers we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super-registers overlap. For instance, we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4, etc., which complicates register allocation even more and results in excessive spilling. Differential Revision: https://reviews.llvm.org/D28782 llvm-svn: 292413
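To make the overlap concrete, the Python sketch below enumerates 3-wide tuples of consecutive VGPRs in a toy 8-register file (the register count is an arbitrary assumption) and counts how many other tuples a single choice rules out.

```python
def vgpr_tuples(width, num_regs):
    """All width-wide tuples of consecutive VGPRs, e.g. (0, 1, 2), (1, 2, 3), ..."""
    return [tuple(range(start, start + width)) for start in range(num_regs - width + 1)]


def tuple_name(t):
    return "_".join(f"VGPR{r}" for r in t)


def overlapping(t, tuples):
    """Other tuples that share at least one register with t."""
    return [u for u in tuples if u != t and set(u) & set(t)]


triples = vgpr_tuples(3, 8)
print([tuple_name(t) for t in triples[:3]])
# ['VGPR0_VGPR1_VGPR2', 'VGPR1_VGPR2_VGPR3', 'VGPR2_VGPR3_VGPR4']
print(len(overlapping(triples[2], triples)))  # 4
```

Assigning VGPR2_VGPR3_VGPR4 blocks four of the other five triples, which is why allowing the coalescer to create wide super-registers tightens allocation so quickly.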
*	DAG: Consider nnan in isKnownNeverNaN	Matt Arsenault	2017-01-18	1	-0/+16
	llvm-svn: 292328
*	AMDGPU: Add replacement export intrinsics	Matt Arsenault	2017-01-17	2	-0/+627
	llvm-svn: 292205
*	AMDGPU/EG,CM: Implement _noret global atomics	Jan Vesely	2017-01-16	1	-0/+542
	_RTN versions will be a lot more complicated. Differential Revision: https://reviews.llvm.org/D28067 llvm-svn: 292162
*	[AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64)	Konstantin Zhuravlyov	2017-01-13	1	-0/+262
	Differential Revision: https://reviews.llvm.org/D28496 llvm-svn: 291954
*	AMDGPU: Skip fneg/select combine if it can fold into other	Matt Arsenault	2017-01-12	2	-0/+159
	llvm-svn: 291792
*	AMDGPU: Fold free fneg into sin	Matt Arsenault	2017-01-12	1	-0/+42
	llvm-svn: 291790
*	AMDGPU: Fold fneg into fmul_legacy	Matt Arsenault	2017-01-12	1	-0/+178
	llvm-svn: 291784
*	AMDGPU: Fold fneg into rcp	Matt Arsenault	2017-01-12	1	-0/+100
	llvm-svn: 291779
*	AMDGPU: Fold fneg into fp_round	Matt Arsenault	2017-01-12	1	-0/+172
	llvm-svn: 291778
*	AMDGPU: Fold fneg into fp_extend	Matt Arsenault	2017-01-12	1	-0/+126
	llvm-svn: 291777
*	AMDGPU: Fold fneg into fma or fmad	Matt Arsenault	2017-01-12	1	-0/+308
	Patch mostly by Fiona Glaser llvm-svn: 291733
*	AMDGPU: Fold fneg into fmul	Matt Arsenault	2017-01-12	3	-12/+189
	Patch mostly by Fiona Glaser llvm-svn: 291732
*	AMDGPU: Fold fneg into fadd	Matt Arsenault	2017-01-12	1	-0/+179
	Patch mostly by Fiona Glaser llvm-svn: 291731
*	AMDGPU: Pull fneg/fabs out of a select	Matt Arsenault	2017-01-11	2	-2/+729
	Allows better source modifier usage. llvm-svn: 291729
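The algebra behind pulling fneg/fabs out of a select can be checked in Python. This sketch only verifies the identities select(c, -a, -b) == -select(c, a, b) and select(c, |a|, |b|) == |select(c, a, b)| bit-for-bit on non-NaN values (NaN is excluded because the == comparison used here never holds for it); once the negation or abs sits on the select result, it can be folded into the user as a source modifier.

```python
import math


def select(cond, a, b):
    return a if cond else b


def same_float(p, q):
    # Equal values and equal sign bits (distinguishes 0.0 from -0.0); not for NaN.
    return p == q and math.copysign(1.0, p) == math.copysign(1.0, q)


def fneg_pull_ok(cond, a, b):
    # select(c, -a, -b) -> -select(c, a, b)
    return same_float(select(cond, -a, -b), -select(cond, a, b))


def fabs_pull_ok(cond, a, b):
    # select(c, |a|, |b|) -> |select(c, a, b)|
    return same_float(select(cond, abs(a), abs(b)), abs(select(cond, a, b)))


pairs = [(1.0, -2.0), (0.0, -0.0), (-3.5, 4.25)]
print(all(fneg_pull_ok(c, a, b) and fabs_pull_ok(c, a, b)
          for c in (True, False) for a, b in pairs))  # True
```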
*	AMDGPU: Fix shrinking of addc/subb.	Matt Arsenault	2017-01-11	1	-0/+292
	To shrink to VOP2 the input carry must also be VCC. llvm-svn: 291720
*	AMDGPU: Fix sext_inreg for i1 in i16	Matt Arsenault	2017-01-11	1	-0/+133
	This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations. llvm-svn: 291717
*	AMDGPU: Fix breaking VOP3 v_add_i32s	Matt Arsenault	2017-01-11	1	-0/+305
	This was shrinking the instruction even though the carry output register was a virtual register, not known VCC. llvm-svn: 291716
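The two shrinking fixes above (carry-in for addc/subb, carry-out for v_add_i32) amount to one legality rule: the VOP2 encodings use VCC implicitly, so the VOP3 operands being dropped must already be VCC. The predicate below is a hypothetical Python sketch of that rule; the opcode names and the string register model are assumptions, not SIShrinkInstructions code.

```python
CARRY_IN_OPS = {"v_addc_u32", "v_subb_u32"}  # assumed names for ops that read a carry-in


def can_shrink_to_vop2(opcode, carry_out, carry_in=None):
    """Hypothetical check: VOP2 carry ops use VCC implicitly, so the VOP3
    carry operands being replaced must already be VCC."""
    if carry_out != "vcc":
        return False  # e.g. a virtual register not known to be VCC
    if opcode in CARRY_IN_OPS and carry_in != "vcc":
        return False  # addc/subb also read their carry-in from VCC
    return True


print(can_shrink_to_vop2("v_add_i32", "%5"))  # False: carry-out is a virtual register
```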
*	AMDGPU: Fix folding immediates into mac src2	Matt Arsenault	2017-01-11	1	-0/+66
	Whether it is legal needs to be checked against the instruction it will be replaced with. llvm-svn: 291711
*	Revert "CodeGen: Allow small copyable blocks to "break" the CFG."	Kyle Butt	2017-01-11	4	-51/+19
	This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695
*	DAGCombiner: Add hasOneUse checks to fadd/fma combine	Matt Arsenault	2017-01-11	1	-0/+262
	Even with aggressive fusion enabled, this requires duplicating the fmul, or turns an fadd into another fma, which is not an improvement. llvm-svn: 291642
*	AMDGPU/EG,CM: Add fp16 conversion instructions	Jan Vesely	2017-01-11	4	-35/+49
	Differential Revision: https://reviews.llvm.org/D28164 llvm-svn: 291622
*	AMDGPU: Constant fold when immediate is materialized	Matt Arsenault	2017-01-10	1	-0/+858
	In future commits these patterns will appear after moveToVALU changes. llvm-svn: 291615
*	CodeGen: Allow small copyable blocks to "break" the CFG.	Kyle Butt	2017-01-10	4	-19/+51
	When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability in the other direction. For small blocks that can be duplicated we now skip that requirement as well. Differential revision: https://reviews.llvm.org/D27742 llvm-svn: 291609
*	DAG: Avoid OOB when legalizing vector indexing	Matt Arsenault	2017-01-10	2	-4/+11
	If a vector index is out of bounds, the result is supposed to be undefined but is not undefined behavior. Change the legalization for indexing the vector on the stack so that an out of bounds index does not create an out of bounds memory access. llvm-svn: 291604
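One way to keep a dynamic index inside the stack slot is to clamp it before computing the address. The Python sketch below assumes that approach (mask with n - 1 when the element count is a power of two, otherwise take an unsigned minimum); it illustrates the idea rather than quoting the exact legalization code. An out-of-range index still yields an arbitrary element, which is acceptable because the result is undefined, but the memory access never leaves the slot.

```python
def clamp_vector_index(idx: int, num_elts: int) -> int:
    """Keep an unsigned dynamic index inside [0, num_elts)."""
    if (num_elts & (num_elts - 1)) == 0:
        return idx & (num_elts - 1)  # power of two: mask off the high bits
    return min(idx, num_elts - 1)    # otherwise: unsigned min


def extract_from_stack(stack_slot, idx):
    # Out-of-range idx gives some element (an "undefined" value), never an OOB access.
    return stack_slot[clamp_vector_index(idx, len(stack_slot))]


print(extract_from_stack([10, 20, 30, 40], 9))  # 20, since 9 & 3 == 1
```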
*	AMDGPU: Add tests for HasMultipleConditionRegisters	Matt Arsenault	2017-01-10	1	-0/+161
	This was enabled without many specific tests or a comment explaining it. llvm-svn: 291586
*	AMDGPU: Add Assert[SZ]Ext during argument load creation	Matt Arsenault	2017-01-09	1	-75/+97
	For i16 zeroext arguments when i16 was a legal type, the known bits information from the truncate was lost. Insert a zeroext so the known bits optimizations work with the 32-bit loads. Fixes code quality regressions vs. SI in min.ll test. llvm-svn: 291461
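The known-bits payoff can be sketched with plain integer masks in Python. This is an illustration of the reasoning, not SelectionDAG's KnownBits machinery: asserting that a 32-bit load holds a zero-extended i16 makes the top 16 bits known zero, which in turn lets an explicit mask with 0xFFFF be recognized as redundant.

```python
def known_zero_after_zext(from_bits: int, to_bits: int) -> int:
    """Mask of the high bits known to be zero after zero-extending."""
    return ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)


def and_mask_is_redundant(mask: int, known_zero: int, bits: int = 32) -> bool:
    """x & mask == x whenever every bit cleared by mask is already known zero."""
    cleared = ~mask & ((1 << bits) - 1)
    return (cleared & ~known_zero) == 0


kz = known_zero_after_zext(16, 32)
print(hex(kz))                              # 0xffff0000
print(and_mask_is_redundant(0xFFFF, kz))    # True: the high half is already zero
print(and_mask_is_redundant(0xFFFF, 0))     # False: nothing is known
```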
*	[SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486.	Bjorn Pettersson	2017-01-09	1	-0/+16
	Summary: Originally

	  i64 = umax t8, Constant:i64<4>

	was expanded into

	  i32,i32 = umax Constant:i32<0>, Constant:i32<0>
	  i32,i32 = umax t7, Constant:i32<4>

	Now the two produced umax nodes each return i32 instead of i32, i32. Thanks to Jan Vesely for help with the test case.
	Patch by mikael.holmen at ericsson.com
	Reviewers: bogner, jvesely, tstellarAMD, arsenm
	Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits
	Differential Revision: https://reviews.llvm.org/D28135
	llvm-svn: 291441
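For reference, the semantics an expanded 64-bit umax must preserve can be written in Python over 32-bit halves. This sketch shows only the arithmetic, not LLVM's ExpandIntegerResult code: the high halves decide the result, and the low halves matter only on a tie. The naive per-half version is included to show why the halves cannot be maximized independently.

```python
MASK32 = 0xFFFFFFFF


def split64(x):
    """Return (hi, lo) 32-bit halves of an unsigned 64-bit value."""
    return (x >> 32) & MASK32, x & MASK32


def umax64_via_halves(a, b):
    a_hi, a_lo = split64(a)
    b_hi, b_lo = split64(b)
    hi = max(a_hi, b_hi)  # the high half of the result is always the larger high half
    if a_hi == b_hi:
        lo = max(a_lo, b_lo)  # tie: the low halves decide
    else:
        lo = a_lo if a_hi > b_hi else b_lo  # take the low half of the winner
    return (hi << 32) | lo


def umax64_per_half_naive(a, b):
    # Wrong in general: mixes halves from different operands.
    a_hi, a_lo = split64(a)
    b_hi, b_lo = split64(b)
    return (max(a_hi, b_hi) << 32) | max(a_lo, b_lo)


print(hex(umax64_via_halves(1 << 32, 0xFFFFFFFF)))      # 0x100000000
print(hex(umax64_per_half_naive(1 << 32, 0xFFFFFFFF)))  # 0x1ffffffff
```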