bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	AMDGPU: fdiv -1, x -> rcp -x	Matt Arsenault	2016-08-02	1	-16/+25
	llvm-svn: 277535
*	AMDGPU: Stay in WQM for non-intrinsic stores	Nicolai Haehnle	2016-08-02	6	-10/+33
	Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays. For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway. This is a candidate for the 3.9 release branch. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D22675 llvm-svn: 277504
*	AMDGPU: Track physical registers in SIWholeQuadMode	Nicolai Haehnle	2016-08-02	1	-26/+53
	Summary: There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM. The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0. This is a candidate for the 3.9 branch, as it fixes a possible hang. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22673 llvm-svn: 277500
*	[AMDGPU] refactor DS instruction definitions. NFC.	Valery Pykhtin	2016-08-01	7	-608/+896
	Differential revision: https://reviews.llvm.org/D22522 llvm-svn: 277344
*	[AMDGPU] Fix lifetime of SmallVector temporaries.	Benjamin Kramer	2016-07-30	1	-6/+4
	Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277265
*	AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior	Matt Arsenault	2016-07-30	1	-2/+2
	This should really be true for any immediate, not just inline ones. llvm-svn: 277260
*	AMDGPU: Set s_setpc_b64 as a terminator	Matt Arsenault	2016-07-30	1	-0/+3
	llvm-svn: 277259
*	AMDGPU: Remove unused pattern	Matt Arsenault	2016-07-30	1	-8/+7
	llvm-svn: 277258
*	TargetInstrInfo: add virtual function getInstSizeInBytes	Sjoerd Meijer	2016-07-29	1	-1/+1
	This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement. Differential Revision: https://reviews.llvm.org/D22885 llvm-svn: 277126
*	AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.	Changpeng Fang	2016-07-28	1	-0/+2
	Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073
*	MachineFunction: Return reference for getFrameInfo(); NFC	Matthias Braun	2016-07-28	9	-44/+44
	getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
*	AMDGPU : Add intrinsics for compare with the full wavefront result	Wei Ding	2016-07-28	5	-0/+103
	Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
*	AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling	Tom Stellard	2016-07-28	4	-6/+12
	Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980
*	AMDGPU: add execfix flag to SI_ELSE	Nicolai Haehnle	2016-07-28	3	-10/+13
	Summary: SI_ELSE is lowered into two parts:

	    s_or_saveexec_b64 dst, src    (at the start of the basic block)
	    s_xor_b64 exec, exec, dst     (at the end of the basic block)

	The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction

	    s_and_b64 exec, exec, s[...]

	which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be:

	    s_or_saveexec_b64 dst, src
	    s_and_b64 exec, exec, s[...]    <-- added by SIWholeQuadMode
	    s_and_b64 dst, dst, exec        <-- added by SILowerControlFlow
	    s_xor_b64 exec, exec, dst

	Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode.

	Finally: It also occurred to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit

	    s_or_saveexec_b64 dst, src
	    ...
	    s_and_b64 dst, dst, exec
	    s_xor_b64 exec, exec, dst

	and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business.

	Reviewers: tstellarAMD, arsenm
	Subscribers: arsenm, llvm-commits, kzhuravl
	Differential Revision: https://reviews.llvm.org/D22846
	llvm-svn: 276972
*	AMDGPU: Turn dead checks into asserts	Matt Arsenault	2016-07-28	1	-9/+5
	llvm-svn: 276946
*	AMDGPU: Remove analyzeImmediate	Matt Arsenault	2016-07-28	3	-34/+12
	This no longer uses the more complicated classification of constants. llvm-svn: 276945
*	Remove MCAsmInfo.h include from TargetOptions.h	Reid Kleckner	2016-07-27	1	-0/+1
	TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis. llvm-svn: 276883
*	[GlobalISel] Introduce an instruction selector.	Ahmed Bougacha	2016-07-27	1	-0/+5
	And implement it for AArch64, supporting x/w ADD/OR. Differential Revision: https://reviews.llvm.org/D22373 llvm-svn: 276875
*	AMDGPU: Use rcp for fdiv 1, x with fpmath metadata	Matt Arsenault	2016-07-26	1	-1/+1
	Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823
*	AMDGPU: Use implicit_def for selecting anyext	Matt Arsenault	2016-07-26	1	-4/+7
	llvm-svn: 276819
*	AMDGPU/R600: Remove dead custom inserters	Matt Arsenault	2016-07-26	1	-209/+1
	The intrinsics for these were removed, so this is dead. llvm-svn: 276805
*	AMDGPU: Minor AsmPrinter cleanups	Matt Arsenault	2016-07-26	1	-79/+84
	llvm-svn: 276804
*	AMDGPU: Make AMDGPUMachineFunction fields private	Matt Arsenault	2016-07-26	10	-56/+80
	ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766
*	AMDGPU: Add fp legacy instruction intrinsics	Matt Arsenault	2016-07-26	5	-2/+21
	This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
*	AMDGPU: Remove read_workdim intrinsic	Jan Vesely	2016-07-25	3	-14/+0
	Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682
*	AMDGPU: Make skip threshold an option	Matt Arsenault	2016-07-25	1	-3/+8
	llvm-svn: 276680
*	AMDGPU: Delete dead code	Matt Arsenault	2016-07-25	4	-41/+0
	llvm-svn: 276675
*	[MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC	Joel Jones	2016-07-25	2	-2/+5
	Some targets, notably AArch64 for ILP32, have different relocation encodings based upon the ABI. This is an enabling change, so a future patch can use the ABIName from MCTargetOptions to choose which relocations to use. Tested using check-llvm. The corresponding change to clang is in: http://reviews.llvm.org/D16538 Patch by: Joel Jones Differential Revision: https://reviews.llvm.org/D16213 llvm-svn: 276654
*	AMDGPU: Delete dead code	Matt Arsenault	2016-07-23	2	-97/+0
	This has been dead since r269479 llvm-svn: 276518
*	Revert "[AMDGPU] Emit read-only data to .rodata for hsa"	Tom Stellard	2016-07-22	1	-2/+1
	This reverts commit r276298. Data stored in .rodata can have a negative offset from .text, but we don't support negative values in relocations yet. This caused a regression in one of the amp conformance tests: 5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01 llvm-svn: 276498
*	GlobalISel: implement legalization pass, with just one transformation.	Tim Northover	2016-07-22	1	-0/+5
	This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors. llvm-svn: 276461
*	AMDGPU: Fix groupstaticsize for large LDS	Matt Arsenault	2016-07-22	1	-3/+3
	The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438
*	AMDGPU: Add HSA dispatch id intrinsic	Matt Arsenault	2016-07-22	5	-8/+31
	llvm-svn: 276437
*	AMDGPU: Delete more dead code	Matt Arsenault	2016-07-22	10	-182/+15
	Remove dead code from r600 intrinsic removal. Remove unset members, rename StackSize to be less ambiguous. llvm-svn: 276436
*	AMDGPU: Fix i1 fp_to_int	Matt Arsenault	2016-07-22	4	-7/+34
	R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to. llvm-svn: 276435
*	AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs	Matt Arsenault	2016-07-22	1	-26/+2
	llvm-svn: 276434
*	[AMDGPU] Emit read-only data to .rodata for hsa	Konstantin Zhuravlyov	2016-07-21	1	-1/+2
	Differential Revision: https://reviews.llvm.org/D22538 llvm-svn: 276298
*	AMDGPU/SI: Add support for R_AMDGPU_ABS32	Konstantin Zhuravlyov	2016-07-21	1	-0/+1
	Differential Revision: https://reviews.llvm.org/D21646 llvm-svn: 276294
*	[AMDGPU] Some code cleaning in SIRegisterInfo.td	Sam Kolton	2016-07-21	1	-33/+23
	Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: https://reviews.llvm.org/D22620 llvm-svn: 276274
*	AMDGPU: Fix phis from blocks split due to register indexing	Matt Arsenault	2016-07-21	1	-15/+22
	llvm-svn: 276257
*	AMDGPU: Fix bug causing crash due to invalid opencl version metadata.	Yaxun Liu	2016-07-20	1	-9/+13
	Differential Revision: https://reviews.llvm.org/D22526 llvm-svn: 276119
*	AMDGPU: Change fdiv lowering based on !fpmath metadata	Matt Arsenault	2016-07-19	8	-49/+227
	If 2.5 ulp is acceptable, denormals are not required, and the operation isn't a reciprocal (which will already be handled), replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. llvm-svn: 276051
*	[AMDGPU] Remove spurious line (should've been removed in r276029).	Davide Italiano	2016-07-19	1	-3/+0
	llvm-svn: 276030
*	[AMDGPU] Remove dead code.	Davide Italiano	2016-07-19	1	-25/+0
	LGTM'd by Matt Arsenault. llvm-svn: 276029
*	AMDGPU: Only use legal inline immediates with kill pseudo	Matt Arsenault	2016-07-19	5	-3/+15
	Only whether the value is negative or positive matters, so use a constant that doesn't require an instruction to materialize. These should really just emit the write exec directly, but for now stick with the kill pseudo-terminator. llvm-svn: 275988
*	AMDGPU/SI: Fix SI scheduler refcount issue	Matt Arsenault	2016-07-19	1	-0/+3
	Without this fix, releaseSuccessors when InOrOutBlock is false could release SUs outside the schedule BasicBlock. Patch by Axel Davy llvm-svn: 275935
*	AMDGPU: Expand register indexing pseudos in custom inserter	Matt Arsenault	2016-07-19	8	-300/+451
	This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934
*	AMDGPU: Remove pointless dyn_cast_or_null	Matt Arsenault	2016-07-18	1	-4/+3
	This was already cast above, so it is non-null. llvm-svn: 275881
*	AMDGPU: Fix missing switch case warning	Matt Arsenault	2016-07-18	1	-0/+1
	llvm-svn: 275873
*	AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32	Matt Arsenault	2016-07-18	5	-1/+8
	llvm-svn: 275871
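
Note on r276972 (SI_ELSE execfix): the mask arithmetic described in that commit message can be checked with a small model. The following is only a sketch, not LLVM code. It assumes one bit per lane in a plain Python integer, that `s_or_saveexec_b64 dst, src` copies exec into dst and then ORs src into exec, and that src holds the lanes that should run the ELSE path. The function name `lower_si_else`, its parameters, and the 8-lane mask values are all hypothetical.

```python
def lower_si_else(exec_mask, src, live=None, mask_dst=True):
    """Model the SI_ELSE sequence from the commit message on integer lane masks.

    exec_mask: lanes active in the preceding IF block (assumed).
    src:       saved lanes that should run the ELSE path (assumed).
    live:      mask applied by the WQM -> Exact switch, or None if no switch.
    mask_dst:  whether the extra "s_and_b64 dst, dst, exec" is emitted.
    """
    # s_or_saveexec_b64 dst, src
    dst = exec_mask
    exec_mask = exec_mask | src
    if live is not None:
        # s_and_b64 exec, exec, s[...]   (added by SIWholeQuadMode)
        exec_mask &= live
        if mask_dst:
            # s_and_b64 dst, dst, exec   (added by SILowerControlFlow)
            dst &= exec_mask
    # s_xor_b64 exec, exec, dst
    return exec_mask ^ dst


# Hypothetical 8-lane example.
THEN = 0b00001111  # lanes that took the IF path
ELSE = 0b11110000  # lanes that take the ELSE path
LIVE = 0b00111100  # lanes kept after switching to Exact mode

# No mode switch: exec ends up as exactly the ELSE lanes.
assert lower_si_else(THEN, ELSE) == ELSE
# Mode switch with the extra AND: only live ELSE lanes remain.
assert lower_si_else(THEN, ELSE, live=LIVE) == ELSE & LIVE
# Mode switch without the extra AND: masked-out IF lanes are re-enabled.
assert lower_si_else(THEN, ELSE, live=LIVE, mask_dst=False) & THEN != 0
```

In this model the final XOR removes the IF lanes from exec only if dst has first been restricted to the lanes still enabled, which is why the additional `s_and_b64 dst, dst, exec` is needed after a switch to Exact mode.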