bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland	Marek Olsak	2016-08-05	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is the setting of the Vulkan closed source driver. It decreases the max wave count from 10 to 8. 26010 shaders in 14650 tests Totals: VGPRS: 829593 -> 808440 (-2.55 %) Spilled SGPRs: 81878 -> 42226 (-48.43 %) Spilled VGPRs: 367 -> 358 (-2.45 %) Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread Code Size: 36677864 -> 35923932 (-2.06 %) bytes There is a massive decrease in SGPR spilling in general and -7.4% spilled VGPRs for DiRT Showdown (= SGPRs spilled to scratch?) Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23034 llvm-svn: 277867
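The percentages quoted in the summary above can be re-derived from the raw before/after totals. The following is a minimal sketch for checking them; the helper name `percent_change` and the `totals` dictionary are illustrative and do not come from the commit.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) totals copied from the commit summary.
totals = {
    "VGPRs": (829593, 808440),
    "Spilled SGPRs": (81878, 42226),
    "Spilled VGPRs": (367, 358),
    "Scratch VGPRs": (1764, 1748),
    "Code Size": (36677864, 35923932),
}

for name, (before, after) in totals.items():
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:.2f} %)")
```

Each printed figure should agree with the summary to two decimal places.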
*	[OpenCL] Add missing tests for getOCLTypeName	Yaxun Liu	2016-08-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown. Patch by Aaron En Ye Shi. Differential Revision: https://reviews.llvm.org/D22964 llvm-svn: 277759
*	LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack ↵	Alina Sbirlea	2016-08-04	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	adjustments. Summary: TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses. A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted. Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures: - x86 failing tests either did not have the right target, or the right alignment. - NVPTX failing tests did not have the right alignment. - AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks for a maximum size allowed and relies on the next condition checking for %4 for correctness. This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list). Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure), so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context. Reviewers: arsenm, jlebar, tstellarAMD Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23068 llvm-svn: 277735
*	[X86] Heuristic to selectively build Newton-Raphson SQRT estimation	Nikolai Bozhenov	2016-08-04	2	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725
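For readers unfamiliar with the software path that this heuristic chooses against, here is a minimal numeric sketch of an RSQRT estimate followed by one Newton-Raphson refinement step. It is plain Python, not the X86 lowering itself. The function names and the simulated estimate error of about 1e-3 are assumptions made for illustration.

```python
import math


def rsqrt_newton_step(x, y):
    """One Newton-Raphson step refining y as an estimate of 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


def sqrt_via_rsqrt(x, estimate, steps=1):
    """Approximate sqrt(x) as x * rsqrt(x), refining the estimate first."""
    y = estimate
    for _ in range(steps):
        y = rsqrt_newton_step(x, y)
    return x * y


x = 2.0
# Stand-in for a coarse hardware reciprocal-sqrt estimate (assumed error ~1e-3).
rough = (1.0 / math.sqrt(x)) * (1.0 + 1e-3)
refined = sqrt_via_rsqrt(x, rough)

print("unrefined error:", abs(x * rough - math.sqrt(x)))
print("refined error:  ", abs(refined - math.sqrt(x)))
```

The refinement costs several dependent multiplies and a subtract per step. That extra latency is what the commit message weighs against a single hardware SQRT instruction.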
*	AMDGPU: fdiv -1, x -> rcp -x	Matt Arsenault	2016-08-02	1	-16/+25
\| \| \| \|	llvm-svn: 277535
*	AMDGPU: Stay in WQM for non-intrinsic stores	Nicolai Haehnle	2016-08-02	6	-10/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays. For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway. This is a candidate for the 3.9 release branch. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D22675 llvm-svn: 277504
*	AMDGPU: Track physical registers in SIWholeQuadMode	Nicolai Haehnle	2016-08-02	1	-26/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM. The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0. This is a candidate for the 3.9 branch, as it fixes a possible hang. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22673 llvm-svn: 277500
*	[AMDGPU] refactor DS instruction definitions. NFC.	Valery Pykhtin	2016-08-01	7	-608/+896
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D22522 llvm-svn: 277344
*	[AMDGPU] Fix lifetime of SmallVector temporaries.	Benjamin Kramer	2016-07-30	1	-6/+4
\| \| \| \| \| \|	Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277265
*	AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior	Matt Arsenault	2016-07-30	1	-2/+2
\| \| \| \| \| \| \|	This should really be true for any immediate, not just inline ones. llvm-svn: 277260
*	AMDGPU: Set s_setpc_b64 as a terminator	Matt Arsenault	2016-07-30	1	-0/+3
\| \| \| \|	llvm-svn: 277259
*	AMDGPU: Remove unused pattern	Matt Arsenault	2016-07-30	1	-8/+7
\| \| \| \|	llvm-svn: 277258
*	TargetInstrInfo: add virtual function getInstSizeInBytes	Sjoerd Meijer	2016-07-29	1	-1/+1
\| \| \| \| \| \| \| \| \|	This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement. Differential Revision: https://reviews.llvm.org/D22885 llvm-svn: 277126
*	AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.	Changpeng Fang	2016-07-28	1	-0/+2
\| \| \| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073
*	MachineFunction: Return reference for getFrameInfo(); NFC	Matthias Braun	2016-07-28	9	-44/+44
\| \| \| \| \| \| \|	getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
*	AMDGPU : Add intrinsics for compare with the full wavefront result	Wei Ding	2016-07-28	5	-0/+103
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
*	AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling	Tom Stellard	2016-07-28	4	-6/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal. Reviewers: arsenm, mareko, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22032 llvm-svn: 276980
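One plausible reading of the 64-register limit mentioned above is simple occupancy arithmetic. The sketch below assumes a 64-lane wavefront, 4 SIMDs per compute unit, and 256 VGPRs available per SIMD lane. These constants are assumptions about the hardware, not values taken from the commit, and allocation granularity is ignored.

```python
# Assumed hardware parameters (not stated in the commit message).
WAVE_SIZE = 64
SIMDS_PER_CU = 4
VGPRS_PER_SIMD = 256


def max_vgprs_per_wave(workgroup_size):
    """Largest per-wave VGPR budget that lets a whole workgroup fit on one CU."""
    waves = -(-workgroup_size // WAVE_SIZE)        # ceiling division
    waves_per_simd = -(-waves // SIMDS_PER_CU)     # ceiling division
    return VGPRS_PER_SIMD // waves_per_simd


print(max_vgprs_per_wave(1024))
print(max_vgprs_per_wave(256))
```

Under these assumptions, a 1024-item workgroup needs four waves resident on each SIMD, so any extra VGPRs consumed by spilling beyond that budget would make the kernel impossible to launch.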
*	AMDGPU: add execfix flag to SI_ELSE	Nicolai Haehnle	2016-07-28	3	-10/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: SI_ELSE is lowered into two parts: s_or_saveexec_b64 dst, src (at the start of the basic block) s_xor_b64 exec, exec, dst (at the end of the basic block) The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction s_and_b64 exec, exec, s[...] which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be: s_or_saveexec_b64 dst, src s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode s_and_b64 dst, dst, exec <-- added by SILowerControlFlow s_xor_b64 exec, exec, dst Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode. Finally: It also occurred to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit s_or_saveexec_b64 dst, src ... s_and_b64 dst, dst, exec s_xor_b64 exec, exec, dst and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions s_and_b64 vcc, exec, vcc before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass. In any case, this current patch could help in the short term with the whole ExecModified business. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22846 llvm-svn: 276972
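The mask arithmetic in the summary above can be traced with small integers standing in for the 64-bit exec mask. This is a minimal sketch, assuming `s_or_saveexec_b64 dst, src` copies exec into dst and then ORs src into exec. The function `lower_else`, its parameters, and the 4-lane example masks are hypothetical and only model the instruction sequence quoted in the message.

```python
def lower_else(exec_mask, src, live_mask=None, execfix=False):
    """Model the SI_ELSE sequence on integer bitmasks; returns the final exec."""
    # s_or_saveexec_b64 dst, src
    dst = exec_mask
    exec_mask = src | exec_mask
    if live_mask is not None:
        # s_and_b64 exec, exec, s[...]   (WQM -> Exact switch)
        exec_mask &= live_mask
    if execfix:
        # s_and_b64 dst, dst, exec       (the additional AND)
        dst &= exec_mask
    # s_xor_b64 exec, exec, dst
    exec_mask ^= dst
    return exec_mask


if_lanes = 0b0011    # lanes active in the IF block (exec on entry)
else_lanes = 0b1100  # lanes saved for the ELSE path (src)
live = 0b0110        # lanes that remain enabled in Exact mode

print(bin(lower_else(if_lanes, else_lanes)))                       # no mode switch
print(bin(lower_else(if_lanes, else_lanes, live)))                 # switch, no extra AND
print(bin(lower_else(if_lanes, else_lanes, live, execfix=True)))   # switch with extra AND
```

In this toy model, omitting the extra AND after a mode switch re-enables an IF-path lane that the Exact-mode mask had disabled. With the extra AND, the final exec is exactly the ELSE lanes that are still live.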
*	AMDGPU: Turn dead checks into asserts	Matt Arsenault	2016-07-28	1	-9/+5
\| \| \| \|	llvm-svn: 276946
*	AMDGPU: Remove analyzeImmediate	Matt Arsenault	2016-07-28	3	-34/+12
\| \| \| \| \| \| \|	This no longer uses the more complicated classification of constants. llvm-svn: 276945
*	Remove MCAsmInfo.h include from TargetOptions.h	Reid Kleckner	2016-07-27	1	-0/+1
\| \| \| \| \| \| \| \| \|	TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis. llvm-svn: 276883
*	[GlobalISel] Introduce an instruction selector.	Ahmed Bougacha	2016-07-27	1	-0/+5
\| \| \| \| \| \| \| \|	And implement it for AArch64, supporting x/w ADD/OR. Differential Revision: https://reviews.llvm.org/D22373 llvm-svn: 276875
*	AMDGPU: Use rcp for fdiv 1, x with fpmath metadata	Matt Arsenault	2016-07-26	1	-1/+1
\| \| \| \| \| \| \|	Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823
*	AMDGPU: Use implicit_def for selecting anyext	Matt Arsenault	2016-07-26	1	-4/+7
\| \| \| \|	llvm-svn: 276819
*	AMDGPU/R600: Remove dead custom inserters	Matt Arsenault	2016-07-26	1	-209/+1
\| \| \| \| \| \|	The intrinsics for these were removed, so this is dead. llvm-svn: 276805
*	AMDGPU: Minor AsmPrinter cleanups	Matt Arsenault	2016-07-26	1	-79/+84
\| \| \| \|	llvm-svn: 276804
*	AMDGPU: Make AMDGPUMachineFunction fields private	Matt Arsenault	2016-07-26	10	-56/+80
\| \| \| \| \| \| \| \| \|	ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766
*	AMDGPU: Add fp legacy instruction intrinsics	Matt Arsenault	2016-07-26	5	-2/+21
\| \| \| \| \| \| \|	This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
*	AMDGPU: Remove read_workdim intrinsic	Jan Vesely	2016-07-25	3	-14/+0
\| \| \| \| \| \|	Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682
*	AMDGPU: Make skip threshold an option	Matt Arsenault	2016-07-25	1	-3/+8
\| \| \| \|	llvm-svn: 276680
*	AMDGPU: Delete dead code	Matt Arsenault	2016-07-25	4	-41/+0
\| \| \| \|	llvm-svn: 276675
*	[MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC	Joel Jones	2016-07-25	2	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some targets, notably AArch64 for ILP32, have different relocation encodings based upon the ABI. This is an enabling change, so a future patch can use the ABIName from MCTargetOptions to choose which relocations to use. Tested using check-llvm. The corresponding change to clang is in: http://reviews.llvm.org/D16538 Patch by: Joel Jones Differential Revision: https://reviews.llvm.org/D16213 llvm-svn: 276654
*	AMDGPU: Delete dead code	Matt Arsenault	2016-07-23	2	-97/+0
\| \| \| \| \| \|	This has been dead since r269479 llvm-svn: 276518
*	Revert "[AMDGPU] Emit read-only data to .rodata for hsa"	Tom Stellard	2016-07-22	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r276298. Data stored in .rodata can have a negative offset from .text, but we don't support negative values in relocations yet. This caused a regression in one of the amp conformance tests: 5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01 llvm-svn: 276498
*	GlobalISel: implement legalization pass, with just one transformation.	Tim Northover	2016-07-22	1	-0/+5
\| \| \| \| \| \| \| \| \|	This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors. llvm-svn: 276461
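The single transformation described above, splitting a wide vector add into adds on smaller vectors, can be illustrated on plain Python lists. This is only a sketch of the idea and not the MachineLegalizeHelper API. The function name `split_vector_add` and the `narrow_width` parameter are hypothetical.

```python
def split_vector_add(lhs, rhs, narrow_width):
    """Add two equal-length vectors by operating on narrow_width-sized pieces."""
    if len(lhs) != len(rhs) or len(lhs) % narrow_width != 0:
        raise ValueError("vectors must match and divide evenly into pieces")
    result = []
    for start in range(0, len(lhs), narrow_width):
        # Extract one narrow piece from each operand.
        a = lhs[start:start + narrow_width]
        b = rhs[start:start + narrow_width]
        # Perform the add at the narrower width, then append to the wide result.
        result.extend(x + y for x, y in zip(a, b))
    return result


wide_a = [1, 2, 3, 4, 5, 6, 7, 8]
wide_b = [10, 20, 30, 40, 50, 60, 70, 80]
print(split_vector_add(wide_a, wide_b, narrow_width=4))
```

The result is the same as a single wide add. Only the width of each individual operation changes, which is the property a legalizer relies on when a target lacks the wide instruction.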
*	AMDGPU: Fix groupstaticsize for large LDS	Matt Arsenault	2016-07-22	1	-3/+3
\| \| \| \| \| \| \| \| \|	The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438
*	AMDGPU: Add HSA dispatch id intrinsic	Matt Arsenault	2016-07-22	5	-8/+31
\| \| \| \|	llvm-svn: 276437
*	AMDGPU: Delete more dead code	Matt Arsenault	2016-07-22	10	-182/+15
\| \| \| \| \| \| \|	Remove dead code from r600 intrinsic removal. Remove unset members, rename StackSize to be less ambiguous. llvm-svn: 276436
*	AMDGPU: Fix i1 fp_to_int	Matt Arsenault	2016-07-22	4	-7/+34
\| \| \| \| \| \| \|	R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to. llvm-svn: 276435
*	AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs	Matt Arsenault	2016-07-22	1	-26/+2
\| \| \| \|	llvm-svn: 276434
*	[AMDGPU] Emit read-only data to .rodata for hsa	Konstantin Zhuravlyov	2016-07-21	1	-1/+2
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D22538 llvm-svn: 276298
*	AMDGPU/SI: Add support for R_AMDGPU_ABS32	Konstantin Zhuravlyov	2016-07-21	1	-0/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D21646 llvm-svn: 276294
*	[AMDGPU] Some code cleaning in SIRegisterInfo.td	Sam Kolton	2016-07-21	1	-33/+23
\| \| \| \| \| \| \| \| \| \|	Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: https://reviews.llvm.org/D22620 llvm-svn: 276274
*	AMDGPU: Fix phis from blocks split due to register indexing	Matt Arsenault	2016-07-21	1	-15/+22
\| \| \| \|	llvm-svn: 276257
*	AMDGPU: Fix bug causing crash due to invalid opencl version metadata.	Yaxun Liu	2016-07-20	1	-9/+13
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D22526 llvm-svn: 276119
*	AMDGPU: Change fdiv lowering based on !fpmath metadata	Matt Arsenault	2016-07-19	8	-49/+227
\| \| \| \| \| \| \| \| \| \| \|	If 2.5 ulp is acceptable, denormals are not required, and the operation isn't a reciprocal (which will already be handled), replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. llvm-svn: 276051
*	[AMDGPU] Remove spurious line (should've been removed in r276029).	Davide Italiano	2016-07-19	1	-3/+0
\| \| \| \|	llvm-svn: 276030
*	[AMDGPU] Remove dead code.	Davide Italiano	2016-07-19	1	-25/+0
\| \| \| \| \| \|	LGTM'd by Matt Arsenault. llvm-svn: 276029
*	AMDGPU: Only use legal inline immediates with kill pseudo	Matt Arsenault	2016-07-19	5	-3/+15
\| \| \| \| \| \| \| \| \| \| \|	All that matters is whether the value is negative or positive, so use a constant that doesn't require an instruction to materialize. These should really just emit the write to exec directly, but for now stick with the kill pseudo-terminator. llvm-svn: 275988
*	AMDGPU/SI: Fix SI scheduler refcount issue	Matt Arsenault	2016-07-19	1	-0/+3
\| \| \| \| \| \| \| \| \|	Without this fix, releaseSuccessors when InOrOutBlock is false could release SUs outside the schedule BasicBlock. Patch by Axel Davy llvm-svn: 275935