bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	LoadStoreVectorizer: Split even sized illegal chains properly	Matt Arsenault	2017-02-23	1	-0/+25
	Implement isLegalToVectorizeLoadChain for AMDGPU to avoid producing private address space accesses that will need to be split up later. This was doing the wrong thing in the case where the queried chain was an even number of elements. A possible <4 x i32> store was being split into store <2 x i32>, store i32, store i32 rather than store <2 x i32>, store <2 x i32> when legal. llvm-svn: 295933
*	AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics	Matt Arsenault	2017-02-16	1	-20/+3
	Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269
*	[AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000	Stanislav Mekhanoshin	2017-02-03	1	-1/+1
	This has a quite positive performance impact according to measurements. Before the previous fixes that limit the optimization, this value was too high and blew up compile time and scratch usage; that problem is now gone, so we can bump the threshold. Differential Revision: https://reviews.llvm.org/D29505 llvm-svn: 294032
*	AMDGPU: Don't unroll for private with dynamic allocas	Matt Arsenault	2017-02-03	1	-1/+1
	This won't be eliminated, so this will just bloat code if/when these are ever used/supported. llvm-svn: 294030
*	[AMDGPU] Unroll preferences improvements	Stanislav Mekhanoshin	2017-02-03	1	-1/+28
	Exit loop analysis early if a suitable private access is found. Do not account for GEPs which are invariant to the loop induction variable. Do not account for Allocas which are too big to fit into the register file anyway. Add option for tuning: -amdgpu-unroll-threshold-private. Differential Revision: https://reviews.llvm.org/D29473 llvm-svn: 293991
*	AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent	Matt Arsenault	2017-01-30	1	-0/+3
	llvm-svn: 293504
*	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch.	Mohammed Agabaria	2017-01-11	1	-1/+1
	Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. Adds a special optimization case that replaces pmulld with a pmullw\pmulhw\pshuf sequence when the real operand bit width is <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657
*	AMDGPU: llvm.amdgcn.interp.mov is a source of divergence	Nicolai Haehnle	2016-12-12	1	-0/+1
	Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447
*	Add new target hooks for LoadStoreVectorizer	Volkan Keles	2016-10-03	1	-1/+1
	Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D24727 llvm-svn: 283099
*	AMDGPU: Implement getLoadStoreVecRegBitWidth	Matt Arsenault	2016-07-01	1	-0/+22
	llvm-svn: 274312
*	AMDGPU: Remove llvm.SI.tid intrinsic	Matt Arsenault	2016-06-17	1	-1/+0
	Mesa doesn't emit this for llvm >= 3.8 anymore. llvm-svn: 273050
*	AMDGPU: llvm.SI.fs.constant is a source of divergence	Nicolai Haehnle	2016-05-02	1	-0/+1
	Summary: This intrinsic is used to get flat-shaded fragment shader inputs. Those are uniform across a primitive, but a fragment shader wave may process pixels from multiple primitives (as indicated by the prim_mask), and so that's where divergence can arise. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19747 llvm-svn: 268259
*	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic	Nicolai Haehnle	2016-04-22	1	-0/+1
	Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102
*	AMDGPU: Add a shader calling convention	Nicolai Haehnle	2016-04-06	1	-2/+1
	This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589
*	AMDGPU: Cost model for basic integer operations	Matt Arsenault	2016-03-25	1	-0/+31
	This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376
*	AMDGPU: Partially implement getArithmeticInstrCost for FP ops	Matt Arsenault	2016-03-25	1	-0/+64
	llvm-svn: 264374
*	AMDGPU: TTI: Make insertelement free.	Matt Arsenault	2016-03-25	1	-0/+5
	We don't want to have a cost to scalarizing operations. llvm-svn: 264364
*	AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics	Nicolai Haehnle	2016-03-18	1	-0/+11
	Summary: These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used by Mesa to implement atomics with buffer semantics. The intrinsic interface matches that of buffer.load.format and buffer.store.format, except that the GLC bit is not exposed (it is automatically deduced based on whether the return value is used). The change of hasSideEffects is required for TableGen to accept the pattern that matches the intrinsic. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, rivanvx, llvm-commits Differential Revision: http://reviews.llvm.org/D18151 llvm-svn: 263791
*	AMDGPU: mark atomic instructions as sources of divergence	Nicolai Haehnle	2016-03-17	1	-0/+7
	Summary: As explained by the comment, threads will typically see different values returned by atomic instructions even if the arguments are equal. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18156 llvm-svn: 263719
*	AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence	Nicolai Haehnle	2016-03-14	1	-0/+13
	Summary: When multiple threads perform an atomic op with the same arguments, they will usually see different return values. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18101 llvm-svn: 263440
*	AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions	Tom Stellard	2016-02-12	1	-0/+1
	Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
*	AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis	Matt Arsenault	2016-02-11	1	-0/+3
	llvm-svn: 260491
*	AMDGPU: Fix getRegisterBitWidth for vectors	Matt Arsenault	2015-12-24	1	-1/+3
	llvm-svn: 256362
*	AMDGPU/SI: Fix implementation of isSourceOfDivergence() for graphics shaders	Tom Stellard	2015-12-19	1	-6/+5
	Summary: The analysis of shader inputs was completely wrong. We were passing the wrong index to AttributeSet::hasAttribute() and the logic for which inputs were in SGPRs was wrong too. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15608 llvm-svn: 256082
*	AMDGPU: Override getCFInstrCost	Matt Arsenault	2015-12-16	1	-0/+11
	The default cost was 0 with the assumption that it is predictable. llvm-svn: 255796
*	AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence()	Tom Stellard	2015-12-15	1	-0/+77
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15476 llvm-svn: 255661
*	AMDGPU: Report extractelement as free in cost model	Matt Arsenault	2015-12-01	1	-0/+11
	The cost for scalarized operations is computed as N * (scalar operation cost + 1 extractelement + 1 insertelement). This partially fixes inflating the cost of scalarized operations since every operation is scalarized and free. I don't think we want any cost associated with scalarization, but for now insertelement is still counted. I'm not sure if we should pretend that insertelement is also free, or add a way to compute a custom scalarization cost. llvm-svn: 254438
*	R600 -> AMDGPU rename	Tom Stellard	2015-06-13	1	-0/+82
	llvm-svn: 239657
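
Note on the scalarization cost formula quoted in the 2015-12-01 entry ("Report extractelement as free in cost model"): the sketch below is a toy arithmetic model of N * (scalar operation cost + extractelement + insertelement), not LLVM code. The function name and the unit costs (N = 4, each cost 0 or 1) are hypothetical, chosen only to show how the total shrinks once extractelement, and later insertelement (the 2016-03-25 entry), are reported as free.

```python
def scalarized_cost(num_elements, scalar_op_cost, extract_cost, insert_cost):
    """Toy model of the formula quoted in the log:
    N * (scalar operation cost + extractelement + insertelement)."""
    return num_elements * (scalar_op_cost + extract_cost + insert_cost)


# Hypothetical unit costs for a 4-element operation (assumed, not measured).
N, OP = 4, 1

# Both extractelement and insertelement charged one unit each.
default_cost = scalarized_cost(N, OP, extract_cost=1, insert_cost=1)

# extractelement reported as free; insertelement still counted.
extract_free = scalarized_cost(N, OP, extract_cost=0, insert_cost=1)

# insertelement reported as free as well.
both_free = scalarized_cost(N, OP, extract_cost=0, insert_cost=0)

print(default_cost, extract_free, both_free)
```

With these assumed unit costs the total drops from 12 to 8 to 4, which is the "inflated cost" effect the commit message describes.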