bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	AMDGPU: Custom lower v2i32 loads and stores	Matt Arsenault	2016-05-02	1	-1/+98
	This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296
*	AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch	Tom Stellard	2016-05-02	1	-1/+1
	We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
*	AMDGPU: Make i64 loads/stores promote to v2i32	Matt Arsenault	2016-05-02	4	-18/+14
	Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components to be folded with the base pointer arithmetic. llvm-svn: 268293
*	AMDGPU/SI: Use the hazard recognizer to break SMEM soft clauses	Tom Stellard	2016-05-02	3	-39/+37
	Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260
*	AMDGPU/SI: Use hazard recognizer to detect DPP hazards	Tom Stellard	2016-05-02	1	-2/+6
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247
*	AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaits	Tom Stellard	2016-04-30	2	-2/+4
	This was supposed to be part of r268143. llvm-svn: 268154
*	AMDGPU/SI: Enable the post-ra scheduler	Tom Stellard	2016-04-30	26	-99/+102
	Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
*	AMDGPU: Fix crash with unreachable terminators.	Matt Arsenault	2016-04-29	1	-0/+56
	If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119
*	AMDGPU: Add kernarg.segment.ptr intrinsic	Matt Arsenault	2016-04-29	1	-0/+21
	llvm-svn: 268105
*	DAGCombiner: Reduce truncated shl width	Matt Arsenault	2016-04-29	1	-0/+123
	llvm-svn: 268094
*	AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions	Tom Stellard	2016-04-29	2	-1/+21
	Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043
*	AMDGPU/SI: Assembler: Unify parsing/printing of operands.	Nikolay Haustov	2016-04-29	27	-210/+210
	Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested having a class hierarchy for operand types instead of a table. This can be done in a separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce the number of definitions of AsmOperands and MatchClasses by using a common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015
*	RegisterPressure: Fix default lanemask for missing regunit intervals	Matthias Braun	2016-04-29	2	-4/+4
	In case of missing live intervals for a physical register, getLanesWithProperty() would report 0, which was not a safe default in all situations. Add a parameter to pass in a safe default. No testcase because in-tree targets do not skip computing register unit live intervals. Also clean up the getXXX() functions to not perform the RequireLiveIntervals checks anymore so we do not even need to return safe defaults. llvm-svn: 267977
*	AMDGPU: Emit error if too much LDS is used	Matt Arsenault	2016-04-28	2	-3/+17
	llvm-svn: 267922
*	AMDGPU: Fix mishandling array allocations when promoting alloca	Matt Arsenault	2016-04-28	3	-16/+66
	The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916
*	CodeGen: Add DetectDeadLanes pass.	Matthias Braun	2016-04-28	1	-0/+408
	The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies. It detects dead definitions and uses reading undefined values which are obscured by COPY and subregister usage. These dead definitions cause trouble in the register coalescer which cannot deal with definitions suddenly becoming dead after coalescing COPY instructions. For now the pass only adds dead and undef flags to machine operands. It should be possible to extend it in the future to remove the dead instructions and redo the analysis for the affected virtual registers. Differential Revision: http://reviews.llvm.org/D18427 llvm-svn: 267851
*	AMDGPU: Account for globals in AMDGPUPromoteAlloca pass	Matt Arsenault	2016-04-27	1	-0/+32
	Patch by Bas Nieuwenhuizen llvm-svn: 267791
*	AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic	Nicolai Haehnle	2016-04-27	1	-0/+38
	Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729
*	[AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers.	Artem Tamazov	2016-04-27	1	-1/+1
	The possibility to specify the code of a hardware register is kept. Disassemble to symbolic name, if name is known. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19335 llvm-svn: 267724
*	[AMDGPU] Reserve VGPRs for trap handler usage if instructed	Konstantin Zhuravlyov	2016-04-26	1	-0/+37
	Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563
*	AMDGPU: Implement addrspacecast	Matt Arsenault	2016-04-25	3	-22/+274
	llvm-svn: 267452
*	AMDGPU: Add queue ptr intrinsic	Matt Arsenault	2016-04-25	2	-0/+30
	llvm-svn: 267451
*	[AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.	Artem Tamazov	2016-04-25	1	-1/+1
	Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. The possibility to specify a 16-bit immediate is kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410
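As a rough illustration of the hwreg(reg[,offset,width]) syntax described in this entry, the sketch below packs the three fields into a 16-bit immediate. The defaults (offset 0, width 32) and the presence of out-of-range checks come from the commit message. The bit layout (6-bit id, 5-bit offset, 5-bit width minus one), the exact ranges, and the names encode_hwreg and decode_hwreg are assumptions made for illustration and are not taken from the patch.

```python
def encode_hwreg(reg_id, offset=0, width=32):
    """Pack hwreg(reg_id, offset, width) into a 16-bit immediate.

    Assumed layout (hypothetical, for illustration only):
      bits  5..0   register id
      bits 10..6   bit offset
      bits 15..11  width - 1
    """
    if not 0 <= reg_id < 64:
        raise ValueError("register id out of range")
    if not 0 <= offset < 32:
        raise ValueError("offset out of range")
    if not 1 <= width <= 32:
        raise ValueError("width out of range")
    return reg_id | (offset << 6) | ((width - 1) << 11)


def decode_hwreg(imm16):
    """Split a 16-bit immediate back into (reg_id, offset, width)."""
    reg_id = imm16 & 0x3F
    offset = (imm16 >> 6) & 0x1F
    width = ((imm16 >> 11) & 0x1F) + 1
    return reg_id, offset, width
```

With this layout, omitting the optional arguments is the same as spelling out offset 0 and width 32, and any 16-bit immediate round-trips through decode_hwreg and encode_hwreg.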
*	AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size	Matt Arsenault	2016-04-22	1	-21/+123
	llvm-svn: 267244
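The rewrite named in this entry's subject, sext_inreg (srl x, K), vt -> bfe x, K, vt.Size, can be checked on plain 32-bit integers. The sketch below is only a model of the arithmetic, assuming 32-bit values and K + width <= 32; the helper names are hypothetical and the bfe here is modelled as a shift-left followed by an arithmetic shift-right, which is not necessarily how the hardware instruction or the patch implements it.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def srl(x, k):
    """Logical shift right of a 32-bit value."""
    return (x & MASK32) >> k


def sext_inreg(x, width):
    """Sign-extend the low `width` bits of x to 32 bits."""
    low = x & ((1 << width) - 1)
    sign = 1 << (width - 1)
    return ((low ^ sign) - sign) & MASK32


def bfe_i32(x, offset, width):
    """Signed bitfield extract: move the field to the top, then shift back."""
    top = (x << (32 - offset - width)) & MASK32
    return (to_signed32(top) >> (32 - width)) & MASK32


def combine_holds(x, k, width):
    """True if both forms of the pattern produce the same 32-bit result."""
    return sext_inreg(srl(x, k), width) == bfe_i32(x, k, width)
```

For example, extracting the 8-bit field at bit 8 of 0x0000FF00 gives 0xFFFFFFFF in both forms, since the field's top bit is set.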
*	AMDGPU: Re-visit nodes in performAndCombine	Matt Arsenault	2016-04-22	2	-9/+12
	This fixes test regressions when i64 loads/stores are made to promote. llvm-svn: 267240
*	DAGCombiner: Relax alignment restriction when changing store type	Matt Arsenault	2016-04-22	1	-0/+53
	If the target allows the alignment, this should be OK. llvm-svn: 267217
*	DAGCombiner: Relax alignment restriction when changing load type	Matt Arsenault	2016-04-22	1	-0/+38
	If the target allows the alignment, this should still be OK. llvm-svn: 267209
*	[AMDGPU] Insert nop pass: take care of outstanding feedback	Konstantin Zhuravlyov	2016-04-22	1	-4/+8
	- Switch a few loops to range-based for loops - Fix nop insertion at the end of BB - Fix formatting - Check for endpgm Differential Revision: http://reviews.llvm.org/D19380 llvm-svn: 267167
*	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic	Nicolai Haehnle	2016-04-22	1	-0/+59
	Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102
*	DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit component	Matt Arsenault	2016-04-21	3	-4/+512
	If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024
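The condition in this entry (extracted bits restricted to the upper or lower half) can be illustrated with ordinary integers. The sketch below is a model of the arithmetic only, assuming an unsigned extract and width >= 1; the function names are hypothetical and nothing here reflects the actual DAGCombiner code.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def bfe_u64(x, offset, width):
    """Unsigned bitfield extract on a 64-bit value."""
    return ((x & MASK64) >> offset) & ((1 << width) - 1)


def bfe_u32(x, offset, width):
    """Unsigned bitfield extract on a 32-bit value."""
    return ((x & MASK32) >> offset) & ((1 << width) - 1)


def bfe_u64_via_half(x, offset, width):
    """Do the extract on one 32-bit half when the field fits in it.

    Returns None when the field straddles both halves, in which case
    this reduction does not apply.
    """
    lo = x & MASK32
    hi = (x >> 32) & MASK32
    if offset + width <= 32:
        return bfe_u32(lo, offset, width)
    if offset >= 32:
        return bfe_u32(hi, offset - 32, width)
    return None
```

A field at bits 36..43 only needs the upper 32-bit component, while a field at bits 28..35 touches both halves and is left alone by this sketch.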
*	[LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.	Mandeep Singh Grang	2016-04-19	2	-3/+3
	Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834
*	Add IntrWrite[Arg]Mem intrinsic property	Nicolai Haehnle	2016-04-19	2	-8/+8
	Summary: This property is used to mark an intrinsic that only writes to memory, but neither reads from memory nor has other side effects. An example where this is useful is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store instruction that goes through a special buffer descriptor rather than through a plain pointer. With this property, the intrinsic should still be handled as having side effects at the LLVM IR level, but machine scheduling can make smarter decisions. Reviewers: tstellarAMD, arsenm, joker.eph, reames Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18291 llvm-svn: 266826
*	AMDGPU: Guard VOPC instructions against incorrect commute	Nicolai Haehnle	2016-04-19	1	-0/+49
	Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825
*	[AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt	Konstantin Zhuravlyov	2016-04-18	1	-0/+75
	Also, - Skip pass if machine module does not have debug info - Minor comment changes - Added test Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626
*	AMDGPU: Enable LocalStackSlotAllocation pass	Matt Arsenault	2016-04-16	2	-20/+72
	This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508
*	AMDGPU: Use s_addk_i32 / s_mulk_i32	Matt Arsenault	2016-04-16	5	-6/+140
	llvm-svn: 266506
*	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram.	Adrian Prantl	2016-04-15	1	-3/+2
	Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation: Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference to the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
*	AMDGPU/SI: Fix regression with no-return atomics	Nicolai Haehnle	2016-04-15	1	-0/+9
	Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433
*	AMDGPU: Include LDS size in printed comment	Matt Arsenault	2016-04-14	1	-4/+10
	llvm-svn: 266382
*	AMDGPU: Run SIFoldOperands after PeepholeOptimizer	Matt Arsenault	2016-04-14	15	-45/+49
	PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective.
	shader-db stats:
	Totals:
		SGPRS: 34200 -> 34336 (0.40 %)
		VGPRS: 22118 -> 21655 (-2.09 %)
		Code Size: 632144 -> 633460 (0.21 %) bytes
		LDS: 11 -> 11 (0.00 %) blocks
		Scratch: 10240 -> 11264 (10.00 %) bytes per wave
		Max Waves: 8822 -> 8918 (1.09 %)
		Wait states: 0 -> 0 (0.00 %)
	Totals from affected shaders:
		SGPRS: 7704 -> 7840 (1.77 %)
		VGPRS: 5169 -> 4706 (-8.96 %)
		Code Size: 234444 -> 235760 (0.56 %) bytes
		LDS: 2 -> 2 (0.00 %) blocks
		Scratch: 0 -> 1024 (0.00 %) bytes per wave
		Max Waves: 1188 -> 1284 (8.08 %)
		Wait states: 0 -> 0 (0.00 %)
	Increases:
		SGPRS: 35 (0.01 %)
		VGPRS: 1 (0.00 %)
		Code Size: 59 (0.02 %)
		LDS: 0 (0.00 %)
		Scratch: 1 (0.00 %)
		Max Waves: 48 (0.02 %)
		Wait states: 0 (0.00 %)
	Decreases:
		SGPRS: 26 (0.01 %)
		VGPRS: 54 (0.02 %)
		Code Size: 68 (0.03 %)
		LDS: 0 (0.00 %)
		Scratch: 0 (0.00 %)
		Max Waves: 4 (0.00 %)
		Wait states: 0 (0.00 %)
	llvm-svn: 266378
*	AMDGPU: Fold bitcasts of scalar constants to vectors	Matt Arsenault	2016-04-14	4	-50/+49
	This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376
*	AMDGPU: Add skeleton GlobalIsel implementation	Tom Stellard	2016-04-14	1	-0/+12
	Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356
*	[DivergenceAnalysis] Treat PHI with incoming undef as constant	Nicolai Haehnle	2016-04-14	1	-0/+41
	Summary: If a PHI has an incoming undef, we can pretend that it is equal to one non-undef, non-self incoming value. This is particularly relevant in combination with the StructurizeCFG pass, which introduces PHI nodes with undefs. Previously, this led to branch conditions that were uniform before StructurizeCFG becoming non-uniform afterwards, which confused the SIAnnotateControlFlow pass. This fixes a crash when Mesa radeonsi compiles a shader from dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex Reviewers: arsenm, tstellarAMD, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19013 llvm-svn: 266347
*	AMDGPU: Remove SIFixSGPRLiveRanges pass	Nicolai Haehnle	2016-04-14	1	-1/+1
	Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: ... def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345
*	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit	Tom Stellard	2016-04-14	2	-0/+113
	Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337
*	AMDGPU/SI: Use the correct scratch wave offset register for shaders.	Tom Stellard	2016-04-14	2	-4/+5
	Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders, though. The register should be the next SGPR after all user and system SGPRs. We rely on the fact that Mesa adds arguments for all input and system SGPRs and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336
*	AMDGPU: Implement canonicalize	Matt Arsenault	2016-04-14	1	-0/+320
	Also add generic DAG node for it. llvm-svn: 266272
*	[AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)	Artem Tamazov	2016-04-13	1	-16/+8
	Tests added along with implemented feature. Note that there is a small leftover issue of unnecessary MI scheduling (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205
*	AMDGPU: Add test for m0 initialization in basic loop	Matt Arsenault	2016-04-13	1	-0/+48
	Initialization of m0 is emitted for each LDS operation, so every block with LDS usage ends up with one. MachineLICM used to fail to hoist this out of the loop, so every loop iteration with LDS usage in it would re-initialize it. This seems to be fixed now, so add a test to make sure that it stays this way. llvm-svn: 266156
*	AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics	Nicolai Haehnle	2016-04-12	8	-22/+242
	Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126