summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Custom lower v2i32 loads and storesMatt Arsenault2016-05-021-1/+98
| | | | | | | This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296
* AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratchTom Stellard2016-05-021-1/+1
| | | | | | | | | We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
* AMDGPU: Make i64 loads/stores promote to v2i32Matt Arsenault2016-05-024-18/+14
| | | | | | | | | | | | Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components can be folded with the base pointer arithmetic. llvm-svn: 268293
* AMDGPU/SI: Use the hazard recognizer to break SMEM soft clausesTom Stellard2016-05-023-39/+37
| | | | | | | | | | | | | | | Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260
* AMDGPU/SI: Use hazard recognizer to detect DPP hazardsTom Stellard2016-05-021-2/+6
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247
* AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaitsTom Stellard2016-04-302-2/+4
| | | | | | This was supposed to be part of r268143. llvm-svn: 268154
* AMDGPU/SI: Enable the post-ra schedulerTom Stellard2016-04-3026-99/+102
| | | | | | | | | | | | | | Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
* AMDGPU: Fix crash with unreachable terminators.Matt Arsenault2016-04-291-0/+56
| | | | | | | | | | If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119
* AMDGPU: Add kernarg.segment.ptr intrinsicMatt Arsenault2016-04-291-0/+21
| | | | llvm-svn: 268105
* DAGCombiner: Reduce truncated shl widthMatt Arsenault2016-04-291-0/+123
| | | | llvm-svn: 268094
* AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructionsTom Stellard2016-04-292-1/+21
| | | | | | | | | | | | | | Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043
* AMDGPU/SI: Assembler: Unify parsing/printing of operands.Nikolay Haustov2016-04-2927-210/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015
* RegisterPressure: Fix default lanemask for missing regunit intervalsMatthias Braun2016-04-292-4/+4
| | | | | | | | | | | | | | In case of missing live intervals for a physical registers getLanesWithProperty() would report 0 which was not a safe default in all situations. Add a parameter to pass in a safe default. No testcase because in-tree targets do not skip computing register unit live intervals. Also cleanup the getXXX() functions to not perform the RequireLiveIntervals checks anymore so we do not even need to return safe defaults. llvm-svn: 267977
* AMDGPU: Emit error if too much LDS is usedMatt Arsenault2016-04-282-3/+17
| | | | llvm-svn: 267922
* AMDGPU: Fix mishandling array allocations when promoting allocaMatt Arsenault2016-04-283-16/+66
| | | | | | | | The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916
* CodeGen: Add DetectDeadLanes pass.Matthias Braun2016-04-281-0/+408
| | | | | | | | | | | | | | | | | | | | The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies. It detects dead definitions and uses reading undefined values which are obscured by COPY and subregister usage. These dead definitions cause trouble in the register coalescer which cannot deal with definitions suddenly becoming dead after coalescing COPY instructions. For now the pass only adds dead and undef flags to machine operands. It should be possible to extend it in the future to remove the dead instructions and redo the analysis for the affected virtual registers. Differential Revision: http://reviews.llvm.org/D18427 llvm-svn: 267851
* AMDGPU: Account for globals in AMDGPUPromoteAlloca passMatt Arsenault2016-04-271-0/+32
| | | | | | Patch by Bas Nieuwenhuizen llvm-svn: 267791
* AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsicNicolai Haehnle2016-04-271-0/+38
| | | | | | | | | | | | | | | | | Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729
* [AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware ↵Artem Tamazov2016-04-271-1/+1
| | | | | | | | | | | | registers. Possibility to specify code of hardware register kept. Disassemble to symbolic name, if name is known. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19335 llvm-svn: 267724
* [AMDGPU] Reserve VGPRs for trap handler usage if instructedKonstantin Zhuravlyov2016-04-261-0/+37
| | | | | | Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563
* AMDGPU: Implement addrspacecastMatt Arsenault2016-04-253-22/+274
| | | | llvm-svn: 267452
* AMDGPU: Add queue ptr intrinsicMatt Arsenault2016-04-252-0/+30
| | | | llvm-svn: 267451
* [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.Artem Tamazov2016-04-251-1/+1
| | | | | | | | | | | | | Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. Possibility to specify 16-bit immediate kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410
* AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.SizeMatt Arsenault2016-04-221-21/+123
| | | | llvm-svn: 267244
* AMDGPU: Re-visit nodes in performAndCombineMatt Arsenault2016-04-222-9/+12
| | | | | | This fixes test regressions when i64 loads/stores are made promote. llvm-svn: 267240
* DAGCombiner: Relax alignment restriction when changing store typeMatt Arsenault2016-04-221-0/+53
| | | | | | If the target allows the alignment, this should be OK. llvm-svn: 267217
* DAGCombiner: Relax alignment restriction when changing load typeMatt Arsenault2016-04-221-0/+38
| | | | | | If the target allows the alignment, this should still be OK. llvm-svn: 267209
* [AMDGPU] Insert nop pass: take care of outstanding feedbackKonstantin Zhuravlyov2016-04-221-4/+8
| | | | | | | | | | | - Switch few loops to range-based for loops - Fix nop insertion at the end of BB - Fix formatting - Check for endpgm Differential Revision: http://reviews.llvm.org/D19380 llvm-svn: 267167
* AMDGPU/SI: add llvm.amdgcn.ps.live intrinsicNicolai Haehnle2016-04-221-0/+59
| | | | | | | | | | | | | | | | | | | | | | | Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102
* DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit componentMatt Arsenault2016-04-213-4/+512
| | | | | | | If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024
* [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.Mandeep Singh Grang2016-04-192-3/+3
| | | | | | | | | | | | Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834
* Add IntrWrite[Arg]Mem intrinsic propertyNicolai Haehnle2016-04-192-8/+8
| | | | | | | | | | | | | | | | | | | | | | Summary: This property is used to mark an intrinsic that only writes to memory, but neither reads from memory nor has other side effects. An example where this is useful is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store instruction that goes through a special buffer descriptor rather than through a plain pointer. With this property, the intrinsic should still be handled as having side effects at the LLVM IR level, but machine scheduling can make smarter decisions. Reviewers: tstellarAMD, arsenm, joker.eph, reames Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18291 llvm-svn: 266826
* AMDGPU: Guard VOPC instructions against incorrect commuteNicolai Haehnle2016-04-191-0/+49
| | | | | | | | | | | | | | Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825
* [AMDGPU] Add insert nops pass based on subtarget features instead of cl::optKonstantin Zhuravlyov2016-04-181-0/+75
| | | | | | | | | | | Also, - Skip pass if machine module does not have debug info - Minor comment changes - Added test Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626
* AMDGPU: Enable LocalStackSlotAllocation passMatt Arsenault2016-04-162-20/+72
| | | | | | | | | | | This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508
* AMDGPU: Use s_addk_i32 / s_mulk_i32Matt Arsenault2016-04-165-6/+140
| | | | llvm-svn: 266506
* [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.Adrian Prantl2016-04-151-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446
* AMDGPU/SI: Fix regression with no-return atomicsNicolai Haehnle2016-04-151-0/+9
| | | | | | | | | | | | | | | Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433
* AMDGPU: Include LDS size in printed commentMatt Arsenault2016-04-141-4/+10
| | | | llvm-svn: 266382
* AMDGPU: Run SIFoldOperands after PeepholeOptimizerMatt Arsenault2016-04-1415-45/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378
* AMDGPU: Fold bitcasts of scalar constants to vectorsMatt Arsenault2016-04-144-50/+49
| | | | | | | This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376
* AMDGPU: Add skeleton GlobalIsel implementationTom Stellard2016-04-141-0/+12
| | | | | | | | | | | | | | | Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356
* [DivergenceAnalysis] Treat PHI with incoming undef as constantNicolai Haehnle2016-04-141-0/+41
| | | | | | | | | | | | | | | | | | | | | | | Summary: If a PHI has an incoming undef, we can pretend that it is equal to one non-undef, non-self incoming value. This is particularly relevant in combination with the StructurizeCFG pass, which introduces PHI nodes with undefs. Previously, this lead to branch conditions that were uniform before StructurizeCFG to become non-uniform afterwards, which confused the SIAnnotateControlFlow pass. This fixes a crash when Mesa radeonsi compiles a shader from dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex Reviewers: arsenm, tstellarAMD, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19013 llvm-svn: 266347
* AMDGPU: Remove SIFixSGPRLiveRanges passNicolai Haehnle2016-04-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345
* AMDGPU: allow specifying a workgroup size that needs to fit in a compute unitTom Stellard2016-04-142-0/+113
| | | | | | | | | | | | | | | | | | | Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337
* AMDGPU/SI: Use the correct scratch wave offset register for shaders.Tom Stellard2016-04-142-4/+5
| | | | | | | | | | | | | | | | | | | | | | Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders though, The register should be the next SGPR after all user and system SGPR's. We use that Mesa adds arguments for all input and system SGPR's and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336
* AMDGPU: Implement canonicalizeMatt Arsenault2016-04-141-0/+320
| | | | | | Also add generic DAG node for it. llvm-svn: 266272
* [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and ↵Artem Tamazov2016-04-131-16/+8
| | | | | | | | | | | | | | | TBA/TMA)git status Tests added along with implemented feature. Note that there is a small leftover of unecessary MI sheduling issue (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205
* AMDGPU: Add test for m0 initialization in basic loopMatt Arsenault2016-04-131-0/+48
| | | | | | | | | | | | Initialization of m0 is emitted for each LDS operation, so every block with LDS usage ends up with one. MachineLICM used to fail to hoist this out of the loop, so every loop iteration with LDS usage in it would re-initialize it. This seems to be fixed now, so add a test to make sure that it stays this way. llvm-svn: 266156
* AMDGPU: add llvm.amdgcn.buffer.load/store intrinsicsNicolai Haehnle2016-04-128-22/+242
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126
OpenPOWER on IntegriCloud