summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Custom lower v2i32 loads and storesMatt Arsenault2016-05-021-7/+39
| | | | | | | This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296
* AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratchTom Stellard2016-05-021-2/+1
| | | | | | | | | We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295
* AMDGPU: Make i64 loads/stores promote to v2i32Matt Arsenault2016-05-022-55/+12
| | | | | | | | | | | | Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components can be folded with the base pointer arithmetic. llvm-svn: 268293
* Fix instance of -Winconsistent-missing-override in AMDGPU codeReid Kleckner2016-05-021-1/+1
| | | | llvm-svn: 268289
* AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratchTom Stellard2016-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Summary: When we restore an SGPR value from scratch, we first load it into a temporary VGPR and then use v_readlane_b32 to copy the value from the VGPR back into an SGPR. We weren't setting the kill flag on the VGPR in the v_readlane_b32 instruction, so the register scavenger wasn't able to re-use this temp value later. I wasn't able to create a lit test for this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19744 llvm-svn: 268287
* AMDGPU: Move R600 specific code out of AMDGPUISelLowering.cppTom Stellard2016-05-023-39/+51
| | | | | | | | | | Reviewers: arsenm Subscribers: jvesely, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19736 llvm-svn: 268267
* AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260Tom Stellard2016-05-021-1/+2
| | | | | | | We can't use MI->getDebugLoc() when MI is an iterator that could be MBB.end(). llvm-svn: 268265
* AMDGPU/SI: Use the hazard recognizer to break SMEM soft clausesTom Stellard2016-05-023-4/+72
| | | | | | | | | | | | | | | Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260
* AMDGPU: llvm.SI.fs.constant is a source of divergenceNicolai Haehnle2016-05-021-0/+1
| | | | | | | | | | | | | | | | Summary: This intrinsic is used to get flat-shaded fragment shader inputs. Those are uniform across a primitive, but a fragment shader wave may process pixels from multiple primitives (as indicated by the prim_mask), and so that's where divergence can arise. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19747 llvm-svn: 268259
* AMDGPU/SI: Use hazard recognizer to detect DPP hazardsTom Stellard2016-05-023-55/+27
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247
* Silence unused variable warnings; NFC.Aaron Ballman2016-05-021-9/+4
| | | | llvm-svn: 268234
* Add missing override.Rafael Espindola2016-04-301-1/+2
| | | | llvm-svn: 268163
* AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaitsTom Stellard2016-04-301-6/+0
| | | | | | This was supposed to be part of r268143. llvm-svn: 268154
* AMDGPU/SI: Enable the post-ra schedulerTom Stellard2016-04-309-18/+324
| | | | | | | | | | | | | | Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143
* AMDGPU: Fix crash with unreachable terminators.Matt Arsenault2016-04-291-12/+27
| | | | | | | | | | If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119
* AMDGPU: Add kernarg.segment.ptr intrinsicMatt Arsenault2016-04-291-0/+5
| | | | llvm-svn: 268105
* AMDGPU/SI: Move post regalloc run of SIShrinkInstructionsMatt Arsenault2016-04-291-5/+1
| | | | | | | | Move to addPreEmitPass. This is so it runs after post-RA scheduling so we can merge s_nops emitted by the scheduler and hazard recognizer. llvm-svn: 268095
* Fixed/Recommitted r267733 "[AMDGPU][llvm-mc] Add support of TTMP quads. ↵Artem Tamazov2016-04-296-19/+39
| | | | | | | | | | | Rework M0 exclusion for SMRD." Previously reverted by r267752. r267733 review: Differential Revision: http://reviews.llvm.org/D19342 llvm-svn: 268066
* AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructionsTom Stellard2016-04-293-12/+8
| | | | | | | | | | | | | | Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043
* AMDGPU/SI: Assembler: Unify parsing/printing of operands.Nikolay Haustov2016-04-294-622/+365
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015
* AMDGPU: Stop reporting an addressing mode for unknown addrspaceMatt Arsenault2016-04-291-1/+6
| | | | | | | | | This was being treated the same as private, which has an immediate offset. For unknown, it probably means it's for a computation not actually being used for accessing memory, so it should not have a nontrivial addressing mode. llvm-svn: 268002
* AMDGPU: Emit error if too much LDS is usedMatt Arsenault2016-04-281-0/+5
| | | | llvm-svn: 267922
* AMDGPU: Fix mishandling array allocations when promoting allocaMatt Arsenault2016-04-281-1/+3
| | | | | | | | The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916
* [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in ↵Craig Topper2016-04-281-8/+2
| | | | | | TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior. llvm-svn: 267853
* AMDGPU: Account for globals in AMDGPUPromoteAlloca passMatt Arsenault2016-04-271-2/+4
| | | | | | Patch by Bas Nieuwenhuizen llvm-svn: 267791
* Revert "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for ↵Chad Rosier2016-04-276-39/+13
| | | | | | | | SMRD." This reverts commit r267733 due to a -Werror,-Wunused-function error. llvm-svn: 267752
* Silence a -Wdangling-elseReid Kleckner2016-04-271-1/+2
| | | | llvm-svn: 267737
* [AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD.Artem Tamazov2016-04-276-13/+39
| | | | | | | | | | | Added support of TTMP quads. Reworked M0 exclusion machinery for SMRD and similar instructions to enable usage of TTMP registers in those instructions as destinations. Tests added. Differential Revision: http://reviews.llvm.org/D19342 llvm-svn: 267733
* AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsicNicolai Haehnle2016-04-272-14/+78
| | | | | | | | | | | | | | | | | Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729
* [AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware ↵Artem Tamazov2016-04-272-13/+42
| | | | | | | | | | | | registers. Possibility to specify code of hardware register kept. Disassemble to symbolic name, if name is known. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19335 llvm-svn: 267724
* [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.Ahmed Bougacha2016-04-263-36/+28
| | | | | | Differential Revision: http://reviews.llvm.org/D17176 llvm-svn: 267606
* [AMDGPU] Move reserved vgpr count for trap handler usage to ↵Konstantin Zhuravlyov2016-04-266-9/+20
| | | | | | | | SIMachineFunctionInfo + minor commenting changes Differential Revision: http://reviews.llvm.org/D19537 llvm-svn: 267573
* [AMDGPU] Reserve VGPRs for trap handler usage if instructedKonstantin Zhuravlyov2016-04-266-1/+48
| | | | | | Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563
* [AMDGPU] Assembler: basic support for SDWA instructionsSam Kolton2016-04-268-58/+414
| | | | | | | | | | | | | | | Support for SDWA instructions for VOP1 and VOP2 encoding. Not done yet: - converters for support optional operands and modifiers - VOPC - sext() modifier - intrinsics - VOP2b (see vop_dpp.s) - V_MAC_F32 (see vop_dpp.s) Differential Revision: http://reviews.llvm.org/D19360 llvm-svn: 267553
* Add optimization bisect opt-in calls for AMDGPU passesAndrew Kaylor2016-04-257-1/+19
| | | | | | Differential Revision: http://reviews.llvm.org/D19450 llvm-svn: 267485
* AMDGPU/SI: Optimize adjacent s_nop instructionsMatt Arsenault2016-04-251-0/+27
| | | | | | | | | | | | Use the operand for how long to wait. This is somewhat distasteful, since it would be better to just emit s_nop with the right argument in the first place. This would require changing TII::insertNoop to emit N operands, which would be easy. Slightly more problematic is the post-RA scheduler and hazard recognizer represent nops as a single null node, and would require inventing another way of representing N nops. llvm-svn: 267456
* AMDGPU: Implement addrspacecastMatt Arsenault2016-04-255-71/+124
| | | | llvm-svn: 267452
* AMDGPU: Add queue ptr intrinsicMatt Arsenault2016-04-254-3/+18
| | | | llvm-svn: 267451
* AMDGPU: Add DAG to debug dumpMatt Arsenault2016-04-251-2/+2
| | | | | | Also reorder case to match enum order llvm-svn: 267449
* Fix incorrect redundant expression in target AMDGPU.Etienne Bergeron2016-04-251-1/+1
| | | | | | | | | | | | | | | | | | | Summary: The expression is detected as a redundant expression. Turn out, this is probably a bug. ``` /home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression] if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) { ``` Reviewers: rnk, tstellarAMD Subscribers: arsenm, cfe-commits Differential Revision: http://reviews.llvm.org/D19460 llvm-svn: 267415
* [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.Artem Tamazov2016-04-255-3/+127
| | | | | | | | | | | | | Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. Possibility to specify 16-bit immediate kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410
* Fix a couple assertions that can never fire because they just contained the ↵Craig Topper2016-04-241-1/+1
| | | | | | text string which always evaluates to true. Add a ! so they'll evaluate to false. llvm-svn: 267312
* AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.SizeMatt Arsenault2016-04-221-0/+16
| | | | llvm-svn: 267244
* AMDGPU: Re-visit nodes in performAndCombineMatt Arsenault2016-04-221-0/+5
| | | | | | This fixes test regressions when i64 loads/stores are made promote. llvm-svn: 267240
* [AMDGPU] Insert nop pass: take care of outstanding feedbackKonstantin Zhuravlyov2016-04-222-21/+18
| | | | | | | | | | | - Switch few loops to range-based for loops - Fix nop insertion at the end of BB - Fix formatting - Check for endpgm Differential Revision: http://reviews.llvm.org/D19380 llvm-svn: 267167
* AMDGPU/SI: add llvm.amdgcn.ps.live intrinsicNicolai Haehnle2016-04-224-16/+55
| | | | | | | | | | | | | | | | | | | | | | | Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102
* AMDGPU: Fix debug name of pass to better matchMatt Arsenault2016-04-211-1/+1
| | | | | | I get this wrong every time I try to debug this. llvm-svn: 267030
* Split IntrReadArgMem into IntrReadMem and IntrArgMemOnlyNicolai Haehnle2016-04-211-1/+1
| | | | | | | | | | | | | | | | | | Summary: IntrReadWriteArgMem simply becomes IntrArgMemOnly. So there are fewer intrinsic properties that express their orthogonality better, and correspond more closely to the corresponding IR attributes. Suggested by: Philip Reames Reviewers: joker.eph, reames, tstellarAMD Subscribers: jholewinski, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19291 llvm-svn: 267021
* [AMDGPU] Assembler: prevent parseDPPCtrlOps from eating invalid tokensSam Kolton2016-04-211-2/+14
| | | | | | | | | | Reviewers: nhaustov, tstellarAMD Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19317 llvm-svn: 266984
* AMDGPU/SI: Assembler: improvements to support trap handlers.Nikolay Haustov2016-04-202-70/+124
| | | | | | | | | | | | Add ParseAMDGPURegister which can be invoked recursively for parsing lists. Rename getRegForName to getSpecialRegForName. Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi]. Add 64-bit registers TBA, TMA where missing. Add some tests. Differential Revision: http://reviews.llvm.org/D19163 llvm-svn: 266865
OpenPOWER on IntegriCloud