summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "AMDGPU: Add VI i16 support"Tom Stellard2016-11-041-72/+4
| | | | | | This reverts commit r285939 and r285948. These broke some conformance tests. llvm-svn: 285995
* AMDGPU: Add VI i16 supportTom Stellard2016-11-031-4/+72
| | | | | | | | Patch By: Wei Ding Differential Revision: https://reviews.llvm.org/D18049 llvm-svn: 285939
* AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratchTom Stellard2016-10-261-0/+16
| | | | | | | | | | | | | | Summary: A single flat memory operations that might access the scratch buffer can only access MaxPrivateElementSize bytes. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25788 llvm-svn: 285198
* AMDGPU: Fix Two Address problems with v_movreldNicolai Haehnle2016-10-241-27/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The v_movreld machine instruction is used with three operands that are in a sense tied to each other (the explicit VGPR_32 def and the implicit VGPR_NN def and use). There is no way to express that using the currently available operand bits, and indeed there are cases where the Two Address instructions pass does the wrong thing. This patch introduces a new set of pseudo instructions that are identical in intended semantics as v_movreld, but they only have two tied operands. Having to add a new set of pseudo instructions is admittedly annoying, but it's a fairly straightforward and solid approach. The only alternative I see is to try to teach the Two Address instructions pass about Three Address instructions, and I'm afraid that's trickier and is going to end up more fragile. Note that v_movrels does not suffer from this problem, and so this patch does not touch it. This fixes several GL45-CTS.shaders.indexing.* tests. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25633 llvm-svn: 284980
* [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FPKonstantin Zhuravlyov2016-10-211-4/+13
| | | | | | | | | This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/cvt_f32_ubyte.ll Differential Revision: https://reviews.llvm.org/D25805 llvm-svn: 284891
* [AMDGPU] Emit constant address space data in .rodata section and use ↵Konstantin Zhuravlyov2016-10-201-22/+22
| | | | | | | | relocations instead of fixups (amdhsa only) Differential Revision: https://reviews.llvm.org/D25693 llvm-svn: 284759
* AMDGPU/SI: LowerParameter() should be computing align based on memory typeTom Stellard2016-10-171-1/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25203 llvm-svn: 284398
* AMDGPU/SI: Fix LowerParameter() for i16 argumentsTom Stellard2016-10-171-10/+18
| | | | | | | | | | | | | | Summary: If we are loading an i16 value from a 32-bit memory location, then we need to be able to truncate the loaded value to i16. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25198 llvm-svn: 284397
* AMDGPU/SI: Don't allow unaligned scratch accessTom Stellard2016-10-141-0/+9
| | | | | | | | | | | | Summary: The hardware doesn't support this. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25523 llvm-svn: 284257
* AMDGPU: Fix use-after-freesNicolai Haehnle2016-10-141-14/+15
| | | | | | | | | | Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25312 llvm-svn: 284215
* [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external ↵Konstantin Zhuravlyov2016-10-141-12/+43
| | | | | | | | and global address space variables Differential Revision: https://reviews.llvm.org/D25562 llvm-svn: 284196
* AMDGPU: Assume spilling will occur at -O0Matt Arsenault2016-10-131-1/+5
| | | | | | | | Because everything live is spilled at the end of a block by fast regalloc, assume this will happen and avoid the copies of the resource descriptor. llvm-svn: 284119
* AMDGPU: Fix truncate to bool warningsMatt Arsenault2016-10-131-5/+5
| | | | llvm-svn: 284116
* AMDGPU: Initial implementation of VGPR indexing modeMatt Arsenault2016-10-121-42/+167
| | | | | | | | | | | This is the most basic handling of the indirect access pseudos using GPR indexing mode. This currently only enables the mode for a single v_mov_b32 and then disables it. This is much more complicated to use than the movrel instructions, so a new optimization pass is probably needed to fold the access into the uses and keep the mode enabled for them. llvm-svn: 284031
* AMDGPU: Refactor indirect vector loweringMatt Arsenault2016-10-041-36/+42
| | | | | | | Allow inserting multiple instructions in the expanded loop. llvm-svn: 283177
* [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit ↵Konstantin Zhuravlyov2016-09-281-0/+4
| | | | | | | | instructions Differential Revision: https://reviews.llvm.org/D24125 llvm-svn: 282624
* AMDGPU: Fix broken FrameIndex handlingMatt Arsenault2016-09-171-39/+0
| | | | | | | | | | | | | | | | | We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory. llvm-svn: 281824
* AMDGPU: Allow some control flow intrinsics to be CSEdMatt Arsenault2016-09-161-17/+47
| | | | | | | | | | | These clean up some unnecessary or instructions in cases with complex loops. In the original testcase I noticed this, the same or with exec was repeated 5 or 6 times in a row. With this only one is emitted or sometimes a copy. llvm-svn: 281786
* AMDGPU: Refactor kernel argument loweringTom Stellard2016-09-161-5/+5
| | | | | | | | | | | | | | | | | | | Summary: The main challenge in lowering kernel arguments for AMDGPU is determing the memory type of the argument. The generic calling convention code assumes that only legal register types can be stored in memory, but this is not the case for AMDGPU. This consolidates all the logic AMDGPU uses for deducing memory types into a single function. This will make it much easier to support different ABIs in the future. Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24614 llvm-svn: 281781
* AMDGPU/SI: Add support for triples with the mesa3d operating systemTom Stellard2016-09-161-4/+4
| | | | | | | | | | | | | | Summary: mesa3d will use the same kernel calling convention as amdhsa, but it will handle everything else like the default 'unknown' OS type. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22783 llvm-svn: 281779
* AMDGPU: Improve splitting 64-bit bit ops by constantsMatt Arsenault2016-09-141-50/+123
| | | | | | | | This addresses a TODO to handle operations besides and. This also starts eliminating no-op operations with a constant that can emerge later. llvm-svn: 281488
* [CodeGen] Split out the notions of MI invariance and MI dereferenceability.Justin Lebar2016-09-111-7/+12
| | | | | | | | | | | | | | | | | | | Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D23371 llvm-svn: 281151
* AMDGPU: Relax SGPR asm constraint register classMatt Arsenault2016-08-301-1/+1
| | | | | | | s should be SReg_32 to be as general as possible. This can avoid a copy from m0. llvm-svn: 280154
* AMDGPU: Move cndmask pseudo to be isel pseudoMatt Arsenault2016-08-271-0/+30
| | | | | | | | There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901
* Untabify.NAKAMURA Takumi2016-08-221-6/+4
| | | | llvm-svn: 279408
* AMDGPU: Remove dead optionMatt Arsenault2016-08-171-6/+0
| | | | llvm-svn: 278965
* Replace "fallthrough" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | | This is a mechanical change of comments in switches like fallthrough, fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead. llvm-svn: 278902
* AMDGPU: Fix missing test for addressing mode with odd offsetsMatt Arsenault2016-08-131-0/+1
| | | | | | Add test if the constant offset looks unaligned. llvm-svn: 278589
* LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack ↵Alina Sbirlea2016-08-041-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | adjustments. Summary: TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses. A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted. Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures: - x86 failing tests either did not have the right target, or the right alignment. - NVPTX failing tests did not have the right alignment. - AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks for a maximum size allowed and relies on the next condition checking for %4 for correctness. This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list). Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure), so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context. Reviewers: arsenm, jlebar, tstellarAMD Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23068 llvm-svn: 277735
* AMDGPU: fdiv -1, x -> rcp -xMatt Arsenault2016-08-021-16/+25
| | | | llvm-svn: 277535
* AMDGPU: Fix shouldConvertConstantLoadToIntImm behaviorMatt Arsenault2016-07-301-2/+2
| | | | | | | This should really be true for any immediate, not just inline ones. llvm-svn: 277260
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-3/+3
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* AMDGPU : Add intrinsics for compare with the full wavefront resultWei Ding2016-07-281-0/+29
| | | | | | Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
* AMDGPU: Remove analyzeImmediateMatt Arsenault2016-07-281-28/+0
| | | | | | | This no longer uses the more complicated classification of constants. llvm-svn: 276945
* AMDGPU: Make AMDGPUMachineFunction fields privateMatt Arsenault2016-07-261-2/+2
| | | | | | | | | ABIArgOffset is a problem because properly fsetting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766
* AMDGPU: Add fp legacy instruction intrinsicsMatt Arsenault2016-07-261-0/+9
| | | | | | | This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
* AMDGPU: Remove read_workdim intrinsicJan Vesely2016-07-251-5/+0
| | | | | | Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682
* AMDGPU: Fix groupstaticsize for large LDSMatt Arsenault2016-07-221-3/+3
| | | | | | | | | The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438
* AMDGPU: Add HSA dispatch id intrinsicMatt Arsenault2016-07-221-0/+10
| | | | llvm-svn: 276437
* AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIsMatt Arsenault2016-07-221-26/+2
| | | | llvm-svn: 276434
* AMDGPU: Fix phis from blocks split due to register indexingMatt Arsenault2016-07-211-15/+22
| | | | llvm-svn: 276257
* AMDGPU: Change fdiv lowering based on !fpmath metadataMatt Arsenault2016-07-191-29/+34
| | | | | | | | | | | If 2.5 ulp is acceptable, denormals are not required, and isn't a reciprocal which will already be handled, replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. llvm-svn: 276051
* AMDGPU: Only use legal inline immediates with kill pseudoMatt Arsenault2016-07-191-2/+7
| | | | | | | | | | | Only if the value is negative or positive is what matters, so use a constant that doesn't require an instruction to materialize. These should really just emit the write exec directly, but for stick with the kill pseudo-terminator. llvm-svn: 275988
* AMDGPU: Expand register indexing pseudos in custom inserterMatt Arsenault2016-07-191-5/+326
| | | | | | | | | | | | | | | | | | | | | | | This is to help moveSILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32. The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion. v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register. llvm-svn: 275934
* AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32Matt Arsenault2016-07-181-0/+3
| | | | llvm-svn: 275871
* AMDGPU: Remove legacy rsq.clamped intrinsicMatt Arsenault2016-07-151-2/+1
| | | | | | | | Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining. Also fix mismatch with non-IEEE rsq selecting to IEEE rsq. llvm-svn: 275617
* [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, ↵Justin Lebar2016-07-151-12/+8
| | | | | | | | | | | | | | | | | | | | | | | getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592
* AMDGPU: Fix splitting kill blocks with defs before killMatt Arsenault2016-07-151-13/+3
| | | | llvm-svn: 275508
* AMDGPU: Remove unused intrinsicsMatt Arsenault2016-07-141-8/+0
| | | | llvm-svn: 275371
* AMDGPU/SI: Add support for R_AMDGPU_GOTPCRELTom Stellard2016-07-131-23/+50
| | | | | | | | | | Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21484 llvm-svn: 275268
OpenPOWER on IntegriCloud