path: root/llvm/lib/Target/AMDGPU/SIISelLowering.h
...
* AMDGPU: Add Assert[SZ]Ext during argument load creation (Matt Arsenault, 2017-01-09; 1 file, -1/+2)
  For i16 zeroext arguments when i16 was a legal type, the known bits information from the truncate was lost. Insert a zeroext so the known bits optimizations work with the 32-bit loads. Fixes code quality regressions vs. SI in min.ll test.
  llvm-svn: 291461
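  For illustration, a minimal C++ sketch of the technique this commit describes (the helper name and surrounding code are assumptions, not the verbatim patch):

      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      // Hypothetical helper: wrap the 32-bit load of an i16 zeroext argument
      // in an AssertZext so known-bits analysis still sees the upper 16 bits
      // as zero after the explicit truncate is folded away.
      static SDValue assertZExt16(SelectionDAG &DAG, const SDLoc &DL,
                                  SDValue Load32) {
        return DAG.getNode(ISD::AssertZext, DL, MVT::i32, Load32,
                           DAG.getValueType(MVT::i16));
      }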
* AMDGPU: Check fast math flags in fadd/fsub combines (Matt Arsenault, 2016-12-22; 1 file, -1/+2)
  llvm-svn: 290312
* AMDGPU: Form more FMAs if fusion is allowed (Matt Arsenault, 2016-12-22; 1 file, -0/+1)
  Extend the existing fadd/fsub->fmad combines to produce FMA if allowed.
  llvm-svn: 290311
* AMDGPU: Move combines into separate functions (Matt Arsenault, 2016-12-22; 1 file, -0/+5)
  llvm-svn: 290309
* AMDGPU: Custom lower f16 fdiv (Matt Arsenault, 2016-12-22; 1 file, -0/+1)
  llvm-svn: 290301
* AMDGPU: Make f16 ConstantFP legal (Matt Arsenault, 2016-12-08; 1 file, -3/+0)
  Not having this legal led to combine failures, resulting in dumb things like bitcasts of constants not being folded away. The only reason I'm leaving the v_mov_b32 hack that f32 already uses is to avoid madak formation test regressions. PeepholeOptimizer has an ordering issue where the immediate fold attempt is into the sgpr->vgpr copy instead of the actual use. Running it twice avoids that problem.
  llvm-svn: 289096
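  For context, making a constant type legal is a one-line TargetLowering setting; a sketch in the style of the SITargetLowering constructor (the exact placement is an assumption):

      // Declaring f16 ConstantFP legal tells the DAG combiner it can fold
      // bitcasts of f16 constants instead of bailing out on an illegal node.
      setOperationAction(ISD::ConstantFP, MVT::f16, Legal);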
* [AMDGPU] Scalarization of global uniform loads. (Alexander Timofeev, 2016-12-08; 1 file, -0/+1)
  Summary: LC can currently select a scalar load for uniform memory accesses based only on the read-only memory address space. This restriction originated from the fact that, in hardware prior to VI, the vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along any path from the function entry.
  Reviewers: rampitec, tstellarAMD, arsenm
  Subscribers: wdng, arsenm, nhaehnle
  Differential Revision: https://reviews.llvm.org/D26917
  llvm-svn: 289076
* AMDGPU: Implement isCheapAddrSpaceCast (Matt Arsenault, 2016-12-02; 1 file, -0/+1)
  llvm-svn: 288523
* [AMDGPU] Custom lower f16 = fp_round f64 (Konstantin Zhuravlyov, 2016-11-17; 1 file, -0/+3)
  llvm-svn: 287203
* [AMDGPU] Promote f16/i16 conversions to f32/i32 (Konstantin Zhuravlyov, 2016-11-17; 1 file, -6/+0)
  llvm-svn: 287201
* [AMDGPU] Add f16 support (VI+) (Konstantin Zhuravlyov, 2016-11-13; 1 file, -0/+16)
  Differential Revision: https://reviews.llvm.org/D25975
  llvm-svn: 286753
* [AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only) (Konstantin Zhuravlyov, 2016-10-20; 1 file, -0/+13)
  Differential Revision: https://reviews.llvm.org/D25693
  llvm-svn: 284759
* AMDGPU: Fix broken FrameIndex handling (Matt Arsenault, 2016-09-17; 1 file, -1/+0)
  We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode; otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory.
  llvm-svn: 281824
* AMDGPU: Improve splitting 64-bit bit ops by constants (Matt Arsenault, 2016-09-14; 1 file, -0/+6)
  This addresses a TODO to handle operations besides 'and'. This also starts eliminating no-op operations with a constant that can emerge later.
  llvm-svn: 281488
* AMDGPU: Remove analyzeImmediate (Matt Arsenault, 2016-07-28; 1 file, -1/+0)
  This no longer uses the more complicated classification of constants.
  llvm-svn: 276945
* AMDGPU: Change fdiv lowering based on !fpmath metadata (Matt Arsenault, 2016-07-19; 1 file, -1/+2)
  If 2.5 ulp is acceptable, denormals are not required, and the division isn't a reciprocal (which is already handled), replace it with a faster fdiv. Simplify the lowering tests by using per-function subtarget features.
  llvm-svn: 276051
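  A small C++ sketch of how a 2.5-ulp check can key off !fpmath (FPMathOperator::getFPAccuracy() reads the "!fpmath !{float 2.5}" metadata and returns 0.0 when it is absent; the helper below is illustrative, not the verbatim patch):

      #include "llvm/IR/Instruction.h"
      #include "llvm/IR/Operator.h"
      using namespace llvm;

      // Hypothetical predicate: may this fdiv use the faster 2.5-ulp path?
      static bool allowsFastFDiv(const Instruction &I) {
        if (const auto *FPOp = dyn_cast<FPMathOperator>(&I))
          return FPOp->getFPAccuracy() >= 2.5f; // ulp bound from !fpmath
        return false;
      }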
* AMDGPU: Follow up to r275203 (Matt Arsenault, 2016-07-12; 1 file, -0/+3)
  I meant to squash this into it.
  llvm-svn: 275220
* CodeGen: Use MachineInstr& in TargetLowering, NFC (Duncan P. N. Exon Smith, 2016-06-30; 1 file, -3/+4)
  This is a mechanical change to make TargetLowering API take MachineInstr& (instead of MachineInstr*), since the argument is expected to be a valid MachineInstr. In one case, changed a parameter from MachineInstr* to MachineBasicBlock::iterator, since it was used as an insertion point. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753.
  llvm-svn: 274287
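  One affected TargetLowering hook, before and after, to make the mechanical change concrete (a sketch, not the full diff):

      // Before: pointer parameter, even though a null MI was never valid.
      MachineBasicBlock *
      EmitInstrWithCustomInserter(MachineInstr *MI, MachineBasicBlock *BB) const;

      // After: a reference makes the non-null contract explicit.
      MachineBasicBlock *
      EmitInstrWithCustomInserter(MachineInstr &MI, MachineBasicBlock *BB) const;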
* [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header (Konstantin Zhuravlyov, 2016-06-25; 1 file, -0/+2)
  Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue. Debugger prologue writes work group IDs and work item IDs to scratch memory at a fixed location in the following format:
    - offset 0:  work group ID x
    - offset 4:  work group ID y
    - offset 8:  work group ID z
    - offset 16: work item ID x
    - offset 20: work item ID y
    - offset 24: work item ID z
  Set:
    - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to the scratch wave offset reg
    - amd_kernel_code_t::debug_private_segment_buffer_sgpr to the scratch rsrc reg
    - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
  Differential Revision: http://reviews.llvm.org/D20335
  llvm-svn: 273769
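  A hypothetical C++ mirror of the fixed scratch layout listed above (the struct and field names are invented for illustration; the padding at offset 12 is inferred from the gap between the listed offsets):

      #include <cstddef>
      #include <cstdint>

      struct DebuggerPrologueRecord {
        uint32_t WorkGroupIdX; // offset 0
        uint32_t WorkGroupIdY; // offset 4
        uint32_t WorkGroupIdZ; // offset 8
        uint32_t Pad;          // offset 12 (gap; work item IDs start at 16)
        uint32_t WorkItemIdX;  // offset 16
        uint32_t WorkItemIdY;  // offset 20
        uint32_t WorkItemIdZ;  // offset 24
      };
      static_assert(offsetof(DebuggerPrologueRecord, WorkItemIdX) == 16,
                    "work item IDs must start at byte offset 16");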
* AMDGPU/SI: Make sure not to fold offsets into local address space globals (Tom Stellard, 2016-06-25; 1 file, -0/+2)
  Summary: Offset folding only works if you are emitting relocations, and we don't emit relocations for local address space globals.
  Reviewers: arsenm, nhaustov
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D21647
  llvm-svn: 273765
* AMDGPU: Cleanup subtarget handling. (Matt Arsenault, 2016-06-24; 1 file, -1/+3)
  Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
  llvm-svn: 273652
* AMDGPU: Add implicitarg.ptr intrinsic. (Jan Vesely, 2016-06-21; 1 file, -1/+3)
  Points to the start of implicit arguments (appended after explicit arguments).
  Differential Revision: http://reviews.llvm.org/D20297
  llvm-svn: 273317
* AMDGPU: Temporarily select trap to s_endpgm (Matt Arsenault, 2016-06-17; 1 file, -0/+1)
  This should select to s_trap, but that requires additional work to set up and enable the trap handler. For now emit s_endpgm so bugpoint stops getting stuck on the unsupported call to abort. Emit a warning that this will only terminate the wave and not really trap.
  llvm-svn: 273062
* AMDGPU/SI: Refactor fixup handling for constant addrspace variables (Tom Stellard, 2016-06-14; 1 file, -1/+2)
  Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Re-commit this after fixing a bug where we were trying to use a reference to a Triple object that had already been destroyed.
  Reviewers: arsenm, kzhuravl
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21154
  llvm-svn: 272705
* Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"Tom Stellard2016-06-141-2/+1
| | | | | | This reverts commit r272675. llvm-svn: 272677
* AMDGPU/SI: Refactor fixup handling for constant addrspace variables (Tom Stellard, 2016-06-14; 1 file, -1/+2)
  Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code.
  Reviewers: arsenm, kzhuravl
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21154
  llvm-svn: 272675
* Pass DebugLoc and SDLoc by const ref. (Benjamin Kramer, 2016-06-12; 1 file, -14/+11)
  This used to be free, but copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended.
  llvm-svn: 272512
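  A before/after sketch on a hypothetical lowering helper (the function name is invented) showing the signature change this commit applies across the file:

      // Before: SDLoc by value copies a DebugLoc, triggering metadata
      // track/untrack traffic on every call.
      SDValue lowerSomething(SDValue Op, SDLoc DL, SelectionDAG &DAG);

      // After: const reference, no DebugLoc copy.
      SDValue lowerSomething(SDValue Op, const SDLoc &DL, SelectionDAG &DAG);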
* AMDGPU: Unify LowerGlobalAddress (Jan Vesely, 2016-05-13; 1 file, -2/+0)
  Reviewers: tstellard
  Subscribers: arsenm
  Differential Revision: http://reviews.llvm.org/D19794
  llvm-svn: 269481
* AMDGPU: Implement addrspacecast (Matt Arsenault, 2016-04-25; 1 file, -0/+3)
  llvm-svn: 267452
* AMDGPU: Implement canonicalize (Matt Arsenault, 2016-04-14; 1 file, -0/+1)
  Also add a generic DAG node for it.
  llvm-svn: 266272
* AMDGPU: Add atomic_inc + atomic_dec intrinsics (Matt Arsenault, 2016-04-12; 1 file, -0/+4)
  These are different from atomicrmw add 1 because they have an additional input value to clamp the result.
  llvm-svn: 266074
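  A C++ sketch of the wrapping semantics the clamp operand provides (modeled on the GCN ISA description of DS_INC_U32/DS_DEC_U32; the hardware performs this read-modify-write atomically, shown here sequentially for clarity):

      #include <cstdint>

      // New value stored by atomic_inc: wraps to 0 once the clamp is reached.
      uint32_t atomicIncValue(uint32_t Old, uint32_t Clamp) {
        return (Old >= Clamp) ? 0 : Old + 1;
      }

      // New value stored by atomic_dec: wraps to the clamp at 0 or overflow.
      uint32_t atomicDecValue(uint32_t Old, uint32_t Clamp) {
        return (Old == 0 || Old > Clamp) ? Clamp : Old - 1;
      }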
* AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} (Tom Stellard, 2016-04-01; 1 file, -0/+1)
  Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.
  Patch by: Vedran Miletić
  Reviewers: arsenm, tstellarAMD, nhaehnle
  Subscribers: jvesely, scchan, kanarayan, arsenm
  Differential Revision: http://reviews.llvm.org/D17280
  llvm-svn: 265170
* AMDGPU: R600 code splitting cleanup (Matt Arsenault, 2016-03-11; 1 file, -3/+3)
  Move a few functions only used by R600 to R600 specific code, fix header macros to stop using R600, mark classes as final.
  llvm-svn: 263204
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions (Tom Stellard, 2016-02-12; 1 file, -0/+2)
  Reviewers: arsenm
  Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D16603
  llvm-svn: 260765
* AMDGPU: Match some med3 patterns (Matt Arsenault, 2016-01-28; 1 file, -1/+2)
  llvm-svn: 259089
* AMDGPU: Remove old sample intrinsics (Matt Arsenault, 2016-01-26; 1 file, -2/+0)
  I did my best to update all the uses in tests that happened to use the old intrinsics to the newer ones. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern, but I don't think it really matters.
  llvm-svn: 258787
* AMDGPU: Implement read_register and write_register intrinsics (Matt Arsenault, 2016-01-26; 1 file, -0/+3)
  Some of the special intrinsics that now correspond to an instruction also set special registers, e.g. llvm.SI.sendmsg sets m0 as well as using s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. I'm not sure this is entirely correct, because we would want this to be convergent, although it's possible this is already treated sufficiently conservatively.
  llvm-svn: 258785
* AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs (Tom Stellard, 2016-01-20; 1 file, -0/+2)
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15035
  llvm-svn: 258256
* AMDGPU/SI: Add support for non-void functions (Marek Olsak, 2016-01-13; 1 file, -0/+7)
  Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored.
  v2: don't do this for compute shaders
  Reviewers: tstellarAMD, arsenm
  Subscribers: arsenm
  Differential Revision: http://reviews.llvm.org/D16033
  llvm-svn: 257621
* AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions (Tom Stellard, 2015-12-15; 1 file, -0/+1)
  Summary: We were previously selecting all constant loads to SMRD instructions and legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopies pass. This new solution is simpler and also generates much better code, because the instruction selector is able to take advantage of all the MUBUF addressing modes that the legalization pass wasn't able to. We also no longer need to generate v_add_* instructions when we have a uniform pointer and a non-uniform offset, as this is now folded into the MUBUF instruction during instruction selection.
  Reviewers: arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15425
  llvm-svn: 255672
* AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraints (Tom Stellard, 2015-12-10; 1 file, -0/+1)
  Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs.
  Reviewers: arsenm, echristo
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15342
  llvm-svn: 255203
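  A usage sketch (assuming clang targeting amdgcn; the snippet is illustrative and not from the patch): 'v' pins an operand to a VGPR, 's' to an SGPR.

      // Copies X through a VGPR using the new 'v' constraint.
      float vgprCopy(float X) {
        float Out;
        __asm__("v_mov_b32 %0, %1" : "=v"(Out) : "v"(X));
        return Out;
      }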
* AMDGPU: Implement isNoopAddrSpaceCast (Matt Arsenault, 2015-12-01; 1 file, -0/+2)
  llvm-svn: 254468
* AMDGPU: Remove SIPrepareScratchRegs (Matt Arsenault, 2015-11-30; 1 file, -4/+0)
  It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs.
  llvm-svn: 254329
* AMDGPU: Use assert zext for workgroup sizes (Matt Arsenault, 2015-11-30; 1 file, -0/+3)
  llvm-svn: 254328
* AMDGPU: Assume SMRD access for constant address space (Matt Arsenault, 2015-08-07; 1 file, -0/+1)
  Since r243294 these are selected to SMRD and moved later if required.
  llvm-svn: 244354
* AMDGPU/SI: Add VI patterns to select FLAT instructions for global memory ops (Tom Stellard, 2015-07-20; 1 file, -0/+1)
  Summary: The MUBUF addr64 bit has been removed on VI, so we must use FLAT instructions when the pointer is stored in VGPRs.
  Reviewers: arsenm
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D11067
  llvm-svn: 242673
* Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user (Mehdi Amini, 2015-07-09; 1 file, -1/+1)
  Documentation for this function would be nice, by the way.
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 241807
* Make isLegalAddressingMode() take DataLayout as an argument (Mehdi Amini, 2015-07-09; 1 file, -2/+2)
  Summary: This change is part of a series of commits dedicated to having a single DataLayout during compilation by always using the one owned by the module.
  Reviewers: echristo
  Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
  Differential Revision: http://reviews.llvm.org/D11040
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 241778
* Make TargetLowering::getShiftAmountTy() take DataLayout as an argument (Mehdi Amini, 2015-07-09; 1 file, -1/+1)
  Summary: This change is part of a series of commits dedicated to having a single DataLayout during compilation by always using the one owned by the module.
  Reviewers: echristo
  Subscribers: jholewinski, llvm-commits, rafael, yaron.keren
  Differential Revision: http://reviews.llvm.org/D11037
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 241776
* Make TargetLowering::getPointerTy() take DataLayout as an argument (Mehdi Amini, 2015-07-09; 1 file, -1/+2)
  Summary: This change is part of a series of commits dedicated to having a single DataLayout during compilation by always using the one owned by the module.
  Reviewers: echristo
  Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits
  Differential Revision: http://reviews.llvm.org/D11028
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 241775
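  The resulting TargetLowering signatures after this series look roughly like the following (sketched from memory; the headers are authoritative):

      MVT getPointerTy(const DataLayout &DL, uint32_t AS = 0) const;
      EVT getShiftAmountTy(EVT LHSTy, const DataLayout &DL) const;
      virtual bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM,
                                         Type *Ty, unsigned AddrSpace) const;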