path: root/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit history (newest first); each entry lists the commit message, author, date, and the files/lines changed in this file.
...
* AMDGPU: Fold out no-op kill intrinsics (Matt Arsenault, 2016-07-13; 1 file, -0/+8)
  llvm-svn: 275253
* AMDGPU: Follow up to r275203 (Matt Arsenault, 2016-07-12; 1 file, -3/+60)
  I meant to squash this into it.
  llvm-svn: 275220
* AMDGPU: Treat texture gather instructions more like other MIMG instructions (Nicolai Haehnle, 2016-07-11; 1 file, -1/+2)
  Summary: Setting MIMG to 0 has a bunch of unexpected side effects, including that isVMEM returns false which leads to incorrect treatment in the hazard recognizer. The reason I noticed it is that it also leads to incorrect treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug. The only reason why MIMG was set to 0 is to signal the special handling of dmasks, but that can be checked differently.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877
  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D22210
  llvm-svn: 275113
* Revert "AMDGPU: Remove unused control flow intrinsic" (Matt Arsenault, 2016-07-09; 1 file, -0/+1)
  llvm-svn: 274978
* AMDGPU: Fix fdiv lowering when f32 denormals supported (Matt Arsenault, 2016-07-09; 1 file, -5/+3)
  Also fix test not actually using function labels.
  llvm-svn: 274969
* AMDGPU: Remove unused control flow intrinsic (Matt Arsenault, 2016-07-08; 1 file, -1/+0)
  llvm-svn: 274939
* AMDGPU: Add feature for unaligned access (Matt Arsenault, 2016-07-01; 1 file, -8/+14)
  llvm-svn: 274398
* AMDGPU: Improve load/store of illegal types. (Matt Arsenault, 2016-07-01; 1 file, -75/+1)
  There was a combine before to handle the simple copy case. Split this into handling loads and stores separately. We might want to change how this handles some of the vector extloads, since this can result in large code size increases.
  llvm-svn: 274394
* CodeGen: Use MachineInstr& in TargetLowering, NFC (Duncan P. N. Exon Smith, 2016-06-30; 1 file, -28/+29)
  This is a mechanical change to make TargetLowering API take MachineInstr& (instead of MachineInstr*), since the argument is expected to be a valid MachineInstr. In one case, changed a parameter from MachineInstr* to MachineBasicBlock::iterator, since it was used as an insertion point.
  As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753.
  llvm-svn: 274287
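  A minimal sketch of what this kind of hook looks like after the pointer-to-reference change, assuming the standard EmitInstrWithCustomInserter override in this file; it is not the exact diff from the commit.

      // Before: the hook took MachineInstr* even though a null pointer was never valid.
      // After: it takes MachineInstr&, making the non-null contract explicit and
      // removing implicit MachineInstr* -> MachineBasicBlock::iterator conversions.
      MachineBasicBlock *
      SITargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
                                                    MachineBasicBlock *BB) const {
        switch (MI.getOpcode()) { // member access now uses '.' instead of '->'
        default:
          return AMDGPUTargetLowering::EmitInstrWithCustomInserter(MI, BB);
        }
      }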
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFC (Duncan P. N. Exon Smith, 2016-06-30; 1 file, -1/+1)
  This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement.
  Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary.
  This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader.
  As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753.
  Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on.
  llvm-svn: 274189
* [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header (Konstantin Zhuravlyov, 2016-06-25; 1 file, -0/+31)
  Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.
  Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z
  Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
  Differential Revision: http://reviews.llvm.org/D20335
  llvm-svn: 273769
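  A self-contained C++ sketch of the scratch record implied by the offsets in the entry above; the struct name and the assumption that offset 12 is unused padding are illustrative, not taken from the commit.

      #include <cstddef>
      #include <cstdint>

      struct DebuggerScratchRecord {  // hypothetical name, for illustration only
        uint32_t WorkGroupIdX;  // offset 0
        uint32_t WorkGroupIdY;  // offset 4
        uint32_t WorkGroupIdZ;  // offset 8
        uint32_t Unused;        // offset 12 (not listed in the layout above)
        uint32_t WorkItemIdX;   // offset 16
        uint32_t WorkItemIdY;   // offset 20
        uint32_t WorkItemIdZ;   // offset 24
      };

      static_assert(offsetof(DebuggerScratchRecord, WorkItemIdX) == 16,
                    "must match the offset the prologue writes work item ID x to");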
* AMDGPU/SI: Make sure not to fold offsets into local address space globals (Tom Stellard, 2016-06-25; 1 file, -0/+8)
  Summary: Offset folding only works if you are emitting relocations, and we don't emit relocations for local address space globals.
  Reviewers: arsenm, nhaustov
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D21647
  llvm-svn: 273765
* AMDGPU: Cleanup subtarget handling. (Matt Arsenault, 2016-06-24; 1 file, -39/+33)
  Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
  llvm-svn: 273652
* AMDGPU: Fix verifier errors in SILowerControlFlow (Matt Arsenault, 2016-06-22; 1 file, -3/+4)
  The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking.
  Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return.
  llvm-svn: 273467
* AMDGPU: Add implicitarg.ptr intrinsic. (Jan Vesely, 2016-06-21; 1 file, -8/+18)
  Points to the start of implicit arguments (appended after explicit arguments).
  Differential Revision: http://reviews.llvm.org/D20297
  llvm-svn: 273317
* AMDGPU: Fold more custom nodes to undef (Matt Arsenault, 2016-06-20; 1 file, -11/+40)
  This will help sneak undefs past GVN into the DAG for some tests. Also add missing intrinsic for rsq_legacy, even though the node was already selected to the instruction. Also start passing the debug location to intrinsic errors.
  llvm-svn: 273181
* AMDGPU: Add support for R_AMDGPU_REL32 relocations (Tom Stellard, 2016-06-20; 1 file, -1/+2)
  Reviewers: arsenm, kzhuravl, rafael
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D21401
  llvm-svn: 273168
* Reformat blank lines. (NAKAMURA Takumi, 2016-06-20; 1 file, -3/+0)
  llvm-svn: 273131
* Untabify. (NAKAMURA Takumi, 2016-06-20; 1 file, -2/+1)
  llvm-svn: 273129
* AMDGPU: Temporarily select trap to s_endpgm (Matt Arsenault, 2016-06-17; 1 file, -0/+19)
  This should select to s_trap, but that requires additional work to set up and enable the trap handler. For now emit s_endpgm so bugpoint stops getting stuck on the unsupported call to abort. Emit a warning that this will only terminate the wave and not really trap.
  llvm-svn: 273062
* AMDGPU/SI: Simplify code in SITargetLowering::LowerGlobalAddress() (Tom Stellard, 2016-06-17; 1 file, -1/+1)
  This change was suggested in http://reviews.llvm.org/D21154.
  llvm-svn: 273059
* AMDGPU/SI: Refactor fixup handling for constant addrspace variables (Tom Stellard, 2016-06-14; 1 file, -0/+34)
  Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code.
  Re-commit this after fixing a bug where we were trying to use a reference to a Triple object that had already been destroyed.
  Reviewers: arsenm, kzhuravl
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21154
  llvm-svn: 272705
* Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables" (Tom Stellard, 2016-06-14; 1 file, -34/+0)
  This reverts commit r272675.
  llvm-svn: 272677
* AMDGPU/SI: Refactor fixup handling for constant addrspace variables (Tom Stellard, 2016-06-14; 1 file, -0/+34)
  Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code.
  Reviewers: arsenm, kzhuravl
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21154
  llvm-svn: 272675
* Pass DebugLoc and SDLoc by const ref. (Benjamin Kramer, 2016-06-12; 1 file, -26/+20)
  This used to be free; copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended.
  llvm-svn: 272512
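  A minimal illustration of the new calling convention, assuming a generic DAG helper; the function name is made up and is not one of the functions touched by the commit.

      // Taking the location by const reference avoids a DebugLoc copy (and the
      // metadata track/untrack traffic that comes with it) on every call.
      SDValue lowerExample(SelectionDAG &DAG, const SDLoc &DL, SDValue Op) {
        // DL is threaded through unchanged; DAG.getNode also takes const SDLoc &.
        return DAG.getNode(ISD::FADD, DL, Op.getValueType(), Op, Op);
      }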
* AMDGPU: Fix trailing whitespace (Matt Arsenault, 2016-06-10; 1 file, -14/+14)
  llvm-svn: 272364
* AMDGPU: Fix i64 global cmpxchg (Matt Arsenault, 2016-06-09; 1 file, -3/+3)
  This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it.
  llvm-svn: 272344
* AMDGPU/SI: Fix 32-bit fdiv lowering (Wei Ding, 2016-06-09; 1 file, -16/+53)
  We were using the fast fdiv lowering for all division; an implementation of IEEE 754 fdiv is added.
  http://reviews.llvm.org/D20557
  llvm-svn: 272292
* Revert "Differential Revision: http://reviews.llvm.org/D20557" (Eric Christopher, 2016-06-07; 1 file, -55/+17)
  Author: Wei Ding <wei.ding2@amd.com>
  Date: Tue Jun 7 19:04:44 2016 +0000
  Differential Revision: http://reviews.llvm.org/D20557
  git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8
  Reverted as it was breaking the bots. This reverts commit r272044.
  llvm-svn: 272056
* Differential Revision: http://reviews.llvm.org/D20557 (Wei Ding, 2016-06-07; 1 file, -17/+55)
  llvm-svn: 272044
* AMDGPU: Fix constantexpr addrspacecasts (Matt Arsenault, 2016-06-06; 1 file, -1/+4)
  If we had a constant group address space cast, the queue pointer wasn't enabled for the function, resulting in a crash on noreg later.
  llvm-svn: 271935
* AMDGPU: Add fract intrinsic (Matt Arsenault, 2016-05-28; 1 file, -0/+4)
  Remove broken patterns matching it. This was matching the unsafe math pattern and expanding the fix for the buggy instruction from the pattern. The problems are also on CI. Remove the workarounds and only use fract with unsafe math or from the intrinsic.
  llvm-svn: 271078
* [AMDGPU] Remove exit-on-error flag from test (PR27762) (Diana Picus, 2016-05-26; 1 file, -1/+1)
  Similar to r269948, but for argument lowering.
  Fixes PR27762
  Differential Revision: http://reviews.llvm.org/D20430
  llvm-svn: 270856
* AMDGPU: Cleanup lowering actions (Matt Arsenault, 2016-05-21; 1 file, -128/+41)
  These are kind of a mess and hard to follow, particularly for loads and stores. Fix various redundant, unnecessary and dead settings.
  llvm-svn: 270307
* AMDGPU: Unify LowerGlobalAddress (Jan Vesely, 2016-05-13; 1 file, -16/+0)
  Reviewers: tstellard
  Subscribers: arsenm
  Differential Revision: http://reviews.llvm.org/D19794
  llvm-svn: 269481
* AMDGPU: Fold shift into cvt_f32_ubyteN (Matt Arsenault, 2016-05-09; 1 file, -1/+15)
  llvm-svn: 268930
* AMDGPU: Simplify control flow / conditions (Matt Arsenault, 2016-05-05; 1 file, -19/+19)
  llvm-svn: 268676
* AMDGPU: Uniform branch conditions can originate with intrinsics (Nicolai Haehnle, 2016-05-05; 1 file, -2/+1)
  Summary: Discovered by Dave Airlie, fixes an assertion in Khronos OpenGL CTS GL43-CTS.shader_storage_buffer_object.advanced-matrix. In this particular case, the buffer load intrinsic fed into a uniform conditional branch, and led the brcond lowering down the wrong path.
  Reviewers: tstellarAMD, arsenm
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19931
  llvm-svn: 268650
* AMDGPU: Custom lower v2i32 loads and stores (Matt Arsenault, 2016-05-02; 1 file, -7/+39)
  This will allow us to split up 64-bit private accesses when necessary.
  llvm-svn: 268296
* AMDGPU: Make i64 loads/stores promote to v2i32 (Matt Arsenault, 2016-05-02; 1 file, -0/+12)
  Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components to be folded with the base pointer arithmetic.
  llvm-svn: 268293
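  A minimal sketch of the kind of TargetLowering setup this entry describes, assuming it lives in the SITargetLowering constructor; the actual commit configures more cases than shown here.

      // Promote i64 loads/stores so they are handled as v2i32, letting later
      // code split 64-bit private accesses into 32-bit components.
      setOperationAction(ISD::LOAD, MVT::i64, Promote);
      AddPromotedToType(ISD::LOAD, MVT::i64, MVT::v2i32);

      setOperationAction(ISD::STORE, MVT::i64, Promote);
      AddPromotedToType(ISD::STORE, MVT::i64, MVT::v2i32);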
* AMDGPU: Add kernarg.segment.ptr intrinsic (Matt Arsenault, 2016-04-29; 1 file, -0/+5)
  llvm-svn: 268105
* AMDGPU: Stop reporting an addressing mode for unknown addrspace (Matt Arsenault, 2016-04-29; 1 file, -1/+6)
  This was being treated the same as private, which has an immediate offset. For unknown, it probably means it's for a computation not actually being used for accessing memory, so it should not have a nontrivial addressing mode.
  llvm-svn: 268002
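  A simplified sketch of the idea in the entry above, using the isLegalAddressingMode hook; the case structure and the exact check are illustrative, not the commit's code.

      bool SITargetLowering::isLegalAddressingMode(const DataLayout &DL,
                                                   const AddrMode &AM, Type *Ty,
                                                   unsigned AS) const {
        switch (AS) {
        // ... known address spaces (global, constant, private, local) handled here ...
        default:
          // Unknown address space: likely a pointer computation rather than a real
          // memory access, so only a bare base register is reported as legal.
          return !AM.BaseOffs && !AM.Scale;
        }
      }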
* [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI. (Ahmed Bougacha, 2016-04-26; 1 file, -5/+4)
  Differential Revision: http://reviews.llvm.org/D17176
  llvm-svn: 267606
* AMDGPU: Implement addrspacecast (Matt Arsenault, 2016-04-25; 1 file, -0/+84)
  llvm-svn: 267452
* AMDGPU: Add queue ptr intrinsic (Matt Arsenault, 2016-04-25; 1 file, -1/+11)
  llvm-svn: 267451
* [NFC] Header cleanup (Mehdi Amini, 2016-04-18; 1 file, -2/+1)
  Removed some unused headers, replaced some headers with forward class declarations.
  Found using simple scripts like this one:
  clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'
  Patch by Eugene Kosov <claprix@yandex.ru>
  Differential Revision: http://reviews.llvm.org/D19219
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 266595
* AMDGPU/SI: Fix regression with no-return atomics (Nicolai Haehnle, 2016-04-15; 1 file, -0/+1)
  Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here.
  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D19043
  llvm-svn: 266433
* AMDGPU: Remove custom load/store scalarization (Matt Arsenault, 2016-04-14; 1 file, -2/+2)
  llvm-svn: 266385
* AMDGPU: Directly emit m0 initialization with s_mov_b32 (Matt Arsenault, 2016-04-14; 1 file, -13/+20)
  Currently what comes out of instruction selection is a register initialized to -1, and then copied to m0. MachineCSE doesn't consider copies, but we want these to be CSEed. This isn't much of a problem currently, because SIFoldOperands is run immediately after. This avoids regressions when SIFoldOperands is run later from leaving all copies to m0.
  llvm-svn: 266377
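  A rough sketch of emitting the m0 initialization as a single machine instruction rather than a separate materialization plus copy; only the -1 value comes from the entry above, the insertion point and surrounding code are assumed.

      // Write -1 straight into m0 with s_mov_b32 instead of producing
      // "%vreg = S_MOV_B32 -1; m0 = COPY %vreg", which MachineCSE cannot clean up.
      BuildMI(MBB, I, DL, TII->get(AMDGPU::S_MOV_B32), AMDGPU::M0)
          .addImm(-1);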
* AMDGPU/SI: Use the correct scratch wave offset register for shaders. (Tom Stellard, 2016-04-14; 1 file, -6/+28)
  Summary: The code previously always used s1, as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders, though. The register should be the next SGPR after all user and system SGPRs. We use the fact that Mesa adds arguments for all input and system SGPRs and take the next available SGPR for the scratch wave offset register.
  Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
  Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
  Subscribers: qcolombet, arsenm, llvm-commits
  Differential Revision: http://reviews.llvm.org/D18941
  Patch By: Bas Nieuwenhuizen
  llvm-svn: 266336