summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Improve offset folding for register indexingMatt Arsenault2016-07-091-0/+49
| | | | llvm-svn: 274954
* AMDGPU/SI: Remove address space query functions from AMDGPUDAGToDAGISelTom Stellard2016-07-051-56/+3
| | | | | | | | | | | | | | | Summary: These have been replaced with TableGen code (except for isConstantLoad, which is still used for R600). The queries were broken for cases where MemOperand was a PseudoSourceValue. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21684 llvm-svn: 274561
* AMDGPU/R600: Add PatFrags for selecting the correct vtx id for loadsTom Stellard2016-07-051-5/+0
| | | | | | | | | This moves of the r600 logic out of isGlobalLoad() and into the TableGen files. Differential Revision: http://reviews.llvm.org/D21710 llvm-svn: 274527
* AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructionsTom Stellard2016-07-041-4/+0
| | | | | | | | | | | | | | | Summary: The isGlobalLoad() query was returning true for constant address space loads with memory types less than 32-bits, which is wrong. This logic has been replaced with PatFrag in the TableGen files, to provide the same functionality. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21696 llvm-svn: 274521
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-1/+1
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* AMDGPU: Fix gcc warningsMatt Arsenault2016-06-221-90/+0
| | | | | | | Mostly removing dead code. Apparently gcc's warning for unused functions is better llvm-svn: 273363
* Delete more dead code.Rafael Espindola2016-06-211-32/+0
| | | | | | Found by gcc 6. llvm-svn: 273322
* Delete some dead code.Rafael Espindola2016-06-211-5/+0
| | | | | | Found by gcc 6. llvm-svn: 273303
* Reformat blank lines.NAKAMURA Takumi2016-06-201-1/+0
| | | | llvm-svn: 273131
* Untabify.NAKAMURA Takumi2016-06-201-5/+3
| | | | llvm-svn: 273129
* AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsicsNicolai Haehnle2016-06-151-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value. Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset. An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21326 llvm-svn: 272761
* Pass DebugLoc and SDLoc by const ref.Benjamin Kramer2016-06-121-3/+4
| | | | | | | | This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
* AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permuteTom Stellard2016-06-101-2/+2
| | | | | | | | | | | | | | | Summary: This fixes a bug with ds_*permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349
* AMDGPU: Fix flat atomicsMatt Arsenault2016-06-091-0/+17
| | | | | | | | The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345
* AMDGPU: Fix i64 global cmpxchgMatt Arsenault2016-06-091-6/+75
| | | | | | | | | | This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344
* AMDGPU/R600: Implement memory loads from constant ASJan Vesely2016-05-131-3/+10
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19792 llvm-svn: 269479
* SDAG: Implement Select instead of SelectImpl in AMDGPUDAGToDAGISelJustin Bogner2016-05-121-49/+67
| | | | | | | | | | | - Where we were returning a node before, call ReplaceNode instead. - Where we would return null to fall back to another selector, rename the method to try* and return a bool for success. - Where we were calling SelectNodeTo, just return afterwards. Part of llvm.org/pr26808. llvm-svn: 269349
* Fixed unused but set variable warningSimon Pilgrim2016-05-091-3/+0
| | | | llvm-svn: 268931
* SDAG: Rename Select->SelectImpl and repurpose Select as returning voidJustin Bogner2016-05-051-2/+2
| | | | | | | | | | | | | | This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693
* AMDGPU: Make i64 loads/stores promote to v2i32Matt Arsenault2016-05-021-55/+0
| | | | | | | | | | | | Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components can be folded with the base pointer arithmetic. llvm-svn: 268293
* AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructionsTom Stellard2016-04-291-2/+2
| | | | | | | | | | | | | | Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043
* AMDGPU: Implement addrspacecastMatt Arsenault2016-04-251-66/+0
| | | | llvm-svn: 267452
* AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.SizeMatt Arsenault2016-04-221-0/+16
| | | | llvm-svn: 267244
* [StructurizeCFG] Annotate branches that were treated as uniformNicolai Haehnle2016-04-141-1/+3
| | | | | | | | | | | | | | | | | | | Summary: This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases). No tests included here, because tests in D19013 already cover this. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19018 llvm-svn: 266346
* AMDGPU: Add atomic_inc + atomic_dec intrinsicsMatt Arsenault2016-04-121-1/+2
| | | | | | | These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074
* AMDGPU/SI: Implement atomic load/store for i32 and i64Jan Vesely2016-04-071-12/+33
| | | | | | | | | | Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709
* Fix sequence point warning. NFC.Vasileios Kalintiris2016-03-241-1/+1
| | | | llvm-svn: 264255
* AMDGPU: Insert moves of frame index to value operandsMatt Arsenault2016-03-231-0/+56
| | | | | | | | | | | | | | | | | | | | | | | Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is appplied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less unlikely to happen in real code. llvm-svn: 264200
* AMDGPU: Remove SignBitIsZero for mubuf scratch offsetsMatt Arsenault2016-03-211-1/+1
| | | | | | | These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964
* AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.formatNicolai Haehnle2016-03-181-0/+79
| | | | | | | | | | | | | | | | | | | | | Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790
* AMDGPU: Simplify boolean conditional return statementsMatt Arsenault2016-03-021-10/+7
| | | | | | Patch by Richard Thomson llvm-svn: 262536
* AMDGPU: Check cheaper condition before SignBitIsZeroMatt Arsenault2016-02-241-7/+6
| | | | | | | Don't do an expensive computeKnownBits call when we can do the cheap check for legal offsets first. llvm-svn: 261720
* AMDGPU: Cleanup includes and random macrosMatt Arsenault2016-02-131-11/+4
| | | | llvm-svn: 260784
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructionsTom Stellard2016-02-121-0/+55
| | | | | | | | | | Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
* Refactor backend diagnostics for unsupported featuresOliver Stannard2016-02-021-3/+3
| | | | | | | | | | | | | | | | | Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498
* Revert r259035, it introduces a cyclic library dependencyOliver Stannard2016-01-281-2/+2
| | | | llvm-svn: 259045
* Add backend dignostic printer for unsupported featuresOliver Stannard2016-01-281-2/+2
| | | | | | | | | | | | | | | | Re-commit of r258951 after fixing layering violation. The related LLVM patch adds a backend diagnostic type for reporting unsupported features, this adds a printer for them to clang. In the case where debug location information is not available, I've changed the printer to report the location as the first line of the function, rather than the closing brace, as the latter does not give the user any information. This also affects optimisation remarks. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 259035
* Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported ↵NAKAMURA Takumi2016-01-281-2/+2
| | | | | | | | | | | features" It broke layering violation in LLVMIR. clang r258950 "Add backend dignostic printer for unsupported features" llvm r258951 "Refactor backend diagnostics for unsupported features" llvm-svn: 259016
* Refactor backend diagnostics for unsupported featuresOliver Stannard2016-01-271-2/+2
| | | | | | | | | | | | | | | | | | | | | The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. The implementation of DiagnosticInfoUnsupported::print must be in lib/Codegen rather than in the existing file in lib/IR/ to avoid introducing a dependency from IR to CodeGen. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 258951
* AMDGPU: Fix old comments that mention AMDILMatt Arsenault2016-01-201-1/+1
| | | | llvm-svn: 258350
* AMDGPU/SI: Use flat for global load/store when targeting HSAChangpeng Fang2015-12-221-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For some reason doing executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282
* Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"Rafael Espindola2015-12-221-17/+9
| | | | | | | | This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275
* AMDGPU/SI: Use flat for global load/store when targeting HSAChangpeng Fang2015-12-221-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: For some reason doing executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273
* AMDGPU: Error on addrspacecasts that aren't actually implementedMatt Arsenault2015-12-011-4/+7
| | | | llvm-svn: 254469
* AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now deadTom Stellard2015-12-011-35/+0
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15050 llvm-svn: 254427
* AMDGPU: Rework how private buffer passed for HSAMatt Arsenault2015-11-301-5/+1
| | | | | | | | | | | | | | | | If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the progloue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
* AMDGPU: Rename enums to be consistent with HSA code object terminologyMatt Arsenault2015-11-301-2/+2
| | | | llvm-svn: 254330
* AMDGPU: Remove SIPrepareScratchRegsMatt Arsenault2015-11-301-27/+5
| | | | | | | | | | | | | | | | | | | | | | It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329
* AMDGPU: Remove dead codeMatt Arsenault2015-11-111-33/+2
| | | | llvm-svn: 252675
* AMDGPU: Alphabetize includesMatt Arsenault2015-11-031-1/+1
| | | | llvm-svn: 251994
OpenPOWER on IntegriCloud