summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-091-13/+30
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289282
* AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.Tom Stellard2016-12-071-0/+38
| | | | | | | | | | | | | | Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879
* AMDGPU/SI: Add back reverted SGPR spilling code, but disable itMarek Olsak2016-11-251-1/+1
| | | | | | suggested as a better solution by Matt llvm-svn: 287942
* Revert "AMDGPU: Make m0 unallocatable"Marek Olsak2016-11-251-1/+1
| | | | | | This reverts commit 124ad83dae04514f943902446520c859adee0e96. llvm-svn: 287932
* AMDGPU: Make m0 unallocatableMatt Arsenault2016-11-241-1/+1
| | | | | | | | | | | m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0. llvm-svn: 287841
* AMDGPU: Remove unnecessary and on conditional branchMatt Arsenault2016-11-071-16/+2
| | | | | | | The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134
* AMDGPU: Handle CopyToReg in getOperandRegClassMatt Arsenault2016-11-011-1/+14
| | | | llvm-svn: 285768
* AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodesNicolai Haehnle2016-10-141-10/+37
| | | | | | | | | | | | | | Summary: This will be used for 64-bit MULHU, which is in turn used for the 64-bit divide-by-constant optimization (see D24822). Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25289 llvm-svn: 284224
* [AMDGPU] Pass optimization level to SelectionDAGISelKonstantin Zhuravlyov2016-10-031-6/+6
| | | | llvm-svn: 283133
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-011-2/+2
| | | | llvm-svn: 283004
* AMDGPU: Fix broken FrameIndex handlingMatt Arsenault2016-09-171-59/+9
| | | | | | | | | | | | | | | | | We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory. llvm-svn: 281824
* AMDGPU: Use i64 scalar compare instructionsMatt Arsenault2016-09-171-12/+27
| | | | | | VI added eq/ne for i64, so use them. llvm-svn: 281800
* AMDGPU: Run LoadStoreVectorizer pass by defaultMatt Arsenault2016-09-091-0/+3
| | | | llvm-svn: 281112
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-2/+2
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* AMDGPU: Remove analyzeImmediateMatt Arsenault2016-07-281-5/+12
| | | | | | | This no longer uses the more complicated classification of constants. llvm-svn: 276945
* AMDGPU: Unify MOVRELSOffset and MOVRELDOffsetNicolai Haehnle2016-07-121-30/+6
| | | | | | | | | | | | | | | | Summary: Previously, constant index insertelements would be turned into SI_INDIRECT_DST, which is bound to prevent some optimization opportunities. Worse, it mislead the heuristic that decides whether immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D22217 llvm-svn: 275160
* AMDGPU: Improve offset folding for register indexingMatt Arsenault2016-07-091-0/+49
| | | | llvm-svn: 274954
* AMDGPU/SI: Remove address space query functions from AMDGPUDAGToDAGISelTom Stellard2016-07-051-56/+3
| | | | | | | | | | | | | | | Summary: These have been replaced with TableGen code (except for isConstantLoad, which is still used for R600). The queries were broken for cases where MemOperand was a PseudoSourceValue. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21684 llvm-svn: 274561
* AMDGPU/R600: Add PatFrags for selecting the correct vtx id for loadsTom Stellard2016-07-051-5/+0
| | | | | | | | | This moves of the r600 logic out of isGlobalLoad() and into the TableGen files. Differential Revision: http://reviews.llvm.org/D21710 llvm-svn: 274527
* AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructionsTom Stellard2016-07-041-4/+0
| | | | | | | | | | | | | | | Summary: The isGlobalLoad() query was returning true for constant address space loads with memory types less than 32-bits, which is wrong. This logic has been replaced with PatFrag in the TableGen files, to provide the same functionality. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21696 llvm-svn: 274521
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-1/+1
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* AMDGPU: Fix gcc warningsMatt Arsenault2016-06-221-90/+0
| | | | | | | Mostly removing dead code. Apparently gcc's warning for unused functions is better llvm-svn: 273363
* Delete more dead code.Rafael Espindola2016-06-211-32/+0
| | | | | | Found by gcc 6. llvm-svn: 273322
* Delete some dead code.Rafael Espindola2016-06-211-5/+0
| | | | | | Found by gcc 6. llvm-svn: 273303
* Reformat blank lines.NAKAMURA Takumi2016-06-201-1/+0
| | | | llvm-svn: 273131
* Untabify.NAKAMURA Takumi2016-06-201-5/+3
| | | | llvm-svn: 273129
* AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsicsNicolai Haehnle2016-06-151-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value. Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset. An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21326 llvm-svn: 272761
* Pass DebugLoc and SDLoc by const ref.Benjamin Kramer2016-06-121-3/+4
| | | | | | | | This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
* AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permuteTom Stellard2016-06-101-2/+2
| | | | | | | | | | | | | | | Summary: This fixes a bug with ds_*permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349
* AMDGPU: Fix flat atomicsMatt Arsenault2016-06-091-0/+17
| | | | | | | | The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345
* AMDGPU: Fix i64 global cmpxchgMatt Arsenault2016-06-091-6/+75
| | | | | | | | | | This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344
* AMDGPU/R600: Implement memory loads from constant ASJan Vesely2016-05-131-3/+10
| | | | | | | | | | Reviewers: tstellard Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D19792 llvm-svn: 269479
* SDAG: Implement Select instead of SelectImpl in AMDGPUDAGToDAGISelJustin Bogner2016-05-121-49/+67
| | | | | | | | | | | - Where we were returning a node before, call ReplaceNode instead. - Where we would return null to fall back to another selector, rename the method to try* and return a bool for success. - Where we were calling SelectNodeTo, just return afterwards. Part of llvm.org/pr26808. llvm-svn: 269349
* Fixed unused but set variable warningSimon Pilgrim2016-05-091-3/+0
| | | | llvm-svn: 268931
* SDAG: Rename Select->SelectImpl and repurpose Select as returning voidJustin Bogner2016-05-051-2/+2
| | | | | | | | | | | | | | This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693
* AMDGPU: Make i64 loads/stores promote to v2i32Matt Arsenault2016-05-021-55/+0
| | | | | | | | | | | | Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components can be folded with the base pointer arithmetic. llvm-svn: 268293
* AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructionsTom Stellard2016-04-291-2/+2
| | | | | | | | | | | | | | Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043
* AMDGPU: Implement addrspacecastMatt Arsenault2016-04-251-66/+0
| | | | llvm-svn: 267452
* AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.SizeMatt Arsenault2016-04-221-0/+16
| | | | llvm-svn: 267244
* [StructurizeCFG] Annotate branches that were treated as uniformNicolai Haehnle2016-04-141-1/+3
| | | | | | | | | | | | | | | | | | | Summary: This fully solves the problem where the StructurizeCFG pass does not consider the same branches as uniform as the SIAnnotateControlFlow pass. The patch in D19013 helps with this problem, but is not sufficient (and, interestingly, causes a "regression" with one of the existing test cases). No tests included here, because tests in D19013 already cover this. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19018 llvm-svn: 266346
* AMDGPU: Add atomic_inc + atomic_dec intrinsicsMatt Arsenault2016-04-121-1/+2
| | | | | | | These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074
* AMDGPU/SI: Implement atomic load/store for i32 and i64Jan Vesely2016-04-071-12/+33
| | | | | | | | | | Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709
* Fix sequence point warning. NFC.Vasileios Kalintiris2016-03-241-1/+1
| | | | llvm-svn: 264255
* AMDGPU: Insert moves of frame index to value operandsMatt Arsenault2016-03-231-0/+56
| | | | | | | | | | | | | | | | | | | | | | | Strengthen tests of storing frame indices. Right now this just creates irrelevant scheduling changes. We don't want to have multiple frame index operands on an instruction. There seem to be various assumptions that at least the same frame index will not appear twice in the LocalStackSlotAllocation pass. There's no reason to have this happen, and it just makes it easy to introduce bugs where the immediate offset is appplied to the storing instruction when it should really be applied to the value being stored as a separate add. This might not be sufficient. It might still be problematic to have an add fi, fi situation, but that's even less unlikely to happen in real code. llvm-svn: 264200
* AMDGPU: Remove SignBitIsZero for mubuf scratch offsetsMatt Arsenault2016-03-211-1/+1
| | | | | | | These instructions do not have the same negative base address problem that DS instructions do on SI. llvm-svn: 263964
* AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.formatNicolai Haehnle2016-03-181-0/+79
| | | | | | | | | | | | | | | | | | | | | Summary: We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend cannot easily make use of an explicit soffset parameter either. Furthermore, it is likely that in the future, LLVM will be in a better position than the frontend to choose an SGPR offset if possible. Since there aren't any frontend uses of these intrinsics in upstream repositories yet, I would like to take this opportunity to change the intrinsic signatures to a single offset parameter, which is then selected to immediate offsets or voffsets using a ComplexPattern. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18218 llvm-svn: 263790
* AMDGPU: Simplify boolean conditional return statementsMatt Arsenault2016-03-021-10/+7
| | | | | | Patch by Richard Thomson llvm-svn: 262536
* AMDGPU: Check cheaper condition before SignBitIsZeroMatt Arsenault2016-02-241-7/+6
| | | | | | | Don't do an expensive computeKnownBits call when we can do the cheap check for legal offsets first. llvm-svn: 261720
* AMDGPU: Cleanup includes and random macrosMatt Arsenault2016-02-131-11/+4
| | | | llvm-svn: 260784
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructionsTom Stellard2016-02-121-0/+55
| | | | | | | | | | Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
OpenPOWER on IntegriCloud