path: root/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Commit message | Author | Age | Files | Lines
* AMDGPU: Start selecting flat instruction offsets | Matt Arsenault | 2017-06-12 | 1 | -7/+30
  llvm-svn: 305201
* AMDGPU: Start adding offset fields to flat instructions | Matt Arsenault | 2017-06-12 | 1 | -1/+4
  llvm-svn: 305194
* Sort the remaining #include lines in include/... and lib/.... | Chandler Carruth | 2017-06-06 | 1 | -3/+3
  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.

  I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.

  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.

  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).

  llvm-svn: 304787
* Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns" | Marek Olsak | 2017-05-24 | 1 | -12/+24
  This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa.

  It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field.

  llvm-svn: 303754
* AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patterns | Marek Olsak | 2017-05-23 | 1 | -24/+12
  This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4.

  Reviewers: arsenm, nhaehnle, tstellarAMD
  Subscribers: kzhuravl, wdng, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D28994
  llvm-svn: 303658
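The alignment check in this commit and the failure described in its revert both come down to whether a byte offset can be represented in an instruction's immediate field. A minimal sketch of such a check follows. The 8-bit width is taken from the revert message; the dword-granular encoding and the name `isLegalImmOffset` are assumptions made for illustration and are not the code in AMDGPUISelDAGToDAG.cpp.

```cpp
#include <cstdint>

// Hypothetical helper, not LLVM's implementation.
// Assumption: the immediate field is 8 bits wide and counts dwords,
// so a byte offset is encodable only if it is a multiple of 4 and
// the resulting dword count fits in 0..255.
constexpr bool isLegalImmOffset(std::uint64_t byteOffset) {
  return byteOffset % 4 == 0 && byteOffset / 4 <= 0xFF;
}
```

Under these assumptions a byte offset of 1020 is accepted, while 1024 (too large) and 6 (misaligned) are rejected and would have to be materialized some other way.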
* AMDGPU: Change mubuf soffset register when SP relative | Matt Arsenault | 2017-05-17 | 1 | -13/+51
  Check the MachinePointerInfo for whether the access is supposed to be relative to the stack pointer.

  No tests because this is used in later commits implementing calls.

  llvm-svn: 303301
* AMDGPU: Make better use of op_sel with high components | Matt Arsenault | 2017-05-17 | 1 | -8/+48
  Handle more general swizzles.

  llvm-svn: 303296
* AMDGPU: Try to use op_sel when selecting packed instructions | Matt Arsenault | 2017-05-17 | 1 | -1/+29
  Avoids instructions to pack a vector when the source is really a scalar being broadcast.

  Also be smarter and look for per-component fneg.

  Doesn't yet handle scalar from upper half of register or other swizzles.

  llvm-svn: 303291
* AMDGPU: Remove tfe bit from flat instruction definitions | Matt Arsenault | 2017-05-11 | 1 | -5/+3
  We don't use it and it was removed in gfx9, and the encoding bit repurposed.

  Additionally actually using it requires changing the output register class, which wasn't done anyway.

  llvm-svn: 302814
* Generalize the specialized flag-carrying SDNodes by moving flags into SDNode. | Amara Emerson | 2017-05-01 | 1 | -2/+2
  This removes BinaryWithFlagsSDNode, and flags are now all passed by value.

  Differential Revision: https://reviews.llvm.org/D32527
  llvm-svn: 301803
* [AMDGPU] Garbage collect dead code. NFCI. | Davide Italiano | 2017-04-26 | 1 | -10/+0
  llvm-svn: 301375
* AMDGPU: Clean up VOP3NoMods pattern | Matt Arsenault | 2017-04-25 | 1 | -23/+12
  There is no need to copy the operands or inspect the sources. Also remove some unnecessary clamp/omod usage.

  llvm-svn: 301363
* AMDGPU: Select scratch mubuf offsets when pointer is a constant | Matt Arsenault | 2017-04-24 | 1 | -7/+46
  In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register.

  llvm-svn: 301230
* AMDGPU: Fix invalid copies when copying i1 to phys reg | Matt Arsenault | 2017-04-12 | 1 | -1/+1
  Insert a VReg_1 virtual register so the i1 workaround pass can handle it.

  llvm-svn: 300113
* [AMDGPU][MC] Fix for Bug 28207 + LIT tests | Dmitry Preobrazhensky | 2017-03-27 | 1 | -0/+15
  Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type

  Reviewers: vpykhtin, arsenm
  Differential Revision: https://reviews.llvm.org/D31327
  llvm-svn: 298852
* [AMDGPU] Get address space mapping by target triple environment | Yaxun Liu | 2017-03-27 | 1 | -5/+8
  As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple.

  The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant.

  Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function.

  Differential Revision: https://reviews.llvm.org/D31284
  llvm-svn: 298846
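The design described in this message, with static constants for triple-independent values and ordinary data members for triple-dependent ones, can be sketched roughly as below. This is an illustration only: `AddrSpaceMap`, `makeAddrSpaceMap`, the member names, and every numeric value are hypothetical placeholders, not the contents of the real `AMDGPUAS` struct. The sketch also uses `static constexpr` where the message says static const.

```cpp
// Hypothetical stand-in for the struct described in the commit message.
struct AddrSpaceMap {
  // Triple-independent values: static constants take no per-object storage.
  static constexpr unsigned Global = 1; // placeholder value
  static constexpr unsigned Local = 3;  // placeholder value

  // Triple-dependent values: filled in when the struct is created.
  unsigned Private;
  unsigned Flat;
};

// Builds the mapping on the fly from a (hypothetical) environment flag.
// The numbers are placeholders chosen only to show that they differ.
constexpr AddrSpaceMap makeAddrSpaceMap(bool isAmdgizEnv) {
  return isAmdgizEnv ? AddrSpaceMap{5, 0} : AddrSpaceMap{0, 4};
}
```

Because only the two non-static members occupy storage, an object of this type stays small enough to create at the point of use, which matches the "lightweight and cheap" argument in the message.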
* AMDGPU: Support v2i16/v2f16 packed operations | Matt Arsenault | 2017-02-27 | 1 | -2/+67
  llvm-svn: 296396
* AMDGPU: Generalize matching of v_med3_f32 | Matt Arsenault | 2017-01-31 | 1 | -0/+20
  I think this is safe as long as no inputs are known to ever be nans.

  Also add an intrinsic for fmed3 to be able to handle all safe math cases.

  llvm-svn: 293598
* AMDGPU: Make i32 uaddo/usubo legal | Matt Arsenault | 2017-01-30 | 1 | -0/+17
  llvm-svn: 293514
* AMDGPU/SI: Move some ISel helpers into utils so they can be shared with GISel | Tom Stellard | 2017-01-27 | 1 | -13/+2
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
  Differential Revision: https://reviews.llvm.org/D29068
  llvm-svn: 293321
* AMDGPU: Remove modifiers from v_div_scale_* | Matt Arsenault | 2017-01-19 | 1 | -8/+2
  They seem to produce nonsense results when used. This should be applied to the release branch.

  llvm-svn: 292472
* AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes | Jan Vesely | 2017-01-06 | 1 | -0/+4
  This will make transition to SCRATCH_MEMORY easier

  Differential Revision: https://reviews.llvm.org/D24746
  llvm-svn: 291279
* AMDGPU: Select branch on undef to uniform scc branch | Matt Arsenault | 2016-12-15 | 1 | -0/+6
  llvm-svn: 289877
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2016-12-09 | 1 | -13/+30
  llvm-svn: 289282
* AMDGPU : Add S_SETREG instructions to fix fdiv precision issues. | Tom Stellard | 2016-12-07 | 1 | -0/+38
  Patch By: Wei Ding

  Summary: This patch fixes the fdiv precision issues.

  Reviewers: b-sumner, cfang, wdng, arsenm
  Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye
  Differential Revision: https://reviews.llvm.org/D26424
  llvm-svn: 288879
* AMDGPU/SI: Add back reverted SGPR spilling code, but disable it | Marek Olsak | 2016-11-25 | 1 | -1/+1
  suggested as a better solution by Matt

  llvm-svn: 287942
* Revert "AMDGPU: Make m0 unallocatable" | Marek Olsak | 2016-11-25 | 1 | -1/+1
  This reverts commit 124ad83dae04514f943902446520c859adee0e96.

  llvm-svn: 287932
* AMDGPU: Make m0 unallocatable | Matt Arsenault | 2016-11-24 | 1 | -1/+1
  m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it.

  This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0.

  llvm-svn: 287841
* AMDGPU: Remove unnecessary and on conditional branch | Matt Arsenault | 2016-11-07 | 1 | -16/+2
  The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems.

  llvm-svn: 286134
* AMDGPU: Handle CopyToReg in getOperandRegClass | Matt Arsenault | 2016-11-01 | 1 | -1/+14
  llvm-svn: 285768
* AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes | Nicolai Haehnle | 2016-10-14 | 1 | -10/+37
  Summary: This will be used for 64-bit MULHU, which is in turn used for the 64-bit divide-by-constant optimization (see D24822).

  Reviewers: arsenm, tstellarAMD
  Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
  Differential Revision: https://reviews.llvm.org/D25289
  llvm-svn: 284224
* [AMDGPU] Pass optimization level to SelectionDAGISel | Konstantin Zhuravlyov | 2016-10-03 | 1 | -6/+6
  llvm-svn: 283133
* Use StringRef in Pass/PassManager APIs (NFC) | Mehdi Amini | 2016-10-01 | 1 | -2/+2
  llvm-svn: 283004
* AMDGPU: Fix broken FrameIndex handling | Matt Arsenault | 2016-09-17 | 1 | -59/+9
  We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant.

  This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index.

  The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory.

  llvm-svn: 281824
* AMDGPU: Use i64 scalar compare instructions | Matt Arsenault | 2016-09-17 | 1 | -12/+27
  VI added eq/ne for i64, so use them.

  llvm-svn: 281800
* AMDGPU: Run LoadStoreVectorizer pass by default | Matt Arsenault | 2016-09-09 | 1 | -0/+3
  llvm-svn: 281112
* MachineFunction: Return reference for getFrameInfo(); NFC | Matthias Braun | 2016-07-28 | 1 | -2/+2
  getFrameInfo() never returns nullptr so we should use a reference instead of a pointer.

  llvm-svn: 277017
* AMDGPU: Remove analyzeImmediate | Matt Arsenault | 2016-07-28 | 1 | -5/+12
  This no longer uses the more complicated classification of constants.

  llvm-svn: 276945
* AMDGPU: Unify MOVRELSOffset and MOVRELDOffset | Nicolai Haehnle | 2016-07-12 | 1 | -30/+6
  Summary: Previously, constant index insertelements would be turned into SI_INDIRECT_DST, which is bound to prevent some optimization opportunities. Worse, it mislead the heuristic that decides whether immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D22217
  llvm-svn: 275160
* AMDGPU: Improve offset folding for register indexing | Matt Arsenault | 2016-07-09 | 1 | -0/+49
  llvm-svn: 274954
* AMDGPU/SI: Remove address space query functions from AMDGPUDAGToDAGISel | Tom Stellard | 2016-07-05 | 1 | -56/+3
  Summary: These have been replaced with TableGen code (except for isConstantLoad, which is still used for R600). The queries were broken for cases where MemOperand was a PseudoSourceValue.

  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21684
  llvm-svn: 274561
* AMDGPU/R600: Add PatFrags for selecting the correct vtx id for loads | Tom Stellard | 2016-07-05 | 1 | -5/+0
  This moves of the r600 logic out of isGlobalLoad() and into the TableGen files.

  Differential Revision: http://reviews.llvm.org/D21710
  llvm-svn: 274527
* AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructions | Tom Stellard | 2016-07-04 | 1 | -4/+0
  Summary: The isGlobalLoad() query was returning true for constant address space loads with memory types less than 32-bits, which is wrong.

  This logic has been replaced with PatFrag in the TableGen files, to provide the same functionality.

  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21696
  llvm-svn: 274521
* AMDGPU: Cleanup subtarget handling. | Matt Arsenault | 2016-06-24 | 1 | -1/+1
  Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.

  llvm-svn: 273652
* AMDGPU: Fix gcc warnings | Matt Arsenault | 2016-06-22 | 1 | -90/+0
  Mostly removing dead code. Apparently gcc's warning for unused functions is better

  llvm-svn: 273363
* Delete more dead code. | Rafael Espindola | 2016-06-21 | 1 | -32/+0
  Found by gcc 6.

  llvm-svn: 273322
* Delete some dead code. | Rafael Espindola | 2016-06-21 | 1 | -5/+0
  Found by gcc 6.

  llvm-svn: 273303
* Reformat blank lines. | NAKAMURA Takumi | 2016-06-20 | 1 | -1/+0
  llvm-svn: 273131
* Untabify. | NAKAMURA Takumi | 2016-06-20 | 1 | -5/+3
  llvm-svn: 273129
* AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics | Nicolai Haehnle | 2016-06-15 | 1 | -13/+30
  Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value.

  Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset.

  An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D21326
  llvm-svn: 272761
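The two constraints named in this message (offsets may be negative, and SOffset must not be used) can be illustrated with a small decision function. This is a sketch under stated assumptions, not the selection code from the patch: `BufferOffsets`, `canFoldIntoImm`, `selectOffsets`, and the 12-bit immediate width are hypothetical choices made for the example.

```cpp
#include <cstdint>

// Hypothetical result of splitting a constant offset for a buffer access.
struct BufferOffsets {
  std::uint32_t ImmOffset; // value placed in the unsigned immediate field
  std::uint32_t SOffset;   // always 0 here, mirroring the workaround
  bool FoldedConstant;     // false: the constant must stay in the address register
};

// Assumption: the immediate field is unsigned and 12 bits wide, so only
// constants in 0..4095 can be folded. Negative constants never qualify.
constexpr bool canFoldIntoImm(std::int64_t c) {
  return c >= 0 && c < (1 << 12);
}

// Folds the constant into the immediate when it is legal; otherwise leaves
// it for the caller to add into the register operand. SOffset is never used.
constexpr BufferOffsets selectOffsets(std::int64_t c) {
  return canFoldIntoImm(c)
             ? BufferOffsets{static_cast<std::uint32_t>(c), 0, true}
             : BufferOffsets{0, 0, false};
}
```

With these assumptions, `selectOffsets(16)` folds the constant, while `selectOffsets(-4)` and `selectOffsets(4096)` report that it must be handled in the register operand; in every case `SOffset` stays zero.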