summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Split MUBUF offset into aligned componentsNicolai Haehnle2017-10-101-10/+16
| | | | | | | | | | | | | | | | | | | | Summary: Atomic buffer operations do not work (and trap on gfx9) when the components are unaligned, even if their sum is aligned. Previously, we generated an offset of 4156 without an SGPR by splitting it as 4095 + 61 (immediate + inline constant). The highest offset for which we can do this correctly is 4156 = 4092 + 64. Fixes dEQP-GLES31.functional.ssbo.atomic.* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37850 llvm-svn: 315302
* AMDGPU: Start selecting v_mad_mixlo_f16Matt Arsenault2017-09-201-0/+9
| | | | | | | | Also add some tests that should be able to use v_mad_mixhi_f16, but do not yet. This is trickier because we don't really model the partial update of the register done by 16-bit instructions. llvm-svn: 313806
* AMDGPU: Match load d16 hi instructionsMatt Arsenault2017-09-201-6/+6
| | | | | | | | | | | | Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716
* [AMDGPU] Remove unused function. NFCI.Davide Italiano2017-09-081-9/+0
| | | | llvm-svn: 312836
* AMDGPU: Start selecting v_mad_mix_f32Matt Arsenault2017-09-071-5/+96
| | | | llvm-svn: 312732
* AMDGPU: Fix warnings introduced by r310336Tom Stellard2017-08-081-4/+2
| | | | llvm-svn: 310337
* AMDGPU: Move R600 parts of AMDGPUISelDAGToDAG into their own classTom Stellard2017-08-081-112/+177
| | | | | | | | | | | | Summary: This refactoring is required in order to split the R600 and GCN tablegen files. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D36286 llvm-svn: 310336
* AMDGPU: Add analysis pass for function argument infoMatt Arsenault2017-08-031-4/+17
| | | | | | | This will allow only adding necessary inputs to callee functions that need special inputs forwarded from the kernel. llvm-svn: 309996
* AMDGPU: Start selecting global instructionsMatt Arsenault2017-07-291-3/+17
| | | | llvm-svn: 309470
* [AMDGPU][MC][GFX9] Added support of VOP3 'op_sel' modifierDmitry Preobrazhensky2017-07-211-0/+44
| | | | | | | | | | See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591 Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D35424 llvm-svn: 308740
* AMDGPU: Rename _RTN atomic instructionsMatt Arsenault2017-07-201-4/+4
| | | | | | | | | | | Move the _RTN to the end of the name. It reads better if the other addressing mode components line up with the non-RTN version. It is also more convenient to define saddr variants of FLAT atomics to have the RTN last, and it is good to have a consistent naming scheme. llvm-svn: 308674
* AMDGPU: Start selecting flat instruction offsetsMatt Arsenault2017-06-121-7/+30
| | | | llvm-svn: 305201
* AMDGPU: Start adding offset fields to flat instructionsMatt Arsenault2017-06-121-1/+4
| | | | llvm-svn: 305194
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* Revert "AMDGPU: Fold CI-specific complex SMRD patterns into existing complex ↵Marek Olsak2017-05-241-12/+24
| | | | | | | | | | | patterns" This reverts commit e065977c4b5f68ab845400b256f6a3822b1325fa. It doesn't work. S_LOAD_DWORD_IMM_ci and friends aren't selected by any of the patterns, so it was putting 32-bit literals into the 8-bit field. llvm-svn: 303754
* AMDGPU: Fold CI-specific complex SMRD patterns into existing complex patternsMarek Olsak2017-05-231-24/+12
| | | | | | | | | | | | This is just a cleanup. Also, it adds checking that ByteCount is aligned to 4. Reviewers: arsenm, nhaehnle, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28994 llvm-svn: 303658
* AMDGPU: Change mubuf soffset register when SP relativeMatt Arsenault2017-05-171-13/+51
| | | | | | | | | | Check the MachinePointerInfo for whether the access is supposed to be relative to the stack pointer. No tests because this is used in later commits implementing calls. llvm-svn: 303301
* AMDGPU: Make better use of op_sel with high componentsMatt Arsenault2017-05-171-8/+48
| | | | | | Handle more general swizzles. llvm-svn: 303296
* AMDGPU: Try to use op_sel when selecting packed instructionsMatt Arsenault2017-05-171-1/+29
| | | | | | | | | | | | Avoids instructions to pack a vector when the source is really a scalar being broadcast. Also be smarter and look for per-component fneg. Doesn't yet handle scalar from upper half of register or other swizzles. llvm-svn: 303291
* AMDGPU: Remove tfe bit from flat instruction definitionsMatt Arsenault2017-05-111-5/+3
| | | | | | | | | | We don't use it and it was removed in gfx9, and the encoding bit repurposed. Additionally actually using it requires changing the output register class, which wasn't done anyway. llvm-svn: 302814
* Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.Amara Emerson2017-05-011-2/+2
| | | | | | | | This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803
* [AMDGPU] Garbage collect dead code. NFCI.Davide Italiano2017-04-261-10/+0
| | | | llvm-svn: 301375
* AMDGPU: Clean up VOP3NoMods patternMatt Arsenault2017-04-251-23/+12
| | | | | | | There is no need to copy the operands or inspect the sources. Also remove some unnecessary clamp/omod usage. llvm-svn: 301363
* AMDGPU: Select scratch mubuf offsets when pointer is a constantMatt Arsenault2017-04-241-7/+46
| | | | | | | | In call sequence setups, there may not be a frame index base and the pointer is a constant offset from the frame pointer / scratch wave offset register. llvm-svn: 301230
* AMDGPU: Fix invalid copies when copying i1 to phys regMatt Arsenault2017-04-121-1/+1
| | | | | | | Insert a VReg_1 virtual register so the i1 workaround pass can handle it. llvm-svn: 300113
* [AMDGPU][MC] Fix for Bug 28207 + LIT testsDmitry Preobrazhensky2017-03-271-0/+15
| | | | | | | | | | Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type Reviewers: vpykhtin, arsenm Differential Revision: https://reviews.llvm.org/D31327 llvm-svn: 298852
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-5/+8
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* AMDGPU: Support v2i16/v2f16 packed operationsMatt Arsenault2017-02-271-2/+67
| | | | llvm-svn: 296396
* AMDGPU: Generalize matching of v_med3_f32Matt Arsenault2017-01-311-0/+20
| | | | | | | | | | I think this is safe as long as no inputs are known to ever be nans. Also add an intrinsic for fmed3 to be able to handle all safe math cases. llvm-svn: 293598
* AMDGPU: Make i32 uaddo/usubo legalMatt Arsenault2017-01-301-0/+17
| | | | llvm-svn: 293514
* AMDGPU/SI: Move some ISel helpers into utils so they can be shared with GISelTom Stellard2017-01-271-13/+2
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D29068 llvm-svn: 293321
* AMDGPU: Remove modifiers from v_div_scale_*Matt Arsenault2017-01-191-8/+2
| | | | | | | | They seem to produce nonsense results when used. This should be applied to the release branch. llvm-svn: 292472
* AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodesJan Vesely2017-01-061-0/+4
| | | | | | | | This will make transition to SCRATCH_MEMORY easier Differential Revision: https://reviews.llvm.org/D24746 llvm-svn: 291279
* AMDGPU: Select branch on undef to uniform scc branchMatt Arsenault2016-12-151-0/+6
| | | | llvm-svn: 289877
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What ↵Eugene Zelenko2016-12-091-13/+30
| | | | | | You Use warnings; other minor fixes (NFC). llvm-svn: 289282
* AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.Tom Stellard2016-12-071-0/+38
| | | | | | | | | | | | | | Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879
* AMDGPU/SI: Add back reverted SGPR spilling code, but disable itMarek Olsak2016-11-251-1/+1
| | | | | | suggested as a better solution by Matt llvm-svn: 287942
* Revert "AMDGPU: Make m0 unallocatable"Marek Olsak2016-11-251-1/+1
| | | | | | This reverts commit 124ad83dae04514f943902446520c859adee0e96. llvm-svn: 287932
* AMDGPU: Make m0 unallocatableMatt Arsenault2016-11-241-1/+1
| | | | | | | | | | | m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0. llvm-svn: 287841
* AMDGPU: Remove unnecessary and on conditional branchMatt Arsenault2016-11-071-16/+2
| | | | | | | The comment explaining why this was necessary is incorrect in its description of v_cmp's behavior for inactive workitems. llvm-svn: 286134
* AMDGPU: Handle CopyToReg in getOperandRegClassMatt Arsenault2016-11-011-1/+14
| | | | llvm-svn: 285768
* AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodesNicolai Haehnle2016-10-141-10/+37
| | | | | | | | | | | | | | Summary: This will be used for 64-bit MULHU, which is in turn used for the 64-bit divide-by-constant optimization (see D24822). Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25289 llvm-svn: 284224
* [AMDGPU] Pass optimization level to SelectionDAGISelKonstantin Zhuravlyov2016-10-031-6/+6
| | | | llvm-svn: 283133
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-011-2/+2
| | | | llvm-svn: 283004
* AMDGPU: Fix broken FrameIndex handlingMatt Arsenault2016-09-171-59/+9
| | | | | | | | | | | | | | | | | We were trying to avoid using a FrameIndex operand in non-pointer operands in a convoluted way, and would break because of using TargetFrameIndex. The TargetFrameIndex should only be used in the case where it makes sense to fold it as part of the addressing mode, otherwise it requires materialization like a normal constant. This wasn't working reliably and failed in the added testcase, hitting the assert when processing the frame index. The TargetFrameIndex was coming from trying to produce an AssertZext limiting the maximum stack size. I'm not sure this was correct to begin with, because it is apparently possible to have a single workitem dispatch that requires all 4G of private memory. llvm-svn: 281824
* AMDGPU: Use i64 scalar compare instructionsMatt Arsenault2016-09-171-12/+27
| | | | | | VI added eq/ne for i64, so use them. llvm-svn: 281800
* AMDGPU: Run LoadStoreVectorizer pass by defaultMatt Arsenault2016-09-091-0/+3
| | | | llvm-svn: 281112
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-2/+2
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* AMDGPU: Remove analyzeImmediateMatt Arsenault2016-07-281-5/+12
| | | | | | | This no longer uses the more complicated classification of constants. llvm-svn: 276945
* AMDGPU: Unify MOVRELSOffset and MOVRELDOffsetNicolai Haehnle2016-07-121-30/+6
| | | | | | | | | | | | | | | | Summary: Previously, constant index insertelements would be turned into SI_INDIRECT_DST, which is bound to prevent some optimization opportunities. Worse, it mislead the heuristic that decides whether immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D22217 llvm-svn: 275160
OpenPOWER on IntegriCloud