summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Fixed SIInstrInfo::getOpSize to handle subregsStanislav Mekhanoshin2018-10-011-0/+5
| | | | | | | | | | Currently it returns incorrect operand size for a target independet node such as COPY if operand is a register with subreg. Instead of correct subreg size it returns a size of the whole superreg. Differential Revision: https://reviews.llvm.org/D52736 llvm-svn: 343508
* [AMDGPU] Divergence driven instruction selection. Shift operations.Alexander Timofeev2018-10-013-23/+45
| | | | | | | | | | Summary: This change enables VOP3 shifts to be explicitly selected dependent on the divergence. Differential Revision: https://reviews.llvm.org/D52559 Reviewers: rampitec llvm-svn: 343455
* [cxx2a] Fix warning triggered by r343285Vitaly Buka2018-09-291-1/+0
| | | | llvm-svn: 343369
* AMDGPU: Split HasExt into HasExtDPP/SDWA/SDWA9Konstantin Zhuravlyov2018-09-274-21/+43
| | | | llvm-svn: 343264
* AMDGPU: Split VOP2Inst into VOP2Inst_e32/e64/sdwaKonstantin Zhuravlyov2018-09-271-10/+32
| | | | llvm-svn: 343259
* AMDGPU/NFC: Simplify VOP_MAC_F16/F32Konstantin Zhuravlyov2018-09-271-11/+2
| | | | llvm-svn: 343254
* [AMDGPU] Fold copy (copy vgpr)Stanislav Mekhanoshin2018-09-271-0/+14
| | | | | | | | This allows to reduce a number of used VGPRs in some cases. Differential Revision: https://reviews.llvm.org/D52577 llvm-svn: 343249
* llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)Fangrui Song2018-09-272-12/+10
| | | | | | | | | | | | Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
* AMDGPU/SI: Change predicate to isCIOnly for 32-bit imm s_buffer_load* patternsTom Stellard2018-09-261-1/+1
| | | | | | | | | | | | | | | | | | Summary: This is essentially NFC, because the complex pattern used for these patterns will fail on non-CI, but this makes the pattern consistent with other CI smrd patterns. It is also a performance improvement, because the pattern will now fail earlier on non-CI. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52469 llvm-svn: 343125
* [AMDGPU] Fix ds combine with subregsStanislav Mekhanoshin2018-09-251-14/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D52522 llvm-svn: 343047
* AMDGPU: Add Selection patterns to support add of one bit.Changpeng Fang2018-09-251-0/+12
| | | | | | | | | | | | | | Summary: We generate s_xor to lower add of i1s in general cases, and s_not to lower add with a one-bit imm of -1 (true). Reviewers: rampitec Differential Revision: https://reviews.llvm.org/D52518 llvm-svn: 343030
* [AMDGPU] restore r342722 which was reverted with r342743Sameer Sahasrabuddhe2018-09-251-0/+2
| | | | | | | | | | | | | | | | | | | [AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956
* AMDGPU: Fix private handling for allowsMisalignedMemoryAccessesMatt Arsenault2018-09-241-1/+5
| | | | | | | | | | | | | If the alignment is at least 4, this should report true. Something still seems off with how < 4-byte types are handled here though. Fixing this seems to change how some combines get to where they get, but somehow isn't changing the net result. llvm-svn: 342879
* revert changes from r342722Sameer Sahasrabuddhe2018-09-211-2/+0
| | | | | | | | | "[AMDGPU] lower-switch in preISel as a workaround for legacy DA" This broke regression tests. The first breakage was noticed here: http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549 llvm-svn: 342743
* [AMDGPU] lower-switch in preISel as a workaround for legacy DASameer Sahasrabuddhe2018-09-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll Differential Revision: https://reviews.llvm.org/D52221 llvm-svn: 342722
* [AMDGPU] Divergence driven instruction selection. Part 1.Alexander Timofeev2018-09-216-47/+166
| | | | | | | | | | | | | Summary: This change is the first part of the AMDGPU target description change. The aim of it is the effective splitting the vector and scalar flows at the selection stage. Selection uses predicate functions based on the framework implemented earlier - https://reviews.llvm.org/D35267 Differential revision: https://reviews.llvm.org/D52019 Reviewers: rampitec llvm-svn: 342719
* [AMDGPU] Add instruction selection for i1 to f16 conversionCarl Ritson2018-09-191-0/+10
| | | | | | | | | | | | | | | | | | Summary: This is required for GPUs with 16 bit instructions where f16 is a legal register type and hence int_to_fp i1 to f16 is not lowered by legalizing. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52018 Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851 llvm-svn: 342558
* ScheduleDAG: Cleanup dumping code; NFCMatthias Braun2018-09-194-14/+12
| | | | | | | | | | | | - Instead of having both `SUnit::dump(ScheduleDAG*)` and `ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around. - Add `ScheduleDAG::dump()` and avoid code duplication in several places. Implement it for different ScheduleDAG variants. - Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()` functions. They were only ever used for debug dumping and putting the function into ScheduleDAG is consistent with the `dumpNode()` change. llvm-svn: 342520
* [AMDGPU] Match udot8 patternFarhana Aleen2018-09-181-22/+47
| | | | | | | | | | | | | | | | | | | | | Summary: D.u32 = S0.u4[0] * S1.u4[0] + S0.u4[1] * S1.u4[1] + S0.u4[2] * S1.u4[2] + S0.u4[3] * S1.u4[3] + S0.u4[4] * S1.u4[4] + S0.u4[5] * S1.u4[5] + S0.u4[6] * S1.u4[6] + S0.u4[7] * S1.u4[7] + S2.u32 Author: FarhanaAleen Reviewed By: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D51947 llvm-svn: 342497
* AMDGPU: Don't form fmed3 if it will require materializationMatt Arsenault2018-09-181-2/+9
| | | | | | | If there is a single use constant, it can be folded into the min/max, but not into med3. llvm-svn: 342443
* AMDGPU: Expand vector canonicalizesMatt Arsenault2018-09-181-0/+1
| | | | llvm-svn: 342439
* [AMDGPU] Initialize instruction itinerary from GCNSubtargetStanislav Mekhanoshin2018-09-172-0/+6
| | | | | | | | I need to use it in the GCN codegen. Differential Revision: https://reviews.llvm.org/D52123 llvm-svn: 342400
* AMDGPU: Clear the bits before they are being set in program resource registersKonstantin Zhuravlyov2018-09-141-0/+1
| | | | | | Change by Tony Tye llvm-svn: 342270
* [AMDGPU] Ensure trig range reduction only used for subtargets that require itDavid Stuttard2018-09-144-9/+28
| | | | | | | | | | | | | | | | | | Summary: GFX9 and above support sin/cos instructions with a greater range and thus don't require a fract instruction prior to invocation. Added a subtarget feature to reflect this and added code to take advantage of expanded range on GFX9+ Also updated the tests to check correct behaviour Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51933 Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0 llvm-svn: 342222
* [AMDGPU] Removed unused methodTim Renouf2018-09-131-22/+0
| | | | | | | | | | | | | Summary: I accidentally left this behind in D50306, and it causes a build warning when I build with gcc7. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52022 Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c llvm-svn: 342189
* AMDGPU: Fix not preserving alignent in call setupsMatt Arsenault2018-09-131-1/+7
| | | | | | | | | | | | If an argument was passed on the stack, this was using the default alignment. I'm not sure there's an observable change from this. This was observable due to bugs in expansion of unaligned loads and stores, but since that is fixed I don't think this matters much. llvm-svn: 342133
* [AMDGPU] Load divergence predicate refactoringAlexander Timofeev2018-09-132-8/+26
| | | | | | | | Differential revision: https://reviews.llvm.org/D51931 Reviewers: rampitec llvm-svn: 342120
* [AMDGPU] Preliminary patch for divergence driven instruction selection. ↵Alexander Timofeev2018-09-131-0/+1
| | | | | | | | | | Load offset inlining pattern changed. Differential revision: https://reviews.llvm.org/D51975 Reviewers: rampitec llvm-svn: 342115
* AMDGPU: Print all kernel descriptor directives (including the ones with ↵Konstantin Zhuravlyov2018-09-121-101/+88
| | | | | | | | | | default values) Change by Tony Tye Differential Revision: https://reviews.llvm.org/D51954 llvm-svn: 342077
* AMDGPU: Re-apply r341982 after fixing the layering issueKonstantin Zhuravlyov2018-09-1211-391/+364
| | | | | | | | | | | | Move isa version determination into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). llvm-svn: 342069
* Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into ↵Ilya Biryukov2018-09-1211-274/+391
| | | | | | | | | | | TargetParser." This reverts commit r341982. The change introduced a layering violation. Reverting to unbreak our integrate. llvm-svn: 342023
* AMDGPU: Move isa version and EF_AMDGPU_MACH_* determinationKonstantin Zhuravlyov2018-09-1111-391/+274
| | | | | | | | | | | | | | into TargetParser. Also switch away from target features to CPU string when determining isa version. This fixes an issue when we output wrong isa version in the object code when features of a particular CPU are altered (i.e. gfx902 w/o xnack used to result in gfx900). Differential Revision: https://reviews.llvm.org/D51890 llvm-svn: 341982
* [AMDGPU] Preliminary patch for divergence driven instruction selection. ↵Alexander Timofeev2018-09-114-21/+62
| | | | | | | | | Immediate selection predicate changed Differential revision: https://reviews.llvm.org/D51734 Reviewers: rampitec llvm-svn: 341928
* AMDGPU: Remove leftovers from configurable address spacesMatt Arsenault2018-09-112-34/+12
| | | | llvm-svn: 341895
* [AMDGPU] Preliminary patch for divergence driven instruction selection. ↵Alexander Timofeev2018-09-101-5/+33
| | | | | | | | | | Inline immediate move to V_MADAK_F32. Differential revision: https://reviews.llvm.org/D51586 Reviewer: rampitec llvm-svn: 341843
* AMDGPU: Remove function pointer type hackMatt Arsenault2018-09-101-7/+4
| | | | | | | Now the pointer size should always be correct and we don't need to improperly inspect the pointee type. llvm-svn: 341806
* AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bitMatt Arsenault2018-09-101-2/+1
| | | | | | | This will require something to cast. Before this would eliminate the cast, which would result in copies of $noreg. llvm-svn: 341803
* DAG: Handle odd vector sizes in calling conv splittingMatt Arsenault2018-09-101-8/+5
| | | | | | | | | | | | | | This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces. Fixes not splitting 3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets. llvm-svn: 341801
* [AMDGPU] Prevent sequences of non-instructions disrupting ↵Carl Ritson2018-09-101-2/+9
| | | | | | | | | | | | | | | | | | GCNHazardRecognizer wait state counting Summary: This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted. Reviewers: nhaehnle, arsenm Reviewed By: arsenm Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D51726 Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b llvm-svn: 341798
* AMDGPU: Use GOT PSV since it has an address space nowMatt Arsenault2018-09-101-2/+2
| | | | llvm-svn: 341768
* AMDGPU: Don't abort on unknown addrspace argumentMatt Arsenault2018-09-101-8/+10
| | | | llvm-svn: 341767
* [AMDGPU] Preliminary patch for divergence driven instruction selection. Fold ↵Alexander Timofeev2018-09-071-3/+11
| | | | | | | | | immediate SMRD offset. Differential revision: https://reviews.llvm.org/D51610 Reviewer: rampitec llvm-svn: 341636
* Revert r341413Scott Linder2018-09-063-232/+67
| | | | | | Causes a regression in expensive checks. llvm-svn: 341589
* AMDGPU: Remove old hack for function addressesMatt Arsenault2018-09-061-13/+0
| | | | llvm-svn: 341567
* [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructionsScott Linder2018-09-043-67/+232
| | | | | | | | | Emit a waterfall loop in the general case for a potentially-divergent Rsrc operand. When practical, avoid this by using Addr64 instructions. Differential Revision: https://reviews.llvm.org/D50982 llvm-svn: 341413
* AMDGPU: Fix DAG divergence not reporting flat loadsMatt Arsenault2018-09-041-4/+4
| | | | | | Match behavior in DAG of r340343 llvm-svn: 341393
* Remove unnecessary semicolon to silence -Wpedantic warning. NFCI.Simon Pilgrim2018-09-031-1/+1
| | | | llvm-svn: 341303
* AMDGPU/GlobalISel: Define instruction mapping for G_SELECTTom Stellard2018-09-011-0/+54
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49737 llvm-svn: 341271
* [AMDGPU] Split v32i32 loadsStanislav Mekhanoshin2018-08-311-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D51555 llvm-svn: 341266
* AMDGPU: Restrict extract_vector_elt combine to loadsMatt Arsenault2018-08-311-1/+2
| | | | | | | | | | | The intention is to enable the extract_vector_elt load combine, and doing this for other operations interferes with more useful optimizations on vectors. Handle any type of load since in principle we should do the same combine for the various load intrinsics. llvm-svn: 341219
OpenPOWER on IntegriCloud