path: root/llvm/lib/Target/AMDGPU
...
* AMDGPU/GlobalISel: Fix regbankselect for uniform extloads (Matt Arsenault, 2019-09-09; 1 file, -4/+4)
  There are no scalar extloads.
  llvm-svn: 371414
* AMDGPU: Remove code address space predicates (Matt Arsenault, 2019-09-09; 3 files, -25/+57)
  Fixes 8-byte, 8-byte aligned LDS loads. The 16-byte case is still broken due to not being reported as legal.
  llvm-svn: 371413
* AMDGPU/GlobalISel: Select G_PTR_MASK (Matt Arsenault, 2019-09-09; 3 files, -0/+70)
  llvm-svn: 371412
* AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads (Matt Arsenault, 2019-09-09; 1 file, -8/+14)
  The pointer is always a VGPR. Also fix hardcoding the pointer size to 64.
  llvm-svn: 371411
* AMDGPU/GlobalISel: Use known bits for selection (Matt Arsenault, 2019-09-09; 1 file, -8/+3)
  llvm-svn: 371409
* AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic (Matt Arsenault, 2019-09-09; 1 file, -0/+6)
  llvm-svn: 371407
* AMDGPU/GlobalISel: Try generated matcher before add/sub code (Matt Arsenault, 2019-09-09; 1 file, -4/+4)
  This will allow optimization patterns which fold adds away to work.
  llvm-svn: 371406
* AMDGPU/GlobalISel: Remove dead patterns (Matt Arsenault, 2019-09-09; 1 file, -5/+0)
  llvm-svn: 371404
* AMDGPU: Remove pointless wrapper nodes for init.exec intrinsics (Matt Arsenault, 2019-09-09; 5 files, -28/+6)
  llvm-svn: 371364
* Change TargetLibraryInfo analysis passes to always require Function (Teresa Johnson, 2019-09-07; 1 file, -8/+12)
  Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See the discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example.
  This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration.
  Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works.
  There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome.
  Reviewers: chandlerc, hfinkel
  Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66428
  llvm-svn: 371284
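  As a sketch of the mechanical update the message describes, a legacy pass now hands its Function to getTLI. This assumes the post-D66428 getTLI(const Function &) overload; the pass and its simplifyUsingTLI helper are hypothetical.

```cpp
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/Pass.h"

using namespace llvm;

// Hedged sketch of the per-function TLI query in a legacy FunctionPass.
struct ExamplePass : FunctionPass {
  static char ID;
  ExamplePass() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequired<TargetLibraryInfoWrapperPass>();
  }

  bool runOnFunction(Function &F) override {
    // Previously: getTLI() took no argument. Passing F is what enables a
    // per-function TLI (e.g. honoring -fno-builtin* attributes) later.
    const TargetLibraryInfo &TLI =
        getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
    return simplifyUsingTLI(F, TLI); // hypothetical transformation
  }

  bool simplifyUsingTLI(Function &, const TargetLibraryInfo &) {
    return false; // placeholder
  }
};
```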
* GlobalISel: Support physical register inputs in patterns (Matt Arsenault, 2019-09-06; 1 file, -5/+7)
  llvm-svn: 371253
* AMDGPU: Fix typo (Matt Arsenault, 2019-09-06; 1 file, -4/+4)
  llvm-svn: 371249
* [AMDGPU] Enable constant offset promotion to immediate operand for VMEM stores (Valery Pykhtin, 2019-09-06; 1 file, -4/+5)
  Differential Revision: https://reviews.llvm.org/D66958
  llvm-svn: 371214
* [AMDGPU] Mark s_barrier as having side effects but not accessing memory (Jay Foad, 2019-09-06; 1 file, -2/+0)
  Summary: This fixes poor scheduling in a function containing a barrier and a few load instructions. Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial edge in the dependency graph from the barrier instruction to the exit node representing live-out latency, with a latency of about 500 cycles. Because of this it thinks the critical path through the graph also has a latency of about 500 cycles, and so it does not think that any of the load instructions are on the critical path; it therefore schedules them with no regard for their (80 cycle) latency, which gives poor results.
  Reviewers: arsenm, dstuttard, tpr, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67218
  llvm-svn: 371192
* AMDGPU/GlobalISel: Avoid repeating 32-bit type lists (Matt Arsenault, 2019-09-06; 4 files, -6/+14)
  llvm-svn: 371156
* AMDGPU/GlobalISel: Fix load/store of types in other address spaces (Matt Arsenault, 2019-09-06; 2 files, -5/+26)
  There should probably be a size-only matcher.
  llvm-svn: 371155
* AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses (Matt Arsenault, 2019-09-05; 1 file, -2/+19)
  Report soffset as a base register if the scratch resource can be ignored.
  llvm-svn: 371149
* AMDGPU: Fix emitting multiple stack loads for stack passed workitems (Matt Arsenault, 2019-09-05; 1 file, -1/+15)
  The same stack slot is loaded for each workitem ID, and for each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset. Re-use the same frame index so the loads are CSEable.
  llvm-svn: 371148
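  An illustrative sketch of the de-duplication idea (the class and its names are hypothetical, not the actual lowering code): remember the frame index created for each offset so repeated queries return the same index and the loads can CSE.

```cpp
#include "llvm/ADT/DenseMap.h"
#include <cstdint>

// Hedged sketch: one fixed stack object per offset. Creating a fresh frame
// index for every use yields distinct loads the compiler cannot CSE, even
// though they all read the same stack slot.
class FixedObjectCache {
  llvm::DenseMap<int64_t, int> FrameIndexForOffset;

public:
  // CreateFixedObjectFn stands in for MachineFrameInfo::CreateFixedObject.
  template <typename CreateFixedObjectFn>
  int getOrCreate(int64_t Offset, CreateFixedObjectFn Create) {
    auto It = FrameIndexForOffset.find(Offset);
    if (It != FrameIndexForOffset.end())
      return It->second;     // reuse: loads from this index are CSEable
    int FI = Create(Offset); // create the fixed object once per offset
    FrameIndexForOffset.try_emplace(Offset, FI);
    return FI;
  }
};
```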
* AMDGPU: Fix Register copy-paste error (Matt Arsenault, 2019-09-05; 1 file, -2/+2)
  llvm-svn: 371141
* AMDGPU: Avoid constructing new std::vector in initCandidate (Matt Arsenault, 2019-09-05; 2 files, -2/+5)
  Approximately 30% of the time was spent in the std::vector constructor. In one testcase this pushes the scheduler to being the second slowest pass.
  I'm not sure I understand why these vectors are necessary. The default scheduler's initCandidate seems to use some pre-existing vectors for the pressure.
  llvm-svn: 371136
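  A minimal sketch of the general remedy, with hypothetical names (the real change is in the GCN scheduler's initCandidate): keep the pressure vectors as reusable scratch storage instead of constructing fresh std::vectors on every call.

```cpp
#include <vector>

// Hedged sketch: reusing a scratch buffer avoids paying an allocation in a
// hot, per-candidate path; clear() keeps the previously grown capacity.
struct SchedCandidateScratch {
  std::vector<unsigned> Pressure;    // reused across initCandidate calls
  std::vector<unsigned> MaxPressure; // ditto

  void initCandidate() {
    Pressure.clear();    // no reallocation if capacity already suffices
    MaxPressure.clear();
    // ... fill Pressure/MaxPressure and evaluate the candidate ...
  }
};
```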
* [LLVM][Alignment] Make functions using log of alignment explicit (Guillaume Chatelet, 2019-09-05; 4 files, -16/+15)
  Summary: This patch renames functions that take or return alignment as log2; it will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power-of-two alignment.
  A few renames uncovered dubious assignments:
  - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using log2(align). This patch fixes it and updates the documentation.
  - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power-of-two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
  - `MachineFunction` exposes `align-all-functions`, also interpreted as a power-of-two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
  Reviewers: lattner, thegameg, courbet
  Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65945
  llvm-svn: 371045
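  To make the renamed distinction concrete, a small self-contained example of the two readings of the same number (illustrative only):

```cpp
#include <cassert>
#include <cstdint>

// '4' read as log2(align) means a 16-byte alignment; read as a power-of-two
// alignment it means 4 bytes. The renaming makes the log2 reading explicit.
uint64_t alignmentFromLog2(unsigned LogAlign) {
  return uint64_t(1) << LogAlign;
}

int main() {
  assert(alignmentFromLog2(4) == 16);
  assert(alignmentFromLog2(0) == 1); // log2(align) == 0 is 1-byte alignment
  return 0;
}
```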
* AMDGPU: Add intrinsics for address space identification (Matt Arsenault, 2019-09-05; 5 files, -1/+50)
  The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture.
  llvm-svn: 371009
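  A hedged usage sketch of what such intrinsics enable in device code, assuming clang exposes them as __builtin_amdgcn_is_shared / __builtin_amdgcn_is_private (treat those builtin names as an assumption):

```cpp
// Compiled for an amdgcn target. Unlike the ptrtoint + queue-pointer
// comparison the message describes, querying the builtin does not count
// as capturing the pointer.
extern "C" int classify_ptr(const void *P) {
  if (__builtin_amdgcn_is_shared(P))
    return 1; // LDS (local) memory
  if (__builtin_amdgcn_is_private(P))
    return 2; // scratch (private) memory
  return 0;   // global or other
}
```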
* AMDGPU/GlobalISel: Restore insert point when getting aperture (Matt Arsenault, 2019-09-05; 1 file, -0/+6)
  Avoids SSA violations in a future patch.
  llvm-svn: 371008
* AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast (Matt Arsenault, 2019-09-05; 1 file, -4/+6)
  llvm-svn: 371007
* AMDGPU/GlobalISel: Fix assert on load from constant address (Matt Arsenault, 2019-09-05; 1 file, -4/+4)
  llvm-svn: 371006
* AMDGPU/GlobalISel: Select G_BITREVERSE (Matt Arsenault, 2019-09-04; 2 files, -1/+2)
  llvm-svn: 370980
* GlobalISel: Add basic legalization for G_BITREVERSE (Matt Arsenault, 2019-09-04; 1 file, -1/+1)
  llvm-svn: 370979
* AMDGPU: Handle frame index expansion with no free SGPRs pre-gfx9 (Matt Arsenault, 2019-09-04; 2 files, -26/+58)
  Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed.
  This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR.
  llvm-svn: 370929
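  The "done in place and reversed" fallback can be illustrated with plain scalar arithmetic (a sketch of the idea, not the emitted MIR): mutate the frame register, read the result, then undo the mutation.

```cpp
#include <cstdint>

// Hedged sketch: with no SGPR free to hold a copy, the offset computation
// happens in the frame register itself and is then reversed, so the
// register's value is unchanged afterwards.
uint32_t materializeFrameAddress(uint32_t &FrameReg, uint32_t Offset) {
  FrameReg += Offset;         // compute in place (e.g. s_add_u32)
  uint32_t Result = FrameReg; // the VGPR operand reads the sum (v_mov)
  FrameReg -= Offset;         // reverse the computation (e.g. s_sub_u32)
  return Result;
}
```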
* AMDGPU/GlobalISel: Make 16-bit constants legal (Matt Arsenault, 2019-09-04; 1 file, -11/+5)
  This is mostly for the benefit of patterns which use 16-bit constants.
  llvm-svn: 370921
* [GlobalISel][CallLowering] Add support for splitting types according to calling conventions (Amara Emerson, 2019-09-03; 1 file, -1/+2)
  On AArch64, s128 types have to be split into s64 GPRs when passed as arguments. This change adds the generic support in call lowering for dealing with multiple registers, for incoming and outgoing args. Support for splitting return types is not yet implemented.
  Differential Revision: https://reviews.llvm.org/D66180
  llvm-svn: 370822
* Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" (Jay Foad, 2019-09-02; 2 files, -5/+2)
  Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet.
  Reviewers: nhaehnle, arsenm, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65813
  llvm-svn: 370667
* [AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null (Dmitry Preobrazhensky, 2019-09-02; 1 file, -3/+6)
  See AMD SWDEV-157286.
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65229
  llvm-svn: 370665
* [AMDGPU][MC][GFX10] Enabled null with 64-bit operands (Dmitry Preobrazhensky, 2019-09-02; 1 file, -0/+2)
  See bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65231
  llvm-svn: 370660
* [AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions (Dmitry Preobrazhensky, 2019-09-02; 1 file, -4/+23)
  See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744
  Reviewers: atamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D65228
  llvm-svn: 370652
* AMDGPU: Remove unused custom node definition (Matt Arsenault, 2019-09-01; 3 files, -12/+0)
  llvm-svn: 370603
* [NFC] Fixed -Wdocumentation warning (David Bolvansky, 2019-08-31; 1 file, -8/+8)
  /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/lib/Target/AMDGPU/AMDGPUGenRegisterBankInfo.def:98:1: warning: not a Doxygen trailing comment [-Wdocumentation]
  1 warning generated.
  llvm-svn: 370596
* Fix the build for MSVC builds using M_PI (Reid Kleckner, 2019-08-29; 1 file, -0/+7)
  llvm-svn: 370405
* AMDGPU/GlobalISel: Legalize sin/cos (Matt Arsenault, 2019-08-29; 2 files, -0/+43)
  llvm-svn: 370402
* AMDGPU: Don't use frame virtual registers (Matt Arsenault, 2019-08-29; 3 files, -41/+66)
  SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway, since the frame register can be incremented and restored.
  This does present another possible issue if spilling is needed for the unused carry out when an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway).
  llvm-svn: 370281
* GlobalISel/TableGen: Handle setcc patterns (Matt Arsenault, 2019-08-29; 2 files, -5/+4)
  This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually selected for now, since it has the SALU and VALU complication to deal with.
  llvm-svn: 370280
* [AMDGPU] Fix bug when calculating user_sgpr_count for Code Object V3 assembler (Scott Linder, 2019-08-28; 1 file, -7/+14)
  Stop counting explicitly disabled user_sgprs in the user_sgpr_count field of the kernel descriptor.
  Differential Revision: https://reviews.llvm.org/D66900
  llvm-svn: 370250
* [AMDGPU] Adjust number of SGPRs available in Calling Convention (Ryan Taylor, 2019-08-28; 1 file, -18/+2)
  This reduces the number of SGPRs available to the calling convention, due to concerns about running out of SGPRs if all non-reserved SGPRs are made available.
  Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
  llvm-svn: 370215
* AMDGPU/GlobalISel: Fix constraining scalar and/or/xor (Matt Arsenault, 2019-08-28; 1 file, -8/+1)
  If the result register already had a register class assigned, the sources may not have been properly constrained.
  llvm-svn: 370150
* AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace (Matt Arsenault, 2019-08-28; 1 file, -8/+31)
  llvm-svn: 370140
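  A hedged GlobalISel-flavored sketch of the widening such a cast performs (not the actual AMDGPULegalizerInfo code; the helper name, the destination address-space numbering, and where HighBits comes from are assumptions — the amdgpu-32bit-address-high-bits attribute serialized in the commit below suggests the source): combine the 32-bit pointer with the known high 32 bits to form the 64-bit pointer.

```cpp
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"

using namespace llvm;

// Sketch: cast a pointer in a 32-bit constant address space to a 64-bit one
// when the high half of the address window is a known constant (HighBits).
Register cast32BitConstPtr(MachineIRBuilder &B, Register Src,
                           uint32_t HighBits) {
  const LLT S64 = LLT::scalar(64);
  const LLT DstPtr = LLT::pointer(0, 64); // assumed destination space
  Register Low = B.buildPtrToInt(LLT::scalar(32), Src).getReg(0);
  Register Wide = B.buildZExt(S64, Low).getReg(0);
  Register Hi = B.buildConstant(S64, uint64_t(HighBits) << 32).getReg(0);
  Register Full = B.buildOr(S64, Wide, Hi).getReg(0); // splice in high half
  return B.buildIntToPtr(DstPtr, Full).getReg(0);
}
```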
* AMDGPU: Add amdgpu-32bit-address-high-bits to MIR serialization (Matt Arsenault, 2019-08-27; 2 files, -1/+6)
  llvm-svn: 370089
* AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16 (Matt Arsenault, 2019-08-27; 1 file, -3/+3)
  This is something of a workaround since computeRegisterProperties seems to be doing the wrong thing.
  llvm-svn: 370086
* AMDGPU: Combine directly on mul24 intrinsics (Matt Arsenault, 2019-08-27; 2 files, -3/+28)
  The problem these are supposed to work around can occur before the intrinsics are lowered into the nodes. Try to directly simplify them so they are matched before the bit assert operations can be optimized out.
  llvm-svn: 369994
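  For context, a hedged sketch of the eligibility check a mul24 combine depends on (illustrative, not the actual AMDGPU combine code): both 32-bit operands must be provably representable in 24 bits.

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Support/KnownBits.h"

using namespace llvm;

// Sketch: an unsigned 24-bit multiply is safe when the top 8 bits of each
// 32-bit operand are known zero, so the hardware mul24 sees the full value.
static bool fitsUnsignedMul24(SelectionDAG &DAG, SDValue A, SDValue B) {
  KnownBits KA = DAG.computeKnownBits(A);
  KnownBits KB = DAG.computeKnownBits(B);
  return KA.countMinLeadingZeros() >= 8 && KB.countMinLeadingZeros() >= 8;
}
```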
* AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts (Matt Arsenault, 2019-08-27; 1 file, -6/+5)
  The mul24 matching could interfere with SLSR and the other addressing-mode-related passes. This is probably not the optimal placement, but it is an intermediate step. This should probably be moved after all the generic IR passes, particularly LSR.
  Moving this after LSR seems to help in some cases, and hurts others. As-is in this patch, in idiv-licm, it saves 1-2 instructions inside some of the loop bodies, but increases the number in others. Moving this later helps these loops. In the new LSR tests in mul24-pass-ordering, the intrinsic prevents introducing more instructions in the loop preheader, so moving this later ends up hurting them.
  This shouldn't be any worse than before the intrinsics were introduced in r366094, and LSR should probably be smarter. I think it's because it doesn't know the `and` inside the loop will be folded away.
  llvm-svn: 369991
* [AMDGPU] Downgrade from StringLiteral to const char* in an attempt to make GCC 5 happy (Benjamin Kramer, 2019-08-25; 1 file, -3/+3)
  llvm-svn: 369867
* AMDGPU: Preserve value name when inserting mul24 intrinsic (Matt Arsenault, 2019-08-24; 1 file, -1/+3)
  llvm-svn: 369857