path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU/GlobalISel: Legalize constant 32-bit loads (Matt Arsenault, 2019-09-10; 1 file, -0/+64)
  Legalize by casting to a 64-bit constant address. This isn't how the DAG implements it, but it should.
  llvm-svn: 371535
* AMDGPU/GlobalISel: First pass at attempting to legalize load/stores (Matt Arsenault, 2019-09-10; 10 files, -1058/+55549)
  There's still a lot more to do, but this handles decomposing due to alignment. I've gotten it to the point where nothing crashes or sends the legalizer into an infinite loop.
  llvm-svn: 371533
* [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. (Alexander Timofeev, 2019-09-10; 2 files, -1/+55)
  Reviewers: rampitec, vpykhtin
  Differential Revision: https://reviews.llvm.org/D67101
  llvm-svn: 371508
* AMDGPU/GlobalISel: Fix insert point when lowering fminnum/fmaxnum (Matt Arsenault, 2019-09-09; 2 files, -206/+206)
  llvm-svn: 371471
* AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR v2s16 (Matt Arsenault, 2019-09-09; 1 file, -0/+99)
  Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will probably be more convenient in most cases.
  llvm-svn: 371440
* AMDGPU: Make VReg_1 size be 1 (Matt Arsenault, 2019-09-09; 3 files, -18/+18)
  This was getting chosen as the preferred 32-bit register class based on how TableGen selects subregister classes.
  llvm-svn: 371438
* AMDGPU/GlobalISel: Select llvm.amdgcn.class (Matt Arsenault, 2019-09-09; 2 files, -0/+271)
  Also fixes missing SubtargetPredicate on f16 class instructions.
  llvm-svn: 371436
* AMDGPU/GlobalISel: Select fmed3 (Matt Arsenault, 2019-09-09; 2 files, -0/+266)
  llvm-svn: 371435
* AMDGPU: Use PatFrags to allow selecting custom nodes or intrinsics (Matt Arsenault, 2019-09-09; 15 files, -0/+915)
  This enables GlobalISel to handle various intrinsics. The custom node pattern will be ignored, and the intrinsic will work. This will also allow SelectionDAG to directly select the intrinsics, but as they are all custom lowered to the nodes, this ends up leaving dead code in the table.
  Eventually either GlobalISel should add the equivalent of custom nodes, or the intrinsics should be used directly. These each have different tradeoffs.
  There are a few more to handle, but these are the easy ones. Some others fail for other reasons.
  llvm-svn: 371432
* AMDGPU: Move MnemonicAlias out of instruction def hierarchy (Matt Arsenault, 2019-09-09; 3 files, -55/+54)
  Unfortunately MnemonicAlias defines a "Predicates" field just like an instruction or pattern, with a somewhat different interpretation. This ends up overriding the intended Predicates set by PredicateControl on the pseudoinstruction definitions with an empty list. This allowed incorrectly selecting instructions that should have been rejected due to the SubtargetPredicate from patterns on the instruction definition.
  This does remove the divergent predicate from the 64-bit shift patterns, which were already not used for the 32-bit shift, so I'm not sure what the point was. This also removes a second, redundant copy of the 64-bit divergent patterns.
  llvm-svn: 371427
* AMDGPU/GlobalISel: Implement LDS G_GLOBAL_VALUE (Matt Arsenault, 2019-09-09; 4 files, -0/+54)
  Handle the simple case that lowers to a constant.
  llvm-svn: 371424
* AMDGPU/GlobalISel: Legalize G_BUILD_VECTOR_TRUNC (Matt Arsenault, 2019-09-09; 2 files, -0/+102)
  Treat this as legal on gfx9 since it can use S_PACK_* instructions for this. This isn't used by anything yet. The same will probably apply to 16-bit G_BUILD_VECTOR without the trunc.
  llvm-svn: 371423
* AMDGPU/GlobalISel: Select atomic loads (Matt Arsenault, 2019-09-09; 5 files, -152/+985)
  A new check for an explicitly atomic MMO is needed to avoid incorrectly matching patterns intended for non-atomic loads.
  llvm-svn: 371418
* AMDGPU/GlobalISel: Fix RegBankSelect for unaligned, uniform constant loads (Matt Arsenault, 2019-09-09; 1 file, -2/+51)
  llvm-svn: 371416
* AMDGPU/GlobalISel: Fix regbankselect for uniform extloads (Matt Arsenault, 2019-09-09; 1 file, -18/+85)
  There are no scalar extloads.
  llvm-svn: 371414
* AMDGPU: Remove code address space predicates (Matt Arsenault, 2019-09-09; 2 files, -4/+342)
  Fixes 8-byte, 8-byte aligned LDS loads. The 16-byte case is still broken due to not being reported as legal.
  llvm-svn: 371413
* AMDGPU/GlobalISel: Select G_PTR_MASK (Matt Arsenault, 2019-09-09; 1 file, -0/+475)
  llvm-svn: 371412
* AMDGPU/GlobalISel: Fix reg bank for uniform LDS loads (Matt Arsenault, 2019-09-09; 1 file, -2/+35)
  The pointer is always a VGPR. Also fix hardcoding the pointer size to 64.
  llvm-svn: 371411
* AMDGPU/GlobalISel: Use known bits for selection (Matt Arsenault, 2019-09-09; 2 files, -0/+85)
  llvm-svn: 371409
* AMDGPU/GlobalISel: Legalize wavefrontsize intrinsic (Matt Arsenault, 2019-09-09; 1 file, -0/+18)
  llvm-svn: 371407
* GlobalISel: Support physical register inputs in patterns (Matt Arsenault, 2019-09-06; 4 files, -11/+49)
  llvm-svn: 371253
* [AMDGPU] Enable constant offset promotion to immediate operand for VMEM stores (Valery Pykhtin, 2019-09-06; 1 file, -0/+24)
  Differential revision: https://reviews.llvm.org/D66958
  llvm-svn: 371214
* [AMDGPU] Mark s_barrier as having side effects but not accessing memory. (Jay Foad, 2019-09-06; 5 files, -65/+112)
  Summary: This fixes poor scheduling in a function containing a barrier and a few load instructions. Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial edge in the dependency graph from the barrier instruction to the exit node representing live-out latency, with a latency of about 500 cycles. Because of this it thinks the critical path through the graph also has a latency of about 500 cycles. And because of that it does not think that any of the load instructions are on the critical path, so it schedules them with no regard for their (80 cycle) latency, which gives poor results.
  Reviewers: arsenm, dstuttard, tpr, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67218
  llvm-svn: 371192
* AMDGPU/GlobalISel: Fix load/store of types in other address spaces (Matt Arsenault, 2019-09-06; 3 files, -90/+90)
  There should probably be a size-only matcher.
  llvm-svn: 371155
* AMDGPU: Allow getMemOperandWithOffset to analyze stack accesses (Matt Arsenault, 2019-09-05; 5 files, -41/+39)
  Report soffset as a base register if the scratch resource can be ignored.
  llvm-svn: 371149
* AMDGPU: Fix emitting multiple stack loads for stack passed workitems (Matt Arsenault, 2019-09-05; 1 file, -11/+17)
  The same stack is loaded for each workitem ID, and each use. Nothing prevents you from creating multiple fixed stack objects with the same offsets, so this was creating a load for each unique frame index, despite them being the same offset. Re-use the same frame index so the loads are CSEable.
  llvm-svn: 371148
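  The de-duplication described above can be sketched outside of LLVM. This is a hedged illustration, not LLVM's actual API: `get_fixed_stack_object` and the table are hypothetical names. The point is only that handing back the same index for equal offsets makes the resulting loads identical, so common subexpression elimination can merge them.

  ```c
  #include <assert.h>

  /* Hypothetical sketch: one fixed stack slot index per unique offset.
     (No overflow check in this sketch.) */
  enum { MAX_SLOTS = 8 };
  static int slot_offsets[MAX_SLOTS];
  static int num_slots = 0;

  static int get_fixed_stack_object(int offset) {
      for (int i = 0; i < num_slots; ++i)
          if (slot_offsets[i] == offset)
              return i;              /* reuse the existing frame index */
      slot_offsets[num_slots] = offset;
      return num_slots++;            /* create a new one only if needed */
  }

  int main(void) {
      int a = get_fixed_stack_object(16);
      int b = get_fixed_stack_object(16);
      assert(a == b);  /* same offset -> same index -> CSEable loads */
      assert(get_fixed_stack_object(32) != a);
      return 0;
  }
  ```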
* AMDGPU: Add intrinsics for address space identification (Matt Arsenault, 2019-09-05; 5 files, -0/+326)
  The library currently uses ptrtoint and directly checks the queue ptr for this, which counts as a pointer capture.
  llvm-svn: 371009
* AMDGPU/GlobalISel: Restore insert point when getting aperture (Matt Arsenault, 2019-09-05; 1 file, -33/+33)
  Avoids SSA violations in a future patch.
  llvm-svn: 371008
* AMDGPU/GlobalISel: Fix placeholder value used for addrspacecast (Matt Arsenault, 2019-09-05; 1 file, -34/+82)
  llvm-svn: 371007
* AMDGPU/GlobalISel: Fix assert on load from constant address (Matt Arsenault, 2019-09-05; 1 file, -0/+27)
  llvm-svn: 371006
* Use -mtriple to fix AMDGPU test sensitive to object file format (Reid Kleckner, 2019-09-05; 1 file, -3/+3)
  GOTPCREL32 doesn't exist on COFF, so it isn't used when this test runs on Windows.
  llvm-svn: 371000
* AMDGPU/GlobalISel: Select G_BITREVERSE (Matt Arsenault, 2019-09-04; 2 files, -0/+84)
  llvm-svn: 370980
* GlobalISel: Add basic legalization for G_BITREVERSE (Matt Arsenault, 2019-09-04; 1 file, -0/+157)
  llvm-svn: 370979
* AMDGPU: Handle frame index expansion with no free SGPRs pre gfx9 (Matt Arsenault, 2019-09-04; 1 file, -0/+93)
  Since an add instruction must produce an unused carry out, this requires additional SGPRs. This can be avoided by keeping the entire offset computation in SGPRs. If one SGPR is still available, this only costs one extra mov. If none are available, the entire computation can be done in place and reversed.
  This does assume the use is a VGPR operand. This was already assumed, and we currently only select frame indexes to VALU instructions. This should probably be fixed at some point to handle more possible MIR.
  llvm-svn: 370929
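  The "done in place and reversed" trick can be illustrated with a minimal C sketch. The names are hypothetical; the real code operates on machine registers, not C variables. The idea: when no scratch register is free, mutate the base register to form the address, use it, then undo the mutation so the original value is restored.

  ```c
  #include <assert.h>
  #include <stdint.h>

  /* Hypothetical sketch of a reversible in-place address computation:
     add the offset into the only available register, consume the result,
     then subtract the offset back out to restore the register. */
  static uint32_t materialize_address(uint32_t *frame_reg, uint32_t offset) {
      *frame_reg += offset;        /* compute base + offset in place */
      uint32_t addr = *frame_reg;  /* ...the address is used here... */
      *frame_reg -= offset;        /* reverse the computation */
      return addr;
  }

  int main(void) {
      uint32_t fp = 0x1000;
      assert(materialize_address(&fp, 64) == 0x1040);
      assert(fp == 0x1000);  /* frame register restored afterwards */
      return 0;
  }
  ```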
* AMDGPU/GlobalISel: Make 16-bit constants legal (Matt Arsenault, 2019-09-04; 20 files, -663/+486)
  This is mostly for the benefit of patterns which use 16-bit constants.
  llvm-svn: 370921
* Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" (Jay Foad, 2019-09-02; 3 files, -23/+25)
  Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet.
  Reviewers: nhaehnle, arsenm, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65813
  llvm-svn: 370667
* [AMDGPU] Add test (Piotr Sobczak, 2019-09-02; 1 file, -0/+41)
  Summary: Add test checking that the redundant immediate MOV instruction (by-product of handling phi nodes) is not found in the generated code.
  Reviewers: arsenm, anton-afanasyev, craig.topper, rtereshin, bogner
  Reviewed By: arsenm
  Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, wdng, jvesely, nhaehnle, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63860
  llvm-svn: 370634
* AMDGPU/GlobalISel: Legalize sin/cos (Matt Arsenault, 2019-08-29; 2 files, -0/+1082)
  llvm-svn: 370402
* Revert [MBP] Disable aggressive loop rotate in plain mode (Jordan Rupprecht, 2019-08-29; 11 files, -111/+118)
  This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a). It causes many benchmark regressions, internally and in llvm's benchmark suite.
  llvm-svn: 370398
* AMDGPU: Don't use frame virtual registers (Matt Arsenault, 2019-08-29; 3 files, -22/+62)
  SGPR spills aren't really handled after SILowerSGPRSpills. In order to directly control what happens if the scavenger needs to spill, the scavenger needs to be used directly. There is an alternative to spilling in these contexts anyway since the frame register can be incremented and restored.
  This does present another possible issue if spilling is needed for the unused carry out if an add is needed. I think this can be avoided by using a scalar add (although that clobbers SCC, which happens anyway).
  llvm-svn: 370281
* GlobalISel/TableGen: Handle setcc patterns (Matt Arsenault, 2019-08-29; 2 files, -0/+1240)
  This is a special case because one node maps to two different G_ instructions, and the operand order is changed. This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually selected for now since it has the SALU and VALU complication to deal with.
  llvm-svn: 370280
* [AMDGPU] Adjust number of SGPRs available in Calling Convention (Ryan Taylor, 2019-08-28; 3 files, -265/+239)
  This reduces the number of SGPRs due to some concerns about running out of SGPRs if you make all the SGPRs that aren't reserved available for the calling convention.
  Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
  llvm-svn: 370215
* AMDGPU/GlobalISel: Fix constraining scalar and/or/xor (Matt Arsenault, 2019-08-28; 3 files, -168/+185)
  If the result register already had a register class assigned, the sources may not have been properly constrained.
  llvm-svn: 370150
* AMDGPU/GlobalISel: Implement addrspacecast for 32-bit constant addrspace (Matt Arsenault, 2019-08-28; 1 file, -0/+209)
  llvm-svn: 370140
* DAG: computeNumSignBits for MUL (Matt Arsenault, 2019-08-27; 1 file, -14/+5)
  Copied directly from the IR version. Most of the testcases I've added for this are somewhat problematic because they really end up testing the yet-to-be-implemented version for MUL_I24/MUL_U24.
  llvm-svn: 370099
* AMDGPU: Add baseline test for num sign bits of mul (Matt Arsenault, 2019-08-27; 1 file, -0/+168)
  llvm-svn: 370098
* AMDGPU: Fix crash from inconsistent register types for v3i16/v3f16 (Matt Arsenault, 2019-08-27; 1 file, -0/+89)
  This is something of a workaround since computeRegisterProperties seems to be doing the wrong thing.
  llvm-svn: 370086
* [DAGCombiner] cancel fnegs from multiplied operands of FMA (Sanjay Patel, 2019-08-27; 1 file, -1/+1)
  (-X) * (-Y) + Z --> X * Y + Z
  This is a missing optimization that shows up as a potential regression in D66050, so we should solve it first.
  We appear to be partly missing this fold in IR as well. We do handle the simpler case already: (-X) * (-Y) --> X * Y
  And it might be beneficial to make the constraint less conservative (eg, if both operands are cheap, but not necessarily cheaper), but that causes infinite looping for the existing fmul transform.
  Differential Revision: https://reviews.llvm.org/D66755
  llvm-svn: 370071
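  A quick way to see why the FMA fold above is sound: fma computes the product exactly before the single-rounding add, and negating both factors leaves that exact product unchanged, so the two forms round identically. A minimal C check of this identity (an illustration, not the DAGCombiner code itself):

  ```c
  #include <assert.h>
  #include <math.h>

  int main(void) {
      double x = 1.5, y = -2.25, z = 0.125;
      /* fma(a, b, c) forms a*b exactly, then adds c with one rounding.
         Since (-x)*(-y) == x*y exactly, both calls give the same result. */
      assert(fma(-x, -y, z) == fma(x, y, z));
      return 0;
  }
  ```

  (Compile with `-lm`.) Note the identity holds because the product inside fma is exact; for a separate fmul followed by fadd, the same cancellation applies since negation never changes a rounded product's magnitude.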
* [GlobalISel] Fix narrowScalar for shifts to match algorithm from SDAG (Petar Avramovic, 2019-08-27; 5 files, -261/+261)
  Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS to match names of surrounding variables.
  Differential Revision: https://reviews.llvm.org/D66587
  llvm-svn: 370062
* AMDGPU: Combine directly on mul24 intrinsics (Matt Arsenault, 2019-08-27; 1 file, -4/+101)
  The problem these are supposed to work around can occur before the intrinsics are lowered into the nodes. Try to directly simplify them so they are matched before the bit assert operations can be optimized out.
  llvm-svn: 369994