summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Start redefining atomic PatFragsMatt Arsenault2019-08-016-189/+210
| | | | | | | | Start migrating to a form that will be compatible with the global isel emitter. Also should fix some overly lax checks on the memory type, which allowed mis-selecting some illegal atomics. llvm-svn: 367506
* AMDGPU: Correct FP atomic patternsMatt Arsenault2019-08-013-9/+10
| | | | | | | These need to use an fadd, not an add. Also make the noret part clear in the name. llvm-svn: 367505
* AMDGPU/GlobalISel: Select simple local storesMatt Arsenault2019-08-016-19/+52
| | | | llvm-svn: 367504
* GlobalISel: moreElementsVector for G_LOAD/G_STOREMatt Arsenault2019-08-011-0/+1
| | | | | | | AMDGPU change and test is a placeholder until a future patch with complete handling. llvm-svn: 367503
* Reapply "AMDGPU: Split block for si_end_cf"Matt Arsenault2019-08-015-25/+139
| | | | | | This reverts commit r359363, reapplying r357634 llvm-svn: 367500
* AMDGPU/GlobalISel: Select local loadsMatt Arsenault2019-08-015-9/+108
| | | | llvm-svn: 367498
* [AMDGPU] Fix high occupancy calculation and print itStanislav Mekhanoshin2019-07-315-10/+43
| | | | | | | | | | | We had couple places which still return 10 as a maximum occupancy. Fixed. Also print comment about occupancy as compiler see it. Differential Revision: https://reviews.llvm.org/D65423 llvm-svn: 367381
* TableGen: Add MinAlignment predicateMatt Arsenault2019-07-312-42/+53
| | | | | | | | | | AMDGPU uses some custom code predicates for testing alignments. I'm still having trouble comprehending the behavior of predicate bits in the PatFrag hierarchy. Any attempt to abstract these properties unexpectdly fails to apply them. llvm-svn: 367373
* [AMDGPU] Print register pressure for agpr and vgpr separatelyStanislav Mekhanoshin2019-07-301-1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D65476 llvm-svn: 367355
* [AMDGPU] Reserve all AGPRs on targets which do not have themStanislav Mekhanoshin2019-07-302-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D65471 llvm-svn: 367347
* [AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization.Austin Kerbow2019-07-303-1/+41
| | | | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64966 llvm-svn: 367344
* AMDGPU: Avoid emitting "true" predicatesMatt Arsenault2019-07-301-1/+1
| | | | | | | | | | Empty condition strings are considerde always true. This removes a lot of clutter from the generated matcher tables. This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to 6.1M. llvm-svn: 367326
* AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructionsTom Stellard2019-07-291-3/+38
| | | | | | | | | | | | | | | | | | | | | | | Summary: The LoadStoreOptimizer was creating instructions with 2 MachineMemOperands, which meant they were assumed to alias with all other instructions, because MachineInstr:mayAlias() returns true when an instruction has multiple MachineMemOperands. This was preventing these instructions from being merged again, and was giving the scheduler less freedom to reorder them. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65036 llvm-svn: 367237
* [AMDGPU] Fix typo in error messageJay Foad2019-07-291-1/+1
| | | | llvm-svn: 367235
* [DivergenceAnalysis] Add methods for querying divergence at useJay Foad2019-07-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is uniform at its definition, a use of it in another basic block can be divergent because of divergent control flow between the def and the use. This patch adds new isDivergent(Use) methods to DivergenceAnalysis, LegacyDivergenceAnalysis and GPUDivergenceAnalysis. This might allow D63953 or other similar workarounds to be removed. Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue Reviewed By: nhaehnle Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65141 llvm-svn: 367218
* [AMDGPU] Enable v4f16 and above for v_pk_fma instructionsDavid Stuttard2019-07-292-0/+28
| | | | | | | | | | | | | | | | | | | | Summary: If isel is presented with <2 x half> vectors then it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for other instruction types (such as fadd, fmul etc.) Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9) Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65325 Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee llvm-svn: 367206
* [AMDGPU] Fix typo.Michael Liao2019-07-261-2/+2
| | | | llvm-svn: 367131
* [AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAGCarl Ritson2019-07-262-10/+6
| | | | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65328 llvm-svn: 367105
* [AMDGPU] Add llvm.amdgcn.softwqm intrinsicCarl Ritson2019-07-265-1/+38
| | | | | | | | | | | | | | | | | Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097
* AMDGPU/GlobalISel: Handle most function return typesMatt Arsenault2019-07-262-32/+141
| | | | | | | | | handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082
* [AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs.Michael Liao2019-07-251-0/+3
| | | | | | | | | | | | | | | | | | | | Summary: - As LCSSA is turned on just before isel, it may create PHI of the flow, which is consumed by pseudo structurized CFG instructions. When that PHIs are eliminated in O0, COPY may be placed wrongly as the these pseudo structurized CFG instructions are considering prologue of MBB. - Run extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64353 llvm-svn: 367023
* AMDGPU: Don't assert on v4f16 arguments to shader calling conventionsMatt Arsenault2019-07-251-1/+2
| | | | llvm-svn: 367018
* [AMDGPU] Increase kernel paddingStanislav Mekhanoshin2019-07-241-2/+2
| | | | | | | | | | To support prefetch mode 3 we need to pad current cacheline and fill 3 cachelines after. Current padding is only sufficient for mode 2. Differential Revision: https://reviews.llvm.org/D65236 llvm-svn: 366938
* [AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed ↵Dmitry Preobrazhensky2019-07-241-2/+2
| | | | | | | | | | by the code Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D65216 llvm-svn: 366921
* [AMDGPU] Add all vgpr classes to asm parserStanislav Mekhanoshin2019-07-241-1/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D65158 llvm-svn: 366917
* AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting extsMatt Arsenault2019-07-241-6/+8
| | | | | | | The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915
* Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI.Simon Pilgrim2019-07-231-2/+2
| | | | llvm-svn: 366808
* AMDGPU: Don't use SDNodeXForm for DS offset outputMatt Arsenault2019-07-221-12/+12
| | | | | | | | | | | The xform has no real valuewhen it's using out of a complex pattern output. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery. This allows global isel to import the simple cases once the complex pattern is implemented. llvm-svn: 366743
* AMDGPU/GlobalISel: Remove unnecessary codeMatt Arsenault2019-07-221-4/+0
| | | | | | | The minnum/maxnum case are dead, and the cvt is handled by the default. llvm-svn: 366685
* [AMDGPU] Save some work when an atomic op has no usesJay Foad2019-07-221-67/+70
| | | | | | | | | | | | | | | | | | Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC. Reviewers: arsenm, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64981 llvm-svn: 366667
* AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spacesMatt Arsenault2019-07-191-1/+3
| | | | llvm-svn: 366621
* [AMDGPU] Autogenerate register sequences in tuplesStanislav Mekhanoshin2019-07-191-272/+47
| | | | | | Differential Revision: https://reviews.llvm.org/D65007 llvm-svn: 366619
* [AMDGPU] Fixed occupancy calculation for gfx10Stanislav Mekhanoshin2019-07-194-28/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616
* AMDGPU: Avoid custom predicates for stores with glueMatt Arsenault2019-07-191-18/+24
| | | | llvm-svn: 366613
* AMDGPU: Redefine setcc condition PatLeafsMatt Arsenault2019-07-193-67/+36
| | | | | | Avoid using custom code predicates. llvm-svn: 366609
* AMDGPU: Don't rely on m0 being -1 for GWS offsetsMatt Arsenault2019-07-191-4/+6
| | | | | | | This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff. llvm-svn: 366608
* AMDGPU: Force s_waitcnt after GWS instructionsMatt Arsenault2019-07-194-5/+26
| | | | | | | This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607
* [AMDGPU] Allow register tuples to set asm namesStanislav Mekhanoshin2019-07-194-139/+99
| | | | | | | | | | | | This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. Added optional operand to RegisterTuple. This way we can simplify register name access and dramatically reduce the size of static tables for the backend. Differential Revision: https://reviews.llvm.org/D64967 llvm-svn: 366598
* AMDGPU/GlobalISel: Fix MMO flags for kernel argument loadsMatt Arsenault2019-07-191-1/+1
| | | | | | The DAG lowering sets dereferencable and invariant, not nontemporal. llvm-svn: 366597
* AMDGPU/GlobalISel: Selection for fminnum/fmaxnumMatt Arsenault2019-07-191-2/+4
| | | | | | | v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet. llvm-svn: 366585
* AMDGPU/GlobalISel: Support arguments with multiple registersMatt Arsenault2019-07-192-30/+47
| | | | | | Handles structs used directly in argument lists. llvm-svn: 366584
* AMDGPU/GlobalISel: Rewrite lowerFormalArgumentsMatt Arsenault2019-07-194-200/+374
| | | | | | | | | | | | | | | | | This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582
* AMDGPU: Decompose all values to 32-bit pieces for calling conventionsMatt Arsenault2019-07-193-92/+18
| | | | | | | | | | This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578
* [AMDGPU][MC] Corrected parsing of branch offsetsDmitry Preobrazhensky2019-07-191-20/+43
| | | | | | | | | | See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64629 llvm-svn: 366571
* [AMDGPU] Simplify the exclusive scan used for optimized atomicsJay Foad2019-07-191-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because: 1. It simplifies the compiler a little. 2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c. 3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction. Because of #2 and #3 the end result is improved from this: v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf To this: v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf I.e. two fewer computational instructions, one extra nop where we could schedule something else. Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64411 llvm-svn: 366543
* [AMDGPU] Drop Reg32 and use regular AsmNameStanislav Mekhanoshin2019-07-183-25/+21
| | | | | | | | This allows to reduce generated AMDGPUGenAsmWriter.inc by ~100Kb. Differential Revision: https://reviews.llvm.org/D64952 llvm-svn: 366505
* [AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand()Stanislav Mekhanoshin2019-07-172-157/+37
| | | | | | Differential Revision: https://reviews.llvm.org/D64892 llvm-svn: 366385
* [AMDGPU] Stop special casing flat_scratch for register nameStanislav Mekhanoshin2019-07-172-13/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D64885 llvm-svn: 366376
* [AMDGPU] Tune inlining parameters for AMDGPU targetDaniil Fukalov2019-07-172-2/+4
| | | | | | | | | | | | | | | | | | | Summary: Since the target has no significant advantage of vectorization, vector instructions bous threshold bonus should be optional. amdgpu-inline-arg-alloca-cost parameter default value and the target InliningThresholdMultiplier value tuned then respectively. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348
* AMDGPU: Use getTargetConstantMatt Arsenault2019-07-171-2/+2
| | | | | | Avoids creating an extra intermediate mov. llvm-svn: 366340
OpenPOWER on IntegriCloud