path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* [AMDGPU] Increase kernel padding | Stanislav Mekhanoshin | 2019-07-24 | 1 | -2/+2
  To support prefetch mode 3 we need to pad the current cacheline and fill 3 cachelines after it. The current padding is only sufficient for mode 2.
  Differential Revision: https://reviews.llvm.org/D65236
  llvm-svn: 366938
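The padding arithmetic described in this commit message can be sketched as follows. This is only an illustration, not the backend's code: the 64-byte cacheline size and the helper name `padding_bytes` are assumptions that do not appear in the commit.

```python
CACHELINE = 64  # assumed cacheline size in bytes; not stated in the commit


def padding_bytes(code_end, lines_after, line=CACHELINE):
    """Bytes of padding needed after offset `code_end` to finish the
    current cacheline and then fill `lines_after` whole cachelines."""
    to_line_end = (-code_end) % line  # 0 when code_end is already aligned
    return to_line_end + lines_after * line
```

For example, with code ending at byte offset 100, `padding_bytes(100, 3)` pads 28 bytes to reach offset 128 and then adds three full 64-byte lines.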
* [AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed by the code | Dmitry Preobrazhensky | 2019-07-24 | 1 | -2/+2
  Reviewers: rampitec, arsenm
  Differential Revision: https://reviews.llvm.org/D65216
  llvm-svn: 366921
* [AMDGPU] Add all vgpr classes to asm parser | Stanislav Mekhanoshin | 2019-07-24 | 1 | -1/+5
  Differential Revision: https://reviews.llvm.org/D65158
  llvm-svn: 366917
* AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts | Matt Arsenault | 2019-07-24 | 1 | -6/+8
  The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place.
  llvm-svn: 366915
* Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI. | Simon Pilgrim | 2019-07-23 | 1 | -2/+2
  llvm-svn: 366808
* AMDGPU: Don't use SDNodeXForm for DS offset output | Matt Arsenault | 2019-07-22 | 1 | -12/+12
  The xform has no real value when it is applied to the output of a complex pattern. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery.
  This allows global isel to import the simple cases once the complex pattern is implemented.
  llvm-svn: 366743
* AMDGPU/GlobalISel: Remove unnecessary code | Matt Arsenault | 2019-07-22 | 1 | -4/+0
  The minnum/maxnum cases are dead, and the cvt is handled by the default.
  llvm-svn: 366685
* [AMDGPU] Save some work when an atomic op has no uses | Jay Foad | 2019-07-22 | 1 | -67/+70
  Summary:
  In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC.
  Reviewers: arsenm, dstuttard, tpr
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64981
  llvm-svn: 366667
* AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces | Matt Arsenault | 2019-07-19 | 1 | -1/+3
  llvm-svn: 366621
* [AMDGPU] Autogenerate register sequences in tuples | Stanislav Mekhanoshin | 2019-07-19 | 1 | -272/+47
  Differential Revision: https://reviews.llvm.org/D65007
  llvm-svn: 366619
* [AMDGPU] Fixed occupancy calculation for gfx10 | Stanislav Mekhanoshin | 2019-07-19 | 4 | -28/+19
  Differential Revision: https://reviews.llvm.org/D65010
  llvm-svn: 366616
* AMDGPU: Avoid custom predicates for stores with glue | Matt Arsenault | 2019-07-19 | 1 | -18/+24
  llvm-svn: 366613
* AMDGPU: Redefine setcc condition PatLeafs | Matt Arsenault | 2019-07-19 | 3 | -67/+36
  Avoid using custom code predicates.
  llvm-svn: 366609
* AMDGPU: Don't rely on m0 being -1 for GWS offsets | Matt Arsenault | 2019-07-19 | 1 | -4/+6
  This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff.
  llvm-svn: 366608
* AMDGPU: Force s_waitcnt after GWS instructions | Matt Arsenault | 2019-07-19 | 4 | -5/+26
  This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt.
  llvm-svn: 366607
* [AMDGPU] Allow register tuples to set asm names | Stanislav Mekhanoshin | 2019-07-19 | 4 | -139/+99
  This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. Added optional operand to RegisterTuple. This way we can simplify register name access and dramatically reduce the size of static tables for the backend.
  Differential Revision: https://reviews.llvm.org/D64967
  llvm-svn: 366598
* AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads | Matt Arsenault | 2019-07-19 | 1 | -1/+1
  The DAG lowering sets dereferenceable and invariant, not nontemporal.
  llvm-svn: 366597
* AMDGPU/GlobalISel: Selection for fminnum/fmaxnum | Matt Arsenault | 2019-07-19 | 1 | -2/+4
  The v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet.
  llvm-svn: 366585
* AMDGPU/GlobalISel: Support arguments with multiple registers | Matt Arsenault | 2019-07-19 | 2 | -30/+47
  Handles structs used directly in argument lists.
  llvm-svn: 366584
* AMDGPU/GlobalISel: Rewrite lowerFormalArguments | Matt Arsenault | 2019-07-19 | 4 | -200/+374
  This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this.
  This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway.
  llvm-svn: 366582
* AMDGPU: Decompose all values to 32-bit pieces for calling conventions | Matt Arsenault | 2019-07-19 | 3 | -92/+18
  This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel.
  llvm-svn: 366578
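As a rough illustration of what decomposing a value into 32-bit pieces means, the sketch below splits an unsigned integer into 32-bit chunks and reassembles it. The function names and the low-piece-first ordering are assumptions made for this example; the commit does not specify how the backend orders the pieces.

```python
MASK32 = 0xFFFFFFFF


def split_to_32bit_pieces(value, bits):
    """Split an unsigned `bits`-wide value into 32-bit pieces.
    Ordering (low piece first) is an assumption of this sketch."""
    count = (bits + 31) // 32  # number of 32-bit pieces, rounded up
    return [(value >> (32 * i)) & MASK32 for i in range(count)]


def join_32bit_pieces(pieces):
    """Inverse of split_to_32bit_pieces for the same ordering."""
    value = 0
    for i, piece in enumerate(pieces):
        value |= (piece & MASK32) << (32 * i)
    return value
```

Once a 64-bit value is represented as two independent 32-bit pieces, an operation that only touches one half can be performed as a single 32-bit operation, which is the kind of opportunity the commit message refers to.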
* [AMDGPU][MC] Corrected parsing of branch offsets | Dmitry Preobrazhensky | 2019-07-19 | 1 | -20/+43
  See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820
  Reviewers: artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D64629
  llvm-svn: 366571
* [AMDGPU] Simplify the exclusive scan used for optimized atomics | Jay Foad | 2019-07-19 | 1 | -10/+8
  Summary:
  Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because:
  1. It simplifies the compiler a little.
  2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c.
  3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction.
  Because of #2 and #3 the end result is improved from this:
    v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
    v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf
    v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf
    v_add3_u32 v1, v4, v5, v1
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
  To this:
    v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf
    s_nop 1
    v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf
  I.e. two fewer computational instructions, one extra nop where we could schedule something else.
  Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64411
  llvm-svn: 366543
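The power-of-two shift scheme in this commit message has the same shape as a textbook doubling prefix sum, which can be sketched as below. This is a plain-Python model under simplifying assumptions, not the backend's implementation: lanes are list elements, a shift past the start contributes zero, and the row and bank structure that the real DPP sequence handles with `row_bcast` and the masks is ignored. The function names are invented for this example.

```python
def inclusive_scan_pow2(values):
    """Inclusive prefix sum using shifts of 1, 2, 4, ... lanes.
    Each step has the form v[i] = v[i] + v[i - shift]."""
    lanes = list(values)
    n = len(lanes)
    shift = 1
    while shift < n:
        # Lanes with no source lane (i < shift) add zero.
        lanes = [lanes[i] + (lanes[i - shift] if i >= shift else 0)
                 for i in range(n)]
        shift *= 2
    return lanes


def exclusive_scan_pow2(values):
    """Exclusive prefix sum: lane i receives the sum of lanes 0..i-1."""
    inclusive = inclusive_scan_pow2(values)
    return [0] + inclusive[:-1]
```

For a 64-lane input the loop runs six times with shifts 1, 2, 4, 8, 16 and 32, matching the shift amounts listed in the commit message.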
* [AMDGPU] Drop Reg32 and use regular AsmName | Stanislav Mekhanoshin | 2019-07-18 | 3 | -25/+21
  This reduces the generated AMDGPUGenAsmWriter.inc by ~100Kb.
  Differential Revision: https://reviews.llvm.org/D64952
  llvm-svn: 366505
* [AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand() | Stanislav Mekhanoshin | 2019-07-17 | 2 | -157/+37
  Differential Revision: https://reviews.llvm.org/D64892
  llvm-svn: 366385
* [AMDGPU] Stop special casing flat_scratch for register name | Stanislav Mekhanoshin | 2019-07-17 | 2 | -13/+1
  Differential Revision: https://reviews.llvm.org/D64885
  llvm-svn: 366376
* [AMDGPU] Tune inlining parameters for AMDGPU target | Daniil Fukalov | 2019-07-17 | 2 | -2/+4
  Summary:
  Since the target gains no significant advantage from vectorization, the threshold bonus for vector instructions should be optional.
  The default value of the amdgpu-inline-arg-alloca-cost parameter and the target's InliningThresholdMultiplier value are tuned accordingly.
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64642
  llvm-svn: 366348
* AMDGPU: Use getTargetConstant | Matt Arsenault | 2019-07-17 | 1 | -2/+2
  Avoids creating an extra intermediate mov.
  llvm-svn: 366340
* [AMDGPU] Optimize atomic AND/OR/XOR | Jay Foad | 2019-07-17 | 1 | -16/+55
  Summary: Extend the atomic optimizer to handle AND, OR and XOR.
  Reviewers: arsenm, sheredom
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64809
  llvm-svn: 366323
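One way to see why AND, OR and XOR fit the same scheme as add is that each is associative and has an identity value. The sketch below is a conceptual model only, under the assumption that the optimizer combines the per-lane operands into one memory update and reconstructs each lane's "old" value from an exclusive scan; this log does not spell out that design, and the names `OPS`, `wave_atomic` and `sequential_atomics` are invented for the example.

```python
import operator

# op name -> (combine function, identity element for 32-bit values)
OPS = {
    "and": (operator.and_, 0xFFFFFFFF),
    "or": (operator.or_, 0),
    "xor": (operator.xor, 0),
}


def wave_atomic(op, memory, lane_values):
    """Model one combined atomic: returns (new_memory, per-lane old values).
    Each lane's old value is memory combined with the exclusive scan of
    the operands of the lanes before it."""
    combine, identity = OPS[op]
    prefix = identity
    olds = []
    for value in lane_values:
        olds.append(combine(memory, prefix))
        prefix = combine(prefix, value)
    return combine(memory, prefix), olds


def sequential_atomics(op, memory, lane_values):
    """Reference model: every lane performs its own atomic, in lane order."""
    combine, _ = OPS[op]
    olds = []
    for value in lane_values:
        olds.append(memory)
        memory = combine(memory, value)
    return memory, olds
```

Both functions return the same results for any input, which is the property that makes the combined form a valid replacement in this model. When the per-lane old values are not needed, only the final combined update matters and the scan can be skipped.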
* AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC | Nicolai Haehnle | 2019-07-17 | 1 | -1/+1
  Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630
  Reviewers: rampitec, mareko
  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64807
  Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc
  llvm-svn: 366314
* AMDGPU: Improve alias analysis for GDS | Nicolai Haehnle | 2019-07-17 | 1 | -4/+4
  Summary: GDS cannot alias anything else.
  Original patch by: Marek Olšák
  Reviewers: arsenm, mareko
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64114
  Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0
  llvm-svn: 366313
* [AMDGPU] Autogenerate register asm names | Stanislav Mekhanoshin | 2019-07-16 | 5 | -721/+139
  Differential Revision: https://reviews.llvm.org/D64839
  llvm-svn: 366283
* AMDGPU/GlobalISel: Select G_ASHR | Matt Arsenault | 2019-07-16 | 4 | -13/+4
  llvm-svn: 366257
* AMDGPU/GlobalISel: Select G_LSHR | Matt Arsenault | 2019-07-16 | 3 | -4/+4
  llvm-svn: 366256
* AMDGPU/GlobalISel: Select G_SHL | Matt Arsenault | 2019-07-16 | 3 | -4/+4
  I think this manages to not break the DAG handling with the divergent predicates because the standalone divergent patterns end up with a higher priority than the pattern on the instruction definition.
  The 16-bit versions don't work yet.
  llvm-svn: 366254
* [AMDGPU] Change register type for v32 vectors | Stanislav Mekhanoshin | 2019-07-16 | 1 | -2/+2
  When it is AReg_1024 this results in unnecessary copying of 32-element vectors into AGPRs even though they are not intended for an mfma instruction.
  Differential Revision: https://reviews.llvm.org/D64815
  llvm-svn: 366252
* AMDGPU/GlobalISel: Fix selection of private stores | Matt Arsenault | 2019-07-16 | 1 | -6/+7
  llvm-svn: 366249
* AMDGPU/GlobalISel: Select private loads | Matt Arsenault | 2019-07-16 | 3 | -1/+147
  llvm-svn: 366248
* AMDGPU/GlobalISel: Select flat stores | Matt Arsenault | 2019-07-16 | 1 | -2/+4
  llvm-svn: 366246
* AMDGPU: Add register classes to flat store patterns | Matt Arsenault | 2019-07-16 | 1 | -25/+25
  For some reason GlobalISelEmitter needs register classes to import these, although it works for the load patterns.
  llvm-svn: 366242
* AMDGPU: Replace store PatFrags | Matt Arsenault | 2019-07-16 | 2 | -14/+34
  Convert the easy cases to formats understood for GlobalISel.
  llvm-svn: 366240
* AMDGPU/GlobalISel: Select flat loads | Matt Arsenault | 2019-07-16 | 7 | -56/+102
  Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns.
  llvm-svn: 366237
* [AMDGPU] Optimize atomic max/min | Jay Foad | 2019-07-16 | 1 | -36/+141
  Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract.
  Reviewers: arsenm, sheredom, critson, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64328
  llvm-svn: 366235
* AMDGPU: Redefine load PatFrags | Matt Arsenault | 2019-07-16 | 4 | -76/+105
  Rewrite PatFrags using the new PatFrag address space matching in tablegen. These will now work with both SelectionDAG and GlobalISel.
  llvm-svn: 366234
* [AMDGPU] Add the adjusted FP as a livein register. | Michael Liao | 2019-07-16 | 3 | -34/+41
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64145
  llvm-svn: 366223
* AMDGPU/GlobalISel: Fix test failures in release build | Matt Arsenault | 2019-07-16 | 2 | -3/+6
  Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug.
  Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal.
  llvm-svn: 366210
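The reason a 16-bit and/or/xor can be performed with the 32-bit operation is that each result bit of a bitwise op depends only on the same bit of the inputs, so whatever is in the upper 16 bits cannot affect the lower 16. A small sketch of that property follows; the helper name `bitwise16_via_32` is invented for this example and it models registers as plain integers.

```python
import operator

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def bitwise16_via_32(op, a32, b32):
    """Apply a bitwise op to two 32-bit register values and read back
    only the low 16 bits, as a 16-bit consumer would."""
    full = op(a32 & MASK32, b32 & MASK32)
    return full & MASK16
```

For instance, `bitwise16_via_32(operator.xor, 0xAAAA1234, 0x55550F0F)` gives the same answer as `0x1234 ^ 0x0F0F`, regardless of the upper halves.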
* Fix parameter name comments using clang-tidy. NFC. | Rui Ueyama | 2019-07-16 | 1 | -2/+2
  This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch:
    $ git clone https://github.com/llvm/llvm-project.git
    $ cd llvm-project
    $ mkdir build
    $ cd build
    $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
        -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
        -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
        -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
    $ ninja
    $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
        -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
        ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}
  llvm-svn: 366177
* AMDGPU: Avoid code predicates for extload PatFrags | Matt Arsenault | 2019-07-16 | 5 | -48/+72
  Use the MemoryVT field. This will be necessary for tablegen to automatically handle patterns for GlobalISel.
  Doesn't handle the d16 lo/hi patterns. Those are a special case since it involves the custom node type.
  llvm-svn: 366168
* [AMDGPU] Enable merging m0 initializations. | Austin Kerbow | 2019-07-15 | 1 | -15/+32
  Summary:
  Enable hoisting and merging m0 defs that are initialized with the same immediate value. Fixes bug where removed instructions are not considered to interfere with other inits, and make sure to not hoist inits before block prologues.
  Reviewers: rampitec, arsenm
  Reviewed By: rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64766
  llvm-svn: 366135
* AMDGPU: Use standalone MUBUF load patterns | Matt Arsenault | 2019-07-15 | 1 | -20/+37
  We already do this for the flat and DS instructions, although it is certainly uglier and more verbose.
  This will allow using separate pattern definitions for extload and zextload. Currently we get away with using a single PatFrag with custom predicate code to check if the extension type is a zextload or anyextload. The generic mechanism the global isel emitter understands treats these as mutually exclusive. I was considering making the pattern emitter accept zextload or sextload extensions for anyextload patterns, but in global isel, the different extending loads have distinct opcodes, and there is currently no mechanism for an opcode matcher to try multiple (and there probably is very little need for one beyond this case).
  llvm-svn: 366132