summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Optimize atomic AND/OR/XORJay Foad2019-07-171-16/+55
| | | | | | | | | | | | | | Summary: Extend the atomic optimizer to handle AND, OR and XOR. Reviewers: arsenm, sheredom Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64809 llvm-svn: 366323
* AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXECNicolai Haehnle2019-07-171-1/+1
| | | | | | | | | | | | | | | Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630 Reviewers: rampitec, mareko Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64807 Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc llvm-svn: 366314
* AMDGPU: Improve alias analysis for GDSNicolai Haehnle2019-07-171-4/+4
| | | | | | | | | | | | | | | | | Summary: GDS cannot alias anything else. Original patch by: Marek Olšák Reviewers: arsenm, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64114 Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0 llvm-svn: 366313
* [AMDGPU] Autogenerate register asm namesStanislav Mekhanoshin2019-07-165-721/+139
| | | | | | Differential Revision: https://reviews.llvm.org/D64839 llvm-svn: 366283
* AMDGPU/GlobalISel: Select G_ASHRMatt Arsenault2019-07-164-13/+4
| | | | llvm-svn: 366257
* AMDGPU/GlobalISel: Select G_LSHRMatt Arsenault2019-07-163-4/+4
| | | | llvm-svn: 366256
* AMDGPU/GlobalISel: Select G_SHLMatt Arsenault2019-07-163-4/+4
| | | | | | | | | | I think this manages to not break the DAG handling with the divergent predicates because the stadalone divergent patterns end up with a higher priority than the pattern on the instruction definition. The 16-bit versions don't work yet. llvm-svn: 366254
* [AMDGPU] Change register type for v32 vectorsStanislav Mekhanoshin2019-07-161-2/+2
| | | | | | | | | | When it is AReg_1024 this results in unnecessary copying into AGPRs of a 32 element vectors even though they are not intended for an mfma instruction. Differential Revision: https://reviews.llvm.org/D64815 llvm-svn: 366252
* AMDGPU/GlobalISel: Fix selection of private storesMatt Arsenault2019-07-161-6/+7
| | | | llvm-svn: 366249
* AMDGPU/GlobalISel: Select private loadsMatt Arsenault2019-07-163-1/+147
| | | | llvm-svn: 366248
* AMDGPU/GlobalISel: Select flat storesMatt Arsenault2019-07-161-2/+4
| | | | llvm-svn: 366246
* AMDGPU: Add register classes to flat store patternsMatt Arsenault2019-07-161-25/+25
| | | | | | | For some reason GlobalISelEmitter needs register classes to import these, although it works for the load patterns. llvm-svn: 366242
* AMDGPU: Replace store PatFragsMatt Arsenault2019-07-162-14/+34
| | | | | | Convert the easy cases to formats understood for GlobalISel. llvm-svn: 366240
* AMDGPU/GlobalISel: Select flat loadsMatt Arsenault2019-07-167-56/+102
| | | | | | | | Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237
* [AMDGPU] Optimize atomic max/minJay Foad2019-07-161-36/+141
| | | | | | | | | | | | | | | | Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64328 llvm-svn: 366235
* AMDGPU: Redefine load PatFragsMatt Arsenault2019-07-164-76/+105
| | | | | | | Rewrite PatFrags using the new PatFrag address space matching in tablegen. These will now work with both SelectionDAG and GlobalISel. llvm-svn: 366234
* [AMDGPU] Add the adjusted FP as a livein register.Michael Liao2019-07-163-34/+41
| | | | | | | | | | | | Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64145 llvm-svn: 366223
* AMDGPU/GlobalISel: Fix test failures in release buildMatt Arsenault2019-07-162-3/+6
| | | | | | | | | | | | Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug. Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal. llvm-svn: 366210
* Fix parameter name comments using clang-tidy. NFC.Rui Ueyama2019-07-161-2/+2
| | | | | | | | | | | | | | | | | | | | | This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h} llvm-svn: 366177
* AMDGPU: Avoid code predicates for extload PatFragsMatt Arsenault2019-07-165-48/+72
| | | | | | | | | | Use the MemoryVT field. This will be necessary for tablegen to automatically handle patterns for GlobalISel. Doesn't handle the d16 lo/hi patterns. Those are a special case since it involvess the custom node type. llvm-svn: 366168
* [AMDGPU] Enable merging m0 initializations.Austin Kerbow2019-07-151-15/+32
| | | | | | | | | | | | | | | | | | | | Summary: Enable hoisting and merging m0 defs that are initialized with the same immediate value. Fixes bug where removed instructions are not considered to interfere with other inits, and make sure to not hoist inits before block prologues. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64766 llvm-svn: 366135
* AMDGPU: Use standalone MUBUF load patternsMatt Arsenault2019-07-151-20/+37
| | | | | | | | | | | | | | | | | | We already do this for the flat and DS instructions, although it is certainly uglier and more verbose. This will allow using separate pattern definitions for extload and zextload. Currently we get away with using a single PatFrag with custom predicate code to check if the extension type is a zextload or anyextload. The generic mechanism the global isel emitter understands treats these as mutually exclusive. I was considering making the pattern emitter accept zextload or sextload extensions for anyextload patterns, but in global isel, the different extending loads have distinct opcodes, and there is currently no mechanism for an opcode matcher to try multiple (and there probably is very little need for one beyond this case). llvm-svn: 366132
* AMDGPU/GlobalISel: Allow scalar s1 and/or/xorMatt Arsenault2019-07-151-6/+91
| | | | | | | | If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result. llvm-svn: 366125
* AMDGPU/GlobalISel: Select G_AND/G_OR/G_XORMatt Arsenault2019-07-152-0/+66
| | | | llvm-svn: 366121
* AMDGPU/GlobalISel: Don't constrain source register of VCC copiesMatt Arsenault2019-07-151-0/+20
| | | | | | | | | | | | | This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 is could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with and and/or/xor condition. llvm-svn: 366120
* AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copiesMatt Arsenault2019-07-151-10/+12
| | | | | | | | | The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necesssary. llvm-svn: 366119
* AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCCMatt Arsenault2019-07-151-0/+4
| | | | llvm-svn: 366118
* AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCCMatt Arsenault2019-07-151-17/+23
| | | | | | This was emitting a copy from a 32-bit register to a 64-bit. llvm-svn: 366117
* AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELTMatt Arsenault2019-07-152-1/+33
| | | | llvm-svn: 366116
* AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELTMatt Arsenault2019-07-152-1/+36
| | | | | | Turn the constant cases into G_EXTRACTs. llvm-svn: 366115
* AMDGPU/GlobalISel: Fix G_ICMP for wave32Matt Arsenault2019-07-151-2/+2
| | | | llvm-svn: 366114
* AMDGPU/GlobalISel: Widen vector extractsMatt Arsenault2019-07-151-5/+8
| | | | llvm-svn: 366103
* AMDGPU/GlobalISel: Handle llvm.amdgcn.if.breakMatt Arsenault2019-07-152-0/+32
| | | | llvm-svn: 366102
* AMDGPU/GlobalISel: Select llvm.amdgcn.end.cfMatt Arsenault2019-07-152-0/+19
| | | | llvm-svn: 366099
* AMDGPU: Add 24-bit mul intrinsicsMatt Arsenault2019-07-152-0/+132
| | | | | | | | | | | Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It doesn't worth the hassle of trying to insert an AssertZext or something to try to deal with it. llvm-svn: 366094
* [AMDGPU] Copy missing predicate from pseudo to realStanislav Mekhanoshin2019-07-151-0/+1
| | | | | | | | NFC at the momemnt, needed for future commit. Differential Revision: https://reviews.llvm.org/D64761 llvm-svn: 366092
* AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTORMatt Arsenault2019-07-151-0/+4
| | | | llvm-svn: 366087
* AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORSMatt Arsenault2019-07-151-1/+2
| | | | llvm-svn: 366086
* [AMDGPU] fixed scheduler crash in gfx908Stanislav Mekhanoshin2019-07-151-2/+2
| | | | | | | | | For some reason scheduler can send down an SUnit without an instruction. Differential Revision: https://reviews.llvm.org/D64709 llvm-svn: 366074
* [AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL messageDmitry Preobrazhensky2019-07-153-4/+10
| | | | | | | | Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64729 llvm-svn: 366071
* [AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructionsDmitry Preobrazhensky2019-07-151-3/+5
| | | | | | | | | | See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64716 llvm-svn: 366067
* Remove set but unused variable.Bill Wendling2019-07-151-5/+1
| | | | llvm-svn: 366041
* [AMDGPU] use v32f32 for 3 mfma intrinsicsStanislav Mekhanoshin2019-07-125-10/+32
| | | | | | | | | These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972
* AMDGPU: Drop remnants of byval support for shadersMatt Arsenault2019-07-121-2/+1
| | | | | | | | Before 2018, mesa used to use byval interchangably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953
* Fix missing use of defined() in include guardDavid Tenty2019-07-121-1/+1
| | | | | | | | | | Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64657 llvm-svn: 365952
* [AMDGPU] Extend MIMG opcode to 8 bitsStanislav Mekhanoshin2019-07-122-22/+28
| | | | | | | | This is NFC, but required for future commit. Differential Revision: https://reviews.llvm.org/D64649 llvm-svn: 365940
* [AMDGPU] Fix DPP combiner check for exec modificationJay Foad2019-07-124-29/+63
| | | | | | | | | | | | | | | | | | | | | | | | Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910
* [AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32Jay Foad2019-07-123-2/+14
| | | | | | | | | | | | | | | | | | Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands. Reviewers: arsenm, hakzsam Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64636 llvm-svn: 365904
* Delete dead storesFangrui Song2019-07-123-10/+6
| | | | llvm-svn: 365903
* [AMDGPU] Skip calculating callee saved registers for entry function.Michael Liao2019-07-111-1/+5
| | | | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64596 llvm-svn: 365846
OpenPOWER on IntegriCloud