summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo structTom Stellard2019-12-171-241/+216
| | | | | | | | | | | | | | | | | Summary: Modify CombineInfo to only store information about a single instruction. This is a little easier to work with and removes a lot of duplicate initialization code. Reviewers: arsenm, nhaehnle Reviewed By: arsenm, nhaehnle Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71045
* AMDGPU/SILoadStoreOptimizer: Simplify functionTom Stellard2019-12-121-62/+50
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71044
* [AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/storesPiotr Sobczak2019-11-201-7/+268
| | | | | | | | | | | | | | Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69794
* [AMDGPU] Keep consistent check of legal addressing mode.Michael Liao2019-11-201-1/+3
| | | | | | | | | | | | | | Summary: - Add test cases for GFX10, which has narrower offset range compared to GFX9. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70473
* AMDGPU/SILoadStoreOptimizer: fix a likely bug introduced recentlyNicolai Hähnle2019-11-161-2/+2
| | | | | | | | | | | | | | | | | Summary: We should check for same instruction class before checking whether they have the same base address, else we might iterate out of bounds of a MachineInstr operands list. The InstClass check is also cheaper. This was introduced in SVN r373630. Reviewers: tstellar Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68690
* Sink all InitializePasses.h includesReid Kleckner2019-11-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211
* [SILoadStoreOptimizer] Fixed typo. NFCI.Dávid Bolvanský2019-11-031-1/+1
|
* [AMDGPU] Extend the SI Load/Store optimizerPiotr Sobczak2019-10-161-13/+174
| | | | | | | | | | | | | | | | | | | | | Summary: Extend the SI Load/Store optimizer to merge MIMG load instructions. Handle different flavours of image_load and image_sample instructions. When the instructions of the same subclass differ only in dmask, merge them and update dmask accordingly. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64911 llvm-svn: 374984
* AMDGPU: Use SGPR_128 instead of SReg_128 for vregsMatt Arsenault2019-10-101-1/+1
| | | | | | | | | SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds the additional non-allocatable TTMP registers. There's no point in allocating SReg_128 vregs. This shrinks the size of the classes regalloc needs to consider, which is usually good. llvm-svn: 374284
* AMDGPU: Relax register classes usedMatt Arsenault2019-10-091-2/+2
| | | | llvm-svn: 374254
* [AMDGPU][SILoadStoreOptimizer] NFC: Refactor codePiotr Sobczak2019-10-041-120/+80
| | | | | | | | | | | | | | | | | | | | | | | Summary: This patch fixes a potential aliasing problem in InstClassEnum, where local values were mixed with machine opcodes. Introducing InstSubclass will keep them separate and help extending InstClassEnum with other instruction types (e.g. MIMG) in the future. This patch also makes getSubRegIdxs() more concise. Reviewers: nhaehnle, arsenm, tstellar Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68384 llvm-svn: 373699
* AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructionsTom Stellard2019-10-031-82/+185
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This adds a pre-pass to this optimization that scans through the basic block and generates lists of mergeable instructions with one list per unique address. In the optimization phase instead of scanning through the basic block for mergeable instructions, we now iterate over the lists generated by the pre-pass. The decision to re-optimize a block is now made per list, so if we fail to merge any instructions with the same address, then we do not attempt to optimize them in future passes over the block. This will help to reduce the time this pass spends re-optimizing instructions. In one pathological test case, this change reduces the time spent in the SILoadStoreOptimizer from 0.2s to 0.03s. This restructuring will also make it possible to implement further solutions in this pass, because we can now add less expensive checks to the pre-pass and filter instructions out early which will avoid the need to do the expensive scanning during the optimization pass. For example, checking for adjacent offsets is an inexpensive test we can move to the pre-pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65961 llvm-svn: 373630
* [AMDGPU] Extend buffer intrinsics with swizzlingPiotr Sobczak2019-10-021-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491
* AMDGPU/SILoadStoreOptimizer: Add helper functions for working with CombineInfoTom Stellard2019-10-011-205/+244
| | | | | | | | | | | | | | | | | | Summary: This is a refactoring that will make future improvements to this pass easier. This change should not change the behavior of the pass. Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Reviewed By: nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65496 llvm-svn: 373366
* AMDGPU/SILoadStoreOptimizer: Add const to more functionsTom Stellard2019-09-191-12/+12
| | | | | | | | | | | | Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65901 llvm-svn: 372298
* [AMDGPU] Enable constant offset promotion to immediate operand for VMEM storesValery Pykhtin2019-09-061-4/+5
| | | | | | Differential revision: https://reviews.llvm.org/D66958 llvm-svn: 371214
* AMDGPU: Disambiguate v3f16 format in load/store tablesMatt Arsenault2019-08-181-1/+3
| | | | | | | | | Currently the searchable tables report the number of dwords. These round to the same number for 3 and 4 component d16 instructions. Change this to report the number of elements so this isn't ambiguous. llvm-svn: 369202
* Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVMDaniel Sanders2019-08-151-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041
* AMDGPU/LoadStoreOptimizer: Set the correct offset whem merging MMOsTom Stellard2019-08-051-1/+6
| | | | | | | | | | | | | | | | | | Summary: This is a follow up to r367237. MachineFunction::getMachineMemOperand() adds the offset parameter to the existing offset instead of resetting it. So we need to reset the offset to the correct value after calling this function. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65557 llvm-svn: 367881
* Finish moving TargetRegisterInfo::isVirtualRegister() and friends to ↵Daniel Sanders2019-08-011-4/+3
| | | | | | llvm::Register as started by r367614. NFC llvm-svn: 367633
* AMDGPU/SILoadStoreOptimizer: Make some functions constTom Stellard2019-08-011-6/+6
| | | | | | | | | | | | | | Reviewers: arsenm, pendingchaos, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65316 llvm-svn: 367517
* AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructionsTom Stellard2019-07-291-3/+38
| | | | | | | | | | | | | | | | | | | | | | | Summary: The LoadStoreOptimizer was creating instructions with 2 MachineMemOperands, which meant they were assumed to alias with all other instructions, because MachineInstr:mayAlias() returns true when an instruction has multiple MachineMemOperands. This was preventing these instructions from being merged again, and was giving the scheduler less freedom to reorder them. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65036 llvm-svn: 367237
* [AMDGPU] gfx10 conditional registers handlingStanislav Mekhanoshin2019-06-161-3/+4
| | | | | | | | | This is cpp source part of wave32 support, excluding overriden getRegClass(). Differential Revision: https://reviews.llvm.org/D63351 llvm-svn: 363513
* [AMDGPU] detect WaW hazards when moving/merging load/store instructionsRhys Perry2019-05-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008
* [AMDGPU] gfx1010 VMEM and SMEM implementationStanislav Mekhanoshin2019-04-301-4/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D61330 llvm-svn: 359621
* [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmeticTim Renouf2019-03-181-4/+8
| | | | | | | | | | | | | | Allow the clamp modifier on vop3 int arithmetic instructions in assembly and disassembly. This involved adding a clamp operand to the affected instructions in MIR and MC, and thus having to fix up several places in codegen and MIR tests. Differential Revision: https://reviews.llvm.org/D59267 Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e llvm-svn: 356399
* AMDGPU: Use MachineInstr::mayAlias to replace ↵Changpeng Fang2019-02-181-10/+8
| | | | | | | | | | | | | | | areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass. Summary: This is to fix a memory dependence bug in LoadStoreOptimizer. Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58295 llvm-svn: 354295
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [AMDGPU] Fix dwordx3/southern-islands failures.Neil Henning2019-01-101-4/+5
| | | | | | | | | | | This commit fixes the dwordx3/southern-islands failures that were found in bugzilla https://bugs.llvm.org/show_bug.cgi?id=40129, by not generating the dwordx3 variants of load/store instructions that were added to the ISA after southern islands. Differential Revision: https://reviews.llvm.org/D56434 llvm-svn: 350838
* [AMDGPU] Removed the unnecessary operand size-check-assert from ↵Farhana Aleen2018-12-181-2/+0
| | | | | | | | | | processBaseWithConstOffset(). Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't any additional operand size-check-assert. Author: FarhanaAleen llvm-svn: 349529
* Fix -Wunused-variable warning. NFCI.Simon Pilgrim2018-12-151-0/+4
| | | | llvm-svn: 349265
* [SILoadStoreOptimizer] Use std::abs to avoid truncation.Florian Hahn2018-12-151-2/+2
| | | | | | | | | | | | | | Using regular abs() causes the following warning error: absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value] (uint32_t)abs(Dist) > MaxDist) { ^ lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead which causes a bot to fail: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18284/steps/bootstrap%20clang/logs/stdio llvm-svn: 349224
* [AMDGPU] Promote constant offset to the immediate by finding a new base with ↵Farhana Aleen2018-12-141-0/+361
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 13bit constant offset from the nearby instructions. Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g. s_movk_i32 s0, 0x1800 v_add_co_u32_e32 v0, vcc, s0, v2 v_addc_co_u32_e32 v1, vcc, 0, v6, vcc s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[0:1], off => s_movk_i32 s0, 0x1000 v_add_co_u32_e32 v5, vcc, s0, v2 v_addc_co_u32_e32 v6, vcc, 0, v6, vcc global_load_dwordx2 v[5:6], v[5:6], off global_load_dwordx2 v[0:1], v[5:6], off offset:2048 Author: FarhanaAleen Reviewed By: arsenm, rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D55539 llvm-svn: 349196
* [AMDGPU] Extend the SI Load/Store optimizer to combine more things.Neil Henning2018-12-121-238/+442
| | | | | | | | | | I've extended the load/store optimizer to be able to produce dwordx3 loads and stores, This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision: https://reviews.llvm.org/D54042 llvm-svn: 348937
* [AMDGPU] Fix ds combine with subregsStanislav Mekhanoshin2018-09-251-14/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D52522 llvm-svn: 343047
* [MI] Change the array of `MachineMemOperand` pointers to beChandler Carruth2018-08-161-22/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940
* AMDGPU: Refactor Subtarget classesTom Stellard2018-07-111-2/+2
| | | | | | | | | | | | | | | | | Summary: This is a follow-up to r335942. - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation. Reviewers: arsenm, jvesely Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D49037 llvm-svn: 336851
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
* Rename DEBUG macro to LLVM_DEBUG.Nicola Zaghen2018-05-141-3/+3
| | | | | | | | | | | | | | | | The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240
* AMDGPU: Track physreg uses in SILoadStoreOptimizerNicolai Haehnle2018-02-231-32/+32
| | | | | | | | | | | | | | | | Summary: This handles def-after-use of physregs, and allows us to merge loads and stores even across some physreg defs (typically M0 defs). Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D42647 llvm-svn: 325882
* AMDGPU: Do not combine loads/store across physreg defsNicolai Haehnle2018-02-211-1/+19
| | | | | | | | | | | | | | | | | | | Summary: Since this pass operates on machine SSA form, this should only really affect M0 in practice. Fixes various piglit variable-indexing/vs-varying-array-mat4-index-* Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8 Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4") Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40343 llvm-svn: 325677
* AMDGPU: Fix incorrect reordering when inline asm defines LDS addressMatt Arsenault2018-02-081-3/+4
| | | | | | | Defs of operands outside of the instruction's explicit defs need to be checked. llvm-svn: 324554
* AMDGPU: Remove the s_buffer workaround for GFX9 chipsMarek Olsak2018-02-071-3/+2
| | | | | | | | | | | | | | | | | | Summary: I checked the AMD closed source compiler and the workaround is only needed when x3 is emulated as x4, which we don't do in LLVM. SMEM x3 opcodes don't exist, and instead there is a possibility to use x4 with the last component being unused. If the last component is out of buffer bounds and falls on the next 4K page, the hw hangs. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D42756 llvm-svn: 324486
* [AMDGPU] SI Load Store Optimizer: When merging with offset, use ↵Mark Searles2018-01-221-14/+18
| | | | | | | | | | | V_ADD_{I|U}32_e64 - Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register. - Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts. Differential Revision: https://reviews.llvm.org/D42124 llvm-svn: 323153
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-1/+1
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* AMDGPU: Use gfx9 carry-less add/sub instructionsMatt Arsenault2017-11-301-6/+12
| | | | llvm-svn: 319491
* AMDGPU: Select DS insts without m0 initializationMatt Arsenault2017-11-291-17/+50
| | | | | | | | | GFX9 stopped using m0 for most DS instructions. Select a different instruction without the use. I think this will be less error prone than trying to manually maintain m0 uses as needed. llvm-svn: 319270
* AMDGPU: Re-organize the outer loop of SILoadStoreOptimizerNicolai Haehnle2017-11-281-6/+5
| | | | | | | | | | | | | | | | | | | Summary: The entire algorithm operates per basic-block, so for cache locality it should be better to re-optimize a basic-block immediately rather than in a separate loop. I don't have performance measurements. Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5 Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D40344 llvm-svn: 319156
* AMDGPU: Consider memory dependencies with moved instructions in ↵Nicolai Haehnle2017-11-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | SILoadStoreOptimizer Summary: This bug seems to have gone unnoticed because critical cases with LDS instructions are eliminated by the peephole optimizer. However, equivalent situations arise with buffer loads and stores as well, so this fixes regressions since r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4"). Fixes at least: KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs KHR-GL45.cull_distance.functional piglit tes-input-gl_ClipDistance.shader_test ... and probably more Change-Id: I0e371536288eb8e6afeaa241a185266fd45d129d Reviewers: arsenm, mareko, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D40303 llvm-svn: 318829
* Fix "default label in switch which covers all enumeration values" warningVitaly Buka2017-11-091-2/+0
| | | | llvm-svn: 317771
OpenPOWER on IntegriCloud