path: root/llvm/lib/Target/AMDGPU/AMDGPU.h
* [AMDGPU] Add FixupVectorISel pass, currently supports SREGs in GLOBAL LD/ST (Ron Lieberman, 2018-11-16, 1 file changed, +4/-0)

  Add a pass to fix up various vector ISel issues. Currently we handle converting GLOBAL_{LOAD|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction.

  llvm-svn: 347008

* Allow subclassing ExternalAA (Matt Arsenault, 2018-11-07, 1 file changed, +2/-0)

  This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis.

  Before there was no way to test this using -aa-eval with opt, since the default-constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback.

  llvm-svn: 346353

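  To illustrate the wrapper-subclass idea, here is a minimal sketch assuming ExternalAAWrapperPass's callback-taking constructor; the exact AMDGPU wrapper may differ in detail:

      #include "AMDGPUAliasAnalysis.h"
      #include "llvm/Analysis/AliasAnalysis.h"
      using namespace llvm;

      // Subclass whose default constructor installs the AMDGPU AA callback,
      // so the pass behaves like any other AA pass under `opt -aa-eval`.
      struct AMDGPUExternalAAWrapper : ExternalAAWrapperPass {
        static char ID;
        AMDGPUExternalAAWrapper()
            : ExternalAAWrapperPass(
                  [](Pass &P, Function &, AAResults &AAR) {
                    // Feed the AMDGPU results into the query set if the
                    // underlying analysis was scheduled.
                    if (auto *WrapperPass =
                            P.getAnalysisIfAvailable<AMDGPUAAWrapperPass>())
                      AAR.addAAResult(WrapperPass->getResult());
                  }) {}
      };
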
* [AMDGPU] Add a pass to promote bitcast calls (Scott Linder, 2018-10-26, 1 file changed, +4/-0)

  AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast.

  Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls.

  Differential Revision: https://reviews.llvm.org/D52741

  llvm-svn: 345382

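  A minimal sketch of the promotion step, assuming the modern CallBase-taking CallPromotionUtils interface (the actual pass's handling of argument and return casts is more thorough):

      #include "llvm/IR/Operator.h"
      #include "llvm/Transforms/Utils/CallPromotionUtils.h"
      using namespace llvm;

      // If the called operand is a bitcast of a known function, strip the
      // cast and let promoteCall fix up argument/return types.
      static bool promoteBitcastCall(CallBase &CB) {
        auto *Cast = dyn_cast<BitCastOperator>(CB.getCalledOperand());
        if (!Cast)
          return false;
        auto *Callee = dyn_cast<Function>(Cast->getOperand(0));
        if (!Callee)
          return false;
        promoteCall(CB, Callee);
        return true;
      }
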
* [AMDGPU] Add an AMDGPU specific atomic optimizer. (Neil Henning, 2018-10-08, 1 file changed, +4/-0)

  This commit adds a new IR-level pass to the AMDGPU backend to perform atomic optimizations. It works by:

  - Running through a function and finding atomicrmw add/sub or uses of the atomic buffer intrinsics for add/sub.
  - If all arguments except the value to be added/subtracted are uniform, recording the value to be optimized.
  - Running through the atomic operations we can optimize and, depending on whether the value is uniform/divergent, using wavefront-wide operations (DPP in the divergent case) to calculate the total amount to be atomically added/subtracted.
  - Then letting only a single lane of each wavefront perform the atomic operation, reducing the total number of atomic operations in flight.
  - Lastly recombining the result from the single lane to each lane of the wavefront, and calculating our individual lane's offset into the final result.

  Differential Revision: https://reviews.llvm.org/D51969

  llvm-svn: 343973

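  A sketch of the first step (collecting candidates), assuming a DivergenceAnalysis-style isUniform query; the real pass's bookkeeping is considerably more involved:

      #include "llvm/ADT/SmallVector.h"
      #include "llvm/IR/InstIterator.h"
      #include "llvm/IR/Instructions.h"
      using namespace llvm;

      // Collect atomicrmw add/sub whose pointer operand is uniform across
      // the wavefront; these are the ones worth funneling through one lane.
      template <typename UniformityT>
      static void collectCandidates(Function &F, const UniformityT &UA,
                                    SmallVectorImpl<AtomicRMWInst *> &Out) {
        for (Instruction &I : instructions(F)) {
          auto *RMW = dyn_cast<AtomicRMWInst>(&I);
          if (!RMW)
            continue;
          auto Op = RMW->getOperation();
          if (Op != AtomicRMWInst::Add && Op != AtomicRMWInst::Sub)
            continue;
          if (UA.isUniform(RMW->getPointerOperand()))
            Out.push_back(RMW);
        }
      }
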
* Remove unnecessary semicolon to silence -Wpedantic warning. NFCI. (Simon Pilgrim, 2018-09-03, 1 file changed, +1/-1)

  llvm-svn: 341303

* AMDGPU: Remove remnants of old address space mapping (Matt Arsenault, 2018-08-31, 1 file changed, +5/-14)

  llvm-svn: 341165

* AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space (Samuel Pitoiset, 2018-08-22, 1 file changed, +1/-1)

  The 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range".

  v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
  v4: fix compilation issues, fix out-of-bounds access
  v3: use static_assert()
  v2: add a very simple test for 32-bit addr space

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630

  llvm-svn: 340417

* Revert "AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space"Vitaly Buka2018-08-201-1/+1
| | | | | | | | As it introduces out of bound access. This reverts commit r340172 and r340171 llvm-svn: 340202
* AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space (Samuel Pitoiset, 2018-08-20, 1 file changed, +1/-1)

  The 32-bit constant address space is declared as 6, so the maximum number of address spaces is 6, not 5. Fixes "LLVM ERROR: Pointer address space out of range".

  v3: use static_assert()
  v2: add a very simple test for 32-bit addr space

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630

  Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

  llvm-svn: 340171

* AMDGPU: Add pass to lower kernel arguments to loads (Matt Arsenault, 2018-06-26, 1 file changed, +4/-0)

  This replaces most argument uses with loads, but for now not all.

  The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register-legal types, which results in low-quality code for arbitrary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem altogether.

  It's possible to hack around the problem in the initial lowering, but the real problem is that the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block.

  Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them.

  I'm not sure what the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lower the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space.

  This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them, as the equivalent !range metadata is not valid on pointer-typed loads. This is mostly bad for SI, which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done.

  More importantly, this skips noalias arguments, since this pass does not yet convert them to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although logically this is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load.

  I'm also not entirely sure this preserves the current clover ABI, although I would greatly prefer it to stop widening arguments and match the HSA ABI. As-is I think it extends < 4-byte arguments to 4 bytes but doesn't align them to 4 bytes.

  llvm-svn: 335650

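  A minimal sketch of the IR rewrite this pass performs, assuming the llvm.amdgcn.kernarg.segment.ptr intrinsic as the base pointer; the real pass's offset and alignment handling is much more careful:

      #include "llvm/IR/IRBuilder.h"
      #include "llvm/IR/IntrinsicsAMDGPU.h"
      using namespace llvm;

      // Replace each kernel argument's uses with an explicit load from the
      // kernarg segment, so later passes (e.g. LoadStoreVectorizer) see them.
      static void lowerKernArgs(Function &F) {
        const DataLayout &DL = F.getParent()->getDataLayout();
        IRBuilder<> B(&*F.getEntryBlock().getFirstInsertionPt());
        Value *KernArgBase =
            B.CreateIntrinsic(Intrinsic::amdgcn_kernarg_segment_ptr, {}, {});
        uint64_t Offset = 0;
        for (Argument &Arg : F.args()) {
          Value *Ptr = B.CreateConstInBoundsGEP1_64(B.getInt8Ty(),
                                                    KernArgBase, Offset);
          LoadInst *Load = B.CreateAlignedLoad(Arg.getType(), Ptr, Align(4));
          Arg.replaceAllUsesWith(Load);
          Offset += DL.getTypeAllocSize(Arg.getType());
        }
      }
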
* [AMDGPU] Construct memory clauses before RA (Stanislav Mekhanoshin, 2018-05-31, 1 file changed, +4/-0)

  Memory clauses are formed into bundles in the presence of xnack. Their source operands are marked as early-clobber. This allows distinct source and destination registers to be allocated within a clause and prevents the hazard recognizer from breaking the clause with s_nop.

  Clauses are undone before the post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the DAG to keep memory operations together. Yet this allows better ILP in some cases.

  Differential Revision: https://reviews.llvm.org/D47511

  llvm-svn: 333691

* [AMDGPU] Add perf hints to functions (Stanislav Mekhanoshin, 2018-05-25, 1 file changed, +3/-0)

  This is an adoption of the HSAIL perfhint pass. Two types of hints are produced:

  1. Function is memory bound.
  2. Kernel can use the wave limiter.

  Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling.

  Differential Revision: https://reviews.llvm.org/D46992

  llvm-svn: 333289

* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers (Tom Stellard, 2018-05-22, 1 file changed, +0/-1)

  Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge, so we only want to include them where needed.

  This will also make it easier if we want to split the R600 and GCN definitions into separate tablegen-generated files.

  I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too.

  Reviewers: arsenm, nhaehnle

  Reviewed By: nhaehnle

  Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

  Differential Revision: https://reviews.llvm.org/D46272

  llvm-svn: 332930

* AMDGPU: Add pass to optimize reqd_work_group_size (Matt Arsenault, 2018-05-18, 1 file changed, +4/-0)

  Eliminate loads from the dispatch packet when they will have a known value.

  Also pattern-match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used.

  llvm-svn: 332771

* AMDGPU: Rename OpenCL lowering pass to be R600 specific. (Matt Arsenault, 2018-05-13, 1 file changed, +1/-1)

  This pass is (a) broken and (b) R600-specific. Fixing (a) is non-trivial, but fixing (b) is easy: move this pass to being R600-only for now.

  This pass does pass all the unit tests; however, clang no longer generates code that looks like the unit test input, so fixing the pass requires fixing the tests and the pass as one, and checking that it still works with clang.

  Patch by Dave Airlie

  llvm-svn: 332196

* [AMDGPU][Waitcnt] Remove the old waitcnt pass (Mark Searles, 2018-05-07, 1 file changed, +0/-4)

  Remove the old waitcnt pass (si-insert-waits), which is no longer maintained and getting crufty.

  Differential Revision: https://reviews.llvm.org/D46448

  llvm-svn: 331641

* [AMDGPU] Change constant addr space to 4 (Yaxun Liu, 2018-02-13, 1 file changed, +1/-1)

  Differential Revision: https://reviews.llvm.org/D43170

  llvm-svn: 325030

* Reapply "AMDGPU: Add 32-bit constant address space"Matt Arsenault2018-02-091-0/+3
| | | | | | This reverts r324494 and reapplies r324487. llvm-svn: 324747
* Revert "AMDGPU: Add 32-bit constant address space"Rafael Espindola2018-02-071-3/+0
| | | | | | | | This reverts commit r324487. It broke clang tests. llvm-svn: 324494
* AMDGPU: Add 32-bit constant address space (Marek Olsak, 2018-02-07, 1 file changed, +3/-0)

  Note: This is a candidate for LLVM 6.0, because it was planned to be in that release but was delayed due to a long review period.

  Merge conflict in release_60; resolution: add "-p6:32:32" into the second (non-amdgiz) string.

  Only scalar loads support 32-bit pointers. An address in a VGPR will fail to compile. That's OK because the results of loads will only be used in places where VGPRs are forbidden.

  Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC. The tests cover all use cases we need for Mesa.

  Reviewers: arsenm, nhaehnle

  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

  Differential Revision: https://reviews.llvm.org/D41651

  llvm-svn: 324487

* [AMDGPU] Clean up symbols in the global namespace. (Benjamin Kramer, 2017-10-31, 1 file changed, +38/-33)

  llvm-svn: 317051

* [AMDGPU] Lower enqueued blocks and generate runtime metadata (Yaxun Liu, 2017-10-10, 1 file changed, +4/-0)

  This patch adds a post-linking pass which replaces the function pointer of an enqueued block kernel with a global variable (runtime handle) and adds a runtime-handle attribute to the enqueued block kernel.

  In LLVM CodeGen the runtime-handle attribute will be translated to RuntimeHandle metadata in the code object. The runtime allocates a global buffer for each kernel with RuntimeHandle metadata and saves the kernel address required for the AQL packet into the buffer. The __enqueue_kernel function in the device library knows that the invoke function pointer in the block literal is actually a runtime handle, loads the kernel address from it, and puts it into the AQL packet for dispatching.

  This cannot be done in the FE since the FE cannot create a unique global variable with external linkage across LLVM modules. A global variable with internal linkage does not work, since optimization passes will try to replace loads of the global variable with its initialization value.

  Differential Revision: https://reviews.llvm.org/D38610

  llvm-svn: 315352

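  A sketch of the core rewrite; the handle's type and naming scheme here are hypothetical, and only the attribute-plus-external-global shape follows the description above (the real pass is also selective about which uses it rewrites):

      #include "llvm/IR/Constants.h"
      #include "llvm/IR/Module.h"
      using namespace llvm;

      // Give an enqueued block kernel an external-linkage handle global that
      // the runtime fills in, and redirect function-pointer uses to it.
      static void lowerEnqueuedBlock(Module &M, Function &Kernel) {
        auto *HandleTy = PointerType::get(M.getContext(), /*AddrSpace=*/0);
        auto *Handle = new GlobalVariable(
            M, HandleTy, /*isConstant=*/false, GlobalValue::ExternalLinkage,
            /*Initializer=*/nullptr, Kernel.getName() + ".runtime_handle");
        // Tag the kernel so codegen emits RuntimeHandle metadata for it.
        Kernel.addFnAttr("runtime-handle", Handle->getName());
        // Uses of the kernel's address now refer to the handle instead.
        Kernel.replaceAllUsesWith(
            ConstantExpr::getPointerCast(Handle, Kernel.getType()));
      }
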
* [AMDGPU] Set fast-math flags on functions given the options (Stanislav Mekhanoshin, 2017-09-29, 1 file changed, +2/-1)

  We have a single library build without relaxation options. When inlined, library functions remove fast-math attributes from the functions they are integrated into.

  This patch sets relaxation attributes on the functions after linking, provided corresponding relaxation options are given. Math instructions inside the inlined functions still carry no fast flags, but inlining no longer prevents fast-math transformations of the surrounding caller code.

  Differential Revision: https://reviews.llvm.org/D38325

  llvm-svn: 314568

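  A minimal sketch of applying such relaxation attributes post-link; the attribute strings are the standard function-level fast-math attributes, but exactly which ones the pass sets is an assumption here:

      #include "llvm/IR/Function.h"
      using namespace llvm;

      // Re-apply relaxation attributes after the unrelaxed library bitcode
      // has been linked in, so the caller's fast-math context is restored.
      static void applyRelaxationAttrs(Function &F, bool UnsafeFPMath,
                                       bool NoNaNs, bool NoInfs) {
        if (UnsafeFPMath)
          F.addFnAttr("unsafe-fp-math", "true");
        if (NoNaNs)
          F.addFnAttr("no-nans-fp-math", "true");
        if (NoInfs)
          F.addFnAttr("no-infs-fp-math", "true");
      }
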
* [AMDGPU] Port of HSAIL inliner (Stanislav Mekhanoshin, 2017-09-20, 1 file changed, +3/-0)

  Differential Revision: https://reviews.llvm.org/D36849

  llvm-svn: 313714

* [AMDGPU] Ported and adopted AMDLibCalls pass (Stanislav Mekhanoshin, 2017-08-11, 1 file changed, +8/-0)

  The pass does simplifications of well-known AMD library calls. If given the -amdgpu-prelink option it works in a pre-link mode, which allows referencing new library functions which will be linked in later.

  In addition it is also used to process the traditional AMD option -fuse-native, which allows replacing some of the functions with their fast native implementations from the library.

  The necessary glue to pass the prelink option and translate -fuse-native is to be added to the driver.

  Differential Revision: https://reviews.llvm.org/D36436

  llvm-svn: 310731

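  To illustrate the -fuse-native idea, a sketch that redirects a math call to a native variant; the "native_" naming is illustrative and the real pass's matching table is far more extensive:

      #include "llvm/IR/Instructions.h"
      #include "llvm/IR/Module.h"
      using namespace llvm;

      // Swap a call to sin() for the fast native library implementation.
      static bool useNativeSin(CallInst *CI) {
        Function *Callee = CI->getCalledFunction();
        if (!Callee || Callee->getName() != "sin")
          return false;
        Module *M = CI->getModule();
        FunctionCallee Native =
            M->getOrInsertFunction("native_sin", Callee->getFunctionType());
        CI->setCalledFunction(Native);
        return true;
      }
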
* AMDGPU: Move R600 parts of AMDGPUISelDAGToDAG into their own class (Tom Stellard, 2017-08-08, 1 file changed, +1/-0)

  Summary: This refactoring is required in order to split the R600 and GCN tablegen files.

  Reviewers: arsenm

  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

  Differential Revision: https://reviews.llvm.org/D36286

  llvm-svn: 310336

* AMDGPU: Remove FixControlFlowLiveIntervals pass (Matt Arsenault, 2017-08-07, 1 file changed, +0/-3)

  This hasn't done anything in a long time. It was running after the control flow pseudos were expanded, so it would never find them. The control flow pseudo expansion was moved to solve the problem this pass was supposed to solve in the first place, except handling it earlier also fixes it for fast regalloc, which doesn't use LiveIntervals.

  Noticed by checking LCOV reports.

  llvm-svn: 310274

* [AMDGPU] Add support for Whole Wavefront Mode (Connor Abbott, 2017-08-04, 1 file changed, +4/-0)

  Summary: Whole Wavefront Mode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM.

  For simplicity, WWM is entirely block-local: computations are never WWM on entry to or exit from a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyway.

  Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off; that will be added in a future change.

  Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp.

  Reviewers: arsenm, nhaehnle, tpr

  Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits

  Differential Revision: https://reviews.llvm.org/D35524

  llvm-svn: 310087

* AMDGPU: Add analysis pass for function argument info (Matt Arsenault, 2017-08-03, 1 file changed, +7/-2)

  This will allow adding only the necessary inputs to callee functions that need special inputs forwarded from the kernel.

  llvm-svn: 309996

* AMDGPU/R600: Initialize more passes (Tom Stellard, 2017-08-02, 1 file changed, +15/-0)

  Reviewers: arsenm

  Reviewed By: arsenm

  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

  Differential Revision: https://reviews.llvm.org/D36128

  llvm-svn: 309893

* [AMDGPU] Collapse adjacent SI_END_CF (Stanislav Mekhanoshin, 2017-08-01, 1 file changed, +4/-0)

  Add a pass to remove redundant S_OR_B64 instructions enabling lanes in the exec mask. If two SI_END_CF (lowered as S_OR_B64) come together without any vector instructions between them, we can keep only the outer SI_END_CF, given that the CFG is structured and the exec bits of the outer end statement are never less than the exec bits of the inner one.

  This needs to be done before RA to eliminate saved-exec-bit registers, but after the register coalescer so there are no vector register copies in between different end-cf statements.

  Differential Revision: https://reviews.llvm.org/D35967

  llvm-svn: 309762

* AMDGPU: Add pass to replace out arguments (Matt Arsenault, 2017-07-28, 1 file changed, +4/-0)

  It is better to return arguments directly in registers if we are making a call rather than introducing expensive stack usage. In one sample compile from one of Blender's many kernel variants, this fires on about ~20 different functions.

  Future improvements may be to recognize simple cases where the pointer is indexing a small array. This also fails when the store to the out argument is in a separate block from the return, which happens in a few of the Blender functions. This should also probably be using MemorySSA, which might help with that.

  I'm not sure this is correct as a FunctionPass, but MemoryDependenceAnalysis seems to not work with a ModulePass. I'm also not sure where it should run. I think it should run before DeadArgumentElimination, so maybe either EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.

  llvm-svn: 309416

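  For intuition, here is the shape of the rewrite in plain C++ (a hand-written before/after; the pass performs the equivalent transformation on IR):

      // Before: the result travels through a pointer out-argument, forcing
      // a stack slot at every call site.
      static void computeBefore(int A, int B, int *Out) { *Out = A * B + A; }

      // After: the value is returned directly, so it can live in a register.
      static int computeAfter(int A, int B) { return A * B + A; }

      int caller() {
        int R1;
        computeBefore(2, 3, &R1); // stack traffic
        int R2 = computeAfter(2, 3); // register return
        return R1 + R2;
      }
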
* AMDGPU: Implement memory model (Konstantin Zhuravlyov, 2017-07-21, 1 file changed, +4/-0)

  llvm-svn: 308781

* AMDGPU: Annotate call graph with used features (Matt Arsenault, 2017-07-13, 1 file changed, +1/-1)

  Previously this wouldn't detect features indirectly used in callee functions.

  llvm-svn: 307967

* AMDGPU: Remove SITypeRewriter (Matt Arsenault, 2017-06-28, 1 file changed, +0/-1)

  This was an old workaround for using v16i8 in some old intrinsics for resource descriptors.

  llvm-svn: 306603

* AMDGPU: Register AMDGPUAlwaysInline (Matt Arsenault, 2017-06-02, 1 file changed, +2/-0)

  llvm-svn: 304574

* [LegacyPassManager] Remove TargetMachine constructors (Francis Visoiu Mistrih, 2017-05-18, 1 file changed, +10/-10)

  This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are (see the sketch after this list):

  * Passes handling a null TargetMachine call getAnalysisIfAvailable<TargetPassConfig>.
  * Passes not handling a null TargetMachine addRequired<TargetPassConfig> and call getAnalysis<TargetPassConfig>.
  * MachineFunctionPasses now use MF.getTarget().
  * Remove all the TargetMachine constructors.
  * Remove INITIALIZE_TM_PASS.

  This fixes a crash when running llc -start-before prologepilog. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.

  Related to PR30324.

  Differential Revision: https://reviews.llvm.org/D33222

  llvm-svn: 303360

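  A minimal sketch of the new access pattern in a legacy FunctionPass; pass registration boilerplate is omitted, and the ExamplePass name is illustrative:

      #include "llvm/CodeGen/TargetPassConfig.h"
      #include "llvm/IR/Function.h"
      #include "llvm/Pass.h"
      #include "llvm/Target/TargetMachine.h"
      using namespace llvm;

      namespace {
      struct ExamplePass : FunctionPass {
        static char ID;
        ExamplePass() : FunctionPass(ID) {}

        void getAnalysisUsage(AnalysisUsage &AU) const override {
          AU.addRequired<TargetPassConfig>(); // hard dependency on codegen
        }

        bool runOnFunction(Function &F) override {
          // Passes that must tolerate a missing TargetMachine would instead
          // call getAnalysisIfAvailable<TargetPassConfig>() and bail on null.
          const TargetMachine &TM =
              getAnalysis<TargetPassConfig>().getTM<TargetMachine>();
          (void)TM; // ... query subtarget/features from TM here ...
          return false;
        }
      };
      } // end anonymous namespace

      char ExamplePass::ID = 0;
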
* Re-submit AMDGPUMachineCFGStructurizer. (Jan Sjodin, 2017-05-15, 1 file changed, +4/-0)

  Differential Revision: https://reviews.llvm.org/D23209

  llvm-svn: 303111

* Revert 303091. (Jan Sjodin, 2017-05-15, 1 file changed, +0/-4)

  llvm-svn: 303098

* Add AMDGPUMachineCFGStructurizer. (Jan Sjodin, 2017-05-15, 1 file changed, +4/-0)

  Differential Revision: https://reviews.llvm.org/D23209

  llvm-svn: 303091

* [AMDGPU] Generate range metadata for workitem id (Stanislav Mekhanoshin, 2017-04-12, 1 file changed, +1/-1)

  If the workgroup size is known, inform LLVM about the range returned by the local id and local size queries.

  Differential Revision: https://reviews.llvm.org/D31804

  llvm-svn: 300102

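  A sketch of attaching such metadata, assuming the bound comes from a known workgroup size; !range on a call uses the standard half-open-interval encoding:

      #include "llvm/IR/Instructions.h"
      #include "llvm/IR/MDBuilder.h"
      using namespace llvm;

      // Tell LLVM that a workitem-id query returns a value in [0, MaxSize).
      static void annotateWorkitemId(CallInst *IdCall, uint64_t MaxSize) {
        MDBuilder MDB(IdCall->getContext());
        IdCall->setMetadata(
            LLVMContext::MD_range,
            MDB.createRange(APInt(32, 0), APInt(32, MaxSize)));
      }
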
* [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing. (Kannan Narayanan, 2017-04-12, 1 file changed, +4/-0)

  Based on comments in https://reviews.llvm.org/D31161.

  llvm-svn: 300023

* AMDGPU: Fix crash when disassembling VOP3 mac (Matt Arsenault, 2017-04-10, 1 file changed, +1/-0)

  The unused dummy src2_modifiers is missing, so it crashes when trying to print it. I tried to fully remove src2_modifiers, but there are some irritations in the places where it is converted to mad, since it starts to require modifying use lists while iterating over them.

  llvm-svn: 299861

* [AMDGPU] Temporarily change constant address space from 4 to 2 (Yaxun Liu, 2017-04-06, 1 file changed, +1/-1)

  Our final address space mapping is to have the constant address space be 4, to match NVPTX. However, for now we will make it 2, to avoid unnecessary work in the FE/BE/devlib around intrinsics returning constant pointers.

  Differential Revision: https://reviews.llvm.org/D31770

  llvm-svn: 299690

* [AMDGPU] Add GlobalOpt parameter to Always Inliner pass (Stanislav Mekhanoshin, 2017-03-30, 1 file changed, +1/-1)

  If set to false, it does not remove global aliases. With this parameter set to false, it should be safe to run the pass before link.

  Differential Revision: https://reviews.llvm.org/D31489

  llvm-svn: 299108

* [AMDGPU] Get address space mapping by target triple environment (Yaxun Liu, 2017-03-27, 1 file changed, +39/-28)

  As we introduced the target triple environments amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the values by target triple.

  The basic idea is to use a struct AMDGPUAS to represent the address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile-time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as a member to a pass and created at the beginning of the run* function.

  Differential Revision: https://reviews.llvm.org/D31284

  llvm-svn: 298846

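  A sketch of the struct shape described above; the member names follow AMDGPU conventions, but the concrete address space numbers here are illustrative, not the backend's authoritative mapping:

      #include "llvm/ADT/Triple.h"

      // Triple-independent values are compile-time constants; the rest are
      // chosen per target triple when the struct is built.
      struct AMDGPUASExample {
        static const unsigned LOCAL_ADDRESS = 3; // fixed regardless of triple
        unsigned FLAT_ADDRESS;                   // depends on environment
        unsigned PRIVATE_ADDRESS;
        unsigned CONSTANT_ADDRESS;
      };

      AMDGPUASExample getAS(const llvm::Triple &TT) {
        AMDGPUASExample AS;
        bool AMDGIZ = TT.getEnvironmentName() == "amdgiz" ||
                      TT.getEnvironmentName() == "amdgizcl";
        AS.FLAT_ADDRESS = AMDGIZ ? 0 : 4;
        AS.PRIVATE_ADDRESS = AMDGIZ ? 5 : 0;
        AS.CONSTANT_ADDRESS = 2;
        return AS;
      }
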
* AMDGPU: Unify divergent function exits. (Matt Arsenault, 2017-03-24, 1 file changed, +3/-0)

  StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes, skipping exit nodes with uniform branch sources.

  llvm-svn: 298729

* [AMDGPU] SDWA peephole optimization pass. (Sam Kolton, 2017-03-21, 1 file changed, +4/-0)

  Summary: First iteration of the SDWA peephole.

  This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:

      V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
      V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
      V_LSHLREV_B32_e32 %vreg4, 16, %vreg2

  into:

      V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD

  Pass structure:

  1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction into either a source or destination SDWA operand. E.g. V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 is matched to the source SDWA operand %vreg1 src_sel:WORD_1.
  2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand the potential instructions are all instructions in this basic block that use %vreg0.
  3. Iterate over all potential instructions and check if they can be converted into SDWA.
  4. Convert the instructions to SDWA.

  This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done). There are several ways this pass can be improved:

  1. Make this pass work on a whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass.
  2. Introduce more SDWA patterns.
  3. Introduce mnemonics to limit when SDWA patterns should apply.

  Reviewers: vpykhtin, alex-t, arsenm, rampitec

  Subscribers: wdng, nhaehnle, mgorny

  Differential Revision: https://reviews.llvm.org/D30038

  llvm-svn: 298365

* [AMDGPU] Add address space based alias analysis pass (Stanislav Mekhanoshin, 2017-03-17, 1 file changed, +3/-0)

  This is a direct port of the HSAILAliasAnalysis pass, just cleaned up for style and renamed.

  Differential Revision: https://reviews.llvm.org/D31103

  llvm-svn: 298172

* AMDGPU: Merge initial gfx9 support (Matt Arsenault, 2017-02-18, 1 file changed, +1/-1)

  llvm-svn: 295554