summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.Quentin Colombet2017-07-051-43/+0
| | | | | | NFC llvm-svn: 307186
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-041-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
* Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"NAKAMURA Takumi2017-07-041-1/+1
| | | | | | | | | It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-031-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026
* AMDGPU: Remove SITypeRewriterMatt Arsenault2017-06-281-1/+0
| | | | | | | This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. llvm-svn: 306603
* [AMDGPU] Add infer address spaces pass before SROAStanislav Mekhanoshin2017-06-191-0/+8
| | | | | | | | | It adds it for the target after inlining but before SROA where we can get most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* [AMDGPU] Untangle SDWA pass from SIShrinkInstructionsStanislav Mekhanoshin2017-06-031-1/+1
| | | | | | | | | | | | Remove dependency of SDWA pass on SIShrinkInstructions. The goal is to move SDWA even higher in the stack to avoid second run of MachineLICM, MachineCSE and SIFoldOperands. Also added handling to preserve original src modifiers. Differential Revision: https://reviews.llvm.org/D33860 llvm-svn: 304665
* AMDGPU: Register AMDGPUAlwaysInlineMatt Arsenault2017-06-021-0/+1
| | | | llvm-svn: 304574
* [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.Mark Searles2017-06-021-1/+1
| | | | | | | | | -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551
* TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFCMatthias Braun2017-05-301-5/+5
| | | | | | | | | | | TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine. While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr. llvm-svn: 304247
* [AMDGPU] Allow SDWA in instructions with immediates and SGPRsStanislav Mekhanoshin2017-05-301-0/+3
| | | | | | | | | | | | | | | | An encoding does not allow to use SDWA in an instruction with scalar operands, either literals or SGPRs. That is however possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To cleanup MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219
* [LegacyPassManager] Remove TargetMachine constructorsFrancis Visoiu Mistrih2017-05-181-11/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 llvm-svn: 303360
* Re-submit AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-2/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111
* Revert 303091.Jan Sjodin2017-05-151-16/+2
| | | | llvm-svn: 303098
* Add AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-2/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091
* AMDGPU: Add AMDGPU_HS calling conventionMarek Olsak2017-05-021-0/+1
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 llvm-svn: 301930
* [AMDGPU] Generate range metadata for workitem idStanislav Mekhanoshin2017-04-121-3/+3
| | | | | | | | | If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102
* [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.Kannan Narayanan2017-04-121-1/+11
| | | | | | Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023
* [AMDGPU] Add A5 to data layout for amdgiz environmentYaxun Liu2017-04-111-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964
* [AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passesSam Kolton2017-04-071-5/+5
| | | | | | | | | | | | | | Summary: Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled. With this change order of passes will not change. Reviewers: arsenm, vpykhtin, rampitec Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31705 llvm-svn: 299757
* [AMDGPU] Temporarily change constant address space from 4 to 2Yaxun Liu2017-04-061-1/+1
| | | | | | | | | | Our final address space mapping is to let constant address space to be 4 to match nvptx. However for now we will make it 2 to avoid unnecessary work in FE/BE/devlib about intrinsics returning constant pointers. Differential Revision: https://reviews.llvm.org/D31770 llvm-svn: 299690
* [AMDGPU] Resubmit SDWA peephole: enable by defaultSam Kolton2017-04-061-1/+1
| | | | | | | | | | Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299654
* Revert r299536. [AMDGPU] SDWA peephole: enable by default.Ivan Krasin2017-04-051-1/+1
| | | | | | | | | | | Reason: breaks multiple bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173 Original Review URL: https://reviews.llvm.org/D31671 llvm-svn: 299583
* [AMDGPU] SDWA peephole: enable by defaultSam Kolton2017-04-051-1/+1
| | | | | | | | | | Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299536
* [AMDGPU] Add GlobalOpt parameter to Always Inliner passStanislav Mekhanoshin2017-03-301-1/+1
| | | | | | | | | If set to false it does not remove global aliases. With this parameter set to false it should be safe to run the pass before link. Differential Revision: https://reviews.llvm.org/D31489 llvm-svn: 299108
* [AMDGPU] Split -amdgpu-early-inline-all optionStanislav Mekhanoshin2017-03-281-3/+13
| | | | | | | | | | Previously it was covered by the internalization. It turns out we cannot run internalizer in FE, it break separate compilation tests. Thus early inliner gets its own option. Differential Revision: https://reviews.llvm.org/D31429 llvm-svn: 298935
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-0/+2
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* [AMDGPU] Switch data layout by triple environment amdgizYaxun Liu2017-03-251-1/+6
| | | | | | | | | | | | Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0. amdgiz is for non-OpenCL environment where generic address space is 0. amdgizcl is for OpenCL environment where generic address space is 0. Differential Revision: https://reviews.llvm.org/D31211 llvm-svn: 298758
* AMDGPU: Unify divergent function exits.Matt Arsenault2017-03-241-0/+5
| | | | | | | | | | StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes that skips exit nodes with uniform branch sources. llvm-svn: 298729
* [AMDGPU] Add AMDGPUAliasAnalysis to opt pipelineStanislav Mekhanoshin2017-03-241-1/+24
| | | | | | | | Previously it was added only to the BE. Differential Revision: https://reviews.llvm.org/D31323 llvm-svn: 298721
* [AMDGPU] Iterative scheduling infrastructure + minimal registry schedulerValery Pykhtin2017-03-211-0/+25
| | | | | | Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368
* [ADMGPU] SDWA peephole optimization pass.Sam Kolton2017-03-211-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: First iteration of SDWA peephole. This pass tries to combine several instruction into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365
* [AMDGPU] Run always inliner early in optKonstantin Zhuravlyov2017-03-201-0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281
* Revert "[AMDGPU] Run always inliner early in opt"Konstantin Zhuravlyov2017-03-201-1/+0
| | | | | | This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239
* [AMDGPU] Add address space based alias analysis passStanislav Mekhanoshin2017-03-171-0/+16
| | | | | | | | | This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172
* Only unswitch loops with uniform conditionsStanislav Mekhanoshin2017-03-171-0/+2
| | | | | | | | | | | | | | | | | | Loop unswitching can be extremely harmful for a SIMT target. In case if hoisted condition is not uniform a SIMT machine will execute both clones of a loop sequentially. Therefor LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets a new option is added to avoid unneded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior and set this field from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104
* [AMDGPU] Run always inliner early in optStanislav Mekhanoshin2017-03-161-0/+1
| | | | | | | | | | We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise. Differential Revision: https://reviews.llvm.org/D31016 llvm-svn: 297958
* AMDGPU: Merge initial gfx9 supportMatt Arsenault2017-02-181-1/+2
| | | | llvm-svn: 295554
* [AMDGPU] Revert failed schedulingStanislav Mekhanoshin2017-02-151-2/+1
| | | | | | | | | | | | | | This patch reverts region's scheduling to the original untouched state in case if we have have decreased occupancy. In addition it switches to use TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206
* AMDGPU: Add pass to expand memcpy/memmove/memsetMatt Arsenault2017-02-091-0/+3
| | | | llvm-svn: 294635
* AMDGPU: Enable InferAddressSpacesMatt Arsenault2017-02-081-0/+1
| | | | llvm-svn: 294408
* Re-commit AMDGPU/GlobalISel: Add support for simple shadersTom Stellard2017-01-301-6/+29
| | | | | | | | | | | | | | Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293551
* [AMDGPU] Internalize non-kernel symbolsStanislav Mekhanoshin2017-01-301-2/+33
| | | | | | | | | | | | | Since we have no call support and late linking we can produce code only for used symbols. This saves compilation time, size of the final executable, and size of any intermediate dumps. Run Internalize pass early in the opt pipeline followed by global DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option. Differential Revision: https://reviews.llvm.org/D29214 llvm-svn: 293549
* AMDGPU: Run AMDGPUCodeGenPrepare after inliningMatt Arsenault2017-01-301-9/+9
| | | | | | | With leaf functions, this makes nonsensical decisions based on the uniformity of the arguments. llvm-svn: 293525
* Revert "AMDGPU/GlobalISel: Add support for simple shaders"Tom Stellard2017-01-301-27/+6
| | | | | | | | This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509
* AMDGPU/GlobalISel: Add support for simple shadersTom Stellard2017-01-301-6/+27
| | | | | | | | | | | | Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503
* [AMDGPU] Turn AMDGPUUnifyMetadata back into module passStanislav Mekhanoshin2017-01-271-1/+1
| | | | | | | | | With the adjustPassManager interface that is now possible to use custom early module passes. Differential Revision: https://reviews.llvm.org/D29189 llvm-svn: 293300
* Replace addEarlyAsPossiblePasses callback with adjustPassManagerStanislav Mekhanoshin2017-01-261-2/+7
| | | | | | | | | | | | | | This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189
* AMDGPU: Implement early ifcvt target hooks.Matt Arsenault2017-01-251-0/+14
| | | | | | | | | | | | Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016
OpenPOWER on IntegriCloud