path: root/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Commit message | Author | Age | Files | Lines
* Re-commit AMDGPU/GlobalISel: Add support for simple shaders | Tom Stellard | 2017-01-30 | 1 | -6/+29
  Fix build when global-isel is disabled and fix a warning.
  Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
  Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
  Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
  Differential Revision: https://reviews.llvm.org/D26730
  llvm-svn: 293551
* [AMDGPU] Internalize non-kernel symbols | Stanislav Mekhanoshin | 2017-01-30 | 1 | -2/+33
  Since we have no call support and late linking, we can produce code only for used symbols. This saves compilation time, size of the final executable, and size of any intermediate dumps.
  Run the Internalize pass early in the opt pipeline, followed by the global DCE pass.
  To enable it, RT can pass the -amdgpu-internalize-symbols option.
  Differential Revision: https://reviews.llvm.org/D29214
  llvm-svn: 293549
* AMDGPU: Run AMDGPUCodeGenPrepare after inlining | Matt Arsenault | 2017-01-30 | 1 | -9/+9
  With leaf functions, this makes nonsensical decisions based on the uniformity of the arguments.
  llvm-svn: 293525
* Revert "AMDGPU/GlobalISel: Add support for simple shaders" | Tom Stellard | 2017-01-30 | 1 | -27/+6
  This reverts commit r293503.
  Revert while I investigate some of the buildbot failures.
  llvm-svn: 293509
* AMDGPU/GlobalISel: Add support for simple shaders | Tom Stellard | 2017-01-30 | 1 | -6/+27
  Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
  Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
  Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
  Differential Revision: https://reviews.llvm.org/D26730
  llvm-svn: 293503
* [AMDGPU] Turn AMDGPUUnifyMetadata back into module pass | Stanislav Mekhanoshin | 2017-01-27 | 1 | -1/+1
  With the adjustPassManager interface it is now possible to use custom early module passes.
  Differential Revision: https://reviews.llvm.org/D29189
  llvm-svn: 293300
* Replace addEarlyAsPossiblePasses callback with adjustPassManager | Stanislav Mekhanoshin | 2017-01-26 | 1 | -2/+7
  This change introduces the adjustPassManager target callback, giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated.
  This generalizes and replaces the addEarlyAsPossiblePasses target callback. In particular, it can be used to add custom passes to extension points other than EP_EarlyAsPossible.
  Differential Revision: https://reviews.llvm.org/D28336
  llvm-svn: 293189
* AMDGPU: Implement early ifcvt target hooks. | Matt Arsenault | 2017-01-25 | 1 | -0/+14
  Leave early ifcvt disabled for now since there are some shader-db regressions.
  This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out-of-order CPUs, which we do not want, so it skips many cases we do want.
  llvm-svn: 293016
* [AMDGPU] Add VGPR copies post regalloc fix pass | Stanislav Mekhanoshin | 2017-01-24 | 1 | -0/+2
  Regalloc creates COPY instructions which do not formally use VALU. That results in v_mov instructions being displaced after exec mask modification. One pass which does this is SIOptimizeExecMasking, but potentially it can be done by other passes too.
  This patch adds a pass immediately after regalloc to add an implicit exec use operand to all VGPR copy instructions.
  Differential Revision: https://reviews.llvm.org/D28874
  llvm-svn: 292956
* [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). | Eugene Zelenko | 2016-12-12 | 1 | -17/+26
  llvm-svn: 289475
* [AMDGPU] Add amdgpu-unify-metadata pass | Stanislav Mekhanoshin | 2016-12-08 | 1 | -0/+6
  Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates a semantic problem because we cannot tell which version of OpenCL the composite module conforms to.
  Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several Mb when linked against our OpenCL library. Lastly, such long lists make dumped IR harder to read.
  The pass unifies metadata after linking.
  Differential Revision: https://reviews.llvm.org/D25381
  llvm-svn: 289092
* [AMDGPU] Scalarization of global uniform loads. | Alexander Timofeev | 2016-12-08 | 1 | -0/+10
  Summary: LC can currently select a scalar load for uniform memory access based on the readonly memory address space only. This restriction originated from the fact that in HW prior to VI, vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along all paths from the function entry.
  Reviewers: rampitec, tstellarAMD, arsenm
  Subscribers: wdng, arsenm, nhaehnle
  Differential Revision: https://reviews.llvm.org/D26917
  llvm-svn: 289076
* AMDGPU: Don't required structured CFG | Matt Arsenault | 2016-12-06 | 1 | -2/+3
  The structured CFG is just an aid to inserting exec mask modification instructions; once that is done we don't really need it anymore. We also do not analyze blocks with terminators that modify exec, so this should only be impacting true branches.
  llvm-svn: 288744
* MachineScheduler: Export function to construct "default" scheduler. | Matthias Braun | 2016-11-28 | 1 | -8/+10
  This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations.
  This also shrinks the list of default DAG mutations: {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores().
  Differential Revision: https://reviews.llvm.org/D26986
  llvm-svn: 288057
* Revert "AMDGPU: Enable ConstrainCopy DAG mutation" | Konstantin Zhuravlyov | 2016-11-17 | 1 | -3/+0
  This reverts commit r287146.
  This breaks a few conformance tests.
  llvm-svn: 287233
* AMDGPU: Enable ConstrainCopy DAG mutation | Matt Arsenault | 2016-11-16 | 1 | -0/+3
  This fixes a probably unintended divergence from the default scheduler behavior.
  llvm-svn: 287146
* AMDGPU: Enable store clustering | Matt Arsenault | 2016-11-15 | 1 | -1/+8
  Also respect the TII hook for these like the generic code does in case we want a flag later to disable this.
  llvm-svn: 287021
* Move the global variables representing each Target behind accessor function | Mehdi Amini | 2016-10-09 | 1 | -2/+2
  This avoids the "static initialization order fiasco".
  Differential Revision: https://reviews.llvm.org/D25412
  llvm-svn: 283702
* BranchRelaxation: Support expanding unconditional branches | Matt Arsenault | 2016-10-06 | 1 | -0/+1
  AMDGPU needs to expand unconditional branches in a new block with an indirect branch.
  llvm-svn: 283464
* [AMDGPU] Pass optimization level to SelectionDAGISel | Konstantin Zhuravlyov | 2016-10-03 | 1 | -1/+1
  llvm-svn: 283133
* [AMDGPU] Do not run scalar optimization passes at "-O0" | Konstantin Zhuravlyov | 2016-09-30 | 1 | -2/+2
  Differential Revision: https://reviews.llvm.org/D25055
  llvm-svn: 282873
* AMDGPU: Partially fix control flow at -O0 | Matt Arsenault | 2016-09-29 | 1 | -5/+16
  Fixes to allow spilling all registers at the end of the block work with exec modifications. Don't emit s_and_saveexec_b64 for if lowering, and instead emit copies. Mark control flow mask instructions as terminators to get correct spill code placement with fast regalloc, and then have a separate optimization pass form the saveexec.
  This should work if SGPRs are spilled to VGPRs, but will likely fail in the case that an SGPR spills to memory and no workitem takes a divergent branch.
  llvm-svn: 282667
* AMDGPU: Run LoadStoreVectorizer pass by default | Matt Arsenault | 2016-09-09 | 1 | -1/+1
  llvm-svn: 281112
* AMDGPU/SI: Implement a custom MachineSchedStrategy | Tom Stellard | 2016-08-29 | 1 | -1/+15
  Summary: GCNSchedStrategy re-uses most of GenericScheduler; it just uses a different method to compute the excess and critical register pressure limits.
  It's not enabled by default; to enable it you need to pass -misched=gcn to llc.
  Shader DB stats:
  32464 shaders in 17874 tests
  Totals:
  SGPRS: 1542846 -> 1643125 (6.50 %)
  VGPRS: 1005595 -> 904653 (-10.04 %)
  Spilled SGPRs: 29929 -> 27745 (-7.30 %)
  Spilled VGPRs: 334 -> 352 (5.39 %)
  Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
  Code Size: 36688188 -> 37034900 (0.95 %) bytes
  LDS: 1913 -> 1913 (0.00 %) blocks
  Max Waves: 254101 -> 265125 (4.34 %)
  Wait states: 0 -> 0 (0.00 %)
  Totals from affected shaders:
  SGPRS: 1338220 -> 1438499 (7.49 %)
  VGPRS: 886221 -> 785279 (-11.39 %)
  Spilled SGPRs: 29869 -> 27685 (-7.31 %)
  Spilled VGPRs: 334 -> 352 (5.39 %)
  Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
  Code Size: 34315716 -> 34662428 (1.01 %) bytes
  LDS: 1551 -> 1551 (0.00 %) blocks
  Max Waves: 188127 -> 199151 (5.86 %)
  Wait states: 0 -> 0 (0.00 %)
  Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: https://reviews.llvm.org/D23688
  llvm-svn: 279995
* AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler | Tom Stellard | 2016-08-29 | 1 | -11/+1
  Summary: The SILoadStoreOptimizer can now look ahead more than one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge.
  Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes.
  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: https://reviews.llvm.org/D23814
  llvm-svn: 279991
* AMDGPU: Split SILowerControlFlow into two pieces | Matt Arsenault | 2016-08-22 | 1 | -8/+14
  Do most of the lowering in a pre-RA pass. Keep the skip jump insertion late, plus a few other things that require more work to move out.
  One concern I have is now there may be COPY instructions which do not have the necessary implicit exec uses if they will be lowered to v_mov_b32.
  This has a positive effect on SGPR usage in shader-db.
  llvm-svn: 279464
* [PM] Port the always inliner to the new pass manager in a much more minimal and boring form than the old pass manager's version. | Chandler Carruth | 2016-08-17 | 1 | -1/+2
  This pass does the very minimal amount of work necessary to inline functions declared as always-inline. It doesn't support a wide array of things that the legacy pass manager did support, but is also ... about 20 lines of code. So it has that going for it. Notably things this doesn't support:
  - Array alloca merging
    - To support the above, bottom-up inlining with careful history tracking and call graph updates
  - DCE of the functions that become dead after this inlining.
  - Inlining through call instructions with the always_inline attribute. Instead, it focuses on inlining functions with that attribute.
  The first I've omitted because I'm hoping to just turn it off for the primary pass manager. If that doesn't pan out, I can add it here but it will be reasonably expensive to do so.
  The second should really be handled by running global-dce after the inliner. I don't want to re-implement the non-trivial logic necessary to do comdat-correct DCE of functions. This means the -O0 pipeline will have to be at least 'always-inline,global-dce', but that seems reasonable to me. If others are seriously worried about this I'd like to hear about it and understand why. Again, this is all solvable by factoring that logic into a utility and calling it here, but I'd like to wait to do that until there is a clear reason why the existing pass-based factoring won't work.
  The final point is a serious one. I can fairly easily add support for this, but it seems both costly and a confusing construct for the use case of the always inliner running at -O0. This attribute can of course still impact the normal inliner easily (although I find that a questionable re-use of the same attribute). I've started a discussion to sort out what semantics we want here and based on that can figure out if it makes sense to have this complexity at O0 or not.
  One other advantage of this design is that it should be quite a bit faster due to checking for whether the function is a viable candidate for inlining exactly once per function instead of doing it for each call site.
  Anyways, hopefully a reasonable starting point for this pass.
  Differential Revision: https://reviews.llvm.org/D23299
  llvm-svn: 278896
* [AMDGPU] Remove duplicate initialization of SIDebuggerInsertNops pass | Konstantin Zhuravlyov | 2016-08-16 | 1 | -1/+0
  Differential Revision: https://reviews.llvm.org/D23556
  llvm-svn: 278863
* AMDGPU: Prune includes | Matt Arsenault | 2016-08-11 | 1 | -9/+5
  llvm-svn: 278391
* [GlobalISel] Introduce an instruction selector. | Ahmed Bougacha | 2016-07-27 | 1 | -0/+5
  And implement it for AArch64, supporting x/w ADD/OR.
  Differential Revision: https://reviews.llvm.org/D22373
  llvm-svn: 276875
* GlobalISel: implement legalization pass, with just one transformation. | Tim Northover | 2016-07-22 | 1 | -0/+5
  This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors.
  llvm-svn: 276461
* AMDGPU: Change fdiv lowering based on !fpmath metadata | Matt Arsenault | 2016-07-19 | 1 | -0/+8
  If 2.5 ulp is acceptable, denormals are not required, and the operation isn't a reciprocal (which will already be handled), replace with a faster fdiv.
  Simplify the lowering tests by using per function subtarget features.
  llvm-svn: 276051
* AMDGPU/R600: Delete/rename intrinsics no longer used by mesa | Matt Arsenault | 2016-07-14 | 1 | -1/+0
  Use the replacement pass to update the tests, and delete old names.
  llvm-svn: 275375
* AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL | Tom Stellard | 2016-07-13 | 1 | -3/+3
  Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D21484
  llvm-svn: 275268
* AMDGPU: Add option to run the load/store vectorizer | Matt Arsenault | 2016-07-01 | 1 | -0/+16
  llvm-svn: 274329
* AMDGPU: Fix global isel crashes | Matt Arsenault | 2016-06-28 | 1 | -4/+7
  llvm-svn: 274039
* AMDGPU: Fix typo | Matt Arsenault | 2016-06-28 | 1 | -7/+6
  llvm-svn: 274034
* AMDGPU: Fix global isel build | Matt Arsenault | 2016-06-28 | 1 | -0/+12
  llvm-svn: 273964
* AMDGPU: Implement per-function subtargets | Matt Arsenault | 2016-06-27 | 1 | -6/+64
  llvm-svn: 273940
* AMDGPU: Move subtarget feature checks into passes | Matt Arsenault | 2016-06-27 | 1 | -14/+30
  llvm-svn: 273937
* AMDGPU: Add stub custom CodeGenPrepare pass | Matt Arsenault | 2016-06-24 | 1 | -0/+1
  This will do various things including ones CodeGenPrepare does, but with knowledge of uniform values.
  llvm-svn: 273657
* AMDGPU: Remove disable-irstructurizer subtarget feature | Matt Arsenault | 2016-06-24 | 1 | -2/+7
  The only real reason to use it is for testing, so replace it with a command line option instead of a potentially function dependent feature.
  llvm-svn: 273653
* AMDGPU: Cleanup subtarget handling. | Matt Arsenault | 2016-06-24 | 1 | -20/+28
  Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target.
  llvm-svn: 273652
* AMDGPU: Run verifier after 2nd run of SIShrinkInstructions | Matt Arsenault | 2016-06-22 | 1 | -1/+1
  llvm-svn: 273469
* AMDGPU: Fix verifier errors in SILowerControlFlow | Matt Arsenault | 2016-06-22 | 1 | -2/+2
  The main sin this was committing was using terminator instructions in the middle of the block, and then not updating the block successors / predecessors. Split the blocks up to avoid this and introduce new pseudo instructions for branches taken with exec masking.
  Also use a pseudo instead of emitting s_endpgm and erasing it in the special case of a non-void return.
  llvm-svn: 273467
* AMDGPU: Run pointer optimization passes | Matt Arsenault | 2016-06-15 | 1 | -7/+46
  llvm-svn: 272736
* AMDGPU: Run verifer after insert waits pass | Matt Arsenault | 2016-06-09 | 1 | -1/+1
  llvm-svn: 272338
* AMDGPU: Properly initialize SIShrinkInstructions | Matt Arsenault | 2016-06-09 | 1 | -0/+1
  llvm-svn: 272336
* AMDGPU: Fix crashes on unknown processor name | Matt Arsenault | 2016-06-02 | 1 | -1/+1
  If the processor name failed to parse for amdgcn, the resulting output would have R600 ISA in it. If the processor name was missing or invalid for R600, the wavefront size would not be set and there would be crashes from missing itinerary data.
  Fixes crashes in future commit caused by dividing by the unset/0 wavefront size.
  llvm-svn: 271561
* AMDGPU: SIDebuggerInsertNops preserves CFG | Matt Arsenault | 2016-06-02 | 1 | -0/+1
  This saves an additional run of the DominatorTree and MachineLoopInfo
  llvm-svn: 271444