summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Remove a useless variable which caused build failure for lld.Yaxun Liu2016-09-071-1/+1
| | | | llvm-svn: 280841
* AMDGPU: Add hidden kernel arguments to runtime metadataYaxun Liu2016-09-072-83/+157
| | | | | | | | OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata. Differential Revision: https://reviews.llvm.org/D23424 llvm-svn: 280829
* AMDGPU: Make some scalar instructions commutableMatt Arsenault2016-09-071-2/+9
| | | | llvm-svn: 280784
* Remove unnecessary call to getAllocatableRegClassMatt Arsenault2016-09-072-9/+4
| | | | | | | | This reapplies r252565 and r252674, effectively reverting r252956. This allows VS_32/VS_64 to be unallocatable like they should be. llvm-svn: 280783
* [AMDGPU] Wave and register controlsKonstantin Zhuravlyov2016-09-0614-196/+555
| | | | | | | | | | | | | | - Implemented amdgpu-flat-work-group-size attribute - Implemented amdgpu-num-active-waves-per-eu attribute - Implemented amdgpu-num-sgpr attribute - Implemented amdgpu-num-vgpr attribute - Dynamic LDS constraints are in a separate patch Patch by Tom Stellard and Konstantin Zhuravlyov Differential Revision: https://reviews.llvm.org/D21562 llvm-svn: 280747
* AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copiesTom Stellard2016-09-062-3/+29
| | | | | | | | | | | | | | | | | | | | Summary: I put this code here, because I want to re-use it in a few other places. This supersedes some of the immediate folding code we have in SIFoldOperands. I think the peephole optimizers is probably a better place for folding immediates into copies, since it does some register coalescing in the same time. This will also make it easier to transition SIFoldOperands into a smarter pass, where it looks at all uses of instruction at once to determine the optimal way to fold operands. Right now, the pass just considers one operand at a time. Reviewers: arsenm Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23402 llvm-svn: 280744
* AMDGPU : Add XNACK feature to GPUs that support it.Wei Ding2016-09-061-2/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D24276 llvm-svn: 280742
* [AMDGPU] Refactor FLAT TD instructionsValery Pykhtin2016-09-056-438/+525
| | | | | | Differential revision: https://reviews.llvm.org/D24072 llvm-svn: 280655
* AMDGPU: Set sizes of spill pseudosMatt Arsenault2016-09-033-3/+13
| | | | llvm-svn: 280595
* AMDGPU: Fix adding duplicate implicit exec usesMatt Arsenault2016-09-031-1/+15
| | | | | | | | I'm not sure if this should be considered a bug in copyImplicitOps or not, but implicit operands that are part of the static instruction definition should not be copied. llvm-svn: 280594
* AMDGPU: Reduce the duration of whole-quad-modeNicolai Haehnle2016-09-031-54/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This contains two changes that reduce the time spent in WQM, with the intention of reducing bandwidth required by VMEM loads: 1. Sampling instructions by themselves don't need to run in WQM, only their coordinate inputs need it (unless of course there is a dependent sampling instruction). The initial scanInstructions step is modified accordingly. 2. When switching back from WQM to Exact, switch back as soon as possible. This affects the logic in processBlock. This should always be a win or at best neutral. There are also some cleanups (e.g. remove unused ExecExports) and some new debugging output. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D22092 llvm-svn: 280590
* AMDGPU: Fix an interaction between WQM and polygon stipplingNicolai Haehnle2016-09-032-7/+1
| | | | | | | | | | | | | | | | | | | | | Summary: This fixes a rare bug in polygon stippling with non-monolithic pixel shaders. The underlying problem is as follows: the prolog part contains the polygon stippling sequence, i.e. a kill. The main part then enables WQM based on the _reduced_ exec mask, effectively undoing most of the polygon stippling. Since we cannot know whether polygon stippling will be used, the main part of a non-monolithic shader must always return to exact mode to fix this problem. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23131 llvm-svn: 280589
* AMDGPU: Fix spilling of m0Matt Arsenault2016-09-033-20/+44
| | | | | | | | | readlane/writelane do not support using m0 as the output/input. Constrain the register class of spill vregs to try to avoid this, but also handle spilling of the physreg when necessary by inserting an additional copy to a normal SGPR. llvm-svn: 280584
* AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors ↵Jan Vesely2016-09-021-1/+3
| | | | | | | | | | have the same number of elements. Fixes R600 piglit regressions since r280298 Differential Revision: https://reviews.llvm.org/D24174 llvm-svn: 280535
* AMDGPU/R600: Expand unaligned writes to local and global ASJan Vesely2016-09-021-2/+11
| | | | | | | | | LOCAL and GLOBAL AS only PRIVATE needs special treatment Differential Revision: https://reviews.llvm.org/D23971 llvm-svn: 280526
* AMDGPU: Add runtime metadata for pointee alignment of argument.Yaxun Liu2016-09-012-1/+8
| | | | | | | | Add runtime metdata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer. Differential Revision: https://reviews.llvm.org/D24145 llvm-svn: 280399
* AMDGPU/SI: MIMG TD Refactoring.Changpeng Fang2016-09-013-744/+751
| | | | | | | | | | | | | | Summary: Created a new td file MIMGInstructions.td which contains all definitions of MIMG related instructions. Reviewed by: kzhuravl, vpykhtin Differential Revision: http://reviews.llvm.org/D24106 llvm-svn: 280385
* [AMDGPU] Scalar Memory instructions TD refactoringValery Pykhtin2016-09-017-398/+431
| | | | | | Differential revision: https://reviews.llvm.org/D23996 llvm-svn: 280349
* AMDGPU: Fix introducing stack access on unaligned v16i8Matt Arsenault2016-08-311-1/+8
| | | | llvm-svn: 280298
* AMDGPU: Use copy instead of mov during frame loweringMatt Arsenault2016-08-311-15/+6
| | | | | | | This occurs before RA pseudos are expanded. It's less code to emit the copy. llvm-svn: 280297
* AMDGPU: Refactor frame loweringMatt Arsenault2016-08-312-121/+181
| | | | | | This will make future changes easier. llvm-svn: 280296
* AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte alignedTom Stellard2016-08-311-1/+1
| | | | | | | | | | | | Summary: This fixes some OpenCV tests that were broken by libclc commit r276443. Reviewers: arsenm, jvesely Subscribers: arsenm, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D24051 llvm-svn: 280274
* AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePassNikolay Haustov2016-08-311-0/+12
| | | | | | | | | | | | | | | | | Summary: Simply replace usage of aliases to functions with aliasee. This came up when bitcode linking to builtin library and calls to aliases not being resolved. Also made minor improvements to existing test. Reviewers: tstellarAMD, alex-t, vpykhtin Subscribers: arsenm, wdng, rampitec Differential Revision: https://reviews.llvm.org/D24023 llvm-svn: 280221
* AMDGPU: Relax SGPR asm constraint register classMatt Arsenault2016-08-301-1/+1
| | | | | | | s should be SReg_32 to be as general as possible. This can avoid a copy from m0. llvm-svn: 280154
* [AMDGPU] Refactor SOP instructions TD files.Valery Pykhtin2016-08-304-914/+1105
| | | | | | Differential revision: https://reviews.llvm.org/D23617 llvm-svn: 280101
* SILoadStoreOptimizer.cpp: Fix a warning in r279991. [-Wunused-variable]NAKAMURA Takumi2016-08-301-0/+1
| | | | llvm-svn: 280075
* AMDGPU/R600: Cleanup DAGCombineJan Vesely2016-08-291-15/+12
| | | | | | | | | Move SDLoc initialization to comon place. fall back to AMDGPU version in one place Differential Revision: https://reviews.llvm.org/D23900 llvm-svn: 280030
* AMDGPU/R600: Remove MergeVectorStores from legalizationJan Vesely2016-08-293-65/+0
| | | | | | | | This is handled by DAGCombiner in a more generic way Differential Revision: https://reviews.llvm.org/D23970 llvm-svn: 280019
* AMDGPU: fix mismatch tags, NFCSaleem Abdulrasool2016-08-292-2/+2
| | | | llvm-svn: 280006
* AMDGPU/SI: Implement a custom MachineSchedStrategyTom Stellard2016-08-299-1/+445
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
* AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the schedulerTom Stellard2016-08-292-111/+148
| | | | | | | | | | | | | | | | | | | | Summary: The SILoadStoreOptimizer can now look ahead more then one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge. Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23814 llvm-svn: 279991
* AMDGPU/R600: Fix fixups used for constant arraysMatt Arsenault2016-08-291-0/+1
| | | | | | Fixes bug 29289 llvm-svn: 279986
* AMDGPU/SI: Improve register allocation hints for sopk instructionsTom Stellard2016-08-291-0/+1
| | | | | | | | | | | | | | | | | | | Summary: For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this seems to not work sometimes depending on the order virtual registers are assigned physical registers. To fix this, I've added a second allocation hint which does the reverse, asks that the register allocated for dst is used for src0. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23862 llvm-svn: 279968
* AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint()Tom Stellard2016-08-291-0/+11
| | | | | | | | | | | | | | Summary: The SILoadStoreOptimizer will need to use AliasAnalysis here in order to move it before scheduling. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23813 llvm-svn: 279963
* AMDGPU/R600: Enable Load combineJan Vesely2016-08-271-0/+1
| | | | | | | | Fix and improve tests Differential Revision: https://reviews.llvm.org/D23899 llvm-svn: 279925
* AMDGPU: Mark sched model completeMatt Arsenault2016-08-271-1/+1
| | | | | | Fixes bug 26800 llvm-svn: 279910
* AMDGPU: Remove unneeded implicit exec uses/defsMatt Arsenault2016-08-272-40/+48
| | | | | | | SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec. SI_IF_BREAK and SI_ELSE_BREAK do not read it either. llvm-svn: 279909
* AMDGPU: Select mulhi 24-bit instructionsMatt Arsenault2016-08-277-20/+163
| | | | llvm-svn: 279902
* AMDGPU: Move cndmask pseudo to be isel pseudoMatt Arsenault2016-08-273-23/+31
| | | | | | | | There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901
* AMDGPU: Fix sched type for branchesMatt Arsenault2016-08-271-1/+1
| | | | llvm-svn: 279900
* AMDGPU: Remove register operand from si_mask_branchMatt Arsenault2016-08-272-5/+3
| | | | | | | | | It isn't used for anything, and is also misleading since it could be spilled at the end of the block, so it can't be relied on. There ends up being a verifier error about using an undefined register since the spill kills the register. llvm-svn: 279899
* AMDGPU: Improve error reporting for maximum branch distanceMatt Arsenault2016-08-271-30/+61
| | | | | | Unfortunately this seems to only help the assembler diagnostic. llvm-svn: 279895
* AMDGPU/SI: Canonicalize offset order for merged DS instructionsTom Stellard2016-08-261-3/+15
| | | | | | | | | | | | | | | | | | | Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to scheduler loads together without out explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870
* XXXTom Stellard2016-08-261-1/+1
| | | | llvm-svn: 279868
* AMDGPU/SI: Use a better method for determining the largest pressure setsTom Stellard2016-08-263-15/+41
| | | | | | | | | | | | | | | | | | | | | | Summary: There are a few different sgpr pressure sets, but we only care about the one which covers all of the sgprs. We were using hard-coded register pressure set names to determine the reg set id for the biggest sgpr set. However, we were using the wrong name, and this method is pretty fragile, since the reg pressure set names may change. The new method just looks for the pressure set that contains the most reg units and sets that set as our SGPR pressure set. We've also adopted the same technique for determining our VGPR pressure set. Reviewers: arsenm Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23687 llvm-svn: 279867
* [MC] Move .cv_loc management logic out of MCContextReid Kleckner2016-08-261-0/+1
| | | | | | | | | | | MCContext already has many tasks, and separating CodeView out from it is probably a good idea. The .cv_loc tracking was modelled on the DWARF tracking which lived directly in MCContext. Removes the inclusion of MCCodeView.h from MCContext.h, so now there are only 10 build actions while I hack on CodeView support instead of 265. llvm-svn: 279847
* AMDGCN/SI: Implement readlane/readfirstlane intrinsicsChangpeng Fang2016-08-241-4/+5
| | | | | | | | | | | | | | | Summary: This patch implements readlane/readfirstlane intrinsics. TODO: need to define a new register class to consider the case that the source could be a vector register or M0. Reviewed by: arsenm and tstellarAMD Differential Revision: http://reviews.llvm.org/D22489 llvm-svn: 279660
* AMDGPU : Add V_SAD_U32 instruction pattern.Wei Ding2016-08-242-0/+27
| | | | | | Differential Revision: http://reviews.llvm.org/D23069 llvm-svn: 279629
* CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePassesMatthias Braun2016-08-241-3/+1
| | | | | | | | | | | | | | | | | | | | | | Re-apply this patch, hopefully I will get away without any warnings in the constructor now. This patch removes the MachineFunctionAnalysis. Instead we keep a map from IR Function to MachineFunction in the MachineModuleInfo. This allows the insertion of ModulePasses into the codegen pipeline without breaking it because the MachineFunctionAnalysis gets dropped before a module pass. Peak memory should stay unchanged without a ModulePass in the codegen pipeline: Previously the MachineFunction was freed at the end of a codegen function pipeline because the MachineFunctionAnalysis was dropped; With this patch the MachineFunction is freed after the AsmPrinter has finished. Differential Revision: http://reviews.llvm.org/D23736 llvm-svn: 279602
* Revert r279564. It introduces undefined behavior (binding a reference to aRichard Smith2016-08-231-1/+3
| | | | | | | dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes -Werror builds (including several buildbots) to fail. llvm-svn: 279580
OpenPOWER on IntegriCloud