path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
* AMDGPU: Don't fold subregister extracts into tied operandsMatt Arsenault2016-08-151-0/+15
| | | | llvm-svn: 278676
* [LSR] Don't try and create post-inc expressions on non-rotated loopsJames Molloy2016-08-151-2/+3
| If a loop is not rotated (for example when optimizing for size), the latch is
| not the backedge. If we promote an expression to post-inc form, we not only
| increase register pressure and add a COPY for that IV expression but for all
| IVs!
|
| Motivating testcase:
|
|   void f(float *a, float *b, float *c, int n) {
|     while (n-- > 0)
|       *c++ = *a++ + *b++;
|   }
|
| It's imperative that the pointer increments be located in the latch block and
| not the header block; if not, we cannot use post-increment loads and stores,
| and we have to keep both the post-inc and pre-inc values around until the end
| of the latch, which bloats register usage.
|
| llvm-svn: 278658
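As an illustration of what "rotated" means for the motivating loop, the sketch below places the original function next to a hand-rotated equivalent. It only shows the source-level shape: `f_rotated` is a hypothetical name, and the real transformation is performed by LLVM on IR rather than on C source. In the rotated form the exit test sits at the bottom of the loop, in the same block as the pointer increments.

```c
/* Original, non-rotated shape: the exit test is at the top of the loop. */
void f(float *a, float *b, float *c, int n) {
  while (n-- > 0)
    *c++ = *a++ + *b++;
}

/* Hand-rotated equivalent (hypothetical name): a guard test followed by a
 * do-while, so the exit test is evaluated at the bottom of each iteration. */
void f_rotated(float *a, float *b, float *c, int n) {
  if (n-- > 0) {
    do {
      *c++ = *a++ + *b++;
    } while (n-- > 0);
  }
}
```

Both functions execute the body the same number of times for any `n`, including `n <= 0`, where neither touches the arrays.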
* Revert "Revert "Invariant start/end intrinsics overloaded for address space""Mehdi Amini2016-08-131-4/+4
| | | | | | This reverts commit 32fc6488e48eafc0ca1bac1bd9cbf0008224d530. llvm-svn: 278609
* Revert "Invariant start/end intrinsics overloaded for address space"Mehdi Amini2016-08-131-4/+4
| | | | | | This reverts commit r276447. llvm-svn: 278608
* AMDGPU: Fix missing test for addressing mode with odd offsetsMatt Arsenault2016-08-131-5/+31
| | | | | | Add test if the constant offset looks unaligned. llvm-svn: 278589
* AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32Wei Ding2016-08-111-0/+60
| | | | | | Differential Revision: http://reviews.llvm.org/D23336 llvm-svn: 278403
* AMDGPU: Fix crashes on memory functionsMatt Arsenault2016-08-111-0/+54
| | | | llvm-svn: 278369
* AMDGPU : Fix SAD related instruction LIT tests function atttibute issues.Wei Ding2016-08-117-21/+14
| | | | | | Differential Revision: http://reviews.llvm.org/D23133 llvm-svn: 278360
* AMDGPU : Add LLVM intrinsics for SAD related instructions.Wei Ding2016-08-117-0/+161
| | | | | | Differential Revision: http://reviews.llvm.org/D23133 llvm-svn: 278354
* AMDGPU/SI: Implement amdgcn image intrinsics with samplerChangpeng Fang2016-08-104-0/+817
| Summary:
| This patch defines and implements amdgcn image intrinsics with sampler.
|
| 1. Define the vdata type to be llvm_anyfloat_ty, the address type to be
|    llvm_anyfloat_ty, and the rsrc type to be llvm_anyint_ty. As a result, we
|    expect the intrinsic names to have three suffixes, one to overload each of
|    these three types.
| 2. D128 as well as two other flags are implied by the three types; for
|    example, if you use v8i32 as the resource type, then r128 is 0.
| 3. Don't expose the TFE flag; the other flags are exposed in instruction
|    order: unrm, glc, slc, lwe and da.
|
| Differential Revision: http://reviews.llvm.org/D22838
|
| Reviewed by: arsenm and tstellarAMD
|
| llvm-svn: 278291
* AMDGPU: Change insertion point of si_mask_branchMatt Arsenault2016-08-105-17/+30
| | | | | | | | | | | | | Insert before the skip branch if one is created. This is a somewhat more natural placement relative to the skip branches, and makes it possible to implement analyzeBranch for skip blocks. The test changes are mostly due to a quirk where the block label is not emitted if there is a terminator that is not also a branch. llvm-svn: 278273
* LiveIntervalAnalysis: fix a crash in repairOldRegInRangeNicolai Haehnle2016-08-101-0/+36
| Summary:
| See the new test case for one that was (non-deterministically) crashing
| on trunk and deterministically hit the assertion that I added in D23302.
|
| Basically, the machine function contains a sequence
|
|   DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
|   DS_WRITE_B32 %vreg4, %vreg14:sub0, ...
|   %vreg14:sub1<def> = COPY %vreg14:sub0
|
| and SILoadStoreOptimizer::mergeWrite2Pair merges the two DS_WRITE_B32
| instructions into one before calling repairIntervalsInRange.
|
| Now repairIntervalsInRange wants to repair %vreg14, in particular, and
| ends up trying to repair %vreg14:sub1 as well, but that only becomes
| active _after_ the range that is to be repaired, hence the crash due
| to LR.find(...) == LR.begin() at the start of repairOldRegInRange.
|
| I believe that just skipping those subranges is fine, but again, I am not
| too familiar with that code.
|
| Reviewers: MatzeB, kparzysz, tstellarAMD
|
| Subscribers: llvm-commits, MatzeB
|
| Differential Revision: https://reviews.llvm.org/D23303
|
| llvm-svn: 278268
* AMDGPU/SI: Increase SGPR limit to 96 on Tonga/IcelandMarek Olsak2016-08-052-34/+34
| Summary:
| This is the setting of the Vulkan closed source driver.
|
| It decreases the max wave count from 10 to 8.
|
| 26010 shaders in 14650 tests
| Totals:
| VGPRS: 829593 -> 808440 (-2.55 %)
| Spilled SGPRs: 81878 -> 42226 (-48.43 %)
| Spilled VGPRs: 367 -> 358 (-2.45 %)
| Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread
| Code Size: 36677864 -> 35923932 (-2.06 %) bytes
|
| There is a massive decrease in SGPR spilling in general and -7.4% spilled
| VGPRs for DiRT Showdown (= SGPRs spilled to scratch?)
|
| Reviewers: arsenm, tstellarAMD, nhaehnle
|
| Subscribers: arsenm, llvm-commits, kzhuravl
|
| Differential Revision: https://reviews.llvm.org/D23034
|
| llvm-svn: 277867
* [OpenCL] Add missing tests for getOCLTypeNameYaxun Liu2016-08-041-0/+217
| | | | | | | | | | Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown. Patch by Aaron En Ye Shi. Differential Revision: https://reviews.llvm.org/D22964 llvm-svn: 277759
* AMDGPU: Fix a slow test by using basic regallocMatt Arsenault2016-08-041-1/+1
| This just tests that the register limit isn't exceeded, so the register
| allocation doesn't need to be great. The critically slow part is all in
| greedy RA, so switch to basic.
|
| llvm-svn: 277700
* RenameIndependentSubregs: Fix liveness query in rewriteOperands()Matthias Braun2016-08-031-2/+42
| rewriteOperands() always performed liveness queries at the base index
| rather than the RegSlot/Base as appropriate for the machine operand. This
| could lead to illegal rewriting in some cases.
|
| llvm-svn: 277661
* AMDGPU: fdiv -1, x -> rcp -xMatt Arsenault2016-08-022-1/+98
| | | | llvm-svn: 277535
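The fold named in this commit title rests on the identity -1/x = 1/(-x). Below is a minimal sketch in C, assuming `rcp` can be modeled as an exact `1.0f / x`. The hardware reciprocal instruction is an approximation, so this illustrates only the algebra, not the precision of the generated code. Both function names are hypothetical.

```c
/* Original form: fdiv -1.0, x */
float fdiv_neg_one(float x) {
  return -1.0f / x;
}

/* Folded form: rcp(-x), with rcp modeled as 1.0f / x (an assumption). */
float rcp_of_neg(float x) {
  return 1.0f / -x;
}
```

Negating a float only flips its sign bit, so the two forms agree whenever the division itself is rounded the same way.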
* AMDGPU: Stay in WQM for non-intrinsic storesNicolai Haehnle2016-08-022-44/+74
| | | | | | | | | | | | | | | | | | | | | | | Summary: Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays. For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway. This is a candidate for the 3.9 release branch. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D22675 llvm-svn: 277504
* AMDGPU: Track physical registers in SIWholeQuadModeNicolai Haehnle2016-08-022-1/+38
| | | | | | | | | | | | | | | | | | | | Summary: There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM. The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0. This is a candidate for the 3.9 branch, as it fixes a possible hang. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D22673 llvm-svn: 277500
* AMDGPU: Fix shouldConvertConstantLoadToIntImm behaviorMatt Arsenault2016-07-301-1/+46
| | | | | | | This should really be true for any immediate, not just inline ones. llvm-svn: 277260
* AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.Changpeng Fang2016-07-281-0/+71
| | | | | | | | Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073
* AMDGPU : Add intrinsics for compare with the full wavefront resultWei Ding2016-07-282-0/+400
| | | | | | Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
* AMDGPU: add execfix flag to SI_ELSENicolai Haehnle2016-07-281-0/+58
| Summary:
| SI_ELSE is lowered into two parts:
|
|   s_or_saveexec_b64 dst, src (at the start of the basic block)
|
|   s_xor_b64 exec, exec, dst (at the end of the basic block)
|
| The idea is that dst contains the exec mask of the preceding IF block. It can
| happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
| the basic block that contains SI_ELSE, in which case it introduces an
| instruction
|
|   s_and_b64 exec, exec, s[...]
|
| which masks out bits that can correspond to both the IF and the ELSE paths.
| So the resulting sequence must be:
|
|   s_or_saveexec_b64 dst, src
|   s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
|   s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
|   s_xor_b64 exec, exec, dst
|
| Whether to add the additional s_and_b64 dst, dst, exec is currently
| determined via the ExecModified tracking. With this change, it is instead
| determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode.
|
| Finally: It also occurred to me that an alternative approach for the long run
| is for SILowerControlFlow to unconditionally emit
|
|   s_or_saveexec_b64 dst, src
|   ...
|   s_and_b64 dst, dst, exec
|   s_xor_b64 exec, exec, dst
|
| and have a pass that detects and cleans up the "redundant AND with exec"
| pattern where possible. This could be useful anyway, because we also add
| instructions
|
|   s_and_b64 vcc, exec, vcc
|
| before s_cbranch_scc (in moveToALU), and those are often redundant. I have
| some pending changes to how KILL is lowered that could also benefit from
| such a cleanup pass.
|
| In any case, this current patch could help in the short term with the whole
| ExecModified business.
|
| Reviewers: tstellarAMD, arsenm
|
| Subscribers: arsenm, llvm-commits, kzhuravl
|
| Differential Revision: https://reviews.llvm.org/D22846
|
| llvm-svn: 276972
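To see why the extra AND on dst matters, the four-instruction sequence from this commit message can be modeled on a 64-bit lane mask. This is a sketch under stated assumptions, not code from the patch: `lower_si_else`, `live`, and `execfix` are hypothetical names, `live` stands in for the `s[...]` operand, and `s_or_saveexec_b64 dst, src` is assumed to copy exec into dst and then OR src into exec.

```c
#include <stdint.h>

/* Models the exec mask at the end of the SI_ELSE block.
 *   exec    - lanes active on entry (the IF lanes)
 *   src     - lanes saved for the ELSE path
 *   live    - mask applied by the WQM -> Exact switch (hypothetical name)
 *   execfix - nonzero to emit the extra "s_and_b64 dst, dst, exec" */
uint64_t lower_si_else(uint64_t exec, uint64_t src, uint64_t live, int execfix) {
  uint64_t dst = exec;   /* s_or_saveexec_b64 dst, src (assumed semantics) */
  exec = src | exec;
  exec &= live;          /* s_and_b64 exec, exec, s[...] */
  if (execfix)
    dst &= exec;         /* s_and_b64 dst, dst, exec */
  exec ^= dst;           /* s_xor_b64 exec, exec, dst */
  return exec;
}
```

With IF lanes `0x3`, ELSE lanes `0xC`, and `live = 0x6`, the fixed sequence leaves only lane 2 enabled (`0x4`), which is the ELSE lanes restricted to `live`. Without the extra AND, the final XOR re-enables masked-out IF lane 0 and yields `0x5`.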
* AMDGPU: Use rcp for fdiv 1, x with fpmath metadataMatt Arsenault2016-07-263-18/+94
| | | | | | | Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823
* AMDGPU: Add more tests for LDS size with occupancyMatt Arsenault2016-07-261-2/+149
| | | | llvm-svn: 276821
* MIRParser: Use dot instead of colon to mark subregistersMatthias Braun2016-07-263-85/+85
| | | | | | | | | | | | | | | | | Change the syntax to use `%0.sub8` to denote a subregister. This seems like a more natural fit to denote subregisters; I also plan to introduce a new ":classname" syntax in upcoming patches to denote the register class of a vreg. Note that this commit disallows plain identifiers to start with a '.' character. This shouldn't affect anything as external names/IR references are all prefixed with '$'/'%', plain identifiers are only used for instruction names, register mask names and subreg indexes. Differential Revision: https://reviews.llvm.org/D22390 llvm-svn: 276815
* GlobalISel: omit braces on MachineInstr types when there's only one.Tim Northover2016-07-261-1/+1
| | | | | | Tidies up the representation a bit in the common case. llvm-svn: 276772
* AMDGPU: Add missing tests for xnack option for HSAMatt Arsenault2016-07-261-5/+21
| | | | llvm-svn: 276765
* AMDGPU: Add fp legacy instruction intrinsicsMatt Arsenault2016-07-262-0/+96
| | | | | | | This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
* AMDGPU: Remove read_workdim intrinsicJan Vesely2016-07-254-115/+0
| | | | | | Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682
* AMDGPU: Fix missing verify-machineinstrs in control flow testMatt Arsenault2016-07-251-1/+1
| | | | llvm-svn: 276679
* Revert "[AMDGPU] Emit read-only data to .rodata for hsa"Tom Stellard2016-07-222-2/+2
| | | | | | | | | | | | This reverts commit r276298. Data stored in .rodata can have a negative offset from .text, but we don't support negative values in relocations yet. This caused a regression in one of the amp conformance tests: 5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01 llvm-svn: 276498
* GlobalISel: allow multiple types on MachineInstrs.Tim Northover2016-07-221-1/+1
| | | | llvm-svn: 276481
* Invariant start/end intrinsics overloaded for address spaceAnna Thomas2016-07-221-4/+4
| Summary:
| The llvm.invariant.start and llvm.invariant.end intrinsics currently
| support specifying invariant memory objects only in the default address
| space.
|
| With this change, these intrinsics are overloaded for any address space for
| memory objects, and we can use these llvm invariant intrinsics in
| non-default address spaces.
|
| Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
|
| This overloaded intrinsic is needed for representing final or invariant
| memory in managed languages.
|
| Reviewers: apilipenko, reames
|
| Subscribers: llvm-commits
|
| llvm-svn: 276447
* AMDGPU: Remove redundant testMatt Arsenault2016-07-222-115/+1
| | | | llvm-svn: 276439
* AMDGPU: Fix groupstaticsize for large LDSMatt Arsenault2016-07-221-2/+15
| | | | | | | | | The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438
* AMDGPU: Add HSA dispatch id intrinsicMatt Arsenault2016-07-221-0/+19
| | | | llvm-svn: 276437
* AMDGPU: Fix i1 fp_to_intMatt Arsenault2016-07-224-7/+97
| | | | | | | R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to. llvm-svn: 276435
* Revert "Invariant start/end intrinsics overloaded for address space"Anna Thomas2016-07-211-4/+4
| | | | | | This reverts commit r276316. llvm-svn: 276320
* Invariant start/end intrinsics overloaded for address spaceAnna Thomas2016-07-211-4/+4
| Summary:
| The llvm.invariant.start and llvm.invariant.end intrinsics currently
| support specifying invariant memory objects only in the default address
| space.
|
| With this change, these intrinsics are overloaded for any address space for
| memory objects, and we can use these llvm invariant intrinsics in
| non-default address spaces.
|
| Example: llvm.invariant.start.p1i8(i64 4, i8 addrspace(1)* %ptr)
|
| This overloaded intrinsic is needed for representing final or invariant
| memory in managed languages.
|
| Reviewers: tstellarAMD, reames, apilipenko
|
| Subscribers: llvm-commits
|
| Differential Revision: https://reviews.llvm.org/D22519
|
| llvm-svn: 276316
* [AMDGPU] Emit read-only data to .rodata for hsaKonstantin Zhuravlyov2016-07-212-2/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D22538 llvm-svn: 276298
* AMDGPU: Fix phis from blocks split due to register indexingMatt Arsenault2016-07-211-12/+47
| | | | llvm-svn: 276257
* GlobalISel: implement low-level type with just size & vector lanes.Tim Northover2016-07-201-1/+1
| | | | | | | | This should be all the low-level instruction selection needs to determine how to implement an operation, with the remaining context taken from the opcode (e.g. G_ADD vs G_FADD) or other flags not based on type (e.g. fast-math). llvm-svn: 276158
* AMDGPU: Add missing test coverage for control flow breaksMatt Arsenault2016-07-201-0/+71
| | | | | | None of the current lit tests hit si_break handling. llvm-svn: 276129
* AMDGPU: Fix bug causing crash due to invalid opencl version metadata.Yaxun Liu2016-07-203-0/+26
| | | | | | Differential Revision: https://reviews.llvm.org/D22526 llvm-svn: 276119
* Revert "RegScavenging: Add scavengeRegisterBackwards()"Matthias Braun2016-07-201-2/+2
| | | | | | | | | Reverting this commit for now as it seems to be causing failures on test-suite tests on the clang-ppc64le-linux-lnt bot. This reverts commit r276044. llvm-svn: 276068
* AMDGPU: Change fdiv lowering based on !fpmath metadataMatt Arsenault2016-07-193-145/+366
| If 2.5 ulp is acceptable, denormals are not required, and the operation
| isn't a reciprocal (which will already be handled), replace with a faster
| fdiv.
|
| Simplify the lowering tests by using per function subtarget features.
|
| llvm-svn: 276051
* RegScavenging: Add scavengeRegisterBackwards()Matthias Braun2016-07-191-2/+2
| | | | | | | | | | | | | | This is a variant of scavengeRegister() that works for enterBasicBlockEnd()/backward(). The benefit of the backward mode is that it is not affected by incomplete kill flags. This patch also changes PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register scavenger in backwards mode. Differential Revision: http://reviews.llvm.org/D21885 llvm-svn: 276044
* AMDGPU: Expand register indexing pseudos in custom inserterMatt Arsenault2016-07-193-412/+130
| This is to help move SILowerControlFlow to before regalloc.
| There are a couple of tradeoffs with this. The complete CFG
| is visible to more passes, the loop body avoids an extra copy of m0,
| vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
|
| The disadvantage is the register allocator doesn't understand that
| the single lane's vector is dead within the loop body, so an extra
| register is used to outlive the loop block when expanding the
| VGPR -> m0 loop. This also now results in worse waitcnt insertion
| before the loop instead of after for pending operations at the point
| of the indexing, but that should be fixed by future improvements to
| cross block waitcnt insertion.
|
| v_movreld_b32's operands are now modeled more correctly since vdst
| is not a true output. This is kind of a hack to treat vdst as a
| use operand. Extra checking is required in the verifier since
| I can't seem to get tablegen to emit an implicit operand for a
| virtual register.
|
| llvm-svn: 275934
* AMDGPU: Fix test name and broken CHECK-LABELMatt Arsenault2016-07-181-6/+3
| | | | llvm-svn: 275928