path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* AMDGPU: fdiv -1, x -> rcp -x | Matt Arsenault | 2016-08-02 | 1 | -16/+25
| | | | llvm-svn: 277535
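The rewrite named in the commit title above rests on the identity -1/x = 1/(-x). A minimal Python sketch of that identity follows. It assumes ordinary double-precision division as a stand-in for the hardware rcp instruction, whose real precision is not modelled here, and the function names are hypothetical.

```python
def rcp(x):
    # Stand-in for a reciprocal instruction: plain IEEE division (assumption).
    return 1.0 / x


def fdiv_neg_one(x):
    # The original pattern: fdiv -1, x
    return -1.0 / x


def rcp_of_neg(x):
    # The replacement pattern: rcp -x
    return rcp(-x)
```

Because negation only flips the sign bit, both forms round to the same value for any non-zero finite input in this model.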
* AMDGPU: Stay in WQM for non-intrinsic stores | Nicolai Haehnle | 2016-08-02 | 6 | -10/+33
| Summary:
| Two types of stores are possible in pixel shaders: stores to memory that are explicitly requested at the API level, and stores that are an implementation detail of register spilling or lowering of arrays.
|
| For the first kind of store, we must ensure that helper pixels have no effect and hence WQM must be disabled. The second kind of store must always be executed, because the written value may be loaded again in a way that is relevant for helper pixels as well -- and there are no externally visible effects anyway.
|
| This is a candidate for the 3.9 release branch.
|
| Reviewers: arsenm, tstellarAMD, mareko
| Subscribers: arsenm, kzhuravl, llvm-commits
| Differential Revision: https://reviews.llvm.org/D22675
| llvm-svn: 277504
* AMDGPU: Track physical registers in SIWholeQuadMode | Nicolai Haehnle | 2016-08-02 | 1 | -26/+53
| Summary:
| There are cases where uniform branch conditions are computed in VGPRs, and we didn't correctly mark those as WQM.
|
| The stray change in basic-branch.ll is because invoking the LiveIntervals analysis leads to the detection of a dead register that would otherwise not be seen at -O0.
|
| This is a candidate for the 3.9 branch, as it fixes a possible hang.
|
| Reviewers: arsenm, tstellarAMD, mareko
| Subscribers: arsenm, llvm-commits, kzhuravl
| Differential Revision: https://reviews.llvm.org/D22673
| llvm-svn: 277500
* [AMDGPU] refactor DS instruction definitions. NFC. | Valery Pykhtin | 2016-08-01 | 7 | -608/+896
| | | | | | Differential revision: https://reviews.llvm.org/D22522 llvm-svn: 277344
* [AMDGPU] Fix lifetime of SmallVector temporaries. | Benjamin Kramer | 2016-07-30 | 1 | -6/+4
| | | | | | Found by asan -fsanitize-address-use-after-scope. llvm-svn: 277265
* AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior | Matt Arsenault | 2016-07-30 | 1 | -2/+2
| | | | | | | This should really be true for any immediate, not just inline ones. llvm-svn: 277260
* AMDGPU: Set s_setpc_b64 as a terminator | Matt Arsenault | 2016-07-30 | 1 | -0/+3
| | | | llvm-svn: 277259
* AMDGPU: Remove unused pattern | Matt Arsenault | 2016-07-30 | 1 | -8/+7
| | | | llvm-svn: 277258
* TargetInstrInfo: add virtual function getInstSizeInBytes | Sjoerd Meijer | 2016-07-29 | 1 | -1/+1
| | | | | | | | | This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of subclasses already implement. Differential Revision: https://reviews.llvm.org/D22885 llvm-svn: 277126
* AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB. | Changpeng Fang | 2016-07-28 | 1 | -0/+2
| | | | | | | | Differential Revision: http://reviews.llvm.org/D22021 Reviewed by: arsenm llvm-svn: 277073
* MachineFunction: Return reference for getFrameInfo(); NFC | Matthias Braun | 2016-07-28 | 9 | -44/+44
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* AMDGPU: Add intrinsics for compare with the full wavefront result | Wei Ding | 2016-07-28 | 5 | -0/+103
| | | | | | Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
* AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling | Tom Stellard | 2016-07-28 | 4 | -6/+12
| Summary:
| We were using reserved VGPRs for SGPR spilling and this was causing some programs with a workgroup size of 1024 to use more than 64 registers, which is illegal.
|
| Reviewers: arsenm, mareko, nhaehnle
| Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
| Differential Revision: https://reviews.llvm.org/D22032
| llvm-svn: 276980
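The 64-register limit mentioned in the commit above can be reproduced with a little arithmetic. This is a rough sketch only: the hardware figures (64 lanes per wavefront, 4 SIMDs per compute unit, 256 VGPRs per lane on each SIMD) are assumptions about GCN-class hardware that the commit message does not state, and the function name is hypothetical.

```python
# Assumed hardware parameters; none of these appear in the commit message.
LANES_PER_WAVE = 64
SIMDS_PER_CU = 4
VGPRS_PER_SIMD_LANE = 256


def max_vgprs_per_wave(workgroup_size):
    """VGPR budget per wave if the whole workgroup must fit on one compute unit."""
    waves = -(-workgroup_size // LANES_PER_WAVE)   # ceiling division
    waves_per_simd = -(-waves // SIMDS_PER_CU)     # ceiling division
    return VGPRS_PER_SIMD_LANE // waves_per_simd
```

Under these assumptions a workgroup of 1024 work-items needs 16 waves, or 4 per SIMD, which leaves 64 VGPRs for each wave.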
* AMDGPU: add execfix flag to SI_ELSE | Nicolai Haehnle | 2016-07-28 | 3 | -10/+13
| Summary:
| SI_ELSE is lowered into two parts:
|
|   s_or_saveexec_b64 dst, src (at the start of the basic block)
|
|   s_xor_b64 exec, exec, dst (at the end of the basic block)
|
| The idea is that dst contains the exec mask of the preceding IF block. It can happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside the basic block that contains SI_ELSE, in which case it introduces an instruction
|
|   s_and_b64 exec, exec, s[...]
|
| which masks out bits that can correspond to both the IF and the ELSE paths. So the resulting sequence must be:
|
|   s_or_saveexec_b64 dst, src
|
|   s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
|   s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
|
|   s_xor_b64 exec, exec, dst
|
| Whether to add the additional s_and_b64 dst, dst, exec is currently determined via the ExecModified tracking. With this change, it is instead determined by an additional flag on SI_ELSE which is set by SIWholeQuadMode.
|
| Finally: It also occurred to me that an alternative approach for the long run is for SILowerControlFlow to unconditionally emit
|
|   s_or_saveexec_b64 dst, src
|
|   ...
|
|   s_and_b64 dst, dst, exec
|   s_xor_b64 exec, exec, dst
|
| and have a pass that detects and cleans up the "redundant AND with exec" pattern where possible. This could be useful anyway, because we also add instructions
|
|   s_and_b64 vcc, exec, vcc
|
| before s_cbranch_scc (in moveToALU), and those are often redundant. I have some pending changes to how KILL is lowered that could also benefit from such a cleanup pass.
|
| In any case, this current patch could help in the short term with the whole ExecModified business.
|
| Reviewers: tstellarAMD, arsenm
| Subscribers: arsenm, llvm-commits, kzhuravl
| Differential Revision: https://reviews.llvm.org/D22846
| llvm-svn: 276972
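The need for the extra s_and_b64 dst, dst, exec in the commit above can be illustrated with a toy bitmask model. This is a sketch under stated assumptions: masks are plain Python integers over a hypothetical 4-lane quad, s_or_saveexec_b64 is modelled as "dst = exec; exec = exec | src", and the function and parameter names are invented for illustration.

```python
def lower_si_else(exec_mask, src, live_mask=None, execfix=False):
    """Model the lowered SI_ELSE sequence and return the final exec mask."""
    # s_or_saveexec_b64 dst, src
    dst = exec_mask
    exec_mask = exec_mask | src
    # s_and_b64 exec, exec, s[...]   (only when WQM -> Exact happens here)
    if live_mask is not None:
        exec_mask &= live_mask
    # s_and_b64 dst, dst, exec       (only when the execfix flag is set)
    if execfix:
        dst &= exec_mask
    # s_xor_b64 exec, exec, dst
    return exec_mask ^ dst


THEN_LANES = 0b0011   # lanes that ran the IF block (in WQM)
ELSE_LANES = 0b1100   # lanes that should run the ELSE block
LIVE_LANES = 0b0101   # non-helper lanes once Exact mode is entered
```

With these masks, lower_si_else(THEN_LANES, ELSE_LANES, LIVE_LANES) yields 0b0110, which wrongly enables helper lane 1 from the IF path, while passing execfix=True yields 0b0100, the single live ELSE lane.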
* AMDGPU: Turn dead checks into asserts | Matt Arsenault | 2016-07-28 | 1 | -9/+5
| | | | llvm-svn: 276946
* AMDGPU: Remove analyzeImmediate | Matt Arsenault | 2016-07-28 | 3 | -34/+12
| | | | | | | This no longer uses the more complicated classification of constants. llvm-svn: 276945
* Remove MCAsmInfo.h include from TargetOptions.h | Reid Kleckner | 2016-07-27 | 1 | -0/+1
| | | | | | | | | TargetOptions wants the ExceptionHandling enum. Move that to MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in clang. Now you can add a DWARF tag without a full rebuild of clang semantic analysis. llvm-svn: 276883
* [GlobalISel] Introduce an instruction selector. | Ahmed Bougacha | 2016-07-27 | 1 | -0/+5
| | | | | | | | And implement it for AArch64, supporting x/w ADD/OR. Differential Revision: https://reviews.llvm.org/D22373 llvm-svn: 276875
* AMDGPU: Use rcp for fdiv 1, x with fpmath metadata | Matt Arsenault | 2016-07-26 | 1 | -1/+1
| | | | | | | Using rcp should be OK for safe math usually, so this should not be replacing the original fdiv. llvm-svn: 276823
* AMDGPU: Use implicit_def for selecting anyext | Matt Arsenault | 2016-07-26 | 1 | -4/+7
| | | | llvm-svn: 276819
* AMDGPU/R600: Remove dead custom inserters | Matt Arsenault | 2016-07-26 | 1 | -209/+1
| | | | | | The intrinsics for these were removed, so this is dead. llvm-svn: 276805
* AMDGPU: Minor AsmPrinter cleanups | Matt Arsenault | 2016-07-26 | 1 | -79/+84
| | | | llvm-svn: 276804
* AMDGPU: Make AMDGPUMachineFunction fields private | Matt Arsenault | 2016-07-26 | 10 | -56/+80
| | | | | | | ABIArgOffset is a problem because properly setting the KernArgSize requires that the reserved area before the real kernel arguments be correctly aligned, which requires fixing clover. llvm-svn: 276766
* AMDGPU: Add fp legacy instruction intrinsics | Matt Arsenault | 2016-07-26 | 5 | -2/+21
| | | | | | | This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
* AMDGPU: Remove read_workdim intrinsic | Jan Vesely | 2016-07-25 | 3 | -14/+0
| | | | | | Differential revision: https://reviews.llvm.org/D22732 llvm-svn: 276682
* AMDGPU: Make skip threshold an option | Matt Arsenault | 2016-07-25 | 1 | -3/+8
| | | | llvm-svn: 276680
* AMDGPU: Delete dead code | Matt Arsenault | 2016-07-25 | 4 | -41/+0
| | | | llvm-svn: 276675
* [MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC | Joel Jones | 2016-07-25 | 2 | -2/+5
| Some targets, notably AArch64 for ILP32, have different relocation encodings based upon the ABI. This is an enabling change, so a future patch can use the ABIName from MCTargetOptions to choose which relocations to use. Tested using check-llvm.
|
| The corresponding change to clang is in: http://reviews.llvm.org/D16538
|
| Patch by: Joel Jones
| Differential Revision: https://reviews.llvm.org/D16213
| llvm-svn: 276654
* AMDGPU: Delete dead code | Matt Arsenault | 2016-07-23 | 2 | -97/+0
| | | | | | This has been dead since r269479 llvm-svn: 276518
* Revert "[AMDGPU] Emit read-only data to .rodata for hsa" | Tom Stellard | 2016-07-22 | 1 | -2/+1
| | | | | | | | | | | | This reverts commit r276298. Data stored in .rodata can have a negative offset from .text, but we don't support negative values in relocations yet. This caused a regression in one of the amp conformance tests: 5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01 llvm-svn: 276498
* GlobalISel: implement legalization pass, with just one transformation. | Tim Northover | 2016-07-22 | 1 | -0/+5
| | | | | | | | | This adds the actual MachineLegalizeHelper to do the work and a trivial pass wrapper that legalizes all instructions in a MachineFunction. Currently the only transformation supported is splitting up a vector G_ADD into one acting on smaller vectors. llvm-svn: 276461
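The one transformation described in the commit above, splitting a vector add into adds on smaller vectors, can be sketched outside of LLVM. The Python below is an illustration only and does not use the MachineLegalizeHelper API: vectors are plain lists, and the function name and the narrow_width parameter are hypothetical.

```python
def split_vector_add(lhs, rhs, narrow_width):
    """Add two equal-length vectors by adding narrow_width-element pieces."""
    if len(lhs) != len(rhs):
        raise ValueError("operands must have the same number of elements")
    if len(lhs) % narrow_width != 0:
        raise ValueError("vector length must be a multiple of narrow_width")

    pieces = []
    for start in range(0, len(lhs), narrow_width):
        lhs_part = lhs[start:start + narrow_width]
        rhs_part = rhs[start:start + narrow_width]
        # One narrow add per piece.
        pieces.append([a + b for a, b in zip(lhs_part, rhs_part)])

    # Concatenate the narrow results back into one wide vector.
    return [element for piece in pieces for element in piece]
```

For example, a 4-element add split with narrow_width=2 performs two 2-element adds and joins their results.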
* AMDGPU: Fix groupstaticsize for large LDS | Matt Arsenault | 2016-07-22 | 1 | -3/+3
| | | | | | | | | The size can exceed s_movk_i32's limit, and we don't want to use it this early since it inhibits optimizations. This should probably be merged to the release branch. llvm-svn: 276438
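The size limit referred to in the commit above can be checked with a range test. This sketch assumes that s_movk_i32 takes a 16-bit immediate that is sign-extended, so only values from -32768 to 32767 are representable. The helper name and the 64 KiB example size are hypothetical.

```python
SIMM16_MIN = -(1 << 15)
SIMM16_MAX = (1 << 15) - 1


def fits_s_movk_i32(value):
    """True if value is representable as a sign-extended 16-bit immediate."""
    return SIMM16_MIN <= value <= SIMM16_MAX


# A hypothetical kernel using 64 KiB of LDS.
LARGE_LDS_BYTES = 64 * 1024
```

Under that assumption fits_s_movk_i32(LARGE_LDS_BYTES) is False, so such a size has to be materialized with a full 32-bit move instead.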
* AMDGPU: Add HSA dispatch id intrinsic | Matt Arsenault | 2016-07-22 | 5 | -8/+31
| | | | llvm-svn: 276437
* AMDGPU: Delete more dead code | Matt Arsenault | 2016-07-22 | 10 | -182/+15
| | | | | | | Remove dead code from r600 intrinsic removal. Remove unset members, rename StackSize to be less ambiguous. llvm-svn: 276436
* AMDGPU: Fix i1 fp_to_int | Matt Arsenault | 2016-07-22 | 4 | -7/+34
| | | | | | | R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to. llvm-svn: 276435
* AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs | Matt Arsenault | 2016-07-22 | 1 | -26/+2
| | | | llvm-svn: 276434
* [AMDGPU] Emit read-only data to .rodata for hsa | Konstantin Zhuravlyov | 2016-07-21 | 1 | -1/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D22538 llvm-svn: 276298
* AMDGPU/SI: Add support for R_AMDGPU_ABS32 | Konstantin Zhuravlyov | 2016-07-21 | 1 | -0/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D21646 llvm-svn: 276294
* [AMDGPU] Some code cleaning in SIRegisterInfo.td | Sam Kolton | 2016-07-21 | 1 | -33/+23
| | | | | | | | | | Reviewers: tstellarAMD, vpykhtin Subscribers: arsenm, kzhuravl Differential Revision: https://reviews.llvm.org/D22620 llvm-svn: 276274
* AMDGPU: Fix phis from blocks split due to register indexing | Matt Arsenault | 2016-07-21 | 1 | -15/+22
| | | | llvm-svn: 276257
* AMDGPU: Fix bug causing crash due to invalid opencl version metadata. | Yaxun Liu | 2016-07-20 | 1 | -9/+13
| | | | | | Differential Revision: https://reviews.llvm.org/D22526 llvm-svn: 276119
* AMDGPU: Change fdiv lowering based on !fpmath metadata | Matt Arsenault | 2016-07-19 | 8 | -49/+227
| | | | | | | | | If 2.5 ulp is acceptable, denormals are not required, and the operation isn't a reciprocal (which will already be handled), replace with a faster fdiv. Simplify the lowering tests by using per function subtarget features. llvm-svn: 276051
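The three conditions listed in the commit above combine into a single predicate. The Python below is a hedged sketch of that decision only. It is not the backend's code, and the function and parameter names are hypothetical.

```python
def can_use_fast_fdiv(allowed_ulp, denormals_required, is_reciprocal):
    """Decide whether the faster fdiv expansion is acceptable.

    allowed_ulp: accuracy permitted by !fpmath metadata, or None if absent.
    denormals_required: True if denormal results must be preserved.
    is_reciprocal: True for the 1/x form, which is handled by another path.
    """
    if is_reciprocal:
        return False
    if denormals_required:
        return False
    return allowed_ulp is not None and allowed_ulp >= 2.5
```

A division with !fpmath of 2.5 ulp, no denormal requirement, and a non-constant numerator would qualify, while one with no metadata would keep the precise lowering.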
* [AMDGPU] Remove spurious line (should've been removed in r276029). | Davide Italiano | 2016-07-19 | 1 | -3/+0
| | | | llvm-svn: 276030
* [AMDGPU] Remove dead code. | Davide Italiano | 2016-07-19 | 1 | -25/+0
| | | | | | LGTM'd by Matt Arsenault. llvm-svn: 276029
* AMDGPU: Only use legal inline immediates with kill pseudo | Matt Arsenault | 2016-07-19 | 5 | -3/+15
| | | | | | | | | Only whether the value is negative or positive matters, so use a constant that doesn't require an instruction to materialize. These should really just emit the write exec directly, but for now stick with the kill pseudo-terminator. llvm-svn: 275988
* AMDGPU/SI: Fix SI scheduler refcount issue | Matt Arsenault | 2016-07-19 | 1 | -0/+3
| | | | | | | | | Without this fix, releaseSuccessors when InOrOutBlock is false could release SUs outside the schedule BasicBlock. Patch by Axel Davy llvm-svn: 275935
* AMDGPU: Expand register indexing pseudos in custom inserter | Matt Arsenault | 2016-07-19 | 8 | -300/+451
| This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
|
| The disadvantage is the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion.
|
| v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register.
|
| llvm-svn: 275934
* AMDGPU: Remove pointless dyn_cast_or_null | Matt Arsenault | 2016-07-18 | 1 | -4/+3
| | | | | | This is already cast above, so it is non-null. llvm-svn: 275881
* AMDGPU: Fix missing switch case warning | Matt Arsenault | 2016-07-18 | 1 | -0/+1
| | | | llvm-svn: 275873
* AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32 | Matt Arsenault | 2016-07-18 | 5 | -1/+8
| | | | llvm-svn: 275871