path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
* AMDGPU: Fix i1 fp_to_int | Matt Arsenault | 2016-07-22 | 4 | -7/+34
  R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to.
  llvm-svn: 276435
* AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs | Matt Arsenault | 2016-07-22 | 1 | -26/+2
  llvm-svn: 276434
* [AMDGPU] Emit read-only data to .rodata for hsa | Konstantin Zhuravlyov | 2016-07-21 | 1 | -1/+2
  Differential Revision: https://reviews.llvm.org/D22538
  llvm-svn: 276298
* AMDGPU/SI: Add support for R_AMDGPU_ABS32 | Konstantin Zhuravlyov | 2016-07-21 | 1 | -0/+1
  Differential Revision: https://reviews.llvm.org/D21646
  llvm-svn: 276294
* [AMDGPU] Some code cleaning in SIRegisterInfo.td | Sam Kolton | 2016-07-21 | 1 | -33/+23
  Reviewers: tstellarAMD, vpykhtin
  Subscribers: arsenm, kzhuravl
  Differential Revision: https://reviews.llvm.org/D22620
  llvm-svn: 276274
* AMDGPU: Fix phis from blocks split due to register indexing | Matt Arsenault | 2016-07-21 | 1 | -15/+22
  llvm-svn: 276257
* AMDGPU: Fix bug causing crash due to invalid opencl version metadata. | Yaxun Liu | 2016-07-20 | 1 | -9/+13
  Differential Revision: https://reviews.llvm.org/D22526
  llvm-svn: 276119
* AMDGPU: Change fdiv lowering based on !fpmath metadata | Matt Arsenault | 2016-07-19 | 8 | -49/+227
  If 2.5 ulp is acceptable, denormals are not required, and the operation isn't a reciprocal (which will already be handled), replace it with a faster fdiv.
  Simplify the lowering tests by using per function subtarget features.
  llvm-svn: 276051
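The selection rule in r276051 can be read as a three-part predicate. The following is only a sketch of that predicate in Python, not the code from the commit: `FDivInfo`, `can_use_fast_fdiv` and `FAST_FDIV_MIN_ULP` are hypothetical names introduced here, and the only value taken from the commit message is the 2.5 ulp threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold quoted in the commit message ("2.5 ulp is acceptable").
FAST_FDIV_MIN_ULP = 2.5


@dataclass(frozen=True)
class FDivInfo:
    """Hypothetical summary of what a lowering pass knows about one fdiv."""
    fpmath_ulp: Optional[float]  # accuracy from !fpmath metadata, None if absent
    denormals_required: bool     # True if denormal results must be preserved
    is_reciprocal: bool          # True for 1.0 / x, which has its own path


def can_use_fast_fdiv(info: FDivInfo) -> bool:
    """Return True only when all three conditions from the message hold."""
    if info.fpmath_ulp is None or info.fpmath_ulp < FAST_FDIV_MIN_ULP:
        return False  # caller did not permit 2.5 ulp of error
    if info.denormals_required:
        return False  # the faster sequence is only used without denormals
    if info.is_reciprocal:
        return False  # reciprocals are already handled elsewhere
    return True
```

For example, `can_use_fast_fdiv(FDivInfo(2.5, False, False))` is true, while the same query with denormals required is false.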
* [AMDGPU] Remove spurious line (should've been removed in r276029). | Davide Italiano | 2016-07-19 | 1 | -3/+0
  llvm-svn: 276030
* [AMDGPU] Remove dead code. | Davide Italiano | 2016-07-19 | 1 | -25/+0
  LGTM'd by Matt Arsenault.
  llvm-svn: 276029
* AMDGPU: Only use legal inline immediates with kill pseudo | Matt Arsenault | 2016-07-19 | 5 | -3/+15
  Only whether the value is negative or positive matters, so use a constant that doesn't require an instruction to materialize.
  These should really just emit the write exec directly, but for now stick with the kill pseudo-terminator.
  llvm-svn: 275988
* AMDGPU/SI: Fix SI scheduler refcount issue | Matt Arsenault | 2016-07-19 | 1 | -0/+3
  Without this fix, releaseSuccessors when InOrOutBlock is false could release SUs outside the schedule BasicBlock.
  Patch by Axel Davy
  llvm-svn: 275935
* AMDGPU: Expand register indexing pseudos in custom inserter | Matt Arsenault | 2016-07-19 | 8 | -300/+451
  This is to help move SILowerControlFlow to before regalloc. There are a couple of tradeoffs with this. The complete CFG is visible to more passes, the loop body avoids an extra copy of m0, vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
  The disadvantage is that the register allocator doesn't understand that the single lane's vector is dead within the loop body, so an extra register is used to outlive the loop block when expanding the VGPR -> m0 loop. This also now results in worse waitcnt insertion before the loop instead of after for pending operations at the point of the indexing, but that should be fixed by future improvements to cross block waitcnt insertion.
  v_movreld_b32's operands are now modeled more correctly since vdst is not a true output. This is kind of a hack to treat vdst as a use operand. Extra checking is required in the verifier since I can't seem to get tablegen to emit an implicit operand for a virtual register.
  llvm-svn: 275934
* AMDGPU: Remove pointless dyn_cast_or_null | Matt Arsenault | 2016-07-18 | 1 | -4/+3
  This is already cast above, so it is non-null.
  llvm-svn: 275881
* AMDGPU: Fix missing switch case warning | Matt Arsenault | 2016-07-18 | 1 | -0/+1
  llvm-svn: 275873
* AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32 | Matt Arsenault | 2016-07-18 | 5 | -1/+8
  llvm-svn: 275871
* AMDGPU/R600: Replace barrier intrinsics | Matt Arsenault | 2016-07-18 | 3 | -21/+1
  llvm-svn: 275870
* AMDGPU: Remove dead check in AMDGPUPromoteAlloca | Matt Arsenault | 2016-07-18 | 1 | -9/+10
  This is currently only called with GEP users. A direct alloca would only happen with current typed pointers for arrays which are a perverse case.
  Also fix crashes on 0 x and 1 x arrays.
  llvm-svn: 275869
* AMDGPU: Remove dead code and redundant check | Matt Arsenault | 2016-07-18 | 1 | -27/+1
  Non intrinsic calls aren't really handled, and this IntrinsicInst dyn_cast checks for the function for us.
  llvm-svn: 275868
* AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions. | Nicolai Haehnle | 2016-07-18 | 1 | -0/+6
  Summary: The work item intrinsics are not available for the shader calling conventions. And even if we did hook them up, most shader stages have some extra restrictions on the amount of available LDS.
  Reviewers: tstellarAMD, arsenm
  Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl
  Differential Revision: https://reviews.llvm.org/D20728
  llvm-svn: 275779
* Re-commit [AMDGPU] Add metadata for runtime | Yaxun Liu | 2016-07-16 | 3 | -0/+371
  Attempting to fix lit test failure on ppc.
  llvm-svn: 275676
* AMDGPU: Fix verifier error from partially undef copy | Matt Arsenault | 2016-07-15 | 1 | -5/+3
  In this situation:
      %VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
      %VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
      %VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
      %VGPR4<def> = COPY %VGPR2
  The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1, but VGPR4 is defined immediately after this copy.
  llvm-svn: 275635
* AMDGPU: Remove brev intrinsic | Matt Arsenault | 2016-07-15 | 2 | -6/+0
  llvm-svn: 275620
* AMDGPU: Fix TargetPrefix for remaining r600 intrinsics | Matt Arsenault | 2016-07-15 | 3 | -51/+53
  llvm-svn: 275619
* AMDGPU: Remove AMDGPU.ldexp | Matt Arsenault | 2016-07-15 | 1 | -4/+0
  llvm-svn: 275618
* AMDGPU: Remove legacy rsq.clamped intrinsic | Matt Arsenault | 2016-07-15 | 4 | -15/+7
  Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.
  Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.
  llvm-svn: 275617
* AMDGPU/R600: Delete dead code. | Matt Arsenault | 2016-07-15 | 2 | -58/+1
  Dead or the same as the base implementation.
  llvm-svn: 275616
* Revert "[AMDGPU] Add metadata for runtime" | Vitaly Buka | 2016-07-15 | 3 | -371/+0
  This reverts commit r275566.
  llvm-svn: 275599
* [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends. | Justin Lebar | 2016-07-15 | 3 | -67/+41
  Summary: Instead, we take a single flags arg (a bitset).
  Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags.
  This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad.
  Reviewers: chandlerc, tstellarAMD
  Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
  Differential Revision: http://reviews.llvm.org/D22249
  llvm-svn: 275592
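The API change in r275592, from several positional bools to one flags bitset with the alignment moved ahead of it, can be illustrated outside LLVM. This is a minimal Python sketch of the pattern only: `MemFlags`, `get_load_old` and `get_load` are hypothetical names, the flag set is an assumption made for illustration, and none of this is the actual `SelectionDAG` signature.

```python
from enum import Flag, auto


class MemFlags(Flag):
    """Hypothetical bitset standing in for the per-load boolean properties."""
    NONE = 0
    VOLATILE = auto()
    NON_TEMPORAL = auto()
    INVARIANT = auto()


def get_load_old(ptr, is_volatile, is_non_temporal, is_invariant, alignment):
    """Old style: three adjacent bools are easy to pass in the wrong order."""
    flags = MemFlags.NONE
    if is_volatile:
        flags |= MemFlags.VOLATILE
    if is_non_temporal:
        flags |= MemFlags.NON_TEMPORAL
    if is_invariant:
        flags |= MemFlags.INVARIANT
    return {"ptr": ptr, "alignment": alignment, "flags": flags}


def get_load(ptr, alignment=0, flags=MemFlags.NONE):
    """New style: alignment first with a default of 0, then one flags value."""
    return {"ptr": ptr, "alignment": alignment, "flags": flags}


# The two calls below describe the same load; only the second names its flags.
old = get_load_old("p", True, False, True, 4)
new = get_load("p", 4, MemFlags.VOLATILE | MemFlags.INVARIANT)
```

With the bitset form, adding another property means adding one enum member instead of touching every call site, which is the simplification the commit message describes.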
* [AMDGPU] Add metadata for runtime | Yaxun Liu | 2016-07-15 | 3 | -0/+371
  Added emitting metadata to elf for runtime.
  Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.
  Differential Revision: https://reviews.llvm.org/D21849
  llvm-svn: 275566
* Rename AnalyzeBranch* to analyzeBranch*. | Jacques Pienaar | 2016-07-15 | 4 | -11/+8
  Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.
  Reviewers: tstellarAMD, mcrosier
  Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai
  Differential Revision: https://reviews.llvm.org/D22409
  llvm-svn: 275564
* AMDGPU: Fix not expanding control flow after some kill blocks | Matt Arsenault | 2016-07-15 | 1 | -7/+2
  Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later.
  Fixes bug 28550
  llvm-svn: 275510
* AMDGPU: Fix trying to skip from a block with no successors | Matt Arsenault | 2016-07-15 | 1 | -2/+3
  Found while reducing bug 28550
  llvm-svn: 275509
* AMDGPU: Fix splitting kill blocks with defs before kill | Matt Arsenault | 2016-07-15 | 1 | -13/+3
  llvm-svn: 275508
* [AMDGPU] Assembler: fix row_bcast parsing | Sam Kolton | 2016-07-14 | 1 | -0/+2
  Summary: This change fixes bug 28538
  Reviewers: tstellarAMD, vpykhtin
  Subscribers: arsenm, kzhuravl
  Differential Revision: https://reviews.llvm.org/D22355
  llvm-svn: 275422
* AMDGPU/R600: Delete/rename intrinsics no longer used by mesa | Matt Arsenault | 2016-07-14 | 7 | -326/+7
  Use the replacement pass to update the tests, and delete old names.
  llvm-svn: 275375
* AMDGPU/R600: Remove intrinsics with no tests and no users | Matt Arsenault | 2016-07-14 | 4 | -76/+15
  Mesa removed this path, so nothing is using these anymore.
  llvm-svn: 275372
* AMDGPU: Remove unused intrinsics | Matt Arsenault | 2016-07-14 | 2 | -12/+0
  llvm-svn: 275371
* AMDGPU: Remove dead code | Matt Arsenault | 2016-07-14 | 2 | -10/+0
  llvm-svn: 275369
* AMDGPU: Remove last AMDIL intrinsics | Matt Arsenault | 2016-07-13 | 2 | -11/+1
  llvm-svn: 275309
* AMDGPU/SI: Emit the number of SGPR and VGPR spills | Marek Olsak | 2016-07-13 | 5 | -0/+30
  Summary:
  v2: don't count SGPRs spilled to scratch twice
  I think this is sufficient. It doesn't count private memory usage, which happens often and uses scratch but isn't technically a spill. The private memory usage can be computed by: [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].
  The fact that SGPR spills add very high numbers to the scratch size makes that computation a guessing game, but I don't have a solution to that.
  Reviewers: tstellarAMD
  Subscribers: arsenm, kzhuravl
  Differential Revision: http://reviews.llvm.org/D22197
  llvm-svn: 275288
* AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL | Tom Stellard | 2016-07-13 | 5 | -28/+69
  Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl
  Subscribers: arsenm, llvm-commits, kzhuravl
  Differential Revision: http://reviews.llvm.org/D21484
  llvm-svn: 275268
* AMDGPU: Fold out no-op kill intrinsics | Matt Arsenault | 2016-07-13 | 1 | -0/+8
  llvm-svn: 275253
* AMDGPU: WQM cleanups | Matt Arsenault | 2016-07-13 | 2 | -42/+39
  - Add new TTI instruction checks
  - Don't use const for blocks that are mutated.
  - Checking isBranch and isTerminator should be redundant
  llvm-svn: 275252
* AMDGPU: Follow up to r275203 | Matt Arsenault | 2016-07-12 | 5 | -33/+101
  I meant to squash this into it.
  llvm-svn: 275220
* AMDGPU: Fix verifier error with kill intrinsic | Matt Arsenault | 2016-07-12 | 1 | -65/+122
  Don't create a terminator in the middle of the block. We should probably get rid of this intrinsic.
  llvm-svn: 275203
* AMDGPU: Set isConvergent on v_cmpx* instructions | Matt Arsenault | 2016-07-12 | 1 | -2/+3
  No test since these aren't used now, except for one place in a pre-emit pass.
  llvm-svn: 275200
* AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8 | Wei Ding | 2016-07-12 | 1 | -0/+4
  Differential Revision: http://reviews.llvm.org/D22239
  llvm-svn: 275197
* AMDGPU: Unify MOVRELSOffset and MOVRELDOffset | Nicolai Haehnle | 2016-07-12 | 3 | -34/+9
  Summary: Previously, constant index insertelements would be turned into SI_INDIRECT_DST, which is bound to prevent some optimization opportunities. Worse, it misled the heuristic that decides whether immediates should be lowered to S_MOV_B32 or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.
  Reviewers: arsenm, tstellarAMD
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D22217
  llvm-svn: 275160
* AMDGPU: Cleanup pseudoinstructions | Matt Arsenault | 2016-07-12 | 3 | -58/+55
  llvm-svn: 275133