summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Add new name for barrier intrinsicMatt Arsenault2016-01-221-0/+28
| | | | llvm-svn: 258558
* AMDGPU: Rename intrinsics to use amdgcn prefixMatt Arsenault2016-01-2219-212/+308
| | | | | | | | | | | The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatability for now. llvm-svn: 258557
* AMDGPU: Fix crash with invariant markersMatt Arsenault2016-01-221-0/+25
| | | | | | | | The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking. llvm-svn: 258537
* AMDGPU: Rename some r600 intrinsics to use correct TargetPrefixMatt Arsenault2016-01-222-9/+9
| | | | | | These ones aren't directly emitted by mesa and inserted by a pass. llvm-svn: 258523
* AMDGPU: Remove AMDGPU.fract intrinsicMatt Arsenault2016-01-224-71/+87
| | | | | | | Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x) llvm-svn: 258513
* AMDGPU/SI: Promote i1 SETCC operationsTom Stellard2016-01-201-0/+20
| | | | | | | | | | | | | | Summary: While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16233 llvm-svn: 258352
* AMDGPU: Remove AMDGPU.trunc intrinsicMatt Arsenault2016-01-201-17/+0
| | | | llvm-svn: 258348
* AMDGPU: Remove AMDIL.fraction intrinsicMatt Arsenault2016-01-202-19/+3
| | | | llvm-svn: 258347
* AMDGPU: Remove AMDIL.round.nearest intrinsicMatt Arsenault2016-01-201-12/+0
| | | | llvm-svn: 258346
* AMDGPU: Remove abs intrinsicMatt Arsenault2016-01-201-47/+0
| | | | llvm-svn: 258343
* AMDGPU: Remove min/max intrinsicsMatt Arsenault2016-01-205-163/+2
| | | | | | This removes support for mesa 11.0.x llvm-svn: 258342
* AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputsTom Stellard2016-01-203-2/+69
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256
* AMDGPU: Reduce 64-bit SRAsMatt Arsenault2016-01-183-20/+30
| | | | llvm-svn: 258096
* AMDGPU: Split 64-bit and of constant upMatt Arsenault2016-01-183-51/+399
| | | | | | | | | | This breaks the tests that were meant for testing 64-bit inline immediates, so move those to shl where they won't be broken up. This should be repeated for the other related bit ops. llvm-svn: 258095
* AMDGPU: Generalize shl combineMatt Arsenault2016-01-181-0/+47
| | | | | | | Reduce 64-bit shl with constant > 32. We already special cased this for the == 32 case, but this also works for any >= 32 constant. llvm-svn: 258092
* AMDGPU: Reduce 64-bit lshr by constant to 32-bitMatt Arsenault2016-01-182-2/+64
| | | | | | 64-bit shifts are very slow on some subtargets. llvm-svn: 258090
* AMDGPU: Cleanup sra testMatt Arsenault2016-01-181-163/+206
| | | | llvm-svn: 258086
* AMDGPU/SI: Update ISA version for FIJIChangpeng Fang2016-01-131-0/+2
| | | | llvm-svn: 257666
* AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabledMarek Olsak2016-01-131-0/+17
| | | | | | | | | | Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 llvm-svn: 257625
* AMDGPU/SI: Add tests for non-void functions and InitialPSInputAddrMarek Olsak2016-01-131-0/+228
| | | | | | | | Reviewers: tstellarAMD, arsenm Differential Revision: http://reviews.llvm.org/D16036 llvm-svn: 257624
* AMDGPU/SI: Add SI Machine SchedulerNicolai Haehnle2016-01-131-0/+55
| | | | | | | | | | | | | | | | Summary: It is off by default, but can be used with --misched=si Patch by: Axel Davy Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D11885 llvm-svn: 257609
* AMDGPU: Emit note directive for HSA even if there are no functionsTom Stellard2016-01-121-0/+6
| | | | | | | | | | Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16010 llvm-svn: 257488
* AMDGPU: Implement {{s|u}}int_to_fp i64 -> f32Matt Arsenault2016-01-113-7/+128
| | | | | | | The old lowering for uint_to_fp failed opencl conformance. It might be OK for fast math mode, but I'm not sure. llvm-svn: 257393
* AMDGPU: Cleanup udiv testMatt Arsenault2016-01-111-22/+67
| | | | llvm-svn: 257387
* AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA targetMatt Arsenault2016-01-111-0/+3
| | | | | | It might be better to let this be a select failure instead. llvm-svn: 257386
* AMDGPU: int_to_fp test cleanupsMatt Arsenault2016-01-112-50/+162
| | | | llvm-svn: 257354
* AMDGPU: Fix ctlz combine for sub 32-bit typesMatt Arsenault2016-01-112-12/+83
| | | | llvm-svn: 257353
* AMDGPU: Pattern match ffbh pattern to instruction.Matt Arsenault2016-01-112-0/+181
| | | | | | | | The hardware instruction's output on 0 is -1 rather than 32. Eliminate a test and select to -1. This removes an extra instruction from the compatability function with HSAIL's firstbit instruction. llvm-svn: 257352
* AMDGPU: Custom lower i64 ctlzMatt Arsenault2016-01-112-0/+83
| | | | llvm-svn: 257348
* LegalizeDAG: Expand ctlz with ctlz_zero_undef if legalMatt Arsenault2016-01-111-0/+131
| | | | llvm-svn: 257345
* AMDGPU/SI: Emit global variable sizes when targeting HSATom Stellard2016-01-081-0/+16
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15952 llvm-svn: 257173
* AMDGPU: Emit functions sizesTom Stellard2016-01-081-0/+4
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15951 llvm-svn: 257172
* AMDGPU/SI: Fold operands with sub-registersNicolai Haehnle2016-01-073-12/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074
* AMDGPU/SI: xnack_mask is always reserved on VINicolai Haehnle2016-01-071-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Somehow, I first interpreted the docs as saying space for xnack_mask is only reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and went back to actually test what is happening, and it turns out that xnack_mask is always reserved at least on Tonga and Carrizo, in the sense that flat_scr is always fixed below the SGPRs that are used to implement xnack_mask, whether or not they are actually used. I confirmed this by writing a shader using inline assembly to tease out the aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so xnack_mask is s[76:77] and vcc is s[78:79]). This patch changes both the calculation of the total number of SGPRs and the various register reservations to account for this. It ought to be possible to use the gap left by xnack_mask when the feature isn't used, but this patch doesn't try to do that. (Note that the same applies to vcc.) Note that previously, even before my earlier change in r256794, the SGPRs that alias to xnack_mask could end up being used as well when flat_scr was unused and the total number of SGPRs happened to fall on the right alignment (e.g. highest regular SGPR being used s29 and VCC used would lead to number of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there were some conflict due to such aliasing, we should have noticed that already. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15898 llvm-svn: 257073
* AMDGPU/SI: Fix crash when inline assembly is used in a graphics shaderNicolai Haehnle2016-01-061-0/+11
| | | | | | | | | | | | | | | Summary: This is admittedly something that you could only run into by manually playing around with shader assembly because the SITypeWriter pass is skipped for compute. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15902 llvm-svn: 256980
* AMDGPU/SI: Do not move scratch resource register on Tonga & IcelandNicolai Haehnle2016-01-053-3/+27
| | | | | | | | | | | | | Due to the SGPR init bug, every program claims to use the same number of SGPRs anyway, so there's no point in trying to shift those registers down from their initial spot of reservation. Add a test that uses VGPR spilling and blocks most SGPRs from being used for the scratch resource register. Previously, this would run into an assertion. Differential Revision: http://reviews.llvm.org/D15724 llvm-svn: 256870
* AMDGPU/SI: Select non-uniform constant addrspace loads to flat instructions ↵Tom Stellard2016-01-052-165/+244
| | | | | | | | | | | | | | for HSA Summary: This fixes a regression caused by r256282. Reviewers: arsenm, cfang Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15736 llvm-svn: 256810
* AMDGPU: add +xnack featureNicolai Haehnle2016-01-041-6/+11
| | | | | | | | | | | | | | | | | | | Summary: Enabling this feature will account for the two SGPRs used by the hardware to store the XNACK_MASK physically. The hardware only requires this reservation when the XNACK feature is explicitly enabled. At some point, HSA will probably want to do that, but it does increase SGPR register pressure, so leave it disabled by default for now (but do add a small test). Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15869 llvm-svn: 256794
* AMDGPU/SI: Use flat for global load/store when targeting HSAChangpeng Fang2015-12-227-15/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For some reason doing executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282
* Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"Rafael Espindola2015-12-227-39/+15
| | | | | | | | This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275
* AMDGPU/SI: Use flat for global load/store when targeting HSAChangpeng Fang2015-12-227-15/+39
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: For some reason doing executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273
* AMDGPU: Switch barrier intrinsics to using convergentMatt Arsenault2015-12-1918-32/+27
| | | | | | | | noduplicate prevents unrolling of small loops that happen to have barriers in them. If a loop has a barrier in it, it is OK to duplicate it for the unroll. llvm-svn: 256075
* Fix broken type legalization of min/maxMatt Arsenault2015-12-192-0/+70
| | | | | | | This was using an anyext when promoting the type when zext/sext is required. llvm-svn: 256074
* AMDGPU: fix overlapping copies in copyPhysRegNicolai Haehnle2015-12-192-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072
* AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.Tom Stellard2015-12-171-0/+36
| | | | | | | | | | | | Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908
* AMDGPU/SI: Set the code object work group segment size when targeting HSATom Stellard2015-12-151-0/+5
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15493 llvm-svn: 255702
* AMDGPU/SI: Set the code objects private segment size when targeting HSA.Tom Stellard2015-12-152-1/+8
| | | | | | | | | | | | Summary: I'm not sure how things worked before without this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15492 llvm-svn: 255692
* AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSATom Stellard2015-12-152-15/+8
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15426 llvm-svn: 255689
* AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF ↵Tom Stellard2015-12-151-37/+40
| | | | | | | | | | | | | | | | | | | | | | | | instructions Summary: We were previously selecting all constant loads to SMRD instructions and legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass. This new solution is more simple and also generates much better code, because the instruction selector is able to take advantage of all the MUBUF addressing modes that are legalization pass wasn't able to. We also no longer need to generate v_add_* instructions when we have a uniform pointer and a non-uniform offset, as this is now folded into the MUBUF instruction during instruction selection. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15425 llvm-svn: 255672
* AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsicsTom Stellard2015-12-151-0/+24
| | | | | | | | | | | | | | Summary: These are meant to be used instead of the llvm.SI.tid intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15475 llvm-svn: 255652
OpenPOWER on IntegriCloud