summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: fix overlapping copies in copyPhysRegNicolai Haehnle2015-12-192-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When copying aggregate registers within the same register class, there may be an overlap between source and destination that forces us to do the copy backwards. Do the simplest possible thing that guarantees the correct order of moves when there are overlaps, and does whatever when there is no overlap. (The last part forces some trivial adjustments to test cases.) Together with r255906, this fixes a VM fault in Unreal Elemental Demo. While at it, change the generation of kill and def flags to something that looks more reasonable. This method is used very late during compilation, so it probably doesn't matter in practice, and to be honest, I don't know if this change is actually correct because the semantics in connection with aggregate registers vs. sub-registers are not clear to me. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15622 llvm-svn: 256072
* AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.Tom Stellard2015-12-171-0/+36
| | | | | | | | | | | | Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15583 Patch by: Changpeng Fang llvm-svn: 255908
* AMDGPU/SI: Set the code object work group segment size when targeting HSATom Stellard2015-12-151-0/+5
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15493 llvm-svn: 255702
* AMDGPU/SI: Set the code objects private segment size when targeting HSA.Tom Stellard2015-12-152-1/+8
| | | | | | | | | | | | Summary: I'm not sure how things worked before without this. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15492 llvm-svn: 255692
* AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSATom Stellard2015-12-152-15/+8
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15426 llvm-svn: 255689
* AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF ↵Tom Stellard2015-12-151-37/+40
| | | | | | | | | | | | | | | | | | | | | | | | instructions Summary: We were previously selecting all constant loads to SMRD instructions and legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass. This new solution is more simple and also generates much better code, because the instruction selector is able to take advantage of all the MUBUF addressing modes that are legalization pass wasn't able to. We also no longer need to generate v_add_* instructions when we have a uniform pointer and a non-uniform offset, as this is now folded into the MUBUF instruction during instruction selection. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15425 llvm-svn: 255672
* AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsicsTom Stellard2015-12-151-0/+24
| | | | | | | | | | | | | | Summary: These are meant to be used instead of the llvm.SI.tid intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15475 llvm-svn: 255652
* AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsicsTom Stellard2015-12-151-0/+30
| | | | | | | | | | | | | | Summary: These are meant to be used instead of the llvm.SI.fs.interp intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15474 llvm-svn: 255651
* AMDGPU: Use generic bitreverse intrinsicMatt Arsenault2015-12-142-28/+115
| | | | | | Also fix bug in vector legalization for bitreverse. llvm-svn: 255512
* AMDGPU: Fix splitting vector loads with existing offsetsMatt Arsenault2015-12-141-0/+104
| | | | | | | If the original MMO had an offset, it was dropped. Also use the correct alignment after adding the new offset. llvm-svn: 255508
* SelectionDAG: Match min/max if the scalar operation is legalMatt Arsenault2015-12-115-77/+319
| | | | llvm-svn: 255388
* AMDGPU/SI: Emit constant arrays in the .text sectionTom Stellard2015-12-101-0/+25
| | | | | | | | | | | | | | | Summary: This allows us to remove the END_OF_TEXT_LABEL hack we had been using and simplifies the fixups used to compute the address of constant arrays. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15257 llvm-svn: 255204
* AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraintsTom Stellard2015-12-101-0/+23
| | | | | | | | | | | | Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs. Reviewers: arsenm, echristo Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15342 llvm-svn: 255203
* ScheduleDAGInstrs: Rework schedule graph builder.Matthias Braun2015-12-0414-84/+84
| | | | | | | | | | | | | | | | | | Re-comitting with a change that avoids undefined uses getting put into the VRegUses list. The new algorithm remembers the uses encountered while walking backwards until a matching def is found. Contrary to the previous version this: - Works without LiveIntervals being available - Allows to increase the precision to subregisters/lanemasks (not used for now) The changes in the AMDGPU tests are necessary because the R600 scheduler is not stable with respect to the order of nodes in the ready queues. Differential Revision: http://reviews.llvm.org/D9068 llvm-svn: 254683
* AMDGPU/SI: Emit constant arrays in the .hsrodata_readonly_agent sectionTom Stellard2015-12-031-0/+36
| | | | | | | | | | | | Summary: This is done only when targeting HSA. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13807 llvm-svn: 254587
* Revert "ScheduleDAGInstrs: Rework schedule graph builder."Matthias Braun2015-12-0314-84/+84
| | | | | | | | | | This works mostly fine but breaks some stage 1 builders when compiling compiler-rt on i386. Revert for further investigation as I can't see an obvious cause/fix. This reverts commit r254577. llvm-svn: 254586
* ScheduleDAGInstrs: Rework schedule graph builder.Matthias Braun2015-12-0314-84/+84
| | | | | | | | | | | | | | | The new algorithm remembers the uses encountered while walking backwards until a matching def is found. Contrary to the previous version this: - Works without LiveIntervals being available - Allows to increase the precision to subregisters/lanemasks (not used for now) The changes in the AMDGPU tests are necessary because the R600 scheduler is not stable with respect to the order of nodes in the ready queues. Differential Revision: http://reviews.llvm.org/D9068 llvm-svn: 254577
* AMDGPU/SI: Correctly emit agent global segment variables when targeting HSATom Stellard2015-12-021-0/+105
| | | | | | Differential Revision: http://reviews.llvm.org/D14508 llvm-svn: 254540
* AMDGPU/SI: Don't emit group segment global variablesTom Stellard2015-12-021-0/+14
| | | | | | | | | | | | Summary: Only global or readonly segment variables should appear in object files. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15111 llvm-svn: 254519
* AMDGPU: Error on addrspacecasts that aren't actually implementedMatt Arsenault2015-12-012-52/+66
| | | | llvm-svn: 254469
* AMDGPU: Implement isNoopAddrSpaceCastMatt Arsenault2015-12-011-0/+66
| | | | llvm-svn: 254468
* AMDGPU: Rework how private buffer passed for HSAMatt Arsenault2015-11-3012-195/+406
| | | | | | | | | | | | | | | | If we know we have stack objects, we reserve the registers that the private buffer resource and wave offset are passed and use them directly. If not, reserve the last 5 SGPRs just in case we need to spill. After register allocation, try to pick the next available registers instead of the last SGPRs, and then insert copies from the inputs to the reserved registers in the progloue. This also only selectively enables all of the input registers which are really required instead of always enabling them. llvm-svn: 254331
* AMDGPU: Remove SIPrepareScratchRegsMatt Arsenault2015-11-307-32/+704
| | | | | | | | | | | | | | | | | | | | | | It does not work because of emergency stack slots. This pass was supposed to eliminate dummy registers for the spill instructions, but the register scavenger can introduce more during PrologEpilogInserter, so some would end up left behind if they were needed. The potential for spilling the scratch resource descriptor and offset register makes doing something like this overly complicated. Reserve registers to use for the resource descriptor and use them directly in eliminateFrameIndex. Also removes creating another scratch resource descriptor when directly selecting scratch MUBUF instructions. The choice of which registers are reserved is temporary. For now it attempts to pick the next available registers after the user and system SGPRs. llvm-svn: 254329
* AMDGPU: Use assert zext for workgroup sizesMatt Arsenault2015-11-301-0/+60
| | | | llvm-svn: 254328
* AMDGPU: Don't reserve SCRATCH_PTR input registerMatt Arsenault2015-11-301-1/+1
| | | | | | This hasn't been doing anything since using relocations was added. llvm-svn: 254304
* AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsicTom Stellard2015-11-261-0/+16
| | | | | | | | | | | | | | Summary: This returns a pointer to the dispatch packet, which can be used to load information about the kernel dispach. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D14898 llvm-svn: 254116
* AMDGPU/SI: select S_ABS_I32 when possible (v2)Marek Olsak2015-11-252-3/+157
| | | | | | | | | | v2: added more tests, moved the SALU->VALU conversion to a separate function It looks like it's not possible to get subregisters in the S_ABS lowering code, and I don't feel like guessing without testing what the correct code would look like. llvm-svn: 254095
* AMDGPU: Add some tests for promotion of v2i64 scalar_to_vectorMatt Arsenault2015-11-251-0/+71
| | | | llvm-svn: 254087
* AMDGPU: Make v2i64/v2f64 legal types.Matt Arsenault2015-11-259-180/+201
| | | | | | | They can be loaded and stored, so count them as legal. This is mostly to fix a number of common cases for load/store merging. llvm-svn: 254086
* AMDGPU: Split LDS vector loadsMatt Arsenault2015-11-245-92/+70
| | | | | | If properly aligned this could allow using ds_read_b64. llvm-svn: 253975
* AMDGPU: Split x8 and x16 vector loads instead of scalarizeMatt Arsenault2015-11-248-289/+156
| | | | | | | | The one regression in the builtin tests is in the read2 test which now (again) has many extra copies, but this should be solved once the pass is replaced with a DAG combine. llvm-svn: 253974
* Revert "Change memcpy/memset/memmove to have dest and source alignments."Pete Cooper2015-11-191-11/+11
| | | | | | | | | | This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543
* Change memcpy/memset/memmove to have dest and source alignments.Pete Cooper2015-11-181-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html These intrinsics currently have an explicit alignment argument which is required to be a constant integer. It represents the alignment of the source and dest, and so must be the minimum of those. This change allows source and dest to each have their own alignments by using the alignment attribute on their arguments. The alignment argument itself is removed. There are a few places in the code for which the code needs to be checked by an expert as to whether using only src/dest alignment is safe. For those places, they currently take the minimum of src/dest alignments which matches the current behaviour. For example, code which used to read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false) will now read: call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false) For out of tree owners, I was able to strip alignment from calls using sed by replacing: (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\) with: $1i1 false) and similarly for memmove and memcpy. I then added back in alignment to test cases which needed it. A similar commit will be made to clang which actually has many differences in alignment as now IRBuilder can generate different source/dest alignments on calls. In IRBuilder itself, a new argument was added. Instead of calling: CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false) you now call CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false) There is a temporary class (IntegerAlignment) which takes the source alignment and rejects implicit conversion from bool. This is to prevent isVolatile here from passing its default parameter to the source alignment. Note, changes in future can now be made to codegen. I didn't change anything here, but this change should enable better memcpy code sequences. Reviewed by Hal Finkel. llvm-svn: 253511
* Use TargetRegisterInfo for printing MachineOperand register commentsDan Gohman2015-11-171-1/+1
| | | | | | | | | | | | | | | | Several places in AsmPrinter.cpp print comments describing MachineOperand registers using MCRegisterInfo, which uses MCOperand-oriented names. This doesn't work for targets that use virtual registers exclusively, as WebAssembly does, since virtual registers are represented and printed differently. This patch preserves what seems to be the spirit of r229978, avoiding the use of TM.getSubtargetImpl(), while still using MachineOperand-oriented printing for MachineOperands. Differential Revision: http://reviews.llvm.org/D14709 llvm-svn: 253338
* Revert "Remove unnecessary call to getAllocatableRegClass"Tom Stellard2015-11-122-3/+3
| | | | | | | | | | | | | This reverts commit r252565. This also includes the revert of the commit mentioned below in order to avoid breaking tests in AMDGPU: Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64" This reverts commit r252674. llvm-svn: 252956
* AMDGPU: Set isAllocatable = 0 on VS_32/VS_64Matt Arsenault2015-11-112-3/+3
| | | | llvm-svn: 252674
* DAGCombiner: Check shouldReduceLoadWidth before combining (and (load), x) -> ↵Tom Stellard2015-11-063-10/+32
| | | | | | | | | | | | extload Reviewers: resistor, arsenm Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13805 llvm-svn: 252349
* AMDGPU: Create emergency stack slots during frame loweringMatt Arsenault2015-11-062-1/+487
| | | | | | Test has a bogus verifier error which will be fixed by later commits. llvm-svn: 252327
* AMDGPU: Add pass to detect used kernel featuresMatt Arsenault2015-11-061-0/+193
| | | | | | | | | | | Mark kernels that use certain features that require user SGPRs to support with kernel attributes. We need to know before instruction selection begins because it impacts the kernel calling convention lowering. For now this only detects the workitem intrinsics. llvm-svn: 252323
* AMDGPU: Hack for VS_32 register pressureMatt Arsenault2015-11-062-11/+11
| | | | | | | | | | | | | For some reason VS_32 ends up factoring into the pressure heuristics even though we should never see a virtual register with this class. When SGPRs are reserved for register spilling, this for some reason triggers reg-crit scheduling. Setting isAllocatable = 0 may help with this since that seems to remove it from the default implementation's generated table. llvm-svn: 252321
* AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNELTom Stellard2015-11-061-2/+8
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D13804 llvm-svn: 252291
* DI: Reverse direction of subprogram -> function edge.Peter Collingbourne2015-11-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO. This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field. Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR. Differential Revision: http://reviews.llvm.org/D14265 llvm-svn: 252219
* AMDGPU: Fix assert when legalizing atomic operandsMatt Arsenault2015-11-051-0/+52
| | | | | | | | | | The operand layout is slightly different for the atomic opcodes from the usual MUBUF loads and stores. This should only fix it on SI/CI. VI is still broken because it still emits the addr64 replacement. llvm-svn: 252140
* AMDGPU: Add missing v2f64 fadd testsMatt Arsenault2015-11-051-10/+42
| | | | llvm-svn: 252117
* AMDGPU: Stop assuming vreg for build_vectorMatt Arsenault2015-11-021-8/+34
| | | | | | | | | | | | | This was causing a variety of test failures when v2i64 is added as a legal type. SIFixSGPRCopies should correctly handle the case of vector inputs to a scalar reg_sequence, so this isn't necessary anymore. This was hiding some deficiencies in how reg_sequence is handled later, but this shouldn't be a problem anymore since the register class copy of a reg_sequence is now done before the reg_sequence. llvm-svn: 251860
* AMDGPU: Error on graphics shaders with HSAMatt Arsenault2015-11-021-0/+18
| | | | | | | | I've found myself pointlessly debugging problems from running graphics tests with an HSA triple a few times, so stop this from happening again. llvm-svn: 251858
* AMDGPU: Un XFAIL a testMatt Arsenault2015-11-021-7/+10
| | | | | | | This should probably be merged with one of the other private memory tests, but it fails on r600. llvm-svn: 251856
* AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCEMatt Arsenault2015-11-026-39/+22
| | | | | | | Make the REG_SEQUENCE be a VGPR, and do the register class copy first. llvm-svn: 251855
* AMDGPU/SI: handle undef for llvm.SI.packf16Marek Olsak2015-10-291-0/+29
| | | | llvm-svn: 251632
* AMDGPU/SI: use S_OR for fneg (fabs f32)Marek Olsak2015-10-291-18/+9
| | | | llvm-svn: 251631
OpenPOWER on IntegriCloud