summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* SelectionDAG: Fix a crash on inline asm when output register supports ↵Tom Stellard2016-03-091-0/+12
| | | | | | | | | | | | | | | | multiple types Summary: The code in SelectionDAG did not handle the case where the register type and output types were different, but had the same size. Reviewers: arsenm, echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17940 llvm-svn: 263022
* [AMDGPU] Assembler: Support DPP instructions.Sam Kolton2016-03-091-3/+3
| | | | | | | | | | | | | | | | | | | | Supprot DPP syntax as used in SP3 (except several operands syntax). Added dpp-specific operands in td-files. Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter. Support for VOP2 DPP instructions in td-files. Some tests for DPP instructions. ToDo: - VOP2bInst: - vcc is considered as operand - AsmMatcher doesn't apply mnemonic aliases when parsing operands - v_mac_f32 - v_nop - disable instructions with 64-bit operands - change dpp_ctrl assembler representation to conform sp3 Review: http://reviews.llvm.org/D17804 llvm-svn: 263008
* AMDGPU: Match more med3 integer patternsMatt Arsenault2016-03-072-0/+694
| | | | llvm-svn: 262864
* RegisterCoalescer: Remap subregister lanemasks before exchanging operandsMatthias Braun2016-03-051-3/+2
| | | | | | | | | | Rematerializing and merging into a bigger register class at the same time, requires the subregister range lanemasks getting remapped to the new register class. This fixes http://llvm.org/PR26805 llvm-svn: 262768
* AMDGPU/SI: Add support for spiling SGPRs to scratch bufferTom Stellard2016-03-041-0/+54
| | | | | | | | | | | | | | Summary: This is necessary for when we run out of VGPRs and can no longer use v_{read,write}_lane for spilling SGPRs. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17592 llvm-svn: 262732
* AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsicsNikolay Haustov2016-03-041-0/+124
| | | | | | | | | | | These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. Initial change by Nicolai H.hnle Differential Revision: http://reviews.llvm.org/D17401 llvm-svn: 262701
* DAGCombiner: Make sure an integer is being truncatedMatt Arsenault2016-03-021-1/+28
| | | | llvm-svn: 262446
* DAGCombiner: Turn truncate of a bitcasted vector to an extractMatt Arsenault2016-03-012-3/+95
| | | | | | | | | | | On AMDGPU where operations i64 operations are often bitcasted to v2i32 and back, this pattern shows up regularly where it breaks some expected combines on i64, such as load width reducing. This fixes some test failures in a future commit when i64 loads are changed to promote. llvm-svn: 262397
* DAGCombiner: Turn extract of bitcasted integer into truncateMatt Arsenault2016-03-011-0/+17
| | | | | | | This reduces the number of bitcast nodes and generally cleans up the DAG when bitcasting between integers and vectors everywhere. llvm-svn: 262358
* AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and ↵Changpeng Fang2016-03-012-0/+26
| | | | | | | | | | | | | | | | Intrinsics Summary: This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics, which are new since VI. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D17614 llvm-svn: 262356
* AMDGPU: Set HasExtractBitInsnMatt Arsenault2016-03-011-0/+303
| | | | | | | | | | This currently does not have the control over the bitwidth, and there are missing optimizations to reduce the integer to 32-bit if it can be. But in most situations we do want the sinking to occur. llvm-svn: 262296
* AMDGPU: More bits of frame index are known to be zeroMatt Arsenault2016-02-272-13/+14
| | | | | | | | | | | | The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still. llvm-svn: 262153
* DAGCombiner: Don't unnecessarily swap operands in ReassociateOpsMatt Arsenault2016-02-272-1/+34
| | | | | | | | | | | | | | | | | | In the case where op = add, y = base_ptr, and x = offset, this transform: (op y, (op x, c1)) -> (op (op x, y), c1) breaks the canonical form of add by putting the base pointer in the second operand and the offset in the first. This fix is important for the R600 target, because for some address spaces the base pointer and the offset are stored in separate register classes. The old pattern caused the ISel code for matching addressing modes to put the base pointer and offset in the wrong register classes, which required no-trivial code transformations to fix. llvm-svn: 262148
* DAGCombiner: Relax sqrt NaN folding checkMatt Arsenault2016-02-271-2/+14
| | | | | | This is OK for +0 since compares to +/-0 give the same result. llvm-svn: 262125
* AMDGPU: Add s_sleep intrinsicMatt Arsenault2016-02-271-0/+45
| | | | llvm-svn: 262120
* AMDGPU: Implement readcyclecounterMatt Arsenault2016-02-273-0/+70
| | | | | | | | | | This matches the behavior of the HSAIL clock instruction. s_realmemtime is used if the subtarget supports it, and falls back to s_memtime if not. Also introduces new intrinsics for each of s_memtime / s_memrealtime. llvm-svn: 262119
* [AMDGPU] Assembler: Basic support for MIMGNikolay Haustov2016-02-2610-104/+104
| | | | | | | | | | | Add parsing and printing of image operands. Matches legacy sp3 assembler. Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last. Update SITargetLowering for new order. Add basic MC test. Update CodeGen tests. Review: http://reviews.llvm.org/D17574 llvm-svn: 261995
* MachineCopyPropagation: Catch copies of the form A<-B;A<-BMatthias Braun2016-02-262-8/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D17475 llvm-svn: 261966
* AMDGPU: Add failing testcase for register coalescerMatt Arsenault2016-02-221-0/+44
| | | | llvm-svn: 261592
* AMDGPU: Fix alignments in testMatt Arsenault2016-02-221-12/+22
| | | | | | | | | I don't think this test was intending to test unaligned load/store. Change it to use the natural alignment to avoid regressing. Also adds missing SI checks. llvm-svn: 261571
* AMDGPU/R600: Implement allowsMisalignedMemoryAccessMatt Arsenault2016-02-221-11/+6
| | | | | | | | This avoids some test regressions in a future commit when unaligned operations are expanded when they have custom lowering. llvm-svn: 261570
* AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointerTom Stellard2016-02-203-14/+21
| | | | | | | | | | | | | | | | | | | | | | Summary: Instead of trying to replace SMRD instructions with a VGPR base pointer with an equivalent MUBUF instruction, we now copy the base pointer to SGPRs using v_readfirstlane. This is safe to do, because any load selected as an SMRD instruction has been proven to have a uniform base pointer, so each thread in the wave will have the same pointer value in VGPRs. This will fix some errors on VI from trying to replace SMRD instructions with addr64-enabled MUBUF instructions that don't exist. Reviewers: arsenm, cfang, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17305 llvm-svn: 261385
* AMDGPU/SI: Fix s_waitcnt insertion for flat instructionsTom Stellard2016-02-191-0/+16
| | | | | | | | | | | | | | | | Summary: This was broken in r260694 which swapped the address and data operands for flat store instructions. The code in SIInsertWaits assumes that the data operand always comes before the address operand, so we need to add a special case for flat. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17366 llvm-svn: 261330
* AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsicsNicolai Haehnle2016-02-181-0/+111
| | | | | | | | | | | | | Summary: These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory. Differential Revision: http://reviews.llvm.org/D17276 llvm-svn: 261224
* AMDGPU: Prepare for reducing private element size.Matt Arsenault2016-02-133-2/+131
| | | | | | | | | | | | Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804
* AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsicTom Stellard2016-02-131-0/+13
| | | | | | | This intrinsic will be used to expose dpp functionality to higher-level languages. It will map to the dpp version of v_mov_b32. llvm-svn: 260792
* AMDGPU: Add intrinsics for sin/cosMatt Arsenault2016-02-133-0/+47
| | | | | | | These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires. llvm-svn: 260782
* AMDGPU: Rename intrinsic to better match instruction nameMatt Arsenault2016-02-132-23/+43
| | | | | | Also fixes missing f32 test. llvm-svn: 260780
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructionsTom Stellard2016-02-1220-78/+671
| | | | | | | | | | Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
* [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD ↵Tom Stellard2016-02-1213-23/+23
| | | | | | | | | | | | | | assembler Historically, AMD internal sp3 assembler has flat_store* addr, data format. To match existing code and to enable reuse, change LLVM definitions to match. Also update MC and CodeGen tests. Differential Revision: http://reviews.llvm.org/D16927 Patch by: Nikolay Haustov llvm-svn: 260694
* AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass.Changpeng Fang2016-02-121-0/+24
| | | | | | | | | | | | | | | Summary: It is possible that the loop condition can be a boolean constant (infinite loop, for example). So we sould handle constant condition in annotating a loop. This patch adds this functionality to support annotating constant condition. Reviewers: tstellarAMD, arsenm Subscribers: llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D15093 llvm-svn: 260692
* AMDGPU: Set flat_scratch from flat_scratch_init regMatt Arsenault2016-02-127-99/+40
| | | | | | | | | | | | | | This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
* AMDGPU: Set element_size in private resource descriptorMatt Arsenault2016-02-124-10/+10
| | | | | | | | | Introduce a subtarget feature for this, and leave the default with the current behavior which assumes up to 16-byte loads/stores can be used. The field also seems to have the ability to be set to 2 bytes, but I'm not sure what that would be used for. llvm-svn: 260651
* AMDGPU: Quick fix for extreme slowness in spill-scavenge-offset.ll testNicolai Haehnle2016-02-121-6/+13
| | | | | | | | | | | | Summary: Also, some cosmetic fixes. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, llvm-commits Differential Revision: http://reviews.llvm.org/D17161 llvm-svn: 260625
* AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRsTom Stellard2016-02-112-0/+78
| | | | | | | | | | | | | | | | Summary: It's possible to have resource descriptors and samplers stored in VGPRs, either by a VMEM instruction or in the case of samplers, floating-point calculations. When this happens, we need to use v_readfirstlane to copy these values back to sgprs. Reviewers: mareko, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D17102 llvm-svn: 260599
* AMDGPU: Fix constant bus use check with subregistersMatt Arsenault2016-02-112-8/+29
| | | | | | | | | | | If the two operands to an instruction were both subregisters of the same super register, it would incorrectly think this counted as the same constant bus use. This fixes the verifier error in fmin_legacy.ll which was missing -verify-machineinstrs. llvm-svn: 260495
* AMDGPU: Remove some old intrinsic uses from testsMatt Arsenault2016-02-1163-513/+478
| | | | llvm-svn: 260493
* AMDGPU: Release the scavenged offset register during VGPR spillNicolai Haehnle2016-02-101-0/+33
| | | | | | | | | | | | | | | | | | | Summary: This fixes a crash where subsequent spills would be unable to scavenge a register. In particular, it fixes a crash in piglit's spec@glsl-1.50@execution@geometry@max-input-components (the test still has a shader that fails to compile because of too many SGPR spills, but at least it doesn't crash any more). This is a candidate for the release branch. Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, arsenm Differential Revision: http://reviews.llvm.org/D16558 llvm-svn: 260427
* AMDGPU: Remove bfi and bfm intrinsicsMatt Arsenault2016-02-083-102/+24
| | | | | | Nothing is using them. llvm-svn: 260123
* SelectionDAG: Lower some range metadata to AssertZextMatt Arsenault2016-02-081-0/+46
| | | | | | | | | | If a range has a lower bound of 0, add an AssertZext from the nearest floor power of two. This allows operations with some workitem intrinsics with known maximum ranges to use fast 24-bit multiplies. llvm-svn: 260109
* AMDGPU: Account for LDS alignmentMatt Arsenault2016-02-051-0/+268
| | | | | | | | | | | | | The current situation isn't great, because the amount of padding requires is determined by the inverse order of the first encountered use. We should eventually somehow sort these to minimize wasted space. Another problem is the alignment of kernel arguments isn't respected. The group_segment_alignment is always emitted as the default 16, and typed arguments with higher alignments or an explicitly set alignment are also ignored. llvm-svn: 259912
* AMDGPU: Preserve alignments on new created globalsMatt Arsenault2016-02-052-5/+31
| | | | | | | Also switch to internal linkage, and include the name of the function in the name. llvm-svn: 259911
* [ScheduleDAGInstrs::buildSchedGraph()] Handling of memory dependecies rewritten.Jonas Paulsson2016-02-031-0/+1
| | | | | | | | | | | | | | | | | | | Recommited, after some fixing with test cases. Updated test cases: test/CodeGen/AArch64/arm64-misched-memdep-bug.ll test/CodeGen/AArch64/tailcall_misched_graph.ll Temporarily disabled test cases: test/CodeGen/AMDGPU/split-vector-memoperand-offsets.ll test/CodeGen/PowerPC/ppc64-fastcc.ll (partially updated) test/CodeGen/PowerPC/vsx-fma-m.ll test/CodeGen/PowerPC/vsx-fma-sp.ll http://reviews.llvm.org/D8705 Reviewers: Hal Finkel, Andy Trick. llvm-svn: 259673
* AMDGPU: Do not promote allocas with non-inbounds GEPsMatt Arsenault2016-02-025-76/+104
| | | | | | | | If we can't assume the pointer value isn't within the bounds of the object, it seems risky to try to replace the pointer calculations. llvm-svn: 259573
* AMDGPU: Handle promoting memmoveMatt Arsenault2016-02-021-0/+65
| | | | | | Also add missing tests for the others. llvm-svn: 259558
* AMDGPU: Skip promote alloca with no optimizationsMatt Arsenault2016-02-021-0/+38
| | | | llvm-svn: 259551
* AMDGPU: Whitelist handled intrinsicsMatt Arsenault2016-02-021-0/+24
| | | | | | | We shouldn't crash on unhandled intrinsics. Also simplify failure handling in loop. llvm-svn: 259546
* AMDGPU: Use inbounds when calculating workitem offsetMatt Arsenault2016-02-021-3/+16
| | | | | | | | | | | | | When promoting allocas to LDS, we know we are indexing into a specific area just created, and the calculation will also never overflow. Also emit some of the muls as nsw nuw, because instcombine infers this already from the range metadata. I think putting this on the other adds and muls might be OK too, but I'm not 100% sure. llvm-svn: 259545
* Refactor backend diagnostics for unsupported featuresOliver Stannard2016-02-029-9/+9
| | | | | | | | | | | | | | | | | Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498
* AMDGPU: Fix emitting invalid workitem intrinsics for HSAMatt Arsenault2016-01-302-3/+331
| | | | | | | | | | | | | | | | | | The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297
OpenPOWER on IntegriCloud