summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Implement canonicalizeMatt Arsenault2016-04-141-0/+43
| | | | | | Also add generic DAG node for it. llvm-svn: 266272
* AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32Matt Arsenault2016-04-121-0/+30
| | | | | | | | | | This helps clean up some of the mess when expanding unaligned 64-bit loads when changed to be promote to v2i32, and fixes situations where or x, 0 was emitted after splitting 64-bit ors during moveToVALU. I think this could be a generic combine but I'm not sure. llvm-svn: 266104
* AMDGPU: Add atomic_inc + atomic_dec intrinsicsMatt Arsenault2016-04-121-1/+48
| | | | | | | These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074
* AMDGPU: Add a shader calling conventionNicolai Haehnle2016-04-061-9/+8
| | | | | | | | | | | This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589
* AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}Tom Stellard2016-04-011-0/+62
| | | | | | | | | | | | | | | | | Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý. Patch by: Vedran Miletić Reviewers: arsenm, tstellarAMD, nhaehnle Subscribers: jvesely, scchan, kanarayan, arsenm Differential Revision: http://reviews.llvm.org/D17280 llvm-svn: 265170
* AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDSChangpeng Fang2016-03-151-2/+14
| | | | | | | | | | | | | | | | | | | Summary: Static LDS size is saved in MachineFunctionInfo::LDSSize, We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter, we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize. Reviewers: arsenm tstellarAMD Subscribers llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18064 llvm-svn: 263563
* AMDGPU: More bits of frame index are known to be zeroMatt Arsenault2016-02-271-15/+25
| | | | | | | | | | | | The maximum private allocation for the whole GPU is 4G, so the maximum possible index for a single workitem is the maximum size divided by the smallest granularity for a dispatch. This increases the number of known zero high bits, which enables more offset folding. The maximum private size per workitem with this is 128M but may be smaller still. llvm-svn: 262153
* AMDGPU: Implement readcyclecounterMatt Arsenault2016-02-271-0/+3
| | | | | | | | | | This matches the behavior of the HSAIL clock instruction. s_realmemtime is used if the subtarget supports it, and falls back to s_memtime if not. Also introduces new intrinsics for each of s_memtime / s_memrealtime. llvm-svn: 262119
* [AMDGPU] Assembler: Basic support for MIMGNikolay Haustov2016-02-261-3/+6
| | | | | | | | | | | Add parsing and printing of image operands. Matches legacy sp3 assembler. Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last. Update SITargetLowering for new order. Add basic MC test. Update CodeGen tests. Review: http://reviews.llvm.org/D17574 llvm-svn: 261995
* AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsicsNicolai Haehnle2016-02-181-3/+4
| | | | | | | | | | | | | Summary: These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa for the GL_ARB_shader_image_load_store extension. IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has a legacy name and pretends not to read memory. Differential Revision: http://reviews.llvm.org/D17276 llvm-svn: 261224
* AMDGPU: Prepare for reducing private element size.Matt Arsenault2016-02-131-14/+48
| | | | | | | | | | | | Tests for the new scalarize all private access options will be included with a future commit. The only functional change is to make the split/scalarize behavior for private access of > 4 element vectors to be consistent with the flat/global handling. This makes the spilling worse in the two changed tests. llvm-svn: 260804
* AMDGPU: Add intrinsics for sin/cosMatt Arsenault2016-02-131-0/+16
| | | | | | | These provide direct access to the hardware instruction without the unit version required like llvm.sin/llvm.cos lowering requires. llvm-svn: 260782
* AMDGPU: Rename intrinsic to better match instruction nameMatt Arsenault2016-02-131-2/+2
| | | | | | Also fixes missing f32 test. llvm-svn: 260780
* AMDGPU: Fix broken condition causing warningMatt Arsenault2016-02-131-1/+1
| | | | llvm-svn: 260773
* AMDGPU/SI: Detect uniform branches and emit s_cbranch instructionsTom Stellard2016-02-121-5/+33
| | | | | | | | | | Reviewers: arsenm Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16603 llvm-svn: 260765
* AMDGPU: Set flat_scratch from flat_scratch_init regMatt Arsenault2016-02-121-1/+10
| | | | | | | | | | | | | | This was hardcoded to the static private size, but this would be missing the offset and additional size for someday when we have dynamic sizing. Also stops always initializing flat_scratch even when unused. In the future we should stop emitting this unless flat instructions are used to access private memory. For example this will initialize it almost always on VI because flat is used for global access. llvm-svn: 260658
* AMDGPU: Split R600 and SI store loweringMatt Arsenault2016-02-111-13/+17
| | | | | | | These were only sharing some somewhat incorrect logic for when to scalarize or split vectors. llvm-svn: 260490
* AMDGPU: Fix indentation and variable namesMatt Arsenault2016-02-101-34/+31
| | | | llvm-svn: 260399
* AMDGPU: Split R600 and SI load loweringMatt Arsenault2016-02-101-1/+23
| | | | | | | These weren't actually sharing anything in the common LowerLOAD. llvm-svn: 260398
* [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.Ahmed Bougacha2016-02-091-4/+2
| | | | llvm-svn: 260316
* Refactor backend diagnostics for unsupported featuresOliver Stannard2016-02-021-4/+6
| | | | | | | | | | | | | | | | | Re-commit of r258951 after fixing layering violation. The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. llvm-svn: 259498
* AMDGPU: Fix emitting invalid workitem intrinsics for HSAMatt Arsenault2016-01-301-0/+34
| | | | | | | | | | | | | | | | | | The AMDGPUPromoteAlloca pass was emitting the read.local.size calls, which with HSA was incorrectly selected to reading from the offset mesa uses off of the kernarg pointer. Error on intrinsics which aren't supported by HSA, and start emitting the correct IR to read the workgroup size out of the dispatch pointer. Also initialize the pass so it can be tested with opt, and start moving towards not depending on the subtarget as an argument. Start emitting errors for the intrinsics not handled with HSA. llvm-svn: 259297
* AMDGPU: Add new amdgcn workitem intrinsicsMatt Arsenault2016-01-301-0/+6
| | | | | | | These use the correct prefix and follow the HSA naming convention rather than the config register option names. llvm-svn: 259293
* AMDGPU: Match fmed3 patterns with legacy fmin/fmaxMatt Arsenault2016-01-281-26/+32
| | | | llvm-svn: 259090
* AMDGPU: Match some med3 patternsMatt Arsenault2016-01-281-5/+88
| | | | llvm-svn: 259089
* Revert r259035, it introduces a cyclic library dependencyOliver Stannard2016-01-281-6/+4
| | | | llvm-svn: 259045
* Add backend dignostic printer for unsupported featuresOliver Stannard2016-01-281-4/+6
| | | | | | | | | | | | | | | | Re-commit of r258951 after fixing layering violation. The related LLVM patch adds a backend diagnostic type for reporting unsupported features, this adds a printer for them to clang. In the case where debug location information is not available, I've changed the printer to report the location as the first line of the function, rather than the closing brace, as the latter does not give the user any information. This also affects optimisation remarks. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 259035
* Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported ↵NAKAMURA Takumi2016-01-281-6/+4
| | | | | | | | | | | features" It broke layering violation in LLVMIR. clang r258950 "Add backend dignostic printer for unsupported features" llvm r258951 "Refactor backend diagnostics for unsupported features" llvm-svn: 259016
* Refactor backend diagnostics for unsupported featuresOliver Stannard2016-01-271-4/+6
| | | | | | | | | | | | | | | | | | | | | The BPF and WebAssembly backends had identical code for emitting errors for unsupported features, and AMDGPU had very similar code. This merges them all into one DiagnosticInfo subclass, that can be used by any backend. There should be minimal functional changes here, but some AMDGPU tests have been updated for the new format of errors (it used a slightly different format to BPF and WebAssembly). The AMDGPU error messages will now benefit from having precise source locations when debug info is available. The implementation of DiagnosticInfoUnsupported::print must be in lib/Codegen rather than in the existing file in lib/IR/ to avoid introducing a dependency from IR to CodeGen. Differential Revision: http://reviews.llvm.org/D16590 llvm-svn: 258951
* AMDGPU: Make v32i8/v64i8 illegal typesMatt Arsenault2016-01-261-3/+0
| | | | | | | | Old intrinsics were forcing these, but they have now all been removed. This fixes large i8 vector operations generally being broken. llvm-svn: 258788
* AMDGPU: Remove old sample intrinsicsMatt Arsenault2016-01-261-17/+0
| | | | | | | | | | | I did my best to try to update all the uses in tests that just happened to use the old ones to the newer intrinsics. I'm not sure I got all of the immediate operand conversions correct, since the value seems to have been ignored by the old pattern but I don't think it really matters. llvm-svn: 258787
* AMDGPU: Implement read_register and write_register intrinsicsMatt Arsenault2016-01-261-0/+47
| | | | | | | | | | | | | | Some of the special intrinsics now that now correspond to a instruction also have special setting of some registers, e.g. llvm.SI.sendmsg sets m0 as well as use s_sendmsg. Using these explicit register intrinsics may be a better option. Reading the exec mask and others may be useful for debugging. For this I'm not sure this is entirely correct because we would want this to be convergent, although it's possible this is already treated sufficently conservatively. llvm-svn: 258785
* AMDGPU: Restore AMDGPU prefixed rsq intrinsic for nowMatt Arsenault2016-01-261-1/+2
| | | | | | Also move into backend intrinsics to discourage use of the old name. llvm-svn: 258783
* AMDGPU: Move amdgcn intrinsic handling into SITargetLoweringMatt Arsenault2016-01-231-1/+66
| | | | llvm-svn: 258608
* AMDGPU: Rename intrinsics to use amdgcn prefixMatt Arsenault2016-01-221-1/+2
| | | | | | | | | | | The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatability for now. llvm-svn: 258557
* AMDGPU: Remove AMDGPU.fract intrinsicMatt Arsenault2016-01-221-3/+0
| | | | | | | Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x) llvm-svn: 258513
* AMDGPU/SI: Promote i1 SETCC operationsTom Stellard2016-01-201-0/+1
| | | | | | | | | | | | | | Summary: While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16233 llvm-svn: 258352
* AMDGPU: Remove AMDIL.fraction intrinsicMatt Arsenault2016-01-201-2/+1
| | | | llvm-svn: 258347
* AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputsTom Stellard2016-01-201-0/+10
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256
* AMDGPU: Split 64-bit and of constant upMatt Arsenault2016-01-181-0/+3
| | | | | | | | | | This breaks the tests that were meant for testing 64-bit inline immediates, so move those to shl where they won't be broken up. This should be repeated for the other related bit ops. llvm-svn: 258095
* AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabledMarek Olsak2016-01-131-1/+8
| | | | | | | | | | Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16037 llvm-svn: 257625
* AMDGPU/SI: Add s_waitcnt at the end of non-void functionsMarek Olsak2016-01-131-0/+2
| | | | | | | | | | | | | | Summary: v2: Make ReturnsVoid private, so that I can another 8 lines of code and look more productive. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16034 llvm-svn: 257622
* AMDGPU/SI: Add support for non-void functionsMarek Olsak2016-01-131-0/+89
| | | | | | | | | | | | | | | | | | | Summary: Return values can be stored in SGPRs (i32) and VGPRs (f32). This will be used by functions which expect some bytecode or other binary to be appended at the end. It allows defining in which registers the return values will be stored. v2: don't do this for compute shaders Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16033 llvm-svn: 257621
* AMDGPU/SI: Allow any number of PS inputsMarek Olsak2016-01-131-3/+1
| | | | | | | | | | | | | | Summary: With the ability to concatenate shader binaries, the limit of 15 no longer applies. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16031 llvm-svn: 257592
* AMDGPU/SI: Add new target attribute InitialPSInputAddrMarek Olsak2016-01-131-4/+15
| | | | | | | | | | | | | | | | | | | | | Summary: This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM. The register assigns VGPR locations to PS inputs, while the ENA register determines whether or not they are loaded. Mesa needs to set some inputs as not-movable, so that a pixel shader prolog binary appended at the beginning can assume where some inputs are. v2: Make PSInputAddr private, because there is never enough silly getters and setters for people to read. Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16030 llvm-svn: 257591
* AMDGPU: Fix crash with dispatch.ptr intrinsic with non-HSA targetMatt Arsenault2016-01-111-0/+7
| | | | | | It might be better to let this be a select failure instead. llvm-svn: 257386
* AMDGPU: Pattern match ffbh pattern to instruction.Matt Arsenault2016-01-111-2/+1
| | | | | | | | The hardware instruction's output on 0 is -1 rather than 32. Eliminate a test and select to -1. This removes an extra instruction from the compatability function with HSAIL's firstbit instruction. llvm-svn: 257352
* AMDGPU: Remove dead target dag combineMatt Arsenault2016-01-111-1/+0
| | | | llvm-svn: 257344
* AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF ↵Tom Stellard2015-12-151-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | instructions Summary: We were previously selecting all constant loads to SMRD instructions and legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass. This new solution is more simple and also generates much better code, because the instruction selector is able to take advantage of all the MUBUF addressing modes that are legalization pass wasn't able to. We also no longer need to generate v_add_* instructions when we have a uniform pointer and a non-uniform offset, as this is now folded into the MUBUF instruction during instruction selection. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15425 llvm-svn: 255672
* AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsicsTom Stellard2015-12-151-0/+13
| | | | | | | | | | | | | | Summary: These are meant to be used instead of the llvm.SI.fs.interp intrinsic which will be deprecated at some point. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15474 llvm-svn: 255651
OpenPOWER on IntegriCloud