path: root/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Match load d16 hi instructions | Matt Arsenault | 2017-09-20 | 1 | -1/+6
  Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves.
  llvm-svn: 313716
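  A minimal sketch of what a d16-hi load does to its destination (illustrative C++; the helper name load_d16_hi is hypothetical, not an LLVM API): the loaded 16-bit value replaces the high half of the 32-bit destination register while the low half is preserved.

    #include <cstdint>

    // Illustrative semantics of a load-d16-hi: merge the loaded 16 bits into
    // the high half of the existing 32-bit destination, keeping the low half.
    uint32_t load_d16_hi(const uint16_t *src, uint32_t dst) {
      return (dst & 0xFFFFu) | (static_cast<uint32_t>(*src) << 16);
    }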
* AMDGPU: Cleanup load/store PatFrags | Matt Arsenault | 2017-09-20 | 1 | -120/+97
  Try to use a consistent naming scheme.
  llvm-svn: 313713
* AMDGPU: Match store d16_hi instructions | Matt Arsenault | 2017-09-20 | 1 | -18/+45
  llvm-svn: 313712
* [AMDGPU] Use v_max_f* for fcanonicalize | Stanislav Mekhanoshin | 2017-08-30 | 1 | -3/+6
  If denorms are not flushed, we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses the constant bus and VOP3.
  Differential Revision: https://reviews.llvm.org/D36856
  llvm-svn: 312095
* [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodes | Dmitry Preobrazhensky | 2017-08-16 | 1 | -4/+6
  See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152
  Reviewers: SamWot, artem.tamazov, arsenm
  Differential Revision: https://reviews.llvm.org/D36674
  llvm-svn: 311006
* AMDGPU: Start selecting global instructions | Matt Arsenault | 2017-07-29 | 1 | -0/+1
  llvm-svn: 309470
* [AMDGPU][MC] Added check for truncation of SOPK imm operand | Dmitry Preobrazhensky | 2017-04-26 | 1 | -0/+16
  See bug 30827: https://bugs.llvm.org//show_bug.cgi?id=30827
  Reviewers: artem.tamazov, vpykhtin
  Differential Revision: https://reviews.llvm.org/D32535
  llvm-svn: 301418
* [AMDGPU] Get address space mapping by target triple environment | Yaxun Liu | 2017-03-27 | 1 | -16/+16
  As we introduced the target triple environments amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the values by target triple. The basic idea is to use a struct AMDGPUAS to represent address space values. For address space values which do not depend on the target triple, use static const members, so that they don't occupy extra memory space and are equivalent to compile-time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage, or it can be added as a member to a pass and created at the beginning of the run* function.
  Differential Revision: https://reviews.llvm.org/D31284
  llvm-svn: 298846
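  A rough C++ sketch of the structure the commit describes (a sketch only: the field names follow the commit text, but the concrete numbers and the getAMDGPUAS helper are illustrative assumptions, not the upstream definitions): triple-dependent address-space numbers become instance members filled in from the triple, while triple-independent ones stay compile-time constants.

    // Illustrative only: values that depend on the triple environment are
    // ordinary members; fixed ones are static constants.
    struct AMDGPUAS {
      unsigned FLAT_ADDRESS;                    // varies with amdgiz/amdgizcl
      unsigned PRIVATE_ADDRESS;                 // varies with amdgiz/amdgizcl
      static const unsigned LOCAL_ADDRESS = 3;  // illustrative constant
    };

    // Cheap to build on the fly at the point of use, as the commit suggests.
    inline AMDGPUAS getAMDGPUAS(bool IsAMDGIZ) {
      AMDGPUAS AS;
      AS.FLAT_ADDRESS = IsAMDGIZ ? 0 : 4;       // illustrative numbers
      AS.PRIVATE_ADDRESS = IsAMDGIZ ? 5 : 0;    // illustrative numbers
      return AS;
    }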
* AMDGPU: Use v_med3_{f16|i16|u16} | Matt Arsenault | 2017-02-27 | 1 | -3/+4
  llvm-svn: 296401
* AMDGPU: Support v2i16/v2f16 packed operations | Matt Arsenault | 2017-02-27 | 1 | -0/+10
  llvm-svn: 296396
* AMDGPU: Add another BFE pattern | Matt Arsenault | 2017-02-23 | 1 | -36/+50
  This is the pattern that falls out of the instruction's definition if offset == 0.
  llvm-svn: 295912
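  For context, a rough sketch of unsigned bitfield-extract semantics of the kind these patterns target (illustrative C++, not taken from the ISA manual; edge cases are approximated). With offset == 0 the extract degenerates to a plain mask, which is the shape the new pattern covers.

    #include <cstdint>

    // Illustrative unsigned bitfield extract: `width` bits starting at `offset`.
    uint32_t bfe_u32(uint32_t src, uint32_t offset, uint32_t width) {
      if (width == 0)
        return 0;
      uint32_t mask = (width >= 32) ? ~0u : ((1u << width) - 1u);
      return (src >> offset) & mask;  // offset == 0: just src & mask
    }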
* AMDGPU: Redefine clamp node as clamp 0.0-1.0 | Matt Arsenault | 2017-02-21 | 1 | -1/+1
  Change the implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use max whether or not denormals are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp.
  llvm-svn: 295788
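  As a point of reference, a minimal sketch of the clamp-to-[0,1] semantics the node now represents (illustrative C++; the NaN behaviour shown follows the dx10_clamp convention, where NaN clamps to 0, and is an assumption about the intended mapping rather than a quote of the backend code).

    // Illustrative clamp-to-[0,1]: equivalent to min(max(x, 0.0f), 1.0f).
    // With dx10_clamp-style semantics, NaN inputs produce 0.0f.
    float clamp01(float x) {
      if (!(x > 0.0f))              // catches x <= 0 and NaN
        return 0.0f;
      return x > 1.0f ? 1.0f : x;   // min with 1.0
    }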
* AMDGPU: Use source modifiers with f16->f32 conversions | Matt Arsenault | 2017-02-02 | 1 | -0/+3
  The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them.
  llvm-svn: 293857
* AMDGPU: Generalize matching of v_med3_f32 | Matt Arsenault | 2017-01-31 | 1 | -0/+2
  I think this is safe as long as no inputs are known to ever be nans. Also add an intrinsic for fmed3 to be able to handle all safe math cases.
  llvm-svn: 293598
* [AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64) | Konstantin Zhuravlyov | 2017-01-13 | 1 | -0/+6
  Differential Revision: https://reviews.llvm.org/D28496
  llvm-svn: 291954
* AMDGPU: Fix sub_oneuse being marked commutative | Matt Arsenault | 2017-01-12 | 1 | -1/+2
  llvm-svn: 291748
* AMDGPU: split ret/noret patterns for global atomics | Jan Vesely | 2016-12-23 | 1 | -18/+48
  Differential Revision: https://reviews.llvm.org/D27989
  llvm-svn: 290435
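  For illustration, the distinction the split patterns encode, shown as a hedged C++ sketch (not backend code): whether the value returned by an atomic read-modify-write is actually used decides between the returning ("ret") and non-returning ("noret") instruction variants.

    #include <atomic>

    // "noret" form: the old value is discarded, so a non-returning atomic
    // instruction can be selected.
    void bump(std::atomic<int> &counter) {
      counter.fetch_add(1);
    }

    // "ret" form: the old value is consumed, so the returning variant is needed.
    int bump_and_get(std::atomic<int> &counter) {
      return counter.fetch_add(1);
    }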
* AMDGPU: Implement f16 fcanonicalize | Matt Arsenault | 2016-12-22 | 1 | -0/+1
  llvm-svn: 290300
* [AMDGPU] Add f16 support (VI+) | Konstantin Zhuravlyov | 2016-11-13 | 1 | -0/+1
  Differential Revision: https://reviews.llvm.org/D25975
  llvm-svn: 286753
* AMDGPU: Add VI i16 support | Tom Stellard | 2016-11-10 | 1 | -3/+3
  Patch By: Wei Ding
  Differential Revision: https://reviews.llvm.org/D18049
  llvm-svn: 286464
* Revert "AMDGPU: Add VI i16 support" | Tom Stellard | 2016-11-04 | 1 | -3/+3
  This reverts commits r285939 and r285948. These broke some conformance tests.
  llvm-svn: 285995
* AMDGPU: Add VI i16 support | Tom Stellard | 2016-11-03 | 1 | -3/+3
  Patch By: Wei Ding
  Differential Revision: https://reviews.llvm.org/D18049
  llvm-svn: 285939
* [AMDGPU] add fcopysign(f64, f32) pattern | Valery Pykhtin | 2016-10-20 | 1 | -0/+9
  Differential revision: https://reviews.llvm.org/D25827
  llvm-svn: 284743
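  A small sketch of the mixed-width copysign semantics this pattern covers (illustrative C++, not the TableGen pattern itself; the helper name is hypothetical): the result keeps the magnitude of the f64 operand and takes only the sign bit of the f32 operand, so only the high dword of the double needs to change.

    #include <cstdint>
    #include <cstring>

    // Illustrative fcopysign(f64, f32): magnitude from `mag`, sign from `sgn`.
    double copysign_f64_f32(double mag, float sgn) {
      uint64_t m;
      uint32_t s;
      std::memcpy(&m, &mag, sizeof(m));
      std::memcpy(&s, &sgn, sizeof(s));
      m = (m & ~(1ULL << 63)) | (static_cast<uint64_t>(s >> 31) << 63);
      std::memcpy(&mag, &m, sizeof(m));
      return mag;
    }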
* Target: Remove unused patterns and transforms. NFC. | Peter Collingbourne | 2016-10-07 | 1 | -13/+0
  llvm-svn: 283515
* [AMDGPU] Assembler: better support for immediate literals in assembler. | Sam Kolton | 2016-09-09 | 1 | -7/+0
  Summary: Previously the assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in the instruction "v_fract_f64 v[0:1], 0.5", the literal 0.5 was encoded as the 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. The correct encoding is inline constant 240 (optimal) or the 32-bit literal 0x3FE00000 at least.
  With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values, either integer or floating-point. Then we convert the parsed literals to the correct form based on information about the type of literal parsed (was it floating or binary) and the type of expected instruction operands (is this an f32/64 or b32/64 instruction).
  Here are the rules for how we convert literals:
  - We parsed an fp literal:
    - Instruction expects a 64-bit operand:
      - If the parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5), then we do nothing with this literal.
      - Else if the literal is not inlinable but the instruction requires it to be inlined (e.g. this is the e64 encoding, v_fract_f64_e64 v[0:1], 1.5), report an error.
      - Else the literal is not inlinable but we can encode it as an additional 32-bit literal constant:
        - Instruction expects an fp operand type (f64):
          - Check whether the low 32 bits of the literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5). If so, do nothing.
          - Else (e.g. v_fract_f64 v[0:1], 3.1415), report a warning that the low 32 bits will be set to zeroes and precision will be lost, and set the low 32 bits of the literal to zeroes.
        - Instruction expects an integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5): report an error, as it is unclear how to encode this literal.
    - Instruction expects a 32-bit operand:
      - Convert the parsed 64-bit fp literal to 32-bit fp. Allow loss of precision but not overflow or underflow.
      - Is this literal inlinable and are we required to inline the literal (e.g. v_trunc_f32_e64 v0, 0.5)? Do nothing; else report an error.
      - Otherwise do nothing: we can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0).
  - We parsed a binary literal:
    - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)? Do nothing.
    - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)? Report an error.
    - Else, the literal is not inlinable and we are not required to inline it:
      - Are the high 32 bits of the literal zeroes or the same as the sign bit (32 bit)? Do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef).
      - Else report an error (e.g. v_trunc_f32 v0, 0x123456789abcdef0).
  For this change it is required that we know the operand types of an instruction (are they f32/64 or b32/64). I added several new register operands (they extend the previous register operands) and set their operand types to the corresponding types:
  '''
  enum OperandType {
    OPERAND_REG_IMM32_INT,
    OPERAND_REG_IMM32_FP,
    OPERAND_REG_INLINE_C_INT,
    OPERAND_REG_INLINE_C_FP,
  }
  '''
  This is not working yet:
  - Several tests are failing.
  - Problems with predicate methods for inline immediates.
  - LLVM-generated assembler parts try to select the e64 encoding before e32.
  More changes are required for several AsmOperands.
  Reviewers: vpykhtin, tstellarAMD
  Subscribers: arsenm, kzhuravl, artem.tamazov
  Differential Revision: https://reviews.llvm.org/D22922
  llvm-svn: 281050
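  As a side note, a minimal sketch of the low-dword check described above for 64-bit fp literals (a hypothetical helper assuming IEEE-754 doubles, not the actual AsmParser code): a 64-bit fp value can only be emitted as a single 32-bit literal holding its high dword if the low 32 bits are already zero, otherwise precision would be lost.

    #include <cstdint>
    #include <cstring>

    // Illustrative check: can this f64 literal be encoded as a 32-bit literal
    // carrying only the high dword, without dropping any set bits?
    bool fitsInHighDwordLiteral(double Val) {
      uint64_t Bits;
      std::memcpy(&Bits, &Val, sizeof(Bits));
      return (Bits & 0xFFFFFFFFull) == 0;  // low 32 bits must be zero
    }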
* [AMDGPU] Refactor FLAT TD instructions | Valery Pykhtin | 2016-09-05 | 1 | -19/+0
  Differential revision: https://reviews.llvm.org/D24072
  llvm-svn: 280655
* AMDGPU: Add V_SAD_U32 instruction pattern. | Wei Ding | 2016-08-24 | 1 | -0/+8
  Differential Revision: http://reviews.llvm.org/D23069
  llvm-svn: 279629
* AMDGPU: Fix i1 fp_to_int | Matt Arsenault | 2016-07-22 | 1 | -1/+2
  R600's i1 fp_to_uint selected but was incorrect according to what instcombine constant folds to.
  llvm-svn: 276435
* AMDGPU/R600: Remove intrinsics with no tests and no users | Matt Arsenault | 2016-07-14 | 1 | -4/+4
  Mesa removed this path, so nothing is using these anymore.
  llvm-svn: 275372
* AMDGPU: Fix folding SGPRs into madak/madmk src0 | Matt Arsenault | 2016-07-05 | 1 | -0/+7
  Because of the special immediate operand, the constant bus is already used so SGPRs are never useful. r263212 changed the name of the immediate operand, which broke the verifier check for the restriction.
  llvm-svn: 274564
* AMDGPU/SI: Remove address space query functions from AMDGPUDAGToDAGISel | Tom Stellard | 2016-07-05 | 1 | -92/+65
  Summary: These have been replaced with TableGen code (except for isConstantLoad, which is still used for R600). The queries were broken for cases where the MemOperand was a PseudoSourceValue.
  Reviewers: arsenm
  Subscribers: arsenm, kzhuravl, llvm-commits
  Differential Revision: http://reviews.llvm.org/D21684
  llvm-svn: 274561
* AMDGPU: Fix trailing whitespace | Matt Arsenault | 2016-06-10 | 1 | -1/+1
  llvm-svn: 272364
* AMDGPU: Fix flat atomics | Matt Arsenault | 2016-06-09 | 1 | -0/+19
  The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses.
  llvm-svn: 272345
* AMDGPU: Implement canonicalize | Matt Arsenault | 2016-04-14 | 1 | -0/+1
  Also add a generic DAG node for it.
  llvm-svn: 266272
* AMDGPU/SI: Implement atomic load/store for i32 and i64 | Jan Vesely | 2016-04-07 | 1 | -0/+5
  Standard load/store instructions with the GLC bit set.
  Reviewers: tstellardAMD, arsenm
  Differential Revision: http://reviews.llvm.org/D18760
  llvm-svn: 265709
* AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2} | Tom Stellard | 2016-04-01 | 1 | -0/+7
  Summary: Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+. 32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.
  Patch by: Vedran Miletić
  Reviewers: arsenm, tstellarAMD, nhaehnle
  Subscribers: jvesely, scchan, kanarayan, arsenm
  Differential Revision: http://reviews.llvm.org/D17280
  llvm-svn: 265170
* AMDGPU: Match more med3 integer patterns | Matt Arsenault | 2016-03-07 | 1 | -0/+30
  llvm-svn: 262864
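  For reference, a rough sketch of the integer med3 ("median of three") operation these patterns select (illustrative C++, not the TableGen patterns themselves); a commonly used min/max formulation is max(min(x, y), min(max(x, y), z)).

    #include <algorithm>
    #include <cstdint>

    // Illustrative median-of-three: returns the middle of the three inputs.
    int32_t med3_i32(int32_t x, int32_t y, int32_t z) {
      int32_t lo = std::min(x, y);
      int32_t hi = std::max(x, y);
      return std::max(lo, std::min(hi, z));  // same as clamping z to [lo, hi]
    }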
* [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target | Tom Stellard | 2016-02-18 | 1 | -0/+8
  Changes:
  - Added disassembler project
  - Fixed all decoding conflicts in .td files
  - Added DecoderMethod=“NONE” option to Target.td that allows disabling decoder generation for an instruction
  - Created decoding functions for VS_32 and VReg_32 register classes
  - Added stubs for decoding all register classes
  - Added several tests for disassembler
  Disassembler only supports:
  - VI subtarget
  - VOP1 instruction encoding
  - 32-bit register operands and inline constants
  [Valery] One of the points that requires attention is how decoder conflicts were resolved:
  - Groups of target instructions were separated by using different DecoderNamespace values (SICI, VI, CI), using an approach similar to AssemblerPredicate.
  - There were conflicts in IMAGE_<> instructions caused by two different reasons:
    1. dmask wasn’t specified for the output (fixed)
    2. There are image instructions that differ only by the number of address components but have the same encoding per the HW spec. The actual number of address components is determined by the HW at runtime using the image resource descriptor, starting from the VGPR encoded in an IMAGE instruction. This means that we should choose only one instruction from a conflicting group to be the rule for the decoder. I didn’t find a way to disable decoder generation for an arbitrary instruction and therefore made a one-line fix to the tablegen generator that suppresses decoder generation when DecoderMethod is set to “NONE”. This is a change that should be reviewed and submitted first; otherwise I would need to specify a different DecoderNamespace for every instruction in the conflicting group. I haven’t checked yet whether DecoderMethod=“NONE” is used in other targets.
    3. IMAGE_GATHER decoder generation is for now disabled and to be done later.
  [/Valery]
  Patch By: Sam Kolton
  Differential Revision: http://reviews.llvm.org/D16723
  llvm-svn: 261185
* AMDGPU: Remove 24-bit intrinsics | Matt Arsenault | 2016-01-29 | 1 | -24/+0
  The known bit matching code seems to work reasonably well, so these shouldn't really be needed.
  llvm-svn: 259180
* AMDGPU: Tidy minor td file issues | Matt Arsenault | 2016-01-26 | 1 | -7/+0
  Make comments and indentation more consistent. Rearrange a few things to be in a more consistent order, such as separating subtarget features that describe an actual device property from those used as options.
  llvm-svn: 258789
* AMDGPU/SI: Consolidate FLAT patterns | Tom Stellard | 2016-01-05 | 1 | -35/+0
  Summary: We had two sets of identical FLAT patterns, one inside the HasFlatAddressSpace predicate and one inside the useFlatForGlobal predicate. This patch merges these sets into a single set of patterns under the isCIVI predicate. The reason we can remove the predicates is that when MUBUF instructions are legal, the instruction selector will prefer selecting those over FLAT instructions because MUBUF patterns have a higher complexity score. So, in this case having patterns for FLAT instructions will have no effect. This change also simplifies the process for forcing global address space loads to use FLAT instructions, since we now only have to disable the MUBUF patterns instead of having to disable the MUBUF patterns and enable the FLAT patterns.
  Reviewers: arsenm, cfang
  Subscribers: llvm-commits
  llvm-svn: 256807
* Start replacing vector_extract/vector_insert with extractelt/insertelt | Matt Arsenault | 2015-12-11 | 1 | -2/+2
  These are redundant pairs of nodes defined for INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT. insertelement/extractelement are slightly closer to the corresponding C++ node name and have stricter type checking, so prefer them. Update targets to only use these nodes where it is trivial to do so. AArch64, ARM, and Mips all have various type errors on simple replacement, so they will need work to fix.
  Example from AArch64:
    def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
              (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
  which is trying to do sext_inreg i8, i8.
  llvm-svn: 255359
* R600 -> AMDGPU rename | Tom Stellard | 2015-06-13 | 1 | -0/+682
  llvm-svn: 239657
* Revert "AMDGPU: Add core backend files for R600/SI codegen v6" | Tom Stellard | 2012-07-16 | 1 | -123/+0
  This reverts commit 4ea70107c5e51230e9e60f0bf58a0f74aa4885ea.
  llvm-svn: 160303
* AMDGPU: Add core backend files for R600/SI codegen v6 | Tom Stellard | 2012-07-16 | 1 | -0/+123
  llvm-svn: 160270