summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIInstructions.td
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Turn int pack pattern into build_vectorMatt Arsenault2017-08-311-0/+7
| | | | | | | | | | build_vector is a more useful canonical form when pattern matching packed operations, so turn shift into high element into a build_vector. Should show no change for now. llvm-svn: 312282
* [AMDGPU] Use v_max_f* for fcanonicalizeStanislav Mekhanoshin2017-08-301-0/+27
| | | | | | | | | | If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3. Differential Revision: https://reviews.llvm.org/D36856 llvm-svn: 312095
* AMDGPU: Select clamp pattern with v2f16Matt Arsenault2017-08-301-0/+6
| | | | llvm-svn: 312087
* [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodesDmitry Preobrazhensky2017-08-161-4/+4
| | | | | | | | | | See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152 Reviewers: SamWot, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D36674 llvm-svn: 311006
* AMDGPU: Start adding tail call supportMatt Arsenault2017-08-111-0/+25
| | | | | | Handle the sibling call cases. llvm-svn: 310753
* [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsicConnor Abbott2017-08-041-0/+14
| | | | | | | | | | | | | | | | | | | Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088
* [AMDGPU] Add support for Whole Wavefront ModeConnor Abbott2017-08-041-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087
* [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQMConnor Abbott2017-08-041-0/+5
| | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085
* AMDGPU: Analyze callee resource usage in AsmPrinterMatt Arsenault2017-08-021-4/+16
| | | | llvm-svn: 309781
* AMDGPU: Initial implementation of callsMatt Arsenault2017-08-011-0/+39
| | | | | | | | | Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732
* AMDGPU: Introduce maybeAtomic instruction flagKonstantin Zhuravlyov2017-07-211-0/+1
| | | | | | Testing is in the follow up change llvm-svn: 308779
* [AMDGPU][MC][GFX9] Added support of VOP3 'op_sel' modifierDmitry Preobrazhensky2017-07-211-3/+23
| | | | | | | | | | See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591 Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm Differential Revision: https://reviews.llvm.org/D35424 llvm-svn: 308740
* AMDGPU: Add encoding for carryless add/sub instructionsMatt Arsenault2017-07-201-0/+35
| | | | llvm-svn: 308639
* [AMDGPU] resubmit r308179: CodeGen: check dst operand type to determine if ↵Sam Kolton2017-07-181-1/+1
| | | | | | omod is supported for VOP3 instructions llvm-svn: 308310
* Revert r308179 which causes tablegen to spam stderr on every build.Chandler Carruth2017-07-181-1/+1
| | | | | | | Original commit log: [AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions llvm-svn: 308270
* [AMDGPU] CodeGen: check dst operand type to determine if omod is supported ↵Sam Kolton2017-07-171-1/+1
| | | | | | | | | | | | | | | | for VOP3 instructions Summary: Previously, CodeGen checked first src operand type to determine if omod is supported by instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but desn't support omod. Changed .td files to check if dst operand instead of src operand. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D35350 llvm-svn: 308179
* [AMDGPU] Add pattern for v_alignbit_b32 with immediateStanislav Mekhanoshin2017-06-281-3/+2
| | | | | | | | If immediate in shift is less than 32 we can use alignbit too. Differential Revision: https://reviews.llvm.org/D34729 llvm-svn: 306500
* [AMDGPU] Add 2 new alignbit patternsStanislav Mekhanoshin2017-06-271-0/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D34655 llvm-svn: 306449
* Re-submit AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-0/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111
* Revert 303091.Jan Sjodin2017-05-151-7/+0
| | | | llvm-svn: 303098
* Add AMDGPUMachineCFGStructurizer.Jan Sjodin2017-05-151-0/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091
* AMDGPU: Add new amdgcn.init.exec intrinsicsMarek Olsak2017-04-281-0/+23
| | | | | | | | | | v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677
* AMDGPU: Clean up VOP3NoMods patternMatt Arsenault2017-04-251-10/+9
| | | | | | | There is no need to copy the operands or inspect the sources. Also remove some unnecessary clamp/omod usage. llvm-svn: 301363
* CodeGen: Add a hook for getFenceOperandTyYaxun Liu2017-04-241-0/+6
| | | | | | | | | | | | | | Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0. This is fine for most targets. However for amdgcn target, the size of pointer in address space 0 depends on triple environment. For amdgiz environment, it is 64 bit but for other environment it is 32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is need in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215
* AMDGPU: Move trap lowering to DAGMatt Arsenault2017-04-241-13/+2
| | | | | | | | | | | Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206
* AMDGPU: Diagnose illegal SGPR to VGPR copiesMatt Arsenault2017-04-061-0/+4
| | | | | | | | | | This is possible in ways that are not compiler bugs, so stop asserting on them. This emits an extra error when emitting objects when it can't encode the new pseudo, but I'm not sure that matters. llvm-svn: 299712
* AMDGPU: Remove unnecessary ands when f16 is legalMatt Arsenault2017-03-311-0/+5
| | | | | | | | | | Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246
* AMDGPU: Unify divergent function exits.Matt Arsenault2017-03-241-2/+13
| | | | | | | | | | StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes that skips exit nodes with uniform branch sources. llvm-svn: 298729
* AMDGPU: Remove hasSideEffects from SI_RETURN_TO_EPILOGMatt Arsenault2017-03-211-1/+0
| | | | llvm-svn: 298454
* AMDGPU: Rename SI_RETURNMatt Arsenault2017-03-211-2/+3
| | | | | | | | This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
* AMDGPU: Cleanup control flow intrinsicsMatt Arsenault2017-03-171-14/+9
| | | | | | | | | | | | | | | | Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119
* AMDGPU: Fix unnecessary ands when packing f16 vectorsMatt Arsenault2017-03-151-1/+1
| | | | | | | | | computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
* AMDGPU: Use v_med3_{f16|i16|u16}Matt Arsenault2017-02-271-8/+13
| | | | llvm-svn: 296401
* AMDGPU: Support v2i16/v2f16 packed operationsMatt Arsenault2017-02-271-0/+63
| | | | llvm-svn: 296396
* AMDGPU: Add VOP3P instruction formatMatt Arsenault2017-02-271-0/+6
| | | | | | | | Add a few non-VOP3P but instructions related to packed. Includes hack with dummy operands for the benefit of the assembler llvm-svn: 296368
* AMDGPU : Replace FMAD with FMA when denormals are enabled.Wei Ding2017-02-241-0/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
* AMDGPU/SI: Fix trunc i16 patternJan Vesely2017-02-231-0/+5
| | | | | | | | Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990
* AMDGPU: Add another BFE patternMatt Arsenault2017-02-231-2/+1
| | | | | | | This is the pattern that falls out of the instruction's definition if offset == 0. llvm-svn: 295912
* AMDGPU: Use clamp with f64Matt Arsenault2017-02-221-1/+1
| | | | llvm-svn: 295908
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-221-2/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295904
* Revert "AMDGPU : Update TrapCode based on Trap Handler ABI."Wei Ding2017-02-221-2/+2
| | | | | | This reverts commit r295867. llvm-svn: 295871
* AMDGPU : Update TrapCode based on Trap Handler ABI.Wei Ding2017-02-221-2/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-211-4/+12
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Remove llvm.AMDGPU.cube intrinsicMatt Arsenault2017-02-161-21/+0
| | | | llvm-svn: 295359
* AMDGPU : Add trap handler support.Wei Ding2017-02-101-2/+10
| | | | | | Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692
* AMDGPU: Use source modifiers with f16->f32 conversionsMatt Arsenault2017-02-021-4/+22
| | | | | | | | | | | The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
* AMDGPU: Use source mods with fcanonicalizeMatt Arsenault2017-01-311-6/+6
| | | | llvm-svn: 293654
* AMDGPU: Generalize matching of v_med3_f32Matt Arsenault2017-01-311-0/+14
| | | | | | | | | | I think this is safe as long as no inputs are known to ever be nans. Also add an intrinsic for fmed3 to be able to handle all safe math cases. llvm-svn: 293598
* AMDGPU: Undo sub x, c -> add x, -c canonicalizationMatt Arsenault2017-01-301-0/+9
| | | | | | | | | This is worse if the original constant is an inline immediate. This should also be done for 64-bit adds, but requires fixing operand folding bugs first. llvm-svn: 293540
* AMDGPU : Add trap handler support.Wei Ding2017-01-241-0/+7
| | | | llvm-svn: 292893
OpenPOWER on IntegriCloud