summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
Commit message (Collapse)AuthorAgeFilesLines
* [AMDGPU] Add intrinsics for tbuffer load and storeDavid Stuttard2017-06-221-0/+2
| | | | | | | | | | | | | | | Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
* AMDGPU: Cleanup CreateLiveInRegisterMatt Arsenault2017-06-191-4/+19
| | | | llvm-svn: 305748
* [AMDGPU] Convert shl (add) into add (shl)Stanislav Mekhanoshin2017-05-231-0/+3
| | | | | | | | | | | shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1) This allows to fold a constant into an address in some cases as well as to eliminate second shift if the expression is used as an address and second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-3/+2
| | | | | | | | Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
* AMDGPU: Pull fneg out of extract_vector_eltMatt Arsenault2017-05-111-0/+2
| | | | | | | This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813
* Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.Amara Emerson2017-05-011-2/+3
| | | | | | | | This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803
* AMDGPU: Add new amdgcn.init.exec intrinsicsMarek Olsak2017-04-281-0/+2
| | | | | | | | | | v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677
* [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and ↵Craig Topper2017-04-281-2/+1
| | | | | | | | | | | | simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620
* CodeGen: Add a hook for getFenceOperandTyYaxun Liu2017-04-241-0/+4
| | | | | | | | | | | | | | Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0. This is fine for most targets. However for amdgcn target, the size of pointer in address space 0 depends on triple environment. For amdgiz environment, it is 64 bit but for other environment it is 32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is need in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215
* AMDGPU: Move trap lowering to DAGMatt Arsenault2017-04-241-0/+1
| | | | | | | | | | | Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206
* AMDGPU: Refactor argument loweringMatt Arsenault2017-04-111-2/+2
| | | | | | | Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
* AMDGPU: Remove llvm.SI.vs.load.inputMatt Arsenault2017-04-031-1/+0
| | | | llvm-svn: 299391
* AMDGPU: Remove legacy bfe intrinsicsMatt Arsenault2017-04-031-1/+0
| | | | llvm-svn: 299372
* AMDGPU: Remove unnecessary ands when f16 is legalMatt Arsenault2017-03-311-0/+3
| | | | | | | | | | Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246
* [DAGCombiner] Add vector demanded elements support to ComputeNumSignBitsSimon Pilgrim2017-03-311-1/+2
| | | | | | | | | | | | | | Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219
* [DAGCombiner] Add vector demanded elements support to ↵Simon Pilgrim2017-03-311-0/+1
| | | | | | | | | | computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-0/+6
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* AMDGPU: Implement f16 froundMatt Arsenault2017-03-241-1/+1
| | | | llvm-svn: 298730
* AMDGPU: Rename SI_RETURNMatt Arsenault2017-03-211-2/+11
| | | | | | | | This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
* AMDGPU: Cleanup control flow intrinsicsMatt Arsenault2017-03-171-0/+6
| | | | | | | | | | | | | | | | Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119
* AMDGPU: Fix unnecessary ands when packing f16 vectorsMatt Arsenault2017-03-151-0/+4
| | | | | | | | | computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
* AMDGPU : Replace FMAD with FMA when denormals are enabled.Wei Ding2017-02-241-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
* AMDGPU: Add cvt.pkrtz intrinsicMatt Arsenault2017-02-221-0/+5
| | | | | | Convert llvm.SI.packf16 test uses llvm-svn: 295797
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-211-0/+5
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Use source modifiers with f16->f32 conversionsMatt Arsenault2017-02-021-0/+1
| | | | | | | | | | | The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
* AMDGPU: Cleanup fmin/fmax legacy functionMatt Arsenault2017-02-011-1/+1
| | | | | | Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726
* AMDGPU: Check nsz instead of unsafe mathMatt Arsenault2017-01-251-1/+1
| | | | llvm-svn: 293028
* AMDGPU/R600: Serialize vector trunc stores to private ASJan Vesely2017-01-201-0/+1
| | | | | | | | | | | Add DUMMY_CHAIN SDNode to denote stores of interest Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915 Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411 Differential Revision: https://reviews.llvm.org/D27964 llvm-svn: 292651
* AMDGPU: Disable some fneg combines unless nszMatt Arsenault2017-01-191-0/+10
| | | | | | | | | | | | For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either. llvm-svn: 292473
* AMDGPU: Fold fneg into faddMatt Arsenault2017-01-121-0/+1
| | | | | | Patch mostly by Fiona Glaser llvm-svn: 291731
* AMDGPU/SI: Implement sendmsghalt intrinsicJan Vesely2017-01-041-0/+1
| | | | | | | | v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977
* AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.*Tom Stellard2016-12-201-0/+2
| | | | | | | | | | Reviewers: arsenm, nhaehnle, mareko Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27834 llvm-svn: 290184
* AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.Tom Stellard2016-12-071-0/+4
| | | | | | | | | | | | | | Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879
* AMDGPU: Refactor exp instructionsMatt Arsenault2016-12-051-1/+3
| | | | | | | | | | | | | | | Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in she shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695
* [DAG Combiner] Fix the native computation of the Newton series for reciprocalsEvandro Menezes2016-11-101-3/+3
| | | | | | | | | | | | The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own. Differential revision: https://reviews.llvm.org/D22975 llvm-svn: 286523
* [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32Konstantin Zhuravlyov2016-11-011-0/+7
| | | | | | | | | | | This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716
* AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64Tom Stellard2016-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | I wanted to implement this as a target independent expansion, however when targets say they want to expand FP_TO_FP16 what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target independent would be to add logic to target's TargetLowering construction to mark theses nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704
* [Target] remove TargetRecip class; 2nd trySanjay Patel2016-10-201-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746
* revert r284495: [Target] remove TargetRecip classSanjay Patel2016-10-181-4/+6
| | | | | | There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513
* [Target] remove TargetRecip class; move reciprocal estimate isel ↵Sanjay Patel2016-10-181-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495
* AMDGPU: Refactor kernel argument loweringTom Stellard2016-09-161-10/+2
| | | | | | | | | | | | | | | | | | | Summary: The main challenge in lowering kernel arguments for AMDGPU is determing the memory type of the argument. The generic calling convention code assumes that only legal register types can be stored in memory, but this is not the case for AMDGPU. This consolidates all the logic AMDGPU uses for deducing memory types into a single function. This will make it much easier to support different ABIs in the future. Reviewers: arsenm Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24614 llvm-svn: 281781
* AMDGPU: Improve splitting 64-bit bit ops by constantsMatt Arsenault2016-09-141-1/+4
| | | | | | | | This addresses a TODO to handle operations besides and. This also starts eliminating no-op operations with a constant that can emerge later. llvm-svn: 281488
* AMDGPU/R600: Remove MergeVectorStores from legalizationJan Vesely2016-08-291-3/+0
| | | | | | | | This is handled by DAGCombiner in a more generic way Differential Revision: https://reviews.llvm.org/D23970 llvm-svn: 280019
* AMDGPU: Select mulhi 24-bit instructionsMatt Arsenault2016-08-271-2/+9
| | | | llvm-svn: 279902
* [X86] Heuristic to selectively build Newton-Raphson SQRT estimationNikolai Bozhenov2016-08-041-0/+3
| | | | | | | | | | | | | | | | | | | | | On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725
* AMDGPU : Add intrinsics for compare with the full wavefront resultWei Ding2016-07-281-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D22482 llvm-svn: 276998
* AMDGPU: Add fp legacy instruction intrinsicsMatt Arsenault2016-07-261-0/+2
| | | | | | | This could use some additional optimization work to use mad/mac legacy. llvm-svn: 276764
* AMDGPU: Delete dead codeMatt Arsenault2016-07-251-1/+0
| | | | llvm-svn: 276675
* AMDGPU: Delete dead codeMatt Arsenault2016-07-231-4/+0
| | | | | | This has been dead since r269479 llvm-svn: 276518
* AMDGPU: Only use legal inline immediates with kill pseudoMatt Arsenault2016-07-191-0/+1
| | | | | | | | | | | Only if the value is negative or positive is what matters, so use a constant that doesn't require an instruction to materialize. These should really just emit the write exec directly, but for stick with the kill pseudo-terminator. llvm-svn: 275988
OpenPOWER on IntegriCloud