path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
Commit log, most recent first. Each entry gives the commit title, author, date, and diffstat (files changed, -deleted/+added lines).
...
* AMDGPU: Refactor argument lowering (Matt Arsenault, 2017-04-11, 1 file, -5/+18)
  Split into smaller functions and prepare for handling non-entry functions.
  llvm-svn: 299998
* AMDGPU: Stop using CCAssignToRegWithShadow (Matt Arsenault, 2017-04-06, 1 file, -0/+31)
  It does not do what it was being used for here, and required working around in LowerFormalArguments.
  llvm-svn: 299667
* AMDGPU: Remove llvm.SI.vs.load.input (Matt Arsenault, 2017-04-03, 1 file, -1/+0)
  llvm-svn: 299391
* AMDGPU: Remove legacy bfe intrinsics (Matt Arsenault, 2017-04-03, 1 file, -27/+0)
  llvm-svn: 299372
* AMDGPU: Remove unnecessary ands when f16 is legal (Matt Arsenault, 2017-03-31, 1 file, -2/+8)
  Add a new node that acts as a fancy bitcast from f16 operations to i32 and implicitly zeroes the high 16 bits of the result. Alternatively, we could try making v2f16 legal and canonicalizing on build_vectors.
  llvm-svn: 299246
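  Illustration (not from the commit; the function name is hypothetical): the bit-level contract such a node provides, modeled in plain C++. The raw f16 bits land in the low half of an i32 with the high half guaranteed zero, which is what makes later masking ands redundant.

    #include <cstdint>

    // Model of the node's semantics: zero-extend the raw f16 bits into
    // an i32. Bits 16-31 are known zero, so (result & 0xFFFF) == result.
    uint32_t f16BitsToI32(uint16_t halfBits) {
      return static_cast<uint32_t>(halfBits);
    }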
* [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits (Simon Pilgrim, 2017-03-31, 1 file, -1/+2)
  Currently ComputeNumSignBits returns the minimum number of sign bits across all elements of vector data, when we may only be interested in one or some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about.

  The original ComputeNumSignBits implementation calls through with a DemandedElts mask demanding all elements, to match current behaviour. Scalar types set this to 1.

  So far only BUILD_VECTOR and EXTRACT_VECTOR_ELT are supported; all other nodes default to demanding all elements, but can be updated in due course.

  Followup to D25691.
  Differential Revision: https://reviews.llvm.org/D31311
  llvm-svn: 299219
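  A conceptual model of the DemandedElts idea (plain C++, not the LLVM implementation; names are illustrative): only elements whose bit is set in the demanded mask contribute to the minimum sign-bit count.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Count how many high bits of v equal its sign bit (at least 1).
    unsigned signBits(int32_t v) {
      unsigned n = 1;
      while (n < 32 && (((v >> (31 - n)) & 1) == ((v >> 31) & 1)))
        ++n;
      return n;
    }

    // Minimum sign-bit count over the demanded elements only; a scalar
    // caller would pass a mask of 1.
    unsigned computeNumSignBits(const std::vector<int32_t> &elts,
                                uint64_t demandedElts) {
      unsigned minBits = 32;
      for (size_t i = 0; i < elts.size(); ++i)
        if (demandedElts & (1ull << i))
          minBits = std::min(minBits, signBits(elts[i]));
      return minBits;
    }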
* [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode (Simon Pilgrim, 2017-03-31, 1 file, -1/+1)
  Follow-up to D25691: this sets up the plumbing necessary to support vector demanded elements in known-bits calculations for target nodes.
  Differential Revision: https://reviews.llvm.org/D31249
  llvm-svn: 299201
* [AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI. (Simon Pilgrim, 2017-03-29, 1 file, -13/+6)
  Based on a comment in D31249.
  llvm-svn: 298991
* [AMDGPU] Get address space mapping by target triple environment (Yaxun Liu, 2017-03-27, 1 file, -8/+6)
  Since we introduced the target triple environments amdgiz and amdgizcl, the address space values are no longer fixed enums; we have to decide them by target triple.

  The basic idea is to use a struct AMDGPUAS to represent the address space values. Values that do not depend on the target triple are static const members, so they occupy no extra memory and are equivalent to compile-time constants. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage, or added as a member of a pass and created at the beginning of its run* function.

  Differential Revision: https://reviews.llvm.org/D31284
  llvm-svn: 298846
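  A hedged sketch of the struct pattern the commit describes (field names and values are made up for illustration, not the actual AMDGPUAS definition): triple-dependent address spaces are ordinary members filled in from the triple, while fixed ones are static constants with no per-instance storage cost.

    struct AMDGPUASExample {
      unsigned FlatAS;    // value depends on the triple environment
      unsigned PrivateAS; // value depends on the triple environment
      static const unsigned LocalAS = 3; // fixed, compile-time constant
    };

    // Cheap to build on the fly at the point of use, as the commit notes.
    AMDGPUASExample getAS(bool isAmdgizEnv) {
      AMDGPUASExample AS;
      AS.FlatAS = isAmdgizEnv ? 0 : 4;    // illustrative values only
      AS.PrivateAS = isAmdgizEnv ? 5 : 0; // illustrative values only
      return AS;
    }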
* AMDGPU: Implement f16 fround (Matt Arsenault, 2017-03-24, 1 file, -13/+18)
  llvm-svn: 298730
* AMDGPU: Rename SI_RETURN (Matt Arsenault, 2017-03-21, 1 file, -2/+3)
  This is used for a specific type of return to a shader part's epilog code. Rename it to avoid confusion with a true call's return.
  llvm-svn: 298452
* AMDGPU: Cleanup control flow intrinsics (Matt Arsenault, 2017-03-17, 1 file, -0/+3)
  Move backend-internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list.

  It's surprising this was working before: fdiv.fast had the wrong number of parameters, the control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declarations and were closer to the types used for the patterns.

  The brcond lowering was changing the types, so introduce new nodes for those.
  llvm-svn: 298119
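  For reference, a minimal sketch of the API shape being adopted (the intrinsic chosen is only an example, and header layout varies across LLVM versions): Intrinsic::getDeclaration builds the correctly typed declaration from the intrinsic ID, instead of a hand-assembled type list.

    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/Module.h"
    using namespace llvm;

    // Fetch (or create) the declaration with the signature the intrinsic
    // tables define, avoiding mismatches like the fdiv.fast one mentioned.
    Function *getFdivFastDecl(Module &M) {
      return Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_fdiv_fast);
    }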
* AMDGPU: Fix unnecessary ands when packing f16 vectors (Matt Arsenault, 2017-03-15, 1 file, -3/+16)
  computeKnownBits didn't handle fp_to_fp16, so it could not report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0.
  llvm-svn: 297873
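  A conceptual known-bits view of that target node (plain C++ model, not the LLVM KnownBits API): once the high 16 bits are known zero, an and with 0xFFFF is provably a no-op and can be removed.

    #include <cstdint>

    struct KnownBitsModel {
      uint32_t Zero; // bits known to be 0
      uint32_t One;  // bits known to be 1
    };

    // For the f16-in-i32 node: low 16 bits unknown (the f16 payload),
    // high 16 bits known zero by the node's definition.
    KnownBitsModel knownBitsForF16Node() {
      return {0xFFFF0000u, 0x00000000u};
    }

    // An and is redundant when it only clears bits already known zero.
    bool andIsRedundant(KnownBitsModel K, uint32_t mask) {
      return (~mask & ~K.Zero) == 0;
    }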
* AMDGPU: Constant fold rcp node (Matt Arsenault, 2017-03-08, 1 file, -2/+12)
  When doing the arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs.
  llvm-svn: 297248
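  The fold itself, modeled on plain doubles (a sketch; the real code folds ConstantFPSDNode operands in the DAG):

    #include <optional>

    // rcp(c) with a compile-time-known operand becomes the constant 1/c,
    // so no reciprocal instruction is left behind after arcp rewrites.
    std::optional<double> foldRcp(std::optional<double> constOperand) {
      if (!constOperand)
        return std::nullopt; // non-constant input: keep the rcp node
      return 1.0 / *constOperand;
    }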
* AMDGPU: Support v2i16/v2f16 packed operations (Matt Arsenault, 2017-02-27, 1 file, -3/+8)
  llvm-svn: 296396
* AMDGPU: Replace FMAD with FMA when denormals are enabled. (Wei Ding, 2017-02-24, 1 file, -1/+5)
  Differential Revision: http://reviews.llvm.org/D29958
  llvm-svn: 296186
* AMDGPU: Add cvt.pkrtz intrinsic (Matt Arsenault, 2017-02-22, 1 file, -0/+1)
  Convert llvm.SI.packf16 test uses.
  llvm-svn: 295797
* AMDGPU: Remove llvm.AMDGPU.clamp intrinsic (Matt Arsenault, 2017-02-21, 1 file, -11/+0)
  llvm-svn: 295789
* AMDGPU: Redefine clamp node as clamp 0.0-1.0 (Matt Arsenault, 2017-02-21, 1 file, -14/+45)
  Change the implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use them whether or not denormals are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp.
  llvm-svn: 295788
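  A sketch of the clamp the node now denotes (plain C++ model; the med3 connection comes from the commit's own note that min/max/med3 do not flush denormals):

    #include <algorithm>

    // clamp01(x) equals med3(x, 0.0, 1.0) for non-NaN x: the median of
    // the three values is exactly min(max(x, 0), 1). The hardware
    // dx10_clamp mode additionally sends NaN to 0.0, which this plain
    // C++ model does not.
    float clamp01(float x) {
      return std::min(std::max(x, 0.0f), 1.0f);
    }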
* AMDGPU: Add pass to expand memcpy/memmove/memset (Matt Arsenault, 2017-02-09, 1 file, -4/+5)
  llvm-svn: 294635
* AMDGPU: Fold fneg into fmin/fmax_legacy (Matt Arsenault, 2017-02-03, 1 file, -2/+24)
  llvm-svn: 293972
* AMDGPU: Fold fneg into fminnum/fmaxnum (Matt Arsenault, 2017-02-03, 1 file, -0/+30)
  llvm-svn: 293968
* AMDGPU: Check if users of fneg can fold mods (Matt Arsenault, 2017-02-02, 1 file, -4/+64)
  In multi-use cases this can save a few instructions.
  llvm-svn: 293962
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." (Nirav Dave, 2017-02-02, 1 file, -0/+10)
  This reverts commit r293893, which is miscompiling lua on ARM and breaking bootstrap on x86-windows.
  llvm-svn: 293915
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-02-02, 1 file, -10/+0)
  Recommitting after fixing an X86 inc/dec chain bug.

  Simplify the consecutive merge-store candidate search. Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis that only checks for parallel stores through the chain subgraph. This cleanly separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and of the generated code (except perhaps for some ARM cases where we correctly construct wider loads but then promote them to float operations, which requires more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below).

  Additional minor changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg, CopyFromReg} values.
  7. Adds a peephole to convert a build_vector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 293893
* AMDGPU: Use source modifiers with f16->f32 conversions (Matt Arsenault, 2017-02-02, 1 file, -0/+42)
  The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them.
  llvm-svn: 293857
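  A sketch of the bit-operation recognition mentioned for targets without legal f16 (plain C++ over the raw f16 bits; the masks are the standard IEEE half sign-bit patterns):

    #include <cstdint>

    // On the 16 raw bits of an IEEE half, flipping the sign bit is fneg
    // and clearing it is fabs, so integer xor/and with these masks can
    // be re-expressed as FP source modifiers on the conversion.
    uint16_t f16NegBits(uint16_t bits) { return bits ^ 0x8000; }
    uint16_t f16AbsBits(uint16_t bits) { return bits & 0x7FFF; }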
* AMDGPU: Cleanup fmin/fmax legacy function (Matt Arsenault, 2017-02-01, 1 file, -11/+8)
  Use a more specific subtarget check and combine the hasOneUse checks.
  llvm-svn: 293726
* Re-commit AMDGPU/GlobalISel: Add support for simple shaders (Tom Stellard, 2017-01-30, 1 file, -0/+6)
  Fix the build when GlobalISel is disabled, and fix a warning.

  Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
  Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
  Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
  Differential Revision: https://reviews.llvm.org/D26730
  llvm-svn: 293551
* Revert "AMDGPU/GlobalISel: Add support for simple shaders" (Tom Stellard, 2017-01-30, 1 file, -6/+0)
  This reverts commit r293503. Revert while I investigate some of the buildbot failures.
  llvm-svn: 293509
* AMDGPU/GlobalISel: Add support for simple shaders (Tom Stellard, 2017-01-30, 1 file, -0/+6)
  Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
  Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
  Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
  Differential Revision: https://reviews.llvm.org/D26730
  llvm-svn: 293503
* Cleanup dump() functions. (Matthias Braun, 2017-01-28, 1 file, -1/+1)
  We had various variants of defining dump() functions in LLVM. Normalize them; this consistently implements the conventions discussed in http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

  For reference:
  - Public headers should just declare the dump() method, but not use LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP).
  - The definition of a dump method should follow the pattern set off as code below.
  llvm-svn: 293359
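  The definition pattern the commit message spells out, set off as code here for readability:

    #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
    LLVM_DUMP_METHOD
    void MyClass::dump() {
      // print stuff to dbgs()...
    }
    #endif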
* Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." (Nirav Dave, 2017-01-26, 1 file, -0/+10)
  This reverts commit r293184, which is failing in LTO builds.
  llvm-svn: 293188
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. (Nirav Dave, 2017-01-26, 1 file, -10/+0)
  Simplify the consecutive merge-store candidate search. Now that address aliasing is much less conservative, push through a simplified store-merging search and chain alias analysis that only checks for parallel stores through the chain subgraph. This cleanly separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and of the generated code (except perhaps for some ARM cases where we correctly construct wider loads but then promote them to float operations, which requires more expensive constant generation). Some minor peephole optimizations deal with the improved SubDAG shapes (listed below).

  Additional minor changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
  5. Removes chain dependencies of memory operations on CopyFromReg nodes, as these are captured by data dependence.
  6. Forwards load-store values through TokenFactors containing {CopyToReg, CopyFromReg} values.
  7. Adds a peephole to convert a build_vector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll).
  8. Restricts store merging for the ARM target to 32-bit, as in some contexts invalid 64-bit operations are generated. This can be removed once appropriate checks are added.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes, as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll. Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values, we must create another local store. A similar transformation happens before SelectionDAG as well.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  llvm-svn: 293184
* AMDGPU: Fold fneg into round instructions (Matt Arsenault, 2017-01-26, 1 file, -1/+7)
  llvm-svn: 293127
* AMDGPU: Propagate fast math flags in fneg combines (Matt Arsenault, 2017-01-23, 1 file, -3/+3)
  This can't be done for fma/mad, since it seems they can't carry flags currently.
  llvm-svn: 292818
* AMDGPU/R600: Serialize vector trunc stores to private AS (Jan Vesely, 2017-01-20, 1 file, -0/+1)
  Add a DUMMY_CHAIN SDNode to denote stores of interest.
  Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
  Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411
  Differential Revision: https://reviews.llvm.org/D27964
  llvm-svn: 292651
* AMDGPU: Disable some fneg combines unless nsz (Matt Arsenault, 2017-01-19, 1 file, -0/+6)
  For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem, it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either.
  llvm-svn: 292473
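  A small self-contained demonstration of the signed-zero hazard (compile without fast-math to observe it):

    #include <cmath>
    #include <cstdio>

    int main() {
      float x = 1.0f, y = -1.0f;        // x == -y
      float a = -(x + y);               // -(+0.0) == -0.0
      float b = (-x) + (-y);            // -1.0 + 1.0 == +0.0
      // Both print 0, but the sign bits differ: 1 vs 0. The rewrite is
      // therefore only safe under no-signed-zeros (nsz) fast-math.
      std::printf("%d %d\n", std::signbit(a), std::signbit(b));
      return 0;
    }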
* AMDGPU: Skip fneg/select combine if it can fold into other (Matt Arsenault, 2017-01-12, 1 file, -29/+40)
  llvm-svn: 291792
* AMDGPU: Fold free fneg into sin (Matt Arsenault, 2017-01-12, 1 file, -1/+5)
  llvm-svn: 291790
* AMDGPU: Fold fneg into fmul_legacy (Matt Arsenault, 2017-01-12, 1 file, -2/+5)
  llvm-svn: 291784
* AMDGPU: Fold fneg into rcp (Matt Arsenault, 2017-01-12, 1 file, -1/+7)
  llvm-svn: 291779
* AMDGPU: Fold fneg into fp_round (Matt Arsenault, 2017-01-12, 1 file, -2/+18)
  llvm-svn: 291778
* AMDGPU: Fold fneg into fp_extend (Matt Arsenault, 2017-01-12, 1 file, -0/+14)
  llvm-svn: 291777
* AMDGPU: Fold fneg into fma or fmad (Matt Arsenault, 2017-01-12, 1 file, -0/+24)
  Patch mostly by Fiona Glaser.
  llvm-svn: 291733
* AMDGPU: Fold fneg into fmul (Matt Arsenault, 2017-01-12, 1 file, -0/+17)
  Patch mostly by Fiona Glaser.
  llvm-svn: 291732
* AMDGPU: Fold fneg into fadd (Matt Arsenault, 2017-01-12, 1 file, -0/+60)
  Patch mostly by Fiona Glaser.
  llvm-svn: 291731
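  A hedged sketch, in the shape of a DAG combine (simplified and freestanding; not the actual AMDGPUISelLowering code), of the fadd rewrite: pushing the fneg onto the operands, where each can become a free source modifier.

    #include "llvm/CodeGen/SelectionDAG.h"
    using namespace llvm;

    // fneg(fadd(a, b)) -> fadd(fneg(a), fneg(b)); only valid under
    // no-signed-zeros, as the r292473 entry above explains.
    SDValue distributeFnegOverFAdd(SelectionDAG &DAG, const SDLoc &DL,
                                   EVT VT, SDValue FAdd) {
      SDValue NegLHS = DAG.getNode(ISD::FNEG, DL, VT, FAdd.getOperand(0));
      SDValue NegRHS = DAG.getNode(ISD::FNEG, DL, VT, FAdd.getOperand(1));
      return DAG.getNode(ISD::FADD, DL, VT, NegLHS, NegRHS);
    }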
* AMDGPU: Pull fneg/fabs out of a select (Matt Arsenault, 2017-01-11, 1 file, -0/+74)
  Allows better source modifier usage.
  llvm-svn: 291729
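  A plain C++ model of the select rewrite (not the DAG code; the fabs case is analogous):

    // Hoisting a negation common to both select arms:
    //   select(c, fneg(a), fneg(b)) -> fneg(select(c, a, b))
    // One fneg remains, and it can fold into its user as a source modifier.
    float pullFnegFromSelect(bool c, float a, float b) {
      return -(c ? a : b); // same value as c ? -a : -b
    }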
* AMDGPU: Add tests for HasMultipleConditionRegisters (Matt Arsenault, 2017-01-10, 1 file, -0/+7)
  This was enabled without many specific tests or an explanatory comment.
  llvm-svn: 291586
* AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes (Jan Vesely, 2017-01-06, 1 file, -10/+0)
  This will make the transition to SCRATCH_MEMORY easier.
  Differential Revision: https://reviews.llvm.org/D24746
  llvm-svn: 291279
* AMDGPU/SI: Implement sendmsghalt intrinsic (Jan Vesely, 2017-01-04, 1 file, -0/+1)
  v2: expose using the amdgcn prefix.
  Differential Revision: https://reviews.llvm.org/D23511
  llvm-svn: 290977