summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
Commit message (Collapse)AuthorAgeFilesLines
...
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-7/+7
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* AMDGPU: Fix build warning about overrideMatt Arsenault2018-03-051-3/+3
| | | | llvm-svn: 326713
* Pass Divergence Analysis data to Selection DAG to drive divergenceAlexander Timofeev2018-03-051-0/+3
| | | | | | | | dependent instruction selection. Differential revision: https://reviews.llvm.org/D35267 llvm-svn: 326703
* AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}Marek Olsak2018-01-311-0/+4
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D41663 llvm-svn: 323908
* AMDGPU/SI: Add d16 support for image intrinsics.Changpeng Fang2018-01-181-0/+85
| | | | | | | | | | | | | Summary: This patch implements d16 support for image load, image store and image sample intrinsics. Reviewers: Matt, Brian. Differential Revision: https://reviews.llvm.org/D3991 llvm-svn: 322903
* [AMDGPU] add LDS f32 intrinsicsDaniil Fukalov2018-01-171-0/+3
| | | | | | | | | | | | added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics to allow generate ds_{add|min|max}[_rtn]_f32 instructions needed for OpenCL float atomics in LDS Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D37985 llvm-svn: 322656
* AMDGPU/SI: Add d16 support for buffer intrinsics.Changpeng Fang2018-01-121-0/+4
| | | | | | | | | | Differential Revision: https://reviews.llvm.org/D38906 Reviewers: Matt and Brian. llvm-svn: 322402
* [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for ↵Mark Searles2017-12-191-0/+10
| | | | | | | | AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed. Differential Revision: https://reviews.llvm.org/D41377 llvm-svn: 321100
* [AMDGPU] Add custom lowering for llvm.log{,10}.{f16,f32} intrinsicsVedran Miletic2017-11-271-0/+2
| | | | | | | | | | | | | | AMDGPU backend errors with "unsupported call to function" upon encountering a call to llvm.log{,10}.{f16,f32} intrinsics. This patch adds custom lowering to avoid that error on both R600 and SI. Reviewers: arsenm, jvesely Subscribers: tstellar Differential Revision: https://reviews.llvm.org/D29942 llvm-svn: 319025
* Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie2017-11-171-1/+1
| | | | | | | | All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
* AMDGPU: Lower buffer store and atomic intrinsics manuallyMarek Olsak2017-11-091-0/+13
| | | | | | | | | | | | | | Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 llvm-svn: 317754
* AMDGPU: Remove redundant combineMatt Arsenault2017-11-071-1/+0
| | | | | | | | | | | | | | | | | | | | This combine was already done in two places. The generic combiner already has done this since r217610, for adds (with a single use). This one was added in r303641, and added support for handling or as well. r313251 later added support to the generic combine for or. It also turns out the isOrEquivalentToAdd check is not necessary for this combine. Additionally, we already reproduce this combine in yet another place in the backend, although in that version multiple uses of the add are still folded if it will allow a fold into the addressing mode. That version needs to be improved to understand ors though, as well as the correct legal offsets for private. llvm-svn: 317526
* AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32Matt Arsenault2017-11-061-0/+4
| | | | llvm-svn: 317492
* AMDGPU: Implement isFPExtFoldableMatt Arsenault2017-10-131-0/+1
| | | | | | This helps match v_mad_mix* in some cases. llvm-svn: 315744
* Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.Wei Ding2017-10-121-3/+4
| | | | | | Differential Revision: http://reviews.llvm.org/D37348 llvm-svn: 315610
* AMDGPU: Start adding tail call supportMatt Arsenault2017-08-111-0/+6
| | | | | | Handle the sibling call cases. llvm-svn: 310753
* AMDGPU: Don't use report_fatal_error for unsupported call typesMatt Arsenault2017-08-031-0/+4
| | | | llvm-svn: 310004
* AMDGPU: Pass special input registers to functionsMatt Arsenault2017-08-031-1/+20
| | | | llvm-svn: 309998
* AMDGPU: Return correct type during argument loweringMatt Arsenault2017-07-151-0/+1
| | | | | | | | | | | | | | | | | | | The type needs to be casted back to the original argument type. Fixes an assert that for some reason is only run when using -debug. Includes an additional combine to avoid test regressions from having conversions mixed with multiple Assert[SZ]ext nodes. On subtargets where i16 is legal, this was producing an i32 register with an i16 AssertZExt, truncated to i16 with another i8 AssertZExt. t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0 t3: i16 = truncate t2 t5: i16 = AssertZext t3, ValueType:ch:i8 t6: i8 = truncate t5 t7: i32 = zero_extend t6 llvm-svn: 308082
* [AMDGPU] Add intrinsics for tbuffer load and storeDavid Stuttard2017-06-221-0/+2
| | | | | | | | | | | | | | | Intrinsic already existed for llvm.SI.tbuffer.store Needed tbuffer.load and also re-implementing the intrinsic as llvm.amdgcn.tbuffer.* Added CodeGen tests for the 2 new variants added. Left the original llvm.SI.tbuffer.store implementation to avoid issues with existing code Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr Differential Revision: https://reviews.llvm.org/D30687 llvm-svn: 306031
* AMDGPU: Cleanup CreateLiveInRegisterMatt Arsenault2017-06-191-4/+19
| | | | llvm-svn: 305748
* [AMDGPU] Convert shl (add) into add (shl)Stanislav Mekhanoshin2017-05-231-0/+3
| | | | | | | | | | | shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1) This allows to fold a constant into an address in some cases as well as to eliminate second shift if the expression is used as an address and second shift is a result of a GEP. Differential Revision: https://reviews.llvm.org/D33432 llvm-svn: 303641
* AMDGPU: Start defining a calling conventionMatt Arsenault2017-05-171-3/+2
| | | | | | | | Partially implement callee-side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values most as well. llvm-svn: 303308
* AMDGPU: Pull fneg out of extract_vector_eltMatt Arsenault2017-05-111-0/+2
| | | | | | | This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers. llvm-svn: 302813
* Generalize the specialized flag-carrying SDNodes by moving flags into SDNode.Amara Emerson2017-05-011-2/+3
| | | | | | | | This removes BinaryWithFlagsSDNode, and flags are now all passed by value. Differential Revision: https://reviews.llvm.org/D32527 llvm-svn: 301803
* AMDGPU: Add new amdgcn.init.exec intrinsicsMarek Olsak2017-04-281-0/+2
| | | | | | | | | | v2: More tests, bug fixes, cosmetic changes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D31762 llvm-svn: 301677
* [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and ↵Craig Topper2017-04-281-2/+1
| | | | | | | | | | | | simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620
* CodeGen: Add a hook for getFenceOperandTyYaxun Liu2017-04-241-0/+4
| | | | | | | | | | | | | | Currently the operand type for ATOMIC_FENCE assumes value type of a pointer in address space 0. This is fine for most targets. However for amdgcn target, the size of pointer in address space 0 depends on triple environment. For amdgiz environment, it is 64 bit but for other environment it is 32 bit. On the other hand, amdgcn target expects 32 bit fence operands independent of the target triple environment. Therefore a hook is need in target lowering for getting the fence operand type. This patch has no effect on targets other than amdgcn. Differential Revision: https://reviews.llvm.org/D32186 llvm-svn: 301215
* AMDGPU: Move trap lowering to DAGMatt Arsenault2017-04-241-0/+1
| | | | | | | | | | | Fixes traps in any block besides the entry block, and fixes depending on a live-in physical register by using a virtual register copy. Also happens to stop emitting a nop in the case debug trap is not supported. llvm-svn: 301206
* AMDGPU: Refactor argument loweringMatt Arsenault2017-04-111-2/+2
| | | | | | | Split into smaller functions and prepare for handling non-entry functions. llvm-svn: 299998
* AMDGPU: Remove llvm.SI.vs.load.inputMatt Arsenault2017-04-031-1/+0
| | | | llvm-svn: 299391
* AMDGPU: Remove legacy bfe intrinsicsMatt Arsenault2017-04-031-1/+0
| | | | llvm-svn: 299372
* AMDGPU: Remove unnecessary ands when f16 is legalMatt Arsenault2017-03-311-0/+3
| | | | | | | | | | Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zero the high 16-bits of the result. Alternatively could try making v2f16 legal and canonicalizing on build_vectors. llvm-svn: 299246
* [DAGCombiner] Add vector demanded elements support to ComputeNumSignBitsSimon Pilgrim2017-03-311-1/+2
| | | | | | | | | | | | | | Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements. This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1. I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course. Followup to D25691. Differential Revision: https://reviews.llvm.org/D31311 llvm-svn: 299219
* [DAGCombiner] Add vector demanded elements support to ↵Simon Pilgrim2017-03-311-0/+1
| | | | | | | | | | computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201
* [AMDGPU] Get address space mapping by target triple environmentYaxun Liu2017-03-271-0/+6
| | | | | | | | | | | | | | | | | | As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846
* AMDGPU: Implement f16 froundMatt Arsenault2017-03-241-1/+1
| | | | llvm-svn: 298730
* AMDGPU: Rename SI_RETURNMatt Arsenault2017-03-211-2/+11
| | | | | | | | This is used for a specific type of return to a shader part's epilog code. Rename to try avoiding confusion from a true call's return. llvm-svn: 298452
* AMDGPU: Cleanup control flow intrinsicsMatt Arsenault2017-03-171-0/+6
| | | | | | | | | | | | | | | | Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119
* AMDGPU: Fix unnecessary ands when packing f16 vectorsMatt Arsenault2017-03-151-0/+4
| | | | | | | | | computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873
* AMDGPU : Replace FMAD with FMA when denormals are enabled.Wei Ding2017-02-241-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
* AMDGPU: Add cvt.pkrtz intrinsicMatt Arsenault2017-02-221-0/+5
| | | | | | Convert llvm.SI.packf16 test uses llvm-svn: 295797
* AMDGPU: Redefine clamp node as clamp 0.0-1.0Matt Arsenault2017-02-211-0/+5
| | | | | | | | | | | Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788
* AMDGPU: Use source modifiers with f16->f32 conversionsMatt Arsenault2017-02-021-0/+1
| | | | | | | | | | | The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857
* AMDGPU: Cleanup fmin/fmax legacy functionMatt Arsenault2017-02-011-1/+1
| | | | | | Use a more specific subtarget check and combine hasOneUse checks llvm-svn: 293726
* AMDGPU: Check nsz instead of unsafe mathMatt Arsenault2017-01-251-1/+1
| | | | llvm-svn: 293028
* AMDGPU/R600: Serialize vector trunc stores to private ASJan Vesely2017-01-201-0/+1
| | | | | | | | | | | Add DUMMY_CHAIN SDNode to denote stores of interest Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915 Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411 Differential Revision: https://reviews.llvm.org/D27964 llvm-svn: 292651
* AMDGPU: Disable some fneg combines unless nszMatt Arsenault2017-01-191-0/+10
| | | | | | | | | | | | For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either. llvm-svn: 292473
* AMDGPU: Fold fneg into faddMatt Arsenault2017-01-121-0/+1
| | | | | | Patch mostly by Fiona Glaser llvm-svn: 291731
* AMDGPU/SI: Implement sendmsghalt intrinsicJan Vesely2017-01-041-0/+1
| | | | | | | | v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977
OpenPOWER on IntegriCloud