path: root/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Commit message | Author | Age | Files | Lines
...
* [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM (Connor Abbott, 2017-08-04, 1 file, -0/+5)
  Summary: Previously, we assumed that certain types of instructions needed
  WQM in pixel shaders, particularly DS instructions and image sampling
  instructions. This was ok because with OpenGL, the assumption was correct.
  But we want to start using DPP instructions for derivatives as well as
  other things, so the assumption that we can infer whether to use WQM based
  on the instruction won't continue to hold. This intrinsic lets frontends
  like Mesa indicate what things need WQM based on their knowledge of the
  API, rather than second-guessing them in the backend. We need to keep
  around the old method of enabling WQM, but eventually we should remove it
  once Mesa catches up. For now, this will let us use DPP instructions for
  computing derivatives correctly.

  Reviewers: arsenm, tpr, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D35167
  llvm-svn: 310085
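  Illustration (not from the commit): the intrinsic is overloaded on the
  value type, so a frontend that knows a value feeds a derivative
  computation can wrap it explicitly; a minimal sketch assuming the .f32
  overload:

      declare float @llvm.amdgcn.wqm.f32(float)

      ; mark %v as requiring whole-quad mode before it feeds a DPP derivative
      %wqm = call float @llvm.amdgcn.wqm.f32(float %v)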
* AMDGPU: Remove pointless asserts (Matt Arsenault, 2017-08-04, 1 file, -3/+0)
  llvm-svn: 310007
* AMDGPU: Don't use report_fatal_error for unsupported call types (Matt Arsenault, 2017-08-03, 1 file, -6/+16)
  llvm-svn: 310004
* AMDGPU: Remove error on calls for amdgcn (Matt Arsenault, 2017-08-03, 1 file, -5/+0)
  Repurpose the -amdgpu-function-calls flag. Rather than requiring it in
  order to emit a call, use it only to decide whether to run the
  always-inline path.

  llvm-svn: 310003
* AMDGPU: Fix implicitarg.ptr handling special inputs (Matt Arsenault, 2017-08-03, 1 file, -3/+18)
  llvm-svn: 310002
* AMDGPU: Pass special input registers to functions (Matt Arsenault, 2017-08-03, 1 file, -45/+250)
  llvm-svn: 309998
* AMDGPU: Analyze callee resource usage in AsmPrinter (Matt Arsenault, 2017-08-02, 1 file, -3/+16)
  llvm-svn: 309781
* AMDGPU: Don't place arguments in emergency stack slot (Matt Arsenault, 2017-08-02, 1 file, -1/+9)
  When finding the fixed offsets for function arguments, this needs to skip
  over the 4 bytes reserved for the emergency stack slot.

  llvm-svn: 309776
* AMDGPU: Fix handling of div_scale with undef inputs (Matt Arsenault, 2017-08-01, 1 file, -1/+55)
  The src0 register must match src1 or src2, but if these were undefined
  they could end up using different implicit_defed virtual registers. Force
  these to use one undef vreg, or pick the other, defined register.

  Also fixes producing invalid nodes that lacked the right number of inputs
  when src2 is undef.

  llvm-svn: 309743
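  For context, a sketch of the intrinsic the fix applies to (illustrative
  only; the operand names and the i1 operand selecting which input the
  returned scale corresponds to are from memory, not quoted from the
  commit):

      ; an undef numerator like this could select into mismatched implicit_def vregs
      %r = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float %den, i1 true)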
* AMDGPU: Initial implementation of calls (Matt Arsenault, 2017-08-01, 1 file, -7/+399)
  Includes a hack to fix the type selected for the GlobalAddress of the
  function, which will be fixed by changing the default datalayout to use
  generic pointers for address space 0.

  llvm-svn: 309732
* AMDGPU: Teach isLegalAddressingMode about global_* instructions (Matt Arsenault, 2017-07-29, 1 file, -16/+24)
  Also refine the flat check to respect the flat-for-global feature; the
  constant fallback should check global handling, not MUBUF specifically.

  llvm-svn: 309471
* AMDGPU: Annotate implicitarg.ptr usage (Matt Arsenault, 2017-07-28, 1 file, -2/+10)
  We need to pass something to functions for this to work. It isn't
  derivable just from the kernarg segment pointer because the implicit
  arguments are placed after the kernel arguments.

  Also fixes missing test for the intrinsic.

  llvm-svn: 309398
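  A hedged usage sketch (not from the commit; at the time the intrinsic
  returned a pointer in the constant address space, addrspace(2)):

      %implicitarg = call i8 addrspace(2)* @llvm.amdgcn.implicitarg.ptr()
      ; implicit arguments live past the explicit kernel arguments, which is
      ; why this cannot be derived from the kernarg segment pointer alone
      %p = getelementptr i8, i8 addrspace(2)* %implicitarg, i64 8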
* TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. (Zvi Rackover, 2017-07-26, 1 file, -2/+1)
  Changing mask argument type from const SmallVectorImpl<int>& to
  ArrayRef<int>. This came up in D35700, where a mask is received as an
  ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal().
  Also saves a few lines of code.

  llvm-svn: 309085
* [SystemZ, LoopStrengthReduce] (Jonas Paulsson, 2017-07-21, 1 file, -1/+1)
  This patch makes LSR generate better code for SystemZ in the cases of
  memory intrinsics, Load->Store pairs or comparison of immediate with
  memory. In order to achieve this, the following common code changes were
  made:

  * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls
    if LSR should do instruction-based addressing evaluations by calling
    isLegalAddressingMode() with the Instruction pointers.
  * In LoopStrengthReduce: handle address operands of memset, memmove and
    memcpy as address uses, and call isFoldableMemAccessOffset() for any
    LSRUse::Address, not just loads or stores.

  SystemZ changes:

  * isLSRCostLess() implemented with Insns first, and without ImmCost.
  * New function supportedAddressingMode() that is a helper for TTI methods
    looking at Instructions passed via pointers.

  Review: Ulrich Weigand, Quentin Colombet
  https://reviews.llvm.org/D35262
  https://reviews.llvm.org/D35049
  llvm-svn: 308729
* AMDGPU: Figure out private memory regs after lowering (Matt Arsenault, 2017-07-18, 1 file, -23/+42)
  Introduce pseudo-registers for registers needed for stack access, which
  are replaced during finalizeLowering. Note these pseudo-registers are
  currently only used for the used register location, and not for
  determining their input argument register.

  This is better because it avoids the need to try to predict whether a
  call will be emitted from the IR, and also detects stack objects
  introduced by legalization.

  Test changes are from the HasStackObjects check being more accurate since
  stack objects introduced during legalization are now known.

  llvm-svn: 308325
* AMDGPU: Return correct type during argument lowering (Matt Arsenault, 2017-07-15, 1 file, -0/+27)
  The type needs to be cast back to the original argument type. Fixes an
  assert that for some reason is only run when using -debug.

  Includes an additional combine to avoid test regressions from having
  conversions mixed with multiple Assert[SZ]ext nodes. On subtargets where
  i16 is legal, this was producing an i32 register with an i16 AssertZExt,
  truncated to i16 with another i8 AssertZExt:

    t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
    t3: i16 = truncate t2
    t5: i16 = AssertZext t3, ValueType:ch:i8
    t6: i8 = truncate t5
    t7: i32 = zero_extend t6

  llvm-svn: 308082
* [AMDGPU] fcanonicalize optimization for GFX9+ (Stanislav Mekhanoshin, 2017-07-13, 1 file, -8/+19)
  Since GFX9 supports denorm modes for v_min_f32/v_max_f32, it is possible
  to further optimize fcanonicalize and remove it when applied to min/max,
  given that its operands are known not to be sNaNs or that sNaNs are not
  supported. Additionally, we can remove fcanonicalize if denorms are
  supported for the VT and we know that its argument is never a NaN.

  Differential Revision: https://reviews.llvm.org/D35335
  llvm-svn: 307976
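  A hedged sketch of the shape being optimized (not taken from the commit's
  tests):

      ; on GFX9, the canonicalize below can be removed when %a and %b
      ; are known not to be signaling NaNs
      %m = call float @llvm.minnum.f32(float %a, float %b)
      %c = call float @llvm.canonicalize.f32(float %m)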
* [AMDGPU] fcanonicalize elimination optimization (Stanislav Mekhanoshin, 2017-07-12, 1 file, -9/+86)
  We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
  It is possible to omit this multiplication if the source of the
  fcanonicalize instruction is already known to be flushed/quieted, i.e. if
  it comes from another instruction known to do the normalization and we
  are using IEEE mode to quiet sNaNs.

  Differential Revision: https://reviews.llvm.org/D35218
  llvm-svn: 307848
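  An illustrative sketch (an assumption about the shape, not the commit's
  code): the canonicalize lowers to a multiply by 1.0, which becomes
  unnecessary when the source instruction already normalizes its result:

      %s = fadd float %a, %b                             ; result already canonical
      %c = call float @llvm.canonicalize.f32(float %s)   ; fmul %s, 1.0 can be omitted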
* Add DAG argument to canMergeStoresTo. NFC. (Nirav Dave, 2017-07-10, 1 file, -1/+2)
  llvm-svn: 307583
* [AMDGPU] Fix -Wimplicit-fallthrough warning. NFCI. (Simon Pilgrim, 2017-07-08, 1 file, -6/+2)
  llvm-svn: 307485
* [AMDGPU] Always use rcp + mul with fast math (Stanislav Mekhanoshin, 2017-07-06, 1 file, -7/+5)
  Regardless of relaxation options such as -cl-fast-relaxed-math, we are
  producing rather long code for fdiv via the amdgcn_fdiv_fast intrinsic.
  This intrinsic is used to replace fdiv with 2.5ulp metadata; it does not
  handle denormals and is thus believed to be fast.

  An fdiv instruction can also have the fast math flag, either by itself or
  together with fpmath metadata. Clang used with a relaxation flag always
  produces both the metadata and the fast flag:

    %div = fdiv fast float %v, %0, !fpmath !12
    !12 = !{float 2.500000e+00}

  The current implementation ignores the fast flag and favors the metadata.
  An instruction with just the fast flag would be lowered to the fastest
  rcp + mul, but that never happens in practice because of the described
  mutual clang and BE behavior. This change allows an "fdiv fast" to always
  be lowered as rcp + mul.

  Differential Revision: https://reviews.llvm.org/D34844
  llvm-svn: 307308
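  A minimal IR-level sketch of the rcp + mul form (the real lowering
  happens on the DAG; the rcp intrinsic is shown only for illustration):

      %rcp = call float @llvm.amdgcn.rcp.f32(float %den)
      %div = fmul fast float %num, %rcp     ; ~2.5 ulp, no denormal handling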
* [Constants] If we already have a ConstantInt*, prefer to use isZero/isOne/isMinusOne instead of isNullValue/isOneValue/isAllOnesValue inherited from Constant. NFCI (Craig Topper, 2017-07-06, 1 file, -1/+1)
  Going through the Constant methods requires redetermining that the
  Constant is a ConstantInt and then calling isZero/isOne/isMinusOne.

  llvm-svn: 307292
* [AMDGPU] Simplify setcc (sext from i1 b), -1|0, cc (Stanislav Mekhanoshin, 2017-06-27, 1 file, -1/+29)
  Depending on the compare code, the result can be either the argument of
  the sext or its negation. This helps to avoid a v_cndmask_b64 instruction
  for the sext. A reversed value can be further simplified and folded into
  its parent comparison if possible.

  Differential Revision: https://reviews.llvm.org/D34545
  llvm-svn: 306446
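  A hedged before/after view in IR terms (the combine itself matches DAG
  nodes; names are made up):

      %cc = icmp slt i32 %a, %b
      %x  = sext i1 %cc to i32     ; -1 when %cc is true, 0 otherwise
      %r  = icmp eq i32 %x, -1     ; folds to %cc, no v_cndmask to materialize %x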
* [AMDGPU] Combine and x, (sext cc from i1) => select cc, x, 0 (Stanislav Mekhanoshin, 2017-06-27, 1 file, -2/+28)
  Also factored out a function that checks whether a boolean is an already
  deserialized value which does not require a v_cndmask_b32 to be loaded.
  Added binary logical operators to its check.

  Differential Revision: https://reviews.llvm.org/D34500
  llvm-svn: 306439
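  A hedged IR-level illustration of the title's pattern:

      %s = sext i1 %cc to i32
      %r = and i32 %x, %s          ; becomes: select i1 %cc, i32 %x, i32 0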
* AMDGPU: Whitespace fixes (Matt Arsenault, 2017-06-26, 1 file, -1/+1)
  llvm-svn: 306265
* AMDGPU: Partially fix implicit.buffer.ptr intrinsic handling (Matt Arsenault, 2017-06-26, 1 file, -5/+9)
  This should not be treated as a different version of
  private_segment_buffer. These are distinct things with different uses and
  register classes, and this requires the function argument info to have
  more context about the function's type and environment.

  Also add missing test coverage for the intrinsic, and emit an error for
  HSA. This also uncovers that the intrinsic is broken unless there happen
  to be stack objects.

  llvm-svn: 306264
* [AMDGPU] Add intrinsics for tbuffer load and store - build error fix (David Stuttard, 2017-06-22, 1 file, -2/+1)
  The variable was unused in non-debug builds (it was used only in an
  assert), causing a compile-time warning and an eventual build failure.

  llvm-svn: 306034
* [AMDGPU] Add intrinsics for tbuffer load and store (David Stuttard, 2017-06-22, 1 file, -30/+96)
  An intrinsic already existed for llvm.SI.tbuffer.store; this adds
  tbuffer.load and re-implements the intrinsics as llvm.amdgcn.tbuffer.*.

  Added CodeGen tests for the two new variants. Left the original
  llvm.SI.tbuffer.store implementation in place to avoid issues with
  existing code.

  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr
  Differential Revision: https://reviews.llvm.org/D30687
  llvm-svn: 306031
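  A usage sketch from memory, not quoted from the patch; the operand order
  (rsrc, vindex, voffset, soffset, offset, dfmt, nfmt, glc, slc) and the
  format constants shown are assumptions:

      %v = call <4 x float> @llvm.amdgcn.tbuffer.load.v4f32(
               <4 x i32> %rsrc, i32 %vindex, i32 %voffset, i32 %soffset,
               i32 0, i32 14, i32 7, i1 false, i1 false)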
* [AMDGPU] Add FP_CLASS to the add/setcc combine (Stanislav Mekhanoshin, 2017-06-21, 1 file, -1/+3)
  This is one of the nodes which also compiles as v_cmp_*.

  Differential Revision: https://reviews.llvm.org/D34485
  llvm-svn: 305970
* [AMDGPU] Combine add and adde, sub and sube (Stanislav Mekhanoshin, 2017-06-21, 1 file, -9/+79)
  If one of the arguments of adde/sube is zero, we can fold another add/sub
  into it.

  Differential Revision: https://reviews.llvm.org/D34374
  llvm-svn: 305964
* [AMDGPU] simplify add x, *ext (setcc) => addc|subb x, 0, setcc (Stanislav Mekhanoshin, 2017-06-21, 1 file, -0/+39)
  This simplification allows us to avoid generating a v_cndmask_b32 to
  serialize the condition code between the compare and its use.

  Differential Revision: https://reviews.llvm.org/D34300
  llvm-svn: 305962
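  A hedged IR-level view of the pattern (the combine operates on DAG
  nodes; names are made up):

      %cc = icmp ult i32 %a, %b
      %e  = zext i1 %cc to i32
      %r  = add i32 %x, %e         ; selected as a carry-in add, no v_cndmask_b32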
* AMDGPU: Cleanup CreateLiveInRegister (Matt Arsenault, 2017-06-19, 1 file, -9/+0)
  llvm-svn: 305748
* AMDGPU: Teach isLegalAddressingMode about flat offsets (Matt Arsenault, 2017-06-12, 1 file, -3/+11)
  Also fix reporting r+r as a valid addressing mode without offsets.

  llvm-svn: 305203
* Sort the remaining #include lines in include/... and lib/... (Chandler Carruth, 2017-06-06, 1 file, -2/+2)
  I did this a long time ago with a janky python script, but now
  clang-format has built-in support for this. I fed clang-format every line
  with a #include and let it re-sort things according to the precise LLVM
  rules for include ordering baked into clang-format these days.

  I've reverted a number of files where the results of sorting includes
  isn't healthy. Either places where we have legacy code relying on
  particular include ordering (where possible, I'll fix these separately)
  or where we have particular formatting around #include lines that I
  didn't want to disturb in this patch.

  This patch is *entirely* mechanical. If you get merge conflicts or
  anything, just ignore the changes in this patch and run clang-format over
  your #include lines in the files.

  Sorry for any noise here, but it is important to keep these things
  stable. I was seeing an increasing number of patches with irrelevant
  re-ordering of #include lines because clang-format was used. This patch
  at least isolates that churn, makes it easy to skip when resolving
  conflicts, and gets us to a clean baseline (again).

  llvm-svn: 304787
* [llvm] Remove double semicolons (Mandeep Singh Grang, 2017-06-06, 1 file, -1/+1)
  Reviewers: craig.topper, arsenm, mehdi_amini
  Reviewed By: mehdi_amini
  Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33924
  llvm-svn: 304767
* AMDGPUAnnotateUniformValue should always treat volatile loads as divergent (Alexander Timofeev, 2017-06-02, 1 file, -1/+1)
  llvm-svn: 304554
* [AMDGPU] Prevent too large store merges in AMDGPU Subtargets. NFCI. (Nirav Dave, 2017-05-24, 1 file, -0/+12)
  Various address spaces on the SI and R600 subtargets have stricter limits
  on memory access size than other address spaces. Use the canMergeStoresTo
  predicate to prevent the DAGCombiner from creating these stores, as they
  would be split up during legalization.

  llvm-svn: 303767
* [AMDGPU] Combine and (srl) into shl (bfe) (Stanislav Mekhanoshin, 2017-05-23, 1 file, -6/+34)
  Perform DAG combine:

    and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb

  where nb is the number of trailing zeroes in mask. It replaces two
  instructions with two, and BFE is generally the more expensive one.
  However, this is only done if we are selecting a byte or word at an
  aligned boundary, which results in a proper SDWA operand pattern. It is
  only done if SDWA is supported.

  TODO: improve the SDWA pass to actually convert this pattern. It is not
  done now because we have an immediate in the instruction, which has to be
  moved into a VGPR.

  Differential Revision: https://reviews.llvm.org/D33455
  llvm-svn: 303681
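  A worked instance of the formula (numbers chosen for illustration): with
  c = 8 and mask = 0xff00, nb = 8, so the extracted field starts at bit
  nb + c = 16:

      %s = lshr i32 %x, 8
      %r = and i32 %s, 65280       ; 0xff00: bits [23:16] of %x land in [15:8]
      ; per the formula this selects as: shl (bfe %x, 16, 0xff), 8, i.e. a
      ; byte at an aligned boundary that the SDWA pass can pick up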
* AMDGPU: Start defining a calling convention (Matt Arsenault, 2017-05-17, 1 file, -15/+150)
  Partially implement the callee side for arguments and return values.
  byval doesn't work properly, and most likely sret or other on-stack
  return values don't either.

  llvm-svn: 303308
* AMDGPU: Make better use of op_sel with high components (Matt Arsenault, 2017-05-17, 1 file, -0/+9)
  Handle more general swizzles.

  llvm-svn: 303296
* AMDGPU: Fix min3/max3 combines for f16/i16 (Matt Arsenault, 2017-05-17, 1 file, -1/+2)
  Fix missing instruction definitions for min3/max3.

  llvm-svn: 303284
* [AMDGPU] Placate unused variable warning in release builds. (Davide Italiano, 2017-05-11, 1 file, -0/+1)
  llvm-svn: 302821
* AMDGPU: Pull fneg out of extract_vector_elt (Matt Arsenault, 2017-05-11, 1 file, -0/+21)
  This allows folding source modifiers in more f16 cases. Makes it easier
  to select per-component packed neg modifiers.

  llvm-svn: 302813
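  A hedged sketch of the fold's direction in IR terms (fneg written as an
  fsub from -0.0, as was idiomatic at the time; the combine itself runs on
  DAG nodes):

      ; before: the whole vector is negated, then one lane is extracted
      %n = fsub <2 x half> <half -0.0, half -0.0>, %v
      %e = extractelement <2 x half> %n, i32 0
      ; after: extract first, leaving a scalar fneg that can fold into a
      ; source modifier on its user
      %e2 = extractelement <2 x half> %v, i32 0
      %f  = fsub half -0.0, %e2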
* AMDGPU: GFX9 GS and HS shaders always have the scratch wave offset in SGPR5 (Marek Olsak, 2017-05-04, 1 file, -3/+11)
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32645
  llvm-svn: 302200
* Generalize the specialized flag-carrying SDNodes by moving flags into SDNode. (Amara Emerson, 2017-05-01, 1 file, -9/+8)
  This removes BinaryWithFlagsSDNode, and flags are now all passed by
  value.

  Differential Revision: https://reviews.llvm.org/D32527
  llvm-svn: 301803
* AMDGPU: Add new amdgcn.init.exec intrinsics (Marek Olsak, 2017-04-28, 1 file, -0/+65)
  v2: More tests, bug fixes, cosmetic changes.

  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D31762
  llvm-svn: 301677
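  A hedged usage sketch (signatures from memory, not quoted from the
  patch; the from.input variant is remembered as taking a function input
  plus an immediate bit offset):

      call void @llvm.amdgcn.init.exec(i64 -1)                        ; enable all lanes
      call void @llvm.amdgcn.init.exec.from.input(i32 %count, i32 8)  ; %count is an SGPR input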
* [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and simplifyDemandedBits (Craig Topper, 2017-04-28, 1 file, -2/+3)
  This patch replaces the separate APInts for KnownZero/KnownOne with a
  single KnownBits struct. This is similar to what was done to
  ValueTracking's version recently.

  This is largely a mechanical transformation from KnownZero to Known.Zero.

  Differential Revision: https://reviews.llvm.org/D32569
  llvm-svn: 301620
* Move size and alignment information of regclass to TargetRegisterInfo (Krzysztof Parzyszek, 2017-04-24, 1 file, -10/+11)
  1. RegisterClass::getSize() is split into two functions:
     - TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
     - TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
  2. RegisterClass::getAlignment() is replaced by:
     - TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;

  This will allow making those values depend on subtarget features in the
  future.

  Differential Revision: https://reviews.llvm.org/D31783
  llvm-svn: 301221
* AMDGPU: Move trap lowering to DAG (Matt Arsenault, 2017-04-24, 1 file, -46/+57)
  Fixes traps in any block besides the entry block, and fixes depending on
  a live-in physical register by using a virtual register copy.

  Also happens to stop emitting a nop in the case where debug trap is not
  supported.

  llvm-svn: 301206
* AMDGPU: Do not lower fast unsafe div for safe, f32, with fp32 denormals (Konstantin Zhuravlyov, 2017-04-21, 1 file, -2/+4)
  Differential Revision: https://reviews.llvm.org/D32085
  llvm-svn: 301023