path: root/llvm/lib/Target/AMDGPU/SIISelLowering.h
Commit history for this file, newest first. Each entry shows the commit message, author, date, and files/lines changed.
...
* AMDGPU: Fix implementation of isCanonicalized (Matt Arsenault, 2018-08-06; 1 file, +4/-0)
  If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result.
  llvm-svn: 339061
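  As a plain-C++ illustration of the denormal point above (a numeric sketch only; the canonicalize() helper is hypothetical, not the DAG-level code):

  ```cpp
  #include <cassert>

  // Hypothetical helper illustrating the numeric fact behind the fix
  // (this is NOT the LLVM code): canonicalizing a float is equivalent to
  // multiplying by 1.0. When denormals are enabled, that multiply returns
  // a denormal input unchanged, so denormal values are already canonical.
  float canonicalize(float x) {
    return x * 1.0f; // only changes x if the FP mode flushes denormals
  }

  int main() {
    float denorm = 1e-42f; // a subnormal IEEE-754 binary32 value
    // With denormals enabled (the default on typical hosts), the value
    // survives canonicalization bit-for-bit.
    assert(canonicalize(denorm) == denorm);
    return 0;
  }
  ```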
* [AMDGPU] Minor change to d16 buffer load implementation (Tim Renouf, 2018-08-02; 1 file, +1/-1)
  Summary: By not reconstructing the operand list of the SDNode, this change makes it easier to add the forthcoming new tbuffer and buffer intrinsics.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49995
  Change-Id: I0cb79ef0801532645d7dd954a6d7355139db7b38
  llvm-svn: 338784
* AMDGPU: Stop wasting argument registers with v3i32/v3f32 (Matt Arsenault, 2018-07-28; 1 file, +13/-0)
  SelectionDAGBuilder widens v3i32/v3f32 arguments to v4i32/v4f32, which consumes an additional register. In addition to wasting argument space, this produces extra instructions, since the 4th vector component now appears to have a meaningful value to most combines.
  llvm-svn: 338197
* [AMDGPU] Support a fdot2 pattern. (Farhana Aleen, 2018-07-16; 1 file, +1/-0)
  Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z)
  Author: FarhanaAleen
  Reviewed By: rampitec, b-sumner
  Subscribers: AMDGPU
  Differential Revision: https://reviews.llvm.org/D49146
  llvm-svn: 337198
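  The scalar shape of that pattern, written out as a plain-C++ sketch (Half2 and fdot2_pattern are hypothetical stand-ins for the v2f16 operands and the matched expression):

  ```cpp
  #include <cmath>

  // Hypothetical stand-in for a <2 x half> value, using floats so the
  // sketch stays self-contained.
  struct Half2 { float x, y; };

  // The nested-fma shape the combine matches; with the optimization the
  // whole expression becomes a single fdot2 instruction.
  float fdot2_pattern(Half2 S0, Half2 S1, float z) {
    return std::fma(S0.x, S1.x, std::fma(S0.y, S1.y, z));
  }
  ```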
* AMDGPU: Refactor Subtarget classes (Tom Stellard, 2018-07-11; 1 file, +3/-3)
  Summary: This is a follow-up to r335942.
  - Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
  - Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
  - Merge R600Subtarget::Generation and GCNSubtarget::Generation into AMDGPUSubtarget::Generation.
  Reviewers: arsenm, jvesely
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D49037
  llvm-svn: 336851
* AMDGPU: Separate R600 and GCN TableGen files (Tom Stellard, 2018-06-28; 1 file, +3/-0)
  Summary: We now have two sets of generated TableGen files, one for R600 and one for GCN, so each sub-target now has its own tables of instructions, registers, ISel patterns, etc. This should help reduce compile time since each sub-target now only has to consider information that is specific to itself. This will also help prevent the R600 sub-target from slowing down new features for GCN, like disassembler support, GlobalISel, etc.
  Reviewers: arsenm, nhaehnle, jvesely
  Reviewed By: arsenm
  Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46365
  llvm-svn: 335942
* [AMDGPU] Convert rcp to rcp_iflag (Stanislav Mekhanoshin, 2018-06-27; 1 file, +1/-0)
  If the source of an rcp instruction is the result of any conversion from an integer, convert it into an rcp_iflag instruction. No FP exception other than division by zero can ever happen if a single-precision rcp argument is a representation of an integral number.
  Differential Revision: https://reviews.llvm.org/D48569
  llvm-svn: 335742
* AMDGPU: Select MIMG instructions manually in SITargetLowering (Nicolai Haehnle, 2018-06-21; 1 file, +2/-0)
  Summary: Having TableGen patterns for image intrinsics is hitting limitations: for D16 we already have to manually pre-lower the packing of data values, and we will have to do the same for A16 eventually.
  Since there is already some custom C++ code anyway, it is arguably easier to just do everything in C++, now that we can use the beefed-up generic tables backend of TableGen to provide all the required metadata and map intrinsics to corresponding opcodes. With this approach, all image intrinsic lowering happens in SITargetLowering::lowerImage. That code is dense due to all the cases that it handles, but it should still be easier to follow than what we had before, by virtue of it all being done in a single location, and by virtue of not relying on the TableGen pattern magic that very few people really understand.
  This means that we will have MachineSDNodes with MIMG instructions during DAG combining, but that seems alright: previously we had intrinsic nodes instead, but those are similarly opaque to the generic CodeGen infrastructure, and the final pattern matching just did a 1:1 translation to machine instructions anyway. If anything, the fact that we now merge the address words into a vector before DAG combine should be an advantage.
  Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6
  Reviewers: arsenm, rampitec, rtaylor, tstellar
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48017
  llvm-svn: 335228
* AMDGPU: Make v4i16/v4f16 legal (Matt Arsenault, 2018-06-15; 1 file, +3/-0)
  Some image loads return these, and it's awkward working around them not being legal.
  llvm-svn: 334835
* AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering (Tom Stellard, 2018-06-13; 1 file, +3/-0)
  Summary: The code that handles ISD::Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway, so we can safely move it to SITargetLowering.
  Reviewers: alex-t, arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46298
  llvm-svn: 334607
* AMDGPU: Try a lot harder to emit scalar loads (Matt Arsenault, 2018-06-07; 1 file, +2/-0)
  This has two main components. First, widen short constant loads in the DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis, but that can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case.
  The second part is to avoid kernel argument lowering breaking the alignment of short vector elements, because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out.
  There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged.
  Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. In the cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doesn't have the context to know to ignore this.
  The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types.
  llvm-svn: 334180
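  The 'load the nearest 4-byte aligned segment and shift' step can be pictured as a scalar sketch (the real change builds DAG nodes; loadShortArg below is hypothetical):

  ```cpp
  #include <cstdint>
  #include <cstring>

  // Hypothetical scalar model of the trick: to read a 2-byte argument at
  // `offset`, load the 4-byte aligned dword containing it and shift the
  // wanted bits down (little-endian layout assumed).
  uint16_t loadShortArg(const uint8_t *kernargBase, unsigned offset) {
    unsigned alignedOff = offset & ~3u;             // round down to 4 bytes
    uint32_t dword;
    std::memcpy(&dword, kernargBase + alignedOff, sizeof(dword));
    unsigned shiftBits = (offset - alignedOff) * 8; // position within dword
    return static_cast<uint16_t>(dword >> shiftBits);
  }
  ```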
* AMDGPU: Use better alignment for kernarg lowering (Matt Arsenault, 2018-05-30; 1 file, +1/-1)
  This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often.
  llvm-svn: 333558
* AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP (Tom Stellard, 2018-05-24; 1 file, +1/-0)
  Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47181
  llvm-svn: 333153
* AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering (Tom Stellard, 2018-05-22; 1 file, +2/-0)
  Summary: This is always false for R600.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47180
  llvm-svn: 333016
* AMDGPU: Make v2i16/v2f16 legal on VI (Matt Arsenault, 2018-05-22; 1 file, +4/-2)
  This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work.
  llvm-svn: 332953
* [AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution (Tony Tye, 2018-05-16; 1 file, +1/-0)
  No longer require the queue pointer to be passed in fixed SGPRs.
  Differential Revision: https://reviews.llvm.org/D46769
  llvm-svn: 332485
* AMDGPU: Custom lower v4i16/v4f16 vector operations (Matt Arsenault, 2018-05-16; 1 file, +1/-0)
  Avoids stack access. Also handle the extract-hi-elt pattern from truncate + shift to avoid a couple of test regressions.
  llvm-svn: 332453
* Remove \brief commands from doxygen comments. (Adrian Prantl, 2018-05-01; 1 file, +3/-3)
  We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers in our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all.
  Patch produced by: for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
  Differential Revision: https://reviews.llvm.org/D46290
  llvm-svn: 331272
* AMDGPU/SI: Add d16 support for buffer intrinsics. (Changpeng Fang, 2018-01-12; 1 file, +4/-0)
  Differential Revision: https://reviews.llvm.org/D38906
  Reviewers: Matt and Brian.
  llvm-svn: 322402
* TLI: Allow using PSV for intrinsic mem operands (Matt Arsenault, 2017-12-14; 1 file, +1/-0)
  llvm-svn: 320756
* AMDGPU: Fix creating invalid copy when adjusting dmask (Matt Arsenault, 2017-12-04; 1 file, +1/-1)
  Move the entire optimization to one place. Before it was possible to adjust dmask without changing the register class of the output instruction, since they were done in separate places. Fix all lane sizes and move all of the optimization into the DAG folding.
  llvm-svn: 319705
* AMDGPU: Don't use MUBUF vaddr if address may overflow (Matt Arsenault, 2017-11-15; 1 file, +6/-0)
  Effectively revert r263964. Before, we would not allow this if vaddr was not known to be positive.
  llvm-svn: 318240
* AMDGPU: Fix multi-use shl/add combine (Matt Arsenault, 2017-11-13; 1 file, +1/-0)
  This was using a custom function that didn't handle the addressing modes properly for private. Use isLegalAddressingMode to avoid duplicating this. Additionally, skip the combine if there is only one use, since the standard combine will handle it.
  llvm-svn: 318013
* AMDGPU: Implement hasBitPreservingFPLogic (Matt Arsenault, 2017-10-13; 1 file, +2/-0)
  llvm-svn: 315754
* AMDGPU: Start selecting v_mad_mixhi_f16 (Matt Arsenault, 2017-09-20; 1 file, +1/-0)
  llvm-svn: 313814
* AMDGPU: Start adding tail call support (Matt Arsenault, 2017-08-11; 1 file, +9/-0)
  Handle the sibling call cases.
  llvm-svn: 310753
* AMDGPU: Pass special input registers to functions (Matt Arsenault, 2017-08-03; 1 file, +13/-0)
  llvm-svn: 309998
* AMDGPU: Initial implementation of calls (Matt Arsenault, 2017-08-01; 1 file, +15/-0)
  Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for address space 0.
  llvm-svn: 309732
* AMDGPU: Teach isLegalAddressingMode about global_* instructions (Matt Arsenault, 2017-07-29; 1 file, +1/-0)
  Also refine the flat check to respect the flat-for-global feature; the constant fallback should check global handling, not specifically MUBUF.
  llvm-svn: 309471
* AMDGPU: Annotate implicitarg.ptr usage (Matt Arsenault, 2017-07-28; 1 file, +1/-0)
  We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments.
  Also fixes missing test for the intrinsic.
  llvm-svn: 309398
* TargetLowering: Change isShuffleMaskLegal's mask argument type to ArrayRef<int>. NFCI. (Zvi Rackover, 2017-07-26; 1 file, +1/-2)
  Changing the mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>. This came up in D35700, where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code.
  llvm-svn: 309085
* [SystemZ, LoopStrengthReduce] (Jonas Paulsson, 2017-07-21; 1 file, +2/-1)
  This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory.
  In order to achieve this, the following common code changes were made:
  - New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers.
  - In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores.
  SystemZ changes:
  - isLSRCostLess() implemented with Insns first, and without ImmCost.
  - New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers.
  Review: Ulrich Weigand, Quentin Colombet
  https://reviews.llvm.org/D35262
  https://reviews.llvm.org/D35049
  llvm-svn: 308729
* AMDGPU: Figure out private memory regs after lowering (Matt Arsenault, 2017-07-18; 1 file, +2/-0)
  Introduce pseudo-registers for registers needed for stack access, which are replaced during finalizeLowering. Note these pseudo-registers are currently only used for the used register location, and not for determining their input argument register.
  This is better because it avoids the need to try to predict whether a call will be emitted from the IR, and also detects stack objects introduced by legalization.
  Test changes are from the HasStackObjects check being more accurate since stack objects introduced during legalization are now known.
  llvm-svn: 308325
* Add DAG argument to canMergeStoresTo. NFC. (Nirav Dave, 2017-07-10; 1 file, +2/-1)
  llvm-svn: 307583
* [AMDGPU] Combine add and adde, sub and sube (Stanislav Mekhanoshin, 2017-06-21; 1 file, +2/-0)
  If one of the arguments of adde/sube is zero, we can fold another add/sub into it.
  Differential Revision: https://reviews.llvm.org/D34374
  llvm-svn: 305964
* [AMDGPU] Simplify add x, *ext (setcc) => addc|subb x, 0, setcc (Stanislav Mekhanoshin, 2017-06-21; 1 file, +1/-0)
  This simplification allows us to avoid generating v_cndmask_b32 to serialize the condition code between compare and use.
  Differential Revision: https://reviews.llvm.org/D34300
  llvm-svn: 305962
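  In scalar terms this is the familiar carry-add idiom (a sketch; addZextSetcc is purely illustrative):

  ```cpp
  #include <cstdint>

  // What "add x, zext(setcc)" computes: x plus a 0/1 comparison result.
  // Instead of materializing the 0/1 with v_cndmask_b32, the combine
  // routes the comparison's condition code straight into an
  // add-with-carry: addc x, 0, setcc.
  uint32_t addZextSetcc(uint32_t x, uint32_t a, uint32_t b) {
    return x + (a < b ? 1u : 0u);
  }
  ```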
* AMDGPU: Cleanup CreateLiveInRegister (Matt Arsenault, 2017-06-19; 1 file, +0/-2)
  llvm-svn: 305748
* [AMDGPU] Prevent too-large store merges in AMDGPU subtargets. NFCI. (Nirav Dave, 2017-05-24; 1 file, +2/-0)
  Various address spaces on the SI and R600 subtargets have stricter limits on memory access size than other address spaces. Use the canMergeStoresTo predicate to prevent the DAGCombiner from creating these stores, as they will be split up during legalization.
  llvm-svn: 303767
* AMDGPU: Start defining a calling convention (Matt Arsenault, 2017-05-17; 1 file, +10/-1)
  Partially implement the callee side for arguments and return values. byval doesn't work properly, and most likely sret or other on-stack return values don't either.
  llvm-svn: 303308
* AMDGPU: Pull fneg out of extract_vector_elt (Matt Arsenault, 2017-05-11; 1 file, +1/-0)
  This allows folding source modifiers in more f16 cases. Makes it easier to select per-component packed neg modifiers.
  llvm-svn: 302813
* AMDGPU: Fix invalid copies when copying i1 to phys reg (Matt Arsenault, 2017-04-12; 1 file, +1/-1)
  Insert a VReg_1 virtual register so the i1 workaround pass can handle it.
  llvm-svn: 300113
* AMDGPU: Refactor argument lowering (Matt Arsenault, 2017-04-11; 1 file, +11/-5)
  Split into smaller functions and prepare for handling non-entry functions.
  llvm-svn: 299998
* AMDGPU/GFX9: Fix shared and private aperture queries (Konstantin Zhuravlyov, 2017-04-06; 1 file, +3/-1)
  Differential Revision: https://reviews.llvm.org/D31786
  llvm-svn: 299727
* AMDGPU: Remove unnecessary ands when f16 is legal (Matt Arsenault, 2017-03-31; 1 file, +1/-0)
  Add a new node to act as a fancy bitcast from f16 operations to i32 that implicitly zeroes the high 16 bits of the result. Alternatively, we could try making v2f16 legal and canonicalizing on build_vectors.
  llvm-svn: 299246
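  The redundancy being removed is easiest to see in scalar form (a sketch; widenF16Bits is an invented name):

  ```cpp
  #include <cstdint>

  // An f16 value widened into a 32-bit register: zero-extension already
  // guarantees the upper 16 bits are clear, so a following
  // "and x, 0xffff" is a no-op the new node lets the combiner drop.
  uint32_t widenF16Bits(uint16_t f16Bits) {
    uint32_t wide = f16Bits;         // high 16 bits are zero by definition
    uint32_t masked = wide & 0xffff; // the redundant mask being removed
    return masked;                   // identical to `wide`
  }
  ```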
* AMDGPU: Cleanup control flow intrinsics (Matt Arsenault, 2017-03-17; 1 file, +1/-1)
  Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list.
  It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns.
  The brcond lowering was changing the types, so introduce new nodes for those.
  llvm-svn: 298119
* AMDGPU: Allow sinking of addressing modes for atomic_inc/dec (Matt Arsenault, 2017-03-15; 1 file, +6/-2)
  llvm-svn: 297913
* AMDGPU: Use v_med3_{f16|i16|u16} (Matt Arsenault, 2017-02-27; 1 file, +2/-0)
  llvm-svn: 296401
* AMDGPU: Add cvt.pkrtz intrinsic (Matt Arsenault, 2017-02-22; 1 file, +1/-0)
  Convert llvm.SI.packf16 test uses.
  llvm-svn: 295797
* AMDGPU: Redefine clamp node as clamp 0.0-1.0 (Matt Arsenault, 2017-02-21; 1 file, +3/-0)
  Change the implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use them whether or not denormals are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp.
  llvm-svn: 295788
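  The semantics of the redefined node, as a plain-C++ sketch of the non-NaN case (clamp01 is illustrative, not the backend code):

  ```cpp
  #include <algorithm>

  // The [0.0, 1.0] clamp the node now denotes. min/max (and med3 on the
  // hardware side) never flush denormals, which is why the max-based
  // implementation is safe in either denormal mode; GPU NaN behavior
  // (dx10_clamp) is not modeled here.
  float clamp01(float x) {
    return std::min(std::max(x, 0.0f), 1.0f);
  }
  ```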
* AMDGPU: Custom lower more vector operations (Matt Arsenault, 2017-01-23; 1 file, +5/-0)
  This avoids stack usage.
  llvm-svn: 292846