summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | ccStanislav Mekhanoshin2018-06-161-17/+43
| | | | | | | | | This is the common case in the BE when we serialize condition and then rematerialize it. Use either original or inverted condition. Differential Revision: https://reviews.llvm.org/D48246 llvm-svn: 334882
* AMDGPU: Add combine for short vector extract_vector_eltsMatt Arsenault2018-06-151-1/+42
| | | | | | | | | | Try to access pieces 4 bytes at a time. This helps various hasOneUse extract_vector_elt combines, such as load width reductions. Avoids test regressions in a future commit. llvm-svn: 334836
* AMDGPU: Make v4i16/v4f16 legalMatt Arsenault2018-06-158-92/+235
| | | | | | | Some image loads return these, and it's awkward working around them not being legal. llvm-svn: 334835
* [AMDGPU] Recognize x & ~(-1 << y) pattern.Roman Lebedev2018-06-151-0/+6
| | | | | | | | | | | | | | | | Summary: The same pattern as D48010, but this one is IR-canonical as of D47428. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48012 llvm-svn: 334817
* [AMDGPU] Recognize x & ((1 << y) - 1) pattern.Roman Lebedev2018-06-151-0/+7
| | | | | | | | | | | | | | | | | | | | | Summary: As a followup for D48007. Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern, which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`), i think also handling a pattern that is ub for `y == bitwidth` should be fine. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48010 llvm-svn: 334816
* [AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.Roman Lebedev2018-06-151-0/+7
| | | | | | | | | | | | | | | | | | | | | Summary: D47980 will canonicalize the `x << (32 - y) >> (32 - y)`, which is the pattern the AMDGPU expects to `x & (-1 >> (32 - y))`, which is not recognized by AMDGPU. Thus, it needs to be recognized, too. Reviewers: nhaehnle, bogner, tstellar, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #amdgpu Differential Revision: https://reviews.llvm.org/D48007 llvm-svn: 334815
* AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtzTom Stellard2018-06-143-0/+49
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45907 llvm-svn: 334757
* AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMULTom Stellard2018-06-133-0/+16
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46171 llvm-svn: 334665
* [AMDGPU] Corrected computeKnownBits for V_PERM_B32Stanislav Mekhanoshin2018-06-131-7/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D48133 llvm-svn: 334640
* [AMDGPU] Change enqueue kernel handle typeYaxun Liu2018-06-131-1/+2
| | | | | | | | | | Currently the handle type is a global pointer which holds 8 bytes. We need a larger type which hold 16 bytes, therefore change it to [i64 x 2]. Differential Revision: https://reviews.llvm.org/D48094 llvm-svn: 334625
* [AMDGPU][MC] Enabled parsing of relocations on VALU instructionsDmitry Preobrazhensky2018-06-131-2/+2
| | | | | | | | | | See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566 Reviewers: artem.tamazov, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D47884 llvm-svn: 334622
* [AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4Dmitry Preobrazhensky2018-06-131-3/+19
| | | | | | | | | | See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D47885 llvm-svn: 334609
* AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLoweringTom Stellard2018-06-134-71/+69
| | | | | | | | | | | | | | | | | | Summary: The code that handles ISD:Register and ISD::CopyFromReg assumes the target is amdgcn, so this is broken on r600. We don't need this analysis on r600 anyway so we can safely move it to SITargetLowering. Reviewers: alex-t, arsenm, nhaehnle Reviewed By: arsenm Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46298 llvm-svn: 334607
* [AMDGPU] DAG combine to produce V_PERM_B32Stanislav Mekhanoshin2018-06-125-1/+214
| | | | | | Differential Revision: https://reviews.llvm.org/D48099 llvm-svn: 334559
* AMDHSA/NFC: Code object v3 updates (additional):Konstantin Zhuravlyov2018-06-122-13/+16
| | | | | | - Move section selection and alignment to AMDGPUAsmPrinter llvm-svn: 334521
* AMDHSA: Code object v3 updatesKonstantin Zhuravlyov2018-06-126-10/+184
| | | | | | | | | | | | | | | - Do not emit following assembler directives: - .hsa_code_object_version - .hsa_code_object_isa - .amd_amdgpu_isa - .amd_amdgpu_hsa_metadata - .amd_amdgpu_pal_metadata - Do not emit .note entries - Cleanup and bring in sync kernel descriptor header file - Emit kernel descriptor into .rodata with appropriate relocations and alignments llvm-svn: 334519
* [AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"'Mark Searles2018-06-121-2/+2
| | | | | | | | | The use iterator, used within findMaskOperands(), can return anything which is not a def. isUse() requires a register, so check isReg() before calling isUse(). Differential Revision: https://reviews.llvm.org/D48047 llvm-svn: 334459
* Simplify; NFCGeorge Burgess IV2018-06-111-1/+1
| | | | | | Not shown in the diff: AQ is a `vector<SUnit *>`, and SU is a `SUnit *` llvm-svn: 334451
* AMDGPU: Add 64-bit relative variant kindKonstantin Zhuravlyov2018-06-111-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D47601 llvm-svn: 334443
* [AMDGPU] Do not consider indirect acces through phi for wave limiterStanislav Mekhanoshin2018-06-111-6/+0
| | | | | | | | | | | Rational: if there is indirect access that is usually an issue because load is not ready by the use. However, if use is inside a loop and load is outside that is potentially an issue for a first iteration only. Differential Revision: https://reviews.llvm.org/D47740 llvm-svn: 334420
* [AMDGPU] Inline asm - added i16, half and i128 types supportDaniil Fukalov2018-06-081-16/+32
| | | | | | | | | | AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error. Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341, e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable Differential Revision: https://reviews.llvm.org/D44920 llvm-svn: 334301
* AMDGPU: Error on LDS global address in functionsMatt Arsenault2018-06-081-1/+9
| | | | | | | These won't work as expected now, so error on them to avoid wasting time debugging this in the future. llvm-svn: 334269
* [AMDGPU] Simplify memory legalizer (add missing virtual descructor)Tony Tye2018-06-081-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334257
* [AMDGPU] Simplify memory legalizerTony Tye2018-06-071-234/+707
| | | | | | | | | | - Make code easier to maintain. - Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM. - Add support to generate waitcnts for LDS and GDS memory. Differential Revision: https://reviews.llvm.org/D47504 llvm-svn: 334241
* AMDGPU: Fix not including v2f64 in SReg_128Matt Arsenault2018-06-071-2/+2
| | | | | | Fixes assertion with calls returning v2f64. llvm-svn: 334189
* AMDGPU: Use scalar operations for f16 fabs/fneg patternsMatt Arsenault2018-06-071-7/+7
| | | | | | Fixes unnecessary differences between subtargets. llvm-svn: 334184
* AMDGPU: Try a lot harder to emit scalar loadsMatt Arsenault2018-06-073-1/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has two main components. First, widen widen short constant loads in DAG when they have the correct alignment. This is already done a bit in AMDGPUCodeGenPrepare, since that has access to DivergenceAnalysis. This can't help kernarg loads created in the DAG. Start to use DAG divergence analysis to help this case. The second part is to avoid kernel argument lowering breaking the alignment of short vector elements because calling convention lowering wants to split everything into legal register types. When loading a split type, load the nearest 4-byte aligned segment and shift to get the desired bits. This extra load of the earlier argument piece ends up merging, and the bit extract hopefully folds out. There are a number of improvements and regressions with this, but I think as-is this is a better compromise between several of the worst parts of SelectionDAG. Particularly when i16 is legal, this produces worse code for i8 and i16 element vector kernel arguments. This is partially due to the very weak load merging the DAG does. It only looks for fairly specific combines between pairs of loads which no longer appear. In particular this causes v4i16 loads to be split into 2 components when previously the two halves were merged. Worse, because of the newly introduced shifts, there is a lot more unnecessary vector packing and unpacking code emitted. At least some of this is due to reporting false for isTypeDesirableForOp for i16 as a workaround for the lack of divergence information in the DAG. The cases where this happens it doesn't actually matter, but the relevant code in SimplifyDemandedBits doens't have the context to know to ignore this. The use of the scalar cache is probably more important than the mess of mostly scalar instructions doing this packing and unpacking. Future work can fix this, possibly by making better use of the new DAG divergence information for controlling promotion decisions, or adding another version of shift + trunc + shift combines that doesn't only know about the used types. llvm-svn: 334180
* [AMDGPU] Improve reciprocal handlingStanislav Mekhanoshin2018-06-061-7/+13
| | | | | | | | | | | | | | | | | | | | | | | When denormals are supported we are producing a full division for 1.0f / x. That still can be replaced by the faster version: bool c = fabs(x) > 0x1.0p+96f; float s = c ? 0x1.0p-32f : 1.0f; x *= s; return s * v_rcp_f32(x) in case if requested accuracy is 2.5ulp or less. The same version is used if denormals are not supported for non 1.0 numerators, where just v_rcp_f32 is then used for 1.0 numerator. The optimization of 1/x is extended to the case -1/x, which is the same except for the resulting sign bit. OpenCL conformance passed with both enabled and disabled denorms. Differential Revision: https://reviews.llvm.org/D47805 llvm-svn: 334142
* AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16Matt Arsenault2018-06-062-0/+30
| | | | | | | | | | | | Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. llvm-svn: 334132
* [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixupPeter Smith2018-06-061-3/+8
| | | | | | | | | | | | | | | | | | On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 llvm-svn: 334078
* AMDGPU: Preserve metadata when widening loadsMatt Arsenault2018-06-051-2/+23
| | | | | | | | Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage. llvm-svn: 334045
* AMDGPU: Use more custom insert/extract_vector_elt loweringMatt Arsenault2018-06-051-14/+31
| | | | | | Apply to i8 vectors. llvm-svn: 334044
* Move Analysis/Utils/Local.h back to TransformsDavid Blaikie2018-06-042-2/+2
| | | | | | | | | | Review feedback from r328165. Split out just the one function from the file that's used by Analysis. (As chandlerc pointed out, the original change only moved the header and not the implementation anyway - which was fine for the one function that was used (since it's a template/inlined in the header) but not in general) llvm-svn: 333954
* [AMDGPU] Small refactoring in the schedulerStanislav Mekhanoshin2018-06-041-18/+3
| | | | | | | | After last changes some code can be simplified. Differential Revision: https://reviews.llvm.org/D47661 llvm-svn: 333934
* [AMDGPU] Factored out common part of GCNRPTracker::reset()Stanislav Mekhanoshin2018-06-042-11/+17
| | | | | | Differential Revision: https://reviews.llvm.org/D47664 llvm-svn: 333931
* [AMDGPU][Waitcnt] Fix handling of flat instrsMark Searles2018-06-042-6/+14
| | | | | | | | On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order. Differential Revision: https://reviews.llvm.org/D46616 llvm-svn: 333926
* AMDGPU: Make various NamedOperands upper caseNicolai Haehnle2018-06-044-43/+43
| | | | | | | | | | | | | | | | Summary: Avoid name clashes with the corresponding bit fields in the instruction encoding. Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f Reviewers: arsenm, rampitec, kzhuravl Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47433 llvm-svn: 333905
* TableGen: Streamline the semantics of NAMENicolai Haehnle2018-06-042-42/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The new rules are straightforward. The main rules to keep in mind are: 1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm. 2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended. And for some additional subtleties, consider these: 3. defm with no name generates a unique name but has no special behavior otherwise. 4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME. Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow. The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows. Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem. Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation. Change-Id: I694095231565b30f563e6fd0417b41ee01a12589 Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D47430 llvm-svn: 333900
* Set ADDE/ADDC/SUBE/SUBC to expand by defaultAmaury Sechet2018-06-012-7/+6
| | | | | | | | | | | | | | | Summary: They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while. Target that uses these opcodes are changed in order to ensure their behavior doesn't change. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D47422 llvm-svn: 333748
* AMDGPU/R600: Move intrinsics to IntrinsicsAMDGPU.tdTom Stellard2018-06-013-70/+6
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47487 llvm-svn: 333720
* [AMDGPU] Construct memory clauses before RAStanislav Mekhanoshin2018-05-315-0/+422
| | | | | | | | | | | | | | | | | | Memory clauses are formed into bundles in presence of xnack. Their source operands are marked as early-clobber. This allows to allocate distinct source and destination registers within a clause and prevent breaking the clause with s_nop in the hazard recognizer. Clauses are undone before post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the dag to keep memory operations together. Yet this allows a better ILP in some cases. Differential Revision: https://reviews.llvm.org/D47511 llvm-svn: 333691
* [GlobalISel][AMDGPU] LegalizerInfo verifier: Adding ↵Roman Tereshin2018-05-311-0/+1
| | | | | | | | | | | | LegalizerInfo::verify(...) call for AMDGPU Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333664
* [AMDGPU] Track occupancy in MFIStanislav Mekhanoshin2018-05-316-13/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629
* AMDGPU/R600: Make sure functions are cacheline alignedJan Vesely2018-05-311-0/+4
| | | | | | | | | | | v2: use "ensureAlignment" make functions cache line aligned Fixes GPU hangs since r333219: "AMDGPU: Split R600 AsmPrinter code into its own class" Differential Revision: https://reviews.llvm.org/D47516 llvm-svn: 333622
* AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTITom Stellard2018-05-304-42/+212
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 llvm-svn: 333605
* [AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst'Mark Searles2018-05-301-6/+2
| | | | | | | | | | | | | | https://reviews.llvm.org/rL333556 caused a buildbot failure. See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(), The unused variable was for debugging purposes; removing that piece of code to fix the build. llvm-svn: 333559
* AMDGPU: Use better alignment for kernarg loweringMatt Arsenault2018-05-302-17/+22
| | | | | | | | | | This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558
* [AMDGPU][Waitcnt] Fix handling of loops with many bottom blocksMark Searles2018-05-301-19/+23
| | | | | | | | | | | | | In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked if greater than 2 passes over a loop, which doesn't account for loop with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 llvm-svn: 333556
* AMDGPU: Fix typo in option descriptionMatt Arsenault2018-05-291-1/+1
| | | | llvm-svn: 333457
* AMDGPU: Round up kernel argument allocation sizeMatt Arsenault2018-05-293-5/+13
| | | | | | | | | | AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. llvm-svn: 333456
OpenPOWER on IntegriCloud