summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16Matt Arsenault2018-06-062-0/+30
| | | | | | | | | | | | Fixes terrible code on targets without f16 support. The legalization creates a mess that is difficult to recover from. Also should avoid randomly breaking these tests multiple times in sequence in future commits. Some regressions in cases where it happens to be better to pull the source modifier after the conversion. llvm-svn: 334132
* [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixupPeter Smith2018-06-061-3/+8
| | | | | | | | | | | | | | | | | | On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 llvm-svn: 334078
* AMDGPU: Preserve metadata when widening loadsMatt Arsenault2018-06-051-2/+23
| | | | | | | | Preserves the low bound of the !range. I don't think it's legal to do anything with the top half since it's theoretically reading garbage. llvm-svn: 334045
* AMDGPU: Use more custom insert/extract_vector_elt loweringMatt Arsenault2018-06-051-14/+31
| | | | | | Apply to i8 vectors. llvm-svn: 334044
* Move Analysis/Utils/Local.h back to TransformsDavid Blaikie2018-06-042-2/+2
| | | | | | | | | | Review feedback from r328165. Split out just the one function from the file that's used by Analysis. (As chandlerc pointed out, the original change only moved the header and not the implementation anyway - which was fine for the one function that was used (since it's a template/inlined in the header) but not in general) llvm-svn: 333954
* [AMDGPU] Small refactoring in the schedulerStanislav Mekhanoshin2018-06-041-18/+3
| | | | | | | | After last changes some code can be simplified. Differential Revision: https://reviews.llvm.org/D47661 llvm-svn: 333934
* [AMDGPU] Factored out common part of GCNRPTracker::reset()Stanislav Mekhanoshin2018-06-042-11/+17
| | | | | | Differential Revision: https://reviews.llvm.org/D47664 llvm-svn: 333931
* [AMDGPU][Waitcnt] Fix handling of flat instrsMark Searles2018-06-042-6/+14
| | | | | | | | On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order. Differential Revision: https://reviews.llvm.org/D46616 llvm-svn: 333926
* AMDGPU: Make various NamedOperands upper caseNicolai Haehnle2018-06-044-43/+43
| | | | | | | | | | | | | | | | Summary: Avoid name clashes with the corresponding bit fields in the instruction encoding. Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f Reviewers: arsenm, rampitec, kzhuravl Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47433 llvm-svn: 333905
* TableGen: Streamline the semantics of NAMENicolai Haehnle2018-06-042-42/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The new rules are straightforward. The main rules to keep in mind are: 1. NAME is an implicit template argument of class and multiclass, and will be substituted by the name of the instantiating def/defm. 2. The name of a def/defm in a multiclass must contain a reference to NAME. If such a reference is not present, it is automatically prepended. And for some additional subtleties, consider these: 3. defm with no name generates a unique name but has no special behavior otherwise. 4. def with no name generates an anonymous record, whose name is unique but undefined. In particular, the name won't contain a reference to NAME. Keeping rules 1&2 in mind should allow a predictable behavior of name resolution that is simple to follow. The old "rules" were rather surprising: sometimes (but not always), NAME would correspond to the name of the toplevel defm. They were also plain bonkers when you pushed them to their limits, as the old version of the TableGen test case shows. Having NAME correspond to the name of the toplevel defm introduces "spooky action at a distance" and breaks composability: refactoring the upper layers of a hierarchy of nested multiclass instantiations can cause unexpected breakage by changing the value of NAME at a lower level of the hierarchy. The new rules don't suffer from this problem. Some existing .td files have to be adjusted because they ended up depending on the details of the old implementation. Change-Id: I694095231565b30f563e6fd0417b41ee01a12589 Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D47430 llvm-svn: 333900
* Set ADDE/ADDC/SUBE/SUBC to expand by defaultAmaury Sechet2018-06-012-7/+6
| | | | | | | | | | | | | | | Summary: They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while. Target that uses these opcodes are changed in order to ensure their behavior doesn't change. Reviewers: efriedma, craig.topper, dblaikie, bkramer Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D47422 llvm-svn: 333748
* AMDGPU/R600: Move intrinsics to IntrinsicsAMDGPU.tdTom Stellard2018-06-013-70/+6
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle, jvesely Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47487 llvm-svn: 333720
* [AMDGPU] Construct memory clauses before RAStanislav Mekhanoshin2018-05-315-0/+422
| | | | | | | | | | | | | | | | | | Memory clauses are formed into bundles in presence of xnack. Their source operands are marked as early-clobber. This allows to allocate distinct source and destination registers within a clause and prevent breaking the clause with s_nop in the hazard recognizer. Clauses are undone before post-RA scheduler to allow some rescheduling, which will not break the clause since artificial edges are created in the dag to keep memory operations together. Yet this allows a better ILP in some cases. Differential Revision: https://reviews.llvm.org/D47511 llvm-svn: 333691
* [GlobalISel][AMDGPU] LegalizerInfo verifier: Adding ↵Roman Tereshin2018-05-311-0/+1
| | | | | | | | | | | | LegalizerInfo::verify(...) call for AMDGPU Reviewers: aemerson, qcolombet Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D46339 llvm-svn: 333664
* [AMDGPU] Track occupancy in MFIStanislav Mekhanoshin2018-05-316-13/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep track of achieved occupancy in SIMachineFunctionInfo. At the moment we have a lot of duplicated or even missed code to query and maintain occupancy info. Record it in the MFI and query in a single call. Interfaces: - getOccupancy() - returns current recorded achieved occupancy. - getMinAllowedOccupancy() - returns lesser of the achieved occupancy and the lowest occupancy we are ready to tolerate. For example if a kernel is memory bound we are ready to tolerate 4 waves. - limitOccupancy() - record occupancy level if we have to lower it. - increaseOccupancy() - record occupancy if scheduler managed to increase the occupancy. MFI takes care of integrating different checks affecting occupancy, including LDS use and waves-per-eu attribute. Note that scheduler starts with not yet known register pressure, so has to record either limit or increase in occupancy after it is done. Later passes can just query a resulting value. New interface is used in the active scheduler and NFC wrt its work. Changes are also made to experimental schedulers to use it and record an occupancy after they are done. Before the change waves-per-eu was ignored by experimental schedulers and tolerance window for memory bound kernels was not used. Differential Revision: https://reviews.llvm.org/D47509 llvm-svn: 333629
* AMDGPU/R600: Make sure functions are cacheline alignedJan Vesely2018-05-311-0/+4
| | | | | | | | | | | v2: use "ensureAlignment" make functions cache line aligned Fixes GPU hangs since r333219: "AMDGPU: Split R600 AsmPrinter code into its own class" Differential Revision: https://reviews.llvm.org/D47516 llvm-svn: 333622
* AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTITom Stellard2018-05-304-42/+212
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47359 llvm-svn: 333605
* [AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst'Mark Searles2018-05-301-6/+2
| | | | | | | | | | | | | | https://reviews.llvm.org/rL333556 caused a buildbot failure. See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable] auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(), The unused variable was for debugging purposes; removing that piece of code to fix the build. llvm-svn: 333559
* AMDGPU: Use better alignment for kernarg loweringMatt Arsenault2018-05-302-17/+22
| | | | | | | | | | This was just emitting loads with the ABI alignment for the raw type. The true alignment is often better, especially when an illegal vector type was scalarized. The better alignment allows using a scalar load more often. llvm-svn: 333558
* [AMDGPU][Waitcnt] Fix handling of loops with many bottom blocksMark Searles2018-05-301-19/+23
| | | | | | | | | | | | | In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence for a loop. Previously, that kicked if greater than 2 passes over a loop, which doesn't account for loop with many bottom blocks. So, increase the threshold to (n+1), where n is the number of bottom blocks. This gives the pass an opportunity to consider the contribution of each bottom block, to the overall loop, before the forced convergence potentially kicks in. Differential Revision: https://reviews.llvm.org/D47488 llvm-svn: 333556
* AMDGPU: Fix typo in option descriptionMatt Arsenault2018-05-291-1/+1
| | | | llvm-svn: 333457
* AMDGPU: Round up kernel argument allocation sizeMatt Arsenault2018-05-293-5/+13
| | | | | | | | | | AFAIK the driver's allocation will actually have to round this up anyway. It is useful to track the rounded up size, so that the end of the kernel segment is known to be dereferencable so a wider s_load_dword can be used for a short argument at the end of the segment. llvm-svn: 333456
* AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA asKonstantin Zhuravlyov2018-05-291-1/+2
| | | | | | | | it is set by CP Differential Revision: https://reviews.llvm.org/D47392 llvm-svn: 333451
* AMDGPU: Pass function directly instead of MachineFunctionMatt Arsenault2018-05-298-36/+37
| | | | | | | These functions just query the underlying IR function, so pass it directly. llvm-svn: 333442
* AMDGPU: Add nuw to add off of kernarg ptrMatt Arsenault2018-05-291-2/+1
| | | | llvm-svn: 333441
* AMDGPU: Split R600 MCInst lowering into its own classTom Stellard2018-05-291-5/+29
| | | | | | | | | | Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47307 llvm-svn: 333439
* [AMDGPU] Fixed build warningTim Renouf2018-05-291-4/+3
| | | | | | | | | | | | Summary: V2: Use cast instead of extra if. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47426 Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a llvm-svn: 333397
* [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by ↵Farhana Aleen2018-05-281-5/+3
| | | | | | | | | | default. Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found to be resolved by some other fixes. Author: FarhanaAleen llvm-svn: 333380
* [AMDGPU] Fixed WWM bug in block otherwise entirely in WQMTim Renouf2018-05-271-0/+5
| | | | | | | | | | | | | | | | Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47027 Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6 llvm-svn: 333362
* [AMDGPU][Waitcnt] Remove obsolete waitcnt optionMark Searles2018-05-251-6/+0
| | | | | | | | With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it. Differential Revision: https://reviews.llvm.org/D47378 llvm-svn: 333303
* [AMDGPU] Fixed test failure with AMDGPUPerfHintStanislav Mekhanoshin2018-05-251-8/+7
| | | | | | | We shall not keep iterator to a map while map is modified, this leads to a broken map. llvm-svn: 333298
* Fix -Winconsistent-missing-overrides in AMDGPU codeReid Kleckner2018-05-251-1/+1
| | | | llvm-svn: 333291
* [AMDGPU] Add perf hints to functionsStanislav Mekhanoshin2018-05-2510-6/+516
| | | | | | | | | | | | | | | This is adoption of HSAIL perfhint pass. Two types of hints are produced: 1. Function is memory bound. 2. Kernel can use wave limiter. Currently these hints are used in the scheduler. If a function is suspected to be memory bound we allow occupancy to decrease to 4 waves in the course of scheduling. Differential Revision: https://reviews.llvm.org/D46992 llvm-svn: 333289
* [AMDGPU] Fixed incorrect break from loopTim Renouf2018-05-251-2/+40
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop. The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block. V2: Addressed some review comments. V3: Test fixes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44046 Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c llvm-svn: 333258
* AMDGPU: Remove AMDGPUMCInstLower.hTom Stellard2018-05-253-48/+23
| | | | | | | | | | | | | | | | Summary: The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp, so we don't need a header file. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47264 llvm-svn: 333254
* AMDGPU: Split R600 AsmPrinter code into its own classTom Stellard2018-05-246-161/+303
| | | | | | | | | | | | Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D47245 llvm-svn: 333219
* AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMPTom Stellard2018-05-248-56/+28
| | | | | | | | | | | | | | | | Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47181 llvm-svn: 333153
* AMDGPU: Fix v2f16 fneg/fabs patternMatt Arsenault2018-05-221-0/+5
| | | | | | | | | | The integer operation convertion for some reason only happens if the source is a bitcast from an integer, which happens to always be the situation when the result is loaded. Add an additional pattern for when the source operation is really an FP operation. llvm-svn: 333019
* AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLoweringTom Stellard2018-05-224-13/+14
| | | | | | | | | | | | | | Summary: This is always false for R600. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47180 llvm-svn: 333016
* AMDGPU: Make v2i16/v2f16 legal on VIMatt Arsenault2018-05-227-293/+275
| | | | | | | | | | | | This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9. Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work. llvm-svn: 332953
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headersTom Stellard2018-05-2252-51/+115
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction and register defintions, which are huge so we only want to include them where needed. This will also make it easier if we want to split the R600 and GCN definitions into separate tablegenerated files. I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too. Reviewers: arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D46272 llvm-svn: 332930
* MC: Separate creating a generic object writer from creating a target object ↵Peter Collingbourne2018-05-213-12/+10
| | | | | | | | | | | | | writer. NFCI. With this we gain a little flexibility in how the generic object writer is created. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47045 llvm-svn: 332868
* [AMDGPU] Add divergence analysis as a dependency for ISelStanislav Mekhanoshin2018-05-211-0/+1
| | | | | | | | | | AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage but does not list it in pass dependencies which may lead to crash. Differential Revision: https://reviews.llvm.org/D47151 llvm-svn: 332862
* MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an ↵Peter Collingbourne2018-05-211-6/+5
| | | | | | | | | | | | | MCObjectWriter. NFCI. To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47035 llvm-svn: 332857
* AMDGPU/GlobalISel: Address post-commit review comments for r332379Tom Stellard2018-05-211-1/+1
| | | | | | MCRegisterInfo::getPhysRegSize() will be deprecated. llvm-svn: 332856
* Fix MSVC unused variable warning. NFCI.Simon Pilgrim2018-05-191-5/+4
| | | | | | AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance. llvm-svn: 332807
* AMDGPU: Add pass to optimize reqd_work_group_sizeMatt Arsenault2018-05-184-0/+280
| | | | | | | | | | | Eliminate loads from the dispatch packet when they will have a known value. Also pattern match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used. llvm-svn: 332771
* Support: Simplify endian stream interface. NFCI.Peter Collingbourne2018-05-181-2/+2
| | | | | | | | | | | | Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47032 llvm-svn: 332757
* AMDGPU/NFC: Set symbol's type that is coming from an argument inKonstantin Zhuravlyov2018-05-181-1/+1
| | | | | | EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL. llvm-svn: 332753
* MC: Change the streamer ctors to take an object writer instead of a stream. ↵Peter Collingbourne2018-05-183-13/+21
| | | | | | | | | | | | | | NFCI. The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer. Part of PR37466. Differential Revision: https://reviews.llvm.org/D47050 llvm-svn: 332749
OpenPOWER on IntegriCloud