path: root/llvm/lib/Target/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as | Konstantin Zhuravlyov | 2018-05-29 | 1 | -1/+2
  it is set by CP
  Differential Revision: https://reviews.llvm.org/D47392
  llvm-svn: 333451
* AMDGPU: Pass function directly instead of MachineFunction | Matt Arsenault | 2018-05-29 | 8 | -36/+37
  These functions just query the underlying IR function, so pass it directly.
  llvm-svn: 333442
* AMDGPU: Add nuw to add off of kernarg ptr | Matt Arsenault | 2018-05-29 | 1 | -2/+1
  llvm-svn: 333441
* AMDGPU: Split R600 MCInst lowering into its own class | Tom Stellard | 2018-05-29 | 1 | -5/+29
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D47307
  llvm-svn: 333439
* [AMDGPU] Fixed build warning | Tim Renouf | 2018-05-29 | 1 | -4/+3
  Summary: V2: Use cast instead of extra if.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47426
  Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a
  llvm-svn: 333397
* [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default. | Farhana Aleen | 2018-05-28 | 1 | -5/+3
  Summary: The bug reported at https://bugs.freedesktop.org/show_bug.cgi?id=105464 was found to be resolved by some other fixes.
  Author: FarhanaAleen
  llvm-svn: 333380
* [AMDGPU] Fixed WWM bug in block otherwise entirely in WQM | Tim Renouf | 2018-05-27 | 1 | -0/+5
  Summary: For a block with WQM on entry and exit and containing no exact mode code, but containing some WWM code, the WQM pass forgot to process the block at all and so did not insert code to enter and leave WWM. This commit fixes that.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47027
  Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6
  llvm-svn: 333362
* [AMDGPU][Waitcnt] Remove obsolete waitcnt option | Mark Searles | 2018-05-25 | 1 | -6/+0
  With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.
  Differential Revision: https://reviews.llvm.org/D47378
  llvm-svn: 333303
* [AMDGPU] Fixed test failure with AMDGPUPerfHint | Stanislav Mekhanoshin | 2018-05-25 | 1 | -8/+7
  We must not keep an iterator to a map while the map is modified; this leads to a broken map.
  llvm-svn: 333298
* Fix -Winconsistent-missing-overrides in AMDGPU code | Reid Kleckner | 2018-05-25 | 1 | -1/+1
  llvm-svn: 333291
* [AMDGPU] Add perf hints to functions | Stanislav Mekhanoshin | 2018-05-25 | 10 | -6/+516
  This is an adoption of the HSAIL perfhint pass. Two types of hints are produced:
  1. Function is memory bound.
  2. Kernel can use wave limiter.
  Currently these hints are used in the scheduler. If a function is suspected to be memory bound, we allow occupancy to decrease to 4 waves in the course of scheduling.
  Differential Revision: https://reviews.llvm.org/D46992
  llvm-svn: 333289
* [AMDGPU] Fixed incorrect break from loop | Tim Renouf | 2018-05-25 | 1 | -2/+40
  Summary: Lower control flow did not correctly handle the case that a loop break in if/else was on a condition that was not guaranteed to be masked by exec. The first test kernel shows an example of this going wrong; after exiting the loop, exec is all ones, even if it was not before the loop.
  The fix is for lowering of if-break and else-break to insert an S_AND_B64 to mask the break condition with exec. This commit also includes the optimization of not inserting that S_AND_B64 if it is obviously not needed because the break condition is the result of a V_CMP in the same basic block.
  V2: Addressed some review comments.
  V3: Test fixes.
  Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44046
  Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c
  llvm-svn: 333258
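The exec-masking fix described in the entry above can be illustrated with a small bitmask model. This is only a sketch under simplifying assumptions: lanes are bits of a Python integer, a single iteration is modeled in which every active lane breaks, and the function name `run_loop_once` and its parameters are hypothetical, not taken from the LLVM sources.

```python
def run_loop_once(exec_before, cond, mask_with_exec):
    """Model one loop iteration in which all active lanes take the break.

    exec_before:    bitmask of lanes active on loop entry.
    cond:           raw break condition; it may have bits set for inactive
                    lanes if nothing guarantees it is masked by exec.
    mask_with_exec: True models the fix (the extra S_AND_B64 with exec).
    """
    # With the fix, the break condition is ANDed with exec first.
    break_cond = (cond & exec_before) if mask_with_exec else cond
    # Accumulate the lanes that have left the loop (starts empty).
    break_mask = 0 | break_cond
    # Lanes that are still iterating after the break.
    exec_in_loop = exec_before & ~break_mask
    # On loop exit, the lanes recorded in the break mask are re-enabled.
    return exec_in_loop | break_mask
```

With `exec_before = 0b0011` and an unmasked `cond = 0b1111`, the unfixed model returns `0b1111` (lanes that were never active become enabled), while the masked variant returns `0b0011`, matching the exec value before the loop.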
* AMDGPU: Remove AMDGPUMCInstLower.h | Tom Stellard | 2018-05-25 | 3 | -48/+23
  Summary: The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp, so we don't need a header file.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47264
  llvm-svn: 333254
* AMDGPU: Split R600 AsmPrinter code into its own class | Tom Stellard | 2018-05-24 | 6 | -161/+303
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D47245
  llvm-svn: 333219
* AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP | Tom Stellard | 2018-05-24 | 8 | -56/+28
  Summary: We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp is gone.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47181
  llvm-svn: 333153
* AMDGPU: Fix v2f16 fneg/fabs pattern | Matt Arsenault | 2018-05-22 | 1 | -0/+5
  The integer operation conversion for some reason only happens if the source is a bitcast from an integer, which happens to always be the situation when the result is loaded. Add an additional pattern for when the source operation is really an FP operation.
  llvm-svn: 333019
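As background for the entry above, fneg and fabs on a packed pair of half-precision values can be expressed as integer bit operations on the 32-bit container, because each only touches the sign bit of each 16-bit half. The sketch below shows that equivalence; the helper names are hypothetical and the code is not the TableGen pattern from the commit.

```python
import struct

# Sign bit of each 16-bit half inside the 32-bit container.
SIGN_MASK = 0x80008000

def pack_v2f16(lo, hi):
    """Pack two floats as IEEE half values into one 32-bit integer."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]

def unpack_v2f16(bits):
    """Unpack a 32-bit integer into its two half-precision values."""
    return struct.unpack("<2e", struct.pack("<I", bits))

def fneg_v2f16(bits):
    """Negate both halves by flipping their sign bits (integer XOR)."""
    return bits ^ SIGN_MASK

def fabs_v2f16(bits):
    """Take the absolute value of both halves by clearing the sign bits (integer AND)."""
    return bits & ~SIGN_MASK & 0xFFFFFFFF
```

For example, `unpack_v2f16(fneg_v2f16(pack_v2f16(1.5, -2.0)))` yields the pair with both signs flipped, without any floating-point arithmetic being performed.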
* AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering | Tom Stellard | 2018-05-22 | 4 | -13/+14
  Summary: This is always false for R600.
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47180
  llvm-svn: 333016
* AMDGPU: Make v2i16/v2f16 legal on VI | Matt Arsenault | 2018-05-22 | 7 | -293/+275
  This usually results in better code. Fixes using inline asm with short2, and also fixes having a different ABI for function parameters between VI and gfx9.
  Partially cleans up the mess used for lowering of the d16 operations. Making v4f16 legal will help clean this up more, but this requires additional work.
  llvm-svn: 332953
* AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers | Tom Stellard | 2018-05-22 | 52 | -51/+115
  Summary: MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instruction and register definitions, which are huge, so we only want to include them where needed.
  This will also make it easier if we want to split the R600 and GCN definitions into separate TableGen-generated files.
  I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h because it uses some enums from the header to initialize default values for the SIMachineFunction class, so I ended up having to remove includes of SIMachineFunctionInfo.h from headers too.
  Reviewers: arsenm, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46272
  llvm-svn: 332930
* MC: Separate creating a generic object writer from creating a target object writer. NFCI. | Peter Collingbourne | 2018-05-21 | 3 | -12/+10
  With this we gain a little flexibility in how the generic object writer is created.
  Part of PR37466.
  Differential Revision: https://reviews.llvm.org/D47045
  llvm-svn: 332868
* [AMDGPU] Add divergence analysis as a dependency for ISel | Stanislav Mekhanoshin | 2018-05-21 | 1 | -0/+1
  AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage but does not list it in the pass dependencies, which may lead to a crash.
  Differential Revision: https://reviews.llvm.org/D47151
  llvm-svn: 332862
* MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI. | Peter Collingbourne | 2018-05-21 | 1 | -6/+5
  To make this work I needed to add an endianness field to MCAsmBackend so that writeNopData() implementations know which endianness to use.
  Part of PR37466.
  Differential Revision: https://reviews.llvm.org/D47035
  llvm-svn: 332857
* AMDGPU/GlobalISel: Address post-commit review comments for r332379 | Tom Stellard | 2018-05-21 | 1 | -1/+1
  MCRegisterInfo::getPhysRegSize() will be deprecated.
  llvm-svn: 332856
* Fix MSVC unused variable warning. NFCI. | Simon Pilgrim | 2018-05-19 | 1 | -5/+4
  AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance.
  llvm-svn: 332807
* AMDGPU: Add pass to optimize reqd_work_group_size | Matt Arsenault | 2018-05-18 | 4 | -0/+280
  Eliminate loads from the dispatch packet when they will have a known value. Also pattern match the code used by the library to handle partial workgroup dispatches, which isn't necessary if reqd_work_group_size is used.
  llvm-svn: 332771
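The entry above does not quote the library code being matched, so the following is only a plausible sketch of the arithmetic involved. It assumes the library derives the local size as the smaller of the group size and the work remaining in the grid, and that with reqd_work_group_size the grid size is a multiple of the group size. Both function names are hypothetical.

```python
def local_size_generic(grid_size, group_size, group_id):
    """Local size with partial-workgroup handling (assumed library shape).

    The last workgroup may be partial when grid_size is not a multiple
    of group_size, so the remaining work is clamped to group_size.
    """
    remaining = grid_size - group_id * group_size
    return remaining if remaining < group_size else group_size

def local_size_with_reqd(reqd_group_size):
    """With a required workgroup size there is no partial group to handle,
    so the result is a compile-time constant and no dispatch-packet load
    is needed (assumption stated in the lead-in)."""
    return reqd_group_size
```

Under the stated assumption, `local_size_generic(256, 64, g)` equals `local_size_with_reqd(64)` for every valid group id `g`, which is why the clamp can be folded away.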
* Support: Simplify endian stream interface. NFCI. | Peter Collingbourne | 2018-05-18 | 1 | -2/+2
  Provide some free functions to reduce verbosity of endian-writing a single value, and replace the endianness template parameter with a field.
  Part of PR37466.
  Differential Revision: https://reviews.llvm.org/D47032
  llvm-svn: 332757
* AMDGPU/NFC: Set symbol's type that is coming from an argument in | Konstantin Zhuravlyov | 2018-05-18 | 1 | -1/+1
  EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL.
  llvm-svn: 332753
* MC: Change the streamer ctors to take an object writer instead of a stream. NFCI. | Peter Collingbourne | 2018-05-18 | 3 | -13/+21
  The idea is that a client that wants split dwarf would create a specific kind of object writer that creates two files, and use it to create the streamer.
  Part of PR37466.
  Differential Revision: https://reviews.llvm.org/D47050
  llvm-svn: 332749
* AMDGPU/SI: Don't promote alloca to vector for atomic load/store | Changpeng Fang | 2018-05-17 | 1 | -3/+5
  Summary: Don't promote alloca to vector for atomic load/store
  Reviewer: arsenm
  Differential Revision: https://reviews.llvm.org/D46085
  llvm-svn: 332673
* AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops. | Changpeng Fang | 2018-05-17 | 1 | -0/+32
  Summary: The current StructurizeCFG pass only works for a CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks into one exit block for the Structurizer to work. However, an infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working.
  In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops. This allows a CFG with infinite loops to be structurized.
  Reviewer: nhaehnle
  Differential Revision: https://reviews.llvm.org/D46340
  llvm-svn: 332625
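The dummy-edge idea from the entry above can be sketched on a toy CFG. This is a simplified illustration, not the pass itself: the CFG is a plain dictionary from block name to successor list, blocks are visited in sorted order, and the names `can_reach` and `add_dummy_exit_edges` are hypothetical.

```python
def can_reach(cfg, start, target):
    """Return True if target is reachable from start (depth-first search)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(cfg.get(node, []))
    return False

def add_dummy_exit_edges(cfg, ret):
    """Give every block that cannot reach the return block a dummy edge to it.

    Returns a new CFG plus the list of blocks that received a dummy edge.
    Adding one edge inside a loop makes the rest of that loop reach the
    return block as well, so later blocks of the same loop are skipped.
    """
    result = {block: list(succs) for block, succs in cfg.items()}
    result.setdefault(ret, [])
    added = []
    for block in sorted(result):
        if not can_reach(result, block, ret):
            result[block].append(ret)
            added.append(block)
    return result, added
```

For a CFG such as `{"entry": ["loop", "ret"], "loop": ["loop"], "ret": []}`, only `loop` receives a dummy edge, after which every block has a path to the single exit.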
* AMDGPU : Recalculate SGPRs when trap handler is supported | Konstantin Zhuravlyov | 2018-05-16 | 2 | -6/+11
  Differential Revision: https://reviews.llvm.org/D29911
  llvm-svn: 332523
* [AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution. | Tony Tye | 2018-05-16 | 2 | -34/+34
  No longer require the queue pointer to be passed in in fixed SGPRs.
  Differential Revision: https://reviews.llvm.org/D46769
  llvm-svn: 332485
* AMDGPU: Custom lower v4i16/v4f16 vector operations | Matt Arsenault | 2018-05-16 | 4 | -19/+124
  Avoids stack access. Also handle extract hi elt pattern from truncate + shift to avoid a couple test regressions.
  llvm-svn: 332453
* [AMDGPU] Fix handling of void types in isLegalAddressingMode | Stanislav Mekhanoshin | 2018-05-15 | 1 | -1/+1
  It is legal for the type passed to isLegalAddressingMode to be unsized or, more specifically, VoidTy. In this case, we must check the legality of load / stores for all legal types. Directly trying to call getTypeStoreSize is incorrect, and leads to breakage in e.g. Loop Strength Reduction. This change guards against that behaviour.
  Differential Revision: https://reviews.llvm.org/D40405
  llvm-svn: 332409
* AMDGPU: Fix v_dot{4, 8}* instruction encoding | Konstantin Zhuravlyov | 2018-05-15 | 2 | -8/+13
  Differential Revision: https://reviews.llvm.org/D46848
  llvm-svn: 332387
* AMDGPU/GlobalISel: Implement select() for G_FCONSTANT | Tom Stellard | 2018-05-15 | 1 | -15/+47
  Summary: Also clean up G_CONSTANT selection.
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D46170
  llvm-svn: 332379
* AMDGPU: Add disasm tests for deep learning instructions + fix v_fmac_f32 disasm | Konstantin Zhuravlyov | 2018-05-15 | 1 | -1/+2
  Differential Revision: https://reviews.llvm.org/D46853
  llvm-svn: 332377
* Rename DEBUG macro to LLVM_DEBUG. | Nicola Zaghen | 2018-05-14 | 29 | -699/+649
  The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows:
  - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
  - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
  - Manual change to APInt
  - Manually change DOCS as regex doesn't match it.
  In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one.
  Differential Revision: https://reviews.llvm.org/D43624
  llvm-svn: 332240
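The effect of the sed expression quoted in the entry above can be reproduced with Python's `re` module. This is a sketch for illustration: the GNU sed form `\s\?` is written as `\s?` in Python, the opening parenthesis is escaped, and the function name `rename_debug_macro` is hypothetical.

```python
import re

# Matches the identifier DEBUG at a word boundary, an optional single
# whitespace character, and an opening parenthesis.
DEBUG_CALL = re.compile(r"\bDEBUG\s?\(")

def rename_debug_macro(source):
    """Rewrite DEBUG( and DEBUG ( invocations to LLVM_DEBUG(."""
    return DEBUG_CALL.sub("LLVM_DEBUG(", source)
```

The word boundary keeps the pattern from touching names such as `NDEBUG(` or an existing `LLVM_DEBUG(`, and the required parenthesis leaves identifiers like `DEBUG_TYPE` alone.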
* AMDGPU: Rename OpenCL lowering pass to be R600 specific. | Matt Arsenault | 2018-05-13 | 4 | -10/+11
  This pass is a) broken. b) r600 specific.
  Fixing (a) is a bit more non-trivial, but fixing (b) is easy. Move this pass to being R600 only for now.
  This pass does pass all the unit tests, however clang no longer generates code that looks like the unit test input, so fixing the pass requires fixing the tests and the pass as one, and checking it works with clang still.
  Patch by Dave Airlie
  llvm-svn: 332196
* AMDGPU: Make undef legal for v2i16/v2f16 | Matt Arsenault | 2018-05-13 | 1 | -0/+3
  This is apparently necessary to stop undef from being turned into a build_vector of 0s.
  llvm-svn: 332195
* [AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler | Stanislav Mekhanoshin | 2018-05-12 | 2 | -3/+7
  We cannot query this attribute from a subtarget given a machine function. At this point the attribute itself is already unavailable and can only be obtained through MFI.
  Differential Revision: https://reviews.llvm.org/D46781
  llvm-svn: 332166
* AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE | Tom Stellard | 2018-05-11 | 2 | -1/+28
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D46153
  llvm-svn: 332154
* AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction. | Changpeng Fang | 2018-05-11 | 1 | -1/+0
  Summary: We have no logic to promote alloca to vector for an AddrSpaceCast instruction.
  Reviewer: arsenm
  Differential Revision: https://reviews.llvm.org/D45993
  llvm-svn: 332147
* [AMDGPU] Fix compilation failure when IR contains comdat | Yaxun Liu | 2018-05-11 | 1 | -2/+0
  Remove a useless SwitchSection which also causes compilation failure when IR contains comdat.
  The SwitchSection is useless because the current section is already the correct text section for the function, so there is no need to switch.
  It causes compilation failure for comdat because functions with comdat have a specific text section, not the default .text section.
  Since HIP uses comdat, this bug caused failures for HIP.
  Differential Revision: https://reviews.llvm.org/D46770
  llvm-svn: 332137
* AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI | Tom Stellard | 2018-05-11 | 3 | -0/+18
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45883
  llvm-svn: 332082
* AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16> | Tom Stellard | 2018-05-10 | 2 | -0/+21
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45881
  llvm-svn: 332042
* AMDGPU/GlobalISel: Enable TableGen'd instruction selector | Tom Stellard | 2018-05-10 | 7 | -4/+132
  Reviewers: arsenm, nhaehnle
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45994
  llvm-svn: 332039
* [AMDGPU] Support horizontal vectorization of min/max. | Farhana Aleen | 2018-05-09 | 3 | -1/+26
  Author: FarhanaAleen
  Reviewed By: rampitec
  Subscribers: AMDGPU
  Differential Revision: https://reviews.llvm.org/D46604
  llvm-svn: 331920
* AMDGPU: Ignore any_extend in mul24 combine | Matt Arsenault | 2018-05-09 | 1 | -0/+11
  If a multiply is truncated, SimplifyDemandedBits sometimes turns a zero_extend of the inputs into an any_extend, which makes the known bits computation unhelpful. Ignore these and compute known bits for the underlying value, since we insert the correct extend type after.
  llvm-svn: 331919
* AMDGPU: Handle partial shift reduction for variable shifts | Matt Arsenault | 2018-05-09 | 1 | -15/+22
  If the variable shift amount has known bits, we can still reduce the shift.
  llvm-svn: 331917