path: root/llvm/lib/Target/AMDGPU
Commit message (Author, Age, Files, Lines)
...
* AMDGPU/GlobalISel: RegBankSelect for carry-in (Matt Arsenault, 2019-01-08, 2 files, -2/+33)
| | | | | | | | I'm not sure we should be allowing the truncate to s1 for the inputs. It may be necessary to create a new VCC reg bank. llvm-svn: 350592
* AMDGPU/GlobalISel: RegBankSelect for add/sub with carry out (Matt Arsenault, 2019-01-08, 2 files, -5/+21)
| | | | llvm-svn: 350589
* AMDGPU/GlobalISel: InstrMapping for G_UNMERGE_VALUES (Matt Arsenault, 2019-01-08, 1 file, -0/+12)
| | | | llvm-svn: 350588
* [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. (Craig Topper, 2019-01-07, 1 file, -30/+52)
| | | | | | | | | | | | | | Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when it's not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24 to use a combination of GetDemandedBits to handle the multiple use simplifications, and then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560
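As a rough illustration of why bypassing a node is safe when its bits are not demanded: a 24-bit multiply reads only the low 24 bits of each operand, so an operation that changes only the upper 8 bits can be skipped for that user. The sketch below is a toy model in plain C++; `mulU24` and `withHighOr` are hypothetical helpers and are not LLVM APIs.

```cpp
#include <cstdint>

// Toy model of an unsigned 24-bit multiply: only the low 24 bits of each
// operand are demanded. (Assumption: this stands in for a mul_u24-style node.)
constexpr uint32_t Low24 = 0x00FFFFFFu;

uint32_t mulU24(uint32_t A, uint32_t B) {
  return (A & Low24) * (B & Low24); // unsigned arithmetic, wraps modulo 2^32
}

// An operand that reaches the multiply through an OR touching only the top
// 8 bits. Those bits are never read by mulU24, so the OR can be bypassed
// for this particular user without changing the result.
uint32_t withHighOr(uint32_t X) { return X | 0xFF000000u; }
```

Calling `mulU24(withHighOr(x), y)` and `mulU24(x, y)` gives the same value for any inputs, which is the kind of bypass-only simplification the message attributes to GetDemandedBits.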
* AMDGPU: test for uniformity of branch instruction, not its condition (Rhys Perry, 2019-01-07, 2 files, -9/+3)
| | | | | | | | | | | | | | | | | Summary: If a divergent branch instruction is marked as divergent by propagation rule 2 in DivergencePropagator::exploreSyncDependency() and its condition is uniform, that branch would incorrectly be assumed to be uniform. Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56331 llvm-svn: 350532
* AMDGPU: Remove v16i8 from register classes (Matt Arsenault, 2019-01-07, 1 file, -3/+3)
| | | | llvm-svn: 350518
* AMDGPU: Remove VS/SV mappings from select (Matt Arsenault, 2019-01-07, 1 file, -16/+0)
| | | | | | These would violate the constant bus restriction llvm-svn: 350517
* [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression. (Alexander Timofeev, 2019-01-03, 1 file, -3/+7)
| | | | | | | | | | | | | | | | | | | | | | Detailed description: SIFoldOperands::foldInstOperand iterates over the operand uses, calling a function that changes the def-use iterators on the way. As a result, the loop exits immediately when the def-use iterator is changed. Hence, the operand is folded into the very first use instruction only. This makes the VGPR live along the whole basic block and increases register pressure significantly. The performance drop observed in the SHOC DeviceMemory test is caused by this bug. Proposed fix: collect uses into a separate container for further processing in another loop. Testing: make check-llvm SHOC performance test. Reviewers: rampitec, ronlieb Differential Revision: https://reviews.llvm.org/D56161 llvm-svn: 350350
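The proposed fix follows a common pattern: snapshot the uses into a separate container, then mutate in a second loop. A minimal sketch of that pattern follows. `UseList`, `foldUse` and `foldAllUses` are hypothetical stand-ins (a `std::list<int>` of use IDs), not the MachineRegisterInfo use lists that SIFoldOperands actually walks.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical stand-in for a register's use list.
struct UseList {
  std::list<int> Uses; // IDs of instructions that still use the register
};

// Folding rewrites the use, which unlinks it from the use list. Doing this
// while iterating over UL.Uses directly would invalidate the iterator.
void foldUse(UseList &UL, int UseId) { UL.Uses.remove(UseId); }

// Collect first, mutate afterwards: the walk runs over a private copy, so
// changes to the underlying list cannot cut the loop short.
int foldAllUses(UseList &UL) {
  std::vector<int> Worklist(UL.Uses.begin(), UL.Uses.end());
  int Folded = 0;
  for (int UseId : Worklist) {
    foldUse(UL, UseId);
    ++Folded;
  }
  assert(UL.Uses.empty() && "every collected use should have been folded");
  return Folded;
}
```

With three uses in the list, `foldAllUses` visits all three rather than stopping after the first, which is the behaviour the message says was missing.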
* [AMDGPU] Change section name with metadata access (Piotr Sobczak, 2019-01-03, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | Summary: The commit rL348922 introduced a means to set Metadata section kind for a global variable, if its explicit section name was prefixed with ".AMDGPU.metadata.". This patch changes that prefix to ".AMDGPU.comment.", as "metadata" in the section name might lead to ambiguity with metadata used by AMD PAL runtime. Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668 Reviewers: kzhuravl Reviewed By: kzhuravl Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D56197 llvm-svn: 350292
* [AMDGPU] Handle OR as operand of raw load/store (Piotr Sobczak, 2019-01-02, 1 file, -4/+6)
| | | | | | | | | | | | | | | | | | Summary: Use isBaseWithConstantOffset() which handles OR as an operand to llvm.amdgcn.raw.buffer.load and llvm.amdgcn.raw.buffer.store. Change-Id: Ifefb9dc5ded8710d333df07ab1900b230e33539a Reviewers: nhaehnle, mareko, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55999 llvm-svn: 350208
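An OR can stand in for an ADD when the constant only sets bits that are known to be zero in the base, because no carries can occur. The sketch below shows that condition on plain integers. `orIsEquivalentToAdd` is a hypothetical helper written for illustration; it is an assumption that this mirrors the kind of check isBaseWithConstantOffset() performs, and it is not that function.

```cpp
#include <cstdint>

// KnownZero has a 1 for every bit of the base that is provably zero, for
// example the low bits of an aligned address. The OR behaves like an ADD
// only if the offset stays entirely inside those bits.
bool orIsEquivalentToAdd(uint32_t KnownZero, uint32_t Offset) {
  return (Offset & ~KnownZero) == 0;
}
```

For a 16-byte-aligned base (`KnownZero == 0xF`), an offset of 12 qualifies and `base | 12` equals `base + 12`; an offset of 16 does not qualify.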
* AMDGPU: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. (Changpeng Fang, 2018-12-21, 1 file, -3/+7)
| | | | | | | | | | | | | | | | | | | Summary: Don't peel off the offset if the resulting base could possibly be negative in Indirect addressing. This is because the M0 field is unsigned. This patch achieves a similar goal to https://reviews.llvm.org/D55241, but keeps the optimization if the base is known to be unsigned. Reviewers: arsemn Differential Revision: https://reviews.llvm.org/D55568 llvm-svn: 349951
* [AMDGPU] Always use the version of computeKnownBits that returns a value. NFCI. (Simon Pilgrim, 2018-12-21, 1 file, -14/+7)
| | | | | | Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349912
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.wqm.vote (Matt Arsenault, 2018-12-21, 1 file, -0/+6)
| | | | llvm-svn: 349882
* AMDGPU/GlobalISel: RegBankSelect for some fp ops (Matt Arsenault, 2018-12-21, 2 files, -0/+11)
| | | | llvm-svn: 349880
* AMDGPU/GlobalISel: Redo legality for build_vector (Matt Arsenault, 2018-12-21, 1 file, -10/+38)
| | | | | | | | | | It seems better to avoid using the callback if possible since there are coverage assertions which are disabled if this is used. Also fix missing tests. Only test the legal cases since it seems legalization for build_vector is quite lacking. llvm-svn: 349878
* AMDGPU: Make i1/i64/v2i32 and/or/xor legal (Matt Arsenault, 2018-12-20, 2 files, -6/+19)
| | | | | | | The 64-bit types do depend on the register bank, but that's another issue to deal with later. llvm-svn: 349716
* AMDGPU/GlobalISel: Fix ValueMapping tables for i1 (Matt Arsenault, 2018-12-20, 2 files, -26/+41)
| | | | | | | This was incorrectly selecting SGPR for any i1 values, e.g. G_TRUNC to i1 from a VGPR was still an SGPR. llvm-svn: 349715
* AMDGPU/GlobalISel: RegBankSelect for fp conversions (Matt Arsenault, 2018-12-20, 2 files, -0/+9)
| | | | llvm-svn: 349709
* AMDGPU/GlobalISel: Legality/regbankselect for atomicrmw/atomic_cmpxchg (Matt Arsenault, 2018-12-20, 3 files, -0/+41)
| | | | llvm-svn: 349708
* AMDGPU: Add patterns for v4i16/v4f16 -> v4i16/v4f16 bitcasts (Rhys Perry, 2018-12-19, 1 file, -0/+2)
| | | | | | | | | | | | Reviewers: arsenm, tstellar Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55058 llvm-svn: 349694
* AMDGPU: Use an ABS32_LO relocation for SCRATCH_RSRC_DWORD1 (Nicolai Haehnle, 2018-12-19, 1 file, -4/+2)
| | | | | | | | | | | | | | | | | | | Summary: Using HI here makes no logical sense, since the dword is only 32 bits to begin with. Current Mesa master does not look at the relocation type at all, so this change is fine. Future Mesa will rely on this, however. Change-Id: I91085707834c4ac0370926602b93c94b90e44cb1 Reviewers: arsenm, rampitec, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55369 llvm-svn: 349620
* AMDGPU/InsertWaitcnts: Update VGPR/SGPR bounds when brackets are merged (Carl Ritson, 2018-12-19, 1 file, -0/+3)
| | | | | | | | | | | | | | | | | | Summary: Fix an issue where VGPR/SGPR bounds are not properly extended when brackets are merged. This manifests as missing waitcnt insertions when multiple brackets are forwarded to a successor block and the first forward has lower VGPR/SGPR bounds. Irreducible loop test has been extended based on a CTS failure detected for GFX9. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D55602 llvm-svn: 349611
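One way to picture the fix: when two brackets are merged, the tracked upper bounds need to become the maximum of both sides, otherwise registers above the first bracket's bound are never scanned for pending events. The sketch below is a simplified model under that assumption. `Bracket` and `mergeBounds` are hypothetical names and do not reflect the actual data structures in the waitcnt insertion pass.

```cpp
#include <algorithm>

// Hypothetical, simplified bracket: the highest VGPR and SGPR index that
// may still have a pending event.
struct Bracket {
  int VgprUB = 0;
  int SgprUB = 0;
};

// Extend the bounds of Into so that they also cover everything in Incoming.
void mergeBounds(Bracket &Into, const Bracket &Incoming) {
  Into.VgprUB = std::max(Into.VgprUB, Incoming.VgprUB);
  Into.SgprUB = std::max(Into.SgprUB, Incoming.SgprUB);
}
```

If the first forwarded bracket has a VGPR bound of 2 and a later one has a bound of 5, the merged bracket must end up with 5. Keeping 2 would correspond to the missing waitcnt insertions described in the summary.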
* AMDGPU/GlobalISel: Regbankselect for fsub (Matt Arsenault, 2018-12-19, 1 file, -0/+1)
| | | | llvm-svn: 349608
* [AMDGPU] Removed the unnecessary operand size-check-assert from processBaseWithConstOffset(). (Farhana Aleen, 2018-12-18, 1 file, -2/+0)
| | | | | | | | | | Summary: 32bit operand sizes are guaranteed by the opcode check AMDGPU::V_ADD_I32_e64 and AMDGPU::V_ADDC_U32_e64. Therefore, we don't need any additional operand size-check-assert. Author: FarhanaAleen llvm-svn: 349529
* AMDGPU: Legalize/regbankselect frame_index (Matt Arsenault, 2018-12-18, 2 files, -0/+3)
| | | | llvm-svn: 349468
* AMDGPU: Legalize/regbankselect fma (Matt Arsenault, 2018-12-18, 2 files, -1/+2)
| | | | llvm-svn: 349467
* AMDGPU/GlobalISel: Legalize/regbankselect fneg/fabs/fsub (Matt Arsenault, 2018-12-18, 2 files, -2/+10)
| | | | llvm-svn: 349463
* Fix -Wunused-variable warning. NFCI. (Simon Pilgrim, 2018-12-15, 1 file, -0/+4)
| | | | llvm-svn: 349265
* [SILoadStoreOptimizer] Use std::abs to avoid truncation. (Florian Hahn, 2018-12-15, 1 file, -2/+2)
| Using regular abs() causes the following warning error:
|
|   absolute value function 'abs' given an argument of type 'int64_t' (aka 'long') but has parameter of type 'int' which may cause truncation of value [-Werror,-Wabsolute-value]
|       (uint32_t)abs(Dist) > MaxDist) {
|                 ^
|   lib/Target/AMDGPU/SILoadStoreOptimizer.cpp:1369:19: note: use function 'std::abs' instead
|
| which causes a bot to fail:
| http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/18284/steps/bootstrap%20clang/logs/stdio
|
| llvm-svn: 349224
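The difference matters because C's `abs` takes an `int`, so a 64-bit argument is narrowed before the absolute value is taken, while `std::abs` from `<cstdlib>` has overloads for `long` and `long long`. A minimal sketch follows. `exceedsMaxDist` is a hypothetical helper, and it widens to `uint64_t` for the comparison, which is a choice made here for illustration and differs from the `(uint32_t)` cast quoted in the warning.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Returns true when |Dist| is larger than MaxDist.
// Precondition: Dist is not INT64_MIN, whose absolute value is not
// representable in int64_t.
bool exceedsMaxDist(int64_t Dist, uint32_t MaxDist) {
  assert(Dist != std::numeric_limits<int64_t>::min());
  // std::abs selects the 64-bit overload, so no bits are dropped.
  return static_cast<uint64_t>(std::abs(Dist)) > MaxDist;
}
```

A distance of `int64_t{1} << 32` shows the hazard: narrowed to a 32-bit `int` it typically becomes 0 and would look like it is within range, whereas the 64-bit overload reports it as exceeding the limit.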
* [AMDGPU] Promote constant offset to the immediate by finding a new base with 13bit constant offset from the nearby instructions. (Farhana Aleen, 2018-12-14, 2 files, -1/+362)
| Summary: Promote constant offset to immediate by recomputing the relative 13bit offset from nearby instructions. E.g.
|
|   s_movk_i32 s0, 0x1800
|   v_add_co_u32_e32 v0, vcc, s0, v2
|   v_addc_co_u32_e32 v1, vcc, 0, v6, vcc
|
|   s_movk_i32 s0, 0x1000
|   v_add_co_u32_e32 v5, vcc, s0, v2
|   v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
|   global_load_dwordx2 v[5:6], v[5:6], off
|   global_load_dwordx2 v[0:1], v[0:1], off
| =>
|   s_movk_i32 s0, 0x1000
|   v_add_co_u32_e32 v5, vcc, s0, v2
|   v_addc_co_u32_e32 v6, vcc, 0, v6, vcc
|   global_load_dwordx2 v[5:6], v[5:6], off
|   global_load_dwordx2 v[0:1], v[5:6], off offset:2048
|
| Author: FarhanaAleen
| Reviewed By: arsenm, rampitec
| Subscribers: llvm-commits, AMDGPU
| Differential Revision: https://reviews.llvm.org/D55539
| llvm-svn: 349196
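The arithmetic behind the example can be checked directly: the two addresses differ by 0x1800 - 0x1000 = 0x800 = 2048, which is small enough to encode as an immediate. The sketch below assumes the 13-bit field is signed (range -4096 to 4095). `fitsSigned13` and `relativeOffset` are hypothetical helpers, not functions from the load/store optimizer.

```cpp
#include <cstdint>

// Assumption: the immediate offset field is a signed 13-bit value.
constexpr bool fitsSigned13(int64_t V) { return V >= -4096 && V <= 4095; }

// Offset of Addr relative to an already-computed anchor address.
constexpr int64_t relativeOffset(int64_t Addr, int64_t Anchor) {
  return Addr - Anchor;
}
```

Here `relativeOffset(0x1800, 0x1000)` is 2048 and passes `fitsSigned13`, while the absolute constant 0x1800 (6144) would not, which is why re-basing on the nearby 0x1000 address lets the second load use `offset:2048`.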
* Revert r348971: [AMDGPU] Support for "uniform-work-group-size" attribute (Aakanksha Patil, 2018-12-13, 2 files, -65/+7)
| | | | | | This patch breaks RADV (and probably RadeonSI as well) llvm-svn: 349084
* AMDGPU/GlobalISel: Legalize/regbankselect block_addr (Matt Arsenault, 2018-12-13, 2 files, -1/+6)
| | | | llvm-svn: 349081
* AMDGPU/GlobalISel: Legalize f64 fadd/fmul (Matt Arsenault, 2018-12-13, 1 file, -3/+3)
| | | | llvm-svn: 349014
* AMDGPU/GlobalISel: RegBankSelect some simple operations (Matt Arsenault, 2018-12-13, 2 files, -2/+29)
| | | | llvm-svn: 349012
* [AMDGPU] Fix build failure, second attempt (Stanislav Mekhanoshin, 2018-12-13, 1 file, -1/+1)
| | | | | | | Some compilers complain that variable is captured and some complain when it is not. Switch to [&]. llvm-svn: 349006
* [AMDGPU] Fix build failure (Stanislav Mekhanoshin, 2018-12-13, 1 file, -1/+1)
| | | | | | | Fixed error 'lambda capture 'CondReg' is not required to be captured for this use'. llvm-svn: 349005
* [AMDGPU] Simplify negated condition (Stanislav Mekhanoshin, 2018-12-13, 3 files, -0/+187)
| Optimize sequence:
|
|   %sel = V_CNDMASK_B32_e64 0, 1, %cc
|   %cmp = V_CMP_NE_U32 1, %1
|   $vcc = S_AND_B64 $exec, %cmp
|   S_CBRANCH_VCC[N]Z
| =>
|   $vcc = S_ANDN2_B64 $exec, %cc
|   S_CBRANCH_VCC[N]Z
|
| It is the negation pattern inserted by DAGCombiner::visitBRCOND() in the rebuildSetCC().
|
| Differential Revision: https://reviews.llvm.org/D55402
| llvm-svn: 349003
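The equivalence can be checked on plain 64-bit lane masks: selecting 1 where `cc` is set, comparing the result against 1 for inequality, and ANDing with `exec` is the same as `exec & ~cc`. The sketch below models each step per lane. `vccOriginal` and `vccOptimized` are hypothetical names, the compare is assumed to read the `%sel` value, and the model ignores the fact that vector instructions only execute on active lanes (the final AND with `exec` makes inactive lanes irrelevant here).

```cpp
#include <cstdint>

// Original sequence, evaluated lane by lane.
uint64_t vccOriginal(uint64_t Exec, uint64_t CC) {
  uint64_t Cmp = 0;
  for (unsigned Lane = 0; Lane < 64; ++Lane) {
    uint32_t Sel = ((CC >> Lane) & 1) ? 1u : 0u; // V_CNDMASK_B32 0, 1, cc
    if (Sel != 1u)                               // V_CMP_NE_U32 1, sel
      Cmp |= uint64_t{1} << Lane;
  }
  return Exec & Cmp; // S_AND_B64 exec, cmp
}

// Optimized form: a single and-not.
uint64_t vccOptimized(uint64_t Exec, uint64_t CC) {
  return Exec & ~CC; // S_ANDN2_B64 exec, cc
}
```

For `Exec = 0xF0F0` and `CC = 0xFF00`, both functions return `0x00F0`.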
* [AMDGPU] Support for "uniform-work-group-size" attribute (Aakanksha Patil, 2018-12-12, 2 files, -7/+65)
| | | | | | | | Updated the annotate-kernel-features pass to support the propagation of uniform-work-group attribute from the kernel to the called functions. Once this pass is run, all kernels, even the ones which initially did not have the attribute, will be able to indicate whether or not they have uniform work group size depending on the value of the attribute. Differential Revision: https://reviews.llvm.org/D50200 llvm-svn: 348971
* [AMDGPU] Emit MessagePack HSA Metadata for v3 code object (Scott Linder, 2018-12-12, 10 files, -142/+831)
| | | | | | | | | Continue to present HSA metadata as YAML in ASM and when output by tools (e.g. llvm-readobj), but encode it in MessagePack in the code object. Differential Revision: https://reviews.llvm.org/D48179 llvm-svn: 348963
* [AMDGPU] Extend the SI Load/Store optimizer to combine more things. (Neil Henning, 2018-12-12, 4 files, -238/+544)
| | | | | | | | | | I've extended the load/store optimizer to be able to produce dwordx3 loads and stores. This change allows many more load/stores to be combined, and results in much more optimal code for our hardware. Differential Revision: https://reviews.llvm.org/D54042 llvm-svn: 348937
* [AMDGPU] Set metadata access for explicit section (Piotr Sobczak, 2018-12-12, 2 files, -0/+12)
| | | | | | | | | | | | | | | | | | | Summary: This patch provides a means to set Metadata section kind for a global variable, if its explicit section name is prefixed with ".AMDGPU.metadata." This could be useful to make the global variable go to an ELF section without any section flags set. Reviewers: dstuttard, tpr, kzhuravl, nhaehnle, t-tye Reviewed By: dstuttard, kzhuravl Subscribers: llvm-commits, arsenm, jvesely, wdng, yaxunl, t-tye Differential Revision: https://reviews.llvm.org/D55267 llvm-svn: 348922
* [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes. (Amara Emerson, 2018-12-10, 1 file, -0/+8)
| | | | | | | | | | | | This patch restricts the capability of G_MERGE_VALUES, and uses the new G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places. This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32> and <2 x s64> vectors. Differential Revisions: https://reviews.llvm.org/D53629 llvm-svn: 348788
* [AMDGPU] Change the l1 flush instruction for AMDPAL/MESA3D. (Neil Henning, 2018-12-10, 1 file, -1/+7)
| | | | | | | | | | This commit changes which l1 flush instruction is used for AMDPAL and MESA3D workloads to flush the entire l1 cache instead of just the volatile lines. Differential Revision: https://reviews.llvm.org/D55367 llvm-svn: 348771
* [AMDGPU] Add new Mode Register pass - minor fix (Tim Corringham, 2018-12-10, 1 file, -1/+1)
| | | | | | | Trivial change to add parentheses to an expression to avoid a sanitizer error in SIModeRegister.cpp, which was committed earlier. llvm-svn: 348767
* [AMDGPU] Add new Mode Register pass (Tim Corringham, 2018-12-10, 11 files, -11/+487)
| | | | | | | | | | | | | | | A new pass to manage the Mode register. Currently this just manages the floating point double precision rounding requirements, but is intended to be easily extended to encompass all Mode register settings. The immediate motivation comes from the requirement to use the round-to-zero rounding mode for the 16 bit interpolation instructions, where the rounding mode setting is shared between 16 and 64 bit operations. llvm-svn: 348754
* [AMDGPU] Fix discarded result of addAttribute (Brian Gesiak, 2018-12-09, 1 file, -2/+4)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: `llvm::AttributeList` and `llvm::AttributeSet` are immutable, and so methods defined on these classes, such as `addAttribute`, return a new immutable object with the attribute added. In https://reviews.llvm.org/D55217 I attempted to annotate methods such as `addAttribute` with `LLVM_NODISCARD`, since calling these methods has no side-effects, and so ignoring the result that is returned is almost certainly a programmer error. However, committing the change resulted in new warnings in the AMDGPU target. The AMDGPU simplify libcalls pass added in https://reviews.llvm.org/D36436 attempts to add the readonly and nounwind attributes to simplified library functions, but instead calls the `addAttribute` methods and ignores the result. Modify the simplify libcalls pass to actually add the nounwind and readonly attributes. Also update the simplify libcalls test to assert that these attributes are actually being set. Reviewers: rampitec, vpykhtin, rnk Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D55435 llvm-svn: 348732
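The bug class described here is easy to reproduce with any immutable value type: a method that returns a modified copy does nothing if the caller drops the result. The sketch below uses a toy `AttrSet` class to show the pattern and the fix. It is a hypothetical stand-in and not `llvm::AttributeList`. The C++17 `[[nodiscard]]` attribute plays the role the summary assigns to `LLVM_NODISCARD`.

```cpp
#include <set>
#include <string>

// Toy immutable attribute set: add() never modifies *this.
class AttrSet {
  std::set<std::string> Attrs;

public:
  // Returns a new set with Name added. Ignoring the result is almost
  // certainly a mistake, hence [[nodiscard]].
  [[nodiscard]] AttrSet add(const std::string &Name) const {
    AttrSet Copy = *this;
    Copy.Attrs.insert(Name);
    return Copy;
  }

  bool has(const std::string &Name) const { return Attrs.count(Name) != 0; }
};

// Correct usage: reassign the result of every add() call.
AttrSet markReadOnlyNoUnwind(AttrSet AS) {
  AS = AS.add("readonly");
  AS = AS.add("nounwind");
  return AS;
}
```

Writing `AS.add("readonly");` as a bare statement would leave `AS` unchanged and, with `[[nodiscard]]`, triggers a compiler warning, which is how the discarded results in the simplify libcalls pass came to light according to the summary.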
* AMDGPU: Fix offsets for < 4-byte aggregate kernel arguments (Matt Arsenault, 2018-12-07, 1 file, -4/+7)
| | | | | | | | We were still using the rounded down offset and alignment even though they aren't handled because you can't trivially bitcast the loaded value. llvm-svn: 348658
* Fix unused variable warning. NFCI. (Simon Pilgrim, 2018-12-07, 1 file, -2/+2)
| | | | llvm-svn: 348649
* AMDGPU: Allow f32 types for llvm.amdgcn.s.buffer.load (Matt Arsenault, 2018-12-07, 2 files, -5/+12)
| | | | llvm-svn: 348625
* AMDGPU: Remove llvm.SI.tbuffer.store (Matt Arsenault, 2018-12-07, 2 files, -67/+0)
| | | | llvm-svn: 348619