summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Avoid folding 2 constant operands into an SALU operationDavid Stuttard2019-12-041-0/+71
| | | | | | | | | | | | | | | Summary: Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2 constant operands into the same SALU operation, with neither operand able to be encoded as an inline constant. Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70896
* AMDGPU/GlobalISel: Add AGPR bank and RegBankSelect mfma intrinsicsAustin Kerbow2019-12-011-0/+943
| | | | Differential Revision: https://reviews.llvm.org/D70871
* AMDGPU: Reuse carry out register during FI eliminationAustin Kerbow2019-11-281-0/+83
| | | | | | | | | | | | | | | | | | | Summary: Pre gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an Add. If only one SGPR was available this crashed when trying to scavenge another 32bit SGPR to materialize the offset. Instead, reuse a 32-bit SGPR from the carry out as the offset register. Also prefer to use vcc for the unused carry out when it is available. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70614
* AMDGPU: Fix lit test checks with dag optionDavid Stuttard2019-11-281-2/+24
| | | | | | | | | | | | | | Summary: I was seeing some failures on a test with slightly different instruction ordering. Adding in some DAG directives solved the issue. Change-Id: If5a3d3969055fb19279943bd45161bb70a3dabce Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70531
* Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first ↵Eric Christopher2019-11-261-134/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | pass at the O1 described there."" This reapplies: 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4 Original commit message: As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410 This reverts commit c9ddb02659e3ece7a0d9d6b4dac7ceea4ae46e6d.
* [AMDGPU] Fix emitIfBreak CF lowering: use temp reg to make register ↵vpykhtin2019-11-267-48/+40
| | | | | | coalescer life easier. Differential revision: https://reviews.llvm.org/D70405
* Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at ↵Muhammad Omair Javaid2019-11-261-134/+134
| | | | | | | | | | | | | | | | | | | | | | | the O1 described there." This reverts commit 8ff85ed905a7306977d07a5cd67ab4d5a56fafb4. This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu Following tests were failing: lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410
* As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 ↵Eric Christopher2019-11-251-134/+134
| | | | | | | | | | | | | | | | | | | | | described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410
* AMDGPU: Handle waitcnt overflowAustin Kerbow2019-11-231-0/+172
| | | | | | | | | | | | | | | | | | | | | | Summary: The waitcnt pass can overflow the counters when the number of outstanding events for a type exceed the capacity of the counter. This can lead to inefficient insertion of waitcnts, or to waitcnt instructions with max values for each type. The last situation can cause an instruction which when disassembled appears to be an illegal waitcnt without an operand. In these cases we should add a wait for the 'counter maximum' - 1, and update the waitcnt brackets accordingly. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70418
* [AMDGPU] Add attribute for target loop unroll threshold defaultTim Corringham2019-11-211-0/+52
| | | | | | | | | | | | | | Summary: Add a function attribute to allow the target specific default loop unroll threshold to be specified on a per-function basis. This allows a front-end to give guidance where it has insight that is not available to the back-end, while still allowing the target specific heuristics to also have an effect. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68873
* [AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/storesPiotr Sobczak2019-11-201-0/+1559
| | | | | | | | | | | | | | Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69794
* [AMDGPU] Fixed mfma test check. NFC.Stanislav Mekhanoshin2019-11-201-69/+9
|
* [AMDGPU] Keep consistent check of legal addressing mode.Michael Liao2019-11-202-1/+281
| | | | | | | | | | | | | | Summary: - Add test cases for GFX10, which has narrower offset range compared to GFX9. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70473
* [AMDGPU] add support for hostcall buffer pointer as hidden kernel argumentSameer Sahasrabuddhe2019-11-207-0/+244
| | | | | | | | | | | Hostcall is a service that allows a kernel to submit requests to the host using shared buffers, and block until a response is received. This will eventually replace the shared buffer currently used for printf, and repurposes the same hidden kernel argument. This change introduces a new ValueKind in the HSA metadata to represent the hostcall buffer. Differential Revision: https://reviews.llvm.org/D70038
* AMDGPU/GlobalISel: Legalize FDIV64Austin Kerbow2019-11-191-41/+468
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70403
* AMDGPU: Refactor treatment of denormal modeMatt Arsenault2019-11-191-12/+12
| | | | | | | | | | | Start moving towards treating this as a property of the calling convention, and not the subtarget. The default denormal mode should not be part of the subtarget, and be moved into a separate function attribute. This patch is still NFC. The denormal mode remains as a subtarget feature for now, but make the necessary changes to switch to using an attribute.
* AMDGPU: Be explicit about denormal mode in MIR testsMatt Arsenault2019-11-197-863/+1949
| | | | | | | Start checking the machine function in GlobalISel instead of the target directly. This temporarily breaks fcanonicalize selection in GlobalISel.
* [AMDGPU] Tune inlining parameters for AMDGPU target (part 2)dfukalov2019-11-191-0/+7
| | | | | | | | | | | | | | | | | Summary: Most of IR instructions got better code size estimations after commit 47a5c36b. So default parameters values should be updated to improve inlining and unrolling for the target. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70391
* [AMDGPU] Lower llvm.amdgcn.s.buffer.load.v3[i|f]32Piotr Sobczak2019-11-151-35/+125
| | | | | | | | | | Summary: Add lowering support for 32-bit vec3 variant of s.buffer.load intrinsic. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70118
* AMDGPU: Change boolean content type to 0 or 1Matt Arsenault2019-11-155-13/+18
| | | | | | | | The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 to some degree is the choice of lower resistance since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.
* AMDGPU: Try to commute sub of boolean extMatt Arsenault2019-11-151-0/+43
| | | | Avoids another regression in a future patch.
* GlobalISel: Lower s1 source G_SITOFP/G_UITOFPMatt Arsenault2019-11-156-492/+162
|
* [AMDGPU] Fixed dpp test. NFC.Stanislav Mekhanoshin2019-11-131-1/+1
|
* [AMDGPU] Fixed mfma-loop test. NFC.Stanislav Mekhanoshin2019-11-131-26/+111
|
* AMDGPU: Extend add x, (ext setcc) combine to subMatt Arsenault2019-11-131-0/+74
| | | | | | This is the same as the add case, but inverts the operation type. This avoids regressions in a future patch.
* AMDGPU: Switch backend default max workgroup size to 1024Matt Arsenault2019-11-1311-17/+57
| | | | | | | | | | | | | Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
* AMDGPU Reduce reported maximum group size to 1024Matt Arsenault2019-11-133-21/+22
| | | | | While some targets allow encoding 2048, this was never tested or supported.
* MCP: Fixed bug with dest overlapping copy sourceTim Renouf2019-11-121-0/+27
| | | | | | | | | | | | In MachineCopyPropagation, when propagating the source of a copy into the operand of a later instruction, bail if a destination overlaps (partly defines) the copy source. If the instruction where the substitution is happening is also a copy, allowing the propagation confuses the tracking mechanism. Differential Revision: https://reviews.llvm.org/D69953 Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
* AMDGPU: Select global atomicrmw faddMatt Arsenault2019-11-061-0/+29
| | | | This only works if there is no use of the return value.
* [AMDGPU] Add handling of 160 bit registers in analyzeResourceUsageStanislav Mekhanoshin2019-11-061-0/+28
| | | | | | This was omitted. Also SReg_96Reg missed IsSGPR assignment. Differential Revision: https://reviews.llvm.org/D69919
* [GISel][ArtifactCombiner] Relax the constraint to combine unmerge with ↵Quentin Colombet2019-11-062-47/+78
| | | | | | | | | | | | | | | concat_vectors The combine G_UNMERGE_VALUES with G_CONCAT_VECTORS used to only be performed when the result type of the G_UNMERGE_VALUES was a vector type. In other words, we were expecting that the G_UNMERGE_VALUES was effectively the exact opposite of the G_CONCAT_VECTORS. Lift that constraint by allowing any G_UNMERGE_VALUES to be combined with any G_CONCAT_VECTORS (as long as the size of the different pieces that we merge/unmerge match). Differential Revision: https://reviews.llvm.org/D69288
* [globalisel] Rename G_GEP to G_PTR_ADDDaniel Sanders2019-11-0532-6783/+6783
| | | | | | | | | | | | | | | | Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734
* [amdgpu] Fix known bits compuation on `MUL_I24`/`MUL_U24`.Michael Liao2019-11-011-0/+11
| | | | | | | | | | Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D69735
* AMDGPU: Disallow spill folding with m0 copiesMatt Arsenault2019-10-301-0/+58
| | | | | | | | | | readlane and writelane instructions are not allowed to use m0 as the data operand, so spilling them is tricky and would require an intermediate SGPR to spill it. Constrain the virtual register class in this caes to disallow the inline spiller from folding the m0 operand directly into the spill instruction. I copied this hack from AArch64 which has the same problem for $sp.
* AMDGPU: Don't fold S_NOPs with implicit operandsMatt Arsenault2019-10-301-0/+137
|
* [AMDGPU] Simplify VCCZ bug handlingJay Foad2019-10-301-2/+4
| | | | | | | | | | | | | | | | | | | | Summary: VCCZBugHandledSet was used to make sure we don't apply the same workaround more than once to a single cbranch instruction, but it's not necessary because the workaround involves inserting an s_waitcnt instruction, which is enough for subsequent iterations to detect that no further workaround is necessary. Also beef up the test case to check that the workaround was only applied once. I have also manually verified that the test still passes even if I hack the big do-while loop in runOnMachineFunction to run a minimum of five iterations. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69621
* [SelectionDAG] Add support for FP_ROUND in WidenVectorOperand.Jay Foad2019-10-301-0/+10
| | | | | | | | | | | | Summary: This is used on AMDGPU for rounding from v3f64 (which is illegal) to v3f32 (which is legal). Subscribers: jvesely, nhaehnle, tpr, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69339
* AMDGPU/GlobalISel: Legalize FDIV32Austin Kerbow2019-10-291-73/+961
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69581
* AMDGPU: Make VReg_1 only include 1 artificial registerMatt Arsenault2019-10-281-12/+11
| | | | | | | | | | | | | | | | | | | | When TableGen is inferring register classes from contexts, it uses a sorting function based on the number of registers in the class. Since this was being treated as an alias of VGPR_32, they had exactly the same size. The sort used wasn't a stable sort, and even if it were, I believe the tie breaker would effectively end up being the alphabetical ordering of the class name. There appear to be issues trying to use an empty set of registers, so add only one so this will always sort to the end. Also add a comment explaining how VReg_1 is a dirty hack for SelectionDAG. This does end up changing the behavior of i1 with inline asm and VGPR constraints, but the existing behavior was was already nonsensical and inconsistent. It should probably be disallowed anyway. Fixes bug 43699
* AMDGPU: Avoid overwriting saved PCAustin Kerbow2019-10-281-0/+53
| | | | | | | | | | | | | | | | Summary: An outstanding load with same destination sgpr as call could cause PC to be updated with junk value on return. Reviewers: arsenm, rampitec Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69474
* [AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies.cdevadas2019-10-261-0/+140
| | | | | | | | | | | | | | | There is a minor flaw in the implementation of function lowerPhis. This function replaces values of regclass Vreg_1 (boolean values) involved in PHIs into an SGPR. Currently it iterates over the MBBs and performs an inplace lowering of PHIs and fails to lower any incoming value that itself is another PHI of Vreg_1 regclass. The failure occurs only when the MBB where the incoming PHI value belongs is not visited/lowered yet. To fix this problem, collect all Vreg_1 PHIs upfront and then perform the lowering. Differential Revision: https://reviews.llvm.org/D69182
* [SDAG] fold extract_vector_elt with undef indexSanjay Patel2019-10-251-4/+0
| | | | | | | | | | | This makes the DAG behavior consistent with IR's extractelement after: rGb32e4664a715 https://bugs.llvm.org/show_bug.cgi?id=42689 I've tried to maintain test intent for WebAssembly. The AMDGPU test is trying to test for crashing or other bad behavior, but I'm not sure if that's possible after this change.
* [AMDGPU] Enable SGPR copy foldingStanislav Mekhanoshin2019-10-251-0/+48
| | | | | | | | | | | | | That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445
* GlobalISel: Implement widenScalar for G_INSERT_VECTOR_ELTMatt Arsenault2019-10-251-0/+48
|
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHGMatt Arsenault2019-10-253-37/+454
| | | | | | | | Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* AMDGPU: Fix the broken dominator tree when creating waterfall loop for ↵Changpeng Fang2019-10-251-0/+48
| | | | | | | | | | | | | resource descriptor Summary: In loadSRsrcFromVGPR, if MBB is the same as Succ, Remiander is not the immediate dominator of Succ. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D69358
* [AMDGPU] Fold AGPR reg_sequence initializersStanislav Mekhanoshin2019-10-251-38/+330
| | | | Differential Revision: https://reviews.llvm.org/D69413
* [AMDGPU] Disallow dpp combining for dpp instructions without Src2 operand ↵vpykhtin2019-10-251-0/+16
| | | | | | (when Src2 is required) Differential revision: https://reviews.llvm.org/D69430
* AMDGPU/GlobalISel: Legalize FDIV16Austin Kerbow2019-10-251-133/+335
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69347
* [AMDGPU] Remove update_llc_test_checks for a testScott Linder2019-10-251-2/+1
| | | | | | | | | The test split-arg-dbg-value.ll has a host-specific path in the full output captured by update_llc_test_checks. Fix for test failures introduced in https://reviews.llvm.org/D69402 Tags: #llvm
OpenPOWER on IntegriCloud