summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* Migrate some more fadd and fsub cases away from UnsafeFPMath control to ↵Michael Berg2019-07-313-23/+38
| | | | | | | | | | | | | | | | utilize NoSignedZerosFPMath options control Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context. Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper Reviewed By: spatel Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji Differential Revision: https://reviews.llvm.org/D65170 llvm-svn: 367486
* [AMDGPU] Fix high occupancy calculation and print itStanislav Mekhanoshin2019-07-314-7/+292
| | | | | | | | | | | We had couple places which still return 10 as a maximum occupancy. Fixed. Also print comment about occupancy as compiler see it. Differential Revision: https://reviews.llvm.org/D65423 llvm-svn: 367381
* GlobalISel: Add G_ATOMICRMW_{FADD|FSUB}Matt Arsenault2019-07-301-0/+48
| | | | llvm-svn: 367369
* [AMDGPU] Reserve all AGPRs on targets which do not have themStanislav Mekhanoshin2019-07-301-11/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D65471 llvm-svn: 367347
* [AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization.Austin Kerbow2019-07-301-0/+54
| | | | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64966 llvm-svn: 367344
* AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructionsTom Stellard2019-07-291-0/+23
| | | | | | | | | | | | | | | | | | | | | | | Summary: The LoadStoreOptimizer was creating instructions with 2 MachineMemOperands, which meant they were assumed to alias with all other instructions, because MachineInstr:mayAlias() returns true when an instruction has multiple MachineMemOperands. This was preventing these instructions from being merged again, and was giving the scheduler less freedom to reorder them. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65036 llvm-svn: 367237
* [AMDGPU] Add amdgpu_kernel for consistency with other testsJay Foad2019-07-291-1/+1
| | | | llvm-svn: 367221
* [DivergenceAnalysis] Add methods for querying divergence at useJay Foad2019-07-291-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The existing isDivergent(Value) methods query whether a value is divergent at its definition. However even if a value is uniform at its definition, a use of it in another basic block can be divergent because of divergent control flow between the def and the use. This patch adds new isDivergent(Use) methods to DivergenceAnalysis, LegacyDivergenceAnalysis and GPUDivergenceAnalysis. This might allow D63953 or other similar workarounds to be removed. Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue Reviewed By: nhaehnle Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65141 llvm-svn: 367218
* [AMDGPU] Regenerate v2i16 insertelement tests. Simon Pilgrim2019-07-291-337/+1557
| | | | | | To help show the diffs from an upcoming SimplifyDemandedBits patch. llvm-svn: 367213
* [AMDGPU] Enable v4f16 and above for v_pk_fma instructionsDavid Stuttard2019-07-292-84/+186
| | | | | | | | | | | | | | | | | | | | Summary: If isel is presented with <2 x half> vectors then it will correctly select v_pk_fma style instructions. If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for other instruction types (such as fadd, fmul etc.) Added extra support to enable this. Updated one of the tests to include a test for this (as well as extending the test to GFX9) Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65325 Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee llvm-svn: 367206
* [FunctionAttrs] Annotate "willreturn" for intrinsicsHideto Ueno2019-07-281-1/+1
| | | | | | | | | | | | | | | | | | | Summary: In D62801, new function attribute `willreturn` was introduced. In short, a function with `willreturn` is guaranteed to come back to the call site(more precise definition is in LangRef). In this patch, willreturn is annotated for LLVM intrinsics. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, nhaehnle, sstefan1, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64904 llvm-svn: 367184
* [AMDGPU] Regenerate tests. Simon Pilgrim2019-07-273-314/+1767
| | | | | | To help show the diffs from an upcoming SimplifyDemandedBits patch. llvm-svn: 367175
* [SelectionDAG] Check for any recursion depth greater than or equal to limit ↵Simon Pilgrim2019-07-272-899/+1182
| | | | | | | | | | | | instead of just equal the limit. If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit. This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches. llvm-svn: 367171
* [AMDGPU] Add llvm.amdgcn.softwqm intrinsicCarl Ritson2019-07-261-0/+188
| | | | | | | | | | | | | | | | | Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097
* AMDGPU/GlobalISel: Handle most function return typesMatt Arsenault2019-07-265-141/+1345
| | | | | | | | | handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082
* GlobalISel: Fold out unmerge to scalars from concat_vectorMatt Arsenault2019-07-262-47/+56
| | | | | | | Removes illegal intermediate vectors if an operation was lowering to concat_vectors, and the next operation is scalarized. llvm-svn: 367081
* [AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs.Michael Liao2019-07-251-0/+26
| | | | | | | | | | | | | | | | | | | | Summary: - As LCSSA is turned on just before isel, it may create PHI of the flow, which is consumed by pseudo structurized CFG instructions. When that PHIs are eliminated in O0, COPY may be placed wrongly as the these pseudo structurized CFG instructions are considering prologue of MBB. - Run extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64353 llvm-svn: 367023
* AMDGPU: Don't assert on v4f16 arguments to shader calling conventionsMatt Arsenault2019-07-251-0/+31
| | | | llvm-svn: 367018
* [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 foldRoman Lebedev2019-07-241-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955
* [AMDGPU] Increase kernel paddingStanislav Mekhanoshin2019-07-241-35/+2
| | | | | | | | | | To support prefetch mode 3 we need to pad current cacheline and fill 3 cachelines after. Current padding is only sufficient for mode 2. Differential Revision: https://reviews.llvm.org/D65236 llvm-svn: 366938
* AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting extsMatt Arsenault2019-07-241-9/+19
| | | | | | | The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915
* [TargetLowering] Add SimplifyMultipleUseDemandedBitsSimon Pilgrim2019-07-235-1186/+887
| | | | | | | | | | | | | | | | | | This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured. The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use. We do see a couple of regressions that need to be addressed: AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better). X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG. The code owners have confirmed its ok for these cases to fixed up in future patches. Differential Revision: https://reviews.llvm.org/D63281 llvm-svn: 366799
* [AMDGPU] Test update. NFC.Stanislav Mekhanoshin2019-07-221-26/+26
| | | | llvm-svn: 366715
* AMDGPU/GlobalISel: Fix broken testsMatt Arsenault2019-07-229-54/+54
| | | | llvm-svn: 366688
* AMDGPU/GlobalISel: Fix tests without assertsMatt Arsenault2019-07-2212-692/+283
| | | | | | | The legality check is only done under NDEBUG, so the failure cases are different in a release build. llvm-svn: 366680
* AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spacesMatt Arsenault2019-07-192-0/+122
| | | | llvm-svn: 366621
* [AMDGPU] Fixed occupancy calculation for gfx10Stanislav Mekhanoshin2019-07-195-8/+23
| | | | | | Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616
* AMDGPU: Don't rely on m0 being -1 for GWS offsetsMatt Arsenault2019-07-196-39/+70
| | | | | | | This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff. llvm-svn: 366608
* AMDGPU: Force s_waitcnt after GWS instructionsMatt Arsenault2019-07-192-4/+26
| | | | | | | This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607
* LiveIntervals: Fix handleMove asserting on BUNDLEMatt Arsenault2019-07-191-0/+52
| | | | | | | | | The top-level BUNDLE instruction should behave as an ordinary instruction. It is supposed to have all relevant registers as implicit operands. Moving it should work as any other instruction. I believe the assert intended to avoid moving instructions inside bundles. llvm-svn: 366605
* [AMDGPU] Add test case on crashing of `si-lower-sgpr-spills` passMichael Liao2019-07-191-0/+20
| | | | | | | | | | | | Reviewers: arsenm Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64273 llvm-svn: 366602
* AMDGPU/GlobalISel: Fix MMO flags for kernel argument loadsMatt Arsenault2019-07-191-114/+114
| | | | | | The DAG lowering sets dereferencable and invariant, not nontemporal. llvm-svn: 366597
* AMDGPU: Add some function return test casesMatt Arsenault2019-07-191-0/+20
| | | | llvm-svn: 366591
* [AMDGPU] Regenerate test file for upcoming patch. NFCI.Simon Pilgrim2019-07-191-42/+482
| | | | llvm-svn: 366589
* AMDGPU: Attempt to fix bot errorMatt Arsenault2019-07-191-1/+1
| | | | | | | Manually remove file name from check line, since it somehow ends up being different on an msvc bot. llvm-svn: 366586
* AMDGPU/GlobalISel: Selection for fminnum/fmaxnumMatt Arsenault2019-07-198-251/+1230
| | | | | | | v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet. llvm-svn: 366585
* AMDGPU/GlobalISel: Support arguments with multiple registersMatt Arsenault2019-07-192-12/+42
| | | | | | Handles structs used directly in argument lists. llvm-svn: 366584
* AMDGPU/GlobalISel: Rewrite lowerFormalArgumentsMatt Arsenault2019-07-192-7/+1994
| | | | | | | | | | | | | | | | | This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582
* AMDGPU: Decompose all values to 32-bit pieces for calling conventionsMatt Arsenault2019-07-1911-206/+216
| | | | | | | | | | This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578
* DAG: Handle dbg_value for arguments split into multiple subregsMatt Arsenault2019-07-191-0/+227
| | | | | | | | This was handled previously for arguments split due to not fitting in an MVT. This was dropping the register for argument registers split due to TLI::getRegisterTypeForCallingConv. llvm-svn: 366574
* [AMDGPU] Simplify the exclusive scan used for optimized atomicsJay Foad2019-07-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because: 1. It simplifies the compiler a little. 2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c. 3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction. Because of #2 and #3 the end result is improved from this: v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf To this: v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf I.e. two fewer computational instructions, one extra nop where we could schedule something else. Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64411 llvm-svn: 366543
* GlobalISel: Handle widenScalar of arbitrary G_MERGE_VALUES sourcesMatt Arsenault2019-07-171-129/+318
| | | | | | | | | | | Extract the sources to the GCD of the original size and target size, padding with implicit_def as necessary. Also fix the case where the requested source type is wider than the original result type. This was ignoring the type, and just using the destination. Do the operation in the requested type and truncate back. llvm-svn: 366367
* GlobalISel: Handle more cases for widenScalar of G_MERGE_VALUESMatt Arsenault2019-07-171-0/+61
| | | | | | | | | | | | Use an anyext to the requested type for the leftover operand to produce a slightly wider type, and then truncate the final merge. I have another implementation almost ready which handles arbitrary widens, but I think it produces worse code in this example (which I think is 90% due to not folding redundant copies or folding out implicit_def users), so I wanted to add this as a baseline first. llvm-svn: 366366
* [AMDGPU] Tune inlining parameters for AMDGPU targetDaniil Fukalov2019-07-171-7/+0
| | | | | | | | | | | | | | | | | | | Summary: Since the target has no significant advantage of vectorization, vector instructions bous threshold bonus should be optional. amdgpu-inline-arg-alloca-cost parameter default value and the target InliningThresholdMultiplier value tuned then respectively. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348
* AMDGPU: Use getTargetConstantMatt Arsenault2019-07-171-4/+4
| | | | | | Avoids creating an extra intermediate mov. llvm-svn: 366340
* [AMDGPU] Optimize atomic AND/OR/XORJay Foad2019-07-171-0/+36
| | | | | | | | | | | | | | Summary: Extend the atomic optimizer to handle AND, OR and XOR. Reviewers: arsenm, sheredom Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64809 llvm-svn: 366323
* AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXECNicolai Haehnle2019-07-173-0/+17
| | | | | | | | | | | | | | | Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630 Reviewers: rampitec, mareko Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64807 Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc llvm-svn: 366314
* AMDGPU: Improve alias analysis for GDSNicolai Haehnle2019-07-171-8/+43
| | | | | | | | | | | | | | | | | Summary: GDS cannot alias anything else. Original patch by: Marek Olšák Reviewers: arsenm, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64114 Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0 llvm-svn: 366313
* AMDGPU/GlobalISel: Select G_ASHRMatt Arsenault2019-07-163-59/+676
| | | | llvm-svn: 366257
* AMDGPU/GlobalISel: Select G_LSHRMatt Arsenault2019-07-163-0/+699
| | | | llvm-svn: 366256
OpenPOWER on IntegriCloud