path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse) | Author | Age | Files | Lines
...
* AMDGPU/GlobalISel: Split VALU s64 G_ZEXT/G_SEXT in RegBankSelect | Matt Arsenault | 2019-06-24 | 2 | -12/+87
  Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need to extend to the 32-bit half, and then to 64. I'm not sure what the line should be between what RegBankSelect handles, and what instruction select does, but for now I'm erring on the side of RegBankSelect for future post-RBS combines.
  llvm-svn: 364212
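The two-step vector extend described in this commit can be illustrated with a small arithmetic model. This is only a sketch in Python, not LLVM code: the helper names `ext_to_32` and `ext_to_64_split` are hypothetical, and the model assumes the high half is either zero (zext) or a copy of the low half's sign bit (sext).

```python
MASK32 = 0xFFFFFFFF

def ext_to_32(value, src_bits, signed):
    """Extend a src_bits-wide value to a 32-bit pattern (hypothetical helper)."""
    low_mask = (1 << src_bits) - 1
    value &= low_mask
    if signed and (value >> (src_bits - 1)) & 1:
        # Replicate the sign bit into bits src_bits..31.
        value |= MASK32 & ~low_mask
    return value

def ext_to_64_split(value, src_bits, signed):
    """Extend to 64 bits by building the low and high 32-bit halves separately."""
    lo = ext_to_32(value, src_bits, signed)
    # High half: all copies of lo's sign bit for sext, zero for zext.
    hi = MASK32 if (signed and (lo >> 31) & 1) else 0
    return (hi << 32) | lo

print(hex(ext_to_64_split(0x80, 8, True)))   # sign-extended
print(hex(ext_to_64_split(0x80, 8, False)))  # zero-extended
```

The point of the split is that each half is an ordinary 32-bit operation, which is the shape a 32-bit vector ALU can execute.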
* AMDGPU/GlobalISel: Fix selecting G_IMPLICIT_DEF for s1 | Matt Arsenault | 2019-06-24 | 1 | -27/+133
  Try to fail for scc, since I don't think that should ever be produced.
  llvm-svn: 364199
* AMDGPU/GlobalISel: Fix RegBankSelect for s1 sext/zext/anyext | Matt Arsenault | 2019-06-24 | 3 | -14/+692
  This needs different handling if the source is known to be a valid condition or not. Handle turning it into shifts or a select during regbankselect.
  llvm-svn: 364186
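The two strategies named in this commit (shifts or a select) can be modeled on 32-bit patterns. The sketch below is an illustration in Python under stated assumptions, not the RegBankSelect implementation: all function names are hypothetical, and it assumes the shift form is used when only bit 0 of the source is meaningful, while the select form is used when the source is a known boolean condition.

```python
MASK32 = 0xFFFFFFFF

def ashr32(x, n):
    """Arithmetic shift right on a 32-bit pattern."""
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32

def sext_s1_shifts(x):
    """Sign-extend bit 0: shift it to the top, then shift back arithmetically."""
    return ashr32((x << 31) & MASK32, 31)

def zext_s1_shifts(x):
    """Zero-extend bit 0: shift it to the top, then shift back logically."""
    return ((x << 31) & MASK32) >> 31

def sext_s1_select(cond):
    """Sign-extend a known condition: select between -1 and 0."""
    return MASK32 if cond else 0

def zext_s1_select(cond):
    """Zero-extend a known condition: select between 1 and 0."""
    return 1 if cond else 0

# Upper bits of the source are ignored by the shift form.
print(hex(sext_s1_shifts(0xABCD0001)), hex(zext_s1_shifts(0xABCD0001)))
```

Both forms agree whenever the source really is 0 or 1; the shift form additionally tolerates garbage in the upper bits.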
* AMDGPU: Fold frame index into MUBUF | Matt Arsenault | 2019-06-24 | 5 | -10/+207
  This matters for byval uses outside of the entry block, which appear as copies. Previously, the only folding done was during selection, which could not see the underlying frame index. For any uses outside the entry block, the frame index was materialized in the entry block relative to the global scratch wave offset.
  This may produce worse code in cases where the offset ends up not fitting in the MUBUF offset field. A better heuristic would be helpful for extreme frames.
  llvm-svn: 364185
* AMDGPU: Fix not using s33 for scratch wave offset in kernels | Matt Arsenault | 2019-06-21 | 3 | -9/+11
  Fixes missing piece from r363990.
  llvm-svn: 364099
* [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal. | Amara Emerson | 2019-06-21 | 1 | -6/+4
  We sometimes get poor code size because constants of types < 32b are legalized as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the localizer can no longer sink them (although it's possible to extend it to do so).
  On AArch64 however s8 and s16 constants can be selected in the same way as s32 constants, with a mov pseudo into a W register. If we make s8 and s16 constants legal then we can avoid unnecessary truncates, they can be CSE'd, and the localizer can sink them as normal.
  There is a caveat: if the user of a smaller constant has to widen the sources, we end up with an anyext of the smaller typed G_CONSTANT. This can cause regressions because of the additional extend and missed pattern matching. To remedy this, there's a new artifact combiner to generate the wider G_CONSTANT if it's legal for the target.
  Differential Revision: https://reviews.llvm.org/D63587
  llvm-svn: 364075
* [AMDGPU] hazard recognizer for fp atomic to s_denorm_mode | Stanislav Mekhanoshin | 2019-06-21 | 1 | -0/+447
  This requires 3 wait states unless there is a wait or VALU in between.
  Differential Revision: https://reviews.llvm.org/D63619
  llvm-svn: 364074
* AMDGPU: Always use s33 for global scratch wave offset | Matt Arsenault | 2019-06-20 | 27 | -443/+478
  Every called function could possibly need this to calculate the absolute address of stack objects, and this avoids inserting a copy around every call site in the kernel. It's also somewhat cleaner to keep this in a callee saved SGPR.
  llvm-svn: 363990
* AMDGPU: Add intrinsics for DS GWS semaphore instructions | Matt Arsenault | 2019-06-20 | 5 | -1/+132
  llvm-svn: 363983
* AMDGPU: Insert mem_viol check loop around GWS pre-GFX9 | Matt Arsenault | 2019-06-20 | 2 | -97/+124
  It is necessary to emit this loop around GWS operations in case the wave is preempted pre-GFX9.
  llvm-svn: 363979
* AMDGPU: Eliminate test usage of legacy FP elim attributes | Matt Arsenault | 2019-06-20 | 2 | -3/+3
  llvm-svn: 363950
* AMDGPU: Fix ignoring DisableFramePointerElim in leaf functions | Matt Arsenault | 2019-06-20 | 1 | -4/+41
  The attribute can specify elimination for leaf or non-leaf, so it should always be considered. I copied this bug from AArch64, which probably should also be fixed.
  llvm-svn: 363949
* [AMDGPU] gfx10 tests. NFC. | Stanislav Mekhanoshin | 2019-06-20 | 11 | -191/+2616
  llvm-svn: 363946
* AMDGPU: Treat undef as an inline immediate | Matt Arsenault | 2019-06-20 | 1 | -4/+2
  This should only matter in vectors with an undef component, since a full undef vector would have been folded out.
  llvm-svn: 363941
* AMDGPU: Make test functions hidden | Matt Arsenault | 2019-06-20 | 1 | -23/+23
  Reduces the amount of code in the function by eliminating the GOT load.
  llvm-svn: 363940
* [AMDGPU] gfx1010 core wave32 changes | Stanislav Mekhanoshin | 2019-06-20 | 18 | -20/+1285
  Differential Revision: https://reviews.llvm.org/D63204
  llvm-svn: 363934
* AMDGPU: Don't clobber VCC in MUBUF addr64 emulation | Matt Arsenault | 2019-06-20 | 1 | -8/+8
  Introducing VCC defs during SIFixSGPRCopies is generally problematic. Avoid it by starting with the VOP3 form with the general condition register. This is the easiest-to-fix instance, but doesn't solve any specific problems I'm looking at.
  llvm-svn: 363904
* AMDGPU: Undo sub x, c canonicalization for v2i16 | Matt Arsenault | 2019-06-19 | 4 | -27/+24
  Should avoid regression from D62341
  llvm-svn: 363899
* AMDGPU: Add baseline test for vector sub x, c canonicalization | Matt Arsenault | 2019-06-19 | 1 | -0/+1448
  This will catch regressions from D62341, and show improvements from a future patch to fix them.
  llvm-svn: 363888
* AMDGPU: Fix folding immediate into readfirstlane through reg_sequence | Matt Arsenault | 2019-06-19 | 2 | -0/+135
  The def instruction for the vreg may not match, because it may be folding through a reg_sequence. The assert was overly conservative and not necessary. It's not actually important if DefMI really defined the register, because the fold that will be done cares about the def of the value that will be folded.
  For some reason copies aren't making it through the reg_sequence, although they should.
  llvm-svn: 363876
* Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics" | Matt Arsenault | 2019-06-19 | 5 | -0/+508
  This reapplies r363678, using the correct chain for the CopyToReg for v0. glueCopyToM0 counterintuitively changes the operands of the original node.
  llvm-svn: 363870
* Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics | Simon Pilgrim | 2019-06-19 | 5 | -508/+0
  There may or may not be additional work to handle this correctly on SI/CI.
  ........
  Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/
  llvm-svn: 363797
* Rename ExpandISelPseudo->FinalizeISel, delay register reservation | Matt Arsenault | 2019-06-19 | 1 | -1/+1
  This allows targets to make more decisions about reserved registers after isel. For example, it should now be certain whether there are calls or stack objects in the frame, which could have been introduced by legalization.
  Patch by Matthias Braun
  llvm-svn: 363757
* AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics | Matt Arsenault | 2019-06-18 | 5 | -0/+508
  There may or may not be additional work to handle this correctly on SI/CI.
  llvm-svn: 363678
* AMDGPU: Fold readlane from copy of SGPR or imm | Matt Arsenault | 2019-06-18 | 3 | -17/+329
  These may be inserted to assert uniformity somewhere.
  llvm-svn: 363670
* AMDGPU: Fix iterator crash in AMDGPUPromoteAlloca | Matt Arsenault | 2019-06-18 | 1 | -7/+21
  The lifetime intrinsic was erased, which was the next iterator.
  llvm-svn: 363668
* AMDGPU/GlobalISel: RegBankSelect for amdgcn.div.scale | Matt Arsenault | 2019-06-18 | 1 | -0/+67
  llvm-svn: 363667
* [llvm-objdump] Tidy up AMDGCNPrettyPrinter | Fangrui Song | 2019-06-18 | 1 | -1/+1
  llvm-svn: 363650
* GlobalISel: Use the original flags when lowering fneg to fsub | Matt Arsenault | 2019-06-17 | 2 | -0/+61
  This was ignoring the flag on fneg, and using the source instruction's flags. Also fixes tests missing from r358702.
  Note the expansion itself isn't correct without nnan, but that should be fixed separately.
  llvm-svn: 363637
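The fneg-to-fsub expansion mentioned in this commit can be checked numerically. The Python sketch below is only an illustration and makes an assumption the log does not state: that the lowering subtracts the operand from negative zero. The helper names are hypothetical.

```python
import math

def fneg_as_fsub(x):
    """Model fneg as a subtraction from negative zero (assumed form of the lowering)."""
    return -0.0 - x

def is_negative(x):
    """True when the sign bit of x is set, including for -0.0."""
    return math.copysign(1.0, x) < 0

# Ordinary values and both zeros behave like a true negation.
print(fneg_as_fsub(1.5), fneg_as_fsub(0.0), fneg_as_fsub(-0.0))

# Subtracting from +0.0 instead would get the sign of zero wrong.
print(is_negative(0.0 - 0.0))
```

The sketch deliberately makes no claim about NaN inputs. A subtraction is not required to flip the sign bit of a NaN the way a real negation does, which is presumably why the commit calls the expansion incorrect without nnan.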
* [AMDGPU] gfx1010 subvector test. NFC. | Stanislav Mekhanoshin | 2019-06-17 | 1 | -0/+37
  llvm-svn: 363623
* [AMDGPU] Propagate function attributes thru bitcasts | Stanislav Mekhanoshin | 2019-06-17 | 1 | -0/+23
  AMDGPUPropagateAttributes will not work on function bitcasts, so move AMDGPUFixFunctionBitcasts before it.
  Differential Revision: https://reviews.llvm.org/D63455
  llvm-svn: 363614
* AMDGPU/GFX10: Don't generate s_code_end padding in the asm-printer | Nicolai Haehnle | 2019-06-17 | 1 | -46/+48
  Summary: The purpose of the padding is to guard against stale code being fetched into the instruction cache by the lowest level prefetching. We're generating relocatable ELF here, and so the padding should arguably be added by the linker. This is in fact what Mesa does.
  This also fixes multi-part shaders for Mesa.
  Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82
  Reviewers: arsenm, rampitec, t-tye
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63427
  llvm-svn: 363602
* AMDGPU: Explicitly define a triple for some tests | Nicolai Haehnle | 2019-06-17 | 5 | -10/+13
  Summary: This is related to the changes to the groupstaticsize intrinsic in D61494 which would otherwise make the related tests in these files fail or become much less useful.
  Note that for some reason, SOPK generation is less effective in the amdhsa OS, which is why I chose PAL. I haven't investigated this deeper.
  Change-Id: I6bb99569338f7a433c28b4c9eb1e3e036b00d166
  Reviewers: arsenm
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D63392
  llvm-svn: 363600
* [AMDGPU] gfx1010 wavefrontsize intrinsic folding | Stanislav Mekhanoshin | 2019-06-17 | 1 | -0/+84
  Differential Revision: https://reviews.llvm.org/D63206
  llvm-svn: 363588
* [AMDGPU] Pass to propagate ABI attributes from kernels to the functions | Stanislav Mekhanoshin | 2019-06-17 | 2 | -0/+159
  The pass works in two modes:
  Mode 1: Just set attributes starting from kernels. This can work at the very beginning of opt and llc pipeline, but cannot clone functions because it must be a function pass.
  Mode 2: Actually clone functions for new attributes. This can only work after all function passes in the opt pipeline because it has to be a module pass.
  Differential Revision: https://reviews.llvm.org/D63208
  llvm-svn: 363586
* GlobalISel: Ignore callsite attributes when picking intrinsic type | Matt Arsenault | 2019-06-17 | 1 | -0/+21
  A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used.
  I fixed the same bug in SelectionDAG in r287593.
  llvm-svn: 363580
* GlobalISel: Verify intrinsics | Matt Arsenault | 2019-06-17 | 1 | -10/+10
  I keep using the wrong instruction when manually writing tests. This really needs to check the number of operands, but I don't see an easy way to do that right now.
  llvm-svn: 363579
* [AMDGPU] gfx1010 wave32 metadata | Stanislav Mekhanoshin | 2019-06-17 | 2 | -5/+23
  Differential Revision: https://reviews.llvm.org/D63207
  llvm-svn: 363577
* AMDGPU/GlobalISel: Implement select for G_ICMP and G_SELECT | Tom Stellard | 2019-06-17 | 2 | -4/+364
  Reviewers: arsenm
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60640
  llvm-svn: 363576
* [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting | Luis Marques | 2019-06-17 | 1 | -12/+18
  Some GEPs were not being split, presumably because that split would just be undone by the DAGCombiner. Not performing those splits can prevent important optimizations, such as preventing the element indices / member offsets from being (partially) folded into load/store instruction immediates.
  This patch:
  - Makes the splits also occur in the cases where the base address and the GEP are in the same BB.
  - Ensures that the DAGCombiner doesn't reassociate them back again.
  Differential Revision: https://reviews.llvm.org/D60294
  llvm-svn: 363544
* Describe stack-id as an enum | Sander de Smalen | 2019-06-17 | 6 | -18/+18
  This patch changes MIR stack-id from an integer to an enum, and adds printing/parsing support for this in MIR files. The default stack-id '0' is now renamed to 'default'.
  This should make MIR tests that have stack objects with different stack-ids more descriptive. It also clarifies code operating on StackID.
  Reviewers: arsenm, thegameg, qcolombet
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D60137
  llvm-svn: 363533
* AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0 | Nicolai Haehnle | 2019-06-16 | 2 | -2/+2
  Summary: Instead of encoding a high-word of 0 using a fake TargetGlobalAddress, just use a literal target constant. This simplifies some subsequent changes.
  The generated assembly is now more explicit about the kind of relocation that is to be used.
  Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61491
  llvm-svn: 363516
* AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic | Nicolai Haehnle | 2019-06-16 | 2 | -42/+80
  Summary:
  Change-Id: Ie4c971462a7749740938c687144e77441dac2539
  Reviewers: rampitec, arsenm
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62486
  Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0
  llvm-svn: 363514
* [AMDGPU] gfx10 conditional registers handling | Stanislav Mekhanoshin | 2019-06-16 | 1 | -97/+188
  This is the cpp source part of wave32 support, excluding the overridden getRegClass().
  Differential Revision: https://reviews.llvm.org/D63351
  llvm-svn: 363513
* Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect" | Matt Arsenault | 2019-06-15 | 3 | -28/+160
  This reapplies r363410, avoiding null dereference if there is no AltRegBank.
  llvm-svn: 363478
* Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect" | Mitch Phillips | 2019-06-14 | 3 | -160/+28
  This patch breaks UBSan build bots. See https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for a guide as to how to reproduce the error.
  This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929.
  This reverts rL363410.
  llvm-svn: 363476
* [MBP] Move a latch block with conditional exit and multi predecessors to top of loop | Guozhi Wei | 2019-06-14 | 14 | -94/+100
  Currently findBestLoopTop can find and move one kind of block to the top: a latch block that has one successor. Another common case is:
  * a latch block
  * it has two successors, one is the loop header, the other is the exit
  * it has more than one predecessor
  If it is below one of its predecessors P, only P can fall through to it; all other predecessors need a jump to it, plus another conditional jump to the loop header. If it is moved before the loop header, all its predecessors jump to it, and it then falls through to the loop header. So all its predecessors except P save one taken branch.
  Differential Revision: https://reviews.llvm.org/D43256
  llvm-svn: 363471
* AMDGPU: Avoid most waitcnts before calls | Matt Arsenault | 2019-06-14 | 2 | -30/+25
  Currently you get extra waits, because waits are inserted for the register dependencies of the call, and the function prolog waits on everything.
  Currently waits are still inserted on returns. It may make sense to not do this, and wait in the caller instead.
  llvm-svn: 363465
* AMDGPU: Fix capitalized register names in asm constraints | Matt Arsenault | 2019-06-14 | 12 | -20/+20
  This was a workaround a long time ago, but the canonical lower case names work now.
  llvm-svn: 363459
* AMDGPU: Fix dropping memref for ds append/consume | Matt Arsenault | 2019-06-14 | 2 | -0/+18
  The way SelectionDAG treats memory operands is very frustrating: by default it drops them unless a property is set on the pattern. There is no pattern for manually selected instructions, so this requires manually setting them.
  llvm-svn: 363455