path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* AMDGPU: Fix analyzeBranch failing with pseudoterminators | Matt Arsenault | 2018-11-16 | 1 | -0/+23
| | | | | | | | | If a block had one of the _term instructions used for gluing exec modifying instructions to the end of the block, analyzeBranch would fail, preventing the verifier from catching a broken successor list. llvm-svn: 347027
* [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST | Ron Lieberman | 2018-11-16 | 15 | -127/+631
| | | | | | | | | Add a pass to fixup various vector ISel issues. Currently we handle converting GLOBAL_{LOAD|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants. This involves feeding the sreg into the saddr field of the new instruction. llvm-svn: 347008
* AMDGPU: Fix check lines in fdot2 test: | Konstantin Zhuravlyov | 2018-11-15 | 1 | -6/+6
| | | | | | GCN900 -> GFX900 llvm-svn: 346925
* AMDGPU: Enable code object v3 for AMDHSA only | Konstantin Zhuravlyov | 2018-11-15 | 33 | -120/+120
| | | | | | Differential Revision: https://reviews.llvm.org/D54186 llvm-svn: 346923
* Bias physical register immediate assignments | Nirav Dave | 2018-11-14 | 6 | -68/+61
| | | | | | | | | | | | | | | | | | | | | | | The machine scheduler currently biases register copies to/from physical registers to be closer to their point of use / def to minimize their live ranges. This change extends this to also cover physical register assignments from immediate values. This causes a reduction in overall register pressure and a minor reduction in spills, and indirectly fixes an out-of-registers assertion (PR39391). Most test changes are from minor instruction reorderings and register name selection changes and direct consequences of that. Reviewers: MatzeB, qcolombet, myatsina, pcc Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya, javed.absar, arphaman, jfb, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54218 llvm-svn: 346894
* AMDGPU: Additional pattern for i16 median3 matching | Aakanksha Patil | 2018-11-14 | 2 | -0/+43
| | | | | | | | min(max(a, b), max(min(a, b), c)) Differential Revision: https://reviews.llvm.org/D54494 llvm-svn: 346886
* [AMDGPU] combine extractelement into several selects | Stanislav Mekhanoshin | 2018-11-13 | 9 | -47/+406
| | | | | | | | | | An extractelement with non-constant index will be lowered either to scratch or movrel loop in most cases. This patch converts such instruction into a set of selects if vector size is not too big. Differential Revision: https://reviews.llvm.org/D54351 llvm-svn: 346800
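The rewrite described above can be pictured as a chain of selects, one per vector element. The following is a minimal sketch in Python, not the patch's actual DAG combine: the helper names `select` and `extract_via_selects` are hypothetical, and the fallback to element 0 for an out-of-range index is an assumption made only to keep the example total.

```python
def select(cond, if_true, if_false):
    """Scalar stand-in for a select instruction."""
    return if_true if cond else if_false


def extract_via_selects(vec, idx):
    """Pick vec[idx] without dynamic indexing, using one select per element.

    Assumption: an out-of-range idx falls through to vec[0]; the real
    instruction leaves that case undefined, so any value would do.
    """
    result = vec[0]
    for i in range(1, len(vec)):
        # Each step compares the runtime index against a constant lane number.
        result = select(idx == i, vec[i], result)
    return result
```

The chain grows linearly with the vector length, which is consistent with the commit restricting the transform to vectors that are "not too big".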
* Fixed DAGTypeLegalizer::SplitVecOp_EXTRACT_VECTOR_ELT i1 handling | Stanislav Mekhanoshin | 2018-11-13 | 1 | -0/+28
| | | | | | | | | Legalizer used to request an ext load from i8 to i1 when promoting vector element type to i8. Fixed. Differential Revision: https://reviews.llvm.org/D54440 llvm-svn: 346795
* AMDGPU: Adding more median3 patterns | Aakanksha Patil | 2018-11-12 | 2 | -0/+411
| | | | | | | | min(max(a, b), max(min(a, b), c)) -> med3 a, b, c Differential Revision: https://reviews.llvm.org/D54331 llvm-svn: 346704
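The min/max expression quoted in this entry does compute the median of three values. A small brute-force check, written as a Python sketch over plain integers (the function names are hypothetical and this is not the TableGen pattern itself):

```python
from itertools import product


def med3(a, b, c):
    """Reference median of three: the middle element after sorting."""
    return sorted((a, b, c))[1]


def matched_pattern(a, b, c):
    """The min/max expression quoted in the commit message."""
    return min(max(a, b), max(min(a, b), c))


def identity_holds(values):
    """Check the identity for every ordered triple drawn from values."""
    return all(
        matched_pattern(a, b, c) == med3(a, b, c)
        for a, b, c in product(values, repeat=3)
    )
```

Calling `identity_holds(range(-4, 5))` exercises every ordering and every tie among nine small integers.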
* [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z | Stanislav Mekhanoshin | 2018-11-12 | 3 | -2/+322
| | | | | | | | | | | | | | | | | | | Sometimes after basic block placement we end up with code like: sreg = s_mov_b64 -1; vcc = s_and_b64 exec, sreg; s_cbranch_vccz. This happens as a join of a block assigning -1 to a saved mask and another block which consumes that saved mask with s_and_b64 and a branch. This is essentially a single s_cbranch_execz instruction when moved into a single new basic block. Differential Revision: https://reviews.llvm.org/D54164 llvm-svn: 346690
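The equivalence rests on `exec & -1 == exec`: after the quoted sequence, vcc is zero exactly when exec is zero. A minimal Python sketch of that reasoning, modelling the 64-bit scalar registers as integers (the function names are hypothetical stand-ins, not backend APIs):

```python
MASK64 = (1 << 64) - 1  # model a 64-bit scalar register pair


def s_and_b64(a, b):
    """Stand-in for the s_and_b64 instruction on 64-bit masks."""
    return (a & b) & MASK64


def vccz_after_sequence(exec_mask):
    """Condition tested by s_cbranch_vccz after the quoted sequence."""
    sreg = -1 & MASK64                # sreg = s_mov_b64 -1
    vcc = s_and_b64(exec_mask, sreg)  # vcc = s_and_b64 exec, sreg
    return vcc == 0


def execz(exec_mask):
    """Condition tested by s_cbranch_execz."""
    return (exec_mask & MASK64) == 0
```

Both predicates agree for every exec value, which is why the branch on vcc can be replaced by a branch on exec.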
* Fix MachineInstr::findRegisterUseOperandIdx subreg checks | Stanislav Mekhanoshin | 2018-11-12 | 1 | -0/+49
| | | | | | | | | | | | The function only checks that the instruction reads a super-register containing the requested physical register. If a sub-register is being read, that is also a use of the super-register, so this adds that check. In particular, MI->readsRegister() is broken because of the missing check. The resulting check is essentially regsOverlap(). Differential Revision: https://reviews.llvm.org/D54128 llvm-svn: 346686
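The difference between the old and new checks can be sketched with a toy register model. Everything here is hypothetical: the register names, the unit sets in `REG_UNITS`, and the helper functions are illustrations in Python, not the MachineInstr or TargetRegisterInfo API.

```python
# Hypothetical register file: each register maps to the units it covers.
REG_UNITS = {
    "R0": frozenset({0}),
    "R1": frozenset({1}),
    "R2": frozenset({2}),
    "R0_R1": frozenset({0, 1}),  # super-register of R0 and R1
}


def covers(op_reg, reg):
    """True if op_reg is reg itself or a super-register containing it."""
    return REG_UNITS[op_reg] >= REG_UNITS[reg]


def regs_overlap(a, b):
    """True if the two registers share at least one unit."""
    return not REG_UNITS[a].isdisjoint(REG_UNITS[b])


def reads_register_before(operands, reg):
    """Old behaviour as described: only same or super-register reads count."""
    return any(covers(op, reg) for op in operands)


def reads_register_after(operands, reg):
    """New behaviour as described: any overlapping read counts."""
    return any(regs_overlap(op, reg) for op in operands)
```

With an instruction that reads only `R0`, a query for `R0_R1` is missed by the first check and found by the second.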
* [AMDGPU] Cleanup optimize-if-exec-masking.mir test. NFC. | Stanislav Mekhanoshin | 2018-11-09 | 1 | -272/+42
| | | | llvm-svn: 346533
* AMDGPU: Add testcase to demonstrate a condition with pre-existing waitcnt | Nicolai Haehnle | 2018-11-09 | 1 | -0/+36
| | | | | | Relevant for https://reviews.llvm.org/D54226. llvm-svn: 346501
* Add test case for the regression caused by r344696 | Nicolai Haehnle | 2018-11-08 | 1 | -0/+24
| | | | | | | | (That change has since been reverted.) Reduced from https://bugs.freedesktop.org/show_bug.cgi?id=108611 llvm-svn: 346423
* [AMDGPU] Extend promote alloca vectorization | Stanislav Mekhanoshin | 2018-11-08 | 1 | -0/+189
| | | | | | | | | | | Promote alloca can vectorize a small array by bitcasting it to a vector type. Extend vectorization for the case when alloca is already a vector type. We still want to replace GEPs with insert/extract element instructions in this case. Differential Revision: https://reviews.llvm.org/D54219 llvm-svn: 346376
* Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics" | Nicolai Haehnle | 2018-11-07 | 2 | -36/+22
| | | | | | | | This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364
* Allow subclassing ExternalAA | Matt Arsenault | 2018-11-07 | 1 | -2/+2
| | | | | | | | | | | | | | This allows testing AMDGPU alias analysis like any other alias analysis pass. This fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis. Before there was no way to test this using -aa-eval with opt, since the default constructed pass is run. The wrapper subclass allows the default constructor to pass the necessary callback. llvm-svn: 346353
* RegAllocFast: Leave unassigned virtreg entries in map | Matthias Braun | 2018-11-07 | 3 | -215/+184
| | | | | | | | | | | | | | | | Set `LiveReg::PhysReg` to zero when freeing a register instead of removing it from the entry from `LiveRegMap`. This way no iterators get invalidated and we can avoid passing around and updating iterators all over the place. This does not change any allocator decisions. It is not completely NFC because the arbitrary iteration order through `LiveRegMap` in `spillAll()` changes so we may get a different order in those spill sequences (the amount of spills does not change). This is in preparation of https://reviews.llvm.org/D52010. llvm-svn: 346298
* AMDGPU: Add an option -disable-promote-alloca-to-lds | Yaxun Liu | 2018-11-06 | 1 | -0/+3
| | | | | | | | | | Add this option for debugging and providing workaround. By default it is off so no behavior change in backend. Differential Revision: https://reviews.llvm.org/D54158 llvm-svn: 346267
* [LICM] Use ICFLoopSafetyInfo in LICM | Max Kazantsev | 2018-11-06 | 1 | -1/+1
| | | | | | | | | | | | | | | This patch makes LICM use `ICFLoopSafetyInfo` that is a smarter version of LoopSafetyInfo that leverages power of Implicit Control Flow Tracking to keep track of throwing instructions and give less pessimistic answers to queries related to throws. The ICFLoopSafetyInfo itself has been introduced in rL344601. This patch enables it in LICM only. Differential Revision: https://reviews.llvm.org/D50377 Reviewed By: apilipenko llvm-svn: 346201
* AMDGPU: Add sram-ecc feature | Konstantin Zhuravlyov | 2018-11-05 | 3 | -2/+53
| | | | | | Differential Revision: https://reviews.llvm.org/D53222 llvm-svn: 346177
* [AMDGPU] Fix the new atomic optimizer in pixel shaders. | Neil Henning | 2018-11-05 | 1 | -0/+59
| | | | | | | | | | | | | | | | | The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives, and had helper lanes active. To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation - ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results. I've added a test case that can cause derivatives, and exposes the problem. Differential Revision: https://reviews.llvm.org/D53930 llvm-svn: 346128
* AMDGPU: Fix assertion with bitcast from i64 constant to v4i16 | Matt Arsenault | 2018-11-02 | 1 | -0/+38
| | | | llvm-svn: 345922
* [AMDGPU] Handle the idot8 pattern generated by FE. | Farhana Aleen | 2018-11-01 | 1 | -0/+220
| | | | | | | | | | | | | | Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge increase in the compile time. Support the pattern that clang FE generates after reordering the additions in integer-dot8 source language pattern. Author: FarhanaAleen Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D53937 llvm-svn: 345902
* Check shouldReduceLoadWidth from SimplifySetCC | Stanislav Mekhanoshin | 2018-10-31 | 1 | -0/+65
| | | | | | | | | | | | SimplifySetCC could shrink a load without checking with the target for profitability or legality of such a shrink. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below dword, as the scalar engine does not support it. Differential Revision: https://reviews.llvm.org/D53846 llvm-svn: 345778
* [SelectionDAG] Handle constant range [0,1) in lowerRangeToAssertZExt | Scott Linder | 2018-10-31 | 1 | -6/+20
| | | | | | | | | lowerRangeToAssertZExt currently relies on something like EarlyCSE having eliminated the constant range [0,1). At -O0 this leads to an assert. Differential Revision: https://reviews.llvm.org/D53888 llvm-svn: 345770
* [AMDGPU] Remove FeatureVGPRSpilling | Scott Linder | 2018-10-31 | 9 | -36/+87
| | | | | | | | | | | This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash. Remove the feature and add a test for compilation of shaders at OptNone. Differential Revision: https://reviews.llvm.org/D53829 llvm-svn: 345763
* AMDGPU: Rewrite SILowerI1Copies to always stay on SALU | Nicolai Haehnle | 2018-10-31 | 15 | -141/+264
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow. Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs. This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular: - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow. - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform. Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D53496 llvm-svn: 345719
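One way to picture "bitwise masking operations to combine lane masks" is a merge that keeps the previous value for inactive lanes and takes the new value for lanes enabled in EXEC. The sketch below is an assumption about the general shape of such a merge, written in Python; the commit message does not spell out the exact instruction sequence, and `merge_lane_masks` is a hypothetical name.

```python
WAVE_MASK = (1 << 64) - 1  # assumed 64-lane wave


def merge_lane_masks(prev_mask, cur_mask, exec_mask):
    """Combine two per-lane boolean masks under the current EXEC mask.

    Lanes that are inactive keep their bit from prev_mask; lanes that are
    active take their bit from cur_mask.
    """
    keep_inactive = prev_mask & ~exec_mask
    take_active = cur_mask & exec_mask
    return (keep_inactive | take_active) & WAVE_MASK
```

The extra AND with EXEC is the kind of operation the message calls a potentially redundant mask: when `cur_mask` is already known to be a subset of EXEC, `take_active` equals `cur_mask`.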
* AMDGPU: Remove PHI loop condition optimization | Nicolai Haehnle | 2018-10-31 | 4 | -79/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The optimization to early break out of loops if all threads are dead was never fully implemented. But the PHI node analyzing is actually causing a number of problems, so remove all the extra code for it. (This does actually regress code quality in a few places because it ends up relying more heavily on phi's of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality, I just need to make those changes fit for general consumption. -- Nicolai) Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1 Patch-by: Christian König <christian.koenig@amd.com> Reviewers: arsenm, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53359 llvm-svn: 345718
* [AMDGPU] support image load/store a16 | Neil Henning | 2018-10-31 | 5 | -0/+1054
| | | | | | | | | | | | Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out. Differential Revision: https://reviews.llvm.org/D53750 llvm-svn: 345710
* MachineOperand/MIParser: Do not print debug-use flag, infer it | Matthias Braun | 2018-10-30 | 3 | -5/+5
| | | | | | | | | | | | | | The debug-use flag must be set exactly for uses on DBG_VALUEs. This is so obvious that it can be trivially inferred while parsing. This will reduce noise when printing while omitting information that has little value to the user. The parser will keep recognizing the flag for compatibility with old `.mir` files. Differential Revision: https://reviews.llvm.org/D53903 llvm-svn: 345671
* Revert r345542: AMDGPU: Enable code object v3 by default | Konstantin Zhuravlyov | 2018-10-30 | 33 | -120/+120
| | | | | | It breaks mesa. llvm-svn: 345662
* [SchedModel] Fix for read advance cycles with implicit pseudo operands. | Jonas Paulsson | 2018-10-30 | 9 | -50/+51
| | | | | | | | | | | | | | | | | | The SchedModel allows the addition of ReadAdvances to express that certain operands of the instructions are needed at a later point than the others. RegAlloc may add pseudo operands that are not part of the instruction descriptor, and therefore cannot have any read advance entries. This meant that in some cases the desired read advance was nullified by such a pseudo operand, which still had the original latency. This patch fixes this by making sure that such pseudo operands get a zero latency during DAG construction. Review: Matthias Braun, Ulrich Weigand. https://reviews.llvm.org/D49671 llvm-svn: 345606
* [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes | Simon Pilgrim | 2018-10-30 | 1 | -2/+2
| | | | | | | | | | Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy. This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle. Differential Revision: https://reviews.llvm.org/D53760 llvm-svn: 345578
* AMDGPU: Remove custom BUILD_VECTOR combine | Matt Arsenault | 2018-10-30 | 2 | -6/+33
| | | | | | | This was looping in a testcase and removing it now slightly improves a test. llvm-svn: 345560
* AMDGPU: Use scavengeRegisterBackwards | Matt Arsenault | 2018-10-30 | 2 | -44/+48
| | | | llvm-svn: 345559
* AMDGPU: Enable code object v3 by default | Konstantin Zhuravlyov | 2018-10-29 | 33 | -120/+120
| | | | | | Differential Revision: https://reviews.llvm.org/D53525 llvm-svn: 345542
* Relax fast register allocator related test cases; NFC | Matthias Braun | 2018-10-29 | 2 | -7/+7
| | | | | | | | | | | | | - Relax hard coded registers and stack frame sizes - Some test cleanups - Change phi-dbg.ll to match on mir output after phi elimination instead of going through the whole codegen pipeline. This is in preparation for https://reviews.llvm.org/D52010 I'm committing all the test changes upfront that work before and after independently. llvm-svn: 345532
* [AMDGPU] Match v_swap_b32 | Stanislav Mekhanoshin | 2018-10-29 | 1 | -0/+564
| | | | | | Differential Revision: https://reviews.llvm.org/D52677 llvm-svn: 345514
* [AMDGPU] Add a pass to promote bitcast calls | Scott Linder | 2018-10-26 | 3 | -8/+148
| | | | | | | | | | | | AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls. Differential Revision: https://reviews.llvm.org/D52741 llvm-svn: 345382
* [AMDGPU] Defined gfx909 Raven Ridge 2 | Tim Renouf | 2018-10-24 | 2 | -0/+4
| | | | | | | Differential Revision: https://reviews.llvm.org/D53418 Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef llvm-svn: 345120
* DAG: Change behavior of fminnum/fmaxnum nodes | Matt Arsenault | 2018-10-22 | 18 | -582/+2188
| | | | | | | | | | | Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast math fcmp + select which should be fixed in a future commit. llvm-svn: 344914
* AMDGPU: Add support pattern for SUB of one bit | Changpeng Fang | 2018-10-19 | 2 | -0/+73
| | | | | | | | | | | | | Summary: Add selection patterns to support one bit Sub. Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D52946 llvm-svn: 344815
* AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI | Nicolai Haehnle | 2018-10-17 | 1 | -0/+129
| | | | | | | | | | | | | | | | | | | | | | Summary: To workaround a hardware issue in the (base + offset) calculation when base is negative. The impact on code quality should be limited since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information. This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode). Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923 Reviewers: arsenm, mareko Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53160 llvm-svn: 344698
* StructurizeCFG: Simplify inserted PHI nodes | Nicolai Haehnle | 2018-10-17 | 3 | -9/+12
| | | | | | | | | | | | | | | Summary: This improves subsequent divergence analysis in some cases. Change-Id: I5e95e7ec7fd3fa80d414d1a53a02fea23e3d67d3 Reviewers: arsenm, rampitec Subscribers: jvesely, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D53316 llvm-svn: 344697
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsics | Nicolai Haehnle | 2018-10-17 | 2 | -8/+67
| | | | | | | | | | | | | | | | | | | | | | | | Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis. If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane. There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads. Change-Id: I170e6816323beb1348677b358c9d380865cd1a19 Reviewers: arsenm, alex-t, rampitec, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53283 llvm-svn: 344696
* StructurizeCFG,AMDGPU: Test case of a redundant phi and codegen consequences | Nicolai Haehnle | 2018-10-15 | 1 | -0/+34
| | | | | Change-Id: I9681f9e41ca30f82576f3d1f965c3a550a34b171 llvm-svn: 344569
* AMDGPU: Generate .amdgcn_target for object code v3 | Konstantin Zhuravlyov | 2018-10-15 | 1 | -0/+58
| | | | | | Differential Revision: https://reviews.llvm.org/D53221 llvm-svn: 344552
* AMDGPU: Test showing a scalar buffer load deficiency | Nicolai Haehnle | 2018-10-15 | 1 | -0/+23
| | | | | Change-Id: I5b64a565f22a8482aa0712488d85e45163ac3d12 llvm-svn: 344506
* Revert "AMDGPU/GlobalISel: Implement select for G_INSERT" | Tom Stellard | 2018-10-11 | 1 | -49/+0
| | | | | | | | This reverts commit r344310. The test case was failing on some bots. llvm-svn: 344317