path: root/llvm/lib/Target/AMDGPU
* AMDGPU: Fix analyzeBranch failing with pseudoterminators (Matt Arsenault, 2018-11-16; 3 files, -3/+31)
  If a block had one of the _term instructions used for gluing exec-modifying instructions to the end of the block, analyzeBranch would fail, preventing the verifier from catching a broken successor list.
  llvm-svn: 347027
* [AMDGPU] Add FixupVectorISel pass, currently supports SREGs in GLOBAL LD/ST (Ron Lieberman, 2018-11-16; 8 files, -4/+263)
  Add a pass to fix up various vector ISel issues. Currently we handle converting GLOBAL_{LOAD|STORE}_* and GLOBAL_Atomic_* instructions into their _SADDR variants, feeding the sreg into the saddr field of the new instruction.
  llvm-svn: 347008
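  A rough sketch of what such an opcode fixup looks like at the MachineIR level (the helper baseIsScalarReg and the operand rewiring are hypothetical simplifications; the real pass handles a whole family of GLOBAL_* opcodes):

    // For each GLOBAL_LOAD_DWORD whose base address is known to live in
    // an SGPR pair, retarget the instruction to the _SADDR variant so
    // the scalar base can feed the saddr operand.
    for (MachineBasicBlock &MBB : MF) {
      for (MachineInstr &MI : MBB) {
        if (MI.getOpcode() != AMDGPU::GLOBAL_LOAD_DWORD)
          continue;
        if (!baseIsScalarReg(MI, MRI)) // hypothetical legality check
          continue;
        MI.setDesc(TII->get(AMDGPU::GLOBAL_LOAD_DWORD_SADDR));
        // (The real pass also rebuilds the operand list: the SGPR pair
        // moves into the saddr slot and a VGPR offset takes its place.)
      }
    }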
* [AMDGPU] NFC Test commit (Ron Lieberman, 2018-11-16; 1 file, -1/+1)
  llvm-svn: 347002
* AMDHSA: More code object v3 fixes (Konstantin Zhuravlyov, 2018-11-15; 1 file, -1/+2)
  - Make sure IsaInfo::hasCodeObjectV3 returns true only for AMDHSA
  - Update assembler metadata tests to use v2 by default
  llvm-svn: 347001
* AMDGPU: Enable code object v3 for AMDHSA only (Konstantin Zhuravlyov, 2018-11-15; 2 files, -17/+34)
  Differential Revision: https://reviews.llvm.org/D54186
  llvm-svn: 346923
* AMDGPU: Additional pattern for i16 median3 matching (Aakanksha Patil, 2018-11-14; 1 file, -4/+17)
  min(max(a, b), max(min(a, b), c))
  Differential Revision: https://reviews.llvm.org/D54494
  llvm-svn: 346886
* [AMDGPU] Combine extractelement into several selects (Stanislav Mekhanoshin, 2018-11-13; 1 file, -4/+26)
  An extractelement with a non-constant index will be lowered either to scratch or to a movrel loop in most cases. This patch converts such an instruction into a set of selects if the vector size is not too big.
  Differential Revision: https://reviews.llvm.org/D54351
  llvm-svn: 346800
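  A sketch of the expansion at the IR level (the in-tree change works on the SelectionDAG; the function and variable names here are illustrative):

    // Expand "extractelement <N x T> %vec, i32 %idx" with a dynamic
    // %idx into a chain of compares and selects over all constant lanes.
    Value *expandDynamicExtract(IRBuilder<> &B, Value *Vec, Value *Idx) {
      unsigned N = cast<VectorType>(Vec->getType())->getNumElements();
      // Seed with the last lane, then fold the earlier lanes in.
      Value *Result = B.CreateExtractElement(Vec, N - 1);
      for (unsigned I = N - 1; I-- > 0;) {
        Value *IsLane =
            B.CreateICmpEQ(Idx, ConstantInt::get(Idx->getType(), I));
        Result =
            B.CreateSelect(IsLane, B.CreateExtractElement(Vec, I), Result);
      }
      return Result;
    }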
* AMDGPU: Adding more median3 patterns (Aakanksha Patil, 2018-11-12; 2 files, -9/+22)
  min(max(a, b), max(min(a, b), c)) -> med3 a, b, c
  Differential Revision: https://reviews.llvm.org/D54331
  llvm-svn: 346704
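  The matched expression really is a median. A quick self-contained check of the identity (standalone C++, not LLVM code):

    #include <algorithm>
    #include <cassert>

    // min(max(a, b), max(min(a, b), c)) picks the middle value of three.
    int med3(int a, int b, int c) {
      return std::min(std::max(a, b), std::max(std::min(a, b), c));
    }

    int main() {
      assert(med3(1, 5, 3) == 3); // median of {1, 3, 5}
      assert(med3(7, 2, 9) == 7); // median of {2, 7, 9}
      assert(med3(4, 4, 1) == 4); // ties are fine
      return 0;
    }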
* [AMDGPU] Optimize S_CBRANCH_VCC[N]Z -> S_CBRANCH_EXEC[N]Z (Stanislav Mekhanoshin, 2018-11-12; 1 file, -0/+97)
  Sometimes after basic block placement we end up with code like:

    sreg = s_mov_b64 -1
    vcc = s_and_b64 exec, sreg
    s_cbranch_vccz

  This happens as a join of a block assigning -1 to a saved mask and another block which consumes that saved mask with s_and_b64 and a branch. This is essentially a single s_cbranch_execz instruction when moved into a single new basic block.
  Differential Revision: https://reviews.llvm.org/D54164
  llvm-svn: 346690
* [x86] allow vector load narrowing with multi-use values (Sanjay Patel, 2018-11-10; 1 file, -1/+4)
  This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins.
  I've structured this to be no-functional-change-intended for any target except for x86, because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, and 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs.
  The trade-off is deciding whether an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we can often fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops, if wide ops are implemented with multiple uops and/or frequency throttling is avoided.
  Differential Revision: https://reviews.llvm.org/D54073
  llvm-svn: 346595
* [AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx (Stanislav Mekhanoshin, 2018-11-09; 4 files, -7/+10)
  This only covers the AMDGPU backend, hopefully all occurrences.
  Differential Revision: https://reviews.llvm.org/D54235
  llvm-svn: 346528
* [AMDGPU] Extend promote alloca vectorization (Stanislav Mekhanoshin, 2018-11-08; 1 file, -4/+20)
  Promote alloca can vectorize a small array by bitcasting it to a vector type. Extend vectorization to the case when the alloca is already of a vector type. We still want to replace GEPs with insert/extract element instructions in this case.
  Differential Revision: https://reviews.llvm.org/D54219
  llvm-svn: 346376
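  In IR terms, element accesses through GEPs are rewritten so the whole alloca can live in a vector register. A hedged sketch of the rewrite for a load (helper name and surrounding plumbing are illustrative):

    // Turn "load T, T* (gep %alloca, 0, %idx)" into a whole-vector load
    // followed by extractelement, removing the scalar memory access.
    Value *rewriteLoadAsExtract(IRBuilder<> &B, AllocaInst *Alloca,
                                Value *Idx, LoadInst *Ld) {
      B.SetInsertPoint(Ld);
      Value *WholeVec = B.CreateLoad(Alloca); // load <N x T>
      // (Newer IRBuilders take the explicit result type:
      //  B.CreateLoad(VecTy, Alloca).)
      Value *Elt = B.CreateExtractElement(WholeVec, Idx);
      Ld->replaceAllUsesWith(Elt);
      Ld->eraseFromParent();
      return Elt;
    }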
* Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"Nicolai Haehnle2018-11-076-90/+220
| | | | | | | | This reverts commit r344696 for now (except for some test additions). See https://bugs.freedesktop.org/show_bug.cgi?id=108611. llvm-svn: 346364
* AMDGPU/InsertWaitcnts: Cleanup some old cruft (NFCI) (Nicolai Haehnle, 2018-11-07; 1 file, -91/+71)
  Summary: Remove redundant logic and simplify control flow.
  Reviewers: msearles, rampitec, scott.linder, kanarayan
  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54086
  llvm-svn: 346363
* AMDGPU/InsertWaitcnts: Remove kill-related logic (Nicolai Haehnle, 2018-11-07; 1 file, -101/+1)
  Summary: This is not needed, because we don't actually insert relevant branches for KILLs that late in the compilation flow. Besides, this was always checking for the wrong kill opcode anyway...
  Reviewers: msearles, rampitec, scott.linder, kanarayan
  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D54085
  llvm-svn: 346362
* AMDGPU/NFC: Split FLAT_Global_Atomic_Pseudo into RTN/NO_RTN multiclasses (Konstantin Zhuravlyov, 2018-11-07; 1 file, -11/+30)
  llvm-svn: 346361
* AMDGPU/NFC: Split MUBUF_Pseudo_Atomics into RTN/NO_RTN multiclasses (Konstantin Zhuravlyov, 2018-11-07; 1 file, -5/+16)
  llvm-svn: 346357
* Allow subclassing ExternalAA (Matt Arsenault, 2018-11-07; 4 files, -7/+24)
  This allows testing AMDGPU alias analysis like any other alias analysis pass. It fixes the existing test pointlessly running opt -O3 when it really just wants to run the one analysis.
  Before, there was no way to test this using -aa-eval with opt, since the default-constructed pass is what runs. The wrapper subclass allows the default constructor to pass the necessary callback.
  llvm-svn: 346353
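  The wrapper is small; roughly (simplified from the in-tree pass, with pass-registration boilerplate omitted):

    // ExternalAAWrapperPass runs a callback that merges extra AA results
    // into the AAResults other passes query. Subclassing it lets the
    // default constructor -- the one "opt -aa-eval" creates -- supply
    // the AMDGPU callback.
    struct AMDGPUExternalAAWrapper : ExternalAAWrapperPass {
      static char ID;
      AMDGPUExternalAAWrapper()
          : ExternalAAWrapperPass([](Pass &P, Function &, AAResults &AAR) {
              if (auto *WrapperPass =
                      P.getAnalysisIfAvailable<AMDGPUAAWrapperPass>())
                AAR.addAAResult(WrapperPass->getResult());
            }) {}
    };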
* fix typos aggressively; NFC (Sanjay Patel, 2018-11-07; 1 file, -1/+1)
  llvm-svn: 346316
* AMDGPU: Add an option -disable-promote-alloca-to-lds (Yaxun Liu, 2018-11-06; 1 file, -0/+8)
  Add this option for debugging and for providing a workaround. By default it is off, so there is no behavior change in the backend.
  Differential Revision: https://reviews.llvm.org/D54158
  llvm-svn: 346267
* [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC (Craig Topper, 2018-11-05; 2 files, -2/+2)
  The main caller of this already has an MVT, and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
  llvm-svn: 346180
* AMDGPU: Add sram-ecc feature (Konstantin Zhuravlyov, 2018-11-05; 6 files, -21/+27)
  Differential Revision: https://reviews.llvm.org/D53222
  llvm-svn: 346177
* [AMDGPU] Fix the new atomic optimizer in pixel shaders (Neil Henning, 2018-11-05; 1 file, -2/+39)
  The new atomic optimizer I previously added in D51969 did not work correctly when a pixel shader was using derivatives and had helper lanes active. To fix this we add an llvm.amdgcn.ps.live call that guards a branch around the entire atomic operation, ensuring that all helper lanes are inactive within the wavefront when we compute our atomic results. I've added a test case that can cause derivatives and exposes the problem.
  Differential Revision: https://reviews.llvm.org/D53930
  llvm-svn: 346128
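  Conceptually, the guard looks like this when building the optimized sequence (a hedged IRBuilder sketch; the exact CreateIntrinsic overload differs across LLVM versions, and the real pass also merges results back with phis in the continuation block):

    // I is the original atomic instruction being rewritten; helper
    // lanes (alive only to provide derivatives) must not take part.
    Value *Live = B.CreateIntrinsic(Intrinsic::amdgcn_ps_live, {}, {});
    Instruction *ThenTerm =
        SplitBlockAndInsertIfThen(Live, &I, /*Unreachable=*/false);
    B.SetInsertPoint(ThenTerm);
    // ... emit the optimized atomic sequence here, with helper lanes
    // guaranteed inactive ...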
* Fixed inclusion of M_PI for MinGW-w64 (Sylvestre Ledru, 2018-11-02; 1 file, -1/+1)
  Patch by KOLANICH
  llvm-svn: 346000
* [AMDGPU] UBSan bug fix for r345710 (Neil Henning, 2018-11-02; 1 file, -1/+1)
  UBSan detected an error in our ISelLowering that is exposed only when you have a dmask == 0x1. Fix this by adding an explicit check to ensure we don't do the UBSan-detected shl by 32.
  llvm-svn: 345962
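  The general shape of this class of fix, as standalone C++ (illustrative, not the exact code from the patch):

    #include <cassert>
    #include <cstdint>

    // Shifting a 32-bit value by 32 or more is undefined behavior, so a
    // "low N bits set" mask must special-case the full width.
    uint32_t lowBitsSet(unsigned N) {
      assert(N >= 1 && N <= 32);
      if (N == 32)            // 1u << 32 would be UB; handle explicitly
        return ~0u;
      return (1u << N) - 1;   // safe: shift amount is strictly < 32
    }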
* AMDGPU: Fix assertion with bitcast from i64 constant to v4i16 (Matt Arsenault, 2018-11-02; 1 file, -3/+4)
  llvm-svn: 345922
* [AMDGPU] Handle the idot8 pattern generated by FE (Farhana Aleen, 2018-11-01; 1 file, -0/+9)
  Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge increase in compile time. Support the pattern that the clang FE generates after reordering the additions in the integer-dot8 source language pattern.
  Author: FarhanaAleen
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D53937
  llvm-svn: 345902
* Fix clang -Wimplicit-fallthrough warnings across llvm, NFC (Reid Kleckner, 2018-11-01; 3 files, -3/+3)
  This patch should not introduce any behavior changes. It consists mostly of one of two changes (both are illustrated in the sketch after this entry):
  1. Replacing fall-through comments with the LLVM_FALLTHROUGH macro
  2. Inserting 'break' before falling through into a case block consisting of only 'break'
  We were already using this warning with GCC, but its warning behaves slightly differently. In this patch, the following differences are relevant:
  1. GCC recognizes comments that say "fall through" as annotations; clang doesn't
  2. GCC doesn't warn on "case N: foo(); default: break;"; clang does
  3. GCC doesn't warn when the case contains a switch but falls through the outer case
  I will enable the warning separately in a follow-up patch so that it can be cleanly reverted if necessary.
  Reviewers: alexfh, rsmith, lattner, rtrieu, EricWF, bollu
  Differential Revision: https://reviews.llvm.org/D53950
  llvm-svn: 345882
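  Both changes in miniature (LLVM_FALLTHROUGH comes from llvm/Support/Compiler.h and expands to an attribute such as [[clang::fallthrough]] depending on the compiler):

    #include "llvm/Support/Compiler.h"

    enum Kind { A, B, C };

    int dispatch(Kind K) {
      int R = 0;
      switch (K) {
      case A:
        R += 1;           // a statement precedes the fall through
        LLVM_FALLTHROUGH; // change 1: was a "// fall through" comment
      case B:
        R += 2;
        break;
      case C:             // change 2: explicit 'break' instead of
        break;            // falling into a case that only breaks
      }
      return R;
    }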
* Check shouldReduceLoadWidth from SimplifySetCC (Stanislav Mekhanoshin, 2018-10-31; 1 file, -0/+12)
  SimplifySetCC could shrink a load without checking for the profitability or legality of such a shrink with the target. Added checks to prevent shrinking of aligned scalar loads in AMDGPU below a dword, as the scalar engine does not support it.
  Differential Revision: https://reviews.llvm.org/D53846
  llvm-svn: 345778
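  A hedged sketch of the shape of the target hook (the real override weighs extension type and divergence more carefully):

    bool SITargetLowering::shouldReduceLoadWidth(SDNode *N,
                                                 ISD::LoadExtType ExtTy,
                                                 EVT NewVT) const {
      auto *Ld = dyn_cast<LoadSDNode>(N);
      if (!Ld)
        return true;
      // Narrowing to a dword or wider is always acceptable.
      if (NewVT.getSizeInBits() >= 32)
        return true;
      // An aligned uniform load would select to SMRD, and the scalar
      // engine has no sub-dword loads: keep the load whole.
      if (Ld->getAlignment() >= 4 && !N->isDivergent())
        return false;
      return true;
    }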
* [AMDGPU] Remove FeatureVGPRSpilling (Scott Linder, 2018-10-31; 6 files, -48/+8)
  This feature is only relevant to shaders, and is no longer used. When disabled, lowering of reserved registers for shaders causes a compiler crash.
  Remove the feature and add a test for compilation of shaders at OptNone.
  Differential Revision: https://reviews.llvm.org/D53829
  llvm-svn: 345763
* AMDGPU: Rewrite SILowerI1Copies to always stay on SALU (Nicolai Haehnle, 2018-10-31; 6 files, -189/+749)
  Summary: Instead of writing boolean values temporarily into 32-bit VGPRs if they are involved in PHIs or are observed from outside a loop, we use bitwise masking operations to combine lane masks in a way that is consistent with wave control flow.
  Move SIFixSGPRCopies to before this pass, since that pass incorrectly attempts to move SGPR phis to VGPRs.
  This should recover most of the code quality that was lost with the bug fix in "AMDGPU: Remove PHI loop condition optimization". There are still some relevant cases where code quality could be improved, in particular:
  - We often introduce redundant masks with EXEC. Ideally, we'd have a generic computeKnownBits-like analysis to determine whether masks are already masked by EXEC, so we can avoid this masking both here and when lowering uniform control flow.
  - The criterion we use to determine whether a def is observed from outside a loop is conservative: it doesn't check whether (loop) branch conditions are uniform.
  Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b
  Reviewers: arsenm, rampitec, tpr
  Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53496
  llvm-svn: 345719
* AMDGPU: Remove PHI loop condition optimization (Nicolai Haehnle, 2018-10-31; 5 files, -138/+3)
  Summary: The optimization to break out of loops early if all threads are dead was never fully implemented. The PHI-node analysis is actually causing a number of problems, so remove all the extra code for it.
  (This does actually regress code quality in a few places because it ends up relying more heavily on phis of i1, which we don't do a great job with. However, since it fixes real bugs in the wild, we should take this change. I have some prototype changes to improve i1 lowering in general -- not just for control flow -- which should help recover the code quality; I just need to make those changes fit for general consumption. -- Nicolai)
  Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
  Patch-by: Christian König <christian.koenig@amd.com>
  Reviewers: arsenm, rampitec, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53359
  llvm-svn: 345718
* [AMDGPU] support image load/store a16 (Neil Henning, 2018-10-31; 1 file, -2/+4)
  Our a16 support was only enabled for sample/gather and buffer load/store, but not for image load/store operations (which take an i16 as the pixel index rather than a half). Fix our isel lowering and add test cases to prove it out.
  Differential Revision: https://reviews.llvm.org/D53750
  llvm-svn: 345710
* Revert r345542: AMDGPU: Enable code object v3 by default (Konstantin Zhuravlyov, 2018-10-30; 1 file, -30/+15)
  It breaks Mesa.
  llvm-svn: 345662
* [SelectionDAG] Add FoldBUILD_VECTOR to simplify new BUILD_VECTOR nodes (Simon Pilgrim, 2018-10-30; 1 file, -16/+18)
  Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR: if all the operands are UNDEF, or if the BUILD_VECTOR simplifies to a copy.
  This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to return a BUILD_VECTOR node, which I've tried to handle.
  Differential Revision: https://reviews.llvm.org/D53760
  llvm-svn: 345578
* AMDGPU: Remove custom BUILD_VECTOR combine (Matt Arsenault, 2018-10-30; 2 files, -46/+0)
  This was looping in a testcase, and removing it now slightly improves a test.
  llvm-svn: 345560
* AMDGPU: Use scavengeRegisterBackwards (Matt Arsenault, 2018-10-30; 1 file, -2/+3)
  llvm-svn: 345559
* AMDGPU: Enable code object v3 by default (Konstantin Zhuravlyov, 2018-10-29; 1 file, -15/+30)
  Differential Revision: https://reviews.llvm.org/D53525
  llvm-svn: 345542
* [AMDGPU] Fixed return value causing warning and regression (Stanislav Mekhanoshin, 2018-10-29; 1 file, -1/+1)
  llvm-svn: 345518
* [AMDGPU] Match v_swap_b32 (Stanislav Mekhanoshin, 2018-10-29; 2 files, -0/+175)
  Differential Revision: https://reviews.llvm.org/D52677
  llvm-svn: 345514
* [AMDGPU] Add a pass to promote bitcast calls (Scott Linder, 2018-10-26; 4 files, -0/+74)
  AMDGPU currently only supports direct calls, but at lower optimisation levels it fails to lower statically direct calls which appear indirect due to a bitcast. Add a pass to visit all CallSites and use CallPromotionUtils to "devirtualize" calls.
  Differential Revision: https://reviews.llvm.org/D52741
  llvm-svn: 345382
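  The kind of source that triggers this, as a small (deliberately ill-typed) C++ illustration:

    extern "C" void callee(int);

    int main() {
      // Calling through a cast pointer leaves a constant-expression
      // bitcast of @callee in the IR at -O0, roughly:
      //   call void bitcast (void (i32)* @callee to void (i64)*)(i64 42)
      // The target is statically known, but the call looks indirect.
      auto *FP = reinterpret_cast<void (*)(long)>(&callee);
      FP(42);
      return 0;
    }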
* [AMDGPU] Defined gfx909 Raven Ridge 2 (Tim Renouf, 2018-10-24; 4 files, -0/+15)
  Differential Revision: https://reviews.llvm.org/D53418
  Change-Id: Ie3d054f2e956c2768988c0f4c0ffd29a47294eef
  llvm-svn: 345120
* DAG: Change behavior of fminnum/fmaxnum nodes (Matt Arsenault, 2018-10-22; 9 files, -47/+191)
  Introduce new versions that follow the IEEE semantics to help with legalization that may need quieted inputs. There are some regressions from inserting unnecessary canonicalizes when these are matched from fast-math fcmp + select; those should be fixed in a future commit.
  llvm-svn: 344914
* AMDGPU: Add support pattern for SUB of one bit (Changpeng Fang, 2018-10-19; 1 file, -0/+10)
  Summary: Add selection patterns to support one-bit Sub.
  Reviewers: rampitec, arsenm
  Differential Revision: https://reviews.llvm.org/D52946
  llvm-svn: 344815
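  On a single bit, subtraction modulo 2 coincides with xor (and with addition), which is what makes a simple selection pattern possible. An exhaustive check as standalone C++:

    #include <cassert>

    int main() {
      for (unsigned a = 0; a <= 1; ++a)
        for (unsigned b = 0; b <= 1; ++b)
          assert(((a - b) & 1u) == (a ^ b)); // i1 sub == i1 xor
      return 0;
    }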
* AMDGPU: Avoid selecting ds_{read,write}2_b32 on SI (Nicolai Haehnle, 2018-10-17; 3 files, -3/+26)
  Summary: This works around a hardware issue in the (base + offset) calculation when the base is negative. The impact on code quality should be limited, since SILoadStoreOptimizer still runs afterwards and is able to combine loads/stores based on known sign information.
  This fixes visible corruption in Hitman on SI (easily reproducible by running benchmark mode).
  Change-Id: Ia178d207a5e2ac38ae7cd98b532ea2ae74704e5f
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99923
  Reviewers: arsenm, mareko
  Subscribers: jholewinski, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53160
  llvm-svn: 344698
* AMDGPU: Divergence-driven selection of scalar buffer load intrinsics (Nicolai Haehnle, 2018-10-17; 6 files, -220/+90)
  Summary: Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if the load is really uniform. So select the scalar load intrinsics directly to either VMEM or SMRD buffer loads based on divergence analysis.
  If an offset happens to end up in a VGPR -- either because a floating point calculation was involved, or due to other remaining deficiencies in SIFixSGPRCopies -- we use v_readfirstlane.
  There is some unrelated churn in tests since we now select MUBUF offsets in a unified way with non-scalar buffer loads.
  Change-Id: I170e6816323beb1348677b358c9d380865cd1a19
  Reviewers: arsenm, alex-t, rampitec, tpr
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53283
  llvm-svn: 344696
* AMDGPU: Remove dead TableGen code (Nicolai Haehnle, 2018-10-17; 1 file, -2/+0)
  Change-Id: Ic1f2c1d0cf9e90a0baa9fc6bacd0d3c386069fb0
  Reviewers: tpr
  Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
  Differential Revision: https://reviews.llvm.org/D53318
  Change-Id: Ib4d143c898801e5cf6cb9999a495d62c91ae77fb
  llvm-svn: 344691
* AMDGPU: Generate .amdgcn_target for object code v3 (Konstantin Zhuravlyov, 2018-10-15; 1 file, -3/+10)
  Differential Revision: https://reviews.llvm.org/D53221
  llvm-svn: 344552
* [TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction` (Chandler Carruth, 2018-10-15; 1 file, -1/+1)
  This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates.
  Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked).
  llvm-svn: 344502
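  The change pattern, in miniature (visitBranch is a hypothetical stand-in for whatever the surrounding code does with the terminator):

    // Before: TerminatorInst *TI = BB.getTerminator();
    // After: the narrow type is gone; a terminator is just an Instruction.
    Instruction *TI = BB.getTerminator();
    if (auto *BI = dyn_cast<BranchInst>(TI))
      visitBranch(BI); // hypothetical use site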
* Revert "AMDGPU/GlobalISel: Implement select for G_INSERT"Tom Stellard2018-10-112-31/+0
| | | | | | | | This reverts commit r344310. The test case was failing on some bots. llvm-svn: 344317