summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Match load d16 hi instructionsMatt Arsenault2017-09-205-16/+525
| | | | | | | | | | | | Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716
* [AMDGPU] Port of HSAIL inlinerStanislav Mekhanoshin2017-09-202-15/+157
| | | | | | Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714
* AMDGPU: Match store d16_hi instructionsMatt Arsenault2017-09-203-6/+601
| | | | llvm-svn: 313712
* [AMDGPU] Prevent post-RA scheduler from breaking memory clausesStanislav Mekhanoshin2017-09-1923-87/+119
| | | | | | | | | The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670
* AMDGPU: Run internalize symbols at -O0Matt Arsenault2017-09-191-15/+48
| | | | | | | | The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error. llvm-svn: 313616
* AMDGPU: Start selecting s_xnor_{b32, b64}Konstantin Zhuravlyov2017-09-181-0/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
* AMDGPU: Fix violating constant bus restrictionMatt Arsenault2017-09-141-0/+22
| | | | | | You can't use madmk/madmk if it already uses an SGPR input. llvm-svn: 313298
* AMDGPU: Fix assert on alloca of array of structMatt Arsenault2017-09-141-0/+9
| | | | llvm-svn: 313282
* AMDGPU: Stop modifying SP in call sequencesMatt Arsenault2017-09-143-15/+15
| | | | | | | | | | | | | | | Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279
* AMDGPU: Make frame register caller preservedMatt Arsenault2017-09-143-3/+33
| | | | | | | | | | | | | Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274
* [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)Simon Pilgrim2017-09-143-14/+60
| | | | | | | | | | We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or. Original patch by @tstellar Differential Revision: https://reviews.llvm.org/D19325 llvm-svn: 313251
* Fix line endings. NFCI.Simon Pilgrim2017-09-141-131/+131
| | | | llvm-svn: 313247
* AMDGPU: Don't spill SP reg like a normal CSRMatt Arsenault2017-09-132-5/+10
| | | | llvm-svn: 313217
* Allow target to decide when to cluster loads/stores in mischedStanislav Mekhanoshin2017-09-136-14/+95
| | | | | | | | | | | | | | | | MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208
* AMDGPU: Allow coldcc callsMatt Arsenault2017-09-111-0/+34
| | | | llvm-svn: 312936
* [AMDGPU] Produce madak and madmk from the two-address passStanislav Mekhanoshin2017-09-113-3/+113
| | | | | | | | | | These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928
* [AMDGPU] exp should not be in WQM modeTim Renouf2017-09-112-14/+58
| | | | | | | | | | | | | | | A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915
* AMDGPU: Recompute scc livenessMatt Arsenault2017-09-081-0/+60
| | | | | | | | The various scalar bit operations set SCC, so one is erased or moved it needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819
* AMDGPU: Start selecting v_mad_mix_f32Matt Arsenault2017-09-071-0/+409
| | | | llvm-svn: 312732
* AMDGPU: Handle non-temporal loads and storesKonstantin Zhuravlyov2017-09-072-0/+194
| | | | | | Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729
* AMDGPU: Don't legalize i16 extloads to i32 with legal i16Matt Arsenault2017-09-071-2/+2
| | | | | | | Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699
* [AMDGPU] Use v_pk_max_f16 for fcanonicalizeStanislav Mekhanoshin2017-09-062-5/+45
| | | | | | Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676
* [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalizeStanislav Mekhanoshin2017-09-061-6/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660
* [AMDGPU] Fix shouldClusterMemOps to process flat loadsStanislav Mekhanoshin2017-09-061-0/+20
| | | | | | | | Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640
* AMDGPU: Make worst-case assumption about the wait states in inline assemblyNicolai Haehnle2017-09-061-0/+29
| | | | | | | | | | | | | | | | Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635
* [AMDGPU] Transform __read_pipe_* and __write_pipe_*Yaxun Liu2017-09-061-18/+111
| | | | | | | | | When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598
* AMDGPU: Fix not accounting for tail call resource usageMatt Arsenault2017-09-051-0/+31
| | | | | | | | If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561
* [AMDGPU] Added extra test checks to make D19325 diff clearerSimon Pilgrim2017-09-051-5/+11
| | | | llvm-svn: 312537
* Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source ↵Sam McCall2017-09-042-14/+14
| | | | | | | | | | forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. llvm-svn: 312490
* [AMDGPU] Testcase for computeKnownBits recursion. NFC.Stanislav Mekhanoshin2017-09-011-0/+69
| | | | | | | Testcase for rL312364: [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() llvm-svn: 312388
* AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait statesNicolai Haehnle2017-09-011-0/+31
| | | | | | | | | | | | | | | Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2017-09-012-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312328
* AMDGPU: Fold clamp modifier for packed instructionsMatt Arsenault2017-08-311-15/+187
| | | | llvm-svn: 312297
* Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY ↵Hans Wennborg2017-08-302-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. llvm-svn: 312178
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2017-08-302-14/+14
| | | | | | | | | | | | | | | | | | | | | | | Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312154
* Re-land MachineInstr: Reason locally about some memory objects before going ↵Balaram Makam2017-08-304-25/+21
| | | | | | | | | | | | | | | | | | | | to AA. Summary: Reverts r311008 to reinstate r310825 with a fix. Refine alias checking for pseudo vs value to be conservative. This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs. Reviewers: hfinkel, nemanjai, efriedma Reviewed By: efriedma Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36900 llvm-svn: 312126
* [AMDGPU] Use v_max_f* for fcanonicalizeStanislav Mekhanoshin2017-08-303-32/+109
| | | | | | | | | | If denorms are not flushed we can use max instead of multiplication by 1. For double that is simply faster, while for float and half it is shorter, because mul uses constant bus and VOP3. Differential Revision: https://reviews.llvm.org/D36856 llvm-svn: 312095
* AMDGPU: Select clamp pattern with v2f16Matt Arsenault2017-08-301-34/+190
| | | | llvm-svn: 312087
* [AMDGPU] Fix regression in AMDGPULibCalls allowing native for doublesStanislav Mekhanoshin2017-08-281-0/+11
| | | | | | | | | | | | | Under -cl-fast-relaxed-math we could use native_sqrt, but f64 was allowed to produce HSAIL's nsqrt instruction. HSAIL is not here and we stick with non-existing native_sqrt(double) as a result. Add check for f64 to not return native functions and also remove handling of f64 case for fold_sqrt. Differential Revision: https://reviews.llvm.org/D37223 llvm-svn: 311900
* [AMDGPU] computeKnownBitsForTargetNode for 24 bit mulStanislav Mekhanoshin2017-08-281-13/+47
| | | | | | Differential Revision: https://reviews.llvm.org/D37168 llvm-svn: 311896
* IPRA: Don't assume called function is first call operandMatt Arsenault2017-08-241-1/+83
| | | | | | | | Fixes not finding the called global for AMDGPU call pseudoinstructions, which prevented IPRA from doing much. llvm-svn: 311637
* Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" ↵Geoff Berry2017-08-182-14/+14
| | | | | | | | | | | round 2 This reverts commit r311135. sanitizer-x86_64-linux-android buildbot is timing out with just this patch applied. llvm-svn: 311142
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source ↵Geoff Berry2017-08-172-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | forwarding" Two issues identified by buildbots were addressed: - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 llvm-svn: 311135
* Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2017-08-179-135/+135
| | | | | | | | | | This reverts commit r311038. Several buildbots are breaking, and at least one appears to be due to the forwarding of physical regs enabled by this change. Reverting while I investigate further. llvm-svn: 311062
* [MachineCopyPropagation] Extend pass to do COPY source forwardingGeoff Berry2017-08-169-135/+135
| | | | | | | | | | | | | | | | | | This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 llvm-svn: 311038
* Revert "MachineInstr: Reason locally about some memory objects before going ↵Balaram Makam2017-08-164-21/+25
| | | | | | | | | | | | | to AA." r310825 caused the clang-ppc64le-linux-lnt bot to go red (http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/5712) because of a test-suite failure of SingleSource/UnitTests/2003-07-09-SignedArgs This reverts commit 0028f6a87224fb595a1c19c544cde9b003035996. llvm-svn: 311008
* [AMDGPU] Eliminate no effect instructions before s_endpgmStanislav Mekhanoshin2017-08-1614-29/+343
| | | | | | Differential Revision: https://reviews.llvm.org/D36585 llvm-svn: 310987
* [Dominators] Include infinite loops in PostDominatorTreeJakub Kuderski2017-08-151-14/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change. What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG. This patch makes the following assumptions: - A sequence of updates should produce the same tree as a recalculating it. - Any sequence of the same updates should lead to the same tree. - Siblings and roots are unordered. The last two properties are essential to efficiently perform batch updates in the future. When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently. This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough. That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call: ``` # functions: 52283 # samples: 337609 # reverse unreachable BBs: 216022 # BBs: 247840796 Percent reverse-unreachable: 0.08716159869015269 % Max(PercRevUnreachable) in a function: 87.58620689655172 % # > 25 % samples: 471 ( 0.1395104988314885 % samples ) ... in 145 ( 0.27733680163724345 % functions ) ``` Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway. I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :). Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel Reviewed By: dberlin Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050 Differential Revision: https://reviews.llvm.org/D35851 llvm-svn: 310940
* MachineInstr: Reason locally about some memory objects before going to AA.Balaram Makam2017-08-144-25/+21
| | | | | | This addresses a FIXME in MachineInstr::mayAlias. llvm-svn: 310825
* AMDGPU: Start adding tail call supportMatt Arsenault2017-08-112-0/+268
| | | | | | Handle the sibling call cases. llvm-svn: 310753
OpenPOWER on IntegriCloud