summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2017-10-023-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 314729
* AMDGPU: Fix potentially incorrectly matching check linesMatt Arsenault2017-10-021-8/+8
| | | | | | | | These check lines are supposed to make sure the new d16 load instructions aren't used, but the expected instruction name is a prefix of the incorrect instruction name. llvm-svn: 314714
* Eliminate ftrunc if source is know to be roundedStanislav Mekhanoshin2017-10-021-0/+92
| | | | | | Differential Revision: https://reviews.llvm.org/D38421 llvm-svn: 314688
* [AMDGPU] Set fast-math flags on functions given the optionsStanislav Mekhanoshin2017-09-291-0/+33
| | | | | | | | | | | | | | | | We have a single library build without relaxation options. When inlined library functions remove fast math attributes from the functions they are integrated into. This patch sets relaxation attributes on the functions after linking provided corresponding relaxation options are given. Math instructions inside the inlined functions remain to have no fast flags, but inlining does not prevent fast math transformations of a surrounding caller code anymore. Differential Revision: https://reviews.llvm.org/D38325 llvm-svn: 314568
* CodeGen: Fix pointer info in expandUnalignedLoad/StoreYaxun Liu2017-09-291-0/+24
| | | | | | | | | | | | Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand in stack, which does not have correct address space. This causes unaligned private double16 load/store to be lowered to flat_load instead of buffer_load for amdgcn target. This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl. Differential Revision: https://reviews.llvm.org/D35361 llvm-svn: 314566
* AMDGPU: VALU carry-in and v_cndmask condition cannot be EXECNicolai Haehnle2017-09-294-30/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* llvm-svn: 314522
* [AMDGPU] calling conventions for AMDPAL OS typeTim Renouf2017-09-298-1/+141
| | | | | | | | | | | | | | | Summary: This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37752 llvm-svn: 314502
* [AMDGPU] AMDPAL scratch buffer supportTim Renouf2017-09-291-1/+46
| | | | | | | | | | | | | | | | | | | | | | | Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501
* [Triple] Add AMDPAL operating system typeTim Renouf2017-09-291-0/+10
| | | | | | | | | | | | | | | | | | Summary: This operating system type represents the AMDGPU PAL runtime, and will be required by the AMDGPU backend in order to generate correct code for this runtime. Currently it generates the same code as not specifying an OS at all. That will change in future commits. Patch from Tim Corringham. Subscribers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D37380 llvm-svn: 314500
* AMDGPU: Add option to stress callsMatt Arsenault2017-09-211-0/+36
| | | | | | | This inverts the behavior of the AlwaysInline pass to mark every function not already marked alwaysinline as noinline. llvm-svn: 313865
* AMDGPU: Fix crash on immediate operandMatt Arsenault2017-09-211-0/+58
| | | | | | | | We can have a v_mac with an immediate src0. We can still fold if it's an inline immediate, otherwise it already uses the constant bus. llvm-svn: 313852
* AMDGPU: Start selecting v_mad_mixhi_f16Matt Arsenault2017-09-202-41/+212
| | | | llvm-svn: 313814
* AMDGPU: Start selecting v_mad_mixlo_f16Matt Arsenault2017-09-202-0/+281
| | | | | | | | Also add some tests that should be able to use v_mad_mixhi_f16, but do not yet. This is trickier because we don't really model the partial update of the register done by 16-bit instructions. llvm-svn: 313806
* AMDGPU: Fix encoding of op_sel for mad_mix* opcodesMatt Arsenault2017-09-201-24/+24
| | | | llvm-svn: 313797
* AMDGPU: Match load d16 hi instructionsMatt Arsenault2017-09-205-16/+525
| | | | | | | | | | | | Also starts selecting global loads for constant address in some cases. Some end up selecting to mubuf still, which requires investigation. We still get sub-optimal regalloc and extra waitcnts inserted due to not really tracking the liveness of the separate register halves. llvm-svn: 313716
* [AMDGPU] Port of HSAIL inlinerStanislav Mekhanoshin2017-09-202-15/+157
| | | | | | Differential Revision: https://reviews.llvm.org/D36849 llvm-svn: 313714
* AMDGPU: Match store d16_hi instructionsMatt Arsenault2017-09-203-6/+601
| | | | llvm-svn: 313712
* [AMDGPU] Prevent post-RA scheduler from breaking memory clausesStanislav Mekhanoshin2017-09-1923-87/+119
| | | | | | | | | The pre-RA scheduler does load/store clustering, but post-RA scheduler undoes it. Add mutation to prevent it. Differential Revision: https://reviews.llvm.org/D38014 llvm-svn: 313670
* AMDGPU: Run internalize symbols at -O0Matt Arsenault2017-09-191-15/+48
| | | | | | | | The relocations used for externally visible functions aren't supported, so the direct call emitted ends up hitting a linker error. llvm-svn: 313616
* AMDGPU: Start selecting s_xnor_{b32, b64}Konstantin Zhuravlyov2017-09-181-0/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D37981 llvm-svn: 313565
* AMDGPU: Fix violating constant bus restrictionMatt Arsenault2017-09-141-0/+22
| | | | | | You can't use madmk/madmk if it already uses an SGPR input. llvm-svn: 313298
* AMDGPU: Fix assert on alloca of array of structMatt Arsenault2017-09-141-0/+9
| | | | llvm-svn: 313282
* AMDGPU: Stop modifying SP in call sequencesMatt Arsenault2017-09-143-15/+15
| | | | | | | | | | | | | | | Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279
* AMDGPU: Make frame register caller preservedMatt Arsenault2017-09-143-3/+33
| | | | | | | | | | | | | Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274
* [DAGCombine] (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2)Simon Pilgrim2017-09-143-14/+60
| | | | | | | | | | We already have a combine for this pattern when the input to shl is add, so we just need to enable the transformation when the input is or. Original patch by @tstellar Differential Revision: https://reviews.llvm.org/D19325 llvm-svn: 313251
* Fix line endings. NFCI.Simon Pilgrim2017-09-141-131/+131
| | | | llvm-svn: 313247
* AMDGPU: Don't spill SP reg like a normal CSRMatt Arsenault2017-09-132-5/+10
| | | | llvm-svn: 313217
* Allow target to decide when to cluster loads/stores in mischedStanislav Mekhanoshin2017-09-136-14/+95
| | | | | | | | | | | | | | | | MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208
* AMDGPU: Allow coldcc callsMatt Arsenault2017-09-111-0/+34
| | | | llvm-svn: 312936
* [AMDGPU] Produce madak and madmk from the two-address passStanislav Mekhanoshin2017-09-113-3/+113
| | | | | | | | | | These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928
* [AMDGPU] exp should not be in WQM modeTim Renouf2017-09-112-14/+58
| | | | | | | | | | | | | | | A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915
* AMDGPU: Recompute scc livenessMatt Arsenault2017-09-081-0/+60
| | | | | | | | The various scalar bit operations set SCC, so one is erased or moved it needs to be recomputed. Not sure why the existing tests don't fail on this. llvm-svn: 312819
* AMDGPU: Start selecting v_mad_mix_f32Matt Arsenault2017-09-071-0/+409
| | | | llvm-svn: 312732
* AMDGPU: Handle non-temporal loads and storesKonstantin Zhuravlyov2017-09-072-0/+194
| | | | | | Differential Revision: https://reviews.llvm.org/D36862 llvm-svn: 312729
* AMDGPU: Don't legalize i16 extloads to i32 with legal i16Matt Arsenault2017-09-071-2/+2
| | | | | | | Keeping non-i16 extloads makes it easier to match some new gfx9 load instructions. llvm-svn: 312699
* [AMDGPU] Use v_pk_max_f16 for fcanonicalizeStanislav Mekhanoshin2017-09-062-5/+45
| | | | | | Differential Revision: https://reviews.llvm.org/D37325 llvm-svn: 312676
* [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalizeStanislav Mekhanoshin2017-09-061-6/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D37522 llvm-svn: 312660
* [AMDGPU] Fix shouldClusterMemOps to process flat loadsStanislav Mekhanoshin2017-09-061-0/+20
| | | | | | | | Flat loads do not have vdata operand but have vdst instead. Differential Revision: https://reviews.llvm.org/D37502 llvm-svn: 312640
* AMDGPU: Make worst-case assumption about the wait states in inline assemblyNicolai Haehnle2017-09-061-0/+29
| | | | | | | | | | | | | | | | Summary: Mesa still uses a hack where empty inline assembly is used as a kind of optimization barrier. This exposed a problem where not enough wait states were inserted, because the hazard recognizer implicitly assumed that each inline assembly "instruction" has at least one wait state. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37205 llvm-svn: 312635
* [AMDGPU] Transform __read_pipe_* and __write_pipe_*Yaxun Liu2017-09-061-18/+111
| | | | | | | | | When packet size equals packet align and is power of 2, transform __read_pipe* and __write_pipe* to specialized library function. Differential Revision: https://reviews.llvm.org/D36831 llvm-svn: 312598
* AMDGPU: Fix not accounting for tail call resource usageMatt Arsenault2017-09-051-0/+31
| | | | | | | | If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return. llvm-svn: 312561
* [AMDGPU] Added extra test checks to make D19325 diff clearerSimon Pilgrim2017-09-051-5/+11
| | | | llvm-svn: 312537
* Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source ↵Sam McCall2017-09-042-14/+14
| | | | | | | | | | forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. llvm-svn: 312490
* [AMDGPU] Testcase for computeKnownBits recursion. NFC.Stanislav Mekhanoshin2017-09-011-0/+69
| | | | | | | Testcase for rL312364: [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits() llvm-svn: 312388
* AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait statesNicolai Haehnle2017-09-011-0/+31
| | | | | | | | | | | | | | | Summary: This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D36193 llvm-svn: 312337
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2017-09-012-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312328
* AMDGPU: Fold clamp modifier for packed instructionsMatt Arsenault2017-08-311-15/+187
| | | | llvm-svn: 312297
* Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY ↵Hans Wennborg2017-08-302-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. llvm-svn: 312178
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2017-08-302-14/+14
| | | | | | | | | | | | | | | | | | | | | | | Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312154
* Re-land MachineInstr: Reason locally about some memory objects before going ↵Balaram Makam2017-08-304-25/+21
| | | | | | | | | | | | | | | | | | | | to AA. Summary: Reverts r311008 to reinstate r310825 with a fix. Refine alias checking for pseudo vs value to be conservative. This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs. Reviewers: hfinkel, nemanjai, efriedma Reviewed By: efriedma Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D36900 llvm-svn: 312126
OpenPOWER on IntegriCloud