summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)Jan Vesely2020-02-101-14/+24
| | | | | | | | | | | | The old version might be faster on EG (RECIP_IEEE is Trans only), but it'd need extra corner case checks. This gives correct corner case behaviour and saves a register. Fixes OCL CTS sqrt test (1-thread, scalar) on Turks. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D74017 (cherry picked from commit e6686adf8a743564f0c455c34f04752ab08cf642)
* AMDGPU: Fix handling of infinite loops in fragment shadersConnor Abbott2020-02-041-0/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781 (cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
* R600: Fix failing testcaseMatt Arsenault2020-02-031-3/+3
| | | | (cherry picked from commit 7dc49f77ee508b4152f9291c8e804e4eda3653d3)
* AMDGPU/R600: Emit rodata in text segmentJan Vesely2020-02-031-0/+6
| | | | | | | | | | R600 relies on this behaviour. Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since 6e18266 Differential Revision: https://reviews.llvm.org/D72991 (cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
* Revert "[AMDGPU] Invert the handling of skip insertion."Nicolai Hähnle2020-02-0334-217/+374
| | | | | | | | | This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa. (cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
* [AMDGPU] Invert the handling of skip insertion.cdevadas2020-01-1534-374/+217
| | | | | | | | | | | | | | | The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092
* [amdgpu] Fix typos in a test case.Michael Liao2020-01-141-1/+0
| | | | - There are typos introduced due to merge.
* [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.Michael Liao2020-01-1412-355/+329
| | | | | | | | | | | | | | | | | | | Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709
* [DAGCombine] Replace `getIntPtrConstant()` with `getVectorIdxTy()`.Michael Liao2020-01-141-0/+40
| | | | | | - Prefer `getVectorIdxTy()` as the index operand type for `EXTRACT_SUBVECTOR` as targets expect different types by overloading `getVectorIdxTy()`.
* [MachineScheduler] Reduce reordering due to mem op clusteringJay Foad2020-01-1415-94/+95
| | | | | | | | | | | | | | | | | | | | | | Summary: Mem op clustering adds a weak edge in the DAG between two loads or stores that should be clustered, but the direction of this edge is pretty arbitrary (it depends on the sort order of MemOpInfo, which represents the operands of a load or store). This often means that two loads or stores will get reordered even if they would naturally have been scheduled together anyway, which leads to test case churn and goes against the scheduler's "do no harm" philosophy. The fix makes sure that the direction of the edge always matches the original code order of the instructions. Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72706
* [AMDGPU] Model distance to instruction in bundleStanislav Mekhanoshin2020-01-141-0/+44
| | | | | | | This change allows to model the height of the instruction within a bundle for latency adjustment purposes. Differential Revision: https://reviews.llvm.org/D72669
* [AMDGPU] Fix getInstrLatency() always returning 1Stanislav Mekhanoshin2020-01-142-4/+5
| | | | | | | | We do not have InstrItinerary so generic getInstLatency() was always defaulting to return 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655
* AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add|swap}Matt Arsenault2020-01-133-0/+11
|
* AMDGPU/GlobalISel: Add some baseline tests for vector extractMatt Arsenault2020-01-131-0/+468
| | | | | A future change will try to fold constant offsets into the loop which these will stress.
* AMDGPU/GlobalISel: Fix branch targets when emitting SI_IFMatt Arsenault2020-01-131-0/+61
| | | | | | | | The branch target needs to be changed depending on whether there is an unconditional branch or not. Loops also need to be similarly fixed, but compiling a simple testcase end to end requires another set of patches that aren't upstream yet.
* GlobalISel: Fix assertion on wide G_ZEXT sourcesMatt Arsenault2020-01-131-0/+25
| | | | It's possible to have a type that needs a mask greater than 64-bits.
* [SelectionDAG] ComputeKnownBits - minimum leading/trailing zero bits in ↵Simon Pilgrim2020-01-133-40/+34
| | | | | | | | | | LSHR/SHL (PR44526) As detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value. This patch adds support to SelectionDAG::ComputeKnownBits to use KnownBits::countMinTrailingZeros and countMinLeadingZeros to set the minimum guaranteed leading/trailing known zero bits. Differential Revision: https://reviews.llvm.org/D72573
* AMDGPU: Split test functionMatt Arsenault2020-01-121-4/+16
| | | | | This avoids slightly different scheduling/regalloc behavior, and avoids a test diff between GlobalISel and SelectionDAG.
* AMDGPU/GlobalISel: Don't use XEXEC class for SGPRsMatt Arsenault2020-01-1238-196/+193
| | | | | We don't use the xexec register classes for arbitrary values anymore. Avoids a test variance beween GlobalISel and SelectionDAG>
* AMDGPU/GlobalISel: Copy type when inserting readfirstlaneMatt Arsenault2020-01-128-29/+29
| | | | | getDefIgnoringCopies will fail to find any def if no type is set if we try to use it on the use's operand, so propagate the type.
* [AMDGPU] Regenerate shl shift testsSimon Pilgrim2020-01-121-244/+1473
|
* [AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering.Michael Bedy2020-01-102-0/+62
| | | | | | | | | | | | | | | | | | | Summary: - SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions. While this is necessary for the special handling of moving results out of WWM live ranges, it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every WQM operation. This change uses a COPY psuedo in these cases, which allows the register allocator to coalesce the moves away. Reviewers: tpr, dstuttard, foad, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71386
* AMDGPU/GlobalISel: Clamp G_ZEXT source sizesMatt Arsenault2020-01-107-238/+967
| | | | | Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.
* AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELTMatt Arsenault2020-01-092-0/+2099
| | | | | Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.
* AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v caseMatt Arsenault2020-01-091-10/+16
| | | | | | If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterwall loop to the VGPR result.
* [AMDGPU] Fix bundle schedulingStanislav Mekhanoshin2020-01-0916-31/+31
| | | | | | | Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487
* TableGen/GlobalISel: Add way for SDNodeXForm to work on timmMatt Arsenault2020-01-091-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.
* GlobalISel: Handle llvm.read_registerMatt Arsenault2020-01-091-0/+2
| | | | | Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a, this uses intermediate generic instructions.
* AMDGPU/GlobalISel: Fix argument lowering for vectors of pointersMatt Arsenault2020-01-091-14/+116
| | | | | | | When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.
* AMDGPU/GlobalISel: Widen 16-bit shift amount sourcesMatt Arsenault2020-01-093-80/+94
| | | | | | This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.
* AMDGPU/GlobalISel: Fix import of integer med3Matt Arsenault2020-01-094-0/+616
| | | | | This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.
* AMDGPU/GlobalISel: Fix import of zext of s16 op patternsMatt Arsenault2020-01-094-12/+138
|
* AMDGPU/GlobalISel: Fix add of neg inline constant patternMatt Arsenault2020-01-091-0/+113
|
* Revert "Revert "[MIR] Target specific MIR formating and parsing""Daniel Sanders2020-01-0811-194/+194
| | | | | | | There was an unguarded dereference of MF in a function that permitted nullptr. Fixed This reverts commit 71d64f72f934631aa2f12b9542c23f74f256f494.
* Revert "[MIR] Target specific MIR formating and parsing"Nico Weber2020-01-0811-194/+194
| | | | | This reverts commit 3ef05d85be8c3666ebfa3ad986eb334da5195a47. It broke check-llvm on many bots, see comments on D69836.
* [MIR] Target specific MIR formating and parsingPeng Guo2020-01-0811-194/+194
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Added MIRFormatter for target specific MIR formating and parsing with immediate and custom pseudo source values. Target machine can subclass MIRFormatter and implement custom logic for printing and parsing immediate and custom pseudo source values for better readability. * Target specific immediate mnemonic need to start with "." follows by identifier string. When MIR parser sees immediate it will call target specific parsing function. * Custom pseudo source value need to start with custom follows by double-quoted string. MIR parser will pass the quoted string to target specific PSV parsing function. * MIRFormatter have 2 helper functions to facilitate LLVM value printing and parsing for custom PSV if they refers LLVM values. Patch by Peng Guo Reviewers: dsanders, arsenm Reviewed By: dsanders Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69836
* Revert "[MIR] Target specific MIR formating and parsing"Daniel Sanders2020-01-0811-194/+194
| | | | | | Forgot to credit Peng in the commit message. This reverts commit be841f89d0014b1e0246a4feae941b2f74abd908.
* [MIR] Target specific MIR formating and parsingPeng Guo2020-01-0811-194/+194
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Added MIRFormatter for target specific MIR formating and parsing with immediate and custom pseudo source values. Target machine can subclass MIRFormatter and implement custom logic for printing and parsing immediate and custom pseudo source values for better readability. * Target specific immediate mnemonic need to start with "." follows by identifier string. When MIR parser sees immediate it will call target specific parsing function. * Custom pseudo source value need to start with custom follows by double-quoted string. MIR parser will pass the quoted string to target specific PSV parsing function. * MIRFormatter have 2 helper functions to facilitate LLVM value printing and parsing for custom PSV if they refers LLVM values. Reviewers: dsanders, arsenm Reviewed By: dsanders Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69836
* AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointersMatt Arsenault2020-01-072-1/+97
| | | | | 4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal condition type set for pointers with any unrecognized address space.
* AMDGPU/GlobalISel: Add some missing G_SELECT testcasesMatt Arsenault2020-01-071-0/+142
|
* AMDGPU/GlobalISel: Fix missing test for s16 icmpMatt Arsenault2020-01-071-0/+236
|
* AMDGPU: Apply i16 add->sub pattern with zext to i32Matt Arsenault2020-01-072-13/+13
| | | | | This was only applying the deeper nested zext pattern, and missing the special case code size fold.
* AMDGPU: Add baseline test for missing patternMatt Arsenault2020-01-071-0/+583
| | | | | The optimization to turn an add into a sub isn't triggering when the pattern to use the zeroed high bits is used.
* AMDGPU: Fix not using v_cvt_f16_[iu]16Matt Arsenault2020-01-074-21/+24
| | | | | We weren't treating i16->f16 casts as legal on targets with these instructions, and always using a pair of casts through i32.
* AMDGPU/GlobalISel: Fix readfirstlane pattern importMatt Arsenault2020-01-071-0/+63
| | | | | The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.
* AMDGPU/GlobalISel: Fix import of s_abs_i32 patternMatt Arsenault2020-01-071-0/+105
|
* AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.voteMatt Arsenault2020-01-072-6/+18
|
* llc: Change behavior of -mcpu with existing attributeMatt Arsenault2020-01-071-8/+3
| | | | | | | | | | | Don't overwrite existing target-cpu attributes. I've often found the replacement behavior annoying, and this is inconsistent with how the fast math command line flags interact with the function attributes. Does not yet change target-features, since I think that should behave as a concatenation.
* AMDGPU: Add run line to int_to_fp testsMatt Arsenault2020-01-062-57/+109
| | | | | This wasn't catching a regression on targets with legal i16 triggered in a future commit.
* AMDGPU/GlobalISel: Legalize G_READCYCLECOUNTERMatt Arsenault2020-01-061-0/+3
|
OpenPOWER on IntegriCloud