summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
* Fix unused function warning (PR44808)Hans Wennborg2020-02-191-5/+7
| | | | (cherry picked from commit a19de32095e4cdb18957e66609574ce2021a8d1c)
* AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)Jan Vesely2020-02-103-4/+10
| | | | | | | | | | | | The old version might be faster on EG (RECIP_IEEE is Trans only), but it'd need extra corner case checks. This gives correct corner case behaviour and saves a register. Fixes OCL CTS sqrt test (1-thread, scalar) on Turks. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D74017 (cherry picked from commit e6686adf8a743564f0c455c34f04752ab08cf642)
* AMDGPU: Fix handling of infinite loops in fragment shadersConnor Abbott2020-02-041-6/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Due to the fact that kill is just a normal intrinsic, even though it's supposed to terminate the thread, we can end up with provably infinite loops that are actually supposed to end successfully. The AMDGPUUnifyDivergentExitNodes pass breaks up these loops, but because there's no obvious place to make the loop branch to, it just makes it return immediately, which skips the exports that are supposed to happen at the end and hangs the GPU if all the threads end up being killed. While it would be nice if the fact that kill terminates the thread were modeled in the IR, I think that the structurizer as-is would make a mess if we did that when the kill is inside control flow. For now, we just add a null export at the end to make sure that it always exports something, which fixes the immediate problem without penalizing the more common case. This means that we sometimes do two "done" exports when only some of the threads enter the discard loop, but from tests the hardware seems ok with that. This fixes dEQP-VK.graphicsfuzz.while-inside-switch with radv. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70781 (cherry picked from commit 87d98c149504f9b0751189744472d7cc94883960)
* AMDGPU/R600: Emit rodata in text segmentJan Vesely2020-02-031-1/+1
| | | | | | | | | | R600 relies on this behaviour. Fixes: 6e18266aa4dd78953557b8614cb9ff260bad7c65 ('Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"') Fixes ~100 piglit regressions since 6e18266 Differential Revision: https://reviews.llvm.org/D72991 (cherry picked from commit 1b8eab179db46f25a267bb73c657009c0bb542cc)
* Revert "[AMDGPU] Invert the handling of skip insertion."Nicolai Hähnle2020-02-036-173/+6
| | | | | | | | | This reverts commit 0dc6c249bffac9f23a605ce4e42a84341da3ddbd. The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for Mesa. (cherry picked from commit a80291ce10ba9667352adcc895f9668144f5f616)
* [AMDGPU] Invert the handling of skip insertion.cdevadas2020-01-156-6/+173
| | | | | | | | | | | | | | | The current implementation of skip insertion (SIInsertSkip) makes it a mandatory pass required for correctness. Initially, the idea was to have an optional pass. This patch inserts the s_cbranch_execz upfront during SILowerControlFlow to skip over the sections of code when no lanes are active. Later, SIRemoveShortExecBranches removes the skips for short branches, unless there is a sideeffect and the skip branch is really necessary. This new pass will replace the handling of skip insertion in the existing SIInsertSkip Pass. Differential revision: https://reviews.llvm.org/D68092
* CMake: Make most target symbols hidden by defaultTom Stellard2020-01-146-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 36221 nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439
* [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.Michael Liao2020-01-141-1/+1
| | | | | | | | | | | | | | | | | | | Summary: - `dead-mi-elimination` assumes MIR in the SSA form and cannot be arranged after phi elimination or DeSSA. It's enhanced to handle the dead register definition by skipping use check on it. Once a register def is `dead`, all its uses, if any, should be `undef`. - Re-arrange the DIE in RA phase for AMDGPU by placing it directly after `detect-dead-lanes`. - Many relevant tests are refined due to different register assignment. Reviewers: rampitec, qcolombet, sunfish Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72709
* [AMDGPU] Model distance to instruction in bundleStanislav Mekhanoshin2020-01-141-5/+17
| | | | | | | This change allows to model the height of the instruction within a bundle for latency adjustment purposes. Differential Revision: https://reviews.llvm.org/D72669
* [AMDGPU] Fix getInstrLatency() always returning 1Stanislav Mekhanoshin2020-01-142-3/+7
| | | | | | | | We do not have InstrItinerary so generic getInstLatency() was always defaulting to return 1 cycle. We need to use TargetSchedModel instead to compute an instruction's latency. Differential Revision: https://reviews.llvm.org/D72655
* AMDGPU/GlobalISel: Select llvm.amdgcn.ds.ordered.{add|swap}Matt Arsenault2020-01-132-0/+88
|
* AMDGPU/GlobalISel: Set insert point after waterfall loopMatt Arsenault2020-01-131-2/+3
| | | | | | | | | The current users of the waterfall loop utility functions do not make use of the restored original insert point. The insertion is either done, or they set the insert point somewhere else. A future change will want to insert instructions after the waterfall loop, but figuring out the point after the loop is more difficult than ensuring the insert point is there after the loop.
* AMDGPU/GlobalISel: Fix branch targets when emitting SI_IFMatt Arsenault2020-01-131-7/+30
| | | | | | | | The branch target needs to be changed depending on whether there is an unconditional branch or not. Loops also need to be similarly fixed, but compiling a simple testcase end to end requires another set of patches that aren't upstream yet.
* AMDGPU/GlobalISel: Simplify assertMatt Arsenault2020-01-131-11/+3
|
* AMDGPU/GlobalISel: Don't use XEXEC class for SGPRsMatt Arsenault2020-01-121-1/+1
| | | | | We don't use the xexec register classes for arbitrary values anymore. Avoids a test variance beween GlobalISel and SelectionDAG>
* AMDGPU/GlobalISel: Copy type when inserting readfirstlaneMatt Arsenault2020-01-121-0/+2
| | | | | getDefIgnoringCopies will fail to find any def if no type is set if we try to use it on the use's operand, so propagate the type.
* [Disassembler] Delete the VStream parameter of MCDisassembler::getInstruction()Fangrui Song2020-01-112-2/+1
| | | | | | | | | | The argument is llvm::null() everywhere except llvm::errs() in llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds. If we ever have the needs to add verbose log to disassemblers, we can record log with a member function, instead of passing it around as an argument.
* [AMDGPU] Remove unnecessary v_mov from a register to itself in WQM lowering.Michael Bedy2020-01-101-5/+22
| | | | | | | | | | | | | | | | | | | Summary: - SI Whole Quad Mode phase is replacing WQM pseudo instructions with v_mov instructions. While this is necessary for the special handling of moving results out of WWM live ranges, it is not necessary for WQM live ranges. The result is a v_mov from a register to itself after every WQM operation. This change uses a COPY psuedo in these cases, which allows the register allocator to coalesce the moves away. Reviewers: tpr, dstuttard, foad, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71386
* Let targets adjust operand latency of bundlesStanislav Mekhanoshin2020-01-103-41/+28
| | | | | | | | | | | This reverts the AMDGPU DAG mutation implemented in D72487 and gives a more general way of adjusting BUNDLE operand latency. It also replaces FixBundleLatencyMutation with adjustSchedDependency callback in the AMDGPU, fixing not only successor latencies but predecessors' as well. Differential Revision: https://reviews.llvm.org/D72535
* AMDGPU/GlobalISel: Clamp G_ZEXT source sizesMatt Arsenault2020-01-101-2/+3
| | | | | Also clamps G_SEXT/G_ANYEXT, but the implementation is more limited so fewer cases actually work.
* AMDGPU/GlobalISel: Select G_EXTRACT_VECTOR_ELTMatt Arsenault2020-01-095-10/+89
| | | | | Doesn't try to do the fold into the base register of an add of a constant in the index like the DAG path does.
* AMDGPU/GlobalISel: Fix G_EXTRACT_VECTOR_ELT mapping for s-v caseMatt Arsenault2020-01-092-16/+87
| | | | | | If an SGPR vector is indexed with a VGPR, the actual indexing will be done on the SGPR and produce an SGPR. A copy needs to be inserted inside the waterwall loop to the VGPR result.
* [AMDGPU] Fix bundle schedulingStanislav Mekhanoshin2020-01-093-0/+61
| | | | | | | Bundles coming to scheduler considered free, i.e. zero latency. Fixed. Differential Revision: https://reviews.llvm.org/D72487
* TableGen/GlobalISel: Add way for SDNodeXForm to work on timmMatt Arsenault2020-01-095-16/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current implementation assumes there is an instruction associated with the transform, but this is not the case for timm/TargetConstant/immarg values. These transforms should directly operate on a specific MachineOperand in the source instruction. TableGen would assert if you attempted to define an equivalent GISDNodeXFormEquiv using timm when it failed to find the instruction matcher. Specially recognize SDNodeXForms on timm, and pass the operand index to the render function. Ideally this would be a separate render function type that looks like void renderFoo(MachineInstrBuilder, const MachineOperand&), but this proved to be somewhat mechanically painful. Add an optional operand index which will only be passed if the transform should only look at the one source operand. Theoretically it would also be possible to only ever pass the MachineOperand, and the existing renderers would check the parent. I think that would be somewhat ugly for the standard usage which may want to inspect other operands, and I also think MachineOperand should eventually not carry a pointer to the parent instruction. Use it in one sample pattern. This isn't a great example, since the transform exists to satisfy DAG type constraints. This could also be avoided by just changing the MachineInstr's arbitrary choice of operand type from i16 to i32. Other patterns have nontrivial uses, but this serves as the simplest example. One flaw this still has is if you try to use an SDNodeXForm defined for imm, but the source pattern uses timm, you still see the "Failed to lookup instruction" assert. However, there is now a way to avoid it.
* GlobalISel: Handle llvm.read_registerMatt Arsenault2020-01-091-0/+2
| | | | | Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a, this uses intermediate generic instructions.
* CodeGen: Use LLT instead of EVT in getRegisterByNameMatt Arsenault2020-01-092-2/+2
| | | | | | Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.
* AMDGPU/GlobalISel: Fix argument lowering for vectors of pointersMatt Arsenault2020-01-091-2/+18
| | | | | | | When these arguments are broken down by the EVT based callbacks, the pointer information is lost. Hack around this by coercing the register types to be the expected pointer element type when building the remerge operations.
* AMDGPU/GlobalISel: Widen 16-bit shift amount sourcesMatt Arsenault2020-01-091-1/+2
| | | | | | This should be legal, but will require future selection work. 16-bit shift amounts were already removed from being legal, but this didn't adjust the transformation rules.
* AMDGPU/GlobalISel: Fix import of integer med3Matt Arsenault2020-01-092-32/+30
| | | | | This isn't too useful now, since nothing is currently trying to form min/max from cmp+select.
* AMDGPU: Eliminate more legacy codepred address space PatFragsMatt Arsenault2020-01-094-93/+24
| | | | These should now be limited to R600 code.
* AMDGPU: Use new PatFrag system for d16 storesMatt Arsenault2020-01-092-15/+9
|
* AMDGPU: Use new PatFrag system for d16 load nodesMatt Arsenault2020-01-091-32/+23
|
* AMDGPU/GlobalISel: Fix import of zext of s16 op patternsMatt Arsenault2020-01-092-3/+5
|
* AMDGPU/GlobalISel: Add IMMPopCount xformMatt Arsenault2020-01-093-0/+12
| | | | Partially fixes BFE pattern import.
* AMDGPU/GlobalISel: Add selectVOP3Mods_nnanMatt Arsenault2020-01-093-0/+20
| | | | | | This doesn't enable any new imports yet, but moves the fmed patterns from failing on this to hitting the "complex suboperand referenced more than once" limitation in tablegen.
* AMDGPU/GlobalISel: Add equiv xform for bitcast_fpimm_to_i32Matt Arsenault2020-01-093-0/+17
| | | | Only partially fixes one pattern import.
* AMDGPU/GlobalISel: Fix add of neg inline constant patternMatt Arsenault2020-01-094-1/+26
|
* AMDGPU: Add register class to DS_SWIZZLE_B32 patternMatt Arsenault2020-01-091-1/+1
| | | | Reduces diff for a future patch.
* [APFloat] Fix checked error assert failuresEhud Katz2020-01-091-1/+1
| | | | | | | | | | | `APFLoat::convertFromString` returns `Expected` result, which must be "checked" if the LLVM_ENABLE_ABI_BREAKING_CHECKS preprocessor flag is set. To mark an `Expected` result as "checked" we must consume the `Error` within. In many cases, we are only interested in knowing if an error occured, without the need to examine the error info. This is achieved, easily, with the `errorToBool()` API.
* [amdgpu] Remove unused header. NFC.Michael Liao2020-01-081-1/+0
|
* AMDGPU: Annotate EXTRACT_SUBREGs with source register classesMatt Arsenault2020-01-071-24/+24
| | | | | This partially fixes GlobalISel import of the patterns, but removes a lot of entriess from the end of the skipped pattern log.
* AMDGPU/GlobalISel: Fix scalar G_SELECT for arbitrary pointersMatt Arsenault2020-01-071-1/+1
| | | | | 4e85ca9562a588eba491e44bcbf73cb2f419780f missed updating the legal condition type set for pointers with any unrecognized address space.
* AMDGPU: Apply i16 add->sub pattern with zext to i32Matt Arsenault2020-01-071-8/+15
| | | | | This was only applying the deeper nested zext pattern, and missing the special case code size fold.
* AMDGPU: Remove VOP3Mods0Clamp0OModMatt Arsenault2020-01-076-34/+1
| | | | | Now that overridable default operands work, there's no reason to use complex patterns to just produce 0s.
* AMDGPU: Fix misleading, misplaced end block commentsMatt Arsenault2020-01-071-2/+2
|
* AMDGPU: Use ImmLeafMatt Arsenault2020-01-071-2/+2
|
* AMDGPU: Fix not using v_cvt_f16_[iu]16Matt Arsenault2020-01-072-10/+35
| | | | | We weren't treating i16->f16 casts as legal on targets with these instructions, and always using a pair of casts through i32.
* AMDGPU/GlobalISel: Fix readfirstlane pattern importMatt Arsenault2020-01-072-2/+2
| | | | | The imm folding optimization pattern failed to import. The instruction pattern was already working, but failing to fail on SGPR inputs.
* AMDGPU/GlobalISel: Fix import of s_abs_i32 patternMatt Arsenault2020-01-071-1/+1
|
* AMDGPU/GlobalISel: Select llvm.amdgcn.wqm.voteMatt Arsenault2020-01-071-2/+2
|
OpenPOWER on IntegriCloud