summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU: Change boolean content type to 0 or 1Matt Arsenault2019-11-154-8/+15
| | | | | | | | The usage of target boolean checks is overly inflexible, since sext and zext of a compare are equally cheap. The choice is arbitrary, but using 0/1 to some degree is the choice of lower resistance since that's what most targets use. This enables a few combines that don't bother to support ZeroOrNegativeOneBooleanContent.
* AMDGPU: Try to commute sub of boolean extMatt Arsenault2019-11-151-3/+26
| | | | Avoids another regression in a future patch.
* GlobalISel: Lower s1 source G_SITOFP/G_UITOFPMatt Arsenault2019-11-153-48/+2
|
* Sink all InitializePasses.h includesReid Kleckner2019-11-1324-23/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211
* AMDGPU: Extend add x, (ext setcc) combine to subMatt Arsenault2019-11-131-0/+22
| | | | | | This is the same as the add case, but inverts the operation type. This avoids regressions in a future patch.
* AMDGPU: Switch backend default max workgroup size to 1024Matt Arsenault2019-11-131-7/+1
| | | | | | | | | | | | | Previously this would default to 256, not the maximum supported size of 1024. Using a maximum lower than the hardware maximum requires language runtimes to enforce this limit for correctness, which no language has correctly done. Switch the default to the conservatively correct maximum, and force frontends to opt-in to the more optimal 256 default maximum. I don't really understand why the changes in occupancy-levels.ll increased the computed occupancy, which I expected to decrease. I'm not sure if these tests should be forcing the old maximum.
* AMDGPU Reduce reported maximum group size to 1024Matt Arsenault2019-11-131-1/+2
| | | | | While some targets allow encoding 2048, this was never tested or supported.
* AMDGPU/SI: make ~SIScheduleBlockCreator trivialFangrui Song2019-11-112-6/+2
|
* Use MCRegister in copyPhysRegMatt Arsenault2019-11-114-8/+8
|
* Remove duplicate MemVT to fix shadow variable warning. NFCI.Simon Pilgrim2019-11-091-1/+0
|
* Remove superfluous break after return. NFC.Simon Pilgrim2019-11-091-2/+0
|
* Fix shadow variable warning by reducing scope of CC/InverseCC CondCodes. NFCI.Simon Pilgrim2019-11-091-3/+3
|
* [AMDGPU][MC] Corrected src0 for v_movrelsd_b32 and v_movrelsd_2_b32Dmitry Preobrazhensky2019-11-081-6/+8
| | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=40903 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69888
* [AMDGPU] Fix bug introduced in 47a5c36b37f0dfukalov2019-11-071-1/+1
| | | | | | | | | | | | | | Summary: [AMDGPU] Fix bug introduced in 47a5c36b37f0 Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69915
* AMDGPU: Select global atomicrmw faddMatt Arsenault2019-11-065-13/+21
| | | | This only works if there is no use of the return value.
* [AMDGPU] Add handling of 160 bit registers in analyzeResourceUsageStanislav Mekhanoshin2019-11-061-0/+7
| | | | | | This was omitted. Also SReg_96Reg missed IsSGPR assignment. Differential Revision: https://reviews.llvm.org/D69919
* [AMDGPU] Improve code size cost model (part 2)dfukalov2019-11-061-18/+98
| | | | | | | | | | | | | | Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629
* [AMDGPU] Add missing flags to DS_RealStanislav Mekhanoshin2019-11-051-0/+2
| | | | Differential Revision: https://reviews.llvm.org/D69867
* [AMDGPU] Removed dead code from R600ISelLowering.cppStanislav Mekhanoshin2019-11-051-6/+1
| | | | | | | | | This was added to inhibit a warning from gcc 7.3 according to the comment. However, it triggers warning from PVS. In addition I cannot reproduce it with gcc 7.4 and I also cannot reproduce it with gcc 7.3 using compiler explorer. Differential Revision: https://reviews.llvm.org/D69863
* [AMDGPU] Removed dead code handling M0CopyRegStanislav Mekhanoshin2019-11-051-14/+0
| | | | | | | Static analyzer complains about always false condition. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69860
* [globalisel] Rename G_GEP to G_PTR_ADDDaniel Sanders2019-11-055-11/+11
| | | | | | | | | | | | | | | | Summary: G_GEP is rather poorly named. It's a simple pointer+scalar addition and doesn't support any of the complexities of getelementptr. I therefore propose that we rename it. There's a G_PTR_MASK so let's follow that convention and go with G_PTR_ADD Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69734
* [AMDGPU] return Fail instead of SolfFail from addOperand()Stanislav Mekhanoshin2019-11-051-1/+1
| | | | | | | | | | | | | | | | | | | | addOperand() method of AMDGPU disassembler returns SoftFail on error. All instances which may lead to that place are an impossible encdoing, not something which is possible to encode, but semantically incorrect as described for SoftFail. Then tablegen generates a check of the following form: if (Decode...(..) == MCDisassembler::Fail) { return MCDisassembler::Fail; } Since we can only return Success and SoftFail that is dead code as detected by the static code analyzer. Solution: return Fail as it should be. See https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69819
* [AMDGPU] Added assert in SIFoldOperands before ptr use. NFC.Stanislav Mekhanoshin2019-11-041-0/+1
|
* [AMDGPU] deduplicate tablegen predicatesStanislav Mekhanoshin2019-11-0411-38/+46
| | | | | | | | | | | | | | | We are duplicating predicates if several parts of the combined predicate list contain the same condition. Added code to deduplicate the list. We have AssemblerPredicates and AssemblerPredicate in the PredicateControl, but we never use AssemblerPredicates with an actual list, so this one is dropped. This addresses the first part of the llvm bug 43886: https://bugs.llvm.org/show_bug.cgi?id=43886 Differential Revision: https://reviews.llvm.org/D69815
* [SIMachineScheduler] Fixed ''then' statement is equivalent to the 'else' ↵Dávid Bolvanský2019-11-031-6/+1
| | | | statement.' warning. NFCI.
* [SILoadStoreOptimizer] Fixed typo. NFCI.Dávid Bolvanský2019-11-031-1/+1
|
* [amdgpu] Fix known bits compuation on `MUL_I24`/`MUL_U24`.Michael Liao2019-11-011-0/+3
| | | | | | | | | | Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D69735
* AMDGPU: Add default denormal mode to MachineFunctionInfoMatt Arsenault2019-11-013-6/+33
| | | | | | The default FP mode should really be a property of a specific function, and not a subtarget. Introduce the necessary fields to the SIMachineFunctionInfo to help move towards this goal.
* DAG: Add DAG argument to isFPExtFoldableMatt Arsenault2019-10-312-3/+4
| | | | | For AMDGPU this is dependent on the FP mode, which should eventually not be a property of the subtarget.
* DAG: Add new control for ISD::FMAD formationMatt Arsenault2019-10-312-0/+16
| | | | | | | | | For AMDGPU this depends on whether denormals are enabled in the default FP mode for the function. Currently this is treated as a subtarget feature, so FMAD is selectively legal based on that. I want to move this out of the subtarget features so this can be controlled with a denormal mode attribute. Additionally, this will allow folding based on a future ftz fast math flag.
* AMDGPU: Simplify getAddressSpace callsMatt Arsenault2019-10-314-11/+12
| | | | | These can be directly taken from the GlobalValue instead of going through the type.
* AMDGPU: Disallow spill folding with m0 copiesMatt Arsenault2019-10-302-0/+42
| | | | | | | | | | readlane and writelane instructions are not allowed to use m0 as the data operand, so spilling them is tricky and would require an intermediate SGPR to spill it. Constrain the virtual register class in this caes to disallow the inline spiller from folding the m0 operand directly into the spill instruction. I copied this hack from AArch64 which has the same problem for $sp.
* AMDGPU: Don't fold S_NOPs with implicit operandsMatt Arsenault2019-10-301-1/+3
|
* [AMDGPU] Simplify VCCZ bug handlingJay Foad2019-10-301-5/+1
| | | | | | | | | | | | | | | | | | | | Summary: VCCZBugHandledSet was used to make sure we don't apply the same workaround more than once to a single cbranch instruction, but it's not necessary because the workaround involves inserting an s_waitcnt instruction, which is enough for subsequent iterations to detect that no further workaround is necessary. Also beef up the test case to check that the workaround was only applied once. I have also manually verified that the test still passes even if I hack the big do-while loop in runOnMachineFunction to run a minimum of five iterations. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69621
* [AMDGPU] Consolidate one more getGeneration checkJay Foad2019-10-301-1/+1
| | | | | This one should have been done in r363902 when hasReadVCCZBug was introduced.
* AMDGPU/GlobalISel: Legalize FDIV32Austin Kerbow2019-10-292-0/+101
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69581
* AMDGPU: Make VReg_1 only include 1 artificial registerMatt Arsenault2019-10-281-1/+15
| | | | | | | | | | | | | | | | | | | | When TableGen is inferring register classes from contexts, it uses a sorting function based on the number of registers in the class. Since this was being treated as an alias of VGPR_32, they had exactly the same size. The sort used wasn't a stable sort, and even if it were, I believe the tie breaker would effectively end up being the alphabetical ordering of the class name. There appear to be issues trying to use an empty set of registers, so add only one so this will always sort to the end. Also add a comment explaining how VReg_1 is a dirty hack for SelectionDAG. This does end up changing the behavior of i1 with inline asm and VGPR constraints, but the existing behavior was was already nonsensical and inconsistent. It should probably be disallowed anyway. Fixes bug 43699
* AMDGPU: Avoid overwriting saved PCAustin Kerbow2019-10-281-6/+20
| | | | | | | | | | | | | | | | Summary: An outstanding load with same destination sgpr as call could cause PC to be updated with junk value on return. Reviewers: arsenm, rampitec Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69474
* [AMDGPU][MC][GFX10] Added v_interp_[p1/p2/mov]_f32_e64Dmitry Preobrazhensky2019-10-281-2/+6
| | | | | | | | See https://bugs.llvm.org/show_bug.cgi?id=43747 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D69348
* [AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies.cdevadas2019-10-261-90/+89
| | | | | | | | | | | | | | | There is a minor flaw in the implementation of function lowerPhis. This function replaces values of regclass Vreg_1 (boolean values) involved in PHIs into an SGPR. Currently it iterates over the MBBs and performs an inplace lowering of PHIs and fails to lower any incoming value that itself is another PHI of Vreg_1 regclass. The failure occurs only when the MBB where the incoming PHI value belongs is not visited/lowered yet. To fix this problem, collect all Vreg_1 PHIs upfront and then perform the lowering. Differential Revision: https://reviews.llvm.org/D69182
* [AMDGPU] Enable SGPR copy foldingStanislav Mekhanoshin2019-10-252-14/+11
| | | | | | | | | | | | | That used to fail in the last testcase function because after %0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it was further folded into S_STORE_DWORD_IMM. Its legal effective subreg class is SReg_32 while instruction expects more restricted SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand() passed the legality check and it was caught in the verifier. Borrowed code from the verifier to check for RC legality. Differential Revision: https://reviews.llvm.org/D69445
* [AMDGPU] Fixed asan failure in SIFoldOperandsStanislav Mekhanoshin2019-10-251-3/+4
| | | | | | | Both tryFoldOMod() and tryFoldClamp() remove original instruction, so the check MI.modifiesRegister() may use a deleted MI. Differential Revision: https://reviews.llvm.org/D69448
* AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHGMatt Arsenault2019-10-259-76/+97
| | | | | | | | Custom lower this to a target instruction with the merge operands. I think it might be better to directly select this and emit a REG_SEQUENCE, but this would be more work since it would require splitting the tablegen patterns for these cases from the other atomics.
* AMDGPU: Fix the broken dominator tree when creating waterfall loop for ↵Changpeng Fang2019-10-251-2/+2
| | | | | | | | | | | | | resource descriptor Summary: In loadSRsrcFromVGPR, if MBB is the same as Succ, Remiander is not the immediate dominator of Succ. Reviewer: arsenm Differential Revision: https://reviews.llvm.org/D69358
* [AMDGPU] Fold AGPR reg_sequence initializersStanislav Mekhanoshin2019-10-251-22/+131
| | | | Differential Revision: https://reviews.llvm.org/D69413
* [AMDGPU] Disallow dpp combining for dpp instructions without Src2 operand ↵vpykhtin2019-10-251-1/+2
| | | | | | (when Src2 is required) Differential revision: https://reviews.llvm.org/D69430
* AMDGPU/GlobalISel: Legalize FDIV16Austin Kerbow2019-10-252-0/+41
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69347
* [AMDGPU] Fix mfma scheduling crashStanislav Mekhanoshin2019-10-241-1/+6
| | | | | | | An SUnit can be neither intruction not SDNode. It is all null if represents a nop. Fixed a crash on using SU->getInstr(). Differential Revision: https://reviews.llvm.org/D69395
* [NFC] Remove redundant linesdfukalov2019-10-241-4/+0
| | | | | | | | | | | | Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69375
* [AMDGPU] Skip additional folding on the same operand.Michael Liao2019-10-241-7/+19
| | | | | | | | | | Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69355
OpenPOWER on IntegriCloud