path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* [DAGCombiner] If a TokenFactor would be merged into its user, consider the user later. | Nirav Dave | 2019-03-13 | 1 | -1/+1
    Summary:
    A number of optimizations are inhibited by single-use TokenFactors not being merged into the TokenFactor using it. This makes us consider whether we can do the merge immediately.

    Most test changes here are due to the change in visitation causing minor reorderings and associated reassociation of paired memory operations.

    CodeGen tests with non-reordering changes:
      X86/aligned-variadic.ll -- memory-based add folded into stored leaq value.
      X86/constant-combiners.ll -- Optimizes out overlap between stores.
      X86/pr40631_deadstore_elision -- folds constant byte store into preceding quad word constant store.

    Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
    Reviewed By: courbet
    Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D59260
    llvm-svn: 356068
* IR: Add immarg attribute | Matt Arsenault | 2019-03-12 | 6 | -96/+4
    This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value.

    Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates.

    This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list.

    llvm-svn: 355981
* Regenerate sign_extend.ll test. | Simon Pilgrim | 2019-03-12 | 1 | -85/+443
    This will change as part of the fix for the regressions in D58017.
    llvm-svn: 355933
* CodeGenPrep: preserve inbounds attribute when sinking GEPs. | Tim Northover | 2019-03-12 | 1 | -1/+1
    Targets can potentially emit more efficient code if they know address computations never overflow. For example ILP32 code on AArch64 (which only has 64-bit address computation) can ignore the possibility of overflow with this extra information.
    llvm-svn: 355926
* [AMDGPU] Add support for immediate operand for S_ENDPGM | David Stuttard | 2019-03-12 | 81 | -622/+621
    Summary:
    Add support for immediate operand in S_ENDPGM

    Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6
    Reviewers: alexshap
    Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D59213
    llvm-svn: 355902
* [MIPS GlobalISel] NarrowScalar G_MUL | Petar Avramovic | 2019-03-11 | 1 | -8/+7
    Narrow Scalar G_MUL for MIPS32. Revisit NarrowScalar implementation in LegalizerHelper. Introduce new helper function multiplyRegisters. It performs generic multiplication of values held in multiple registers. Generated instructions use only types NarrowTy and i1. Destination can be the same size as the source or twice its size.

    Differential Revision: https://reviews.llvm.org/D58824
    llvm-svn: 355814
* AMDGPU: Move d16 load matching to preprocess step | Matt Arsenault | 2019-03-08 | 3 | -3/+142
    When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced.

    I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function.

    llvm-svn: 355731
* DAG: Don't try to cluster loads with tied inputs | Matt Arsenault | 2019-03-08 | 1 | -1/+37
    This avoids breaking possible value dependencies when sorting loads by offset.

    AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen.

    Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem.

    llvm-svn: 355728
* AMDGPU: Add more tests for d16 loads | Matt Arsenault | 2019-03-08 | 2 | -210/+868
    Also fix a few cases that weren't testing what they were supposed to.
    llvm-svn: 355724
* AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr | Matt Arsenault | 2019-03-08 | 1 | -2/+2
    This was checking the wrong operands for the base register and the offsets. The indexes are shifted by the number of output registers from the machine instruction definition, and the chain is moved to the end.
    llvm-svn: 355722
* [AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructions | Carl Ritson | 2019-03-08 | 2 | -6/+6
    Summary:
    Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions.

    Reviewers: arsenm, rampitec
    Reviewed By: rampitec
    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D59091
    llvm-svn: 355671
* AMDHSA: Code object v3 updates | Konstantin Zhuravlyov | 2019-03-07 | 1 | -6/+6
    - Copy kernel symbol attributes into kernel descriptor attributes
    - Make sure kernel symbol's visibility is not "higher" than protected

    Differential Revision: https://reviews.llvm.org/D59057
    llvm-svn: 355630
* AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV) | Aakanksha Patil | 2019-03-07 | 7 | -24/+198
    A previous patch for the "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that.

    Differential Revision: http://reviews.llvm.org/D58993
    llvm-svn: 355574
* [AMDGPU] Add support for 64 bit buffer atomic arithmetic instructions | Ryan Taylor | 2019-03-06 | 1 | -31/+110
    Summary:
    This adds support for 64 bit buffer atomic arithmetic instructions but does not include cmpswap, as that depends on a fix to the way the register pairs are handled.

    Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61
    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D58918
    llvm-svn: 355520
* AMDGPU: Preserve undef flag when expanding SI_IF | Matt Arsenault | 2019-03-05 | 1 | -8/+38
    Fixes undefined value verifier error.
    llvm-svn: 355426
* [AMDGPU] Fix DPP operand order in atomic optimizer | Carl Ritson | 2019-03-05 | 5 | -6/+15
    Summary:
    Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions.

    Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30
    Reviewers: sheredom, tpr
    Reviewed By: sheredom
    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D58900
    llvm-svn: 355394
* [AMDGPU] Omit KILL instructions from hazard recognizer | David Stuttard | 2019-03-05 | 1 | -0/+32
    Summary:
    In some cases a KILL was causing a hazard to be introduced, because KILLs were scheduled into hazard slots but don't result in an instruction. KILL shouldn't be considered for hazard recognition.

    Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf
    Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D58898
    llvm-svn: 355384
* [Codegen] fix typos in test case | Xing GUO | 2019-03-02 | 1 | -1/+1
    llvm-svn: 355264
* [AMDGPU] Mark ds instructions as maybeAtomic | Stanislav Mekhanoshin | 2019-03-01 | 1 | -46/+46
    These were not recognized as potential atomics by the memory legalizer. The test was passing not because the legalizer did the right thing, but because it skipped all these instructions. When I fixed the DS description, the test started to fail because the region address had changed from 4 to 2 a while ago.

    Differential Revision: https://reviews.llvm.org/D58802
    llvm-svn: 355179
* AMDGPU/GlobalISel: Implement select for G_INSERT | Tom Stellard | 2019-03-01 | 1 | -0/+49
    Re-commit r344310.

    Reviewers: arsenm
    Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D53116
    llvm-svn: 355159
* AMDGPU/GlobalISel: Implement select for G_EXTRACT | Tom Stellard | 2019-02-28 | 1 | -0/+77
    Reviewers: arsenm
    Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D49714
    llvm-svn: 355156
* AMDGPU/GlobalISel: Add regbankselect test for phis | Matt Arsenault | 2019-02-28 | 1 | -0/+1434
    Add baseline for future fixes. These mostly show how this is broken and producing illegal situations.
    llvm-svn: 355057
* AMDGPU: Enable function calls by default | Matt Arsenault | 2019-02-28 | 5 | -11/+39
    Fixes some crashes on illegal call situations which are unfortunately still valid IR.
    llvm-svn: 355051
* AMDGPU: Fix crashes in invalid call cases | Matt Arsenault | 2019-02-28 | 3 | -20/+39
    We have to at least tolerate calls to kernels, possibly with a mismatched calling convention on the callsite.
    llvm-svn: 355049
* GlobalISel: Implement fewerElementsVector for phi | Matt Arsenault | 2019-02-28 | 1 | -1/+203
    llvm-svn: 355048
* GlobalISel: Implement moreElementsVector for phi | Matt Arsenault | 2019-02-28 | 1 | -0/+73
    llvm-svn: 355047
* Separate volatility and atomicity/ordering in SelectionDAG | Philip Reames | 2019-02-27 | 1 | -3/+3
    At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc.).

    This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success.

    To make sure this patch itself is NFC, it is built on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll).

    Previously landed:
      D57593 Fix a bug in the definition of isUnordered on MachineMemOperand
      D57596 [CodeGen] Be conservative about atomic accesses as for volatile
      D57802 Be conservative about unordered accesses for the moment
      rL353959: [Tests] First batch of cornercase tests for unordered atomics.
      rL353966: [Tests] RMW folding tests w/unordered atomic operations.
      rL353972: [Tests] More unordered atomic lowering tests.
      rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface
      rL354740: [Hexagon, SystemZ] Be super conservative about atomics
      rL354800: [Lanai] Be super conservative about atomics
      rL354845: [ARM] Be super conservative about atomics

    Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)

    Differential Revision: https://reviews.llvm.org/D57601
    llvm-svn: 355025
* [AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode | Dmitry Preobrazhensky | 2019-02-27 | 3 | -25/+25
    See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331

    Reviewers: artem.tamazov, arsenm
    Differential Revision: https://reviews.llvm.org/D58288
    llvm-svn: 354969
* [AMDGPU] Fixed hang during DAG combine | Stanislav Mekhanoshin | 2019-02-26 | 1 | -0/+16
    SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However, a global address is not a ConstantSDNode.

    Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants.

    Differential Revision: https://reviews.llvm.org/D58695
    llvm-svn: 354926
* [AMDGPU] Regenerate bswap/bitreverse tests. | Simon Pilgrim | 2019-02-26 | 2 | -96/+1621
    Make codegen changes more obvious in D58017
    llvm-svn: 354863
* [AMDGPU] Added target to mir test. NFC. | Stanislav Mekhanoshin | 2019-02-25 | 1 | -1/+1
    The test was run without -mcpu, although the tested instructions are not available on all ASICs.
    llvm-svn: 354830
* RegBankSelect: Handle slightly more complex value mappings | Matt Arsenault | 2019-02-25 | 4 | -48/+1736
    Try to use concat_vectors. Also remove unnecessary assert on pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers for AMDGPU.
    llvm-svn: 354828
* AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes | Matt Arsenault | 2019-02-25 | 9 | -28/+118
    llvm-svn: 354825
* AMDGPU/GlobalISel: Clamp max implicit_def elements | Matt Arsenault | 2019-02-25 | 1 | -0/+86
    llvm-svn: 354818
* [LowerSwitch][AMDGPU] Do not handle impossible values | Roman Tereshin | 2019-02-22 | 1 | -4/+4
    This patch adds LazyValueInfo to LowerSwitch to compute the range of the value being switched over and reduce the size of the tree LowerSwitch builds to lower a switch.

    Reviewed By: arsenm
    Differential Revision: https://reviews.llvm.org/D58096
    llvm-svn: 354670
* AMDGPU: Remove debugger related subtarget features | Matt Arsenault | 2019-02-21 | 4 | -229/+1
    As far as I know these aren't needed anymore.
    llvm-svn: 354634
* [llvm] Fix typo: 's/ ot / to /' [NFC] | Mandeep Singh Grang | 2019-02-21 | 1 | -1/+1
    llvm-svn: 354614
* AMDGPU/GlobalISel: Make phis legal | Matt Arsenault | 2019-02-21 | 1 | -0/+1109
    llvm-svn: 354592
* AMDGPU/GlobalISel: Fix bit count ops for non-power-of-2 types | Matt Arsenault | 2019-02-21 | 5 | -0/+270
    llvm-svn: 354587
* [AMDGPU] fix commuted case of sub combine | Stanislav Mekhanoshin | 2019-02-21 | 1 | -0/+28
    Differential Revision: https://reviews.llvm.org/D58481
    llvm-svn: 354543
* AMDGPU/GlobalISel: Move SMRD selection logic to TableGen | Tom Stellard | 2019-02-20 | 1 | -13/+30
    Reviewers: arsenm
    Reviewed By: arsenm
    Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
    Differential Revision: https://reviews.llvm.org/D52922
    llvm-svn: 354516
* GlobalISel: Fix fewerElementsVector for ctlz with different result type | Matt Arsenault | 2019-02-20 | 5 | -0/+312
    Also complete the set of related operations.
    llvm-svn: 354480
* GlobalISel: Implement moreElementsVector for g_insert results | Matt Arsenault | 2019-02-20 | 15 | -181/+736
    llvm-svn: 354477
* GlobalISel: Implement moreElementsVector for select | Matt Arsenault | 2019-02-19 | 1 | -25/+25
    llvm-svn: 354354
* GlobalISel: Implement moreElementsVector for G_EXTRACT source | Matt Arsenault | 2019-02-19 | 1 | -10/+58
    llvm-svn: 354348
* GlobalISel: Implement moreElementsVector for bit ops | Matt Arsenault | 2019-02-19 | 4 | -92/+622
    llvm-svn: 354345
* AMDGPU: Use MachineInstr::mayAlias to replace areMemAccessesTriviallyDisjoint in LoadStoreOptimizer pass. | Changpeng Fang | 2019-02-18 | 1 | -0/+130
    Summary:
    This is to fix a memory dependence bug in LoadStoreOptimizer.

    Reviewers: arsenm, rampitec
    Differential Revision: https://reviews.llvm.org/D58295
    llvm-svn: 354295
* GlobalISel: Implement widenScalar for g_extract scalar results | Matt Arsenault | 2019-02-18 | 1 | -0/+293
    llvm-svn: 354293
* Try to organize MachineVerifier tests | Matt Arsenault | 2019-02-15 | 2 | -44/+0
    The Verifier is separate from the MachineVerifier, so move it to a different directory. Some other verifier tests were scattered in target codegen tests as well (although I'm sure I missed some).

    Work towards using a more consistent naming scheme to make it clearer where the gaps still are for generic instructions.

    llvm-svn: 354138
* AMDGPU: Set ABI version to 1 for code object v3 | Konstantin Zhuravlyov | 2019-02-14 | 1 | -0/+3
    Differential Revision: https://reviews.llvm.org/D57811
    llvm-svn: 354085