summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU] Divergence driven ISel. Assign register class for cross block ↵Alexander Timofeev2019-05-2432-163/+177
| | | | | | | | | | | | | | values according to the divergence. Details: To make instruction selection really divergence driven it is necessary to assign the correct register classes to the cross block values beforehand. For the divergent targets same value type requires different register classes dependent on the value divergence. Reviewers: rampitec, nhaehnle Differential Revision: https://reviews.llvm.org/D59990 llvm-svn: 361644
* AMDGPU: Correct maximum possible private allocation sizeMatt Arsenault2019-05-233-13/+44
| | | | | | | | | | | | | | | | We were assuming a much larger possible per-wave visible stack allocation than is possible: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70 Based on this, we can assume the high 15 bits of a frame index or sret are 0. The frame index value is the per-lane offset, so the maximum frame index value is MAX_WAVE_SCRATCH / wavesize. Remove the corresponding subtarget feature and option that made this configurable. llvm-svn: 361541
* AMDGPU/GlobalISel: Legality for integer min/maxMatt Arsenault2019-05-238-0/+1964
| | | | llvm-svn: 361519
* [AMDGPU] Regenerate vector sub testsSimon Pilgrim2019-05-231-131/+525
| | | | llvm-svn: 361485
* MC: Allow getMaxInstLength to depend on the subtargetMatt Arsenault2019-05-221-0/+33
| | | | | | | | | | | | Keep it optional in cases this is ever needed in some global context. Currently it's only used for getting an upper bound inline asm code size. For AMDGPU, gfx10 increases the maximum instruction size to 20-bytes. This avoids penalizing older subtargets when estimating code size, and making some annoying branch relaxation test adjustments. llvm-svn: 361405
* AMDGPU: Assume calls read execMatt Arsenault2019-05-211-0/+121
| | | | llvm-svn: 361333
* AMDGPU: Add some tests for inlineasm behaviorMatt Arsenault2019-05-211-0/+43
| | | | llvm-svn: 361332
* AMDGPU: Assume call pseudos are convergentMatt Arsenault2019-05-211-0/+105
| | | | | | | There should probably be nonconvergent versions, but my guess is it doesn't matter in practice. llvm-svn: 361331
* AMDGPU: Fix not marking new gfx10 SGPRs as CSRsMatt Arsenault2019-05-211-0/+15
| | | | llvm-svn: 361330
* [NFC][AMDGPU] Autogenerate llvm.amdgcn.s.barrier.ll testRoman Lebedev2019-05-211-10/+93
| | | | llvm-svn: 361320
* Fix register coalescer failure to prune valueStanislav Mekhanoshin2019-05-211-0/+39
| | | | | | | | | | | | | | | | | | Register coalescer fails for the test in the patch with the assertion in JoinVals::ConflictResolution `DefMI != nullptr'. It attempts to join live intervals for two adjacent instructions and erase the copy: %2:vreg_256 = COPY %1 %3:vreg_256 = COPY killed %1 The LI needs to be adjusted to kill subrange for the erased instruction and extend the subrange of the original def. That was done for the main interval only but not for the subrange. As a result subrange had a VNI pointing to the erased slot resulting in the above failure. Differential Revision: https://reviews.llvm.org/D62162 llvm-svn: 361293
* AMDGPU: Force skip branches over callsMatt Arsenault2019-05-201-0/+67
| | | | | | | | | | | | | | Unfortunately the way SIInsertSkips works is backwards, and is required for correctness. r338235 added handling of some special cases where skipping is mandatory to avoid side effects if no lanes are active. It conservatively handled asm correctly, but the same logic needs to apply to calls. Usually the call sequence code is larger than the skip threshold, although the way the count is computed is really broken, so I'm not sure if anything was likely to really hit this. llvm-svn: 361202
* RegAlloc: Fix verifier error with undef identity copiesMatt Arsenault2019-05-201-0/+69
| | | | | | | | The code did not match the example in the comment, and was checking the undef flag on the copy dest instead of source. The existing tests were only hitting the > 2 operands case. llvm-svn: 361156
* [AMDGPU] gfx1010 Avoid SMEM WAR hazard for some s_waitcnt valuesCarl Ritson2019-05-201-7/+116
| | | | | | | | | | | | | | | | | | Summary: Avoid introducing hazard mitigation when lgkmcnt is reduced to 0. Clarify code comments to explain assumptions made for this hazard mitigation. Expand and correct test cases to cover variants of s_waitcnt. Reviewers: nhaehnle, rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62058 llvm-svn: 361124
* UpdateTestChecks: fix AMDGPU handlingRoman Lebedev2019-05-181-228/+1347
| | | | | | | | | | | | | | | | | | | | | | | Summary: Was looking into supporting `(srl (shl x, c1), c2)` with c1 != c2 in dagcombiner, this test changes, but makes `update_llc_test_checks.py` unhappy. **Many** AMDGPU tests specify `-march`, not `-mtriple`, which results in `update_llc_test_checks.py` defaulting to x86 asm function detection heuristics, which don't work here. I propose to fix this by adding an infrastructure to map from `-march` to `-mtriple`, in the UpdateTestChecks tooling. Reviewers: RKSimon, MaskRay, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62099 llvm-svn: 361101
* AMDGPU/GlobalISel: Implement s64->s64 [SU]ITOFPMatt Arsenault2019-05-172-0/+40
| | | | llvm-svn: 361082
* GlobalISel: Implement lower for S64->S32 [SU]ITOFPMatt Arsenault2019-05-172-0/+115
| | | | | | | | | | | | | This is ported from the custom AMDGPU DAG implementation. I think this is a better default expansion than what the DAG currently uses, at least if the target has CTLZ. This implements the signed version in terms of the unsigned conversion, which is implemented with bit operations. SelectionDAG has several other implementations that should eventually be ported depending on what instructions are legal. llvm-svn: 361081
* AMDGPU/GlobalISel: Legalize G_FCEILMatt Arsenault2019-05-171-0/+275
| | | | llvm-svn: 361028
* AMDGPU/GlobalISel: Legalize G_INTRINSIC_TRUNCMatt Arsenault2019-05-171-22/+208
| | | | llvm-svn: 361027
* AMDGPU/GlobalISel: Legalize G_FRINTMatt Arsenault2019-05-171-0/+160
| | | | llvm-svn: 361026
* AMDGPU/GlobalISel: Legalize G_FCOPYSIGNMatt Arsenault2019-05-171-0/+479
| | | | llvm-svn: 361025
* AMDGPU/GlobalISel: RegBankSelect for llvm.amdgcn.s.buffer.loadMatt Arsenault2019-05-171-0/+151
| | | | llvm-svn: 361023
* AMDGPU/GlobalISel: Use subreg index instead of extra unmergeMatt Arsenault2019-05-171-24/+16
| | | | | | | This saves instructions and extra steps, but I'm not sure about introducing subregister indexes at this point. llvm-svn: 361022
* AMDGPU/GlobalISel: Use waterfall loop for buffer_loadMatt Arsenault2019-05-172-6/+295
| | | | | | | This adds support for more complex waterfall loops that need to handle operands > 32-bits, and multiple operands. llvm-svn: 361021
* [AMDGPU] detect WaW hazards when moving/merging load/store instructionsRhys Perry2019-05-171-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In order to combine memory operations efficiently, the load/store optimizer might move some instructions around. It's usually safe to move instructions down past the merged instruction because the pass checks if memory operations can be re-ordered. Though, the current logic doesn't handle Write-after-Write hazards. This fixes a reflection issue with Monster Hunter World and DXVK. v2: - rebased on top of master - clean up the test case - handle WaW hazards correctly Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=40130 Original patch by Samuel Pitoiset. Reviewers: tpr, arsenm, nhaehnle Reviewed By: nhaehnle Subscribers: ronlieb, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D61313 llvm-svn: 361008
* [CodeGen] Fixed de-optimization of legalize subvector extractTim Renouf2019-05-161-0/+37
| | | | | | | | | | | | | | | The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU 3-dword memory instructions, caused a de-optimization problem for code with such a load that then bitcasts via vector of i8, because v12i8 is not an MVT so it legalizes the bitcast by widening it. This commit adds the ability to widen a bitcast using extract_subvector on the result, so the value does not need to go via memory. Differential Revision: https://reviews.llvm.org/D60457 Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64 llvm-svn: 360942
* AMDGPU: Introduce TokenFactor for ABI register copies in call sequenceMatt Arsenault2019-05-162-60/+61
| | | | | | | The call was missing chain dependencies on the pre-call copies. I don't think this was causing any real issues however. llvm-svn: 360906
* AMDGPU: Assume xnack is enabled by defaultMatt Arsenault2019-05-162-1/+4
| | | | | | | | | | | This is the conservatively correct default. It is always safe to assume xnack is enabled, but not the converse. Introduce a feature to blacklist targets where xnack can never be meaningfully enabled. I'm not sure the targets this is applied to is 100% correct. llvm-svn: 360903
* RegAllocFast: Improve hinting heuristicMatt Arsenault2019-05-161-40/+40
| | | | | | | | | | | | | | | Trace through multiple COPYs when looking for a physreg source. Add hinting for vregs that will be copied into physregs (we only hinted for vregs getting copied to a physreg previously). Give hinted a register a bonus when deciding which value to spill. This is part of my rewrite regallocfast series. In fact this one doesn't even have an effect unless you also flip the allocation to happen from back to front of a basic block. Nonetheless it helps to split this up to ease review of D52010 Patch by Matthias Braun llvm-svn: 360887
* AMDGPU/GlobalISel: Correct regbank for 1-bit and/or/xorMatt Arsenault2019-05-163-42/+42
| | | | | | Bool values should use the scc/vcc regbank since r350611. llvm-svn: 360877
* [AMDGPU] Increases available SGPR for Calling ConventionRyan Taylor2019-05-151-0/+265
| | | | | | | | | | | | | | | | Summary: SGPR in CC can be either hw initialized or set by other chained shaders and so this increases the SGPR count availalbe to CC to 105. Change-Id: I3dfadc750fe4a3e2bd07117a2899fd13f3e2fef3 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61261 llvm-svn: 360778
* [AMDGPU] Fixed handling of imemdiate i1 literalsStanislav Mekhanoshin2019-05-141-0/+23
| | | | | | | | This bug was exposed by the rL360395. Differential Revision: https://reviews.llvm.org/D61812 llvm-svn: 360689
* [AMDGPU] gfx1010 Strengthen some SMEM WAR hazard unit tests. NFC.Stanislav Mekhanoshin2019-05-141-2/+19
| | | | | | | | | Tighten conditions on SMEM WAR hazard unit tests to ensure rejection of workaround insertion where a s_waitcnt is present in dependency chain. The current workaround code already conforms to these revise tests. llvm-svn: 360686
* [AMDGPU] gfx1010 tests. NFC.Stanislav Mekhanoshin2019-05-138-97/+345
| | | | llvm-svn: 360615
* [DAG] Add SimplifyDemandedBits support for BITREVERSESimon Pilgrim2019-05-111-2/+1
| | | | | | Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC llvm-svn: 360534
* [AMDGPU] Pattern for v_xor3_b32Stanislav Mekhanoshin2019-05-108-0/+401
| | | | | | | | | This also allows three op patterns to use increased constant bus limit of GFX10. Differential Revision: https://reviews.llvm.org/D61763 llvm-svn: 360395
* [AMDGPU] gfx1010 tests. NFC.Stanislav Mekhanoshin2019-05-082-24/+58
| | | | | | Added tests which now pass after code commits. llvm-svn: 360300
* AMDGPU: Select VOP3 form of addMatt Arsenault2019-05-087-67/+200
| | | | | | | | | | | | | | | | | | | | | | | | The VOP3 form should always be the preferred selection, to be shrunk later. This should only be an optimization issue, but this partially works around a problem from clobbering VCC when SIFixSGPRCopies rewrites an SCC defining operation directly to VCC. 3 of the testcases are regressions from failing to fold the immediate in cases it should. These can be avoided by improving the VCC liveness handling in SIFoldOperands. Simply increasing the threshold to computeRegisterLiveness works, although this is common enough that VCC liveness should probably be tracked throughout the pass. The hack of leaving behind an implicit_def instruction to avoid breaking iterator wastes instruction count, which inhibits finding the VCC def in long chains of adds. Doing this however exposes different, worse looking regressions from poor scheduling behavior. This could probably be avoided around by forcing the shrink of the addc here, but the scheduler should probably be fixed. The r600 add test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 360293
* [AMDGPU] gfx1010 exp modificationsStanislav Mekhanoshin2019-05-081-0/+15
| | | | | | Differential Revision: https://reviews.llvm.org/D61701 llvm-svn: 360287
* [AMDGPU] Reapplied BFE canonicalization from D60462Simon Pilgrim2019-05-081-2/+2
| | | | | | This was committed in rL358887 but reverted in rL360066 due to a x86 regression, really it should be have been pre-committed instead of being part of the SimplifyDemandedBits bitcast patch. llvm-svn: 360263
* [AMDGPU] Check MI bundles for hazardsAustin Kerbow2019-05-072-0/+160
| | | | | | | | | | | | | | | | Summary: GCNHazardRecognizer fails to identify hazards that are in and around bundles. This patch allows the hazard recognizer to consider bundled instructions in both scheduler and hazard recognizer mode. We ignore “bundledness” for the purpose of detecting hazards and examine the instructions individually. Reviewers: arsenm, msearles, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61564 llvm-svn: 360199
* AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operandNicolai Haehnle2019-05-071-0/+21
| | | | | | | | | | | | | | | | | | Summary: No test case because I don't know of a way to trigger this, but I accidentally caused this to fail while working on a different change. Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3 Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61490 llvm-svn: 360123
* [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32Stanislav Mekhanoshin2019-05-0613-99/+141
| | | | | | | | | GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for all targets. Differential Revision: https://reviews.llvm.org/D61525 llvm-svn: 360094
* [AMDGPU] gfx1010 memory legalizerStanislav Mekhanoshin2019-05-066-817/+4456
| | | | | | Differential Revision: https://reviews.llvm.org/D61535 llvm-svn: 360087
* Revert r359392 and r358887Craig Topper2019-05-061-2/+2
| | | | | | | | | | | | | | | | | | | | Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead moving to the integer domain(in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066
* AMDGPU] gfx1010 hazard recognizerStanislav Mekhanoshin2019-05-045-0/+904
| | | | | | Differential Revision: https://reviews.llvm.org/D61536 llvm-svn: 359961
* [AMDGPU] gfx1010: use fmac instructionsStanislav Mekhanoshin2019-05-047-190/+899
| | | | | | Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959
* [AMDGPU] gfx1010 wait count insertionStanislav Mekhanoshin2019-05-031-0/+260
| | | | | | Differential Revision: https://reviews.llvm.org/D61534 llvm-svn: 359938
* [AMDGPU] gfx1010 s_code_end generationStanislav Mekhanoshin2019-05-031-0/+80
| | | | | | | | Also add some missing metadata in the streamer. Differential Revision: https://reviews.llvm.org/D61531 llvm-svn: 359937
* Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out ↵Matt Arsenault2019-05-031-3/+6
| | | | | | | | | | | of a block" This reverts commit r359912. This should pass now, since the clang test was made less fragile in r359918. llvm-svn: 359919
OpenPOWER on IntegriCloud