path: root/llvm/test/CodeGen/AMDGPU
Commit message | Author | Age | Files | Lines
...
* Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block" | Nico Weber | 2019-05-03 | 1 | -6/+3
  Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.
  llvm-svn: 359912
* RegAllocFast: Add heuristic to detect values not live-out of a block | Matt Arsenault | 2019-05-03 | 1 | -3/+6
  Add an improved/new heuristic to catch more cases when values are not live out of a basic block. Patch by Matthias Braun
  llvm-svn: 359906
* AMDGPU: Select VOP3 form of sub | Matt Arsenault | 2019-05-03 | 2 | -48/+194
  The VOP3 form should always be the preferred selection form, to be shrunk later. The r600 sub test needs to be split out because it asserts on the arguments in the new test during calling convention lowering.
  llvm-svn: 359899
* AMDGPU: Support shrinking add with FI in SIFoldOperands | Matt Arsenault | 2019-05-03 | 1 | -14/+13
  Avoids a test regression in a future patch.
  llvm-svn: 359898
* AMDGPU: Add baseline test for future patch | Matt Arsenault | 2019-05-03 | 1 | -0/+231
  llvm-svn: 359893
* AMDGPU: Replace shrunk instruction with dummy implicit_def | Matt Arsenault | 2019-05-03 | 1 | -0/+56
  This was broken if the original operand was killed: the kill flag would appear on both instructions and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has the added benefit of really reducing the use count for future folds. Ideally the pass would be structured more like PeepholeOptimizer, which would avoid the need for this hack to keep from breaking instruction iterators.
  llvm-svn: 359891
* AMDGPU: Forgot to commit test file for r358890 | Matt Arsenault | 2019-05-03 | 1 | -0/+97
  llvm-svn: 359885
* AMDGPU: Fix incorrect commute with sub when folding immediates | Matt Arsenault | 2019-05-03 | 1 | -8/+8
  When folding an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used: the VOP2 equivalent of the original instruction, rather than that of the commuted instruction with the inverted opcode.
  llvm-svn: 359883
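A minimal Python sketch of why the commuted form needs the inverted opcode. It assumes, for illustration only, that `sub` computes `src0 - src1` and `subrev` computes `src1 - src0`; the opcode table and the `evaluate`, `commute` and `commute_keeping_opcode` helpers are hypothetical and are not SIFoldOperands code.

```python
# Hypothetical model (assumption): sub = src0 - src1, subrev = src1 - src0.
OPS = {
    "sub": lambda src0, src1: src0 - src1,
    "subrev": lambda src0, src1: src1 - src0,
}

# Swapping the operands of sub/subrev only preserves the result if the
# opcode is inverted at the same time.
INVERSE = {"sub": "subrev", "subrev": "sub"}


def evaluate(opcode, src0, src1):
    """Compute the value produced by the modeled instruction."""
    return OPS[opcode](src0, src1)


def commute(opcode, src0, src1):
    """Correct commute: swap the operands and invert the opcode."""
    return INVERSE[opcode], src1, src0


def commute_keeping_opcode(opcode, src0, src1):
    """Buggy commute: swap the operands but keep the original opcode."""
    return opcode, src1, src0


reg, imm = 10, 3
original = ("sub", reg, imm)                # 10 - 3 = 7
fixed = commute(*original)                  # ("subrev", 3, 10) -> 7
buggy = commute_keeping_opcode(*original)   # ("sub", 3, 10) -> -7
```

The buggy variant mirrors the described mistake: the operands end up in swapped positions while the opcode still reflects the original, uncommuted instruction.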
* AMDGPU: Fix test verification | Matt Arsenault | 2019-05-03 | 1 | -1/+3
  This should run the verifier, and needs to enable tracksRegLiveness.
  llvm-svn: 359882
* [AMDGPU] gfx1010 lost VOP2 forms of some add/sub | Stanislav Mekhanoshin | 2019-05-02 | 2 | -19/+61
  Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.
  Differential Revision:
  llvm-svn: 359757
* [AMDGPU] gfx1010 allows VOP3 to have a literal | Stanislav Mekhanoshin | 2019-05-02 | 1 | -8/+54
  Differential Revision: https://reviews.llvm.org/D61413
  llvm-svn: 359756
* [AMDGPU] gfx1010 GCNRegBankReassign pass | Stanislav Mekhanoshin | 2019-05-01 | 1 | -0/+336
  Reassign registers to reduce register bank conflicts.
  Differential Revision: https://reviews.llvm.org/D61344
  llvm-svn: 359704
* [AMDGPU] gfx1010 GCNNSAReassign pass | Stanislav Mekhanoshin | 2019-05-01 | 2 | -71/+235
  Convert NSA into non-NSA images.
  Differential Revision: https://reviews.llvm.org/D61341
  llvm-svn: 359700
* [AMDGPU] gfx1010 MIMG implementation | Stanislav Mekhanoshin | 2019-05-01 | 15 | -216/+483
  Differential Revision: https://reviews.llvm.org/D61339
  llvm-svn: 359698
* [AMDGPU] gfx1010 DS implementation | Stanislav Mekhanoshin | 2019-05-01 | 1 | -0/+262
  Differential Revision: https://reviews.llvm.org/D61332
  llvm-svn: 359696
* [llvm-readobj] Change -long-option to --long-option in tests. NFC | Fangrui Song | 2019-05-01 | 35 | -127/+127
  We use both -long-option and --long-option in tests. Switch to --long-option for consistency. In "llvm-readelf" mode, -long-option is discouraged because it conflicts with grouped short options and is not accepted by GNU readelf. While updating the tests, change llvm-readobj -s to llvm-readobj -S to reduce confusion ("s" is --section-headers in llvm-readobj but --symbols in llvm-readelf).
  llvm-svn: 359649
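To illustrate the conflict with grouped short options, here is a small Python sketch of how a getopt-style parser that supports grouping would split a single-dash argument. The `expand_grouped` helper is hypothetical and is not llvm-readelf's actual option parser; it only shows why a single-dash long option is ambiguous once grouping is allowed.

```python
def expand_grouped(arg):
    """Split a single-dash argument into grouped short options, the way a
    getopt-style parser with grouping would: "-abc" -> -a -b -c.
    Double-dash arguments and lone short options are left untouched."""
    if arg.startswith("--") or not arg.startswith("-") or len(arg) <= 2:
        return [arg]
    return ["-" + ch for ch in arg[1:]]


# Under grouping, "-symbols" reads as seven short options starting with -s,
# while "--symbols" is unambiguously one long option.
grouped = expand_grouped("-symbols")
long_form = expand_grouped("--symbols")
```

A real parser would typically also reject letters that are not valid short options; the point here is only that the single-dash spelling no longer names one long option.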
* [AMDGPU] gfx1010 VMEM and SMEM implementation | Stanislav Mekhanoshin | 2019-04-30 | 84 | -1238/+1343
  Differential Revision: https://reviews.llvm.org/D61330
  llvm-svn: 359621
* [DAG] Refactor DAGCombiner::ReassociateOps | Bjorn Pettersson | 2019-04-29 | 4 | -13/+13
  Summary: Extract the logic for doing reassociations from DAGCombiner::reassociateOps into a helper function DAGCombiner::reassociateOpsCommutative, and use that helper to trigger reassociation on the original operand order, or the commuted operand order. Codegen is not identical since the operand order will be different when doing the reassociations for the commuted case. That causes some unfortunate churn in some test cases. Apart from that this should be NFC.
  Reviewers: spatel, craig.topper, tstellar
  Reviewed By: spatel
  Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61199
  llvm-svn: 359476
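A toy Python sketch of the "try the original order, then the commuted order" structure described above. It models only the constant-folding case `(op (op x, c1), c2) -> (op x, (op c1, c2))` on a hypothetical tuple-based expression form; the real DAGCombiner helper works on SDNodes and handles more patterns, so treat `reassociate_commutative` and `reassociate` as illustrative stand-ins.

```python
# Expressions: ints are constants, strings are variables,
# tuples are (opcode, lhs, rhs).
FOLD = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def is_const(node):
    return isinstance(node, int)


def reassociate_commutative(opc, n0, n1):
    """Fold (opc (opc x, c1), c2) -> (opc x, c1 opc c2) for this operand order.

    Returns the new node, or None if n0/n1 do not have that shape.
    """
    if isinstance(n0, tuple) and n0[0] == opc and is_const(n0[2]) and is_const(n1):
        _, x, c1 = n0
        return (opc, x, FOLD[opc](c1, n1))
    return None


def reassociate(opc, n0, n1):
    """Try the original operand order first, then the commuted one."""
    result = reassociate_commutative(opc, n0, n1)
    if result is not None:
        return result
    return reassociate_commutative(opc, n1, n0)


# Both operand orders reach the same folded node.
first = reassociate("add", ("add", "x", 1), 2)
second = reassociate("add", 2, ("add", "x", 1))
```

Because the helper only matches one operand order, calling it twice with swapped arguments is what lets the commuted case fold too; the order in which the attempts happen is also why codegen can differ slightly in the real change.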
* Revert "AMDGPU: Split block for si_end_cf" | Mark Searles | 2019-04-27 | 2 | -97/+55
  This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now.
  Differential Revision: https://reviews.llvm.org/D61213
  llvm-svn: 359363
* [AMDGPU] gfx1010 VOP3 and VOP3P implementation | Stanislav Mekhanoshin | 2019-04-26 | 1 | -0/+96
  Differential Revision: https://reviews.llvm.org/D61202
  llvm-svn: 359328
* [AMDGPU] gfx1010 VOP2 changes | Stanislav Mekhanoshin | 2019-04-26 | 1 | -0/+25
  Differential Revision: https://reviews.llvm.org/D61156
  llvm-svn: 359316
* [AMDGPU] Add gfx1010 target definitions | Stanislav Mekhanoshin | 2019-04-24 | 2 | -0/+4
  Differential Revision: https://reviews.llvm.org/D61041
  llvm-svn: 359113
* [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cpp | Stanislav Mekhanoshin | 2019-04-23 | 1 | -0/+22
  The second argument is flags, not subreg.
  Differential Revision: https://reviews.llvm.org/D61031
  llvm-svn: 359017
* [AMDGPU] Fix hidden argument metadata duplication for V3 | Scott Linder | 2019-04-23 | 4 | -148/+461
  Essentially complete a proper rebase of the V3 metadata change over https://reviews.llvm.org/D49096. Minimize the diff between the V2 and V3 variants of the relevant lit tests, and clean up some trailing whitespace.
  llvm-svn: 358992
* AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies | Nicolai Haehnle | 2019-04-23 | 1 | -0/+33
  Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic that does the lanemask merging. This then used to trigger an assertion when processing a dependent phi instruction.
  Change-Id: Id4949719f8298062fe476a25718acccc109113b6
  Reviewers: llvm-commits
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60999
  llvm-svn: 358983
* [DAGCombiner] Combine OR as ADD when no common bits are set | Bjorn Pettersson | 2019-04-23 | 4 | -43/+42
  Summary: The DAGCombiner rewrites (canonicalizes) an ISD::ADD with no common bits set in the operands as an ISD::OR node. This could sometimes result in "missing out" on some combines that are normally performed for ADD. To be more specific, this could happen if we have already rewritten an ADD into an OR, and later (after legalizations or combines) we expose patterns that could have been optimized if we had seen the OR as an ADD (e.g. reassociations based on ADD). To make the DAG combiner less sensitive to whether ADD or OR is used for these "no common bits set" ADD/OR operations, we now apply most of the ADD combines also to an OR operation when value tracking indicates that the operands have no common bits set.
  Reviewers: spatel, RKSimon, craig.topper, kparzysz
  Reviewed By: spatel
  Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59758
  llvm-svn: 358965
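The premise of this change is the arithmetic identity that `a | b == a + b` whenever `a & b == 0`, because adding two values with disjoint set bits never produces a carry. A short Python sketch checks this exhaustively for 8-bit values and shows one ADD-style reassociation the identity enables; the helper names are illustrative, not DAGCombiner APIs.

```python
def no_common_bits(a, b):
    """True when a and b share no set bits (what value tracking must prove)."""
    return a & b == 0


def or_equals_add(width=8):
    """Exhaustively check a | b == a + b (mod 2**width) whenever a & b == 0."""
    mask = (1 << width) - 1
    for a in range(1 << width):
        for b in range(1 << width):
            if no_common_bits(a, b) and (a | b) != ((a + b) & mask):
                return False
    return True


# Seeing the OR as an ADD exposes reassociation: if x is known to be even,
# (x | 1) + 2 can be folded to x + 3.
even_values = range(0, 256, 2)
reassociates = all((x | 1) + 2 == x + 3 for x in even_values)
```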
* [AMDGPU] Fix an issue in `op_sel_hi` skipping. | Michael Liao | 2019-04-22 | 1 | -0/+7
  Summary:
  - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even if an instruction is `packed`, it may have an operand requiring a non-packed literal, such as `v_dot2_f32_f16`.
  Reviewers: rampitec, arsenm, kzhuravl
  Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60978
  llvm-svn: 358922
* AMDGPU: Skip debug instructions in assert | Matt Arsenault | 2019-04-22 | 1 | -0/+56
  These are inserted after branch relaxation, and for some reason it's decided to put them in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC relative instead of relying on the block address.
  llvm-svn: 358909
* AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources | Matt Arsenault | 2019-04-22 | 1 | -0/+54
  llvm-svn: 358894
* GlobalISel: Legalize scalar G_EXTRACT sources | Matt Arsenault | 2019-04-22 | 1 | -7/+7
  llvm-svn: 358892
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling | Simon Pilgrim | 2019-04-22 | 1 | -2/+2
  This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation. I investigated putting this in DAGCombine, but it caused a lot of noise on other targets: some improvements, some regressions. The X86 changes are all definite wins.
  Differential Revision: https://reviews.llvm.org/D60462
  llvm-svn: 358887
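The AMDGPU combine above relies on `(srl (and x, c1 << c2), c2)` and `(and (srl x, c2), c1)` computing the same value; the second form is "shift, then mask", which is the shape of an unsigned bitfield extract when `c1` is a contiguous low mask. A Python sketch that checks the identity exhaustively for 8-bit values, assuming the shifted constant is truncated to the value's bit width; the helper names are illustrative only.

```python
def srl_of_and(x, c1, c2, width):
    """(srl (and x, c1 << c2), c2), with the shifted constant truncated to width."""
    mask = (1 << width) - 1
    return (x & ((c1 << c2) & mask)) >> c2


def and_of_srl(x, c1, c2):
    """(and (srl x, c2), c1): shift first, then mask (bitfield-extract shape)."""
    return (x >> c2) & c1


def identity_holds(width=8):
    """Compare both forms for every x, c1 and in-range shift amount c2."""
    for x in range(1 << width):
        for c1 in range(1 << width):
            for c2 in range(width):
                if srl_of_and(x, c1, c2, width) != and_of_srl(x, c1, c2):
                    return False
    return True


# Extracting 4 bits at offset 8 of a 16-bit value: 0xABCD -> 0xB.
extracted = and_of_srl(0xABCD, 0xF, 8)
```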
* [AMDGPU] Regenerate uitofp i8 to float conversion tests. | Simon Pilgrim | 2019-04-22 | 1 | -119/+771
  Prep work for D60462
  llvm-svn: 358879
* [AMDGPU] Regenerate extractelt->truncate test. | Simon Pilgrim | 2019-04-19 | 1 | -27/+117
  Prep work for D60462
  llvm-svn: 358746
* [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0)) | Tim Renouf | 2019-04-18 | 1 | -0/+22
  fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed.
  Differential Revision: https://reviews.llvm.org/D60633
  Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
  llvm-svn: 358640
* AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions | Rhys Perry | 2019-04-17 | 4 | -0/+11
  Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master, which was caused by creating less code in some branches.
  Reviewers: arsen, nhaehnle
  Reviewed By: nhaehnle
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D60824
  llvm-svn: 358592
* AMDGPU: Fix unreachable when counting register usage of SGPR96 | Matt Arsenault | 2019-04-15 | 1 | -0/+13
  llvm-svn: 358447
* AMDGPU: Fix printed format of SReg_96 | Matt Arsenault | 2019-04-15 | 1 | -0/+10
  These are artificial, so I think this should only come up with inline asm comments.
  llvm-svn: 358446
* [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only. | Amara Emerson | 2019-04-15 | 15 | -651/+420
  Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't be degraded.
  This change also improves the IRTranslator so that in most places, but not all, it creates constants using the MIRBuilder directly instead of first creating a new destination vreg and then creating a constant. By doing this, the buildConstant() method can just return the vreg of an existing G_CONSTANT instead of having to create a COPY from it.
  I measured a 0.2% improvement in compile time and a 0.9% improvement in code size at -O0 ARM64.

  Compile time:
  Program                                       base    cse     diff
  test-suite...ark/tramp3d-v4/tramp3d-v4.test   9.04    9.12    0.8%
  test-suite...Mark/mafft/pairlocalalign.test   2.68    2.66    -0.7%
  test-suite...-typeset/consumer-typeset.test   5.53    5.51    -0.4%
  test-suite :: CTMark/lencod/lencod.test       5.30    5.28    -0.3%
  test-suite :: CTMark/Bullet/bullet.test       25.82   25.76   -0.2%
  test-suite...:: CTMark/ClamAV/clamscan.test   6.92    6.90    -0.2%
  test-suite...TMark/7zip/7zip-benchmark.test   34.24   34.17   -0.2%
  test-suite :: CTMark/SPASS/SPASS.test         6.25    6.24    -0.1%
  test-suite...:: CTMark/sqlite3/sqlite3.test   1.66    1.66    -0.1%
  test-suite :: CTMark/kimwitu++/kc.test        13.61   13.60   -0.0%
  Geomean difference: -0.2%

  Code size:
  Program                                       base     cse      diff
  test-suite...-typeset/consumer-typeset.test   1315632  1266480  -3.7%
  test-suite...:: CTMark/ClamAV/clamscan.test   1313892  1297508  -1.2%
  test-suite :: CTMark/lencod/lencod.test       1439504  1423112  -1.1%
  test-suite...TMark/7zip/7zip-benchmark.test   2936980  2904172  -1.1%
  test-suite :: CTMark/Bullet/bullet.test       3478276  3445460  -0.9%
  test-suite...ark/tramp3d-v4/tramp3d-v4.test   8082868  8033492  -0.6%
  test-suite :: CTMark/kimwitu++/kc.test        3870380  3853972  -0.4%
  test-suite :: CTMark/SPASS/SPASS.test         1434904  1434896  -0.0%
  test-suite...Mark/mafft/pairlocalalign.test   764528   764528   0.0%
  test-suite...:: CTMark/sqlite3/sqlite3.test   782092   782092   0.0%
  Geomean difference: -0.9%

  Differential Revision: https://reviews.llvm.org/D60580
  llvm-svn: 358369
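A toy Python sketch of the buildConstant() behavior described above: when the builder chooses the destination vreg itself, a repeated request for the same constant can simply return the existing vreg, with no new G_CONSTANT and no COPY. The `ConstantCSEBuilder` class and its tuple-based instruction list are hypothetical; this is not GlobalISel's CSE implementation, and it ignores placement and dominance, which a real implementation must respect.

```python
class ConstantCSEBuilder:
    """Toy instruction builder that CSEs constants by (type, value)."""

    def __init__(self):
        self.instructions = []   # emitted (opcode, dst, type, value) tuples
        self._cache = {}         # (type, value) -> vreg holding that constant
        self._next_vreg = 0

    def _new_vreg(self):
        vreg = f"%{self._next_vreg}"
        self._next_vreg += 1
        return vreg

    def build_constant(self, ty, value):
        """Return a vreg holding `value`, reusing an existing one if possible.

        Because the builder picks the destination itself, a cache hit needs
        no new instruction at all (not even a COPY).
        """
        key = (ty, value)
        if key not in self._cache:
            vreg = self._new_vreg()
            self.instructions.append(("G_CONSTANT", vreg, ty, value))
            self._cache[key] = vreg
        return self._cache[key]


b = ConstantCSEBuilder()
a = b.build_constant("s32", 42)
c = b.build_constant("s32", 42)   # cache hit: same vreg as a
d = b.build_constant("s64", 42)   # different type: new constant
```

Keying the cache on the type as well as the value keeps an `s32` 42 and an `s64` 42 as distinct constants.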
* Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg | David Green | 2019-04-10 | 1 | -4/+6
  Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not seeing through to the constant in other blocks. Revert this patch while we come up with a better way to handle that. I will try to follow this up with some better tests.
  llvm-svn: 358113
* GlobalISel: Support legalizing G_CONSTANT with irregular breakdown | Matt Arsenault | 2019-04-10 | 1 | -0/+17
  llvm-svn: 358109
* GlobalISel: Handle odd breakdowns for bit ops | Matt Arsenault | 2019-04-10 | 3 | -0/+138
  llvm-svn: 358105
* Revert LIS handling in MachineDCE | Stanislav Mekhanoshin | 2019-04-09 | 3 | -52/+89
  One of the out-of-tree targets has regressed with this patch. Revert it for now and let liveness be fully reconstructed in case the pass is used after LIS is created, to resolve the regression.
  Differential Revision: https://reviews.llvm.org/D60466
  llvm-svn: 358015
* AMDGPU/GlobalISel: Implement call lowering for shaders returning values | Tom Stellard | 2019-04-09 | 2 | -10/+21
  Reviewers: arsenm, nhaehnle
  Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits
  Differential Revision: https://reviews.llvm.org/D57166
  llvm-svn: 357964
* Reapply [ValueTracking] Support min/max selects in computeConstantRange() | Nikita Popov | 2019-04-07 | 3 | -38/+48
  Add support for min/max flavor selects in computeConstantRange(), which allows us to fold comparisons of a min/max against a constant in InstSimplify. This fixes an infinite InstCombine loop, with the test case taken from D59378.
  Relative to the previous iteration, this contains some adjustments for AMDGPU med3 tests: the AMDGPU target runs InstSimplify prior to codegen, which ends up constant folding some existing med3 tests after this change. To preserve these tests, a hidden -amdgpu-scalar-ir-passes option is added, which allows disabling scalar IR passes (that use InstSimplify) for testing purposes.
  Differential Revision: https://reviews.llvm.org/D59506
  llvm-svn: 357870
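A simplified Python sketch of the idea behind this change: a min/max against a constant `c` has a known bound on one side, which can be enough to fold a comparison outright. It assumes an 8-bit integer type and uses inclusive `(lo, hi)` tuples; LLVM's ConstantRange is a different (half-open, APInt-based) representation, and `min_max_range` and `fold_less_than` are hypothetical helpers, not ValueTracking APIs.

```python
I8_MIN, I8_MAX, U8_MAX = -128, 127, 255


def min_max_range(flavor, c):
    """Inclusive (lo, hi) bounds of a min/max select of an i8 value against c."""
    bounds = {
        "smax": (c, I8_MAX),   # smax(x, c) >= c
        "smin": (I8_MIN, c),   # smin(x, c) <= c
        "umax": (c, U8_MAX),   # umax(x, c) >= c, c read as unsigned
        "umin": (0, c),        # umin(x, c) <= c
    }
    return bounds[flavor]


def fold_less_than(bounds, k):
    """Fold `value < k` to True/False if the bounds decide it, else None.

    The comparison must use the same signedness as the bounds.
    """
    lo, hi = bounds
    if hi < k:
        return True
    if lo >= k:
        return False
    return None


smax_bounds = min_max_range("smax", 5)
folded = fold_less_than(smax_bounds, 3)      # smax(x, 5) < 3 is always false
undecided = fold_less_than(smax_bounds, 10)  # depends on x, so no fold
```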
* [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs | Stanislav Mekhanoshin | 2019-04-05 | 6 | -2/+26
  DetectDeadLanes can create some dead defs. Then RenameIndependentSubregs will break a REG_SEQUENCE which may use these dead defs. At this point a dead instruction can be removed, but we do not run a DCE anymore. MachineDCE was only running before live variable analysis. The patch adds a means to preserve LiveIntervals and SlotIndexes in case it runs past this point.
  Differential Revision: https://reviews.llvm.org/D59626
  llvm-svn: 357805
* AMDGPU/GlobalISel: Fix non-power-of-2 select | Matt Arsenault | 2019-04-05 | 1 | -0/+28
  llvm-svn: 357762
* [SelectionDAG] Compute known bits of CopyFromReg | Piotr Sobczak | 2019-04-05 | 1 | -6/+4
  Summary: Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if the virtual reg used has one def only. This can be particularly useful when calling isBaseWithConstantOffset() with the ISD::CopyFromReg argument, as more optimizations may get enabled in the result. Also add a missing truncation on X86, found by testing of this patch.
  Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa
  Reviewers: bogner, craig.topper, RKSimon
  Reviewed By: RKSimon
  Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D59535
  llvm-svn: 357745
* AMDGPU: Split block for si_end_cf | Matt Arsenault | 2019-04-03 | 2 | -56/+99
  Relying on no spill or other code being inserted before this was precarious. It relied on code diligently checking isBasicBlockPrologue, which is likely to be forgotten. Ideally this could be done earlier, but this doesn't work because of phis: no other instruction can be placed before them, so we have to accept the position being incorrect during SSA. This avoids regressions in the fast register allocator rewrite from inverting the direction.
  llvm-svn: 357634
* AMDGPU: Assume ECC is enabled by default if supported | Matt Arsenault | 2019-04-03 | 2 | -1/+25
  The test should really be checking for the property directly in the code object headers, but there are problems with this. I don't see this directly represented in the text form, and for the binary emission this depends on a function-level subtarget feature to emit a global flag.
  llvm-svn: 357558
* AMDGPU: Don't use the default cpu in a few tests | Matt Arsenault | 2019-04-03 | 10 | -1273/+1111
  Avoids unnecessary test changes in a future commit.
  llvm-svn: 357539