summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU
Commit message (Collapse)AuthorAgeFilesLines
...
* AMDGPU] gfx1010 hazard recognizerStanislav Mekhanoshin2019-05-042-3/+268
| | | | | | Differential Revision: https://reviews.llvm.org/D61536 llvm-svn: 359961
* [AMDGPU] gfx1010: use fmac instructionsStanislav Mekhanoshin2019-05-044-39/+105
| | | | | | Differential Revision: https://reviews.llvm.org/D61527 llvm-svn: 359959
* [AMDGPU] gfx1010 wait count insertionStanislav Mekhanoshin2019-05-031-56/+144
| | | | | | Differential Revision: https://reviews.llvm.org/D61534 llvm-svn: 359938
* [AMDGPU] gfx1010 s_code_end generationStanislav Mekhanoshin2019-05-034-2/+45
| | | | | | | | Also add some missing metadata in the streamer. Differential Revision: https://reviews.llvm.org/D61531 llvm-svn: 359937
* [AMDGPU] gfx1010 loop alignmentStanislav Mekhanoshin2019-05-032-0/+78
| | | | | | Differential Revision: https://reviews.llvm.org/D61529 llvm-svn: 359935
* AMDGPU: Select VOP3 form of subMatt Arsenault2019-05-031-2/+2
| | | | | | | | | | The VOP3 form should always be the preferred selection form to be shrunk later. The r600 sub test needs to be split out because it asserts on the arguments in the new test during the calling convention lowering. llvm-svn: 359899
* AMDGPU: Support shrinking add with FI in SIFoldOperandsMatt Arsenault2019-05-031-35/+37
| | | | | | Avoids test regression in a future patch llvm-svn: 359898
* AMDGPU: Remove redundant patterns for shiftsMatt Arsenault2019-05-031-9/+4
| | | | llvm-svn: 359895
* AMDGPU: Remove redundant patterns for subMatt Arsenault2019-05-031-4/+0
| | | | | | | There were 2 patterns for sub, one selecting to sub and one to subrev. Only one of these will succeed, so remove the reversed one. llvm-svn: 359894
* AMDGPU: Replace shrunk instruction with dummy implicit_defMatt Arsenault2019-05-031-4/+8
| | | | | | | | | | | | This was broken if the original operand was killed. The kill flag would appear on both instructions, and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has an added benefit of really reducing the use count for future folds. Ideally the pass would be structured more like what PeepholeOptimizer does to avoid this hack to avoid breaking instruction iterators. llvm-svn: 359891
* AMDGPU: Fix incorrect commute with sub when folding immediatesMatt Arsenault2019-05-031-1/+4
| | | | | | | | | When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used. This was using the VOP2 equivalent of the original instruction, not the commuted instruction with the inverted opcode. llvm-svn: 359883
* [SelectionDAG] remove constant folding limitations based on FP exceptionsSanjay Patel2019-05-021-5/+0
| | | | | | | | | | | | | | | | | We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791
* [AMDGPU] gfx1010 lost VOP2 forms of some add/subStanislav Mekhanoshin2019-05-021-0/+27
| | | | | | | | Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32. Differential Revision: llvm-svn: 359757
* [AMDGPU] gfx1010 allows VOP3 to have a literalStanislav Mekhanoshin2019-05-027-60/+133
| | | | | | Differential Revision: https://reviews.llvm.org/D61413 llvm-svn: 359756
* [AMDGPU] gfx1010 constant bus limitStanislav Mekhanoshin2019-05-024-24/+136
| | | | | | | | Constant bus limit has increased to 2 with GFX10. Differential Revision: https://reviews.llvm.org/D61404 llvm-svn: 359754
* [AMDGPU] gfx1010 GCNRegBankReassign passStanislav Mekhanoshin2019-05-014-0/+803
| | | | | | | | Reassign registers to reduce register bank conflicts. Differential Revision: https://reviews.llvm.org/D61344 llvm-svn: 359704
* [AMDGPU] gfx1010 GCNNSAReassign passStanislav Mekhanoshin2019-05-014-0/+362
| | | | | | | | Convert NSA into non-NSA images. Differential Revision: https://reviews.llvm.org/D61341 llvm-svn: 359700
* [AMDGPU] gfx1010 MIMG implementationStanislav Mekhanoshin2019-05-0112-161/+922
| | | | | | Differential Revision: https://reviews.llvm.org/D61339 llvm-svn: 359698
* [AMDGPU] gfx1010 DS implementationStanislav Mekhanoshin2019-05-013-165/+221
| | | | | | Differential Revision: https://reviews.llvm.org/D61332 llvm-svn: 359696
* [AMDGPU] gfx1010 VMEM and SMEM implementationStanislav Mekhanoshin2019-04-3016-317/+1071
| | | | | | Differential Revision: https://reviews.llvm.org/D61330 llvm-svn: 359621
* [TargetLowering] Change getOptimalMemOpType to take a function attribute listSjoerd Meijer2019-04-302-6/+5
| | | | | | | | | | | | The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537
* Avoid "checking a pointer after dereferencing" warning. NFCI.Simon Pilgrim2019-04-291-1/+1
| | | | | | Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359473
* Move if() to newline to stop ambiguity over whether it should be else if. NFCI.Simon Pilgrim2019-04-291-1/+2
| | | | | | Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359472
* Revert "AMDGPU: Split block for si_end_cf"Mark Searles2019-04-275-128/+17
| | | | | | | | | | This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4. We discovered some internal test failures, so reverting for now. Differential Revision: https://reviews.llvm.org/D61213 llvm-svn: 359363
* [AMDGPU] gfx1010 VOPC implementationStanislav Mekhanoshin2019-04-268-361/+696
| | | | | | Differential Revision: https://reviews.llvm.org/D61208 llvm-svn: 359358
* [AMDGPU] gfx1010 VOP3 and VOP3P implementationStanislav Mekhanoshin2019-04-264-102/+281
| | | | | | Differential Revision: https://reviews.llvm.org/D61202 llvm-svn: 359328
* [AMDGPU] gfx1010 VOP2 changesStanislav Mekhanoshin2019-04-266-154/+605
| | | | | | Differential Revision: https://reviews.llvm.org/D61156 llvm-svn: 359316
* [AMDGPU] gfx1010 - fix ubsan failureStanislav Mekhanoshin2019-04-251-1/+0
| | | | | | | Revert DecoderNamespace in one place for now. It will need more changes to properly work. llvm-svn: 359239
* [AMDGPU] gfx1010 VOP1 instructionsStanislav Mekhanoshin2019-04-256-102/+306
| | | | | | Differential Revision: https://reviews.llvm.org/D61099 llvm-svn: 359225
* [AMDGPU] gfx1010 utility functionsStanislav Mekhanoshin2019-04-254-29/+90
| | | | | | Differential Revision: https://reviews.llvm.org/D61094 llvm-svn: 359224
* Fix spelling error. NFCAustin Kerbow2019-04-241-1/+1
| | | | | | | | | | | | | | | | Summary: Test commit. Reviewers: msearles, jkorous Reviewed By: jkorous Subscribers: dexonsmith, arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61093 llvm-svn: 359154
* [AMDGPU] gfx1010 SOP instructionsStanislav Mekhanoshin2019-04-241-131/+305
| | | | | | Differential Revision: https://reviews.llvm.org/D61080 llvm-svn: 359139
* [AMDGPU] gfx1010 sgpr register changesStanislav Mekhanoshin2019-04-2410-41/+123
| | | | | | Differential Revision: https://reviews.llvm.org/D61045 llvm-svn: 359117
* [AMDGPU] Add gfx1010 target definitionsStanislav Mekhanoshin2019-04-2414-94/+516
| | | | | | Differential Revision: https://reviews.llvm.org/D61041 llvm-svn: 359113
* [AMDGPU][MC] Parser cleanup and refactoringDmitry Preobrazhensky2019-04-241-93/+48
| | | | | | | | Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60767 llvm-svn: 359096
* [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cppStanislav Mekhanoshin2019-04-231-1/+1
| | | | | | | | The second argument is flags, not subreg. Differential Revision: https://reviews.llvm.org/D61031 llvm-svn: 359017
* [AMDGPU] Fix hidden argument metadata duplication for V3Scott Linder2019-04-231-30/+0
| | | | | | | | | | Essentially complete a proper rebase of the V3 metadata change over https://reviews.llvm.org/D49096. Minimize the diff between the V2 and V3 variants of the relevant lit tests, and clean up some trailing whitespace. llvm-svn: 358992
* AMDGPU: Fix LCSSA phi lowering in SILowerI1CopiesNicolai Haehnle2019-04-231-1/+8
| | | | | | | | | | | | | | | | | | | | | | Summary: When an LCSSA phi survives through instruction selection, the pass ends up removing that phi entirely because it is dominated by the logic that does the lanemask merging. This then used to trigger an assertion when processing a dependent phi instruction. Change-Id: Id4949719f8298062fe476a25718acccc109113b6 Reviewers: llvm-commits Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm Tags: #llvm Differential Revision: https://reviews.llvm.org/D60999 llvm-svn: 358983
* [CallSite removal] move InlineCost to CallBase usageFedor Sergeev2019-04-231-2/+3
| | | | | | | | | | | Converting InlineCost interface and its internals into CallBase usage. Inliners themselves are still not converted. Reviewed By: reames Tags: #llvm Differential Revision: https://reviews.llvm.org/D60636 llvm-svn: 358982
* [AMDGPU] Fix an issue in `op_sel_hi` skipping.Michael Liao2019-04-221-7/+16
| | | | | | | | | | | | | | | | | Summary: - Only apply packed literal `op_sel_hi` skipping on operands requiring packed literals. Even an instruction is `packed`, it may have operand requiring non-packed literal, such as `v_dot2_f32_f16`. Reviewers: rampitec, arsenm, kzhuravl Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60978 llvm-svn: 358922
* AMDGPU: Skip debug instructions in assertMatt Arsenault2019-04-221-2/+7
| | | | | | | | | | These are inserted after branch relaxation, and for some reason it's decided to put them in the long branch expansion block. It's probably not great to rely on the source block address, so this should probably be switched to being PC relative instead of relying on the block address llvm-svn: 358909
* AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sourcesMatt Arsenault2019-04-221-1/+3
| | | | llvm-svn: 358894
* AMDGPU: Fix not checking for copy when looking at copy srcMatt Arsenault2019-04-221-1/+6
| | | | | | | Effectively reverts r356956. The check for isFullCopy was excessive, but there still needs to be a check that this is a copy. llvm-svn: 358890
* [AMDGPU][MC] Corrected parsing of SP3 'neg' modifierDmitry Preobrazhensky2019-04-221-24/+58
| | | | | | | | | | See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60624 llvm-svn: 358888
* [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handlingSimon Pilgrim2019-04-221-11/+25
| | | | | | | | | | | | This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly. The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions. The X86 changes are all definite wins. Differential Revision: https://reviews.llvm.org/D60462 llvm-svn: 358887
* [CodeGen] Add "const" to MachineInstr::mayAliasBjorn Pettersson2019-04-193-19/+23
| | | | | | | | | | | | | | | | | | | | | | | | Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744
* [AMDGPU] Ignore non-SUnits edgesPiotr Sobczak2019-04-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Ignore edges to non-SUnits (e.g. ExitSU) when checking for low latency instructions. When calling the function isLowLatencyInstruction(), an ExitSU could be on the list of successors, not necessarily a regular SU. In other places in the code there is a check "Succ->NodeNum >= DAGSize" to prevent further processing of ExitSU as "Succ->getInstr()" is NULL in such a case. Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())" has the guard, so it is clearly an omission here. Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60864 llvm-svn: 358740
* [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0))Tim Renouf2019-04-181-0/+10
| | | | | | | | | | | fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but creating the new fadd folds to just fneg(A). When A has multiple uses, this confuses it and you get an assert. Fixed. Differential Revision: https://reviews.llvm.org/D60633 Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c llvm-svn: 358640
* [AMDGPU][MC] Corrected handling of "-" before expressionsDmitry Preobrazhensky2019-04-171-38/+58
| | | | | | | | | | See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D60622 llvm-svn: 358596
* AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructionsRhys Perry2019-04-171-0/+4
| | | | | | | | | | | | | | | | Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches. Reviewers: arsen, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60824 llvm-svn: 358592
OpenPOWER on IntegriCloud