summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Fix crash when folding immediates into multiple usesNicolai Haehnle2017-07-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: When an immediate is folded by constant folding, we re-scan the entire use list for two reasons: 1. The constant folding may have created a new use of the same reg. 2. The constant folding may have removed an additional use in the list we're currently traversing (e.g., constant folding an S_ADD_I32 c, c). However, this could previously lead to a crash when an unrelated use was added twice into the FoldList. Since we re-scan the whole list anyway, we might as well just clear the FoldList again before we do so. Using a MIR test to show this because real code seems to trigger the issue only in connection with some really subtle control flow structures. Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9. Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35416 llvm-svn: 308314
* [AMDGPU] Fix -Wimplicit-fallthrough warnings. NFCI.Simon Pilgrim2017-07-071-0/+1
| | | | llvm-svn: 307381
* AMDGPU: Do operand folding in program orderMatt Arsenault2017-06-201-5/+3
| | | | | | | | | Before it was possible to partially fold use instructions before the defs. After the xor is folded into a copy, the same mov can end up in the fold list twice, so on the second attempt it will fail expecting to see a register to fold. llvm-svn: 305821
* AMDGPU: Preserve undef when folding register operandsMatt Arsenault2017-06-201-0/+2
| | | | | | | | If the source was a copy of an undef register, this would produce a read of an undefined register which is a verifier error. llvm-svn: 305816
* AMDGPU: Fix crash with undef vreg input operandMatt Arsenault2017-06-201-1/+1
| | | | llvm-svn: 305814
* [AMDGPU] Fix SIFoldOperands crash with clampStanislav Mekhanoshin2017-06-051-1/+2
| | | | | | | | | Fixes bug #33302. Pass did not account that Src1 of max instruction can be an immediate. Differential Revision: https://reviews.llvm.org/D33884 llvm-svn: 304696
* [AMDGPU] Preserve operand order in SIFoldOperandsStanislav Mekhanoshin2017-06-031-3/+18
| | | | | | | | | SIFoldOperands can commute operands even if no folding was done. This change is to preserve IR is no folding was done. Differential Revision: https://reviews.llvm.org/D33802 llvm-svn: 304625
* [AMDGPU] Allow SDWA in instructions with immediates and SGPRsStanislav Mekhanoshin2017-05-301-3/+4
| | | | | | | | | | | | | | | | An encoding does not allow to use SDWA in an instruction with scalar operands, either literals or SGPRs. That is however possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To cleanup MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219
* [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patternsSam Kolton2017-03-311-22/+1
| | | | | | | | | | | | | | | | | Previously compiler often extracted common immediates into specific register, e.g.: ``` %vreg0 = S_MOV_B32 0xff; %vreg2 = V_AND_B32_e32 %vreg0, %vreg1 %vreg4 = V_AND_B32_e32 %vreg0, %vreg3 ``` Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands: ``` SDWA src: %vreg2 src_sel:BYTE_0 SDWA src: %vreg4 src_sel:BYTE_0 ``` With this change peephole check if operand is either immediate or register that is copy of immediate. llvm-svn: 299202
* [AMDGPU] Fold V_CNDMASK with identical source operandsStanislav Mekhanoshin2017-03-241-0/+29
| | | | | | | | Such instructions sometimes appear after lowering and folding. Differential Revision: https://reviews.llvm.org/D31318 llvm-svn: 298723
* AMDGPU: Support v2i16/v2f16 packed operationsMatt Arsenault2017-02-271-6/+8
| | | | llvm-svn: 296396
* AMDGPU: Fold omod into instructionsMatt Arsenault2017-02-271-5/+139
| | | | llvm-svn: 296372
* AMDGPU: Use clamp with f64Matt Arsenault2017-02-221-1/+2
| | | | llvm-svn: 295908
* AMDGPU: Fold FP clamp as modifier bitMatt Arsenault2017-02-221-4/+72
| | | | | | | | | | | The manual is unclear on the details of this. It's not clear to me if denormals are not allowed with clamp, or if that is only omod. Not allowing denorms for fp16 or fp64 isn't useful so I also question if that is really a restriction. Same with whether this is valid without IEEE mode enabled. llvm-svn: 295905
* AMDGPU: Fix folding immediates into mac src2Matt Arsenault2017-01-111-2/+30
| | | | | | | Whether it is legal or not needs to check for the instruction it will be replaced with. llvm-svn: 291711
* AMDGPU: Constant fold when immediate is materializedMatt Arsenault2017-01-101-141/+228
| | | | | | In future commits these patterns will appear after moveToVALU changes. llvm-svn: 291615
* AMDGPU: Fix handling of 16-bit immediatesMatt Arsenault2016-12-101-5/+9
| | | | | | | | | | | | | | | | | | Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306
* AMDGPU : Add S_SETREG instructions to fix fdiv precision issues.Tom Stellard2016-12-071-0/+7
| | | | | | | | | | | | | | Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879
* AMDGPU: Refactor immediate folding logicMatt Arsenault2016-11-291-14/+50
| | | | | | | | | | | | | Change the logic for when to fold immediates to consider the destination operand rather than the source of the materializing mov instruction. No change yet, but this will allow for correctly handling i16/f16 operands. Since 32-bit moves are used to materialize constants for these, the same bitvalue will not be in the register. llvm-svn: 288184
* AMDGPU: Cleanup immediate folding codeMatt Arsenault2016-11-231-64/+62
| | | | | | | Move code down to use, reorder to avoid hard to follow immediate folding logic. llvm-svn: 287818
* AMDGPU: Fix debug printingMatt Arsenault2016-11-231-1/+1
| | | | | | The uint8_t was printed as a char which didn't really work. llvm-svn: 287817
* [AMDGPU] Add f16 support (VI+)Konstantin Zhuravlyov2016-11-131-7/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D25975 llvm-svn: 286753
* AMDGPU: Don't fold undef uses or copies with implicit usesMatt Arsenault2016-10-061-4/+22
| | | | llvm-svn: 283476
* AMDGPU: Remove leftover implicit operands when folding immediatesMatt Arsenault2016-10-061-7/+26
| | | | | | | | When constant folding an operation to a copy or an immediate mov, the implicit uses/defs of the old instruction were left behind, e.g. replacing v_or_b32 left the implicit exec use on the new copy. llvm-svn: 283471
* Use StringRef in Pass/PassManager APIs (NFC)Mehdi Amini2016-10-011-3/+1
| | | | llvm-svn: 283004
* AMDGPU: Support folding FrameIndex operandsMatt Arsenault2016-09-141-9/+26
| | | | | | This avoids test regressions in a future commit. llvm-svn: 281491
* AMDGPU: Improve splitting 64-bit bit ops by constantsMatt Arsenault2016-09-141-0/+126
| | | | | | | | This addresses a TODO to handle operations besides and. This also starts eliminating no-op operations with a constant that can emerge later. llvm-svn: 281488
* AMDGPU: Don't fold subregister extracts into tied operandsMatt Arsenault2016-08-151-3/+15
| | | | llvm-svn: 278676
* CodeGen: Use MachineInstr& in TargetInstrInfo, NFCDuncan P. N. Exon Smith2016-06-301-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is mostly a mechanical change to make TargetInstrInfo API take MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator) when the argument is expected to be a valid MachineInstr. This is a general API improvement. Although it would be possible to do this one function at a time, that would demand a quadratic amount of churn since many of these functions call each other. Instead I've done everything as a block and just updated what was necessary. This is mostly mechanical fixes: adding and removing `*` and `&` operators. The only non-mechanical change is to split ARMBaseInstrInfo::getOperandLatencyImpl out from ARMBaseInstrInfo::getOperandLatency. Previously, the latter took a `MachineInstr*` which it updated to the instruction bundle leader; now, the latter calls the former either with the same `MachineInstr&` or the bundle leader. As a side effect, this removes a bunch of MachineInstr* to MachineBasicBlock::iterator implicit conversions, a necessary step toward fixing PR26753. Note: I updated WebAssembly, Lanai, and AVR (despite being off-by-default) since it turned out to be easy. I couldn't run tests for AVR since llc doesn't link with it turned on. llvm-svn: 274189
* AMDGPU: Cleanup subtarget handling.Matt Arsenault2016-06-241-4/+3
| | | | | | | | | Split AMDGPUSubtarget into amdgcn/r600 specific subclasses. This removes most of the static_casting of the basic codegen classes everywhere, and tries to restrict the features visible on the wrong target. llvm-svn: 273652
* Add optimization bisect opt-in calls for AMDGPU passesAndrew Kaylor2016-04-251-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D19450 llvm-svn: 267485
* AMDGPU: Fix passes depending on dominator tree for no reasonMatt Arsenault2016-02-111-8/+2
| | | | llvm-svn: 260494
* AMDGPU/SI: Fix a bug in SIFoldOperandsMarek Olsak2016-01-131-0/+11
| | | | | | | | | | | | Summary: ret.ll will contain a test for this Reviewers: tstellarAMD, arsenm Subscribers: arsenm Differential Revision: http://reviews.llvm.org/D16029 llvm-svn: 257590
* AMDGPU/SI: Fold operands with sub-registersNicolai Haehnle2016-01-071-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs, increasing the code size and VGPR pressure. These moves are now folded away. Note that this lack of operand folding was not a problem for VMEM loads, because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register coalescer. Some tests are updated, note that the fsub.ll test explicitly checks that the move is elided. With the IR generated by current Mesa, the changes are obviously relatively minor: 7063 shaders in 3531 tests Totals: SGPRS: 351872 -> 352560 (0.20 %) VGPRS: 199984 -> 200732 (0.37 %) Code Size: 9876968 -> 9881112 (0.04 %) bytes LDS: 91 -> 91 (0.00 %) blocks Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave Wait states: 295164 -> 295337 (0.06 %) Totals from affected shaders: SGPRS: 65784 -> 66472 (1.05 %) VGPRS: 38064 -> 38812 (1.97 %) Code Size: 1993828 -> 1997972 (0.21 %) bytes LDS: 42 -> 42 (0.00 %) blocks Scratch: 795648 -> 783360 (-1.54 %) bytes per wave Wait states: 54026 -> 54199 (0.32 %) Reviewers: tstellarAMD, arsenm, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15875 llvm-svn: 257074
* AMDGPU: Fix verifier error in SIFoldOperandsMatt Arsenault2015-10-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | There may be other use operands that also need their kill flags cleared. This happens in a few tests when SIFoldOperands is moved after PeepholeOptimizer. PeepholeOptimizer rewrites cases that look like: %vreg0 = ... %vreg1 = COPY %vreg0 use %vreg1<kill> %vreg2 = COPY %vreg0 use %vreg2<kill> to use the earlier source to %vreg0 = ... use %vreg0 use %vreg0 Currently SIFoldOperands sees the copied registers, so there is only one use. So far I haven't managed to come up with a test that currently has multiple uses of a foldable VGPR -> VGPR copy. llvm-svn: 250960
* Improved the interface of methods commuting operands, improved X86-FMA3 ↵Andrew Kaylor2015-09-281-3/+12
| | | | | | | | | | mem-folding&coalescing. Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com) Differential Revision: http://reviews.llvm.org/D11370 llvm-svn: 248735
* AMDGPU: Fix recomputing dominator tree unnecessarilyMatt Arsenault2015-09-251-0/+1
| | | | | | | SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands. llvm-svn: 248587
* AMDGPU/SI: Fix creating v_mov_b32s without exec usesMatt Arsenault2015-09-101-2/+14
| | | | | | | This will be caught by existing tests with a verifier check to be added in a future commit. llvm-svn: 247229
* AMDGPU/SI: Fold operands through REG_SEQUENCE instructionsTom Stellard2015-09-091-0/+21
| | | | | | | | | | | | | | Summary: This helps mostly when we use add instructions for address calculations that contain immediates. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12256 llvm-svn: 247157
* AMDGPU/SI: Fix some invaild assumptions when folding 64-bit immediatesTom Stellard2015-08-291-1/+5
| | | | | | | | | | | | | | | Summary: We were assuming tha if the use operand had a sub-register that the immediate was 64-bits, but this was breaking the case of folding a 64-bit immediate into another 64-bit instruction. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12255 llvm-svn: 246354
* AMDGPU/SI: Factor operand folding code into its own functionTom Stellard2015-08-281-67/+79
| | | | | | | | | | Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12254 llvm-svn: 246353
* AMDGPU/SI: Select mad patterns to v_mac_f32Tom Stellard2015-07-131-0/+31
| | | | | | | | | The two-address instruction pass will convert these back to v_mad_f32 if necessary. Differential Revision: http://reviews.llvm.org/D11060 llvm-svn: 242038
* R600 -> AMDGPU renameTom Stellard2015-06-131-0/+288
llvm-svn: 239657
OpenPOWER on IntegriCloud