path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit log (most recent first):
...
* [X86] Improve alphabetizing of load folding tables. NFC
  Craig Topper, 2017-02-11 (1 file, -18/+18)
  llvm-svn: 294857
* [X86][3DNow!] Enable PFSUB<->PFSUBR commutation
  Simon Pilgrim, 2017-02-11 (1 file, -0/+12)
  llvm-svn: 294847
* [AVX-512] Add VPINSRB/W/D/Q instructions to load folding tables.
  Craig Topper, 2017-02-11 (1 file, -0/+4)
  llvm-svn: 294830
* [AVX-512] Add VPSADBW instructions to load folding tables.
  Craig Topper, 2017-02-11 (1 file, -0/+3)
  llvm-svn: 294827
* [X86] Don't base domain decisions on VEXTRACTF128/VINSERTF128 if only AVX1 is available.
  Craig Topper, 2017-02-11 (1 file, -4/+19)
  Seems the execution dependency pass likes to use FP instructions when most of the consuming code is integer if a vextractf128 instruction produced the register. Without AVX2 we don't have the corresponding integer instruction available. This patch suppresses the domain on these instructions to GenericDomain if AVX2 is not supported so that they are ignored by domain fixing. If AVX2 is supported we'll report the correct domain and allow them to switch between integer and fp. Overall I think this produces better results in the modified test cases.
  llvm-svn: 294824
* X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols.
  Peter Collingbourne, 2017-02-09 (1 file, -17/+22)
  This requires that we communicate to X86InstrInfo::optimizeCompareInstr that the second operand is neither a register nor an immediate. The way we do that is by setting CmpMask to zero. Note that there were already instructions where the second operand was neither a register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero for those instructions. This seems like a latent bug, but I was unable to trigger it.
  Differential Revision: https://reviews.llvm.org/D28621
  llvm-svn: 294634
* [X86] Disable conditional tail calls (PR31257)
  Hans Wennborg, 2017-02-07 (1 file, -79/+0)
  They are currently modelled incorrectly (as calls, which clobber registers, confusing e.g. Machine Copy Propagation). Reverting until we figure out the proper solution.
  llvm-svn: 294348
* [AVX-512] Add masked and unmasked shift by immediate instructions to load folding tables.
  Craig Topper, 2017-02-07 (1 file, -0/+81)
  llvm-svn: 294287
* [AVX-512] Add masked shift instructions to load folding tables.
  Craig Topper, 2017-02-07 (1 file, -0/+108)
  This adds the masked versions of everything except the shift by immediate instructions.
  llvm-svn: 294286
* [AVX-512] Add some of the shift instructions to the load folding tables.
  Craig Topper, 2017-02-07 (1 file, -0/+49)
  This includes unmasked forms of variable shift and shifting by the lower element of a register. Still need to do shift by immediate, which was not foldable prior to AVX-512, and all the masked forms.
  llvm-svn: 294285
* [AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables.
  Craig Topper, 2017-02-06 (1 file, -0/+6)
  llvm-svn: 294170
* [AVX-512] Add VPABSB/D/Q/W to load folding tables.
  Craig Topper, 2017-02-06 (1 file, -0/+34)
  llvm-svn: 294169
* [AVX-512] Add VSHUFPS/PD to load folding tables.
  Craig Topper, 2017-02-06 (1 file, -0/+16)
  llvm-svn: 294168
* [AVX-512] Add VPMULLD/Q/W instructions to load folding tables.
  Craig Topper, 2017-02-06 (1 file, -0/+27)
  llvm-svn: 294164
* [AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables.
  Craig Topper, 2017-02-05 (1 file, -0/+16)
  llvm-svn: 294163
* [AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables.
  Craig Topper, 2017-02-05 (1 file, -0/+12)
  llvm-svn: 294153
* [AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables.
  Craig Topper, 2017-02-05 (1 file, -0/+24)
  llvm-svn: 294152
* [AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads.
  Craig Topper, 2017-02-05 (1 file, -0/+24)
  llvm-svn: 294151
* [CodeGen] Move MacroFusion to the target
  Evandro Menezes, 2017-02-01 (1 file, -159/+0)
  This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instruction pairs in a scheduling block, beyond just among the predecessors of the branch at the end.
  Differential revision: https://reviews.llvm.org/D28489
  llvm-svn: 293737
* [AVX-512] Don't bother looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway.
  Craig Topper, 2017-01-31 (1 file, -7/+10)
  llvm-svn: 293608
* [X86] Add AVX and SSE2 versions of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version.
  Craig Topper, 2017-01-31 (1 file, -0/+2)
  llvm-svn: 293607
* [AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions.
  Craig Topper, 2017-01-31 (1 file, -1/+1)
  llvm-svn: 293606
* [AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast. We can use COPY_TO_REGCLASS like AVX does.
  Craig Topper, 2017-01-30 (1 file, -5/+0)
  This causes stack spill slots to be oversized sometimes, but the same should already be happening with AVX.
  llvm-svn: 293464
* [AVX-512] Remove KSET0B/KSET1B in favor of the patterns that select KSET0W/KSET1W for v8i1.
  Craig Topper, 2017-01-30 (1 file, -2/+0)
  llvm-svn: 293458
* [AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when it's beneficial.
  Craig Topper, 2017-01-14 (1 file, -0/+125)
  Isel now selects masked move instructions for vselect instead of blendm. But sometimes it is beneficial to register allocation to remove the tied register constraint by using blendm instructions. This also picks up cases where the masked move was created due to a masked load intrinsic.
  Differential Revision: https://reviews.llvm.org/D28454
  llvm-svn: 292005
* [AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible.
  Craig Topper, 2017-01-14 (1 file, -6/+25)
  We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. Or if it's not, but register allocation has selected a non-extended register, we will use VEX VXORPS. And if it's an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD.
  This makes it possible for the register allocator to have all 32 registers available to work with.
  llvm-svn: 292004
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFC
  Diana Picus, 2017-01-13 (1 file, -49/+41)
  Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion.
  Differential Revision: https://reviews.llvm.org/D28556
  llvm-svn: 291891
* [AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table.
  Craig Topper, 2017-01-13 (1 file, -4/+0)
  The unmasked versions read memory from operand 2, but were in the operand 3 table. These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch.
  llvm-svn: 291885
* [X86] Move some entries in the load folding tables to more appropriate grouping. NFC
  Craig Topper, 2017-01-13 (1 file, -10/+10)
  llvm-svn: 291884
* [AVX-512] Add patterns to use a zero masked VPTERNLOG instruction for vselects of all ones and all zeros.
  Craig Topper, 2017-01-09 (1 file, -0/+15)
  Previously we emitted a VPTERNLOG and a separate masked move.
  llvm-svn: 291415
* [X86] Disable load unfolding for 128-bit MOVDDUP instructions since the load size is smaller than the register size, so unfolding would increase the load size.
  Craig Topper, 2017-01-07 (1 file, -2/+2)
  llvm-svn: 291338
* [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables.
  Craig Topper, 2016-12-27 (1 file, -2/+27)
  llvm-svn: 290591
* [TargetInstrInfo] replace redundant expression in getMemOpBaseRegImmOfs
  Michael LeMay, 2016-12-19 (1 file, -2/+1)
  Summary: The expression for computing the return value of getMemOpBaseRegImmOfs has only one possible value. The other value would result in a return earlier in the function. This patch replaces the expression with its only possible value.
  Reviewers: sanjoy
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D27437
  llvm-svn: 290133
* [AVX-512] Use EVEX encoded XOR instruction for zeroing scalar registers when DQI and VLX instructions are available.
  Craig Topper, 2016-12-18 (1 file, -3/+10)
  This can give the register allocator more registers to use.
  llvm-svn: 290057
* [X86][SSE] Fix domains for scalar store instructions
  Simon Pilgrim, 2016-12-15 (1 file, -0/+4)
  As discussed on D27692
  llvm-svn: 289834
* [X86][AVX512] Moved instruction domain lookups to the right table. NFCI.
  Simon Pilgrim, 2016-12-15 (1 file, -4/+4)
  Avoid duplicating instructions in the int32/int64 domains.
  llvm-svn: 289830
* [X86][SSE] Fix domains for VZEXT_LOAD type instructions
  Simon Pilgrim, 2016-12-15 (1 file, -0/+6)
  Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.
  Differential Revision: https://reviews.llvm.org/D27684
  llvm-svn: 289825
* [peephole] Enhance folding logic to work for STATEPOINTs
  Philip Reames, 2016-12-13 (1 file, -18/+7)
  The general idea here is to get enough of the existing restrictions out of the way that the already existing folding logic in foldMemoryOperand can kick in for STATEPOINTs and fold references to immutable stack slots. The key changes are:
  - Support for folding multiple operands at once which reference the same load
  - Support for folding multiple loads into a single instruction
  - Walk all the operands of the instruction for variadic instructions (this is a bug fix!)
  Once this lands, I'll post another patch which refactors the TII interface here. There's nothing actually x86 specific about the x86 code used here.
  Differential Revision: https://reviews.llvm.org/D24103
  llvm-svn: 289510
* [X86] Remove some intrinsic instructions from hasPartialRegUpdate
  Craig Topper, 2016-12-12 (1 file, -8/+0)
  Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded.
  Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed, so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too.
  Reviewers: spatel, zvi, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D27611
  llvm-svn: 289419
* [X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables.
  Craig Topper, 2016-12-09 (1 file, -6/+84)
  llvm-svn: 289186
* [AVX-512] Add vpermilps/pd to load folding tables.
  Craig Topper, 2016-12-09 (1 file, -0/+36)
  llvm-svn: 289173
* [X86] Do not assume "ri" instructions always have an immediate operand
  Michael Kuperstein, 2016-12-07 (1 file, -3/+3)
  The second operand of an "ri" instruction may be an immediate, but it may also be a global variable, so we should not make any assumptions. This fixes PR31271.
  Differential Revision: https://reviews.llvm.org/D27481
  llvm-svn: 288964
* [X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead
  Craig Topper, 2016-12-06 (1 file, -8/+0)
  Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.
  I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel. I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.
  Reviewers: spatel, delena, zvi, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D27401
  llvm-svn: 288771
* [X86] Fix non-intrinsic roundss/roundsd to not read the destination register
  Michael Kuperstein, 2016-12-05 (1 file, -2/+2)
  This changes the scalar non-intrinsic non-avx roundss/sd instruction definitions not to read their destination register - allowing partial dependency breaking. This fixes PR31143.
  Differential Revision: https://reviews.llvm.org/D27323
  llvm-svn: 288703
* [AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table.
  Craig Topper, 2016-12-03 (1 file, -1/+53)
  llvm-svn: 288591
* [AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables.
  Craig Topper, 2016-12-03 (1 file, -0/+18)
  llvm-svn: 288587
* [AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables.
  Craig Topper, 2016-12-02 (1 file, -0/+27)
  llvm-svn: 288484
* [AVX-512] Add EVEX PSHUFB instructions to load folding tables.
  Craig Topper, 2016-12-02 (1 file, -0/+9)
  llvm-svn: 288482
* [AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables.
  Craig Topper, 2016-12-02 (1 file, -1/+25)
  llvm-svn: 288481
* MachineScheduler: Export function to construct "default" scheduler.
  Matthias Braun, 2016-11-28 (1 file, -2/+2)
  This makes the createGenericSchedLive() function that constructs the default scheduler available for the public API. This should help when you want to get a scheduler and the default list of DAG mutations.
  This also shrinks the list of default DAG mutations: {Load|Store}ClusterDAGMutation and MacroFusionDAGMutation are no longer added by default. Targets can easily add them if they need them. It also makes it easier for targets to add alternative/custom macrofusion or clustering mutations while staying with the default createGenericSchedLive(). It also saves the callback back and forth in TargetInstrInfo::enableClusterLoads()/enableClusterStores().
  Differential Revision: https://reviews.llvm.org/D26986
  llvm-svn: 288057