summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Introduce new MOVSSrm/MOVSDrm opcodes that use VR128 register class.Craig Topper2019-06-181-7/+49
| | | | | | | | | | | | | | | | | | | | | | Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt. Use the new versions in patterns that previously used a COPY_TO_REGCLASS to VR128. These patterns expect the upper bits to be zero. The current set up appears to work, but I'm not sure we should be enforcing upper bits being zero through a COPY_TO_REGCLASS. I wanted to flip the arrangement and use a COPY_TO_REGCLASS to FR32/FR64 for the patterns that need an f32/f64 result, but that complicated fastisel and globalisel. I've been doing some experiments with reducing some isel patterns and ended up in a situation where I had a (SUBREG_TO_REG (COPY_TO_RECLASS (VMOVSSrm), VR128)) and our post-isel peephole was unable to avoid using an instruction for the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128 instruction removes the COPY_TO_REGCLASS that was breaking this. llvm-svn: 363643
* [CodeGen] Add getMachineMemOperand + MachineMemOperand::Flags allocator ↵Simon Pilgrim2019-06-131-8/+2
| | | | | | | | helper wrapper. NFCI. Pre-commit for D62726 on behalf of @luke (Luke Lau) llvm-svn: 363257
* [X86] Add VCMPSSZrr_Intk and VCMPSDZrr_Intk to isNonFoldablePartialRegisterLoad.Craig Topper2019-06-121-0/+2
| | | | | | | | | The non-masked versions are already in there. I'm having some trouble coming up with a way to test this right now. Most load folding should happen during isel so I'm not sure how to get peephole pass to do it. llvm-svn: 363125
* [X86] Add load folding isel patterns to scalar_math_patterns and ↵Craig Topper2019-06-111-0/+1
| | | | | | | | AVX512_scalar_math_fp_patterns. Also add a FIXME for the peephole pass not being able to handle this. llvm-svn: 363032
* [SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.Jonas Paulsson2019-06-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch aims to reduce spilling and register moves by using the 3-address versions of instructions per default instead of the 2-address equivalent ones. It seems that both spilling and register moves are improved noticeably generally. Regalloc hints are passed to increase conversions to 2-address instructions which are done in SystemZShortenInst.cpp (after regalloc). Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction. If it would have been possibe to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post). Common code changes: * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation. * VirtRegMap is passed as an argument to foldMemoryOperand(). Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D60888 llvm-svn: 362868
* [X86] Make masked floating point equality/ordered compares commutable for ↵Craig Topper2019-06-061-4/+14
| | | | | | | | load folding purposes. Same as what is supported for the unmasked form. llvm-svn: 362717
* [X86] Add the vector integer min/max instructions to ↵Craig Topper2019-06-051-0/+84
| | | | | | | | | | | | | | | | | isAssociativeAndCommutative. As far as I know these should be freely reassociatable just like the floating point MAXC/MINC instructions. The *reduce* test changes are largely regressions and caused by the "generic" CPU we default to not having a scheduler model. The machine-combiner-int-vec.ll test shows the positive benefits of this change. Differential Revision: https://reviews.llvm.org/D62787 llvm-svn: 362629
* [X86] Add the SSE versions of PMULLW and PMULLD to isAssociativeAndCommutative.Craig Topper2019-06-021-0/+2
| | | | llvm-svn: 362309
* [X86] Add VP2INTERSECT instructionsPengfei Wang2019-05-311-0/+8
| | | | | | | | | | Support Intel AVX512 VP2INTERSECT instructions in llvm Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62366 llvm-svn: 362188
* [X86-64] Fix 256-bit SET0 lowering for non-VLX targetsDavid Greene2019-05-281-0/+6
| | | | | | | | | | If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843
* Factor out redzone ABI checks [NFCI]Philip Reames2019-05-101-1/+1
| | | | | | | | | | As requested in D58632, cleanup our red zone detection logic in the X86 backend. The existing X86MachineFunctionInfo flag is used to track whether we *use* the redzone (via a particularly optimization?), but there's no common way to check whether the function *has* a red zone. I'd appreciate careful review of the uses being updated. I think they are NFC, but a careful eye from someone else would be appreciated. Differential Revision: https://reviews.llvm.org/D61799 llvm-svn: 360479
* [X86] Reduce scope of variables where possible. NFCI.Simon Pilgrim2019-05-071-1/+1
| | | | | | Fixes cppcheck warnings. llvm-svn: 360131
* [X86] X86InstrInfo::findThreeSrcCommutedOpIndices - fix unread variable warning.Simon Pilgrim2019-05-061-1/+2
| | | | | | scan-build was reporting that CommutableOpIdx1 never used its original initialized value - move it down to where its first used to make the real initialization more obvious (and matches the comment that's there). llvm-svn: 360028
* [CodeGen] Add "const" to MachineInstr::mayAliasBjorn Pettersson2019-04-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Summary: The basic idea here is to make it possible to use MachineInstr::mayAlias also when the MachineInstr is const (or the "Other" MachineInstr is const). The addition of const in MachineInstr::mayAlias then rippled down to the need for adding const in several other places, such as TargetTransformInfo::getMemOperandWithOffset. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60856 llvm-svn: 358744
* [X86] In CopyToFromAsymmetricReg, use VR128 instead of FR32 instructions for ↵Craig Topper2019-04-171-12/+12
| | | | | | | | | | | | | GR32<->XMM register copies. We have two versions of some instructions, VR128 versions and FR32 versions that are marked as CodeGenOnly. This change switches to using the VR128 versions for these copies. It's after register allocation so the class size no longer matters. This matches how GR64 works. llvm-svn: 358555
* [X86] Merge the different Jcc instructions for each condition code into ↵Craig Topper2019-04-051-59/+20
| | | | | | | | | | | | | | | | | | | | | single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between Jcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: spatel, lebedev.ri, courbet, gchatelet, RKSimon Reviewed By: RKSimon Subscribers: MatzeB, qcolombet, eraman, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60228 llvm-svn: 357802
* [X86] Merge the different SETcc instructions for each condition code into ↵Craig Topper2019-04-051-64/+16
| | | | | | | | | | | | | | | | | | | | | single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri Reviewed By: andreadb Subscribers: hiraditya, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60138 llvm-svn: 357801
* [X86] Merge the different CMOV instructions for each condition code into ↵Craig Topper2019-04-051-185/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | single instructions that store the condition code as an immediate. Summary: Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models. This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between CMOV instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. This does complicate the scheduler models a little since we can't assign the A and BE instructions to a separate class now. I plan to make similar changes for SETcc and Jcc. Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet Reviewed By: RKSimon Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60041 llvm-svn: 357800
* [IR] Refactor attribute methods in Function class (NFC)Evandro Menezes2019-04-041-5/+5
| | | | | | | | Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731
* [X86] Mark the default case of the X86InstrInfo::convertToThreeAddress ↵Craig Topper2019-04-021-1/+1
| | | | | | | | | switch as unreachable. This function should only be called with instructions that are really convertible. And all convertible instructions need to be handled by the switch. So nothing should use the default. llvm-svn: 357529
* [X86] Add post-isel pseudos for rotate by immediate using SHLD/SHRDCraig Topper2019-03-271-0/+18
| | | | | | | | | | | | | | Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD. When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization. This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work. Fixes PR41055 Differential Revision: https://reviews.llvm.org/D59391 llvm-svn: 357096
* [X86] Make ADD*_DB post-RA pseudos and expand them in expandPostRAPseudo.Craig Topper2019-03-181-0/+11
| | | | | | | | | These are used to help convert OR->LEA when needed to avoid avoid a copy. They aren't need after register allocation. Happens to remove an ugly goto from X86MCCodeEmitter.cpp llvm-svn: 356356
* [X86] Remove VCVTSI2SDZrrb_Int as it shouldn't exist.Craig Topper2019-03-111-1/+0
| | | | | | This would convert a signed 32-bit integer to double precision with rounding. But there's nothing to round. llvm-svn: 355795
* [X86] Enable 8-bit SHL to convert to LEACraig Topper2019-03-051-0/+4
| | | | | | Differential Revision: https://reviews.llvm.org/D58870 llvm-svn: 355425
* [X86] Allow 8-bit INC/DEC to be converted to LEA.Craig Topper2019-03-051-2/+7
| | | | | | | | We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too. Differential Revision: https://reviews.llvm.org/D58869 llvm-svn: 355424
* [X86] Enable 8-bit OR with disjoint bits to convert to LEACraig Topper2019-03-051-9/+17
| | | | | | | | We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64. Differential Revision: https://reviews.llvm.org/D58863 llvm-svn: 355423
* [X86] Use X86::LAST_VALID_COND instead of assuming X86::COND_S is the last ↵Craig Topper2019-02-281-1/+1
| | | | | | encoding. NFC llvm-svn: 355059
* Fix MSVC constant truncation warnings. NFCI.Simon Pilgrim2019-02-231-11/+11
| | | | llvm-svn: 354731
* [X86] Sign extend the 8-bit immediate when commuting blend instructions to ↵Craig Topper2019-02-231-3/+5
| | | | | | | | | | | | match isel. Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t. This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost. The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results. llvm-svn: 354724
* [X86] Don't prevent load folding for cvtsi2ss/cvtsi2sd based on ↵Craig Topper2019-02-161-39/+48
| | | | | | | | | | hasPartialRegUpdate. Preventing the load fold won't fix the partial register update since the input we can fold is a GPR. So it will do nothing to prevent a false dependency on an XMM register. llvm-svn: 354193
* X86: Replace isSafeToClobberEFLAGS implementationMatt Arsenault2019-02-151-85/+1
| | | | | | Also use modifiesRegister instead of looping over operands. llvm-svn: 354098
* [X86] Remove isReMaterializable from X87 floating point constant loads and ↵Craig Topper2019-02-081-1/+0
| | | | | | | | | | | | | | | | | | constant pool loads. Summary: These instructions update FPSW so they aren't generically safe to rematerialize into any location if FPSW is live for a comparison result. They also use FPCW for exception masking control. Though the only exception they can generate is stack overflow and we manage the stack ourselves so that's not really going to occur. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57934 llvm-svn: 353536
* [X86][AVX] Add VMOVDDUP-VPBROADCASTQ execution domain mappingSimon Pilgrim2019-02-011-0/+4
| | | | | | | | Noticed in D57514. Differential Revision: https://reviews.llvm.org/D57519 llvm-svn: 352922
* Fix "comparison of unsigned expression >= 0 is always true" warning. NFCI.Simon Pilgrim2019-01-221-1/+1
| | | | llvm-svn: 351816
* [X86][SSE] Add selective commutation support for insertps (PR40340)Simon Pilgrim2019-01-221-0/+22
| | | | | | | | When we are inserting 1 "inline" element, and zeroing 2 of the other elements then we can safely commute the insertps source inputs to improve memory folding. Differential Revision: https://reviews.llvm.org/D56843 llvm-svn: 351807
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the ↵Craig Topper2018-12-211-6/+18
| | | | | | | | | | | | | | sign flag is used. The BEXTR instruction documents the SF bit as undefined. The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed. Fixes PR40060 Differential Revision: https://reviews.llvm.org/D55807 llvm-svn: 349956
* [X86] Add BSR to isUseDefConvertible.Craig Topper2018-12-181-6/+6
| | | | | | | | We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there. This addresses one issue from PR40090. llvm-svn: 349531
* [X86] Const correct some helper functions X86InstrInfo.cpp. NFCCraig Topper2018-12-181-6/+7
| | | | llvm-svn: 349440
* [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr.Craig Topper2018-12-171-0/+4
| | | | | | These seem to have been missed when the other TBM instructions were added. llvm-svn: 349404
* [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEASanjay Patel2018-12-121-7/+18
| | | | | | | | | | This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. That allows us to combine add ops with register moves and save some instructions. This is another step towards allowing add truncation in generic DAGCombiner (see D54640). Differential Revision: https://reviews.llvm.org/D55494 llvm-svn: 348946
* [x86] clean up code for converting 16-bit ops to LEA; NFCSanjay Patel2018-12-111-62/+57
| | | | | | | | As discussed in D55494, we want to extend this to handle 8-bit ops too, but that could be extended further to enable this on 32-bit systems too. llvm-svn: 348851
* [x86] remove dead code for 16-bit LEA formation; NFCSanjay Patel2018-12-111-57/+13
| | | | | | | | | | | As discussed in: D55494 ...this code has been disabled/dead for a long time (the code references Athlon and Pentium 4), and there's almost no chance that it will be used given the last decade of uarch evolution. Also, in SDAG we promote 16-bit ops to 32-bit, so there's almost no way to test this code any more. llvm-svn: 348845
* [x86] don't try to convert add with undef operands to LEASanjay Patel2018-12-091-36/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing code tries to handle an undef operand while transforming an add to an LEA, but it's incomplete because we will crash on the i16 test with the debug output shown below. It's better to just give up instead. Really, GlobalIsel should have folded these before we could get into trouble. # Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected bb.0 (%ir-block.0): liveins: $edi %1:gr32 = COPY killed $edi %0:gr16 = COPY %1.sub_16bit:gr32 %5:gr64_nosp = IMPLICIT_DEF %5.sub_16bit:gr64_nosp = COPY %0:gr16 %6:gr64_nosp = IMPLICIT_DEF %6.sub_16bit:gr64_nosp = COPY %2:gr16 %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg %3:gr16 = COPY killed %4.sub_16bit:gr32 $ax = COPY killed %3:gr16 RET 0, implicit killed $ax # End machine code for function add_undef_i16. *** Bad machine code: Reading virtual register without a def *** - function: add_undef_i16 - basic block: %bb.0 (0x7fe6cd83d940) - instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16 - operand 1: %2:gr16 LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D54710 llvm-svn: 348722
* [CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operandFrancis Visoiu Mistrih2018-11-281-6/+7
| | | | | | | | | | | | | | | | | | Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746
* [TableGen] Refactor macro names (NFC)Evandro Menezes2018-11-271-1/+1
| | | | | | Make the names for the macros for `TargetInstrInfo` uniform. llvm-svn: 347706
* LivePhysRegs/IfConversion: Change some types from unsigned to MCPhysReg; NFCMatthias Braun2018-11-061-1/+1
| | | | | | | | Change the type in a couple of lists and sets that only store physical registers from unsigned to MCPhysRegs. The later is only 16bits and saves us a bit of memory. llvm-svn: 346254
* Revert r345165 "[X86] Bring back the MOV64r0 pseudo instruction"Craig Topper2018-10-311-19/+4
| | | | | | Google is reporting regressions on some benchmarks. llvm-svn: 345785
* [tblgen][PredicateExpander] Add the ability to describe more complex ↵Andrea Di Biagio2018-10-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | constraints on instruction operands. Before this patch, class PredicateExpander only knew how to expand simple predicates that performed checks on instruction operands. In particular, the new scheduling predicate syntax was not rich enough to express checks like this one: Foo(MI->getOperand(0).getImm()) == ExpectedVal; Here, the immediate operand value at index zero is passed in input to function Foo, and ExpectedVal is compared against the value returned by function Foo. While this predicate pattern doesn't show up in any X86 model, it shows up in other upstream targets. So, being able to support those predicates is fundamental if we want to be able to modernize all the scheduling models upstream. With this patch, we allow users to specify if a register/immediate operand value needs to be passed in input to a function as part of the predicate check. Now, register/immediate operand checks all derive from base class CheckOperandBase. This patch also changes where TIIPredicate definitions are expanded by the instructon info emitter. Before, definitions were expanded in class XXXGenInstrInfo (where XXX is a target name). With the introduction of this new syntax, we may want to have TIIPredicates expanded directly in XXXInstrInfo. That is because functions used by the new operand predicates may only exist in the derived class (i.e. XXXInstrInfo). This patch is a non functional change for the existing scheduling models. In future, we will be able to use this richer syntax to better describe complex scheduling predicates, and expose them to llvm-mca. Differential Revision: https://reviews.llvm.org/D53880 llvm-svn: 345714
* [ELF] Fix large code model MIR verifier errorsReid Kleckner2018-10-241-6/+22
| | | | | | | | | | | Instead of using the MOVGOT64r pseudo, use the existing MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to create a "scratch register operand" for the pseudo to use, and the register allocator can make better decisions. Fixes some X86 verifier errors tracked in PR27481. llvm-svn: 345219
OpenPOWER on IntegriCloud