path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit message | Author | Age | Files | Lines
...
* [X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions. | Craig Topper | 2016-11-27 | 1 | -0/+20
    llvm-svn: 288009
* [X86] Add SHL by 1 to the load folding tables. | Craig Topper | 2016-11-27 | 1 | -0/+4
    I don't think isel selects these today, favoring adding the register to itself instead. But the load folding tables shouldn't be so concerned with what isel will use and just represent the relationships.
    llvm-svn: 288007
* [AVX-512] Add integer and fp unpck instructions to load folding tables. | Craig Topper | 2016-11-27 | 1 | -0/+108
    llvm-svn: 288004
* [X86] Add TB_NO_REVERSE to entries in the load folding table where the instruction's load size is smaller than the register size. | Craig Topper | 2016-11-27 | 1 | -188/+206
    If we were to unfold these, the load size would be increased to the register size. This is not safe to do since the enlarged load can do things like cross a page boundary into a page that doesn't exist.
    I probably missed some instructions, but this should be a large portion of them.
    llvm-svn: 288001
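The relationship between the fold table and the unfold (reverse) table can be sketched as below. This is a minimal illustration only: `FoldEntry`, `kNoReverse`, and `buildUnfoldMap` are hypothetical names, and the layout is an assumption rather than the actual tables in X86InstrInfo.cpp. Entries flagged as non-reversible are still usable for folding but never produce an unfold mapping.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical flag: the entry may be used for folding, but must not be
// used to unfold the memory form back into load + register form.
constexpr uint16_t kNoReverse = 1u << 0;

// Hypothetical table entry: register-form opcode, memory-form opcode, flags.
struct FoldEntry {
  unsigned RegOp;
  unsigned MemOp;
  uint16_t Flags;
};

// Build the memory-form -> register-form map, skipping non-reversible entries.
inline std::map<unsigned, unsigned>
buildUnfoldMap(const std::vector<FoldEntry> &Table) {
  std::map<unsigned, unsigned> MemToReg;
  for (const FoldEntry &E : Table) {
    if (E.Flags & kNoReverse)
      continue; // e.g. the load is narrower than the register
    bool Inserted = MemToReg.emplace(E.MemOp, E.RegOp).second;
    assert(Inserted && "two reversible entries share one memory form");
    (void)Inserted;
  }
  return MemToReg;
}
```

In this sketch, two register forms that fold to the same memory form stay unambiguous as long as at most one of them is reversible.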
* [AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables. | Craig Topper | 2016-11-27 | 1 | -0/+84
    llvm-svn: 287995
* [X86] Remove alignment restrictions from load folding table for some instructions that don't have a restriction. | Craig Topper | 2016-11-27 | 1 | -13/+13
    Most of these are the SSE4.1 PMOVZX/PMOVSX instructions, which all read less than 128 bits. The only other was MOVUPD, which by definition is an unaligned load.
    llvm-svn: 287991
* [AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables. | Craig Topper | 2016-11-26 | 1 | -0/+36
    llvm-svn: 287975
* [AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables. | Craig Topper | 2016-11-26 | 1 | -0/+64
    llvm-svn: 287974
* [AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables. | Craig Topper | 2016-11-26 | 1 | -0/+31
    llvm-svn: 287972
* [AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables. | Craig Topper | 2016-11-26 | 1 | -0/+8
    llvm-svn: 287970
* [X86] Add SSE, AVX, and AVX2 version of MOVDQU to the load/store folding tables for consistency. | Craig Topper | 2016-11-26 | 1 | -0/+6
    Not sure this is truly needed but we had the floating point equivalents, the aligned equivalents, and the EVEX equivalents. So this just makes it complete.
    llvm-svn: 287960
* [AVX-512] Put the AVX-512 sections of the load folding tables into mostly alphabetical order. | Craig Topper | 2016-11-25 | 1 | -365/+373
    This is consistent with the older sections of the table. NFC
    llvm-svn: 287956
* [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables. | Craig Topper | 2016-11-25 | 1 | -0/+32
    llvm-svn: 287937
* [X86] Allow folding of stack reloads when loading a subreg of the spilled reg | Michael Kuperstein | 2016-11-23 | 1 | -0/+16
    We did not support subregs in InlineSpiller::foldMemoryOperand() because targets may not deal with them correctly.
    This adds a target hook to let the spiller know that a target can handle subregs, and actually enables it for x86 for the case of stack slot reloads. This fixes PR30832.
    Differential Revision: https://reviews.llvm.org/D26521
    llvm-svn: 287792
* [X86] Remove alternate CodeGenOnly version of (v)movq that declared the load size as i128mem. Change all uses to use the i64mem version. | Craig Topper | 2016-11-22 | 1 | -3/+3
    I'm sure this caused the load size to misprint in Intel syntax output. We were also inconsistent about which patterns used which instruction between VEX and EVEX.
    There are two different reg/reg versions of movq, one from a GPR and one from the lower 64 bits of an XMM register. This changes the load folding table to use the single i64mem memory form for folding both cases. But we need to use TB_NO_REVERSE to prevent a duplicate entry in the unfolding table.
    llvm-svn: 287622
* [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD). | Craig Topper | 2016-11-22 | 1 | -6/+115
    Summary: The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands is the one that can load from memory, so this can't be used to increase memory folding opportunities.
    We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.
    Reviewers: igorb, delena, Ayal, Farhana, RKSimon
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D25652
    llvm-svn: 287621
* Fixing a small typo (A->U). | Michael Zuckerman | 2016-11-21 | 1 | -1/+1
    This seems to fix PR30992.
    - HasAVX512 ? X86::VMOVAPSZ128rm_NOVLX
    + HasAVX512 ? X86::VMOVUPSZ128rm_NOVLX
    llvm-svn: 287532
* [AVX-512] Add EVEX form of VMOVZPQILo2PQIZrm to load folding tables to match SSE and AVX. | Craig Topper | 2016-11-21 | 1 | -0/+1
    llvm-svn: 287523
* [x86] add fake scalar FP logic instructions to ReplaceableInstrs to save some bytes | Sanjay Patel | 2016-11-16 | 1 | -0/+8
    We can replace "scalar" FP-bitwise-logic with other forms of bitwise-logic instructions. Scalar SSE/AVX FP-logic instructions only exist in your imagination and/or the bowels of compilers, but logically equivalent int, float, and double variants of bitwise-logic instructions are reality in x86, and the float variant may be a shorter instruction depending on which flavor (SSE or AVX) of vector ISA you have...so just prefer float all the time.
    This is a preliminary step towards solving PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137
    Differential Revision: https://reviews.llvm.org/D26712
    llvm-svn: 287122
* [X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. | Craig Topper | 2016-11-14 | 1 | -3/+3
    -Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
    -Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
    -Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
    -Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
    This should fix at least some of PR28850.
    llvm-svn: 286787
* Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86. | Peter Collingbourne | 2016-11-09 | 1 | -0/+4
    Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand.
    llvm-svn: 286420
* [X86] Broadcast from memory instructions aren't unfoldable | Zvi Rackover | 2016-11-04 | 1 | -8/+8
    Broadcast from memory instructions should be treated as moves. They can't be unfolded.
    Fixes PR30693.
    llvm-svn: 285998
* [X86] Use intrinsics table for PMADDUBSW and PMADDWD so that we can use the legacy intrinsics to select EVEX encoded instructions when available. | Craig Topper | 2016-10-30 | 1 | -3/+3
    This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type.
    llvm-svn: 285515
* [X86] Use intrinsics table for VPMULHRSW intrinsics so that the legacy intrinsics can select EVEX encoded instructions when available. | Craig Topper | 2016-10-29 | 1 | -3/+3
    This requires a minor rename of the instructions due to the use of different tablegen classes and how the names are concatenated.
    llvm-svn: 285501
* Target: Remove unused entities. | Peter Collingbourne | 2016-10-09 | 1 | -26/+0
    llvm-svn: 283690
* [AVX-512] Add subvector insert and extract to load/store folding tables. | Craig Topper | 2016-10-09 | 1 | -0/+25
    llvm-svn: 283689
* [AVX-512] Add the vector down convert instructions to the store folding tables. | Craig Topper | 2016-10-09 | 1 | -0/+24
    llvm-svn: 283687
* [X86][SSE] Update register class during MOVSD/MOVSS - BLENDPD/BLENDPS commutation | Simon Pilgrim | 2016-10-07 | 1 | -0/+11
    MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input; the commutation code wasn't taking this into account, leading to verification errors.
    This patch inserts a vreg copy MI to ensure that the registers are correct.
    Fix for PR30607
    Differential Revision: https://reviews.llvm.org/D25280
    llvm-svn: 283539
* Revert r282920 "X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302)" | Hans Wennborg | 2016-10-05 | 1 | -3/+4
    This is suspected to cause a miscompile in Chromium. Reverting while investigating.
    llvm-svn: 283329
* [X86] Add MOV8rm_NOREX to switch in isReallyTriviallyReMaterializable to match MOV8rm. | Craig Topper | 2016-10-04 | 1 | -0/+1
    llvm-svn: 283184
* [X86] Mark all sizes of (V)MOVUPD as trivially rematerializable. | Craig Topper | 2016-10-03 | 1 | -0/+6
    I don't know for sure that we truly need this, but it's the only vector load that isn't rematerializable. Making it consistent allows it to not be a special case in the td files.
    llvm-svn: 283083
* [X86][SSE] Enable commutation from MOVSD/MOVSS to BLENDPD/BLENDPS on SSE41+ targets | Simon Pilgrim | 2016-10-01 | 1 | -0/+30
    Instead of selecting between MOVSD/MOVSS and BLENDPD/BLENDPS at shuffle lowering by subtarget, this will help us select the instruction based on actual commutation requirements.
    We could possibly add BLENDPD/BLENDPS -> MOVSD/MOVSS commutation and MOVSD/MOVSS memory folding using a similar approach if it proves useful.
    I avoided adding AVX512 handling as I'm not sure when we should be making use of VBLENDPD/VBLENDPS on EVEX targets.
    llvm-svn: 283037
* Use StringRef in Pass/PassManager APIs (NFC) | Mehdi Amini | 2016-10-01 | 1 | -2/+2
    llvm-svn: 283004
* X86: Allow conditional tail calls in Win64 "leaf" functions (PR26302) | Hans Wennborg | 2016-09-30 | 1 | -4/+3
    We can't use Jcc to leave a Win64 function in general, because that confuses the unwinder. However, for "leaf" functions, that is, functions where the return address is always on top of the stack and which don't have unwind info, it's OK.
    Differential Revision: https://reviews.llvm.org/D24836
    llvm-svn: 282920
* [AVX-512] Add the special stack spilling pseudos for XMM16-31 and YMM16-31 without VLX to the isFrameLoadOpcode and isFrameStoreOpcode. | Craig Topper | 2016-09-30 | 1 | -0/+8
    llvm-svn: 282842
* [AVX-512] Support spills of XMM16-31 and YMM16-31 when VLX isn't available. | Craig Topper | 2016-09-29 | 1 | -8/+111
    This adds new pseudo instructions that can be selected during register allocation to represent loads and stores of XMM/YMM registers when AVX512F is available, but VLX isn't. They will be converted to VEX encoded moves if the register turns out to be XMM0-15/YMM0-15. Otherwise either an EVEX VEXTRACT (store) or VBROADCAST (load) will be used.
    Fixes one of the cases from PR29112.
    llvm-svn: 282690
* [X86] Add EVEX encoded VBROADCASTSS/SD and VPBROADCASTD/Q to execution domain fixing table. | Craig Topper | 2016-09-29 | 1 | -0/+10
    llvm-svn: 282687
* [X86] Add VBROADCASTF128/VBROADCASTI128 to execution domain fixing tables. | Craig Topper | 2016-09-29 | 1 | -1/+2
    llvm-svn: 282684
* [X86] Use std::max to calculate alignment instead of assuming RC->getSize() will not return a value greater than 32. | Craig Topper | 2016-09-27 | 1 | -2/+2
    I think it theoretically could be 64 for AVX-512.
    llvm-svn: 282471
* [AVX-512] Replace get512BitSuperRegister with calls to TargetRegisterInfo::getMatchingSuperReg. | Craig Topper | 2016-09-25 | 1 | -4/+10
    llvm-svn: 282359
* [AVX-512] Add rounding versions of instructions to hasUndefRegUpdate. | Craig Topper | 2016-09-25 | 1 | -0/+13
    llvm-svn: 282357
* [AVX-512] Add the scalar unsigned integer to fp conversion instructions to hasUndefRegUpdate. | Craig Topper | 2016-09-25 | 1 | -0/+16
    llvm-svn: 282356
* [AVX-512] Remove duplicate instructions for converting integer to scalar floating point. | Craig Topper | 2016-09-25 | 1 | -8/+0
    We can use patterns to point to the other instructions instead.
    llvm-svn: 282355
* [AVX-512] Add support for commuting VPTERNLOG instructions. | Craig Topper | 2016-09-22 | 1 | -36/+139
    VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors, the bits from the three sources are concatenated and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.
    We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands' bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.
    This refactors and reuses parts of the FMA3 commuting code, which also handles a three-operand instruction.
    llvm-svn: 282132
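The immediate rewrite described in this message can be sketched as follows. This is an illustrative stand-alone version, not the code from the commit: the names `evalTernlog`, `swapBits`, `commuteTernlogImm`, and `Swap` are hypothetical, and the indexing convention (first source as the most significant selector bit) is an assumption stated in the comments.

```cpp
#include <cassert>
#include <cstdint>

// Result bit for one bit position, given the three source bits A, B, C.
// Assumed convention: the immediate is indexed by (A << 2) | (B << 1) | C.
inline unsigned evalTernlog(uint8_t Imm, unsigned A, unsigned B, unsigned C) {
  unsigned Index = (A << 2) | (B << 1) | C;
  return (Imm >> Index) & 1u;
}

// Exchange bits P and Q of an 8-bit immediate.
inline uint8_t swapBits(uint8_t Imm, unsigned P, unsigned Q) {
  assert(P < 8 && Q < 8 && "bit index out of range");
  unsigned BitP = (Imm >> P) & 1u;
  unsigned BitQ = (Imm >> Q) & 1u;
  if (BitP != BitQ)
    Imm = static_cast<uint8_t>(Imm ^ ((1u << P) | (1u << Q)));
  return Imm;
}

// Which two of the three sources are being exchanged.
enum class Swap { AB, BC, AC };

// New immediate after commuting two sources. Each case swaps the two pairs
// of indices where the commuted sources differ and the third one matches.
inline uint8_t commuteTernlogImm(uint8_t Imm, Swap Which) {
  switch (Which) {
  case Swap::AB: // 010 <-> 100, 011 <-> 101
    return swapBits(swapBits(Imm, 2, 4), 3, 5);
  case Swap::BC: // 001 <-> 010, 101 <-> 110
    return swapBits(swapBits(Imm, 1, 2), 5, 6);
  case Swap::AC: // 001 <-> 100, 011 <-> 110
    return swapBits(swapBits(Imm, 1, 4), 3, 6);
  }
  return Imm;
}
```

Under this convention, indices 0 (all sources 0) and 7 (all sources 1) appear in none of the swapped pairs, which matches the note that bits 0 and 7 never move.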
* [AVX-512] Teach X86InstrInfo::copyPhysReg to use a 512-bit move if XMM16-XMM31 or YMM16-YMM31 are the source or dest of the copy and VLX is not supported. | Craig Topper | 2016-09-20 | 1 | -5/+25
    This can happen with SUBREG_TO_REG of ZMM16-ZMM31. Fixes PR30430.
    llvm-svn: 281959
* Finish renaming remaining analyzeBranch functions | Matt Arsenault | 2016-09-14 | 1 | -2/+2
    llvm-svn: 281535
* Make analyzeBranch family of instruction names consistent | Matt Arsenault | 2016-09-14 | 1 | -2/+2
    analyzeBranch was renamed to use lowercase first; rename the related set to match.
    llvm-svn: 281506
* AArch64: Use TTI branch functions in branch relaxation | Matt Arsenault | 2016-09-14 | 1 | -2/+7
    The main change is to return the code size from InsertBranch/RemoveBranch.
    Patch mostly by Tim Northover
    llvm-svn: 281505
* [X86] Copy imp-uses when folding tailcall into conditional branch. | Ahmed Bougacha | 2016-09-12 | 1 | -1/+1
    r280832 added 32-bit support for emitting conditional tail-calls, but dropped imp-used parameter registers. This went unnoticed until r281113, which added 64-bit support, as this is only exposed with parameter passing via registers.
    Don't drop the imp-used parameters.
    llvm-svn: 281223
* CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI | Duncan P. N. Exon Smith | 2016-09-11 | 1 | -5/+4
    Now that MachineBasicBlock::reverse_instr_iterator knows when it's at the end (since r281168 and r281170), implement MachineBasicBlock::reverse_iterator directly on top of an ilist::reverse_iterator by adding an IsReverse template parameter to MachineInstrBundleIterator. This replaces another hard-to-reason-about use of std::reverse_iterator on list iterators, matching the changes for ilist::reverse_iterator from r280032 (see the "out of scope" section at the end of that commit message). MachineBasicBlock::reverse_iterator now has a handle to the current node and has obvious invalidation semantics.
    r280032 has a more detailed explanation of how list-style reverse iterators (invalidated when the pointed-at node is deleted) are different from vector-style reverse iterators like std::reverse_iterator (invalidated on every operation). A great motivating example is this commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.
    Note: If your out-of-tree backend deletes instructions while iterating on a MachineBasicBlock::reverse_iterator or converts between MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator, you'll need to update your code in similar ways to r280032. The following table might help:

      [Old]                           ==> [New]
      delete &*RI, RE = end()             delete &*RI++
      RI->erase(), RE = end()             RI++->erase()
      reverse_iterator(I)                 std::prev(I).getReverse()
      reverse_iterator(I)                 ++I.getReverse()
      --reverse_iterator(I)               I.getReverse()
      reverse_iterator(std::next(I))      I.getReverse()
      RI.base()                           std::prev(RI).getReverse()
      RI.base()                           ++RI.getReverse()
      --RI.base()                         RI.getReverse()
      std::next(RI).base()                RI.getReverse()

    (For more details, have a look at r280032.)
    llvm-svn: 281172
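The off-by-one behaviour of std::reverse_iterator that the [Old] column has to work around can be seen with a plain std::list. The sketch below uses only the standard library as a stand-in; it does not use LLVM's ilist or MachineBasicBlock, and the helper names `valueViaStdReverse` and `eraseViaReverse` are hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <list>

using IntList = std::list<int>;

// A std::reverse_iterator built from I refers to the element *before* I.
// Precondition (assumed by this sketch): I is not the begin() iterator.
inline int valueViaStdReverse(IntList::iterator I) {
  return *IntList::reverse_iterator(I);
}

// Erase the element RI refers to and return a reverse iterator to the next
// element in reverse order. RI.base() sits one past the referenced element,
// so the forward iterator has to be stepped back before erasing.
inline IntList::reverse_iterator eraseViaReverse(IntList &L,
                                                 IntList::reverse_iterator RI) {
  assert(RI != L.rend() && "cannot erase through the end iterator");
  IntList::iterator Next = L.erase(std::prev(RI.base()));
  return IntList::reverse_iterator(Next);
}
```

The adjustments via std::prev and base() are the kind of bookkeeping that a reverse iterator holding a handle to the current node makes unnecessary.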