path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit message | Author | Age | Files | Lines
...
* [X86][MMX] Add support for MMX zero vector creation | Simon Pilgrim | 2018-01-15 | 1 | -0/+6
| As mentioned on PR35869 (and came up recently on D41517), we don't create an MMX zero register via the PXOR but instead perform a spill to stack from an XMM zero register. This patch adds support for direct MMX zero vector creation and should make it easier to add better constant vector creation in the future as well.
| Differential Revision: https://reviews.llvm.org/D41908
| llvm-svn: 322525
* [X86][SSE] Add custom execution domain fixing for BLENDPD/BLENDPS/PBLENDD/PBLENDW (PR34873) | Simon Pilgrim | 2018-01-15 | 1 | -3/+188
| Add support for custom execution domain fixing and implement support for BLENDPD/BLENDPS/PBLENDD/PBLENDW.
| Differential Revision: https://reviews.llvm.org/D42042
| llvm-svn: 322524
* [MachineOutliner] AArch64: Handle instrs that use SP and will never need fixups | Jessica Paquette | 2018-01-09 | 1 | -2/+2
| This commit does two things. Firstly, it adds a collection of flags which can be passed along to the target to encode information about the MBB that an instruction lives in to the outliner.
| Second, it adds some of those flags to the AArch64 outliner in order to add more stack instructions to the list of legal instructions that are handled by the outliner. The two flags added check if
| - There are calls in the MachineBasicBlock containing the instruction
| - The link register is available in the entire block
| If the link register is available and there are no calls, then a stack instruction can always be outlined without fixups, regardless of what it is, since in this case, the outliner will never modify the stack to create a call or outlined frame.
| The motivation for doing this was checking which instructions are most often missed by the outliner. Instructions like, say
| %sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy
| are very common, but cannot be outlined in the case that the outliner might modify the stack. This commit allows us to outline instructions like this.
| llvm-svn: 322048
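The rule stated in the commit above (no calls in the block and the link register free throughout means no fixups are needed) can be sketched as two per-block flags and one predicate. This is only an illustration under assumptions: the names `MBBFlags`, `HasCalls`, `LRUnavailable`, and `canOutlineStackInstrWithoutFixups` are hypothetical and are not taken from the LLVM sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-block flags, assumed for illustration only.
enum MBBFlags : std::uint32_t {
  NoFlags = 0,
  HasCalls = 1u << 0,      // the block contains at least one call
  LRUnavailable = 1u << 1, // the link register is not free in the whole block
};

// A stack instruction needs no fixups when the outliner never has to touch
// the stack: there are no calls in the block and LR is available throughout.
constexpr bool canOutlineStackInstrWithoutFixups(std::uint32_t Flags) {
  return (Flags & (HasCalls | LRUnavailable)) == 0;
}

// Usage sketch: only a block with neither flag set qualifies.
inline void exampleOutlinerFlags() {
  assert(canOutlineStackInstrWithoutFixups(NoFlags));
  assert(!canOutlineStackInstrWithoutFixups(HasCalls));
  assert(!canOutlineStackInstrWithoutFixups(LRUnavailable));
}
```

Encoding the block facts as bit flags keeps the target hook cheap to query per instruction; whether the real interface uses this exact encoding is not stated in the log.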
* [X86] Add VSHUFF32X4 and similar instructions to load folding tables. | Craig Topper | 2018-01-07 | 1 | -0/+24
| llvm-svn: 321978
* [X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables. | Craig Topper | 2018-01-07 | 1 | -0/+3
| llvm-svn: 321958
* [X86] Correct the load folding flags for xmm fp->mmx conversion instructions. | Craig Topper | 2018-01-07 | 1 | -4/+4
| The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.
| llvm-svn: 321956
* [X86] Add TB_NO_REVERSE to some scalar intrinsic instructions in the load folding table. | Craig Topper | 2018-01-07 | 1 | -4/+4
| llvm-svn: 321955
* [X86] Don't put any EVEX_B instructions in the tablegen generated load folding tables. | Craig Topper | 2018-01-07 | 1 | -0/+1
| EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent.
| llvm-svn: 321954
* [X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables. | Craig Topper | 2018-01-07 | 1 | -0/+30
| llvm-svn: 321953
* [X86] Add some 8 and 16-bit instructions to the load folding tables. | Craig Topper | 2018-01-07 | 1 | -0/+6
| llvm-svn: 321952
* [X86] Add EVEX vcvtph2ps to the load folding tables. | Craig Topper | 2018-01-07 | 1 | -1/+4
| llvm-svn: 321951
* [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex versions of cvtps2ph to the store folding tables. | Craig Topper | 2018-01-07 | 1 | -2/+3
| The memory form of the xmm->xmm version only writes 64-bits. If we use it in the folding tables and it gets used for a stack spill, only half the slot will be written. Then a reload may read all 128-bits which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we would preserve the zeros.
| llvm-svn: 321950
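The hazard described above can be modelled with a 128-bit spill slot that holds stale data. The sketch below is not LLVM code: `Vec128`, `convertToRegister`, `convertToMemory`, `spill`, and `reload` are hypothetical helpers that only model which bits each form writes, as the commit message describes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec128 = std::array<std::uint64_t, 2>; // [0] = low half, [1] = high half

// Register form (modelled): the result sits in the low 64 bits and the
// upper 64 bits of the destination register are zero.
Vec128 convertToRegister(std::uint64_t Result) { return {Result, 0}; }

// Memory form (modelled): only the low 64 bits of the slot are written.
void convertToMemory(std::uint64_t Result, Vec128 &Slot) { Slot[0] = Result; }

// Full-width spill and reload of a 128-bit register.
void spill(const Vec128 &Reg, Vec128 &Slot) { Slot = Reg; }
Vec128 reload(const Vec128 &Slot) { return Slot; }

inline void exampleSpillHazard() {
  const Vec128 Stale = {0xAAAAAAAAAAAAAAAAull, 0xBBBBBBBBBBBBBBBBull};

  // Not folded: convert in a register, then spill all 128 bits.
  Vec128 SlotA = Stale;
  spill(convertToRegister(0x1234), SlotA);
  assert(reload(SlotA)[1] == 0); // upper half is the expected zero

  // Folded: the store form writes half the slot, the reload reads all of it.
  Vec128 SlotB = Stale;
  convertToMemory(0x1234, SlotB);
  assert(reload(SlotB)[1] == 0xBBBBBBBBBBBBBBBBull); // stale garbage leaks in
}
```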
* [X86] Add CMP8ri8 to load folding tables. | Craig Topper | 2018-01-07 | 1 | -0/+1
| llvm-svn: 321949
* Remove superfluous break after a return. NFCI. | Simon Pilgrim | 2017-12-17 | 1 | -1/+0
| llvm-svn: 320941
* MachineFunction: Return reference from getFunction(); NFC | Matthias Braun | 2017-12-15 | 1 | -14/+14
| The Function can never be nullptr so we can return a reference.
| llvm-svn: 320884
* [X86] Rename some instructions that start with Int_ to have the _Int at the end. | Craig Topper | 2017-12-10 | 1 | -64/+64
| This matches the AVX512 version, is more consistent overall, and improves our scheduler models.
| In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses.
| llvm-svn: 320325
* [X86] Fix a few instructions that were named Z512 instead of just Z. | Craig Topper | 2017-12-10 | 1 | -3/+3
| This makes things consistent with our normal instruction naming.
| llvm-svn: 320316
* [X86] Rename some instructions so that 'b' is added as a suffix instead of replacing an 'r' | Craig Topper | 2017-12-10 | 1 | -6/+6
| llvm-svn: 320290
* [X86] Rename the rb form of scalar ADD/SUB/MUL/DIV to include _Int since they can only be selected by intrinsics. | Craig Topper | 2017-12-10 | 1 | -6/+6
| llvm-svn: 320283
* [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. | Francis Visoiu Mistrih | 2017-12-07 | 1 | -5/+5
| Work towards the unification of MIR and debug output by refactoring the interfaces.
| For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print.
| Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr).
| https://reviews.llvm.org/D40836
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
| * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'
| llvm-svn: 320022
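The sed rewrites in the commit above move operand flags from a `<flag>` suffix to a keyword prefix. As a small sketch of what three of those substitutions do to a single line of text, the following uses `std::regex_replace` from the C++ standard library. The function name `modernizeFlags` is hypothetical, and only the `<kill>`, `<dead>`, and `<imp-def>` rules are reproduced; the other rules and their ordering are not modelled.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Applies a subset of the flag rewrites listed in the commit message.
// Each pattern mirrors one sed expression: capture the operand token that
// precedes the flag and re-emit it after the new keyword.
std::string modernizeFlags(std::string Text) {
  static const std::regex Kill("([^ ]+)<kill>");
  static const std::regex Dead("([^ ]+)<dead>");
  static const std::regex ImpDef("([^ ]+)<imp-def>");
  Text = std::regex_replace(Text, Kill, "killed $1");
  Text = std::regex_replace(Text, Dead, "dead $1");
  Text = std::regex_replace(Text, ImpDef, "implicit-def $1");
  return Text;
}

// Usage sketch on a made-up operand list.
inline void exampleModernizeFlags() {
  assert(modernizeFlags("%eax<kill>, %ebx<dead>") == "killed %eax, dead %ebx");
}
```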
* Re-commit r319490 "XOR the frame pointer with the stack cookie when protecting the stack" | Hans Wennborg | 2017-12-05 | 1 | -0/+15
| The patch originally broke Chromium (crbug.com/791714) due to its failing to specify that the new pseudo instructions clobber EFLAGS. This commit fixes that.
| > Summary: This strengthens the guard and matches MSVC.
| >
| > Reviewers: hans, etienneb
| >
| > Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
| >
| > Differential Revision: https://reviews.llvm.org/D40622
| llvm-svn: 319824
* Revert r319490 "XOR the frame pointer with the stack cookie when protecting the stack" | Hans Wennborg | 2017-12-04 | 1 | -15/+0
| This broke the Chromium build (crbug.com/791714). Reverting while investigating.
| > Summary: This strengthens the guard and matches MSVC.
| >
| > Reviewers: hans, etienneb
| >
| > Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
| >
| > Differential Revision: https://reviews.llvm.org/D40622
| >
| > git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319490 91177308-0d34-0410-b5e6-96231b3b80d8
| llvm-svn: 319706
* Mark all library options as hidden. | Zachary Turner | 2017-12-01 | 1 | -2/+3
| These command line options are not intended for public use, and often don't even make sense in the context of a particular tool anyway. About 90% of them are already hidden, but when people add new options they forget to hide them, so if you were to make a brand new tool today, link against one of LLVM's libraries, and run tool -help you would get a bunch of junk that doesn't make sense for the tool you're writing.
| This patch hides these options. The real solution is to not have libraries defining command line options, but that's a much larger effort and not something I'm prepared to take on.
| Differential Revision: https://reviews.llvm.org/D40674
| llvm-svn: 319505
* XOR the frame pointer with the stack cookie when protecting the stack | Reid Kleckner | 2017-11-30 | 1 | -0/+15
| Summary: This strengthens the guard and matches MSVC.
| Reviewers: hans, etienneb
| Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits
| Differential Revision: https://reviews.llvm.org/D40622
| llvm-svn: 319490
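The idea named in the commit title can be sketched with plain integers: the value stored in the frame is the cookie XORed with that frame's pointer, and the check undoes the XOR before comparing. This is a conceptual sketch only. `protectCookie` and `cookieIntact` are hypothetical names, and the actual instruction sequence the patch emits is not shown in this log.

```cpp
#include <cassert>
#include <cstdint>

// Value stored in the frame: the global cookie mixed with the frame pointer.
constexpr std::uint64_t protectCookie(std::uint64_t Cookie,
                                      std::uint64_t FramePtr) {
  return Cookie ^ FramePtr;
}

// Epilogue check: undo the XOR with the same frame pointer and compare
// against the global cookie.
constexpr bool cookieIntact(std::uint64_t Stored, std::uint64_t FramePtr,
                            std::uint64_t Cookie) {
  return (Stored ^ FramePtr) == Cookie;
}

// Usage sketch with made-up values: a stored value is only valid for the
// frame it was created in, so copying it into another frame is detected.
inline void exampleStackCookie() {
  const std::uint64_t Cookie = 0x0123456789ABCDEFull;
  const std::uint64_t FrameA = 0x7FFF0000ull, FrameB = 0x7FFF0040ull;
  const std::uint64_t Stored = protectCookie(Cookie, FrameA);
  assert(cookieIntact(Stored, FrameA, Cookie));
  assert(!cookieIntact(Stored, FrameB, Cookie));
}
```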
* [CodeGen] Print register names in lowercase in both MIR and debug output | Francis Visoiu Mistrih | 2017-11-28 | 1 | -1/+1
| As part of the unification of the debug format and the MIR format, always print registers as lowercase.
| * Only debug printing is affected. It now follows MIR.
| Differential Revision: https://reviews.llvm.org/D40417
| llvm-svn: 319187
* [X86] Don't invert NewCC variable while processing the jcc/setcc/cmovcc instructions in optimizeCompareInstr. | Craig Topper | 2017-11-23 | 1 | -7/+9
| The NewCC variable is calculated outside of the loop that processes jcc/setcc/cmovcc instructions. If we invert it during the loop it can cause an incorrect value to be used by a later iteration. Instead only read it during the loop and use a new variable to store the possibly inverted value.
| Fixes PR35399.
| llvm-svn: 318934
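The bug pattern described above, mutating a loop-invariant value inside the loop, can be shown in a few lines. This is a simplified sketch and not the code from optimizeCompareInstr: `CondCode`, `invert`, `rewriteBuggy`, `rewriteFixed`, the local `ReplacementCC`, and the `NeedsInvert` input are all hypothetical stand-ins for the per-instruction decisions made in the real loop.

```cpp
#include <cassert>
#include <vector>

enum class CondCode { EQ, NE };

CondCode invert(CondCode CC) {
  return CC == CondCode::EQ ? CondCode::NE : CondCode::EQ;
}

// Buggy shape: NewCC is computed once outside the loop but overwritten
// inside it, so an inversion leaks into every later iteration.
std::vector<CondCode> rewriteBuggy(CondCode NewCC,
                                   const std::vector<bool> &NeedsInvert) {
  std::vector<CondCode> Out;
  for (bool Inv : NeedsInvert) {
    if (Inv)
      NewCC = invert(NewCC);
    Out.push_back(NewCC);
  }
  return Out;
}

// Fixed shape: NewCC is only read; the possibly inverted value lives in a
// fresh per-iteration variable.
std::vector<CondCode> rewriteFixed(CondCode NewCC,
                                   const std::vector<bool> &NeedsInvert) {
  std::vector<CondCode> Out;
  for (bool Inv : NeedsInvert) {
    const CondCode ReplacementCC = Inv ? invert(NewCC) : NewCC;
    Out.push_back(ReplacementCC);
  }
  return Out;
}
```

With `NewCC = EQ` and two users where only the first needs inversion, the fixed version yields NE then EQ, while the buggy version yields NE for both.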
* [x86][icelake]vpclmulqdq introduction | Coby Tayree | 2017-11-21 | 1 | -1/+5
| an icelake promotion of pclmulqdq
| Differential Revision: https://reviews.llvm.org/D40101
| llvm-svn: 318741
* [X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int. | Craig Topper | 2017-11-11 | 1 | -2/+6
| This is consistent with our normal implementation of scalar instructions.
| While there, disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size, which is also our standard practice.
| llvm-svn: 317977
* [X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate. | Craig Topper | 2017-11-01 | 1 | -0/+7
| This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.
| We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.
| Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.
| Differential Revision: https://reviews.llvm.org/D39402
| llvm-svn: 317112
* [X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. | Craig Topper | 2017-10-31 | 1 | -13/+2
| Use 128-bit VLX instruction when VLX is enabled. Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.
| Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?
| llvm-svn: 316978
* [X86] Rearrange code in X86InstrInfo.cpp to put all the foldMemoryOperandImpl methods together without partial/undef register handling in the middle. NFC | Craig Topper | 2017-10-30 | 1 | -270/+270
| I have a future patch that wants to make use of one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that.
| llvm-svn: 316883
* Strip trailing whitespace. NFCI. | Simon Pilgrim | 2017-10-21 | 1 | -4/+4
| llvm-svn: 316277
* [X86][SSE] Add extractps/pextrd equivalence to domain tables | Simon Pilgrim | 2017-10-21 | 1 | -0/+6
| Differential Revision: https://reviews.llvm.org/D39135
| llvm-svn: 316274
* [X86] Add AVX512 versions of VCVTPD2PS to load folding tables. | Craig Topper | 2017-10-14 | 1 | -0/+3
| llvm-svn: 315801
* [X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables. | Craig Topper | 2017-10-14 | 1 | -0/+6
| llvm-svn: 315796
* [X86] Remove TB_NO_REVERSE from VCVTDQ2PDYrr and VCVTPS2PDYrr in the load folding tables. | Craig Topper | 2017-10-14 | 1 | -2/+2
| I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true.
| llvm-svn: 315795
* [X86] Add missing entries in 'MemoryFoldTable2Addr' to get complete form of the table. | Ayman Musa | 2017-10-08 | 1 | -0/+50
| Get the folding table 'MemoryFoldTable2Addr' to a complete state as part of the process explained in https://reviews.llvm.org/D38028
| Differential Revision: https://reviews.llvm.org/D38500
| llvm-svn: 315174
* [MachineOutliner] Disable outlining from LinkOnceODRs by default | Jessica Paquette | 2017-10-07 | 1 | -2/+16
| Say you have two identical linkonceodr functions, one in M1 and one in M2. Say that the outliner outlines A,B,C from one function, and D,E,F from another function (where letters are instructions). Now those functions are not identical, and cannot be deduped. Locally to M1 and M2, these outlining choices would be good-- to the whole program, however, this might not be true!
| To mitigate this, this commit makes it so that the outliner sees linkonceodr functions as unsafe to outline from. It also adds a flag, -enable-linkonceodr-outlining, which allows the user to specify that they want to outline from such functions when they know what they're doing.
| Changing this handles most code size regressions in the test suite caused by competing with linker dedupe. It also doesn't have a huge impact on the code size improvements from the outliner. There are 6 tests that regress > 5% from outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most tests either improve or are not impacted.
| Not outlined vs outlined without linkonceodrs: https://hastebin.com/raw/qeguxavuda
| Not outlined vs outlined with linkonceodrs: https://hastebin.com/raw/edepoqoqic
| Outlined with linkonceodrs vs outlined without linkonceodrs: https://hastebin.com/raw/awiqifiheb
| Numbers generated using compare.py with -m size.__text. Tests run for AArch64 with -Oz -mllvm -enable-machine-outliner -mno-red-zone.
| llvm-svn: 315136
* [X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 | Craig Topper | 2017-10-04 | 1 | -10/+0
| This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.
| This should fix PR33079
| Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.
| Differential Revision: https://reviews.llvm.org/D38449
| llvm-svn: 314914
* [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem | Craig Topper | 2017-10-01 | 1 | -4/+4
| Summary: Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
| For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
| I believe this supersedes D38025 which was trying to switch the register&register form back to pre-PR22995.
| Reviewers: aymanmus, RKSimon, zvi
| Reviewed By: aymanmus
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D38120
| llvm-svn: 314639
* [MachineOutliner] AArch64: Avoid saving + restoring LR if possible | Jessica Paquette | 2017-09-27 | 1 | -30/+54
| This commit allows the outliner to avoid saving and restoring the link register on AArch64 when it is dead within an entire class of candidates.
| This introduces changes to the way the outliner interfaces with the target. For example, the target now interfaces with the outliner using a MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and getOutliningFrameOverhead.
| This also improves several comments on the outliner's cost model.
| https://reviews.llvm.org/D36721
| llvm-svn: 314341
* Revert r314249 "Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST.""" | Craig Topper | 2017-09-27 | 1 | -21/+0
| This caused PR34751
| llvm-svn: 314339
* Revert r314248 "[X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg." | Craig Topper | 2017-09-27 | 1 | -5/+7
| This contributed to PR34751
| llvm-svn: 314338
* Recommit r314151 "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST."" | Craig Topper | 2017-09-26 | 1 | -0/+21
| The late MOV8rr_NOREX that caused the crash has been removed.
| llvm-svn: 314249
* [X86] Don't emit X86::MOV8rr_NOREX from X86InstrInfo::copyPhysReg. | Craig Topper | 2017-09-26 | 1 | -7/+5
| This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.
| llvm-svn: 314248
* Revert "[X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST." | Benjamin Kramer | 2017-09-26 | 1 | -21/+0
| Makes llc crash. This reverts commit r314151.
| llvm-svn: 314199
* [X86] Make all the NOREX CodeGenOnly instructions into postRA pseudos like the NOREX version of TEST. | Craig Topper | 2017-09-25 | 1 | -0/+21
| llvm-svn: 314151
* [X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands. | Craig Topper | 2017-09-24 | 1 | -0/+53
| llvm-svn: 314080
* [X86] Make sure we still mark the full register as implicitly defined when we shrink 256/512 bit zeroing xors to 128-bit. | Craig Topper | 2017-09-24 | 1 | -4/+10
| Not sure if anything really cares, but this seems like the right thing to do.
| llvm-svn: 314071
* [X86] Add VPERMPD/VPERMQ and VPERMPS/VPERMD to the execution domain fixing table. | Craig Topper | 2017-09-19 | 1 | -0/+16
| llvm-svn: 313610