summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86InstrInfo.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Simplify handling of hidden stub.Rafael Espindola2016-05-171-1/+0
| | | | | | | | | Since r207518 they are printed exactly like non-hidden stubs on x86 and since r207517 on ARM. This means we can use a single set for all stubs in those platforms. llvm-svn: 269776
* Fix for PR27750. Correctly handle the case where the fallthrough block andDavid L Kreitzer2016-05-171-5/+9
| | | | | | | | target block are the same in getFallThroughMBB. Differential Revision: http://reviews.llvm.org/D20288 llvm-svn: 269760
* [X86] Properly check that EAX is dead when copying EFLAGS.Quentin Colombet2016-05-101-4/+9
| | | | | | | | | | | | This fixes a bug introduced in r267623, where we got smarter and avoided to save EAX before using it. However, we failed to check if any of the subregister of EAX were alive and thus, missed cases where we have to save EAX before using it. The problem may happen on every X86/i386/... platform. This fixes llvm.org/PR27624 llvm-svn: 269115
* [foldMemoryOperand()] Pass LiveIntervals to enable liveness check.Jonas Paulsson2016-05-101-3/+5
| | | | | | | | | | | | | | | SystemZ (and probably other targets as well) can fold a memory operand by changing the opcode into a new instruction that as a side-effect also clobbers the CC-reg. In order to do this, liveness of that reg must first be checked. When LIS is passed, getRegUnit() can be called on it and the right LiveRange is computed on demand. Reviewed by Matthias Braun. http://reviews.llvm.org/D19861 llvm-svn: 269026
* [X86][AVX512] Strengthen the assertions from r269001. We need VLX to use the ↵Craig Topper2016-05-101-2/+3
| | | | | | 128/256-bit move opcodes for extended registers. llvm-svn: 269019
* [X86][AVX512] Use the proper load/store for AVX512 registers.Quentin Colombet2016-05-101-8/+20
| | | | | | | | | | | | | | When loading or storing AVX512 registers we were not using the AVX512 variant of the load and store for VR128 and VR256 like registers. Thus, we ended up with the wrong encoding and actually were dropping the high bits of the instruction. The result was that we load or store the wrong register. The effect is visible only when we emit the object file directly and disassemble it. Then, the output of the disassembler does not match the assembly input. This is related to llvm.org/PR27481. llvm-svn: 269001
* [AVX512] Add VLX 128/256-bit SET0 operations that encode to 128/256-bit EVEX ↵Craig Topper2016-05-081-0/+4
| | | | | | encoded VPXORD so all 32 registers can be used. llvm-svn: 268884
* livePhysRegs: Pass MBB by reference in addLive{Ins|Outs}(); NFCMatthias Braun2016-05-031-1/+1
| | | | | | | The block must no be nullptr for the addLiveIns()/addLiveOuts() function. llvm-svn: 268340
* LivePhysRegs: Automatically determine presence of pristine regs.Matthias Braun2016-05-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | Remove the AddPristinesAndCSRs parameters from addLiveIns()/addLiveOuts(). We need to respect pristine registers after prologue epilogue insertion, Seeing that we got this wrong in at least two commits already, we should rather pay the small price to query MachineFrameInfo for it. There are three cases that did not set AddPristineAndCSRs to true even after register allocation: - ExecutionDepsFix: live-out registers are used as a hint that the register is used soon. This is not true for pristine registers so use the new addLiveOutsNoPristines() to maintain this behaviour. - SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like a bug, should do the right thing automatically now. - StackMapLivenessAnalysis: Not adding pristine registers looks like a bug to me. Added a FIXME comment but maintain the current behaviour as a change may need to get coordinated with GC runtimes. llvm-svn: 268336
* Enable the X86 call frame optimization for the 64-bit targets that allow it.David L Kreitzer2016-05-021-0/+6
| | | | | | | | Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227
* Change AVX512 braodcastsd/ss patterns interaction with spilling . New ↵Igor Breger2016-05-011-41/+45
| | | | | | | | implementation take a scalar register and generate a vector without COPY_TO_REGCLASS (turn it into a VR128 register ) .The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190
* [X86] Reduce memory usage of MemOp2RegOp and RegOp2MemOp folding maps.Craig Topper2016-04-301-10/+6
| | | | llvm-svn: 268164
* [X86] Remove unused operand from a function and all its callers. NFCCraig Topper2016-04-281-1/+1
| | | | llvm-svn: 267854
* [X86] Teach the expansion of copy instructions how to do proper liveness.Quentin Colombet2016-04-261-15/+22
| | | | | | | When the simple analysis provided by MachineBasicBlock::computeRegisterLiveness fails, fall back on the LivePhysReg utility. llvm-svn: 267623
* Optimization bisect support in X86-specific passesAndrew Kaylor2016-04-261-1/+4
| | | | | | Differential Revision: http://reviews.llvm.org/D19439 llvm-svn: 267608
* [NFC] Header cleanupMehdi Amini2016-04-181-1/+0
| | | | | | | | | | | | | | Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595
* Silencing warnings from MSVC 2015 Update 2. All of these changes silence ↵Aaron Ballman2016-03-301-4/+4
| | | | | | "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929
* X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)Hans Wennborg2016-03-251-0/+58
| | | | | | | | | This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375
* [X86][XOP] Fixed instruction postfixes to more closely match operandsSimon Pilgrim2016-03-241-6/+6
| | | | | | Suggested by Sanjay in D18189 as the multiple folding options in XOP instructions can be tricky llvm-svn: 264305
* Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.Cong Hou2016-03-231-26/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne.BB1 jp.BB1 jmp.BB2 .BB1: ... .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne.BB1 jnp.BB2 .BB1: ... .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 264199
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.Chad Rosier2016-03-091-1/+1
| | | | | | http://reviews.llvm.org/D17967 llvm-svn: 263021
* [X86] Use MCPhysReg and uint16_t for static arrays of registers and opcodes ↵Craig Topper2016-03-021-4/+4
| | | | | | respectively should reduce size tiny bit. NFC llvm-svn: 262458
* CodeGen: TII: Take MachineInstr& in predicate API, NFCDuncan P. N. Exon Smith2016-02-231-5/+5
| | | | | | | | | | | | | Change TargetInstrInfo API to take `MachineInstr&` instead of `MachineInstr*` in the functions related to predicated instructions (I'll try to come back later and get some of the rest). All of these functions require non-null parameters already, so references are more clear. As a bonus, this happens to factor away a host of implicit iterator => pointer conversions. No functionality change intended. llvm-svn: 261605
* [X86] Remove the now-unused X86ISD::PSIGN. NFC.Ahmed Bougacha2016-02-161-9/+9
| | | | llvm-svn: 261025
* AVX512: Change store size of kmask. Store size of v8i1, v4i1 , v2i1 and i1 ↵Igor Breger2016-02-151-8/+23
| | | | | | | | | | are changed to 16 bits. If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878
* [X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding supportSimon Pilgrim2016-02-081-0/+23
| | | | | | | | | | As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents. This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines. Differential Revision: http://reviews.llvm.org/D16956 llvm-svn: 260168
* [X86] Fix a bug in getMemOpBaseRegImmOfsSanjoy Das2016-02-021-1/+5
| | | | | | | | | Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of `MemOp` is a frame index memory operand. The fix is to have `getMemOpBaseRegImmOfs` bail out in such cases. We can possibly be more clever here, if needed. llvm-svn: 259456
* Revert "Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed."Benjamin Kramer2016-01-271-75/+19
| | | | | | | | | and "Add a missing test case for r258847." This reverts commit r258847, r258848. Causes miscompilations and backend errors. llvm-svn: 258927
* Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.Cong Hou2016-01-261-19/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne.BB1 jp.BB1 jmp.BB2 .BB1: ... .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne.BB1 jnp.BB2 .BB1: ... .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 258847
* [X86][AVX] Add commutation support for VPERM2X128 instructions Simon Pilgrim2016-01-251-0/+14
| | | | | | | | Its main use is to allow memory folding of the 1st operand Differential Revision: http://reviews.llvm.org/D16521 llvm-svn: 258726
* [X86] Make MOV32ri64 a post-RA pseudo instead of a CodeGenOnly instruction. ↵Craig Topper2016-01-051-1/+4
| | | | | | It was only needed for rematerialization. llvm-svn: 256818
* Revert "[X86] Use push-pop for materializing small constants under 'minsize'"David Majnemer2016-01-051-48/+0
| | | | | | | | | | | | The red zone consists of 128 bytes beyond the stack pointer so that the allocation of objects in leaf functions doesn't require decrementing rsp. In r255656, we introduced an optimization that would cheaply materialize certain constants via push/pop. Push decrements the stack pointer and stores it's result at what is now the top of the stack. However, this means that using push/pop would encroach on the red zone. PR26023 gives an example where this corrupts an object in the red zone. llvm-svn: 256808
* MachineInstrBundle: Fix reversed isSuperRegisterEq() callMatthias Braun2016-01-051-0/+4
| | | | | | | | | | | | | | | Unfortunately this fix had the effect of exposing the -verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for which I disabled it for now. Two testcases also have additional pushq/popq where the corrected code cannot prove that %rax is dead any longer. Looking at the examples, this could potentially be fixed by improving computeRegisterLiveness() to check the live-in lists of the successors blocks when reaching the end of a block. This fixes http://llvm.org/PR25951. llvm-svn: 256799
* [X86] Make hasFP constant timeDavid Majnemer2016-01-041-1/+2
| | | | | | | | | | We need a frame pointer if there is a push/pop sequence after the prologue in order to unwind the stack. Scanning the instructions to figure out if this happened made hasFP not constant-time which is a violation of expectations. Let's compute this up-front and reuse that computation when we need it. llvm-svn: 256730
* use range-based for-loops; NFCISanjay Patel2015-12-291-18/+15
| | | | llvm-svn: 256573
* tidy up; NFCSanjay Patel2015-12-281-6/+7
| | | | llvm-svn: 256506
* [X86, Win64] Use a frame pointer if pushf is emittedDavid Majnemer2015-12-271-2/+2
| | | | | | | | | | | | | | | A frame pointer must be used if stack pointer is modified after the prologue. LLVM will emit pushf/popf if we need to save/restore the FLAGS register, requiring us to have a frame pointer for the function. There is a small twist: this sequence might exist in user code via inline-assembly. For now, conservatively assume that such functions require a frame pointer. For real world justification, please see clang's implementation of __readeflags. This fixes PR25945. llvm-svn: 256456
* [X86] Replace MVT::SimpleValueType in the AsmParser library and ↵Craig Topper2015-12-251-6/+6
| | | | | | | | getX86SubSuperRegister with just an unsigned representing size. This a is step towards fixing a layering violation so the X86 AsmParser won't depending on CodeGen types. llvm-svn: 256425
* AVX-512: Kreg set 0/1 optimizationElena Demikhovsky2015-12-241-6/+28
| | | | | | | | | | | | | | | | | The patterns that set a mask register to 0/1 KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365
* [X86] Use range-based for loop. NFCCraig Topper2015-12-201-3/+2
| | | | llvm-svn: 256127
* [X86] Use push-pop for materializing small constants under 'minsize'Hans Wennborg2015-12-171-0/+48
| | | | | | | | | | | | | Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936
* Fix "Not having LAHF/SAHF" assert.Hans Wennborg2015-12-151-1/+2
| | | | | | It wants to assert that the subtarget is 64-bit, not the register. llvm-svn: 255703
* [X86] Smaller code for materializing 32-bit 1 and -1 constantsHans Wennborg2015-12-151-5/+43
| | | | | | | | | "movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes. This commit makes LLVM use the latter when optimizing for size. Differential Revision: http://reviews.llvm.org/D14971 llvm-svn: 255656
* CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()Matthias Braun2015-12-111-15/+10
| | | | | | | | | | | | | | | | | | | | computeRegisterLiveness() was broken in that it reported dead for a register even if a subregister was alive. I assume this was because the results of analayzePhysRegs() are hard to understand with respect to subregisters. This commit: Changes the results of analyzePhysRegs (=struct PhysRegInfo) to be clearly understandable, also renames the fields to avoid silent breakage of third-party code (and improve the grammar). Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling it and removing workarounds for the bug. This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033 Differential Revision: http://reviews.llvm.org/D15320 llvm-svn: 255362
* [X86][ADX] Added memory folding patterns and stack folding testsSimon Pilgrim2015-12-051-0/+6
| | | | llvm-svn: 254844
* X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supportedHans Wennborg2015-12-041-4/+25
| | | | | | | | | | | | | | | These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 llvm-svn: 254793
* X86InstrInfo::copyPhysReg: workaround reg livenessJF Bastien2015-12-041-3/+13
| | | | | | | | | | | | | | | | | | | | | | Summary: computeRegisterLiveness and analyzePhysReg are currently getting confused about liveness in some cases, breaking copyPhysReg's calculation of whether AX is dead in some cases. Work around this issue temporarily by assuming that AX is always live. See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7 And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201. This workaround makes the code correct but slightly inefficient, but it seems to confuse the machine instr verifier which now things EAX was undefined in some cases where it's being conservatively saved / restored. Reviewers: majnemer, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15198 llvm-svn: 254680
* [X86] Use range-based for loops. NFCCraig Topper2015-12-011-6/+6
| | | | llvm-svn: 254387
* [X86] Use array_lengthof instead of calculating manually. Also change index ↵Craig Topper2015-12-011-7/+7
| | | | | | types to size_t to match. llvm-svn: 254386
* Revert r254279 "[X86] Use ArrayRef. NFC". It seems to have upset an MSVC ↵Craig Topper2015-11-301-4/+7
| | | | | | build bot. llvm-svn: 254280
OpenPOWER on IntegriCloud