path: root/llvm/lib/Target/X86/X86InstrInfo.td
Commit message | Author | Age | Files | Lines
...
* [X86][LWP] Add llvm support for LWP instructions.Simon Pilgrim2017-05-031-0/+59
| | | | | | | | This patch adds support for the LightWeight Profiling (LWP) instructions, which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028
* [globalisel][tablegen] Compute available feature bits correctly.Daniel Sanders2017-04-291-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750
* Rename FastString flag.Clement Courbet2017-04-211-1/+1
| | | | llvm-svn: 300959
* X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copiesClement Courbet2017-04-211-0/+1
| | | | | | | | | | | | when the subtarget has fast strings. This has two advantages: - Speed is improved. For example, on Haswell throughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes). - Code is much smaller (no need to handle boundaries). llvm-svn: 300957
* [X86] Added missing mayLoad/mayStore attributes to some X86 instructions.Ayman Musa2017-04-131-4/+8
| | | | | | | | | This missing information was encountered during the effort to automatically generate the X86 memory folding tables. This is preparation work for a future patch that automates these tables. Differential Revision: https://reviews.llvm.org/D31714 llvm-svn: 300190
* [X86] Remove unused predicate. NFCCraig Topper2017-03-171-1/+0
| | | | llvm-svn: 298050
* [X86] Use SHLD with both inputs from the same register to implement rotate ↵Craig Topper2017-02-211-0/+1
| | | | | | | | | | | | | | | | | | | on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697
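The equivalence this change relies on can be checked with a small model: a double-precision left shift whose two inputs are the same value behaves like a left rotate. The sketch below is only an illustration in Python; the helper names `shld32` and `rotl32` are hypothetical and are not part of LLVM.

```python
MASK32 = 0xFFFFFFFF

def shld32(dst, src, count):
    """Model of a 32-bit SHLD: shift dst left, filling low bits from src's high bits."""
    dst &= MASK32
    src &= MASK32
    count &= 31  # the count is masked to 5 bits for 32-bit operands
    if count == 0:
        return dst
    return ((dst << count) | (src >> (32 - count))) & MASK32

def rotl32(x, count):
    """Reference 32-bit rotate left."""
    x &= MASK32
    count &= 31
    if count == 0:
        return x
    return ((x << count) | (x >> (32 - count))) & MASK32

# SHLD with both inputs equal matches the rotate for every count.
for n in range(32):
    assert shld32(0x80000001, 0x80000001, n) == rotl32(0x80000001, n)
```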
* X86: Introduce relocImm-based patterns for cmp.Peter Collingbourne2017-02-091-0/+16
| | | | | | Differential Revision: https://reviews.llvm.org/D28690 llvm-svn: 294636
* [X86] Remove the HLE feature flag.Craig Topper2017-02-091-2/+0
| | | | | | We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562
* [X86] Clzero intrinsic and its addition under znver1Craig Topper2017-02-091-2/+14
| | | | | | | | | | | | | | | | | This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558
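Step 2 above amounts to a single bit test on the EBX value returned by CPUID function 8000_0008. A minimal sketch, assuming the caller has already obtained that EBX value by some other means (the function name `has_clzero` is hypothetical):

```python
CLZERO_EBX_BIT = 0  # EBX[0] of CPUID function 8000_0008, per the commit message

def has_clzero(ebx_8000_0008):
    """Return True when the CLZERO feature bit is set in the given EBX value."""
    return (ebx_8000_0008 >> CLZERO_EBX_BIT) & 1 == 1
```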
* [X86] Add test for clflushopt intrinsic and only enable it to be selected if ↵Craig Topper2017-02-081-0/+2
| | | | | | the feature flag is set. llvm-svn: 294407
* [X86] Remove PCOMMIT instruction support since Intel has deprecated this ↵Craig Topper2017-02-081-1/+0
| | | | | | | | instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405
* [X86]Enable the use of 'mov' with a 64bit GPR and a large immediateCoby Tayree2017-01-251-1/+1
| | | | | | | | | Enable the following form (Intel style): "mov <reg64>, <largeImm>", which should be available, where <largeImm> stands for immediates that exceed the range of a signed 32-bit integer. Differential Revision: https://reviews.llvm.org/D28988 llvm-svn: 293030
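The range condition described here can be written out explicitly. This is a sketch of the arithmetic only, not of the assembler's matching logic; `fits_in_signed_32` and `is_large_imm` are hypothetical names.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1

def fits_in_signed_32(value):
    """True when value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX

def is_large_imm(value):
    """True for immediates outside the signed 32-bit range (the <largeImm> case)."""
    return not fits_in_signed_32(value)
```

For example, `0x7FFFFFFF` still fits, while `0x80000000` is already a large immediate under this definition.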
* [X86] Fix for bugzilla 31576 - add support for "data32" instruction prefixMarina Yatsina2017-01-181-1/+6
| | | | | | | | | | | This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576). "data32" instruction prefix was not defined in the llvm. An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes). Differential Revision: https://reviews.llvm.org/D28468 llvm-svn: 292352
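The mode dependence mentioned above is the reason for the exception: the same 0x66 byte selects the operand size that is not the default for the current mode. A small sketch of that mapping, with the hypothetical helper `operand_size_prefix_mnemonic`:

```python
OPERAND_SIZE_PREFIX = 0x66  # shared encoding of "data16" and "data32"

def operand_size_prefix_mnemonic(mode_bits):
    """Return the mnemonic that the 0x66 prefix corresponds to in a given CPU mode."""
    if mode_bits == 16:
        # Default operand size is 16 bits, so the prefix selects 32-bit operands.
        return "data32"
    if mode_bits in (32, 64):
        # Default operand size is 32 bits, so the prefix selects 16-bit operands.
        return "data16"
    raise ValueError(f"unsupported mode: {mode_bits}")
```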
* [AVX-512] Correct memory operand size for VPGATHERQPS and VPGATHERQDCraig Topper2017-01-161-0/+2
| | | | | | | | | with ZMM index. Similar for SCATTER and the prefetch gather and scatter instructions. Fixes PR31618. llvm-svn: 292088
* [AVX-512] Fix register class in one of the gather/scatter memory operands so ↵Craig Topper2017-01-161-1/+1
| | | | | | that all 32 bit registers can be allowed. llvm-svn: 292087
* IR, X86: Understand !absolute_symbol metadata on global variables.Peter Collingbourne2016-12-081-2/+2
| | | | | | | | | | | | | | | | | Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087
* Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which ↵Peter Collingbourne2016-11-091-5/+9
| | | | | | | | | represents a relocatable immediate.", with a fix for 32-bit x86. Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand. llvm-svn: 286420
* Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which ↵Peter Collingbourne2016-11-091-9/+5
| | | | | | | | | represents a relocatable immediate." Suspected to be the cause of a sanitizer-windows bot failure: Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420 llvm-svn: 286385
* X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable ↵Peter Collingbourne2016-11-091-5/+9
| | | | | | | | | | | | | | | immediate. A relocatable immediate is either an immediate operand or an operand that can be relocated by the linker to an immediate, such as a regular symbol in non-PIC code. Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands of type "imm32_su". Remove a number of now-redundant patterns. Differential Revision: https://reviews.llvm.org/D25812 llvm-svn: 286384
* [X86] Take advantage of the lzcnt instruction on btver2 architectures when ↵Pierre Gousseau2016-10-141-0/+1
| | | | | | | | | | | | | | | | ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)) This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248
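The transformation works because a 32-bit leading-zero count is 32 (bit 5 set) only for a zero input and at most 31 (bit 5 clear) otherwise, so ORing the counts and shifting right by log2(32) = 5 yields 1 exactly when either input is zero. A sketch for 32-bit values, with hypothetical helper names:

```python
def ctlz32(x):
    """Count leading zeros of a 32-bit value; returns 32 for zero, as lzcnt does."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()

def either_is_zero_setcc(x, y):
    """Original form: zext(or(setcc(eq, x, 0), setcc(eq, y, 0)))."""
    return int((x & 0xFFFFFFFF) == 0 or (y & 0xFFFFFFFF) == 0)

def either_is_zero_lzcnt(x, y):
    """Transformed form: srl(or(ctlz(x), ctlz(y)), log2(32))."""
    return (ctlz32(x) | ctlz32(y)) >> 5
```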
* [x86] Accept 'retn' as an alias to 'ret[lqw]'\'ret' (At&t\Intel)Marina Yatsina2016-09-281-0/+6
| | | | | | | | | | Implement 'retn' simply by aliasing it to the relevant 'ret' instruction Commit on behalf of coby Differential Revision: https://reviews.llvm.org/D24346 llvm-svn: 282601
* [AVX-512] Use 512-bit vcvtps2ph/vcvtph2ps to implement fp_to_f16/f16_to_fp ↵Craig Topper2016-09-201-0/+1
| | | | | | | | when F16C and VLX are not supported. Fixes PR23941. llvm-svn: 281958
* [X86] Create a new instruction format to handle 4VOp3 encoding. This saves ↵Craig Topper2016-08-221-4/+4
| | | | | | one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling. llvm-svn: 279424
* [x86] Allow merging multiple instances of an immediate within a basic block ↵Sanjay Patel2016-08-161-5/+8
| | | | | | | | | | | | | | for code size savings, for 64-bit constants. This patch handles 64-bit constants which can be encoded as 32-bit immediates. It extends the functionality added by https://reviews.llvm.org/D11363 for 32-bit constants to 64-bit constants. Patch by Sunita Marathe! Differential Revision: https://reviews.llvm.org/D23391 llvm-svn: 278857
* [X86] Don't mark addressing mode operands as "outs". NFC-ish.Ahmed Bougacha2016-07-141-12/+12
| | | | | | | Nothing in-tree can tell the difference, but it's incorrect: the addressing mode registers aren't what's defined. llvm-svn: 275426
* [LLVM][INTRINSICS] adding intrinsics of CLFLUSHOPTMichael Zuckerman2016-07-051-1/+1
| | | | | | Differential Revision: http://reviews.llvm.org/D21789 llvm-svn: 274553
* Convert a few more comparisons to isPositionIndependent(). NFC.Rafael Espindola2016-06-271-1/+1
| | | | llvm-svn: 273945
* Delete the IsStatic predicate.Rafael Espindola2016-06-271-1/+0
| | | | | | In all its uses it was equivalent to IsNotPIC. llvm-svn: 273943
* [X86] Remove dead ISD opcodes. NFC.Ahmed Bougacha2016-06-241-4/+0
| | | | llvm-svn: 273716
* [X86] Define segment MI operands as regs instead of i8imm.Ahmed Bougacha2016-06-021-9/+10
| | | | | | | | | | | | | | | | | | | We've been pretending that segments are i8imm since the initial support (r68645), predating the addition of the SEGMENT_REG class (r81895). That happens to work, but is wrong, and inconsistent with how we print (e.g., X86ATTInstPrinter::printMemReference) and parse them (e.g., X86Operand::addMemOperands). This change shouldn't affect any tool users, but is visible to library users or out-of-tree tablegen backends: this causes MCOperandInfo for the segment op to have an RC instead of "unknown", and TII::getRegClass to actually return something. As the registers are reserved and no vregs of the class are ever created, that shouldn't change anything. No test change; no suspicious getRegClass() in X86 and CodeGen. llvm-svn: 271559
* X86: permit using SjLj EH on x86 targets as an optionSaleem Abdulrasool2016-05-311-0/+7
| | | | | | | | | This adds support to the backend to actually support SjLj EH as an exception model. This is *NOT* the default model, and requires explicitly opting into it from the frontend. GCC supports this model and for MinGW can still be enabled via the `--using-sjlj-exceptions` option. Addresses PR27749! llvm-svn: 271244
* Remember the relocation model. NFC.Rafael Espindola2016-05-191-1/+1
| | | | | | This avoids passing a TargetMachine in a few places. llvm-svn: 270095
* Style fixes. NFC.Rafael Espindola2016-05-191-1/+1
| | | | llvm-svn: 270093
* Re-commit r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA ↵Hans Wennborg2016-05-181-2/+4
| | | | | | | | | | | | instructions" with an additional fix to make RegAllocFast ignore undef physreg uses. It would previously get confused about the "push %eax" instruction's use of eax. That method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate as well, but since that runs after register-allocation, we didn't run into the RegAllocFast issue before. llvm-svn: 269949
* Add new flag and intrinsic support for MWAITX and MONITORX instructionsAshutosh Nema2016-05-181-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29. The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". This opcode information is used in adding tests for the disassembler. These instructions are enabled for AMD's bdver4 architecture. Patch by Ganesh Gopalasubramanian! Reviewers: echristo, craig.topper, RKSimon Subscribers: RKSimon, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19795 llvm-svn: 269911
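The two encodings quoted above are enough to sketch what a disassembler test would expect. The table and the `decode_monitorx_family` helper below are hypothetical illustrations, not LLVM's disassembler tables.

```python
# Opcode bytes as stated in the commit message.
MONITORX_FAMILY = {
    bytes([0x0F, 0x01, 0xFA]): "monitorx",
    bytes([0x0F, 0x01, 0xFB]): "mwaitx",
}

def decode_monitorx_family(encoding):
    """Map a 3-byte encoding to its mnemonic, or None if it is not in the table."""
    return MONITORX_FAMILY.get(bytes(encoding))
```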
* Revert r269828 "X86: Avoid using _chkstk when lowering WIN_ALLOCA instructions"Hans Wennborg2016-05-171-4/+2
| | | | | | Seems to have broken the Windows ASan bot. Reverting while investigating. llvm-svn: 269833
* X86: Avoid using _chkstk when lowering WIN_ALLOCA instructionsHans Wennborg2016-05-171-2/+4
| | | | | | | | | | | | | | | This patch moves the expansion of WIN_ALLOCA pseudo-instructions into a separate pass that walks the CFG and lowers the instructions based on a conservative estimate of the offset between the stack pointer and the lowest accessed stack address. The goal is to reduce binary size and run-time costs by removing calls to _chkstk. While it doesn't fix all the code quality problems with inalloca calls, it's an incremental improvement for PR27076. Differential Revision: http://reviews.llvm.org/D20263 llvm-svn: 269828
* [X86] Fix InstAliases to not allow FARCALL32i/FARCALL16i/FARJMP32i/FARJMP16i ↵Craig Topper2016-05-071-8/+8
| | | | | | in 64-bit mode. llvm-svn: 268863
* [X86] Remove isel patterns for selecting tzcnt/lzcnt from ↵Craig Topper2016-04-241-80/+0
| | | | | | cmove/ne+cttz/ctlz. These are folded by DAG combine now. llvm-svn: 267326
* [X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt ↵Craig Topper2016-04-241-30/+24
| | | | | | instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly. llvm-svn: 267311
* X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)Hans Wennborg2016-03-251-0/+3
| | | | | | | | | This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375
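The size saving behind this change can be seen from the encodings: `push imm8` is `6A ib` and `pop r32` is `58+r`, three bytes in total, while `mov r32, imm32` is `B8+r id`, five bytes. The sketch below assembles both byte sequences for the eight legacy registers (register number 0 is eax); the function names are hypothetical. The red zone caveat presumably arises because the push writes below the stack pointer.

```python
def encode_mov_imm32(reg, imm):
    """Encode `mov r32, imm32` (B8+r id) for register numbers 0..7."""
    if not 0 <= reg <= 7:
        raise ValueError("register number must be in 0..7")
    return bytes([0xB8 + reg]) + imm.to_bytes(4, "little", signed=True)

def encode_push_pop_imm8(reg, imm):
    """Encode `push imm8; pop r32` (6A ib, 58+r); imm must fit in a signed byte."""
    if not 0 <= reg <= 7:
        raise ValueError("register number must be in 0..7")
    return bytes([0x6A]) + imm.to_bytes(1, "little", signed=True) + bytes([0x58 + reg])
```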
* [X86] Remove many operands that represent memory stores from outs to ins. ↵Craig Topper2016-03-131-10/+10
| | | | | | These operands are the registers and immediates that specify the memory address not the memory itself thus they are inputs. llvm-svn: 263354
* [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.Quentin Colombet2016-03-121-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmpxchg[8|16]b uses RBX as one of its arguments. In other words, using this instruction clobbers RBX, as it is defined to hold one of the inputs. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. Reserved registers have special semantics that only the target understands and enforces; because of that, the register allocator doesn't use them, but it also doesn't try to make sure they are used properly (remember it does not know how they are supposed to be used). Therefore, when RBX is used as a reserved register but defined by something that is not compatible with that use, the register allocator will not fix the surrounding code to make sure it gets saved and restored properly around the broken code. It is the responsibility of the target to do the right thing with its reserved register. To fix that, when the base pointer needs to be preserved, we use a different pseudo instruction for cmpxchg that saves rbx. That pseudo takes two more arguments than the regular instruction: - One is the value to be copied into RBX to set the proper value for the comparison. - The other is the virtual register holding the save of the value of RBX as the base pointer. This saving is done as part of isel (i.e., we emit a copy from rbx). cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp This gets expanded into: rbx = copy input_for_rbx_reg cmpxchg <regular cmpxchg args> rbx = save_of_rbx_as_bp Note: The actual modeling of the pseudo is a bit more complicated to make sure the interferences that appear after the pseudo gets expanded are properly modeled before that expansion. This fixes PR26883. llvm-svn: 263325
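The expansion described above can be sketched as a function that turns the pseudo's operands into the three-step sequence. This is a textual illustration only; it does not model the extra interference bookkeeping the note mentions, and `expand_cmpxchg_save_rbx` is a hypothetical name.

```python
def expand_cmpxchg_save_rbx(cmpxchg_args, input_for_rbx_reg, save_of_rbx_as_bp):
    """Return the instruction sequence the pseudo expands into, as strings."""
    return [
        f"rbx = copy {input_for_rbx_reg}",        # set RBX for the comparison
        f"cmpxchg {', '.join(cmpxchg_args)}",     # the regular instruction
        f"rbx = {save_of_rbx_as_bp}",             # restore RBX as the base pointer
    ]
```

Calling it with `["mem"]`, `"%vreg1"`, and `"%vreg2"` yields the copy into RBX, the regular cmpxchg, and the restore, in that order.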
* [X86] Support cleaning more than 2**16 bytes of stackDavid Majnemer2016-03-041-1/+1
| | | | | | | | | | | | | | | | | | | The x86 ret instruction has a 16 bit immediate indicating how many bytes to pop off of the stack beyond the return address. There is a problem when extremely large structs are passed by value: we might not be able to fit the number of bytes to pop into the return instruction. To fix this, expand RET_FLAG a little later and use a special sequence to clean the stack: pop %ecx ; return address is now in %ecx add $n, %esp ; clean the stack push %ecx ; bring the return address back on the stack ret ; pop the return address and jmp to its value llvm-svn: 262755
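The choice between the plain return and the special sequence depends only on whether the byte count fits in the 16-bit immediate. A sketch of that decision, emitting AT&T-style text; `emit_return` is a hypothetical helper, not the actual lowering code.

```python
MAX_RET_IMM = 0xFFFF  # largest value a 16-bit immediate can hold

def emit_return(bytes_to_pop):
    """Return the instruction sequence used to return and pop bytes_to_pop bytes."""
    if bytes_to_pop < 0:
        raise ValueError("byte count must be non-negative")
    if bytes_to_pop == 0:
        return ["ret"]
    if bytes_to_pop <= MAX_RET_IMM:
        return [f"ret ${bytes_to_pop}"]
    # Too large for the immediate: use the sequence from the commit message.
    return [
        "pop %ecx",                   # return address is now in %ecx
        f"add ${bytes_to_pop}, %esp", # clean the stack
        "push %ecx",                  # bring the return address back
        "ret",                        # pop the return address and jump to it
    ]
```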
* [X86] Permit reading of the FLAGS register without it being previously definedDavid Majnemer2016-03-021-2/+2
| | | | | | | | | We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS. While technically correct, this is not desirable for folks who want to examine aspects of the FLAGS register which are not related to computation, like whether or not CPUID is a valid instruction. Differential Revision: http://reviews.llvm.org/D17782 llvm-svn: 262465
* [X86] Move the ATOMIC_LOAD_OP ISel from DAGToDAG to ISelLowering. NFCI.Ahmed Bougacha2016-02-291-0/+20
| | | | | | | | | | | | | | This is long-standing dirtiness, as acknowledged by r77582: The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. Doing this before selection will let us combine away some constructs. Differential Revision: http://reviews.llvm.org/D17659 llvm-svn: 262244
* [X86] Remove the unused SDTX86atomicBinary. NFC.Ahmed Bougacha2016-02-261-2/+0
| | | | llvm-svn: 262086
* AVX512F: Add GATHER/SCATTER assembler Intel syntax tests for knl/skx/avx . ↵Igor Breger2016-02-251-21/+26
| | | | | | | | Change memory operand parser handling. Differential Revision: http://reviews.llvm.org/D17564 llvm-svn: 261862
* AVX512: Fix predicate of AVX pcmpeqw/b , pcmpgtb/w/d instructions . AVX512 ↵Igor Breger2016-02-231-0/+2
| | | | | | versions of these instructions return their result in a kmask register, so AVX patterns should not be disabled. Differential Revision: http://reviews.llvm.org/D17517 llvm-svn: 261619