summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86InstrInfo.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Remove DATA32_PREFIX. Hack the printing for DATA16_PREFIX to print ↵Craig Topper2018-04-221-6/+1
| | | | | | | | | | 'data32' in 16-bit mode. Hack the asm parser to convert 'data32' to 'data16' in 16-bit mode. Improve the error messages to match GNU assembler. This also allows us to remove the hack from the disassembler table building. llvm-svn: 330531
* [X86] WaitPKG instructionsGabor Buella2018-04-201-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Three new instructions: umonitor - Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a writeback memory caching type. umwait - A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. tpause - Directs the processor to enter an implementation-dependent optimized state until the TSC reaches the value in EDX:EAX. Also modifying the description of the mfence instruction, as the rep prefix (0xF3) was allowed before, which would conflict with umonitor during disassembly. Before: $ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble .text mfence After: $ echo 0xf3,0x0f,0xae,0xf0 | llvm-mc -disassemble .text umonitor %rax Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45253 llvm-svn: 330462
* [X86] Tag CLDEMOTE instruction with WriteLoad scheduling classSimon Pilgrim2018-04-201-1/+2
| | | | | | Same as other cacheline instructions llvm-svn: 330424
* [X86] Correct the Defs, Uses, hasSideEffects, mayLoad, mayStore for XCHG and ↵Craig Topper2018-04-181-34/+47
| | | | | | | | XADD instructions. I don't think we emit any of these from codegen except for using XCHG16ar as 2 byte NOP. llvm-svn: 330298
* [X86] Fix the Uses/Defs,mayLoad,mayStore,hasSideEffects flags for the ↵Craig Topper2018-04-181-6/+13
| | | | | | | | CMPXCHG instructions. The compiler only emits the locked version of these which use different instruction definitions. The versions fixed here are only used by the assembler/disassembler. llvm-svn: 330287
* [X86] Introduce cldemote instructionGabor Buella2018-04-131-0/+4
| | | | | | | | | | | | | | Hint to hardware to move the cache line containing the address to a more distant level of the cache without writing back to memory. Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45256 llvm-svn: 329992
* [X86] Remove remaining gpr schedule itineraries (PR37093)Simon Pilgrim2018-04-121-17/+17
| | | | llvm-svn: 329938
* [X86] Remove remaining system/special schedule itineraries (PR37093)Simon Pilgrim2018-04-121-368/+319
| | | | llvm-svn: 329906
* [X86] Remove system/control schedule itineraries (PR37093)Simon Pilgrim2018-04-121-10/+10
| | | | llvm-svn: 329903
* [X86] Describe wbnoinvd instructionGabor Buella2018-04-111-0/+1
| | | | | | | | | | | | | | | Similar to the wbinvd instruction, except this one does not invalidate caches. Ring 0 only. The encoding matches a wbinvd instruction with an F3 prefix. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43816 llvm-svn: 329847
* [x86] Model the direction flag (DF) separately from the rest of EFLAGS.Chandler Carruth2018-04-101-38/+33
| | | | | | | | | | | | | | | | | | | | | This cleans up a number of operations that only claimed te use EFLAGS due to using DF. But no instructions which we think of us setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673
* [X86] Merge itineraries for CLC, CMC, and STC.Craig Topper2018-04-061-3/+3
| | | | | | These are very simple flag setting instructions that appear to only be a single uop. They're unlikely to need this separation. llvm-svn: 329414
* [X86] Use the same predicate for the load for PMOVSXBQ and PMOVZXBQ.Craig Topper2018-04-041-8/+0
| | | | | | These both use a 16-bit load, but one used loadi16_anyext and the other used extloadi32i16. The only difference between them is that loadi16_anyext checked that the load was at least 2 byte aligned and non-volatile. But the alignment doesn't matter here. Just use extloadi32i16 for both. llvm-svn: 329154
* [X86] Add an itinerary to BTR64rr.Craig Topper2018-04-021-1/+2
| | | | llvm-svn: 328956
* [X86] Remove ReadAfterLd from BMI and TBM instructions that don't have a ↵Craig Topper2018-03-291-3/+3
| | | | | | | | | | register operand in their memory form The memory form of these instructions only read an input from memory. They don't have any register operands. Differential Revision: https://reviews.llvm.org/D44836 llvm-svn: 328828
* [X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated ↵Craig Topper2018-03-291-7/+12
| | | | | | | | | | SchedRW for BEXTR/BZHI. These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd Differential Revision: https://reviews.llvm.org/D44838 llvm-svn: 328823
* [X86] Rename RIi64_NOREX tblgen class to just Ii64. Make RIi64 inherit from ↵Craig Topper2018-03-291-10/+10
| | | | | | | | it. NFC This feels more consistent with the other classes. We don't need to say _NOREX if we didn't start it with an R in the first place. llvm-svn: 328757
* [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes ↵Simon Pilgrim2018-03-261-24/+24
| | | | | | | | | | | | (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566
* [x86] put nops into the WriteNop class and customize for JaguarSanjay Patel2018-03-191-1/+1
| | | | | | | | | | | | 1. Given that we already have a classification bucket with 'nop' in the name, that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'. 2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably llvm-mca) how to model the resource requirements better even though a nop has no dependencies. Differential Revision: https://reviews.llvm.org/D44608 llvm-svn: 327853
* [X86] Added support for nocf_check attribute for indirect Branch TrackingOren Ben Simhon2018-03-171-0/+8
| | | | | | | | | | | | | | | X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET). IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp. The `nocf_check` attribute has two roles in the context of X86 IBT technology: 1. Appertains to a function - do not add ENDBR instruction at the beginning of the function. 2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction. This patch implements `nocf_check` context for Indirect Branch Tracking. It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks. Differential Revision: https://reviews.llvm.org/D41879 llvm-svn: 327767
* [X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in ↵Craig Topper2018-02-151-1/+1
| | | | | | | | | | | | the upper 32-bits of a 64-bit operation. We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions. For now only do this for optsize. Differential Revision: https://reviews.llvm.org/D37418 llvm-svn: 325287
* [X86] Dont' allow 'outs' and 'ins' in at&t syntax without suffixes.Craig Topper2018-02-141-6/+6
| | | | | | The match would be ambiguous, but at&t asm parsing doesn't support ambiguous matches and will just return the first. llvm-svn: 325192
* [X86] Don't swap argument on BOUND instruction in at&t syntax.Craig Topper2018-02-141-2/+3
| | | | | | | | | | | | The bound instruction does not have reversed operands in gas. Fixes PR27653. Patch by Maya Madhavan. Differential Revision: https://reviews.llvm.org/D43243 llvm-svn: 325178
* [X86] Simplify X86DAGToDAGISel::matchBEXTRFromAnd by creating an ↵Craig Topper2018-02-121-0/+2
| | | | | | | | X86ISD::BEXTR node and calling Select. Add isel patterns to recognize this node. This removes a bunch of special case code for selecting the immediate and folding loads. llvm-svn: 324939
* [X86] Remove unused multiclass argument. NFCCraig Topper2018-02-121-3/+3
| | | | llvm-svn: 324938
* [X86] Add missing scheduling class tag for i64 absolute address movesSimon Pilgrim2018-02-121-5/+5
| | | | | | | | Expand existing SchedRW to encompass these like it did for the other memory offset movs - added comments to closing braces to keep track of def scopes. We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). llvm-svn: 324910
* [X86] Allow zextload/extload i1->i8 to be folded into instructions during iselCraig Topper2018-02-121-1/+11
| | | | | | | | Previously we just emitted this as a MOV8rm which would likely get folded during the peephole pass anyway. This just makes it explicit earlier. The gpr-to-mask.ll test changed because the kaddb instruction has no memory form. llvm-svn: 324860
* [NFC] fix trivial typos in comments and documentsHiroshi Inoue2018-01-261-1/+1
| | | | | | "in in" -> "in", "on on" -> "on" etc. llvm-svn: 323508
* [X86] Remove 'NOREX' comment from the printing of _NOREX instructions.Craig Topper2018-01-231-3/+3
| | | | | | Some of the NOREX instructions are used in 32-bit mode making this printing confusing. It also doesn't provide a lot of value since you can see the h-register being used by the instruction. llvm-svn: 323174
* Introduce the "retpoline" x86 mitigation technique for variant #2 of the ↵Chandler Carruth2018-01-221-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155
* [X86] Add intrinsic support for the RDPID instructionCraig Topper2018-01-181-0/+1
| | | | | | | | This adds a new instrinsic to support the rdpid instruction. The implementation is a bit weird because the intrinsic is defined as always returning 32-bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode. But really it zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg. Differential Revision: https://reviews.llvm.org/D42205 llvm-svn: 322910
* [X86] Revisit the fix I made years ago to make 'xchgl %eax, %eax' not encode ↵Craig Topper2018-01-161-11/+8
| | | | | | | | | | using the 0x90 encoding in 64-bit mode. Prior to this we had a separate instruction and register class that excluded eax to prevent matching the instruction that would encode with 0x90. This patch changes this to just use an InstAlias to force xchgl %eax, %eax to use XCHG32rr instruction in 64-bit mode. This gets rid of the separate instruction and register class. llvm-svn: 322532
* [X86] Make 'xchgq %rax, %rax' an alias for the 0x90 nop encoding to match gas.Craig Topper2018-01-161-0/+4
| | | | | | Previously we encoded it as 0x48 0x90. llvm-svn: 322531
* [X86] Don't allow lods/stos/scas/cmps/movs to be parsed without a suffix and ↵Craig Topper2018-01-121-20/+20
| | | | | | | | only memory operand in at&t syntax. Without a register with a size being mentioned the instruction is ambiguous in at&t syntax. With Intel syntax the memory operation caries a size that can be used to disambiguate. llvm-svn: 322356
* [X86] Don't require suffix on 'clr' mnemonic in intel syntaxCraig Topper2018-01-121-4/+4
| | | | llvm-svn: 322355
* [X86] Add 'l' and 'q' suffixes to the tbm instruction mnemonics.Craig Topper2018-01-121-6/+6
| | | | | | While the suffix isn't required to disambiguate the instructions, it is required in order to parse the instructions when the suffix is specified in order to match the GNU assembler. llvm-svn: 322354
* [X86] Disable movsq/stosq/scasqcmpsq/lodsq parsing in 64-bit mode.Craig Topper2018-01-121-5/+10
| | | | llvm-svn: 322352
* [X86] Remove assembler predicates from all AVX512 related feature flags.Craig Topper2018-01-061-26/+13
| | | | | | | | We don't do fine grained feature control like this on features prior to AVX512. We do still have checks in place in the assembly parser itself that prevents %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features and that's probably good enough. llvm-svn: 321947
* [X86] Stop printing moves between VR64 and GR64 with 'movd' mnemonic. Use ↵Craig Topper2018-01-051-3/+5
| | | | | | | | 'movq' instead. This behavior existed to work with an old version of the gnu assembler on MacOS that only accepted this form. Newer versions of GNU assembler and the current LLVM derived version of the assembler on MacOS support movq as well. llvm-svn: 321898
* [X86] Add assembler predicates to BITALG/VBMI2/VNNI features to be ↵Craig Topper2017-12-241-3/+6
| | | | | | consistent with the other AVX512 ISAs. llvm-svn: 321416
* [X86] Add prefetchwt1 instruction and overhaul priorities and isel enabling ↵Craig Topper2017-12-221-0/+3
| | | | | | | | | | | | | | for prefetch instructions. Previously prefetch was only considered legal if sse was enabled, but it should be supported with 3dnow as well. The prfchw flag now imply at least some form of prefetch without the write hint is available, either the sse or 3dnow version. This is true even if 3dnow and sse are explicitly disabled. Similarly prefetchwt1 feature implies availability of prefetchw and the the prefetcht0/1/2/nta instructions. This way we can support _MM_HINT_ET0 using prefetchw and _MM_HINT_ET1 with prefetchwt1. And its assumed that if we have levels for the write hint we would have levels for the non-write hint, thus why we enable the sse prefetch instructions. I believe this behavior is consistent with gcc. I've updated the prefetch.ll to test all of these combinations. llvm-svn: 321335
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-4/+4
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* [X86] Add 'Requires<[In64BitMode]>' to a bunch of instructions that only ↵Craig Topper2017-12-151-8/+15
| | | | | | | | have memory and immediate operands. The asm parser wasn't preventing these from being accepted in 32-bit mode. Instructions that use a GR64 register are protected by the parser rejecting the register in 32-bit mode. llvm-svn: 320846
* [X86] Remove the 'Requires' In64BitMode/Not64BitMode from the LWP instructions.Craig Topper2017-12-151-4/+4
| | | | | | These aren't doing anything due to a top level "let Predicates =". I think the GR32/GR64 register class protects these anyway. llvm-svn: 320844
* [X86] Add LWP schedule testsSimon Pilgrim2017-12-111-2/+2
| | | | | | Tag LWP instructions as WriteSystem llvm-svn: 320387
* [X86] Tag LOCK/REX64/DATA16/DATA32 instruction prefix scheduler classesSimon Pilgrim2017-12-091-3/+7
| | | | llvm-svn: 320266
* [X86] Tag REP/REPNE prefix instructions as microcoded scheduler classesSimon Pilgrim2017-12-091-3/+2
| | | | llvm-svn: 320263
* [X86] Tag missing EH pseudo instruction scheduler classesSimon Pilgrim2017-12-091-1/+2
| | | | llvm-svn: 320262
* [X86] Tag move immediate instructions scheduler classesSimon Pilgrim2017-12-081-10/+16
| | | | llvm-svn: 320169
* [X86][AVX512] Tag CLWB instruction to CLFLUSH/PREFETCH scheduler classSimon Pilgrim2017-12-081-3/+2
| | | | llvm-svn: 320156
OpenPOWER on IntegriCloud