summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86InstrCompiler.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make ↵Craig Topper2018-08-061-6/+12
| | | | | | | | | | sure the store isn't volatile If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source Differential Revision: https://reviews.llvm.org/D50270 llvm-svn: 339041
* [X86] Add isel patterns for atomic_load+sub+atomic_sub.Craig Topper2018-08-031-2/+1
| | | | | | Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register. llvm-svn: 338930
* [X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns ↵Craig Topper2018-08-031-106/+73
| | | | | | | | | | | | and the normal instructions instead At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering. So from what I can tell we shouldn't need additional pseudos since they aren't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions. Differential Revision: https://reviews.llvm.org/D50212 llvm-svn: 338925
* [X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an ↵Craig Topper2018-08-031-2/+8
| | | | | | | | atomic load and atomic store. This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC. llvm-svn: 338795
* [X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction.Craig Topper2018-08-021-12/+9
| | | | | | There was a FIXMe in the td file about a type inference issue that was easy to fix. llvm-svn: 338782
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code ↵Reid Kleckner2018-07-231-0/+4
| | | | | | | | | | | | | | models" Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine. llvm-svn: 337740
* [X86] Merge the FR128 and VR128 regclass since they have identical spill and ↵Craig Topper2018-07-161-1/+1
| | | | | | | | | | alignment characteristics. This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147
* Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC ↵Jonas Devlieghere2018-06-281-4/+0
| | | | | | | | | | | | | code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894
* [X86] Change how we prefer shift by immediate over folding a load into a shift.Craig Topper2018-06-281-28/+26
| | | | | | | | | | | | BMI2 added new shift by register instructions that have the ability to fold a load. Normally without doing anything special isel would prefer folding a load over folding an immediate because the load folding pattern has higher "complexity". This would require an instruction to move the immediate into a register. We would rather fold the immediate instead and have a separate instruction for the load. We used to enforce this priority by artificially lowering the complexity of the load pattern. This patch changes this to instead reject the load fold in isProfitableToFoldLoad if there is an immediate. This is more consistent with other binops and feels less hacky. llvm-svn: 335804
* [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit ↵Craig Topper2018-06-271-0/+31
| | | | | | | | | | position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754
* [X86] Use XOR for SUB (C, X) during isel if will help fold an immediateCraig Topper2018-06-261-0/+38
| | | | | | | | | | | | | | | | | Summary: Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining. This seems like the safest way to do this trick. And we consider doing it as a DAG combine later. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48557 llvm-svn: 335575
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code ↵Reid Kleckner2018-06-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | models" The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet. Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335508
* [X86] Regroup some isel patterns. NFCCraig Topper2018-06-241-32/+22
| | | | | | For some reason the 64-bit patterns were separated from their 8/16/32-bit friends, but only for add/sub/mul. For and/or/xor they were together. llvm-svn: 335429
* Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models"Reid Kleckner2018-06-211-4/+0
| | | | | | MCJIT can't handle R_X86_64_GOT64 yet. llvm-svn: 335300
* [X86] Implement more of x86-64 large and medium PIC code modelsReid Kleckner2018-06-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. Reviewers: chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335297
* [X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042)Simon Pilgrim2018-06-061-11/+0
| | | | | | | | Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH). Differential Revision: https://reviews.llvm.org/D47690 llvm-svn: 334083
* [X86] Use uint32_t instead of unsigned in GetLo32XForm for readability. NFCCraig Topper2018-04-151-1/+1
| | | | | | GetLo8XForm right next to it uses uint8_t so uint32_t is consistent. llvm-svn: 330104
* [X86] Remove remaining gpr schedule itineraries (PR37093)Simon Pilgrim2018-04-121-67/+52
| | | | llvm-svn: 329938
* [X86] Remove remaining system/special schedule itineraries (PR37093)Simon Pilgrim2018-04-121-53/+39
| | | | llvm-svn: 329906
* [X86] Remove system/control schedule itineraries (PR37093)Simon Pilgrim2018-04-121-7/+5
| | | | llvm-svn: 329903
* [x86] Model the direction flag (DF) separately from the rest of EFLAGS.Chandler Carruth2018-04-101-4/+4
| | | | | | | | | | | | | | | | | | | | | This cleans up a number of operations that only claimed te use EFLAGS due to using DF. But no instructions which we think of us setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673
* [X86] Add itineraries to ADD.*_DB instructions to match their normal ↵Craig Topper2018-03-231-11/+20
| | | | | | counterparts. llvm-svn: 328352
* [X86] Rename MOVSX32_NOREXrr8 to MOVSX32rr8_NOREX so that the scheduler ↵Craig Topper2018-03-201-8/+8
| | | | | | | | model regular expressions will pick it up with the regular version. Do the same for MOVSX32_NOREXrm8, MOVZX32_NOREXrr8, and MOVZX32_NOREXrm8 llvm-svn: 327948
* [X86] Add MOV16ri*/MOV32ri*/MOV64ri* to scheduler models to match MOV8ri. ↵Craig Topper2018-03-191-2/+2
| | | | | | Correct SchedRW and itinerary for MOV32ri64. llvm-svn: 327872
* [X86] Use btc/btr/bts to implement xor/and/or that affects a single bit in ↵Craig Topper2018-02-151-0/+31
| | | | | | | | | | | | the upper 32-bits of a 64-bit operation. We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions. For now only do this for optsize. Differential Revision: https://reviews.llvm.org/D37418 llvm-svn: 325287
* [X86] Simplify X86DAGToDAGISel::matchBEXTRFromAnd by creating an ↵Craig Topper2018-02-121-0/+31
| | | | | | | | X86ISD::BEXTR node and calling Select. Add isel patterns to recognize this node. This removes a bunch of special case code for selecting the immediate and folding loads. llvm-svn: 324939
* Introduce the "retpoline" x86 mitigation technique for variant #2 of the ↵Chandler Carruth2018-01-221-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155
* [X86] Add an override of targetShrinkDemandedConstant to limit the damage ↵Craig Topper2018-01-201-1/+5
| | | | | | | | | | | | | | | | | | | that shrinkdemandedbits can do to zext_in_reg operations Summary: This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx. We still need a follow patch to stop moving ands across srl if the and could be represented as a movzx before the shift but not after. I think this should help with some of the cases that D42088 ended up removing during isel. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42265 llvm-svn: 323048
* [X86] Tag ADJSTACK instructions as INTALU scheduler classSimon Pilgrim2017-12-101-11/+9
| | | | llvm-svn: 320299
* [X86] Tag MORESTACK instructions as ret scheduler classSimon Pilgrim2017-12-101-3/+3
| | | | llvm-svn: 320296
* [X86] Tag PIC setup instruction as jump scheduler classSimon Pilgrim2017-12-101-2/+3
| | | | llvm-svn: 320276
* [X86] Tag ACQUIRE/RELEASE atomic instructions as microcoded scheduler classesSimon Pilgrim2017-12-101-3/+5
| | | | | Note: We may be too pessimistic here and should possibly use something closer to the LOCK arithmetic instructions llvm-svn: 320275
* [X86] Tag TLS instructions as system scheduler classesSimon Pilgrim2017-12-101-1/+2
| | | | llvm-svn: 320274
* [X86] Tag ALLOCA/VAARG instructions as system scheduler classesSimon Pilgrim2017-12-101-0/+2
| | | | llvm-svn: 320273
* Strip trailing whitespace. NFCI.Simon Pilgrim2017-12-091-1/+1
| | | | llvm-svn: 320265
* [X86] Tag missing EH pseudo instruction scheduler classesSimon Pilgrim2017-12-091-2/+2
| | | | llvm-svn: 320262
* [X86] Tag frame pointer XORs instruction scheduler classesSimon Pilgrim2017-12-091-2/+4
| | | | llvm-svn: 320261
* [X86] CMOV pseudo instructions shouldn't need scheduling info as they should ↵Simon Pilgrim2017-12-081-2/+2
| | | | | | be lowered early llvm-svn: 320193
* [X86] Tag move immediate instructions scheduler classesSimon Pilgrim2017-12-081-10/+13
| | | | llvm-svn: 320169
* [X86] Replace tabs with spaces. NFCI.Simon Pilgrim2017-12-071-12/+12
| | | | llvm-svn: 320065
* Re-commit r319490 "XOR the frame pointer with the stack cookie when ↵Hans Wennborg2017-12-051-0/+9
| | | | | | | | | | | | | | | | | | protecting the stack" The patch originally broke Chromium (crbug.com/791714) due to its failing to specify that the new pseudo instructions clobber EFLAGS. This commit fixes that. > Summary: This strengthens the guard and matches MSVC. > > Reviewers: hans, etienneb > > Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits > > Differential Revision: https://reviews.llvm.org/D40622 llvm-svn: 319824
* Revert r319490 "XOR the frame pointer with the stack cookie when protecting ↵Hans Wennborg2017-12-041-9/+0
| | | | | | | | | | | | | | | | | | the stack" This broke the Chromium build (crbug.com/791714). Reverting while investigating. > Summary: This strengthens the guard and matches MSVC. > > Reviewers: hans, etienneb > > Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits > > Differential Revision: https://reviews.llvm.org/D40622 > > git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319490 91177308-0d34-0410-b5e6-96231b3b80d8 llvm-svn: 319706
* XOR the frame pointer with the stack cookie when protecting the stackReid Kleckner2017-11-301-0/+9
| | | | | | | | | | | | Summary: This strengthens the guard and matches MSVC. Reviewers: hans, etienneb Subscribers: hiraditya, JDevlieghere, vlad.tsyrklevich, llvm-commits Differential Revision: https://reviews.llvm.org/D40622 llvm-svn: 319490
* Control-Flow Enforcement Technology - Shadow Stack support (LLVM side)Oren Ben Simhon2017-11-261-7/+7
| | | | | | | | | | | | | | | | | | Shadow stack solution introduces a new stack for return addresses only. The HW has a Shadow Stack Pointer (SSP) that points to the next return address. If we return to a different address, an exception is triggered. The shadow stack is managed using a series of intrinsics that are introduced in this patch as well as the new register (SSP). The intrinsics are mapped to new instruction set that implements CET mechanism. The patch also includes initial infrastructure support for IBT. For more information, please see the following: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf Differential Revision: https://reviews.llvm.org/D40223 Change-Id: I4daa1f27e88176be79a4ac3b4cd26a459e88fed4 llvm-svn: 318996
* [X86] Make sure we don't create locked inc/dec instructions when the carry ↵Craig Topper2017-10-301-8/+28
| | | | | | | | | | | | | | | | | | | | | | | flag is being used. Summary: INC/DEC don't update the carry flag so we need to make sure we don't try to use it. This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested. The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860. This should fully fix PR35068 finishing the fix started in r316860. Reviewers: RKSimon, zvi, spatel Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39411 llvm-svn: 316913
* [X86] Remove combine that turns X86ISD::LSUB into X86ISD::LADD. Update ↵Craig Topper2017-10-291-8/+8
| | | | | | | | | | | | patterns that depended on this. If the carry flag is being used, this transformation isn't safe. This does prevent some test cases from using DEC now, but I'll try to look into that separately. Fixes PR35068. llvm-svn: 316860
* [X86] Use _NOREX MOVZX instructions for some patterns even in 32-bit mode.Craig Topper2017-10-021-32/+6
| | | | | | This unifies the patterns between both modes. This should be effectively NFC since all the available registers in 32-bit mode statisfy this constraint. llvm-svn: 314643
* [X86] Don't select anyext GR32->GR64 to SUBREG_TO_REG. Use INSERT_SUBREG ↵Craig Topper2017-09-251-1/+1
| | | | | | | | | | | | instead. As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0. I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seems to be some perturbance of register allocation. Differential Revision: https://reviews.llvm.org/D38001 llvm-svn: 314152
* [X86] Make sure we still emit zext for GR32 to GR64 when the source of the ↵Craig Topper2017-09-181-2/+4
| | | | | | | | | | | | zext is AssertZext The AssertZext we might see in this case is only giving information about the lower 32 bits. It isn't providing information about the upper 32 bits. So we should emit a zext. This fixes PR28540. Differential Revision: https://reviews.llvm.org/D37729 llvm-svn: 313563
* [X86] Don't emit COPY_TO_REG to ABCD registers before EXTRACT_SUBREG of sub_8bitCraig Topper2017-09-181-32/+6
| | | | | | | | This is similar to D37843, but for sub_8bit. This fixes all of the patterns except for the 2 that emit only an EXTRACT_SUBREG. That causes a verifier error with global isel because global isel doesn't know to issue the ABCD when doing this extract on 32-bits targets. Differential Revision: https://reviews.llvm.org/D37890 llvm-svn: 313558
OpenPOWER on IntegriCloud