summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* don't repeat function/class/variable names in comments; NFCSanjay Patel2015-10-131-60/+46
| | | | llvm-svn: 250162
* Fix line-ending issue. NFC.Michael Kuperstein2015-10-131-2/+2
| | | | llvm-svn: 250151
* [X86] Mark the AAD and AAM aliases as not valid in 64-bit mode.Craig Topper2015-10-131-2/+2
| | | | llvm-svn: 250148
* [X86] Change all the i8imm operands in XOP instructions to u8imm so the ↵Craig Topper2015-10-131-10/+10
| | | | | | parser will check the size. llvm-svn: 250147
* x86: preserve flags when folding atomic operationsJF Bastien2015-10-131-6/+8
| | | | | | | | | | | | | | | | | | | | | Summary: D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. This patch adds the missing EFLAGS definition. Floating point operations don't set flags, the subsequent fadd optimization is therefore correct. The same applies for surrounding load/store optimizations. Reviewers: rsmith, rtrieu Subscribers: llvm-commits, reames, morisset Differential Revision: http://reviews.llvm.org/D13680 llvm-svn: 250135
* Make Win64 localescape offsets FP relative instead of SP relativeReid Kleckner2015-10-121-8/+2
| | | | | | | | | We made them SP relative back in March (r233137) because that's the value the runtime passes to EH functions. With the new cleanuppad IR, funclets adjust their frame argument from SP to FP, so our offsets should now be FP-relative. llvm-svn: 250088
* [x86] Fix wrong lowering of vsetcc nodes (PR25080).Andrea Di Biagio2015-10-121-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong assumption that for non-AVX512 targets, the source type and destination type of a type-legalized setcc node were always the same type. This assumption was unfortunately incorrect; the type legalizer is not always able to promote the return type of a setcc to the same type as the first operand of a setcc. In the case of a vsetcc node, the legalizer firstly checks if the first input operand has a legal type. If so, then it promotes the return type of the vsetcc to that same type. Otherwise, the return type is promoted to the 'next legal type', which, for vectors of MVT::i1 is always a 128-bit integer vector type. Example (-mattr=+avx): %0 = trunc <8 x i32> %a to <8 x i23> %1 = icmp eq <8 x i23> %0, zeroinitializer The initial selection dag for the code above is: v8i1 = setcc t5, t7, seteq:ch t5: v8i23 = truncate t2 t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1 t7: v8i32 = build_vector of all zeroes. The type legalizer would firstly check if 't5' has a legal type. If so, then it would reuse that same type to promote the return type of the setcc node. Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to promote the return type of the setcc node. Consequently, the setcc return type is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the following dag node: v8i16 = setcc t32, t25, seteq:ch where t32 and t25 are now values of type v8i32. Before this patch, function LowerVSETCC would have wrongly expanded the setcc to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an instruction. In our case, ISel would have matched a VPCMPEQWrr: t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25 However, t36 and t25 are both VR256, while the result type is instead of class VR128. This inconsistency ended up causing the insertion of COPY instructions like this: %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3 Which is an invalid full copy (not a sub register copy). Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy instruction" in the attempt to expand the malformed pseudo COPY instructions. This patch fixes the problem adding the missing logic in LowerVSETCC to handle the corner case of a setcc with 128-bit return type and 256-bit operand type. This problem was originally reported by Dimitry as PR25080. It has been latent for a very long time. I have added the minimal reproducible from that bugzilla as test setcc-lowering.ll. Differential Revision: http://reviews.llvm.org/D13660 llvm-svn: 250085
* combine predicates; NFCISanjay Patel2015-10-121-4/+1
| | | | llvm-svn: 250075
* fix typos; NFCSanjay Patel2015-10-121-3/+2
| | | | llvm-svn: 250059
* fix capitalization; NFCSanjay Patel2015-10-121-2/+2
| | | | llvm-svn: 250049
* [X86] Add XSAVE intrinsic familyAmjad Aboud2015-10-125-23/+79
| | | | | | | | | | | | Add intrinsics for the XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64) XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64) XSAVEC instructions (XSAVEC/XSAVEC64) XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64) Differential Revision: http://reviews.llvm.org/D13012 llvm-svn: 250029
* [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all ↵Andrea Di Biagio2015-10-121-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | indices have the most significant bit set. This patch fixes a problem in function 'combineX86ShuffleChain' that causes a chain of shuffles to be wrongly folded away when the combined shuffle mask has only one element. We may end up with a combined shuffle mask of one element as a result of multiple calls to function 'canWidenShuffleElements()'. Function canWidenShuffleElements attempts to simplify a shuffle mask by widening the size of the elements being shuffled. For every pair of shuffle indices, function canWidenShuffleElements checks if indices refer to adjacent elements. If all pairs refer to "adjacent" elements then the shuffle mask is safely widened. As a consequence of widening, we end up with a new shuffle mask which is half the size of the original shuffle mask. The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero indices. Function canWidenShuffleElements would combine each pair of SM_SentinelZero indices into a single SM_SentinelZero index. So, in a logarithmic number of steps (4 in this case), the pshufb mask is simplified to a mask with only one index which is equal to SM_SentinelZero. Before this patch, function combineX86ShuffleChain wrongly assumed that a mask of size one is always equivalent to an identity mask. So, the entire shuffle chain was just folded away as the combined shuffle mask was treated as a no-op mask. With this patch we know check if the only element of a combined shuffle mask is SM_SentinelZero. In case, we propagate a zero vector. Differential Revision: http://reviews.llvm.org/D13364 llvm-svn: 250027
* [X86] Use u8imm for the immediate type for all shift and rotate ↵Craig Topper2015-10-121-70/+70
| | | | | | instructions. This way the assembler will perform range checking. Believe this matches gas behavior. llvm-svn: 250016
* [X86] Add support to assembler and MCInst lowering to use the other vmovq ↵Craig Topper2015-10-122-24/+28
| | | | | | %xmmX, %xmmX encoding if it would be a shorter VEX encoding. llvm-svn: 250014
* [X86] Cleanup formatting a bit. NFCCraig Topper2015-10-121-14/+14
| | | | llvm-svn: 250013
* [X86] Change the immediate for IN/OUT instructions to u8imm so the assembly ↵Craig Topper2015-10-122-12/+12
| | | | | | parser will check the size. llvm-svn: 250012
* [X86] Add some instruction aliases to get the assembly parser table to favor ↵Craig Topper2015-10-122-63/+31
| | | | | | | | arithmetic instructions with 8-bit immediates over the forms that implicitly use the ax/eax/rax. This allows us to remove the explicit code for working around the existing priority llvm-svn: 250011
* [X86] Fix CMP and TEST with al/ax/eax/rax to not mark EFLAGS as a use or ↵Craig Topper2015-10-111-27/+34
| | | | | | al/ax/eax/rax as a def. Probably doesn't have a functional affect since these aren't used in isel. llvm-svn: 249994
* [X86] Remove special validation for INT immediate operand from AsmParser. ↵Craig Topper2015-10-112-24/+1
| | | | | | | | Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior. This also fixes a bug where negative immediates below -128 were not being reported as errors. llvm-svn: 249989
* [X86] Simplify immediate range checking code.Craig Topper2015-10-112-18/+13
| | | | llvm-svn: 249979
* [X86][XOP] Added support for the lowering of 128-bit vector integer ↵Simon Pilgrim2015-10-115-12/+62
| | | | | | | | comparisons to XOP PCOM/PCOMU instructions. The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646). llvm-svn: 249976
* Use range-based for loops. NFC.Craig Topper2015-10-101-17/+15
| | | | llvm-svn: 249941
* Use emplace_back instead of a constructor call and push_back. NFCCraig Topper2015-10-101-22/+18
| | | | llvm-svn: 249940
* [WinEH] Remove more dead codeDavid Majnemer2015-10-102-21/+8
| | | | | | wineh-parent is dead, so is ValueOrMBB. llvm-svn: 249920
* [WinEH] Delete the old landingpad implementation of Windows EHReid Kleckner2015-10-091-89/+16
| | | | | | | | | | | The new implementation works at least as well as the old implementation did. Also delete the associated preparation tests. They don't exercise interesting corner cases of the new implementation. All the codegen tests of the EH tables have already been ported. llvm-svn: 249918
* [WinEH] Insert the catchpad return before CSR restorationDavid Majnemer2015-10-091-18/+21
| | | | | | | | x64 catchpads use rax to inform the unwinder where control should go next. However, we must initialize rax before the epilogue sequence so as to not perturb the unwinder. llvm-svn: 249910
* Fix assert in X86 backend.James Y Knight2015-10-091-8/+8
| | | | | | | | | | | | | | When running combine on an extract_vector_elt, it wants to look through a bitcast to check if the argument to the bitcast was itself an extract_vector_elt with particular operands. However, it called getOperand() on the argument to the bitcast *before* checking that the opcode was EXTRACT_VECTOR_ELT, assert-failing if there were zero operands for the actual opcode. Fix, and add trivial test. llvm-svn: 249891
* Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on ↵Reid Kleckner2015-10-092-23/+7
| | | | | | | | | | Win64""" This reverts commit r249794. Apparently my checkouts are full of unexpected surprises today. llvm-svn: 249796
* Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""Reid Kleckner2015-10-092-7/+23
| | | | | | | | This reverts commit r249032. TODO write commit msg llvm-svn: 249794
* Add Triple::isAndroid().Evgeniy Stepanov2015-10-081-3/+1
| | | | | | | This is a simple refactoring that replaces Triple.getEnvironment() checks for Android with Triple.isAndroid(). llvm-svn: 249750
* Move the MMX subtarget feature out of the SSE set of features and intoEric Christopher2015-10-083-168/+298
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | its own variable. This is needed so that we can explicitly turn off MMX without turning off SSE and also so that we can diagnose feature set incompatibilities that involve MMX without SSE. Rationale: // sse3 __m128d test_mm_addsub_pd(__m128d A, __m128d B) { return _mm_addsub_pd(A, B); } // mmx void shift(__m64 a, __m64 b, int c) { _mm_slli_pi16(a, c); _mm_slli_pi32(a, c); _mm_slli_si64(a, c); _mm_srli_pi16(a, c); _mm_srli_pi32(a, c); _mm_srli_si64(a, c); _mm_srai_pi16(a, c); _mm_srai_pi32(a, c); } clang -msse3 -mno-mmx file.c -c For this code we should be able to explicitly turn off MMX without affecting the compilation of the SSE3 function and then diagnose and error on compiling the MMX function. This matches the existing gcc behavior and follows the spirit of the SSE/MMX separation in llvm where we can (and do) turn off MMX code generation except in the presence of intrinsics. Updated a couple of tests, but primarily tested with a couple of tests for turning on only mmx and only sse. This is paired with a patch to clang to take advantage of this behavior. llvm-svn: 249731
* [WinEH] Relax assertion in the presence of stack realignmentReid Kleckner2015-10-081-5/+5
| | | | | | The code is correct as is, but we should test it. llvm-svn: 249715
* [X86] Disable X86CallFrameOptimization on Darwin in presence of EHFrederic Riss2015-10-081-0/+6
| | | | | | | | | | | | | | We emit 1 compact unwind encoding per function, and this can’t represent the varying stack pointer that will be generated by X86CallFrameOptimization. Disable the optimization on Darwin. (It might be possible to split the function into multiple ranges and emit 1 compact unwind info per range. The compact unwind emission code isn’t ready for that and this kind of info certainly isn’t tested/used anywhere. It might be worth exploring this path if we want to get the space savings at some point though) llvm-svn: 249694
* AVX512: vpextrb/w/d/q and vpinsrb/w/d/q implementation.Igor Breger2015-10-083-44/+173
| | | | | | | | | This instructions doesn't have intrincis. Added tests for lowering and encoding. Differential Revision: http://reviews.llvm.org/D12317 llvm-svn: 249688
* [X86] Fix wrong treatment of multi-lane blends in BUILD_VECTORtoBlendMask()Michael Kuperstein2015-10-081-2/+12
| | | | | | | | | | | This fixes two separate bugs: 1) The mask for the high lane was not set correctly. That fixes PR24532. 2) The transformation should bail out if it believes it involves more than 2 lanes, as it does not currently do anything sensible in this case. Differential Revision: http://reviews.llvm.org/D13505 llvm-svn: 249669
* [WinEH] Fix 32-bit funclet epilogues in the presence of dynamic allocasReid Kleckner2015-10-071-2/+5
| | | | | | | | In particular, passing non-trivially copyable objects by value on win32 uses a dynamic alloca (inalloca). We would clobber ESP in the epilogue and end up returning to outer space. llvm-svn: 249637
* Test commit access. Fixed comment to have correct input parameter name andKevin B. Smith2015-10-071-1/+1
| | | | | | period termination. llvm-svn: 249571
* [X86] Emit .cfi_escape GNU_ARGS_SIZE when adjusting the stack before callsMichael Kuperstein2015-10-071-2/+19
| | | | | | | | | | | | When outgoing function arguments are passed using push instructions, and EH is enabled, we may need to indicate to the stack unwinder that the stack pointer was adjusted before the call. This should fix the exception handling issues in PR24792. Differential Revision: http://reviews.llvm.org/D13132 llvm-svn: 249522
* AVX512: Change encoding of vpshuflw and vpshufhw instructions. Implement WIG ↵Igor Breger2015-10-071-3/+2
| | | | | | | | | | as W0 and not W1, like all other instruction have been implemented. Add encoding tests. Differential Revision: http://reviews.llvm.org/D13471 llvm-svn: 249521
* Fix Clang-tidy modernize-use-nullptr warnings in source directories and ↵Hans Wennborg2015-10-061-4/+5
| | | | | | | | | | generated files; other minor cleanups. Patch by Eugene Zelenko! Differential Revision: http://reviews.llvm.org/D13321 llvm-svn: 249482
* [WinEH] Recognize CoreCLR personality functionJoseph Tremoulet2015-10-062-6/+6
| | | | | | | | | | | | | | | Summary: - Add CoreCLR to if/else ladders and switches as appropriate. - Rename isMSVCEHPersonality to isFuncletEHPersonality to better reflect what it captures. Reviewers: majnemer, andrew.w.kaylor, rnk Subscribers: pgavlin, AndyAyers, llvm-commits Differential Revision: http://reviews.llvm.org/D13449 llvm-svn: 249455
* [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range ↵Craig Topper2015-10-061-1/+7
| | | | | | | | 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted. Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister. llvm-svn: 249370
* [X86] Remove unnecessary AddComplexity directive. The instruction is already ↵Craig Topper2015-10-061-1/+0
| | | | | | wrapped in the equivalent earlier. NFC llvm-svn: 249369
* [WinEH] Update CATCHRET's operand to match its successorDavid Majnemer2015-10-051-0/+1
| | | | | | | | | | | | The CATCHRET operand did not match the MachineFunction's CFG. This mismatch happened because FrameLowering created a new MachineBasicBlock and updated the CFG but forgot to update the CATCHRET operand. Let's make sure this doesn't happen again by strengthing the funclet membership analysis: it can now reason about the membership of all basic blocks, not just those inside of funclets. llvm-svn: 249344
* AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.Igor Breger2015-10-044-61/+102
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12690 llvm-svn: 249261
* [X86] Lower SEXTLOAD using SIGN_EXTEND_VECTOR_INREG. NCI.Simon Pilgrim2015-10-031-22/+5
| | | | | | The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication. llvm-svn: 249243
* Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts ↵Andrea Di Biagio2015-10-021-0/+24
| | | | | | | | | | | | | | | | | | | | | | | between 128/256-bit vector types." This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Originally reviewed here: http://reviews.llvm.org/D13347 llvm-svn: 249147
* Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between ↵Andrea Di Biagio2015-10-021-24/+0
| | | | | | | | | 128/256-bit vector types. r249121 caused a Clang test failure (avx2-buitins.c). Revert r249121 while I keep investigating on the reason why that test failed. llvm-svn: 249124
* [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit ↵Andrea Di Biagio2015-10-021-0/+24
| | | | | | | | | | | | | | | | | | | | | | | vector types. This patch teaches FastIsel the following two things: 1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types; 2) On AVX, no instructions are needed for bitcasts between 256-bit vector types. Example: %1 = bitcast <4 x i31> %V to <2 x i64> Before (-fast-isel -fast-isel-abort=1): FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64> Now we don't fall back to SelectionDAG and we correctly fold that computation propagating the register associated to %V. Differential Revision: http://reviews.llvm.org/D13347 llvm-svn: 249121
* [WinEH] Emit __C_specific_handler tables for the new IRReid Kleckner2015-10-011-4/+15
| | | | | | | | | | | | We emit denormalized tables, where every range of invokes in the same state gets a complete list of EH action entries. This is significantly simpler than trying to infer the correct nested scoping structure from the MI. Fortunately, for SEH, the nesting structure is really just a size optimization. With this, some basic __try / __except examples work. llvm-svn: 249078
OpenPOWER on IntegriCloud