summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* MC: rename EmitWin64EH routinesSaleem Abdulrasool2014-06-292-15/+13
| | | | | | | | | | | | | | | Rename the routines to reflect the reality that they are more related to call frame information than to Win64 EH. Although EH is implemented in an intertwined manner by augmenting with an exception handler and an associated parameter, the majority of these routines emit information required to unwind the frames. This also helps identify that these routines are generic for most windows platforms (they apply equally to nearly all architectures except x86) although the encoding of the information is architecture dependent. Unwinding data is emitted via EmitWinCFI* and exception handling information via EmitWinEH*. llvm-svn: 211994
* [x86] Fix a bug in the v8i16 shuffling exposed by the new splat-likeChandler Carruth2014-06-281-1/+1
| | | | | | | | | | | lowering for v16i8. ASan and some bots caught this bug with existing test cases. Fixing it even fixed a miscompile with one of the test cases. I'm still a bit suspicious of this test case as I've not taken a proper amount of time to think about it, but the fix here is strict goodness. llvm-svn: 211976
* [x86] Add handling for splat-like widenings of v16i8 shuffles.Chandler Carruth2014-06-281-0/+80
| | | | | | | | | | | | | | | | These show up really frequently, not the least with actual splats. =] We lowered these quite badly before. The new code path tries to widen i8 shuffles to i16 shuffles in a splat-like way. There are still some inefficiencies in our i16 splat logic though, so we aren't really done here. Also, for certain patterns (bit of a gather-and-splat) we still generate pretty silly code, and I've left a fixme for addressing it. However, I'm not actually worried about this code pattern as much. The old shuffle lowering generates a 29 instruction monstrosity for it that should execute much more slowly. llvm-svn: 211974
* [x86] Fix another bug hit when bootstrapping with the new shuffleChandler Carruth2014-06-271-1/+1
| | | | | | | | | | | | | lowering. For maximum irony, I had already discovered this bug, diagnosed it, and left FIXMEs about it in the test cases. =[ I just failed to go back over those until after i had reduced a bootstrap miscompile down to a single TU, stared at the assembly for an hour, and figured out the bug. Again. Oh well. llvm-svn: 211955
* [x86] Fix a miscompile in the new shuffle lowering uncovered byChandler Carruth2014-06-271-13/+13
| | | | | | | | | a bootstrap. I managed to mis-remember how PACKUS worked on x86, and was using undef for the high bytes instead of zero. The fix is fairly obvious. llvm-svn: 211922
* [FastISel][X86] Fix typos.Juergen Ributzka2014-06-271-13/+13
| | | | llvm-svn: 211911
* Clean up unused variable warning in release build.Alexander Kornienko2014-06-271-0/+1
| | | | llvm-svn: 211902
* [x86] Clean up some unused variables, especially in release builds.Chandler Carruth2014-06-271-9/+6
| | | | llvm-svn: 211894
* [x86] Teach the target combine step to aggressively fold pshufd insturcions.Chandler Carruth2014-06-271-11/+77
| | | | | | | | | | | | | Summary: This allows it to fold pshufd instructions across intervening half-shuffles and other noise. This pattern actually shows up in the generic lowering tests, but I've also added direct tests using intrinsics to make sure that the specific desired functionality is working even if the lowering stuff changes in the future. Differential Revision: http://reviews.llvm.org/D4292 llvm-svn: 211892
* [x86] Teach the target-specific combining how to aggressively foldChandler Carruth2014-06-271-0/+90
| | | | | | | | | | | | | | | | | | half-shuffles, even looking through intervening instructions in a chain. Summary: This doesn't happen to show up with any test cases I've found for the current shuffle lowering, but previous attempts would benefit from this and it seems generally useful. I've tested it directly using intrinsics, which also shows that it will work with hand vectorized code as well. Note that even though pshufd isn't directly used in these tests, it gets exercised because we combine some of the half shuffles into a pshufd first, and then merge them. Differential Revision: http://reviews.llvm.org/D4291 llvm-svn: 211890
* [x86] Teach the X86 backend to DAG-combine SSE2 shuffles that areChandler Carruth2014-06-271-1/+101
| | | | | | | | | | | | | | | | | trivially redundant. This fixes several cases in the new vector shuffle lowering algorithm which would generate redundant shuffle instructions for the sake of simplicity. I'm also deleting a testcase which was somewhat ridiculous. It was checking for a bug in 2007 about incorrectly transforming shuffles by looking for the string "-86" in the output of a pretty substantial function. This test case doesn't seem to have any value at this point. Differential Revision: http://reviews.llvm.org/D4240 llvm-svn: 211889
* [x86] Begin a significant overhaul of how vector lowering is done in theChandler Carruth2014-06-271-0/+1029
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86 backend. This sketches out a new code path for vector lowering, hidden behind an off-by-default flag while it is under development. The fundamental idea behind the new code path is to aggressively break down the problem space in ways that ease selecting the odd set of instructions available on x86, and carefully avoid scalarizing code even when forced to use older ISAs. Notably, this starts off restricting itself to SSE2 and implements the complete vector shuffle and blend space for 128-bit vectors in SSE2 without scalarizing. The plan is to layer on top of this ISA extensions where we can bail out of the complex SSE2 lowering and opt for a cheaper, specialized instruction (or set of instructions). It also needs to be generalized to AVX and AVX512 vector widths. Currently, this does a decent but not perfect job for SSE2. There are some specific shortcomings that I plan to address: - We need a peephole combine to fold together shuffles where possible. There are cases where a previous shuffle could be modified slightly to arrange for elements to be in the correct position and a later shuffle eliminated. Doing this eagerly added quite a bit of complexity, and so my plan is to combine away these redundancies afterward. - There are a lot more clever ways to use unpck and pack that need to be added. This is essential for real world shuffles as it turns out... Once SSE2 is polished a bit I should be able to get interesting numbers on performance improvements on benchmarks conducive to vectorization. All of this will be off by default until it is functionally equivalent of course. Differential Revision: http://reviews.llvm.org/D4225 llvm-svn: 211888
* Rename getX86ConditonCode -> getX86ConditionCodeCraig Topper2014-06-271-5/+5
| | | | llvm-svn: 211869
* [X86] AVX512: Add vbroadcasti*Adam Nemet2014-06-271-0/+22
| | | | | | | | | For now I used a separate template for these sub-vector/tuple broadcasts rather than sharing the mem variants with avx512_int_broadcast_rm. <rdar://problem/17402869> llvm-svn: 211828
* Revert "Introduce a string_ostream string builder facilty"Alp Toker2014-06-263-4/+8
| | | | | | Temporarily back out commits r211749, r211752 and r211754. llvm-svn: 211814
* Remove extraneous includes from the target machines.Eric Christopher2014-06-261-4/+0
| | | | llvm-svn: 211800
* Silence a warning due to a comparison between signed and unsigned.Andrea Di Biagio2014-06-261-1/+1
| | | | | | No functional change intended. llvm-svn: 211782
* [X86] Improve the selection of SSE3/AVX addsub instructions. Andrea Di Biagio2014-06-261-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | This patch teaches the backend how to canonicalize a shuffle vectors according to the rule: - (shuffle (FADD A, B), (FSUB A, B), Mask) -> (shuffle (FSUB A, -B), (FADD A, -B), Mask) Where 'Mask' is: <0,5,2,7> ;; for v4f32 and v4f64 shuffles. <0,3> ;; for v2f64 shuffles. <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles. In general, ISel only knows how to pattern-match a canonical 'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction. This new rule allows to convert a non-canonical dag sequence into a canonical one that will be matched by a single ADDSUB at ISel stage. The idea of converting a non-canonical ADDSUB into a canonical one by swapping the first two operands of the shuffle, and then negating the second operand of the FADD and FSUB, was originally proposed by Hal Finkel. llvm-svn: 211771
* [X86] AVX512: Fix asm syntax for packed vcmpAdam Nemet2014-06-261-3/+3
| | | | | | | | The *_alt defs for vcmp are used by the InstParser (the asm string in the main def is used by the InstPrinter) . The former was accepting vector registers as destination rather than mask registers. llvm-svn: 211750
* Introduce a string_ostream string builder faciltyAlp Toker2014-06-263-8/+4
| | | | | | | | | | | | | | | | | | | | string_ostream is a safe and efficient string builder that combines opaque stack storage with a built-in ostream interface. small_string_ostream<bytes> additionally permits an explicit stack storage size other than the default 128 bytes to be provided. Beyond that, storage is transferred to the heap. This convenient class can be used in most places an std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair would previously have been used, in order to guarantee consistent access without byte truncation. The patch also converts much of LLVM to use the new facility. These changes include several probable bug fixes for truncated output, a programming error that's no longer possible with the new interface. llvm-svn: 211749
* [FastISel][X86] More refactoring of select lowering and XALU folding. NFC.Juergen Ributzka2014-06-251-83/+55
| | | | llvm-svn: 211740
* [FastISel][X86] Refactor XALU folding. NFC.Juergen Ributzka2014-06-251-124/+70
| | | | llvm-svn: 211735
* [FastISel][X86] Only fold the cmp into the select when both instructions are ↵Juergen Ributzka2014-06-251-5/+15
| | | | | | | | | | | | in the same basic block. If the cmp is in a different basic block, then it is possible that not all operands of that compare have defined registers. This can happen when one of the operands to the cmp is a load and the load gets folded into the cmp. In this case FastISel will skip the load instruction and the vreg is never defined. llvm-svn: 211730
* [X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP ↵Andrea Di Biagio2014-06-252-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | (or VPERM2X128). This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to the check for 'isBlendMask'; the idea is that, when possible, we should firstly check if a shuffle performs a blend, and in case, try to lower it into a BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128. In general: - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128; - BLENDPS/D instructions tend to always have better 'reciprocal throughput' than the equivalent SHUFPS/D; - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more than one execution port. This patch: - Moves the check for 'isBlendMask' immediately before the check for 'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE'; - Updates existing tests for sse/avx shuffle/blend instructions to verify that we select (v)blendps/d when possible (instead of (v)shufps/d or vperm2f128). llvm-svn: 211720
* Fix indentation.Juergen Ributzka2014-06-251-7/+7
| | | | llvm-svn: 211717
* [x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions.Chandler Carruth2014-06-251-0/+12
| | | | llvm-svn: 211694
* Re-apply r211399, "Generate native unwind info on Win64" with a fix to ↵NAKAMURA Takumi2014-06-257-113/+329
| | | | | | | | | | | | | | | | | | | | | | | ignore SEH pseudo ops in X86 JIT emitter. -- This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211691
* Reformat.NAKAMURA Takumi2014-06-252-9/+10
| | | | llvm-svn: 211689
* [X86] Add target combine rule to select ADDSUB instructions from a build_vectorAndrea Di Biagio2014-06-252-1/+138
| | | | | | | | | | | | | | | | | This patch teaches the backend how to combine a build_vector that implements an 'addsub' between packed float vectors into a sequence of vector add and vector sub followed by a VSELECT. The new VSELECT is expected to be lowered into a BLENDI. At ISel stage, the sequence 'vector add + vector sub + BLENDI' is pattern-matched against ISel patterns added at r211427 to select 'addsub' instructions. Added three more ISel patterns for ADDSUB. Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub' instructions. llvm-svn: 211679
* [FastISel][X86] Fold XALU condition into branch and compare.Juergen Ributzka2014-06-241-0/+150
| | | | | | | Optimize the codegen of select and branch instructions to directly use the EFLAGS from the {s|u}{add|sub|mul}.with.overflow intrinsics. llvm-svn: 211645
* vpblend intrinsics combines as shifts intrinsics due to absence return stmt ↵Robert Khasanov2014-06-241-0/+2
| | | | | | | | | | between them Fix PR20088 Differential Revision: http://reviews.llvm.org/D4277 llvm-svn: 211617
* [Disasm][AVX512] Implement decoding of top bit for non-destructive reg fieldsAdam Nemet2014-06-241-1/+2
| | | | | | | | | V' bit in the P2 byte of the EVEX prefix provides the top bit of the NDD and NDS register fields. This was simply not used in the decoder until now. Fixes <rdar://problem/17402661> llvm-svn: 211565
* [FastISel][X86] Lower unsupported selects to control-flow.Juergen Ributzka2014-06-231-0/+71
| | | | | | | | The extends the select lowering coverage by emiting pseudo cmov instructions. These insturction will be later on lowered to control-flow to simulate the select. llvm-svn: 211545
* [FastISel][X86] Add support for floating-point select.Juergen Ributzka2014-06-231-0/+128
| | | | | | | | | | This extends the select lowering to support floating-point selects. The lowering depends on SSE instructions and that the conditon comes from a floating-point compare. Under this conditions it is possible to emit an optimized instruction sequence that doesn't require any branches to simulate the select. llvm-svn: 211544
* [FastISel][X86] Optimize selects when the condition comes from a compare.Juergen Ributzka2014-06-233-37/+152
| | | | | | | Optimize the select instructions sequence to use the EFLAGS directly from a compare when possible. llvm-svn: 211543
* Revert r211399, "Generate native unwind info on Win64"NAKAMURA Takumi2014-06-226-325/+117
| | | | | | It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64. llvm-svn: 211480
* Fix PR20087 by using the source index when changing the vector loadFilipe Cabecinhas2014-06-221-2/+2
| | | | llvm-svn: 211472
* [X86] Add ISel patterns to select SSE3/AVX ADDSUB instructions.Andrea Di Biagio2014-06-211-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions from a sequence of "vadd + vsub + blend". Example: /// typedef float float4 __attribute__((ext_vector_type(4))); float4 foo(float4 A, float4 B) { float4 X = A - B; float4 Y = A + B; return (float4){X[0], Y[1], X[2], Y[3]}; } /// Before this patch, (with flag -mcpu=corei7) llc produced the following assembly sequence: movaps %xmm0, %xmm2 addps %xmm1, %xmm2 subps %xmm1, %xmm0 blendps $10, %xmm2, %xmm0 With this patch, we now get a single addsubps %xmm1, %xmm0 llvm-svn: 211427
* Delete dead code.Rafael Espindola2014-06-201-17/+8
| | | | | | The compact unwind info is only used by code that knows it is supported. llvm-svn: 211412
* Don't produce eh_frame relocations when targeting the IOS simulator.Rafael Espindola2014-06-201-2/+3
| | | | | | First step for fixing pr19185. llvm-svn: 211404
* Generate native unwind info on Win64Reid Kleckner2014-06-206-117/+325
| | | | | | | | | | | | | | | | | | | | This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211399
* Add Support to Recognize and Vectorize NON SIMD instructions in SLPVectorizer.Karthik Bhat2014-06-201-8/+38
| | | | | | | | | This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and vectorizes them as vector shuffles if they are profitable. These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86. Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015 llvm-svn: 211339
* [x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDWChandler Carruth2014-06-204-21/+152
| | | | | | | | | | | | | | | instructions available as synthetic SDNodes PACKSS and PACKUS that will select to the correct instruction variants based on the return type. This allows us to use these rather important instructions when lowering vector shuffles. Also moves the relevant instruction definitions to be split out from the fully generic multiclasses to allow them to match these new SDNodes in the same way that the UNPCK instructions do. No functionality should actually be changed here. llvm-svn: 211332
* Fix typosAlp Toker2014-06-191-1/+1
| | | | llvm-svn: 211304
* [X86] Teach how to combine horizontal binop even in the presence of undefs.Andrea Di Biagio2014-06-191-40/+115
| | | | | | | | | | | | | | Before this change, the backend was unable to fold a build_vector dag node with UNDEF operands into a single horizontal add/sub. This patch teaches how to combine a build_vector with UNDEF operands into a horizontal add/sub when possible. The algorithm conservatively avoids to combine a build_vector with only a single non-UNDEF operand. Added test haddsub-undef.ll to verify that we correctly fold horizontal binop even in the presence of UNDEFs. llvm-svn: 211265
* MS asm: Properly handle quoted symbol namesDavid Majnemer2014-06-191-2/+4
| | | | | | | | | | | | | We would get confused by '@' characters in symbol names, we would mistake the text following them for the variant kind. When an identifier a string, the variant kind will never show up inside of it. Instead, check to see if there is a variant following the string. This fixes PR19965. llvm-svn: 211249
* [X86] AVX512: Add non-temporal storesAdam Nemet2014-06-181-0/+29
| | | | | | | | | | | Note that I followed the AVX2 convention here and didn't add LLVM intrinsics for stores. These can be generated with the nontemporal hint on LLVM IR stores (see new test). The GCC builtins are lowered directly into nontemporal stores. <rdar://problem/17082571> llvm-svn: 211176
* [X86] AVX512: Specify compressed displacement for vmovntdqaAdam Nemet2014-06-181-1/+1
| | | | | | | Use the max 64-bit element size with EVEX_CD8. This should work since element size is ignored for a full-vector access (FVM). llvm-svn: 211175
* Add pattern for unsigned v4i32->v4f64 convert on AVX512.Cameron McInally2014-06-181-0/+4
| | | | llvm-svn: 211164
* Allow X86FastIsel to cope with 64 bit absolute relocationsLouis Gerbarg2014-06-171-10/+12
| | | | | | | | | | | | This patch is a follow up to r211040 & r211052. Rather than bailing out of fast isel this patch will generate an alternate instruction (movabsq) instead of the leaq. While this will always have enough room to handle the 64 bit displacment it is generally over kill for internal symbols (most displacements will be within 32 bits) but since we have no way of communicating the code model to the the assmebler in order to avoid flagging an absolute leal/leaq as illegal when using a symbolic displacement. llvm-svn: 211130
OpenPOWER on IntegriCloud