path: root/llvm/test/CodeGen/X86
Commit history: message, author, date, files changed, lines changed
* NFC - Typo fixes in X86 flags-copy-lowering.mir test (Gabor Buella, 2018-07-07, 1 file, -3/+3)
  Differential Revision: https://reviews.llvm.org/D48934
  llvm-svn: 336484
* Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084. (Nico Weber, 2018-07-06, 1 file, -20/+20)
  llvm-svn: 336453
* [SelectionDAG] https://reviews.llvm.org/D48278 (Diogo N. Sampaio, 2018-07-06, 1 file, -20/+20)
  D48278: Allow to reduce redundant shift masks. For example:
    x1 = x & 0xAB00
    x2 = (x >> 8) & 0xAB
  can be reduced to:
    x1 = x & 0xAB00
    x2 = x1 >> 8
  It only allows folding when the masks and shift values are constants.
  llvm-svn: 336426
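For illustration only (not part of the commit), a minimal C sketch of the kind of source pattern the combine benefits; the function name is hypothetical:

    #include <stdint.h>

    /* Both results come from the same masked bits; after the combine the
       second value can be computed as x1 >> 8 instead of re-masking x. */
    uint32_t shift_mask_pair(uint32_t x, uint32_t *masked)
    {
        uint32_t x1 = x & 0xAB00u;       /* x & 0xAB00              */
        uint32_t x2 = (x >> 8) & 0xABu;  /* same bits, shifted by 8 */
        *masked = x1;
        return x2;
    }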
* [X86] Remove FMA4 scalar intrinsics. Use llvm.fma intrinsic instead. (Craig Topper, 2018-07-06, 3 files, -16/+28)
  The intrinsics can be implemented with a f32/f64 llvm.fma intrinsic and an insert into a zero vector.
  There are a couple regressions here due to SelectionDAG not being able to pull an fneg through an extract_vector_elt. I'm not super worried about this though, as InstCombine should be able to do it before we get to SelectionDAG.
  llvm-svn: 336416
* [X86] Remove all of the avx512 masked packed fma intrinsics. Use llvm.fma or unmasked 512-bit intrinsics with rounding mode. (Craig Topper, 2018-07-06, 3 files, -356/+1967)
  This upgrades all of the intrinsics to use fneg instructions to convert fma into fmsub/fnmsub/fnmadd/fmsubadd, and uses a select instruction for masking. This matches how clang uses the intrinsics these days.
  llvm-svn: 336409
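A hedged C-level illustration (hypothetical function name; assumes <immintrin.h> and -mavx512f): the masked fmadd intrinsic below is now emitted by clang as a plain fma plus a mask select in IR rather than a target-specific masked intrinsic:

    #include <immintrin.h>

    /* Lanes with the mask bit set get a*b+c; the other lanes keep a. */
    __m512 masked_fmadd(__m512 a, __mmask16 k, __m512 b, __m512 c)
    {
        return _mm512_mask_fmadd_ps(a, k, b, c);
    }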
* [X86] Clean up some of the avx512 masked fma tests to prepare for removing and autoupgrading. (Craig Topper, 2018-07-06, 3 files, -1217/+1099)
  - Split cases that call 2 intrinsics in the same case.
  - Remove testing mask3 and maskz intrinsics with an all-ones mask. These won't be interesting after the upgrade.
  - Restore test cases for some intrinsics that are marked for deletion but haven't been deleted yet.
  llvm-svn: 336408
* [x86] Add a test case to show missed vfnmadd generation. (Easwaran Raman, 2018-07-06, 1 file, -0/+25)
  llvm-svn: 336404
* [X86] Remove the last of the 'x86.fma.' intrinsics and autoupgrade them to 'llvm.fma'. (Craig Topper, 2018-07-05, 4 files, -184/+1586)
  Add upgrade tests for all. Still need to remove the AVX512 masked versions.
  llvm-svn: 336383
* [X86] Add SHUF128 to target shuffle decoding. (Craig Topper, 2018-07-05, 2 files, -65/+127)
  Differential Revision: https://reviews.llvm.org/D48954
  llvm-svn: 336376
* [X86][SSE] Add srem x, (1 << c) combine tests (Simon Pilgrim, 2018-07-05, 1 file, -0/+227)
  Now that D45806 has landed, we can start trying to avoid scalarizing srem by constant - these tests demonstrate some example cases.
  llvm-svn: 336360
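As a hedged sketch (not from the commit; assumes Clang/GCC vector extensions), this is the kind of vector srem-by-power-of-two the new tests exercise:

    #include <stdint.h>

    typedef int32_t v4si __attribute__((vector_size(16)));

    /* Signed remainder by a power-of-two splat constant; the goal is to
       lower this without scalarizing each lane. */
    v4si srem_pow2(v4si x)
    {
        return x % (v4si){16, 16, 16, 16};
    }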
* [AArch64, PowerPC, x86] add tests for signbit bit hacks; NFC (Sanjay Patel, 2018-07-05, 1 file, -0/+160)
  llvm-svn: 336348
* [X86][SSE] Add extra v16i16 shl x,c -> pmullw test (Simon Pilgrim, 2018-07-05, 1 file, -0/+27)
  We want to compare shifts with repeated vs. non-repeated v8i16 shuffle masks (for PBLENDW ymm).
  llvm-svn: 336333
* [X86] Remove X86 specific scalar FMA intrinsics and upgrade to target independent FMA and extractelement/insertelement. (Craig Topper, 2018-07-05, 4 files, -88/+104)
  llvm-svn: 336315
* [X86] Add support for combining FMSUB/FNMADD/FNMSUB ISD nodes with an fneg input. (Craig Topper, 2018-07-05, 3 files, -38/+14)
  Previously we could only negate the FMADD opcodes. This used to be mostly ok when we lowered FMA intrinsics during lowering. But with the move to llvm.fma from target specific intrinsics, we can combine (fneg (fma)) to (fmsub) earlier. So if we start with (fneg (fma (fneg))) we would get stuck at (fmsub (fneg)). This patch fixes that so we can also combine things like (fmsub (fneg)).
  llvm-svn: 336304
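A minimal C sketch of the shape being combined (hypothetical function name; assumes a target with FMA, e.g. -mfma): -fma(a, b, -c) starts as (fneg (fma a, b, (fneg c))), the inner fneg folds to an FMSUB node, and the new combine folds the outer fneg as well, since -(a*b - c) == -(a*b) + c:

    #include <math.h>

    /* Mathematically equal to -(a*b) + c, i.e. an fnmadd-style result. */
    double fneg_of_fma(double a, double b, double c)
    {
        return -fma(a, b, -c);
    }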
* [X86] Remove some of the packed FMA3 intrinsics since we no longer use them in clang. (Craig Topper, 2018-07-05, 2 files, -6/+18)
  There's a regression in here due to the inability to combine fneg inputs of X86ISD::FMSUB/FNMSUB/FNMADD nodes. More removals to come, but I wanted to stop and fix the regression that showed up in this first.
  llvm-svn: 336303
* [X86][SSE] Add v16i16 shl x,c -> pmullw test (Simon Pilgrim, 2018-07-04, 1 file, -3/+26)
  llvm-svn: 336277
* [X86][SSE] Add SSE2 target to some shift tests (Simon Pilgrim, 2018-07-04, 3 files, -275/+638)
  Show the difference in behaviour compared to SSE41 (no PMULLD, PBLENDW etc.).
  Raised by D48936
  llvm-svn: 336271
* [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (REAPPLIED) (Simon Pilgrim, 2018-07-04, 1 file, -22/+12)
  We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.
  Reapplied with a fixed (extra null tests) version of rL336113 after reversion in rL336189 - extra test case added at rL336247.
  llvm-svn: 336250
* [X86][SSE] Add reduced crash test case for r336113 - [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (Simon Pilgrim, 2018-07-04, 1 file, -0/+25)
  The patch was reverted at r336189 due to crashes.
  llvm-svn: 336247
* [ImplicitNullChecks] Check for rewrite of register used in 'test' instruction (Max Kazantsev, 2018-07-04, 1 file, -0/+49)
  The following code pattern:
    mov %rax, %rcx
    test %rax, %rax
    %rax = ....
    je throw_npe
    mov (%rcx), %r9
    mov (%rax), %r10
  gets transformed into the following incorrect code after the implicit null check pass:
    mov %rax, %rcx
    %rax = ....
    faulting_load_op("movl (%rax), %r10", throw_npe)
    mov (%rcx), %r9
  For the implicit null check pass, if the register that is checked for a null value (i.e. the register used in the 'test' instruction) is written into before the conditional jump, we should avoid doing the optimization.
  Patch by Surya Kumari Jangala!
  Differential Revision: https://reviews.llvm.org/D48627
  Reviewed By: skatkov
  llvm-svn: 336241
* [X86] Add tests for low/high bit clearing with different attributes. (Roman Lebedev, 2018-07-03, 2 files, -0/+2776)
  D48768 may turn some of these into shifts.
  Reviewers: spatel
  Reviewed By: spatel
  Subscribers: spatel, RKSimon, llvm-commits, craig.topper
  Differential Revision: https://reviews.llvm.org/D48767
  llvm-svn: 336224
* [X86][AsmParser] Don't consider %eip as a valid register outside of 32-bit mode. (Craig Topper, 2018-07-03, 1 file, -1/+1)
  This might make the error message added in r335668 unneeded, but I'm not sure yet. The check for RIP is technically unnecessary since RIP is in GR64, but that fact is kind of surprising, so be explicit.
  llvm-svn: 336217
* [DAGCombiner] visitSDIV - Permit MIN_SIGNED_VALUE in pow2 vector codegen (Simon Pilgrim, 2018-07-03, 1 file, -177/+206)
  Now that D45806 has landed, we can re-enable support for MIN_SIGNED_VALUE in the sdiv by pow2-constant code.
  llvm-svn: 336198
* Revert "[X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values"Benjamin Kramer2018-07-031-12/+22
| | | | | | This reverts commit r336113. It causes crashes. llvm-svn: 336189
* [X86] Add avx512vl command line to break-false-dep.ll (Craig Topper, 2018-07-03, 1 file, -2/+23)
  llvm-svn: 336169
* [X86] Add phony registers for high halves of regs with low halves (Krzysztof Parzyszek, 2018-07-02, 4 files, -4/+4)
  Add registers still missing after r328016 (D43353):
  - for bits 15-8 of SI, DI, BP, SP (*H), and R8-R15 (*BH),
  - for bits 31-16 of R8-R15 (*WH).
  Thanks to Craig Topper for pointing it out.
  llvm-svn: 336134
* [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned. (Craig Topper, 2018-07-02, 2 files, -3/+3)
  Similarly, don't fold fp128 loads into SSE instructions if the load isn't aligned, unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.
  Should fix PR38001.
  llvm-svn: 336121
* [X86][SSE] Blend any v8i16/v4i32 shift with 2 shift unique values (Simon Pilgrim, 2018-07-02, 1 file, -22/+12)
  We were only doing this for basic blends, despite shuffle lowering now being good enough to handle more complex blends. This means that the two v8i16 splat shifts are performed in parallel instead of serially as the general shift case.
  llvm-svn: 336113
* [X86][SSE] Add v8i16 shift test for 2 shift values that doesn't match basic blend (Simon Pilgrim, 2018-07-02, 1 file, -0/+32)
  We have special case support for 2 shift values for basic blends, but irregular shift patterns end up using the generic lowering, despite shuffle lowering being good enough to handle more complex blends.
  llvm-svn: 336112
* [X86] Fix a few test names in avx512-intrinsics-fast-isel.ll to match their clang intrinsic names. (Craig Topper, 2018-07-01, 1 file, -8/+8)
  I thought I fixed these yesterday, but I guess I missed a few.
  llvm-svn: 336071
* [DAGCombiner] Handle correctly non-splat power of 2 -1 divisor (PR37119) (Simon Pilgrim, 2018-06-30, 1 file, -268/+80)
  The combine added in commit 329525 overlooked the case where one, but not all, of the divisor elements is -1; -1 is the only power of two value for which the sdiv expansion recipe breaks.
  Thanks to @zvi for the original patch.
  Differential Revision: https://reviews.llvm.org/D45806
  llvm-svn: 336048
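For illustration (not from the patch; assumes Clang/GCC vector extensions), a non-splat power-of-two divisor with a -1 lane, which is the case the fix handles:

    #include <stdint.h>

    typedef int32_t v4si __attribute__((vector_size(16)));

    /* A non-splat power-of-two-style divisor with a -1 lane; the -1 lane
       needs special handling in the pow2 expansion (x / -1 is simply -x). */
    v4si sdiv_mixed_pow2(v4si x)
    {
        return x / (v4si){2, 4, -1, 8};
    }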
* [X86] Update some avx512 fast-isel tests to match their real clang IRgen. (Craig Topper, 2018-06-30, 2 files, -332/+353)
  Especially of note were test_mm_mask_set1_epi64 and the other set1 tests, which were truncating the element to be broadcasted to i8 and broadcasting that instead of a whole 64-bit value.
  Some of the others were just correcting mask sizes on parameters due to bugs in the clang test case they were generated from that have now been fixed.
  Some were converting i8 to <4 x i1>/<2 x i1> by truncating to i4/i2 and then bitcasting. But the clang codegen is bitcast to <8 x i1>, then extract to <4 x i1>/<2 x i1>. This is likely to incur less trouble from the integer type legalizer in the backend.
  llvm-svn: 336045
* [X86] Change some check-prefixes from X32 to X86 to match the FileCheck command line. (Craig Topper, 2018-06-30, 1 file, -48/+48)
  I think this test changed and these test cases were created around the same time and missed the change.
  llvm-svn: 336044
* [X86] Remove test cases from avx512vl-intrinsics-fast-isel.ll for intrinsics that don't really exist in clang. (Craig Topper, 2018-06-30, 1 file, -49/+0)
  llvm-svn: 336043
* [X86] Remove masking from avx512 rotate intrinsics. Use select in IR instead. (Craig Topper, 2018-06-30, 7 files, -232/+2526)
  llvm-svn: 336035
* [X86] Limit the number of target specific nodes emitted in LowerShiftParts (Craig Topper, 2018-06-29, 2 files, -42/+22)
  The important part is the creation of the SHLD/SHRD nodes. The compare and the conditional move can use target independent nodes that can be legalized on their own. This gives some opportunities to trigger the optimizations present in the lowering for those things. And it's just better to limit the number of places we emit target specific nodes.
  The changed test cases still aren't optimal.
  Differential Revision: https://reviews.llvm.org/D48619
  llvm-svn: 335998
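A hedged C example (hypothetical function name) of the operation LowerShiftParts handles: a 64-bit variable shift on a 32-bit x86 target (e.g. clang -m32 -O2) is split into an SHLD for the high half plus the compare/conditional move mentioned above for shift amounts of 32 or more:

    #include <stdint.h>

    /* Variable 64-bit left shift; on i386 this becomes "shift parts". */
    uint64_t shl64(uint64_t x, unsigned n)
    {
        return x << (n & 63);   /* keep the shift amount in range */
    }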
* [X86][SSE] Support v16i8/v32i8 vector rotations (Simon Pilgrim, 2018-06-29, 3 files, -1341/+1130)
  This uses the same technique as for shifts - split the rotation into 4/2/1-bit partial rotations and select those partials based on the amount bit, making use of PBLENDVB if available.
  This halves the use of PBLENDVB compared to expanding to shifts, which can be a slow op.
  Unfortunately I haven't found a decent way to share much of this code with the shift equivalent.
  Differential Revision: https://reviews.llvm.org/D48655
  llvm-svn: 335957
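A rough C sketch (not from the patch; assumes Clang/GCC vector extensions) of a per-byte variable rotate, the kind of v16i8 operation the new lowering handles by blending 4/2/1-bit partial rotates:

    #include <stdint.h>

    typedef uint8_t v16u8 __attribute__((vector_size(16)));

    /* Rotate each byte left by its own amount (amounts taken modulo 8). */
    v16u8 rotl_bytes(v16u8 x, v16u8 r)
    {
        r &= 7;
        return (x << r) | (x >> ((8 - r) & 7));
    }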
* [X86] Remove masking from the avx512 packed sqrt intrinsics. Use select in IR instead. (Craig Topper, 2018-06-29, 3 files, -8/+490)
  While there, improve the coverage of the intrinsic testing and add fast-isel tests.
  llvm-svn: 335944
* [COFF] Fix constant sharing regression for MinGW (Martin Storsjo, 2018-06-28, 1 file, -0/+9)
  This fixes a regression since SVN r334523, where the object files built targeting MinGW were rejected by GNU binutils tools. Prior to that commit, we only put constants in comdat for MSVC configurations.
  Differential Revision: https://reviews.llvm.org/D48567
  llvm-svn: 335918
* [X86] Suppress load folding into and/or/xor if it will prevent matching btr/bts/btc. (Craig Topper, 2018-06-28, 1 file, -68/+42)
  This is a follow-up to r335753. At the time I forgot about isProfitableToFold, which makes this pretty easy.
  Differential Revision: https://reviews.llvm.org/D48706
  llvm-svn: 335895
* Revert "Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC ↵Jonas Devlieghere2018-06-285-408/+8
| | | | | | | | | | | | | code models"" Reverting because this is causing failures in the LLDB test suite on GreenDragon. LLVM ERROR: unsupported relocation with subtraction expression, symbol '__GLOBAL_OFFSET_TABLE_' can not be undefined in a subtraction expression llvm-svn: 335894
* [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (REAPPLIED) (Simon Pilgrim, 2018-06-28, 1 file, -6/+19)
  We could get away with it for constant folded cases, but not for rL335719.
  Thanks to Krzysztof Parzyszek for noticing.
  Reapply original commit rL335821, which was reverted at rL335871 due to a WebAssembly bug that was fixed at rL335884.
  llvm-svn: 335886
* SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O) (Matthias Braun, 2018-06-28, 2 files, -2/+29)
  Add the NoTrapAfterNoreturn target option, which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size; comments suggest it is not possible to enable it for the other users of TrapUnreachable.
  rdar://41530228
  Differential Revision: https://reviews.llvm.org/D48674
  llvm-svn: 335877
* Revert "[DAGCombiner] Ensure we use the correct CC result type in visitSDIV"Haojian Wu2018-06-281-19/+6
| | | | | | | | This reverts commit r335821. This crashes the webassembly test, run "ninja check-llvm-codegen-webassembly" to reproduce. llvm-svn: 335871
* [DAGCombiner] Ensure we use the correct CC result type in visitSDIV (Simon Pilgrim, 2018-06-28, 1 file, -6/+19)
  We could get away with it for constant folded cases, but not for rL335719.
  Thanks to Krzysztof Parzyszek for noticing.
  llvm-svn: 335821
* [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros (Sanjay Patel, 2018-06-27, 4 files, -29/+36)
  As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not:
    #include <stdio.h>
    int main(int argc) {
      float x;
      x = -0.8 * argc;
      printf("%f\n", (float)((int)x));
      return 0;
    }
    $ clang -O0 -mavx fp.c ; ./a.out
    0.000000
    $ clang -O1 -mavx fp.c ; ./a.out
    -0.000000
  Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option.
  Differential Revision: https://reviews.llvm.org/D48085
  llvm-svn: 335761
* [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position (Craig Topper, 2018-06-27, 1 file, -68/+33)
  If we are just modifying a single bit at a variable bit position, we can use the BT* instructions to make the change instead of shifting a 1 (or rotating a -1) and doing a binop. These instructions also ignore the upper bits of their index input, so we can also remove an 'and' if one is present on the index.
  Fixes PR37938.
  llvm-svn: 335754
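For illustration (hypothetical function names, not from the commit), the C patterns this targets; with the change they can lower to bts/btr/btc, and the '& 63' on the index becomes redundant because BT* already ignores the upper bits of the bit index:

    #include <stdint.h>

    /* Single-bit set/clear/complement at a variable bit position. */
    uint64_t set_bit(uint64_t x, unsigned i)   { return x |  (1ULL << (i & 63)); }
    uint64_t clear_bit(uint64_t x, unsigned i) { return x & ~(1ULL << (i & 63)); }
    uint64_t flip_bit(uint64_t x, unsigned i)  { return x ^  (1ULL << (i & 63)); }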
* [X86] Add test cases for D48606. (Craig Topper, 2018-06-27, 1 file, -0/+995)
  llvm-svn: 335753
* [X86][SSE] Add missing AVX512 rotation tests (Simon Pilgrim, 2018-06-27, 3 files, -230/+943)
  Increase coverage to make sure we're not doing anything stupid without AVX512BW.
  llvm-svn: 335746
* [X86] Rename the autoupgraded packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. (Craig Topper, 2018-06-27, 10 files, -93/+93)
  I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".
  llvm-svn: 335744