path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
Commit message | Author | Age | Files | Lines
...
* [X86][SSE] Add support for combining PINSRW into a target shuffle. | Simon Pilgrim | 2017-01-31 | 1 | -2/+31
    Also add the ability to recognise PINSR(Vex, 0, Idx). Target shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat. The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.
    llvm-svn: 293627
* [X86] Remove 'else' after 'return'. NFC | Craig Topper | 2017-01-31 | 1 | -1/+1
    llvm-svn: 293589
* [X86][SSE] Fix unsigned <= 0 warning in assert. NFCI. | Simon Pilgrim | 2017-01-30 | 1 | -2/+2
    Thanks to @mkuper
    llvm-svn: 293561
* [X86][SSE] Generalize the number of decoded shuffle inputs. NFCI. | Simon Pilgrim | 2017-01-30 | 1 | -22/+8
    combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs, but everything before it now supports any number of shuffle inputs. This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.
    llvm-svn: 293560
* [X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles | Simon Pilgrim | 2017-01-30 | 1 | -0/+20
    llvm-svn: 293500
* [X86][MCU] Minor bug fix for r293469 + test case | Asaf Badouh | 2017-01-30 | 1 | -1/+1
    llvm-svn: 293478
* [X86][MCU] replace select with bit manipulation instead of branches | Asaf Badouh | 2017-01-30 | 1 | -2/+41
    Differential Revision: https://reviews.llvm.org/D28354
    llvm-svn: 293469
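    The commit summary doesn't spell out the exact sequence, but the classic branchless-select identity it builds on can be modelled in standalone C++; the helper name below is illustrative, not from the patch:

        #include <cassert>
        #include <cstdint>

        // Widen a 0/1 condition into an all-zeros or all-ones mask, then
        // blend the two values with bitwise ops instead of a branch. The
        // actual DAG patterns chosen in r293469 may differ in detail.
        uint32_t selectViaMask(bool Cond, uint32_t A, uint32_t B) {
          uint32_t Mask = -static_cast<uint32_t>(Cond); // 0 or 0xFFFFFFFF
          return (A & Mask) | (B & ~Mask);              // Cond ? A : B
        }

        int main() {
          assert(selectViaMask(true, 7, 9) == 7);
          assert(selectViaMask(false, 7, 9) == 9);
        }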
* [AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSRLI shift within elements while KSHIFT moves whole elements. | Craig Topper | 2017-01-30 | 1 | -21/+21
    llvm-svn: 293448
* [AVX-512] Fix lowering for mask register concatenation with undef in the lower half. | Craig Topper | 2017-01-29 | 1 | -1/+1
    Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width.
    llvm-svn: 293446
* [X86][SSE] Lower scalar_to_vector(0) to zero vector | Simon Pilgrim | 2017-01-29 | 1 | -4/+15
    Replaces an xor+movd/movq with an xorps, which is shorter in codesize, avoids an int-fpu transfer, allows modern cores to fast-path the result during decode and helps other combines recognise an all-zero vector. The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise that the upper elts are undef, but this doesn't seem to be a problem.
    Differential Revision: https://reviews.llvm.org/D29097
    llvm-svn: 293438
* [X86 Codegen] Fixed a bug in unsigned saturation | Elena Demikhovsky | 2017-01-29 | 1 | -23/+1
    PACKUSWB converts signed words to unsigned bytes with saturation (the same applies to PACKUSDW for dwords), so it can't be used for the umin+truncate pattern. The AVX-512 VPMOVUS* instructions fit the pattern since they convert unsigned to unsigned.
    See https://llvm.org/bugs/show_bug.cgi?id=31773
    Differential Revision: https://reviews.llvm.org/D29196
    llvm-svn: 293431
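    A scalar model of one word element makes the mismatch concrete (a standalone sketch, not from the commit): 0xFF80 is -128 as a signed word, so PACKUSWB clamps it to 0, while umin followed by truncation yields 255.

        #include <algorithm>
        #include <cassert>
        #include <cstdint>

        // PACKUSWB semantics: saturate a *signed* word to an unsigned byte.
        uint8_t packusElement(uint16_t W) {
          int16_t S = static_cast<int16_t>(W);
          return static_cast<uint8_t>(std::clamp<int16_t>(S, 0, 255));
        }
        // The pattern being matched: *unsigned* min, then truncate.
        uint8_t uminTrunc(uint16_t W) {
          return static_cast<uint8_t>(std::min<uint16_t>(W, 255));
        }

        int main() {
          assert(packusElement(0xFF80) == 0); // -128 saturates to 0
          assert(uminTrunc(0xFF80) == 255);   // 65408 clamps to 255
        }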
* [X86] Fix vector ANDN matching to work correctly when both inputs to the AND are XORs. | Craig Topper | 2017-01-28 | 1 | -12/+7
    llvm-svn: 293403
* [X86][SSE] Add support for combining ANDNP byte masks with target shuffles | Simon Pilgrim | 2017-01-26 | 1 | -6/+26
    llvm-svn: 293178
* [X86][SSE] Pull out target shuffle resolve code into helper. NFCI. | Simon Pilgrim | 2017-01-26 | 1 | -14/+21
    Pulled out code that removed unused inputs from a target shuffle mask into a helper function, to allow it to be reused in a future commit.
    llvm-svn: 293175
* [AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it. | Craig Topper | 2017-01-26 | 1 | -1/+1
    llvm-svn: 293157
* [X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand. | Craig Topper | 2017-01-26 | 1 | -2/+2
    llvm-svn: 293156
* [X86][SSE] Add explicit braces to avoid -Wdangling-else warning. | Martin Bohme | 2017-01-24 | 1 | -1/+2
    Reviewers: RKSimon
    Subscribers: llvm-commits, igorb
    Differential Revision: https://reviews.llvm.org/D29076
    llvm-svn: 292924
* Fix unused variable warning | Simon Pilgrim | 2017-01-24 | 1 | -2/+2
    llvm-svn: 292921
* [X86][SSE] Add support for constant folding vector arithmetic shift by immediates | Simon Pilgrim | 2017-01-24 | 1 | -6/+19
    llvm-svn: 292919
* [X86][SSE] Add support for constant folding vector logical shift by immediates | Simon Pilgrim | 2017-01-24 | 1 | -4/+20
    llvm-svn: 292915
* [X86] Remove unnecessary peekThroughBitcasts call that's already taken care of by the ISD::isBuildVectorAllOnes check below. | Craig Topper | 2017-01-24 | 1 | -2/+0
    llvm-svn: 292894
* [X86] Don't split v8i32 all ones values if only AVX1 is available. Keep it intact and split it at isel. | Craig Topper | 2017-01-24 | 1 | -22/+4
    This allows us to remove the check in ANDN combining that had to look through the extraction.
    llvm-svn: 292881
* [X86] Remove Undef handling from extractSubVector. This is now handled inside getNode. | Craig Topper | 2017-01-24 | 1 | -4/+0
    llvm-svn: 292877
* [X86][SSE] Add missing X86ISD::ANDNP combines. | Simon Pilgrim | 2017-01-22 | 1 | -0/+15
    llvm-svn: 292767
* [X86][SSE] Improve shuffle combining with zero insertions | Simon Pilgrim | 2017-01-22 | 1 | -0/+9
    Add support for handling shuffles with scalar_to_vector(0).
    llvm-svn: 292766
* [x86] avoid crashing with illegal vector type (PR31672) | Sanjay Patel | 2017-01-22 | 1 | -14/+26
    https://llvm.org/bugs/show_bug.cgi?id=31672
    llvm-svn: 292758
* [X86] Don't allow commuting to form phsub operations. | Craig Topper | 2017-01-21 | 1 | -2/+2
    Fixes PR31714.
    llvm-svn: 292713
* [X86][SSE] Improve comments describing combineTruncatedArithmetic. NFCI. | Simon Pilgrim | 2017-01-19 | 1 | -0/+7
    llvm-svn: 292502
* [X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended | Simon Pilgrim | 2017-01-19 | 1 | -6/+16
    As discussed on D28219 - it is profitable to combine trunc(binop(s/zext(x), s/zext(y))) to binop(trunc(s/zext(x)), trunc(s/zext(y))), assuming the trunc(ext()) will simplify further.
    llvm-svn: 292493
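    A standalone illustration with hypothetical scalar types (not from the patch): i8 values zero-extended to i32, added, then truncated to i16 give the same result as extending only to i16 and adding there, which is the narrower form the combine prefers.

        #include <cassert>
        #include <cstdint>

        // trunc(add(zext(x), zext(y))) == add(trunc(zext(x)), trunc(zext(y)));
        // the trunc(zext()) pairs then simplify to cheap narrow extensions.
        uint16_t wideThenTrunc(uint8_t X, uint8_t Y) {
          uint32_t Sum = static_cast<uint32_t>(X) + static_cast<uint32_t>(Y);
          return static_cast<uint16_t>(Sum);
        }
        uint16_t preTruncated(uint8_t X, uint8_t Y) {
          return static_cast<uint16_t>(static_cast<uint16_t>(X) +
                                       static_cast<uint16_t>(Y));
        }

        int main() {
          for (int X = 0; X < 256; ++X)
            for (int Y = 0; Y < 256; ++Y)
              assert(wideThenTrunc(X, Y) == preTruncated(X, Y));
        }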
* Recommitting unsigned saturation with a bugfix. | Elena Demikhovsky | 2017-01-19 | 1 | -0/+100
    A test case that crashed is added to avx512-trunc.ll. (PR31589)
    llvm-svn: 292479
* [AVX-512] Support ADD/SUB/MUL of mask vectors | Craig Topper | 2017-01-19 | 1 | -18/+19
    Summary: Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND. We already do this for scalar i1 operations, so I just extended it to vectors of i1.
    Reviewers: zvi, delena
    Reviewed By: delena
    Subscribers: guyblank, llvm-commits
    Differential Revision: https://reviews.llvm.org/D28888
    llvm-svn: 292474
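    The i1 identities the patch relies on can be checked exhaustively (a standalone sketch, not LLVM code):

        #include <cassert>

        // For 1-bit values arithmetic is mod 2: ADD and SUB are both XOR,
        // and MUL is AND. The patch applies the same per-element rewrite
        // to whole mask vectors via KXOR/KAND.
        int main() {
          for (int A = 0; A <= 1; ++A)
            for (int B = 0; B <= 1; ++B) {
              assert(((A + B) & 1) == (A ^ B)); // ADD -> KXOR
              assert(((A - B) & 1) == (A ^ B)); // SUB -> KXOR
              assert(((A * B) & 1) == (A & B)); // MUL -> KAND
            }
        }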
* [X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical. | Craig Topper | 2017-01-19 | 1 | -13/+3
    llvm-svn: 292469
* Revert r291670 because it introduces a crash. | Michael Kuperstein | 2017-01-18 | 1 | -97/+0
    r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one. PR31589 has the new reproducer.
    llvm-svn: 292444
* Revert r292404 due to buildbot failures. | Kirill Bobyrev | 2017-01-18 | 1 | -2/+2
    llvm-svn: 292407
* [X86] Minor code cleanup to fix several clang-tidy warnings. NFC | Kirill Bobyrev | 2017-01-18 | 1 | -2/+2
    llvm-svn: 292404
* [X86] Improve mul combine for negative multiplier (2^c - 1) | Michael Zuckerman | 2017-01-18 | 1 | -16/+31
    This patch improves the mul instruction combine function (combineMul) by adding a new layer of logic. In this patch, we add the ability to fold (mul x, -((1 << c) - 1)) or (mul x, -((1 << c) + 1)) into neg((x << c) - x) or neg((x << c) + x) respectively.
    Differential Revision: https://reviews.llvm.org/D28232
    llvm-svn: 292358
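    A quick sanity check of the folds with illustrative constants (not from the patch): -7 = -((1 << 3) - 1) and -9 = -((1 << 3) + 1).

        #include <cassert>
        #include <cstdint>

        // mul x, -((1 << c) - 1)  ->  neg((x << c) - x)
        // mul x, -((1 << c) + 1)  ->  neg((x << c) + x)
        int64_t mulMinus7(int64_t X) { return -((X << 3) - X); } // X * -7
        int64_t mulMinus9(int64_t X) { return -((X << 3) + X); } // X * -9

        int main() {
          // Non-negative test values keep the left shifts well defined
          // in pre-C++20 dialects.
          for (int64_t X : {0, 1, 5, 1000}) {
            assert(mulMinus7(X) == X * -7);
            assert(mulMinus9(X) == X * -9);
          }
        }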
* Revert r291640 change to fold X86 comparison with atomic_load_add. | Bob Wilson | 2017-01-17 | 1 | -22/+10
    Even with the fix from r291630, this still causes problems. I get widespread assertion failures in the Swift runtime's WeakRefCount::increment() function. I sent a reduced testcase in reply to the commit.
    llvm-svn: 292242
* [AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation. | Craig Topper | 2017-01-17 | 1 | -2/+18
    llvm-svn: 292201
* Fix blend mask by switching the side of the operand, since the Blend node uses the opposite mask to the Select node. | Michael Zuckerman | 2017-01-15 | 1 | -2/+2
    llvm-svn: 292066
* [X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop. | Craig Topper | 2017-01-14 | 1 | -2/+1
    llvm-svn: 291996
* [AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. | Craig Topper | 2017-01-14 | 1 | -9/+9
    Also add 32-bit mode command lines to the test case that exercises this, just to make sure we sanely handle the 64-bit immediate there. This fixes an undefined-behavior sanitizer failure from r291888.
    llvm-svn: 291994
* Apply clang-tidy's performance-unnecessary-value-param to LLVM. | Benjamin Kramer | 2017-01-13 | 1 | -5/+5
    With some minor manual fixes for using function_ref instead of std::function. No functional change intended.
    llvm-svn: 291904
* [X86][AVX512] Add support for variable ASHR of v2i64/v4i64 without VLX | Simon Pilgrim | 2017-01-13 | 1 | -1/+1
    Use v8i64 variable ASHR instructions if we don't have VLX. This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll.
    Differential Revision: https://reviews.llvm.org/D28604
    llvm-svn: 291901
* [CodeGen] Rename MachineInstrBuilder::addOperand. NFC | Diana Picus | 2017-01-13 | 1 | -41/+41
    Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion.
    Differential Revision: https://reviews.llvm.org/D28556
    llvm-svn: 291891
* [X86][AVX512] Adding missing shuffle lowering to blend mask instructions | Michael Zuckerman | 2017-01-13 | 1 | -1/+46
    Some shuffles can be lowered to a blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ). In this patch, I added a new pattern match for this case.
    Reviewers: craig.topper, guyblank, RKSimon, igorb
    Differential Revision: https://reviews.llvm.org/D28483
    llvm-svn: 291888
* [X86] Replace AND+IMM64 with SRL/SHL | Nikolai Bozhenov | 2017-01-12 | 1 | -7/+54
    Emit SHRQ/SHLQ instead of ANDQ with a 64-bit constant mask if the result is unused and the mask has only higher/lower bits set. For example, with this patch LLVM emits

        shrq $41, %rdi
        je

    instead of

        movabsq $0xFFFFFE0000000000, %rcx
        testq %rcx, %rdi
        je

    This reduces the number of instructions, code size and register pressure. The transformation is applied only for cases where the mask cannot be encoded as an immediate value within the TESTQ instruction.
    Differential Revision: https://reviews.llvm.org/D28198
    llvm-svn: 291806
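    The equivalence behind the transform can be checked standalone, using the mask from the example above (a sketch, not the codegen itself):

        #include <cassert>
        #include <cstdint>

        // The mask covers exactly bits 41..63, so when the AND result is
        // only compared against zero,
        //   (x & 0xFFFFFE0000000000) == 0   <=>   (x >> 41) == 0
        // and a single shift replaces the movabsq+testq pair.
        bool viaMask(uint64_t X)  { return (X & 0xFFFFFE0000000000ULL) == 0; }
        bool viaShift(uint64_t X) { return (X >> 41) == 0; }

        int main() {
          for (uint64_t X : {0ULL, 1ULL, 1ULL << 40, 1ULL << 41, ~0ULL})
            assert(viaMask(X) == viaShift(X));
        }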
* [X86] Tune bypassing of slow division for Intel CPUs | Nikolai Bozhenov | 2017-01-12 | 1 | -2/+2
    64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom, where DIV8 is fastest. Because of that, the patch
    1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores.
    2) Modifies 64-bit bypassing to use 32-bit division instead of the 16-bit one. This doesn't make the shorter division slower but increases the chances of taking it. Moreover, it's much more likely to prove at compile time that a value fits 32 bits and doesn't require a run-time check (e.g. zext i32 to i64).
    Differential Revision: https://reviews.llvm.org/D28196
    llvm-svn: 291800
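    The bypass shape, modelled as plain C++ (a hypothetical helper, not the code the backend emits): divide in 32 bits when a cheap run-time test shows both operands fit.

        #include <cassert>
        #include <cstdint>

        // If the high halves of both operands are zero, the 32-bit divide
        // produces the same quotient as the much slower 64-bit divide.
        uint64_t bypassedDiv(uint64_t A, uint64_t B) {
          if (((A | B) >> 32) == 0) // both fit in 32 bits?
            return static_cast<uint32_t>(A) / static_cast<uint32_t>(B);
          return A / B;             // slow 64-bit path
        }

        int main() {
          assert(bypassedDiv(100, 7) == 100 / 7);
          assert(bypassedDiv(1ULL << 40, 3) == (1ULL << 40) / 3);
        }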
* [AVX-512] Improve lowering of zero_extend of v4i1 to v4i32 and v2i1 to v2i64 with VLX, but no DQ or BW support. | Craig Topper | 2017-01-12 | 1 | -4/+4
    llvm-svn: 291747
* [AVX-512] Improve lowering of sign_extend of v4i1 to v4i32 and v2i1 to v2i64 when avx512vl is available, but not avx512dq. | Craig Topper | 2017-01-12 | 1 | -11/+13
    llvm-svn: 291746
* [X86][AVX512] Fix PR31515 - Do not flip vselect condition if it's not a vXi1 mask | Elad Cohen | 2017-01-12 | 1 | -5/+8
    r289653 added a case where `vselect <cond> <vector1> <all-zeros>` is transformed to `vselect xor(cond, DAG.getConstant(1, DL, CondVT)) <all-zeros> <vector1>`. This was not aimed to catch cases where Cond is not a vXi1 mask, but it does. Moreover, when the Cond type is vXiN (N > 1), xor(cond, DAG.getConstant(1, DL, CondVT)) != NOT(cond). This patch changes the above to xor with allones, and avoids entering the case for non-mask Conds.
    llvm-svn: 291745
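    A scalar illustration of why xor with 1 is not NOT for wider condition elements (not DAG code):

        #include <cassert>
        #include <cstdint>

        int main() {
          uint8_t Cond = 0xFF;                           // all-ones i8 "true" mask
          uint8_t XorOne = Cond ^ 0x01;                  // what the old combine built
          uint8_t NotCond = static_cast<uint8_t>(~Cond); // the intended inversion
          assert(XorOne == 0xFE);  // not an all-zeros mask: predicate not inverted
          assert(NotCond == 0x00); // genuine NOT of the mask
          // Only for i1 elements does xor-with-1 equal NOT; for vXiN (N > 1)
          // the combine must xor with all-ones instead.
        }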