Commit messages
* Also add the ability to recognise PINSR(Vex, 0, Idx).
  Target shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.
  The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.
  llvm-svn: 293627
* llvm-svn: 293589
* Thanks to @mkuper
  llvm-svn: 293561
* combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs, but everything before it now supports any number of shuffle inputs.
  This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.
  llvm-svn: 293560
* target shuffles
  llvm-svn: 293500
* llvm-svn: 293478
* Differential Revision: https://reviews.llvm.org/D28354
  llvm-svn: 293469
* shift within elements while KSHIFT moves whole elements.
  llvm-svn: 293448
* lower half.
  Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width.
  llvm-svn: 293446
* Replaces an xor+movd/movq with an xorps, which will be shorter in codesize, avoid an int-fpu transfer, allow modern cores to fast-path the result during decode, and help other combines recognise an all-zero vector.
  The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise that the upper elts are undef, but this doesn't seem to be a problem.
  Differential Revision: https://reviews.llvm.org/D29097
  llvm-svn: 293438
* PACKUSWB converts a signed word to an unsigned byte (and likewise PACKUSDW for dwords), so it can't be used for the umin+truncate pattern.
  AVX-512 VPMOVUS* instructions fit the pattern since they convert unsigned to unsigned.
  See https://llvm.org/bugs/show_bug.cgi?id=31773
  Differential Revision: https://reviews.llvm.org/D29196
  llvm-svn: 293431
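  A minimal scalar sketch (illustrative, not from the patch; names are mine) of why the two disagree, modelling one lane in C++:
    #include <algorithm>
    #include <cstdint>

    // One PACKUSWB lane: the input word is read as SIGNED, then clamped
    // to the unsigned byte range [0, 255].
    uint8_t packus_lane(int16_t w) {
      return (uint8_t)std::clamp<int16_t>(w, 0, 255);
    }

    // One lane of the umin+truncate pattern: the word is read as UNSIGNED.
    uint8_t umin_trunc_lane(uint16_t w) {
      return (uint8_t)std::min<uint16_t>(w, 255);
    }
  For w = 0x8000 the results differ: packus_lane sees -32768 and yields 0, while umin_trunc_lane sees 32768 and yields 255.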
* are XORs.
  llvm-svn: 293403
* llvm-svn: 293178
* Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit.
  llvm-svn: 293175
* DAG combine phase where I had originally meant to put it.
  llvm-svn: 293157
* operations, use the correct type for the immediate operand.
  llvm-svn: 293156
* Reviewers: RKSimon
  Subscribers: llvm-commits, igorb
  Differential Revision: https://reviews.llvm.org/D29076
  llvm-svn: 292924
* llvm-svn: 292921
* immediates
  llvm-svn: 292919
* llvm-svn: 292915
* of by the ISD::isBuildVectorAllOnes check below.
  llvm-svn: 292894
* intact and split it at isel.
  This allows us to remove the check in ANDN combining that had to look through the extraction.
  llvm-svn: 292881
* inside getNode.
  llvm-svn: 292877
* llvm-svn: 292767
* Add support for handling shuffles with scalar_to_vector(0).
  llvm-svn: 292766
* https://llvm.org/bugs/show_bug.cgi?id=31672
  llvm-svn: 292758
* Fixes PR31714.
  llvm-svn: 292713
* llvm-svn: 292502
* been extended
  As discussed on D28219 - it is profitable to combine trunc(binop(s/zext(x), s/zext(y))) to binop(trunc(s/zext(x)), trunc(s/zext(y))), assuming the trunc(ext()) will simplify further.
  llvm-svn: 292493
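  A scalar C++ illustration (mine, not from the patch) of why the combine is sound for wrapping ops: the low bits of the result depend only on the low bits of the operands, so truncation commutes with the binop.
    #include <cstdint>

    // trunc(add(zext i8, zext i8)) to i16 ...
    uint16_t trunc_after(uint8_t x, uint8_t y) {
      return (uint16_t)((uint32_t)x + (uint32_t)y);
    }

    // ... equals add(trunc(zext x), trunc(zext y)); each
    // trunc(zext i8 -> i32) then folds to a cheap zext i8 -> i16.
    uint16_t trunc_before(uint8_t x, uint8_t y) {
      return (uint16_t)((uint16_t)x + (uint16_t)y);
    }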
* A test case that crashed is added to avx512-trunc.ll.
  (PR31589)
  llvm-svn: 292479
* Summary:
  Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.
  We already do this for scalar i1 operations, so I just extended it to vectors of i1.
  Reviewers: zvi, delena
  Reviewed By: delena
  Subscribers: guyblank, llvm-commits
  Differential Revision: https://reviews.llvm.org/D28888
  llvm-svn: 292474
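  The underlying identities, as a scalar sketch (illustrative, not from the patch) of one mask word:
    #include <cstdint>

    // In mod-2 arithmetic, addition and subtraction are both XOR
    // (1 + 1 wraps to 0) and multiplication is AND, so i1 vector
    // add/sub maps to KXOR and mul to KAND.
    uint16_t kadd(uint16_t a, uint16_t b) { return a ^ b; }
    uint16_t kmul(uint16_t a, uint16_t b) { return a & b; }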
* identical.
  llvm-svn: 292469
* r291670 doesn't crash on the original testcase from PR31589,
  but it crashes on a slightly more complex one.
  PR31589 has the new reproducer.
  llvm-svn: 292444
* llvm-svn: 292407
* llvm-svn: 292404
* This patch improves the mul instruction combine function (combineMul)
  by adding a new layer of logic.
  It adds the ability to fold (mul x, -((1 << c) - 1))
  or (mul x, -((1 << c) + 1)) into (neg(x << c) + x) or neg((x << c) + x), respectively.
  Differential Revision: https://reviews.llvm.org/D28232
  llvm-svn: 292358
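  A quick C++ check of the two identities (mine, not from the patch), shown for c = 3:
    #include <cstdint>

    // mul x, -((1 << 3) - 1) = mul x, -7  ->  neg(x << 3) + x
    int32_t mul_minus7(int32_t x) { return -(x << 3) + x; }

    // mul x, -((1 << 3) + 1) = mul x, -9  ->  neg((x << 3) + x)
    int32_t mul_minus9(int32_t x) { return -((x << 3) + x); }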
* Even with the fix from r291630, this still causes problems. I get
  widespread assertion failures in the Swift runtime's WeakRefCount::increment()
  function. I sent a reduced testcase in reply to the commit.
  llvm-svn: 292242
* VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation.
  llvm-svn: 292201
* opposite mask then Select NODE.
  llvm-svn: 292066
* second loop.
  llvm-svn: 291996
* Also add 32-bit mode command lines to the test case that exercises this, just to make sure we sanely handle the 64-bit immediate there.
  This fixes an undefined-behavior sanitizer failure from r291888.
  llvm-svn: 291994
* With some minor manual fixes for using function_ref instead of
  std::function. No functional change intended.
  llvm-svn: 291904
* Use v8i64 variable ASHR instructions if we don't have VLX.
  This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll.
  Differential Revision: https://reviews.llvm.org/D28604
  llvm-svn: 291901
* Rename from addOperand to just add, to match the other method that has been
  added to MachineInstrBuilder for adding more than just 1 operand.
  See https://reviews.llvm.org/D28057 for the whole discussion.
  Differential Revision: https://reviews.llvm.org/D28556
  llvm-svn: 291891
* Some shuffles can be lowered to a blend mask instruction (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ).
  In this patch, I added a new pattern match for this case.
  Reviewers: craig.topper, guyblank, RKSimon, igorb
  Differential Revision: https://reviews.llvm.org/D28483
  llvm-svn: 291888
* Emit SHRQ/SHLQ instead of ANDQ with a 64 bit constant mask if the result
  is unused and the mask has only higher/lower bits set. For example, with
  this patch LLVM emits
    shrq $41, %rdi
    je
  instead of
    movabsq $0xFFFFFE0000000000, %rcx
    testq %rcx, %rdi
    je
  This reduces the number of instructions, code size and register pressure.
  The transformation is applied only for cases where the mask cannot be
  encoded as an immediate value within the TESTQ instruction.
  Differential Revision: https://reviews.llvm.org/D28198
  llvm-svn: 291806
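  A C++ sketch (mine, not from the patch) of why the shift form is equivalent here: the mask 0xFFFFFE0000000000 covers exactly bits 41-63, so testing the masked value for zero is the same as testing the shifted value for zero.
    #include <cstdint>

    bool test_and(uint64_t x) { return (x & 0xFFFFFE0000000000ULL) == 0; }
    bool test_shr(uint64_t x) { return (x >> 41) == 0; }  // same predicate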
* 64-bit integer division in Intel CPUs is extremely slow, much slower
  than 32-bit division. On the other hand, 8-bit and 16-bit divisions
  aren't any faster. The only important exception is Atom where DIV8
  is fastest. Because of that, the patch
  1) Enables bypassing of 64-bit division for Atom, Silvermont and
     all big cores.
  2) Modifies 64-bit bypassing to use 32-bit division instead of
     16-bit one. This doesn't make the shorter division slower but
     increases chances of taking it. Moreover, it's much more likely
     to prove at compile-time that a value fits 32 bits and doesn't
     require a run-time check (e.g. zext i32 to i64).
  Differential Revision: https://reviews.llvm.org/D28196
  llvm-svn: 291800
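  Roughly the code shape the bypass produces for a 64-bit udiv, as a hedged C++ sketch (the real check is emitted in IR by the generic slow-division bypass utility, not written by hand like this):
    #include <cstdint>

    uint64_t udiv64(uint64_t a, uint64_t b) {
      if (((a | b) >> 32) == 0)              // both operands fit in 32 bits
        return (uint32_t)a / (uint32_t)b;    // fast 32-bit divide (divl)
      return a / b;                          // slow 64-bit divide (divq)
    }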
* with VLX, but no DQ or BW support.
  llvm-svn: 291747
* when avx512vl is available, but not avx512dq.
  llvm-svn: 291746
* mask
  r289653 added a case where `vselect <cond> <vector1> <all-zeros>`
  is transformed to:
  `vselect xor(cond, DAG.getConstant(1, DL, CondVT)) <all-zeros> <vector1>`
  This was not aimed at catching cases where Cond is not a vXi1
  mask, but it does. Moreover, when the Cond type is vXiN (N > 1),
  xor(cond, DAG.getConstant(1, DL, CondVT)) != NOT(cond).
  This patch changes the above to xor with allones, and avoids
  entering the case for non-mask Conds.
  llvm-svn: 291745
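  A small C++ illustration (mine, not from the patch) of why xor with 1 is only NOT for 1-bit conditions:
    #include <cstdint>

    // For cond = 0b10, cond ^ 1 = 0b11, but NOT(cond) = 0b01.
    // XOR with all-ones is NOT at any width.
    uint8_t xor_one(uint8_t c)  { return c ^ 1;    }  // wrong for multi-bit c
    uint8_t xor_ones(uint8_t c) { return c ^ 0xFF; }  // equals ~c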