| Commit message | Author | Age | Files | Lines |
| |
Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through.
llvm-svn: 351013
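
As a reduced illustration (hypothetical IR, not taken from the commit's tests): when only
one element of a shuffle result is extracted, every other mask element can be treated as
undef, which makes the shuffle easier to widen and to peek through.

define i32 @extract_from_shuffle(<8 x i32> %x, <8 x i32> %y) {
  ; Only element 2 of the shuffle result is demanded, so the remaining mask
  ; elements can be replaced with undef before further shuffle combining.
  %s = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 0, i32 9, i32 2, i32 11, i32 4, i32 13, i32 6, i32 15>
  %e = extractelement <8 x i32> %s, i32 2
  ret i32 %e
}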
|
| |
This pattern:
t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
...shows up in PR33758:
https://bugs.llvm.org/show_bug.cgi?id=33758
...although this patch doesn't make any difference to the final result on that yet.
In the affected tests here, it looks like it just makes RA wiggle. But we might
as well squash this to prevent it interfering with other pattern-matching.
Differential Revision: https://reviews.llvm.org/D56604
llvm-svn: 351008
|
| |
Add additional vXi32 and vXi64 tests.
llvm-svn: 351003
|
| |
Make use of vblendvpd to select on the signbit
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350999
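
A sketch of the kind of pattern this covers (hypothetical IR, not the commit's test): the
compare below only tests the sign bit of each 64-bit element, which is exactly what
vblendvpd uses to choose between its two inputs, so the compare does not need to be
materialized separately.

define <4 x double> @select_on_signbit(<4 x i64> %mask, <4 x double> %a, <4 x double> %b) {
  ; Each lane of the select condition is just the sign bit of %mask.
  %neg = icmp slt <4 x i64> %mask, zeroinitializer
  %sel = select <4 x i1> %neg, <4 x double> %a, <4 x double> %b
  ret <4 x double> %sel
}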
|
| |
This patch takes some of the code from D49837 to enable ISD::ABS support for all SSE vector types.
Differential Revision: https://reviews.llvm.org/D56544
llvm-svn: 350998
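
For reference, one common IR idiom for absolute value that can be recognized as ISD::ABS
(hypothetical reduced example, assuming the usual select-of-negate form):

define <8 x i16> @abs_v8i16(<8 x i16> %x) {
  ; abs(x) = x < 0 ? 0 - x : x
  %neg = sub <8 x i16> zeroinitializer, %x
  %cmp = icmp slt <8 x i16> %x, zeroinitializer
  %abs = select <8 x i1> %cmp, <8 x i16> %neg, <8 x i16> %x
  ret <8 x i16> %abs
}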
|
| |
The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
|
| |
avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type, so we need to convert to/from i16 around the mask type.
By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.
llvm-svn: 350989
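
A possible reduced case (hypothetical IR; assuming a v8i1 mask load is one of the types
that reaches this lowering path): the mask lives in memory as an i8, and with this change
an aligned load can become an i8->i16 extload that matches KMOVW.

define <8 x i1> @load_mask8(<8 x i1>* %p) {
  %m = load <8 x i1>, <8 x i1>* %p
  ret <8 x i1> %m
}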
|
| |
the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b, since a vselect of w/b elements with a k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877.
llvm-svn: 350985
|
| |
Summary:
When legalizing the result of a SELECT_CC node by promoting the
floating-point type, use the promoted-to type rather than the original
type.
Fix PR40273.
Reviewers: efriedma, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56566
llvm-svn: 350951
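
A hypothetical reduced shape (not the PR40273 reproducer; it assumes a target that forms
a SELECT_CC here and promotes half to float): once the fcmp/select pair becomes a
SELECT_CC whose result type is promoted, the legalized node has to use the promoted-to
type, which is what this change fixes.

define half @select_cc_half(half %a, half %b) {
  %cmp = fcmp olt half %a, %b
  %sel = select i1 %cmp, half %a, half %b
  ret half %sel
}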
|
| |
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.
There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.
llvm-svn: 350928
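
A sketch of the newly allowed shape (hypothetical IR): the extracts feeding the build
vector come from a <8 x i32> source even though the build vector itself is <4 x i32>, and
the adjacent-element adds can still be matched to a horizontal add.

define <4 x i32> @hadd_from_wider_source(<8 x i32> %x) {
  %e0 = extractelement <8 x i32> %x, i32 0
  %e1 = extractelement <8 x i32> %x, i32 1
  %e2 = extractelement <8 x i32> %x, i32 2
  %e3 = extractelement <8 x i32> %x, i32 3
  %s01 = add i32 %e0, %e1
  %s23 = add i32 %e2, %e3
  ; The build vector is narrower than the vectors the elements were extracted from.
  %v0 = insertelement <4 x i32> undef, i32 %s01, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 %s23, i32 1
  ret <4 x i32> %v1
}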
|
| |
Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.
Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.
This fixes some of the regressions from r350800.
llvm-svn: 350918
|
| |
This extends combineVSelectToShrunkBlend so that it can resimplify SHRUNKBLEND nodes that have already been created.
This should help some of the regressions from D56387.
Differential Revision: https://reviews.llvm.org/D56421
llvm-svn: 350875
|
| |
When we use the partial-matching function on a 128-bit chunk, we must
account for the possibility that we've matched undef halves of the
original source vectors, so the outputs may need to be reset.
This should allow closing PR40243:
https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350830
|
| |
This is a partial fix for:
https://bugs.llvm.org/show_bug.cgi?id=40243
...as seen in the integer test, we still need to correct the result when using the
existing (old) horizontal op matching function because it does not model the way
x86 256-bit horizontal ops return results (each 128-bit half is its own horizontal-op).
A potential follow-up change for that is discussed in the bug report - see also D56490.
This generally duplicates a lot of the existing matching code, but we can't just remove
that without introducing regressions, so the existing code is renamed and used less often.
Follow-ups may try to reduce that overlap.
Differential Revision: https://reviews.llvm.org/D56450
llvm-svn: 350826
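
For reference, the lane behavior being modeled (my paraphrase of the VPHADDD semantics,
not text from the commit): a 256-bit horizontal add treats each 128-bit half as an
independent horizontal op.

; vphaddd ymm0, ymm1, ymm2   (elements numbered 0..7 within each register)
;   ymm0[0] = ymm1[0] + ymm1[1]    ymm0[4] = ymm1[4] + ymm1[5]
;   ymm0[1] = ymm1[2] + ymm1[3]    ymm0[5] = ymm1[6] + ymm1[7]
;   ymm0[2] = ymm2[0] + ymm2[1]    ymm0[6] = ymm2[4] + ymm2[5]
;   ymm0[3] = ymm2[2] + ymm2[3]    ymm0[7] = ymm2[6] + ymm2[7]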
|
| |
llvm-svn: 350822
|
| |
injecting VK32/VK64 references into the MachineIR
Summary:
This pass replaces GR8/GR16/GR32/GR64 with their equivalently sized mask register classes. But VK32/VK64 aren't legal without AVX512BW. This mostly appears to work if the register coalescer is able to remove the VK32/VK64 register class reference, or if we never spill it, but there's no guarantee of that.
Another Intel employee managed to trigger a crash due to this with ISPC. Unfortunately, I've lost the test case he sent me at the time. I'm trying to get him to reproduce it for me. I'd like to get this in before 8.0 branches since it's a little scary.
The regressions here are unfortunate, but I think we can make some improvements to DAG combine, load folding, etc. to fix them. Just not sure if we can get that done for 8.0.
Fixes PR39741
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56460
llvm-svn: 350800
|
| |
llvm-svn: 350745
|
| |
Share prefixes whenever possible, use X86 instead of X32.
llvm-svn: 350722
|
| |
llvm-svn: 350716
|
| |
llvm-svn: 350707
|
| |
llvm-svn: 350646
|
| |
merging a src register in ToBeUpdated set.
This is to fix PR40061, related to https://reviews.llvm.org/rL339035.
In https://reviews.llvm.org/rL339035, the live interval of the source pseudo register
in a rematerialized copy may be saved in the ToBeUpdated set and its update postponed.
In PR40061, %t2 = %t1 is rematerialized and %t1 is added to the ToBeUpdated set
to postpone its live interval update. After the rematerialization, the live
interval of %t1 is larger than necessary. Then %t1 is merged into %t3 and %t1
gets removed. After the merge, %t3 contains a live interval larger than necessary.
Because %t3 is not in the ToBeUpdated set, its live interval is not updated after
register coalescing and this breaks assumptions in regalloc.
The patch requires the live interval of the destination register in a merge to be
updated if the source register is in ToBeUpdated.
Differential Revision: https://reviews.llvm.org/D55867
llvm-svn: 350586
|
| |
Replace with target independent funnel shift intrinsics."
The MSVC limit we hit on AutoUpgrade.cpp has been worked around for now.
llvm-svn: 350567
|
| |
Replace with target independent funnel shift intrinsics."
The AutoUpgrade.cpp if/else cascade hit an MSVC limit again.
llvm-svn: 350562
|
| |
independent funnel shift intrinsics.
Differential Revision: https://reviews.llvm.org/D56377
llvm-svn: 350554
|
| |
Based off work for D55935
llvm-svn: 350548
|
| |
These tests show missed optimizations and a miscompile
similar to PR40243 - https://bugs.llvm.org/show_bug.cgi?id=40243
llvm-svn: 350533
|
| |
require a modulo.
Planning to replace these with funnel shift intrinsics which would mask out the extra bits. This will help minimize test diffs.
llvm-svn: 350504
|
| |
Summary: AVX512VBMI2 supports a funnel shift by immediate and a funnel shift by a variable vector.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56361
llvm-svn: 350498
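
A minimal IR example of a variable vector funnel shift that these patterns are meant to
cover (hypothetical function name; with avx512vbmi2 this kind of shift can be selected to
the VPSHLDV family):

declare <8 x i64> @llvm.fshl.v8i64(<8 x i64>, <8 x i64>, <8 x i64>)

define <8 x i64> @funnel_shift_var(<8 x i64> %a, <8 x i64> %b, <8 x i64> %amt) {
  ; Per element: concatenate a:b and shift left by a per-element amount.
  %r = call <8 x i64> @llvm.fshl.v8i64(<8 x i64> %a, <8 x i64> %b, <8 x i64> %amt)
  ret <8 x i64> %r
}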
|
| |
(truncate (v64i8)))) on KNL.
llvm-svn: 350481
|
| |
input is a truncate from v16i8/v32i8.
This is especially helpful on targets without avx512bw since we don't have a good way to convert from v16i8/v32i8 to v16i1/v32i1 for the truncate anyway. If we're just going to convert it to a GPR we might as well use pmovmskb to accomplish both.
llvm-svn: 350480
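
A reduced IR shape for the v16i8 case (hypothetical example): the i1 truncate plus
bitcast-to-scalar below can be handled by moving the interesting bit into the sign bit
and using pmovmskb, rather than materializing a v16i1 value first.

define i16 @mask_from_trunc(<16 x i8> %x) {
  ; Each input byte contributes one bit of the scalar result.
  %t = trunc <16 x i8> %x to <16 x i1>
  %m = bitcast <16 x i1> %t to i16
  ret i16 %m
}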
|
| |
llvm-svn: 350474
|
| |
when -mprefer-vector-width-256 is in effect and BWI is not available.
llvm-svn: 350473
|
|
| |
Patch by Xiang Zhang.
Differential Revision: https://reviews.llvm.org/D56080
llvm-svn: 350436
|
| |
This adds support for calculating sign bits of insert_subvector. I based it on the existing computeKnownBits handling.
My motivating case is propagating sign bit information across basic blocks on AVX targets, where concatenating using insert_subvector is common.
Differential Revision: https://reviews.llvm.org/D56283
llvm-svn: 350432
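
A sketch of the motivating shape (hypothetical IR): on AVX targets the concatenation
below is lowered with insert_subvector, and this change lets ComputeNumSignBits report
that every element of the concatenated value is all sign bits.

define <8 x i32> @concat_bool_vectors(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c, <4 x i32> %d) {
  %cmp0 = icmp sgt <4 x i32> %a, %b
  %cmp1 = icmp sgt <4 x i32> %c, %d
  %lo = sext <4 x i1> %cmp0 to <4 x i32>
  %hi = sext <4 x i1> %cmp1 to <4 x i32>
  ; Concatenation of the two halves; becomes insert_subvector nodes during lowering.
  %cat = shufflevector <4 x i32> %lo, <4 x i32> %hi, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ret <8 x i32> %cat
}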
|
| |
These are modified versions of the FP tests from rL349923.
llvm-svn: 350430
|
| |
The 1st try for this was at rL350369, but it caused IR-level diffs because
our cost models differentiate custom vs. legal/promote lowering. So that was
reverted at rL350373. The cost models were fixed independently at rL350403,
so this is effectively the same patch as last time.
Original commit message:
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350421
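
A reduced example of the shape this matches (hypothetical IR, not the commit's test): the
scalar result is the sum of two adjacent lanes, which can become a single haddps when
optimizing for size or on fast-hop targets such as Jaguar.

define float @extracted_fadd(<4 x float> %x) {
  ; Computes x[0] + x[1] and extracts it as a scalar.
  %shift = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %sum = fadd <4 x float> %x, %shift
  %r = extractelement <4 x float> %sum, i32 0
  ret float %r
}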
|
| |
Repeat of the generic SimplifyDemandedBits shift combine
llvm-svn: 350399
|
| |
A future patch will combine logical shifts more aggressively.
llvm-svn: 350396
|
| |
zero flag is used.
Doing this late so we will prefer to fold the AND into a masked comparison first. That can be better for the live range of the mask register.
Differential Revision: https://reviews.llvm.org/D56246
llvm-svn: 350374
|
| |
There are non-codegen tests that need to be updated with this code change.
llvm-svn: 350373
|
| |
This would show up if we fix horizontal reductions to narrow as they go along,
but it's an improvement for size and/or Jaguar (fast-hops) independent of that.
We need to do this late to not interfere with other pattern matching of larger
horizontal sequences.
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011
llvm-svn: 350369
|
| |
llvm-svn: 350364
|
| |
llvm-svn: 350362
|
| |
This tests a case where we need to be able to compute sign bits for two insert_subvectors that are live out of a basic block. The result is then used as a boolean vector in another basic block.
llvm-svn: 350359
|
| |
llvm-svn: 350358
|
| |
These are similar patterns, but when you throw AVX512 onto the pile,
the number of variations explodes. We really don't care about
AVX1 vs. AVX2 for FP ops. There may be some superficial shuffle diffs,
but that's not what we're testing for here, so I removed those RUNs.
Separating by type also lets us specify 'sse3' for the FP file vs. 'ssse3'
for the integer file...because x86.
llvm-svn: 350357
|
| |
llvm-svn: 350356
|
| |
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:
// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)
We want to have this in the DAG too because as we can see in some of the test diffs (reductions),
the pattern may not be visible in IR.
Given that this is already an IR canonicalization, any backend that would prefer a vector op over
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.
Differential Revision: https://reviews.llvm.org/D55722
llvm-svn: 350354
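
A minimal sketch of the fold in IR terms (hypothetical function name):

define i32 @extract_of_add(<4 x i32> %x, <4 x i32> %y) {
  %vadd = add <4 x i32> %x, %y
  %r = extractelement <4 x i32> %vadd, i32 1
  ret i32 %r
}

; With the combine, the DAG performs the operation on scalars instead:
;   %x1 = extractelement <4 x i32> %x, i32 1
;   %y1 = extractelement <4 x i32> %y, i32 1
;   %r  = add i32 %x1, %y1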
|
| |
llvm-svn: 350338
|