Commit message log

All default to NoItinerary
llvm-svn: 319326

llvm-svn: 319316

instruction scheduler classes
llvm-svn: 319312

legal. Fix infinite loop in op legalization when promotion requires 2 steps.
Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel.
The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value.
llvm-svn: 319259

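A minimal, self-contained sketch of the shape of that fix (hypothetical types and helpers, not LLVM's actual legalizer code): keep promoting until the type is legal, but remember what was tried on the previous step so a table entry that makes no progress is rejected instead of looping forever.

```cpp
// Illustrative only; none of these names are LLVM's.
#include <cstdio>

enum SimpleVT { v8i16 = 0, v8i32, v8i64, Invalid };

// Hypothetical promotion table: maps a type to the next candidate type.
static SimpleVT getNextPromotedType(SimpleVT VT) {
  switch (VT) {
  case v8i16: return v8i32;
  case v8i32: return v8i64; // two-step promotion: v8i16 -> v8i32 -> v8i64
  default:    return Invalid;
  }
}

static bool isLegal(SimpleVT VT) { return VT == v8i64; }

static SimpleVT promoteToLegalType(SimpleVT VT) {
  while (!isLegal(VT)) {
    SimpleVT PrevVT = VT;            // remember the previously tried value
    VT = getNextPromotedType(VT);
    if (VT == PrevVT || VT == Invalid)
      return Invalid;                // no progress: bail out instead of spinning
  }
  return VT;
}

int main() {
  std::printf("promoted to %d\n", promoteToLegalType(v8i16)); // prints "promoted to 2" (v8i64)
  return 0;
}
```
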
llvm-svn: 319143

SSE_PACK/SSE_PMADD schedule classes
llvm-svn: 319065

llvm-svn: 319045

As mentioned on PR17367, many instructions are missing scheduling tags, preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4, which is one of the bigger offenders (along with AVX512 in general).
Annoyingly, all scheduler models need to define WriteFMA (now that it's actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes.
Differential Revision: https://reviews.llvm.org/D40351
llvm-svn: 319016

Galois field arithmetic (GF(2^8)) insns:
gf2p8affineinvqb
gf2p8affineqb
gf2p8mulb
Differential Revision: https://reviews.llvm.org/D40373
llvm-svn: 318993

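For context on the gf2p8* instructions above: once compiler support exists they are normally reached from C/C++ through the GFNI intrinsics documented in Intel's intrinsics guide (the intrinsic names and compiler flags below come from that guide and the usual GCC/Clang options, not from this patch). A minimal sketch of byte-wise GF(2^8) multiplication, assuming GFNI hardware and compiler support:

```cpp
// Sketch assuming GFNI support (e.g. -mgfni -msse4.1, or -march=icelake-client).
// _mm_gf2p8mulb_epi8 multiplies corresponding bytes in GF(2^8) using the
// AES reduction polynomial x^8 + x^4 + x^3 + x + 1.
#include <immintrin.h>
#include <cstdio>

int main() {
  __m128i a = _mm_set1_epi8(0x57);
  __m128i b = _mm_set1_epi8(0x83);
  __m128i c = _mm_gf2p8mulb_epi8(a, b);                        // gf2p8mulb
  std::printf("0x%02x\n", _mm_extract_epi8(c, 0) & 0xff);      // prints 0xc1
  return 0;
}
```
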
2/3: vpshufbitqmb encoding
3/3: vpshufbitqmb intrinsics
Differential Revision: https://reviews.llvm.org/D40222
llvm-svn: 318904

Now we consistently represent the mask result without relying on isel ignoring it.
We now have a more general SDNode and type constraints to represent these nodes in isel patterns. This allows us to present both vXi1 and XMM/YMM mask types with a single set of constraints.
llvm-svn: 318821

vpopcnt{b,w}
Differential Revision: https://reviews.llvm.org/D40213
llvm-svn: 318748

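As a usage illustration for vpopcnt{b,w} (via Intel's documented AVX512_BITALG intrinsics, not anything added by this patch), a minimal sketch of a per-byte population count, assuming BITALG support:

```cpp
// Sketch assuming AVX512_BITALG support (e.g. -mavx512bitalg -mavx512bw).
// _mm512_popcnt_epi8 counts set bits independently in each of the 64 bytes.
#include <immintrin.h>
#include <cstdio>

int main() {
  __m512i v = _mm512_set1_epi8(0x0f);       // every byte has 4 bits set
  __m512i counts = _mm512_popcnt_epi8(v);   // vpopcntb
  alignas(64) unsigned char out[64];
  _mm512_store_si512((__m512i *)out, counts);
  std::printf("%u\n", out[0]);              // prints 4
  return 0;
}
```
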
Introducing Vector Neural Network Instructions, consisting of:
vpdpbusd{s}
vpdpwssd{s}
Differential Revision: https://reviews.llvm.org/D40208
llvm-svn: 318746

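To show what the new instructions compute (again through Intel's documented intrinsics, which are not part of this patch): vpdpbusd multiplies unsigned bytes by signed bytes and accumulates the four products of each 32-bit lane into that lane, which is the core of many int8 dot-product kernels. A minimal sketch, assuming AVX512VNNI support:

```cpp
// Sketch assuming AVX512VNNI support (e.g. -mavx512vnni -mavx512f).
// _mm512_dpbusd_epi32: for each 32-bit lane, multiply four unsigned bytes of
// `a` with four signed bytes of `b`, sum the products, and add to `acc`.
#include <immintrin.h>
#include <cstdio>

int main() {
  __m512i acc = _mm512_setzero_si512();
  __m512i a = _mm512_set1_epi8(2);        // treated as unsigned 8-bit
  __m512i b = _mm512_set1_epi8(3);        // treated as signed 8-bit
  acc = _mm512_dpbusd_epi32(acc, a, b);   // vpdpbusd: each lane += 4 * (2*3) = 24
  alignas(64) int out[16];
  _mm512_store_si512((__m512i *)out, acc);
  std::printf("%d\n", out[0]);            // prints 24
  return 0;
}
```
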
introducing vbmi2, consisting of
vpcompress{b,w}
vpexpand{b,w}
vpsh{l,r}d{w,d,q}
vpsh{l,r}dv{w,d,q}
Differential Revision: https://reviews.llvm.org/D40206
llvm-svn: 318745

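As a rough illustration of the byte compress form (the intrinsic name below is taken from Intel's intrinsics guide and should be treated as an assumption; this patch only adds the instruction definitions): vpcompressb packs the bytes selected by a mask contiguously into the low end of the result.

```cpp
// Sketch assuming AVX512_VBMI2 support (e.g. -mavx512vbmi2).
// _mm512_maskz_compress_epi8 keeps only the bytes whose mask bit is set,
// packs them toward the low end, and zeroes the remainder.
#include <immintrin.h>
#include <cstdio>

int main() {
  __m512i v = _mm512_set1_epi8(7);
  __mmask64 keep_every_other = 0x5555555555555555ULL;
  __m512i packed = _mm512_maskz_compress_epi8(keep_every_other, v); // vpcompressb
  alignas(64) unsigned char out[64];
  _mm512_store_si512((__m512i *)out, packed);
  std::printf("%u %u\n", out[0], out[63]); // prints "7 0": 32 kept bytes, rest zero
  return 0;
}
```
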
an icelake promotion of pclmulqdq
Differential Revision: https://reviews.llvm.org/D40101
llvm-svn: 318741

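The promotion here widens the legacy 128-bit carry-less multiply to 256/512-bit vectors (VPCLMULQDQ). A hedged sketch using the intrinsic names listed in Intel's intrinsics guide (not defined by this patch); the wide form performs an independent pclmulqdq in each 128-bit lane:

```cpp
// Sketch assuming PCLMULQDQ and VPCLMULQDQ support
// (e.g. -mpclmul -mvpclmulqdq -mavx).
#include <immintrin.h>

__m128i clmul_128(__m128i a, __m128i b) {
  // Legacy form: one 64x64 -> 128-bit carry-less multiply (low halves here).
  return _mm_clmulepi64_si128(a, b, 0x00);
}

__m256i clmul_256(__m256i a, __m256i b) {
  // Ice Lake promotion: the same operation on each 128-bit lane in parallel.
  return _mm256_clmulepi64_epi128(a, b, 0x00);
}
```
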
an icelake promotion of AES
Differential Revision: https://reviews.llvm.org/D40078
llvm-svn: 318740

Also fix the memop in the ins for these instructions. Not sure what effect this has.
llvm-svn: 318624

sse_load_f32/f64.
llvm-svn: 318623

mask instead of a VK4 mask.
This allows us to remove extra extend creation during lowering and more accurately reflects the semantics of the instruction.
While there, add an extra output VT to the X86 masked gather node to better match the isel pattern predicate. Currently we're exploiting the fact that the isel table doesn't count how many output results a node actually has if the result type of any can be inferred from the first result and the type constraints defined in tablegen. I think we might ultimately want to lower all MGATHER/MSCATTER to an X86ISD node with the extra mask result and stop relying on this hole in the isel checking.
llvm-svn: 318278

vrcp14ss/sd, rsqrt14ss/sd instructions.
llvm-svn: 318022

intrinsics.
llvm-svn: 318019

sse_load_f32/sse_load_f64 to increase load folding opportunities.
llvm-svn: 318016

llvm-svn: 318009

and without the rounding operand. NFCI
I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses, so I split them all to keep their implementations similar.
llvm-svn: 318007

This fixes a bug where we selected packed instructions for scalar intrinsics.
llvm-svn: 317999

when avx512vl is enabled.
This matches what we do for scalar and 512-bit types.
llvm-svn: 317991

rename the existing versions to _Int.
This is consistent with our normal implementation of scalar instructions.
While there, disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size, which is also our standard practice.
llvm-svn: 317977

llvm-svn: 317975

llvm-svn: 317974

llvm-svn: 317973

No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority.
llvm-svn: 317775

instructions. Remove bad pattern that had v4f32 vcvtps2ph storing 128-bits.
llvm-svn: 317662

llvm-svn: 317548

intrinsics.
Looks like there are some missed load folding opportunities for i64 loads.
llvm-svn: 317544

COPY_TO_REGCLASS.
The ExeDepsFix pass should take care of making the registers match.
llvm-svn: 317542

The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.
Instead, just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.
All known CPUs with AVX512 have F16C, so this should be safe for now.
llvm-svn: 317521

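For reference, F16C provides the vcvtph2ps/vcvtps2ph half-precision conversions that the zmm-based workaround patterns were covering. A minimal sketch of the round trip through the long-standing F16C intrinsics, assuming an F16C-capable target (nothing here is added by this patch):

```cpp
// Sketch assuming F16C support (e.g. -mf16c).
// vcvtps2ph / vcvtph2ps round-trip eight floats through 16-bit halves.
#include <immintrin.h>
#include <cstdio>

int main() {
  __m256 f = _mm256_set1_ps(1.5f);
  __m128i h = _mm256_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); // vcvtps2ph
  __m256 back = _mm256_cvtph_ps(h);                                              // vcvtph2ps
  std::printf("%f\n", _mm256_cvtss_f32(back)); // prints 1.500000 (exactly representable)
  return 0;
}
```
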
Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers.
llvm-svn: 317453

Fixes PR35161.
llvm-svn: 317445

Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order.
llvm-svn: 317442

missed in r317413.
llvm-svn: 317441

SSE rcp/rsqrt intrinsics when AVX512 features are enabled.
Summary:
AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton Raphson refinement.
Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use them for fast math optimization of division and reciprocal sqrt.
I think switching the legacy intrinsics may be surprising to the user, since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.
As far as the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc on godbolt suggests they don't change which instructions they use here.
This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions.
Going forward I think our focus should be on:
-Supporting 512-bit vectors, which will have to use RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER.
-Supporting double precision.
Reviewers: zvi, DavidKreitzer, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39583
llvm-svn: 317413

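To make the accuracy trade-off concrete, here is a small sketch of the usual pattern: take a hardware reciprocal estimate (the legacy rcpps is shown; rcp14 plays the same role with a more accurate starting value) and tighten it with one Newton-Raphson step, x1 = x0 * (2 - a * x0):

```cpp
// Sketch of a reciprocal estimate plus one Newton-Raphson refinement step.
// Uses the legacy SSE estimate; RCP14/RSQRT14 would supply a better initial
// guess but still typically need the same refinement.
#include <immintrin.h>
#include <cstdio>

static __m128 refined_reciprocal(__m128 a) {
  __m128 x0 = _mm_rcp_ps(a);                 // ~12-bit estimate (rcpps)
  // One Newton-Raphson step: x1 = x0 * (2 - a * x0)
  __m128 two = _mm_set1_ps(2.0f);
  return _mm_mul_ps(x0, _mm_sub_ps(two, _mm_mul_ps(a, x0)));
}

int main() {
  __m128 a = _mm_set1_ps(3.0f);
  std::printf("%.7f\n", _mm_cvtss_f32(refined_reciprocal(a))); // ~0.3333333
  return 0;
}
```

The RSQRT28/RCP28 instructions on AVX512ER mentioned above are accurate enough to drop that refinement step entirely.
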
to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible.
llvm-svn: 317299

patterns so they don't get created with a v64i8 type.
Not sure why tablegen didn't error on this.
Fixes PR35158.
llvm-svn: 317079

extract subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time ends with an assertion.
This patch fixes this issue by adding new patterns to the TD file.
Reviewers: guyblank, igorb, zvi, ayman, craig.topper
Differential Revision: https://reviews.llvm.org/D39292
Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8
llvm-svn: 316981

Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection.
Differential Revision: https://reviews.llvm.org/D39169
llvm-svn: 316336

llvm-svn: 316296

This should be NFC. Will be used in future patches to fix disassembler bugs.
llvm-svn: 316284

lowering.
Summary:
This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes.
There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8.
Reviewers: RKSimon, zvi, delena
Reviewed By: delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38714
llvm-svn: 315860

llvm-svn: 315802

llvm-svn: 315800