| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
These select the same instruction as the non-bitcasted pattern. So this provides no additional value.
llvm-svn: 315799
|
|
|
|
|
|
|
|
| |
extended VCVTPD2UDQZ128rr and VCVTTPD2UDQZ128rr.
We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions.
llvm-svn: 315798
|
|
|
|
|
|
|
|
|
|
| |
VCVTUDQ2PD.
This matches the patterns we have for the SSE/AVX version.
This is a prerequisite for D38714.
llvm-svn: 315797
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
available
This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations.
For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP.
We may be able to use this for AVX1 as well which would allow us to remove more isel patterns.
I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similarly to X86ISD::MOVDDUP.
Differential Revision: https://reviews.llvm.org/D38836
llvm-svn: 315768
|
|
|
|
|
|
|
|
| |
Prefer vbroadcastsd/vpbroadcastq instead.
There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table.
llvm-svn: 315674
|
|
|
|
|
|
|
|
| |
also checking presence of BWI instructions.
The EVEX->VEX pass probably obscures this.
llvm-svn: 315365
|
|
|
|
|
|
| |
This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass.
llvm-svn: 315274
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
versions of scalar arithmetic patterns
Summary:
We currently disable some conversions of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to preferring MOVSS/MOVSD.
Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.
Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. It turns out machine CSE commutes them to blends, and then commutes those blends back into blends that are equivalent to the original movss/movsd.
This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.
The rest of the patch is removing all the excess scalar patterns.
Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38023
llvm-svn: 315181
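
The double commute described in D38023 can be illustrated with a toy lane model. This is only a sketch, not LLVM code: the helper names `movss`, `blendps`, and `commute_blend` are hypothetical, and vectors are modeled as 4-element Python lists with lane 0 as the lowest element. It assumes the register form of MOVSS (lane 0 from the second source, upper lanes from the first) and a blend immediate where a set bit selects the second source.

```python
# Toy 4-lane model; lane 0 is the lowest element. Names are illustrative only.

def movss(a, b):
    """Register form of MOVSS: lane 0 comes from b, lanes 1-3 come from a."""
    return [b[0]] + list(a[1:])

def blendps(a, b, imm):
    """Blend: bit i of imm set -> lane i from b, otherwise lane i from a."""
    return [b[i] if (imm >> i) & 1 else a[i] for i in range(4)]

def commute_blend(a, b, imm):
    """Swap the two sources and invert the 4-bit immediate; the result is unchanged."""
    return blendps(b, a, ~imm & 0xF)

a = [10, 11, 12, 13]
b = [20, 21, 22, 23]

# MOVSS is a blend with immediate 0b0001 ...
assert movss(a, b) == blendps(a, b, 0b0001)
# ... and commuting that blend gives a blend with swapped sources and immediate 0b1110.
assert movss(a, b) == commute_blend(a, b, 0b0001)
```

Under this model, commuting twice returns the original operand order, which is why the final blends are equivalent to the original movss/movsd.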
|
|
|
|
|
|
|
|
|
|
|
| |
memory foldable"
This attribute will be used in a tablegen backend that generates the X86 memory folding tables which will be added in a future pass.
Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass.
Differential Revision: https://reviews.llvm.org/D38027
llvm-svn: 315171
|
|
|
|
|
|
|
|
| |
Add isel patterns to make up for it.
This will allow for some flexibility in canonicalizing bitcasts around insert_subvector.
llvm-svn: 315160
|
|
|
|
|
|
|
|
|
| |
Patch to fix ternlog instructions with a folded
broadcast. The broadcast decorator, e.g. {1toX}, was missing.
Differential Revision: https://reviews.llvm.org/D38649
llvm-svn: 315122
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instead of FR32/FR64
This patch redefines the MOVSS/MOVSD instructions to take VR128 as their second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.
This should fix PR33079
Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.
Differential Revision: https://reviews.llvm.org/D38449
llvm-svn: 314914
|
|
|
|
|
|
| |
where the instruction already produces the correct register class.
llvm-svn: 314638
|
|
|
|
| |
llvm-svn: 314598
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
insert_subvector with zero after masked compares with fewer patterns with predicate
This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer.
This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size.
Differential Revision: https://reviews.llvm.org/D38217
llvm-svn: 314133
|
|
|
|
|
|
| |
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
llvm-svn: 314083
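
A rough sketch of why operand order matters here, assuming (as r312464 is described elsewhere in this log) that commuted variants are formed by swapping the first two operands of a pattern. The tuple representation, the `madd` opcode name, and `commuted_variants` are hypothetical and do not reflect tablegen's actual data structures.

```python
def commuted_variants(pattern):
    """Return the pattern as written plus the variant with the first two operands swapped.

    pattern is (opcode, op0, op1, addend); only op0/op1 are treated as commutable,
    so the addend has to sit last for the swap to be meaningful.
    """
    opcode, op0, op1, addend = pattern
    return {(opcode, op0, op1, addend), (opcode, op1, op0, addend)}

# One written pattern that folds a load into the second multiply operand ...
written = ("madd", "reg", "load", "acc")
variants = commuted_variants(written)

# ... also covers a load in the first multiply operand once commuted.
assert ("madd", "load", "reg", "acc") in variants
assert len(variants) == 2
```

If the addend were one of the first two operands, swapping would exchange a multiply operand with the addend and change the meaning, so no variant could be generated automatically.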
|
|
|
|
|
|
| |
commutable for the multiply operands.
llvm-svn: 314080
|
|
|
|
|
|
|
|
| |
instructions when VLX isn't available.
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
llvm-svn: 314072
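
The widening trick can be sketched as follows. This is a toy model under stated assumptions, not the actual isel pattern: vectors are Python lists, a compare yields an integer bitmask, and the upper lanes of the widened inputs are filled with an arbitrary value to stand in for undefined contents. The function names are hypothetical.

```python
def cmp_eq_mask(a, b):
    """Element-wise equality compare producing a bitmask (bit i = lane i)."""
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            mask |= 1 << i
    return mask

def widen(v, lanes=16, fill=7):
    """Pad a narrow vector to 16 lanes; the fill models undefined upper lanes."""
    return list(v) + [fill] * (lanes - len(v))

def masked_cmp_eq_via_512(a, b, k):
    """Do the compare at 16 lanes, apply the 'and' mask, and truncate to the narrow width."""
    n = len(a)
    wide = cmp_eq_mask(widen(a), widen(b))  # upper bits are garbage
    return (wide & k) & ((1 << n) - 1)      # truncation discards the garbage bits

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 0, 3, 0, 5, 0, 7, 0]
k = 0b00001111

# Same answer as doing the masked compare directly at 8 lanes.
assert masked_cmp_eq_via_512(a, b, k) == (cmp_eq_mask(a, b) & k)
```

The truncation step is what makes the arbitrary upper lanes harmless, since any bits they produce are dropped.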
|
|
|
|
|
|
|
|
|
|
| |
selection to avoid duplicate patterns
Similar to what we do for X86ISD::SHRUNKBLEND just turn X86ISD::SELECT into ISD::VSELECT. This allows us to remove the duplicated TRUNC patterns.
Differential Revision: https://reviews.llvm.org/D38022
llvm-svn: 313644
|
|
|
|
|
|
|
|
| |
undef preserved source.
We canonicalize undef preserved sources to zero during intrinsic lowering.
llvm-svn: 313612
|
|
|
|
|
|
|
|
| |
real instruction.
It was used in patterns, but we had the exact same patterns with Unpckl as well. So now just use Unpckl in the instruction patterns.
llvm-svn: 313506
|
|
|
|
|
|
|
|
| |
For some reason the SSE1 pattern expected a X86Movlhps pattern to have a v4f32 type, but AVX and AVX512 expected it to have a v4i32 type.
I'm not sure this pattern is even reachable post SSE1, but I'm starting with fixing this obvious bug.
llvm-svn: 313495
|
|
|
|
|
|
| |
Lowering doesn't emit these.
llvm-svn: 313492
|
|
|
|
|
|
| |
doesn't emit these.
llvm-svn: 313491
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add patterns for
fptoui <16 x float> to <16 x i8>
fptoui <16 x float> to <16 x i16>
Reviewers: igorb, delena, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37505
llvm-svn: 312704
|
|
|
|
|
|
|
|
| |
X86ISD::MOVSD.
I don't think we ever generate these. If we did, I would expect we would also be able to generate v16f32 and v8f64, but we don't have those patterns.
llvm-svn: 312694
|
|
|
|
|
|
| |
This moves more of our subvector insert/extract tricks to X86InstrVecCompiler.td and refactors them into multiclasses.
llvm-svn: 312661
|
|
|
|
|
|
|
|
|
|
|
|
| |
patterns from SSE and AVX512
This patch moves some of the similar non-instruction patterns from X86InstrSSE.td and X86InstrAVX512.td to a common file.
This is intended as a starting point. There are many other optimization patterns that exist in both files that we could move here.
Differential Revision: https://reviews.llvm.org/D37455
llvm-svn: 312649
|
|
|
|
|
|
| |
This matches what we already do for AVX512. The peephole pass makes up for this in most if not all cases. But this makes isel behavior for these consistent with every other instruction.
llvm-svn: 312613
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
FR32X)))) patterns
We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512.
With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroes. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128.
The same thing can happen for AVX with vblendps and those separate patterns already exist.
For AVX512, (v4f32 (X86vzmovl VR128)) will select a VMOVSS instruction instead of VBLENDPS due to there not being an EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.
For SSE1-SSSE3 we can rely on (v4f32 (X86vzmovl VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.
So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.
llvm-svn: 312564
|
|
|
|
|
|
|
|
| |
(v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64.
We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32.
llvm-svn: 312543
|
|
|
|
|
|
| |
had their patterns removed.
llvm-svn: 312520
|
|
|
|
|
|
|
|
| |
This reorders some patterns to get tablegen to detect them as duplicates. Tablegen only detects duplicates when creating variants for commutable operations. It does not detect duplicates between the patterns as written in the td file. So we need to ensure all the FMA patterns in the td file are unique.
This also uses null_frag to remove some other unneeded patterns.
llvm-svn: 312470
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
patterns.
This uses the capability introduced in r312464 to make SDNode patterns commutable on the first two operands.
This allows us to remove some of the extra FMA patterns that have to put loads and mask operands in different places to cover all cases. This even covers patterns that were missing, such as matching a load in the first operand with FMA4 and non-broadcast loads with masking for AVX512.
I believe this is causing us to generate some duplicate patterns because tablegen's isomorphism checks don't catch isomorphism between the patterns as written in the td. It only detects isomorphism in the commuted variants it tries to create. The unmasked 231 and 132 memory forms are isomorphic as written in the td file so we end up keeping both. I think we can precommute the 132 pattern to fix this.
We also need a follow-up patch to go back to the legacy FMA3 instructions and add patterns to the 231 and 132 forms which we currently don't have.
llvm-svn: 312469
|
|
|
|
|
|
| |
register that I missed in r312450.
llvm-svn: 312459
|
|
|
|
|
|
|
|
| |
into a move instruction which will implicitly zero the upper elements.
Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.
llvm-svn: 312450
|
|
|
|
|
|
|
|
| |
Previously we generated a register only pattern for each of the 3 instruction forms, but they are all identical as far as isel is concerned. So drop the others and just keep the 213 version.
This removes 2968 bytes from the isel table.
llvm-svn: 312313
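
To see why the three register-only forms are interchangeable for isel, here is a small sketch. It assumes the usual FMA3 digit convention, where the digits name which operands (1 = destination, 2 and 3 = sources) are multiplied and which is added. Integers are used in place of floats so the checks are exact; the function names are illustrative.

```python
def fma132(dst, src2, src3):
    """132 form: dst = dst * src3 + src2."""
    return dst * src3 + src2

def fma213(dst, src2, src3):
    """213 form: dst = src2 * dst + src3."""
    return src2 * dst + src3

def fma231(dst, src2, src3):
    """231 form: dst = src2 * src3 + dst."""
    return src2 * src3 + dst

a, b, c = 3, 5, 7
expected = a * b + c

# With all operands in registers, any form computes a*b+c; only the operand
# assignment differs, and that is a register allocation detail.
assert fma213(a, b, c) == expected
assert fma132(a, c, b) == expected
assert fma231(c, a, b) == expected
```

Since each form can express the same computation by permuting register operands, a single register-only pattern is enough and the form choice can be left to later passes.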
|
|
|
|
| |
llvm-svn: 312312
|
|
|
|
| |
llvm-svn: 312311
|
|
|
|
|
|
| |
opportunities.
llvm-svn: 312310
|
|
|
|
|
|
|
|
|
|
|
|
| |
unless we're matching a masked op or broadcast
Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does.
Since there's no functional difference, just remove the extra patterns and save some isel table size.
Differential Revision: https://reviews.llvm.org/D36854
llvm-svn: 312138
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
vbroadcastf32x2/vbroadcasti32x2
Summary:
This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern.
Fixes PR34357
We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch.
Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one.
Reviewers: aymanmus, zvi, igorb
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37286
llvm-svn: 312101
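
The equivalence this patch relies on can be checked with a small byte-level sketch: broadcasting a pair of 32-bit elements gives the same bits as broadcasting one 64-bit element and then bitcasting to 32-bit lanes. This assumes little-endian lane layout, and the helper names are hypothetical rather than anything in LLVM.

```python
import struct

def broadcast_i32x2(lo, hi, pairs):
    """Repeat the (lo, hi) pair of 32-bit elements 'pairs' times."""
    return [lo, hi] * pairs

def broadcast_i64_then_bitcast(lo, hi, pairs):
    """Treat the pair as one 64-bit element, broadcast it, then bitcast to 32-bit lanes."""
    q = struct.unpack("<Q", struct.pack("<II", lo, hi))[0]
    raw = struct.pack("<%dQ" % pairs, *([q] * pairs))
    return list(struct.unpack("<%dI" % (2 * pairs), raw))

# A 256-bit result holds four 64-bit elements, i.e. eight 32-bit lanes.
assert broadcast_i64_then_bitcast(1, 2, 4) == broadcast_i32x2(1, 2, 4)
```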
|
|
|
|
|
|
|
|
|
|
| |
a 512-bit register
This enables the use of a smaller encoding by using a VEX instruction when possible.
Differential Revision: https://reviews.llvm.org/D37092
llvm-svn: 312100
|
|
|
|
|
|
| |
the lowest subvector. This time with bitcasts between the vselect and the extract.
llvm-svn: 311856
|
|
|
|
|
|
|
|
|
|
| |
between the vselect and the extract_subvector. Remove the late DAG combine.
We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvector from the same node where one had a bitcast and the other didn't.
Add some more test cases to ensure we've also got most of the zero masking covered too.
llvm-svn: 311837
|
|
|
|
|
|
|
|
| |
extract_subvector of the lowest subvector.
This only supports 32 and 64 bit element sizes for now. But we could probably do 16 and 8-bit elements with BWI.
llvm-svn: 311821
|
|
|
|
|
|
|
|
|
|
| |
There's no reason to have a target specific node with the same semantics as a target independent opcode.
This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp
Differential Revision: https://reviews.llvm.org/D36983
llvm-svn: 311568
|
|
|
|
|
|
|
|
|
|
| |
broadcasts when AVX512DQ is enabled.
There's no functional difference between the AVX512DQ instructions if we're not masking.
This change unifies test checks and removes extra isel entries. Something similar was done for subvector inserts and extracts recently.
llvm-svn: 311308
|
|
|
|
| |
llvm-svn: 311297
|
|
|
|
|
|
|
|
| |
predicate.
We can read the memoryVT and get its store size directly from the SDNode to check its alignment.
llvm-svn: 311265
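
A minimal sketch of the kind of check described, assuming the predicate treats an access as aligned when its alignment is at least the store size of its memory type. The `MemNode` class and its fields are hypothetical stand-ins for the SDNode accessors; only the rounding of bits up to whole bytes mirrors how a store size is normally derived.

```python
from dataclasses import dataclass

@dataclass
class MemNode:
    """Hypothetical stand-in for a memory SDNode."""
    alignment: int        # alignment of the access, in bytes
    memory_vt_bits: int   # size of the memory value type, in bits

    def store_size(self):
        """Bytes written or read for the memory type, rounded up."""
        return (self.memory_vt_bits + 7) // 8

def is_aligned_access(node):
    """Assumed predicate: aligned when alignment covers the full store size."""
    return node.alignment >= node.store_size()

assert is_aligned_access(MemNode(alignment=16, memory_vt_bits=128))
assert not is_aligned_access(MemNode(alignment=8, memory_vt_bits=128))
```

Reading the size from the node itself means the predicate does not need a separate copy per vector width.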
|