combine the mask results.
While the description for the instruction does mention OR, it's talking about how the individual classification test results are ORed together.
The incoming mask is used as a zeroing write mask: if the bit is 1, the classification result is written to the output; if the bit is 0, the output is 0. This is equivalent to an AND.
Here is the pseudocode from the intrinsics guide:
FOR j := 0 to 1
    i := j*64
    IF k1[j]
        k[j] := CheckFPClass_FP64(a[i+63:i], imm8[7:0])
    ELSE
        k[j] := 0
    FI
ENDFOR
k[MAX:2] := 0
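The IF/ELSE above is just an AND of the mask bit with the classification bit. A minimal standalone C++ sketch of that equivalence (classify() is a hypothetical stand-in for CheckFPClass_FP64):

#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for CheckFPClass_FP64; imm8 is ignored and we just
// test for NaN as an example classification.
static bool classify(double v, uint8_t /*imm8*/) { return v != v; }

int main() {
  double a[2] = {1.0, std::numeric_limits<double>::quiet_NaN()};
  uint8_t k1 = 0b01; // incoming mask: only element 0 enabled
  uint8_t k = 0, k2 = 0;
  for (int j = 0; j < 2; ++j) {
    // Zeroing write mask: the result is written only where k1[j] is 1.
    if (k1 & (1u << j))
      k |= uint8_t(classify(a[j], 0x81)) << j;
    // The same computation expressed as an AND of the two bits.
    k2 |= (uint8_t(classify(a[j], 0x81)) & ((k1 >> j) & 1u)) << j;
  }
  assert(k == k2);
  return 0;
}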
llvm-svn: 326306
llvm-svn: 326048
masked move patterns.
This portion can be matched by other patterns. We don't need it to make the larger pattern valid. It's sufficient to have a v1i1 mask input without caring where it came from.
llvm-svn: 325999
broadcast.
Fixes PR31617
llvm-svn: 325957
vector.
These can be created by type legalization promoting the inputs of a select to match scalar boolean contents.
We were trying to pattern match them away during isel, but it's better to just remove them from the DAG.
I've cleaned up some patterns so they no longer check for this 'and', but I suspect this has also opened up further opportunities for pattern removal.
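A hedged sketch of the kind of DAG simplification described (the helper name is hypothetical, and the real combine must prove that only bit 0 of the value is meaningful):

#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical helper: drop a redundant (and X, 1) when the consumer only
// interprets bit 0, as with boolean contents produced by type legalization.
static SDValue stripBooleanAnd(SDValue V) {
  if (V.getOpcode() == ISD::AND && isOneConstant(V.getOperand(1)))
    return V.getOperand(0);
  return V;
}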
llvm-svn: 325949
do it in two places.
llvm-svn: 325546
VPTESTM/VPTESTNM matching.
Canonicalize EQ/NE PCMPM to have the all-zeros build vector on the RHS so we don't have to pattern match it in both positions. This significantly reduces the number of isel patterns needed, since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node.
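A hedged sketch of the canonicalization (the function name is hypothetical; since EQ/NE are commutative, the operands can simply be swapped when the zeros are on the LHS):

#include <utility>
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Hypothetical helper: put the all-zeros build vector on the RHS of an EQ/NE
// compare so isel only has to match one operand order.
static SDValue canonicalizeZeroCompare(SelectionDAG &DAG, const SDLoc &DL,
                                       EVT VT, SDValue LHS, SDValue RHS,
                                       ISD::CondCode CC) {
  if ((CC == ISD::SETEQ || CC == ISD::SETNE) &&
      ISD::isBuildVectorAllZeros(LHS.getNode()) &&
      !ISD::isBuildVectorAllZeros(RHS.getNode()))
    std::swap(LHS, RHS);
  return DAG.getSetCC(DL, VT, LHS, RHS, CC);
}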
This removes over 24000 bytes from the isel table.
llvm-svn: 325526
other operand to the shorter encoding.
Previously we used the immediate encoding if the load was in operand 0 and the short encoding if the load was in operand 1.
This added an insane number of bytes to the size of the isel table. I'm wondering if we should always use the immediate form during isel and change to the short form during emission. This would remove the need to pattern match every combination for both the immediate form and the short form during isel. We could do the same with vpcmpgt.
llvm-svn: 325456
moves/loads/stores.
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).
llvm-svn: 324905
VMOVQ/VMOVHLPS/VMOVLHPS/VMOVHPD/VMOVHPS/VMOVLPD/VMOVLPS
Tag AVX512 variants to match SSE/AVX originals.
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).
llvm-svn: 324901
ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway.
KADD is different: it adds the elements but also propagates carries between them. It's just a way of doing an add in a k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least it's not obviously wrong.
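A scalar illustration of the difference, treating a uint8_t as a v8i1 mask:

#include <cassert>
#include <cstdint>

int main() {
  uint8_t a = 0b0110, b = 0b0011; // two v8i1 masks
  // Element-wise vXi1 add with no carries: each bit is (a_i + b_i) mod 2,
  // which is exactly XOR.
  uint8_t lanewise = a ^ b;
  // KADD-style add: the mask is added as a single integer, so carries
  // propagate between bit positions.
  uint8_t kadd = uint8_t(a + b);
  assert(lanewise == 0b0101 && kadd == 0b1001);
  return 0;
}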
llvm-svn: 324861
Remove combineBitcastForMaskedOp.
Add test cases for the merge masked versions to make sure we have all those covered.
llvm-svn: 324210
patterns instead.
We always created X86ISD::SHUF128 with a 64-bit element type so we can use isel patterns to detect a bitconvert to 32-bit to handle masking.
The test changes are because we also match the bitconvert even if there is no masking. This leads to unnecessary isel patterns, but it would require more multiclass hackery in tablegen to get rid of them.
llvm-svn: 324205
I meant to do this with the unaligned stores in r322820, but it looks like I missed it.
llvm-svn: 323708
eq/ne with immallzeros.
llvm-svn: 323612
We can widen the mask and extract it back down.
llvm-svn: 323610
pattern match the immediate value during isel.
Legalization is still biased to turn LT compares into GT by swapping operands to avoid needing extra isel patterns to commute.
I'm hoping to remove TESTM/TESTNM next and this should simplify that by making EQ/NE more similar.
llvm-svn: 323604
Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation.
This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner.
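For reference, a hedged intrinsics example: immediate 0x55 makes every result bit the complement of the third source, so passing the same register three times gives a one-instruction NOT.

#include <immintrin.h>

// Compile with -mavx512f. vpternlog evaluates imm8 as a 3-input truth table;
// 0x55 selects ~c for every (a, b, c) bit triple.
__m512i not512(__m512i v) {
  return _mm512_ternarylogic_epi32(v, v, v, 0x55);
}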
llvm-svn: 323572
(V)PEXTRW/(V)PINSRW
The weirdest was that PEXTRWrr was tagged as a memory operation.
llvm-svn: 323353
conversion
There are a couple tricky things with this patch.
I had to add an override of isVectorLoadExtDesirable to stop DAG combine from combining sign_extend with loads after legalization, since we legalize sextload using a load+sign_extend. Overriding this hook actually prevents a lot of sextloads from being created in the first place.
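A minimal sketch of such an override for a hypothetical target (the real X86 implementation is type-specific):

#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Hypothetical target lowering: answering "not desirable" stops DAG combine
// from folding the extend back into the load, preserving the load +
// sign_extend pair that legalization produced.
struct MyTargetLowering : TargetLowering {
  using TargetLowering::TargetLowering;
  bool isVectorLoadExtDesirable(SDValue ExtVal) const override {
    return false; // sketch: block all vector extload folding
  }
};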
I also had to add isel patterns because DAG combine blindly combines sign_extend+truncate to a smaller sign_extend, which defeats what legalization was trying to do.
Differential Revision: https://reviews.llvm.org/D42407
llvm-svn: 323301
instructions to get them picked up by the scheduler model regexes.
All other intrinsic instructions put the _Int on the end. This makes these instructions consistent and gets the prefix instregexes in the scheduler models to pick them up.
llvm-svn: 323261
Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero the upper bits before inserting an element into a vXi1 vector. Replace the kshift-based isel pattern with an insert_subvector-based pattern now that the code that created it has been fixed to emit insert_subvector.
llvm-svn: 323173
1. ReachingDefsAnalysis - Allows identifying, for each instruction, the “closest” reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
3. BreakFalseDeps - Breaks false dependencies.
4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order in which they traverse the basic blocks.
This also includes the following changes to the original ExecutionDepsFix logic:
1. The BreakFalseDeps and ReachingDefsAnalysis logic is no longer restricted to a register class.
2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.
Additional changes in affected files:
1. The X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps was also added to the passes they activate.
2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.
Additional refactoring changes will follow.
This commit is (almost) NFC.
The only functional change is that now BreakFalseDeps will break dependency for all register classes.
Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet.
In a future commit several instructions (and tests) will be added.
This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869
Most of the patches are intended to refactor the existing code.
Additional relevant reviews:
https://reviews.llvm.org/D40331
https://reviews.llvm.org/D40332
https://reviews.llvm.org/D40333
https://reviews.llvm.org/D40334
Differential Revision: https://reviews.llvm.org/D40330
Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
llvm-svn: 323087
consistency with loads.
Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything, just like we do for loads.
llvm-svn: 322820
integer vector loads.
These patterns were just looking for a vXi64 bitcasted to vXi32, but there is no advantage to using vmovdqa32 over vmovdqa64.
llvm-svn: 322819
Summary: This is similar to https://reviews.llvm.org/D41983.
Reviewers: gchatelet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42069
llvm-svn: 322486
Summary:
For example, VSQRTSDZr and VSQRTSSZr were missing the predicate.
Also fix braces indentation and braces for consistency.
Reviewers: craig.topper, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41983
llvm-svn: 322478
types have the same number of elements.
llvm-svn: 322455
Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization.
Don't widen condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already.
llvm-svn: 322450
v16i32/v8i64.
We have custom lowering on vzext that produces a vselect and a build vector. So zext never gets to isel.
llvm-svn: 322381
If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather.
Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.
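For reference, this is the shape the intrinsics expose for that scatter (hedged example; requires AVX512F and AVX512VL):

#include <immintrin.h>

// v4i32 data in an xmm (low 2 elements used), v2i64 index, and a v2i1 mask
// in the low bits of k. Compile with -mavx512f -mavx512vl.
void scatter_v2i32(int *base, __m128i index, __m128i data, __mmask8 k) {
  _mm_mask_i64scatter_epi32(base, k, index, data, /*scale=*/4);
}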
llvm-svn: 322254
patterns.
The pattern was this:
def : Pat<(i32 (zext (i8 (bitconvert (v8i1 VK8:$src))))),
          (MOVZX32rr8 (EXTRACT_SUBREG
            (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit))>,
      Requires<[NoDQI]>;
But if you just let (i32 (zext X)) match by itself you'll get MOVZX32rr8, and if you let (i8 (bitconvert (v8i1 VK8:$src))) match by itself you'll get (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit).
So we can just let isel do the two patterns naturally.
llvm-svn: 322049
CVT2MASK is just checking the sign bit which can be represented with a comparison with zero.
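The equivalence at the intrinsics level (hedged example; the mask-convert form needs DQI, while the compare form only needs AVX512F):

#include <immintrin.h>

// Both functions produce the same mask: one bit per element, set when that
// element's sign bit is set.
__mmask16 sign_mask_dq(__m512i v) {
  return _mm512_movepi32_mask(v); // VPMOVD2M (CVT2MASK)
}
__mmask16 sign_mask_f(__m512i v) {
  return _mm512_cmplt_epi32_mask(v, _mm512_setzero_si512()); // compare < 0
}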
llvm-svn: 321985
128/256-bit compares when VLX is not available.
llvm-svn: 321984
Summary:
There are a few oddities that occur because v1i1, v8i1, and v16i1 are legal while v2i1 and v4i1 are not when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill them with 0s at the promoted type.
It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.
This also fixes an issue where we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
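At the intrinsics level, the operation in question looks like this (hedged example):

#include <immintrin.h>

// Masked vextractf32x4: extract 128-bit lane 1 from a zmm register, merging
// into src under the low 4 bits of k. Compile with -mavx512f.
__m128 extract_lane1_masked(__m128 src, __mmask8 k, __m512 a) {
  return _mm512_mask_extractf32x4_ps(src, k, a, 1);
}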
We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, and then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1, then extract that and don't sign extend it at all.
There's definitely room for improvement with some follow up patches.
Reviewers: RKSimon, zvi, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41560
llvm-svn: 321967
matcher table.
This is also needed to fix PR35837.
llvm-svn: 321946
llvm-svn: 321945
isel pattern that only existed for the assembler. Use VCVTTSD2SIrrb_Int instead.
For consistency, use the _Int versions, VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int, for the assembler as well.
llvm-svn: 321944
assembler matcher table
We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version.
Fixes PR35837.
llvm-svn: 321939
This makes the names consistent with the mnemonics like every other instruction.
llvm-svn: 321931
instructions.
This matches their VEX equivalents.
llvm-svn: 321912
instructions as well.
Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16".
This exists to continue compatibility with a silly bug: really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to the avx versions too, but we didn't propagate it to avx512.
llvm-svn: 321903
registers to implement 128/256-bit operations without VLX.
llvm-svn: 321613
false input being zero.
We can use a zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero-masking case separately.
llvm-svn: 321612
don't have DQI.
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codegen in some cases, and I'd like to kill off some of the load patterns.
llvm-svn: 321598
This is better handled by a DAG combine, if it's not already being done. No lit tests fail from the removal of these patterns.
llvm-svn: 321597
I don't think anything would actually expect the other bits to be zero.
llvm-svn: 321596
natively.
We should only be creating natively supported kshifts now.
llvm-svn: 321577
This allows us to remove some isel patterns.
This is mostly NFC, but we now use KSHIFTB instead of KSHIFTW with DQI.
llvm-svn: 321576
The custom lowering already widens the result type to 512-bits if VLX isn't supported.
llvm-svn: 321533