| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
| |
reductions (PR32841)
Extension to D39729 which performed this for vXi16, with the same bit flipping to handle SMAX/SMIN/UMAX cases, vXi8 UMIN horizontal reductions can be performed.
This makes use of the fact that by performing a pair-wise i8 SHUFFLE/UMIN before PHMINPOSUW, we both get the UMIN of each pair but also zero-extend the upper bits ready for v8i16.
Differential Revision: https://reviews.llvm.org/D41294
llvm-svn: 321070
|
| |
|
|
|
|
|
|
| |
v16i16 instead.
BWI supports shifting by word amounts. Even if VLX isn't support we can still widen to v32i16 and extract the lower half. For SKX its preferrable to not use 512-bit vector if we can.
llvm-svn: 321059
|
| |
|
|
|
|
|
|
| |
iterating over every integer VT and checking their size.
Previously, we were checking for MVTs with sizes betwen 8 and 64 which only includes i8, i16, i32, and i64 today. But I don't think we should assume that and should list the types that are legal for x86. I also don't think we need i64 since type legalization is guaranteed to split those up.
llvm-svn: 321058
|
| |
|
|
|
|
| |
I doubt there's any way to create a ashr for an FP type.
llvm-svn: 321057
|
| |
|
|
|
|
|
|
| |
zero vector.
Pretty sure these are handled by a target independent DAG combine that turns them into undef these days.
llvm-svn: 321056
|
| |
|
|
|
|
|
|
|
|
| |
for a non-uniform shift.
My reading of the SDM says that all bits of the shift amount are used. If the value of the element is larger than the number of bits the result the shift result is zero. So I think we need to zero_extend here to avoid garbage in the upper bits.
In reality we lower any_extend as zero_extend so in most cases it would be hard to hit this.
llvm-svn: 321055
|
| |
|
|
|
|
| |
This recommits the change from r321026. I have a fix for the lld test now.
llvm-svn: 321038
|
| |
|
|
|
|
|
|
|
|
|
| |
Note:
- X86ISelLowering: setLibcallName(SINCOS) was superfluous as
InitLibcalls() already does it.
- ARMISelLowering: Setting libcallnames for sincos/sincosf seemed
superfluous as in the darwin case it wouldn't be used while for all
other cases InitLibcalls already does it.
llvm-svn: 321036
|
| |
|
|
| |
llvm-svn: 321035
|
| |
|
|
|
|
|
|
| |
empty CPU string." while I investigate how to fix an lld test failure.
Looks like lld also needs to pass a -mcpu in some of its tests
llvm-svn: 321033
|
| |
|
|
|
|
|
|
|
|
| |
Update tests to force a CPU with NOPL
Empty string should be equivalent to "generic" which doesn't allow NOPL. Force tests to use specificy 'pentiumpro' to guarantee NOPL.
Fixes PR35686
llvm-svn: 321026
|
| |
|
|
|
|
|
|
| |
recently.
The block I moved things that need BWI and 512-bit or VLX is incorrectly qualified with just hasBWI || hasVLX. Here I've qualified it with hasBWI && (hasAVX512 || hasVLX) where the hasAVX512 will be replaced with allowing 512-bit vectors in an upcoming patch.
llvm-svn: 320957
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
inserts functional and add tests.
Summary:
We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.
The missing coverage was noticed during the review of D40335.
Reviewers: RKSimon, zvi, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41133
llvm-svn: 320950
|
| |
|
|
| |
llvm-svn: 320941
|
| |
|
|
|
|
| |
using a SmallVector that really only ever has one element as a set.
llvm-svn: 320940
|
| |
|
|
|
|
| |
extractions.
llvm-svn: 320937
|
| |
|
|
|
|
| |
This allows us to remove some isel patterns that allowed MVT::i8 result type.
llvm-svn: 320936
|
| |
|
|
|
|
|
|
| |
and allow that to be legaized to VEXTRACT.
I think we can remove the VEXTRACT node completely and use a canonicalized EXTRACT_VECTOR_ELT instead. This is a first step.
llvm-svn: 320935
|
| |
|
|
| |
llvm-svn: 320934
|
| |
|
|
|
|
|
|
| |
Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source.
Fixes PR32007
llvm-svn: 320933
|
| |
|
|
| |
llvm-svn: 320932
|
| |
|
|
|
|
| |
Strip excess BITCASTs from EXTRACT_SUBVECTOR input
llvm-svn: 320930
|
| |
|
|
|
|
|
|
| |
getVectorMaskingNode/getScalarMaskingNode when its going to emit an ISD::OR/ISD::AND. NFCI
In those cases, the pass thru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs.
llvm-svn: 320928
|
| |
|
|
|
|
| |
instead of creating a select with one input being 0.
llvm-svn: 320927
|
| |
|
|
|
|
| |
Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half.
llvm-svn: 320926
|
| |
|
|
|
|
| |
We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions.
llvm-svn: 320924
|
| |
|
|
|
|
|
|
| |
scheduler models. Combine into single InstrRW entries.
The reduces the number of scheduler groups in subtarget info.
llvm-svn: 320923
|
| |
|
|
| |
llvm-svn: 320917
|
| |
|
|
|
|
|
|
| |
classes LZCNT/POPCNT.
I think when this instruction was first published it was only for a Knights CPU and thus VLX version was missing.
llvm-svn: 320910
|
| |
|
|
|
|
| |
Hopefully r320864 has fixed the offending case that failed the assert.
llvm-svn: 320898
|
| |
|
|
|
|
| |
The Function can never be nullptr so we can return a reference.
llvm-svn: 320884
|
| |
|
|
|
|
|
|
| |
for 32-bit mode.
This seemed to work due to a quirk in the X86 MC encoder that didn't emit a REX byte that the AND64ri8 implies when in 32-bit mode. This made the encoding the same as AND32ri8. I tried to add an assert to catch the dropped REX prefix that caught this.
llvm-svn: 320864
|
| |
|
|
|
|
|
|
| |
target specific nodes.
The target independent nodes will get legalized to the target specific nodes by their own legalization process. Someday I'd like to stop using a target specific for zero extends and truncates of legal types so the less places we reference the target specific opcode the better.
llvm-svn: 320863
|
| |
|
|
|
|
| |
When I wrote it I thought we were missing a potential optimization for KNL. But investigating further shows that for KNL we still do the optimal thing by widening to v4f32 and then using special isel patterns to widen again to zmm a register.
llvm-svn: 320862
|
| |
|
|
|
|
| |
It seems to be failing real code which is concerning, but we were silently getting away with it. I'll investigate further.
llvm-svn: 320850
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
non-constant index
Summary:
Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1.
We also can't type legalize v64i1 insert_vector_elt correctly on KNL due to the type not being byte addressable as required by the legalizing through memory accesses path requires.
For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that.
For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them.
Reviewers: RKSimon, delena, spatel, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40942
llvm-svn: 320849
|
| |
|
|
|
|
|
|
| |
have memory and immediate operands.
The asm parser wasn't preventing these from being accepted in 32-bit mode. Instructions that use a GR64 register are protected by the parser rejecting the register in 32-bit mode.
llvm-svn: 320846
|
| |
|
|
|
|
| |
This instruction doesn't access memory. It juse use a similar looking memory encoding. Don't require Intel syntax to put "qword ptr" in front of it.
llvm-svn: 320845
|
| |
|
|
|
|
| |
These aren't doing anything due to a top level "let Predicates =". I think the GR32/GR64 register class protects these anyway.
llvm-svn: 320844
|
| |
|
|
|
|
| |
This has no effect due to a top level "let Predicates =" around the instructions. But its also not required because the GR64 usage in the instruction guarantees it can never match.
llvm-svn: 320843
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D40960
llvm-svn: 320837
|
| |
|
|
|
|
|
|
|
|
| |
assembler in 32-bit mode.
There was a top level "let Predicates =" in the .td file that was overriding the Requires on each instruction.
I've added an assert to the code emitter to catch more cases like this. I'm sure this isn't the only place where the right predicates aren't being applied. This assert already found that we don't block btq/btsq/btrq in 32-bit mode.
llvm-svn: 320830
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
debug output
Work towards the unification of MIR and debug output by printing
`%stack.0` instead of `<fi#0>`, and `%fixed-stack.0` instead of
`<fi#-4>` (supposing there are 4 fixed stack objects).
Only debug syntax is affected.
Differential Revision: https://reviews.llvm.org/D41027
llvm-svn: 320827
|
| |
|
|
|
|
|
|
| |
legalization if we have AVX512F, but not VLX. NFC
Previously we widened it using isel patterns.
llvm-svn: 320824
|
| |
|
|
|
|
|
|
| |
A couple places didn't use the same SDValue variables to connect everything all the way through.
I don't have a test case for a bug in insert into the lower bits of a non-zero, non-undef vector. Not sure the best way to create that. We don't create the case when lowering concat_vectors which is the main way to get insert_subvectors.
llvm-svn: 320790
|
| |
|
|
| |
llvm-svn: 320784
|
| |
|
|
|
|
|
|
|
|
|
|
| |
that require 512-bit registers OR VLX into separate sections. NFCI
We have several instructions that were introduced in AVX512F that are only available in 512-bit form on KNL. We still make use of them for 128/256 by artificially widening and extracting during isel.
This commit separates these operations from the true 512-bit operations. This way we can qualify the normal 512-bit operations with needing 512-bit register support. And these special operations will get qualified with needing 512-bit registers OR VLX.
The 512-bit register qualification will be introduced in a future patch this just gets everything grouped to minimize deltas on that patch.
llvm-svn: 320782
|
| |
|
|
|
|
|
|
| |
Previously they were sort of interleaved in with XMM/YMM/ZMM action related code.
Trying to separate things so its easier to split 512-bit vectors later.
llvm-svn: 320781
|
| |
|
|
|
|
|
|
| |
be custom lowering inserting v1i1 into v8i1 under this.
I don't have a test case at the moment. Just noticed while auditing things.
llvm-svn: 320780
|
| |
|
|
|
|
|
|
|
|
| |
in the X86ISelLowering constructor. NFCI
Move it into the separate hasVLX block later in the constructor.
I'm trying to separate 128/256 and 512-bit related code so we can eventually qualify the hasAVX512 block with support for 512-bit vectors required by the prefer-vector-width feature support being talked about in D41096.
llvm-svn: 320779
|