| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ops."
This seems to be causing some performance regresions that I'm
trying to investigate.
One thing that stands out is that this transform can increase
the live range of the operands of the earlier logic op. This
can be bad for register allocation. If there are two logic
op inputs we should really combine the one that is closest, but
SelectionDAG doesn't have a good way to do that. Maybe we need
to do this as a basic block transform in Machine IR.
llvm-svn: 373401
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's room from improvement here, but this is a decent
starting point.
There are a few minor regressions in the vector-rotate tests,
where we are now forming a vpternlog from an and before we get
a chance to form it for a bitselect that we were matching
previously. This results in an AND and an ANDN feeding the
vpternlog where previously we just had an AND after the
vpternlog. I think we can probably DAG combine the AND with
the bitselect to get back to similar codegen.
llvm-svn: 373172
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.
Original commit message:
This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.
This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.
Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.
llvm-svn: 368183
|
|
|
|
|
|
|
|
|
| |
This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810.
This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.
llvm-svn: 368107
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.
This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.
Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.
llvm-svn: 367901
|
|
|
|
|
|
|
|
| |
r362199 fixed it for zero masking, but not zero masking. The load
folding in the peephole pass hid the bug. This patch turns off
the peephole pass on the relevant test to ensure coverage.
llvm-svn: 362440
|
|
|
|
|
|
|
|
| |
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction
Differential Revision: https://reviews.llvm.org/D62807
llvm-svn: 362397
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits.
Similar to PR42079.
Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the
load pattern used in the instructions.
This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe
we should use VZMOVL for widened loads even when we don't need the upper bits
as zeroes?
llvm-svn: 362203
|
|
|
|
|
|
| |
starts as v2i32.
llvm-svn: 362202
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
extloads instead.
DAG combine will usually fold fpextend+load to an fp extload anyway. So the
256 and 512 patterns were probably unnecessary. The 128 bit pattern was special
in that it looked for a v4f32 load, but then used it in an instruction that
only loads 64-bits. This is bad if the load happens to be volatile. We could
probably make the patterns volatile aware, but that's more work for something
that's probably rare. The peephole pass might kick in and save us anyway. We
might also be able to fix this with some additional DAG combines.
This also adds patterns for vselect+extload to enabled masked vcvtps2pd to be
used. Previously we looked for the unlikely vselect+fpextend+load.
llvm-svn: 362199
|
|
|
|
|
|
| |
vselect+extload.
llvm-svn: 362198
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
printing.
We require d/q suffixes on the memory form of these instructions to disambiguate the memory size.
We don't require it on the register forms, but need to support parsing both with and without it.
Previously we always printed the d/q suffix on the register forms, but it's redundant and
inconsistent with gcc and objdump.
After this patch we should support the d/q for parsing, but not print it when its unneeded.
llvm-svn: 360085
|
|
|
|
|
|
|
|
| |
For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK) then we can use X directly.
Differential Revision: https://reviews.llvm.org/D57667
llvm-svn: 353051
|
|
|
|
|
|
|
|
| |
-x86-experimental-vector-widening legalization not combining vpacksswb+vpmovdw.
We are able to combine vpackuswb+vpmovdw, but we didn't have packsswb+vpmovdw at the time that combine was added.
llvm-svn: 348909
|
|
|
|
|
|
|
|
|
|
| |
I'm looking into whether we can make this the default legalization strategy. Adding these tests to help cover the changes that will be necessary.
This patch adds copies of some tests with the command line switch enabled. By making copies its easier to compare the two legalization strategies.
I've also removed RUN lines from some of these tests that already had -x86-experimental-vector-widening-legalization
llvm-svn: 346745
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.
I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.
For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.
Differential Revision: https://reviews.llvm.org/D54073
llvm-svn: 346595
|
|
|
|
|
|
|
|
|
|
|
|
| |
As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well.
Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets.
The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it.
Differential Revision: https://reviews.llvm.org/D53649
llvm-svn: 345256
|
|
|
|
|
|
|
|
| |
avx512-cvt.ll
This will cover the (v2i32 (setcc v2f32)) case in replaceNodeResults. That code shouldn't be needed at all in this mode. A future patch will skip it.
llvm-svn: 341171
|
|
|
|
|
|
|
|
|
|
|
|
| |
operations with AVX512F, but not AVX512DQ.
AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions.
Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences.
This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used.
llvm-svn: 337137
|
|
|
|
|
|
| |
Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway.
llvm-svn: 325534
|
|
|
|
|
|
|
|
|
|
| |
(v4i1 (setcc (v4f32)))
Undef VLX, getSetCCResultType returns v2i1/v4i1 for v2f32/v4f32 so default type legalization will end up changing the setcc result type back to vXi1 if it had been extended. The resulting extend gets messed up further by type legalization and is difficult to recombine back to (v4i32 (setcc (v4f32))) after legalization.
I went ahead and enabled this for SSE2 and later since its always the result we want and this helps type legalization get there in less steps.
llvm-svn: 324822
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
before type legalization.
This prevents extends of masks being introduced during lowering where it become difficult to combine them out.
There are a few oddities in here.
We sometimes concatenate two k-registers produced by two compares, sign_extend the combined pair, then extract two halves. This worked better previously because the sign_extend wasn't created until after the fp_to_sint was split which led to a split sign_extend being created.
We probably also need to custom type legalize (v2i32 (sext v2i1)) via widening.
llvm-svn: 324820
|
|
|
|
|
|
|
|
|
|
| |
upcoming patch.
The update script sometimes has trouble when there are check-prefixes representing every possible combination of feature flags. I have a patch where the update script was generating something that didn't pass lit.
This patch just removes some check-prefixes and expands out some of the checks to workaround this.
llvm-svn: 324819
|
|
|
|
|
|
| |
Strangely the code was already present, just the setOperationAction wasn't being called without VLX.
llvm-svn: 324806
|
|
|
|
|
|
|
|
|
|
| |
extend and a shift.
This avoids a constant pool load to create 1.
The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization.
llvm-svn: 324805
|
|
|
|
|
|
|
|
| |
These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero.
I plan to teach DAG combine to optimize these so need to stop using them.
llvm-svn: 324315
|
|
|
|
|
|
|
|
|
|
|
|
| |
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
|
|
|
|
|
|
|
|
|
|
| |
inserting into a vXi1 vector.
The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code.
This patch is a little larger than it should be due to differences between the DQI handling between the two today.
llvm-svn: 323212
|
|
|
|
|
|
| |
CVT2MASK is just checking the sign bit which can be represented with a comparison with zero.
llvm-svn: 321985
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There are few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since need to fill the upper bits of the mask we have to fill with 0s at the promoted type.
It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway.
This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.
We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.
I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all.
There's definitely room for improvement with some follow up patches.
Reviewers: RKSimon, zvi, guyblank
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41560
llvm-svn: 321967
|
|
|
|
|
|
| |
result type would still be legal.
llvm-svn: 321638
|
|
|
|
| |
llvm-svn: 321632
|
|
|
|
|
|
| |
Currently we do a lot of scalarization in these test cases.
llvm-svn: 321631
|
|
|
|
|
|
| |
Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32.
llvm-svn: 321420
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Work towards the unification of MIR and debug output by refactoring the
interfaces.
For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.
Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).
https://reviews.llvm.org/D40836
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'
llvm-svn: 320022
|
|
|
|
|
|
|
|
|
|
| |
is not 512-bits and vlx is not enabled.
Previously we used a wider element type and truncated. But its more efficient to keep the element type and drop unused elements.
If BWI isn't supported and we have a i16 or i8 type, we'll extend it to be i32 and still use a truncate.
llvm-svn: 319740
|
|
|
|
|
|
|
|
|
|
| |
is not 512-bits and vlx is not enabled.
Previously we used a wider element type and truncated. But its more efficient to keep the element type and drop unused elements.
If BWI isn't supported and we have a i16 or i8 type, we'll extend it to be i32 and still use a truncate.
llvm-svn: 319728
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
|
|
|
|
| |
llvm-svn: 319266
|
|
|
|
| |
llvm-svn: 319261
|
|
|
|
|
|
|
|
|
|
| |
legal. Fix infinite loop in op legalization when promotion requires 2 steps.
Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel.
The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value.
llvm-svn: 319259
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.
* Only debug printing is affected. It now follows MIR.
Differential Revision: https://reviews.llvm.org/D40417
llvm-svn: 319187
|
|
|
|
|
|
|
|
|
|
| |
enabled. Use 128-bit VLX instruction when VLX is enabled.
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.
Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?
llvm-svn: 316978
|
|
|
|
| |
llvm-svn: 316944
|
|
|
|
| |
llvm-svn: 316457
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
flag.NFC.
NFC.
Updated 8 regression tests to use -mattr instead of -mcpu flag as follows:
-mcpu=knl --> -mattr=+avx512f
-mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
The updates are as part of the preparation of a large commit to add all instruction scheduling for the SKX target.
Reviewers: delena, zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D38222
Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14
llvm-svn: 314306
|
|
|
|
|
|
|
|
| |
unpcklpd for the packed single domain.
MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter.
llvm-svn: 313509
|
|
|
|
|
|
| |
Adding more tests for AVX512 fp<->int conversions that were missing.
llvm-svn: 312921
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add patterns for
fptoui <16 x float> to <16 x i8>
fptoui <16 x float> to <16 x i16>
Reviewers: igorb, delena, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37505
llvm-svn: 312704
|