| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
| |
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.
Differential Revision: https://reviews.llvm.org/D45203
llvm-svn: 335037
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
files. Remove remaining manual table entries from the tablegen emitter.
This adds an EVEX2VEXOverride string to the X86 instruction class in X86InstrFormats.td. If this field is set it will add manual entry in the EVEX->VEX tables that doesn't check the encoding information.
Then use this mechanism to map VMOVDU/A8/16, 128-bit VALIGN, and VPSHUFF/I instructions to VEX instructions.
Finally, remove the manual table from the emitter.
This has the bonus of fully sorting the autogenerated EVEX->VEX tables by their EVEX instruction enum value. We may be able to use this to do a binary search for the conversion and get rid of the need to create a DenseMap.
llvm-svn: 335018
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
VEX.W==1, but can be converted to their VEX equivalent that uses VEX.W==0.
EVEX makes heavy use of the VEX.W bit to indicate 64-bit element vs 32-bit elements. Many of the VEX instructions were split into 2 versions with different masking granularity.
The EVEX->VEX table generate can collapse the two versions if the VEX version uses is tagged as VEX_WIG. But if the VEX version is instead marked VEX.W==0 we can't combine them because we don't know if there is also a VEX version with VEX.W==1.
This patch adds a new VEX_W1X tag that indicates the EVEX instruction encodes with VEX.W==1, but is safe to convert to a VEX instruction with VEX.W==0.
This allows us to remove a bunch of manual EVEX->VEX table entries. We may want to look into splitting up the VEX_WPrefix field which would simplify the disassembler.
llvm-svn: 335017
|
|
|
|
|
|
| |
The instructions that use this class don't have another source register. So I think this was just marking one of the address operands as ReadAfterLd?
llvm-svn: 334994
|
|
|
|
|
|
|
|
| |
the emitter source.
Rather than having an exclusion list in tablegen sources, add a flag to the X86 instruction records that can be used to suppress checking for convertibility.
llvm-svn: 334971
|
|
|
|
|
|
|
|
|
|
| |
encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.
Also remove the vpextrw.s EVEX alias. That's not something gas implements.
llvm-svn: 334922
|
|
|
|
|
|
|
|
|
|
| |
with reversed operands to InstAliases.
The .s assembly strings allow the reversed forms to be targeted from assembly which matches gas behavior. But when printing the instructions we should print them without the .s to match other tooling like objdump. By using InstAliases we can use the normal string in the instruction and just hide it from the assembly parser.
Ideally we'd add the .s versions to the legacy SSE and VEX versions as well for full compatibility with gas. Not sure how we got to state where only EVEX was supported.
llvm-svn: 334920
|
|
|
|
|
|
|
|
| |
tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.
llvm-svn: 334898
|
|
|
|
|
|
|
|
|
|
| |
instructions.
VMOVPQIto64Zmr is not a 64-bit mode only instruction. But I don't know how to test this because VMOVPQIto64mr should always have priority over it in 32-bit mode since its only advantage is XMM16-XMM31 which aren't usable in 32-bit mode.
VMOVPQIto64Zrr is a 64-bit mode only instruction, but we don't need to explicitly mark it as such because it uses a GR64 register which won't parse in 32-bit mode.
llvm-svn: 334896
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Complementary patch to lowering sqrt intrinsics in Clang.
Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k
Reviewed By: craig.topper
Subscribers: tkrupa, mike.dvoretsky, llvm-commits
Differential Revision: https://reviews.llvm.org/D41599
llvm-svn: 334849
|
|
|
|
|
|
| |
consistency.
llvm-svn: 334785
|
|
|
|
|
|
| |
be consistent with other scalar instructions.
llvm-svn: 334727
|
|
|
|
|
|
| |
Some of these instructions are already in the manual folding table so we should have them in the auto table too.
llvm-svn: 334725
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.
This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
Reviewers: RKSimon, gbedwell, spatel
Reviewed By: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D47993
llvm-svn: 334685
|
|
|
|
|
|
|
|
| |
NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator.
Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.
llvm-svn: 334563
|
|
|
|
| |
llvm-svn: 334562
|
|
|
|
| |
llvm-svn: 334529
|
|
|
|
| |
llvm-svn: 334481
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from ffloor/fnearbyint/fceil/frint/ftrunc.
We were missing packed isel folding patterns for all of sse41, avx, and avx512.
For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.
Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.
Some of this was spotted in the review for D47993.
This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.
llvm-svn: 334460
|
|
|
|
|
|
|
|
| |
Necessary for D46276 as even though btver2 doesn't use these instructions, its now flagged as complete so complains if ANY instruction isn't tagged.....
UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512).
llvm-svn: 334423
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes most of the scheduling info for SKX vector operations.
I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.
The before/after llvm-exegesis analysis are in the phabricator diff.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47721
llvm-svn: 334407
|
|
|
|
|
|
|
|
|
|
|
| |
The index size is represented by the letter after the 'v'. The number represents the memory size. If an 'x' appears after the number its means the index register can be from VR128X/VR256X instead of VR128/VR256.
As vy512mem uses a VR256X index it should have an x.
And vz256mem uses a VR512 index so it shouldn't have an x.
I admit these names kind of suck and are confusing.
llvm-svn: 334120
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The new rules are straightforward. The main rules to keep in mind
are:
1. NAME is an implicit template argument of class and multiclass,
and will be substituted by the name of the instantiating def/defm.
2. The name of a def/defm in a multiclass must contain a reference
to NAME. If such a reference is not present, it is automatically
prepended.
And for some additional subtleties, consider these:
3. defm with no name generates a unique name but has no special
behavior otherwise.
4. def with no name generates an anonymous record, whose name is
unique but undefined. In particular, the name won't contain a
reference to NAME.
Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.
The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.
Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.
Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.
Change-Id: I694095231565b30f563e6fd0417b41ee01a12589
Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D47430
llvm-svn: 333900
|
|
|
|
|
|
| |
This doesn't affect the assembly or disassembly, but is more accurate.
llvm-svn: 333822
|
|
|
|
|
|
|
|
|
|
| |
instructions so they can be assembled and disassembled.
These instructions are unusual in that they operate on 4 consecutive registers so supporting them in codegen will be more difficult than normal.
Includes an assembler check to warn if the source register is not the first register of a 4 register group.
llvm-svn: 333812
|
|
|
|
|
|
| |
value is a zero vector.
llvm-svn: 333800
|
|
|
|
|
|
|
|
| |
We only need the extractelt that corresponds to the register we're trying to insert back into. We can't guarantee the others haven't been optimized out depending on how those operands were produced.
So instead just look for an FR32/FR64 input and emit a COPY_TO_REGCLASS to VR128 in the output pattern. This matches what we do for ADD/SUB/MUL/DIV.
llvm-svn: 333473
|
|
|
|
| |
llvm-svn: 333464
|
|
|
|
|
|
|
|
| |
that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction.
The order should be controlled in the input pattern.
llvm-svn: 333463
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
masking in Clang header files.
Patch by tkrupa
Differential Revision: https://reviews.llvm.org/D47012
llvm-svn: 333419
|
|
|
|
|
|
|
|
|
|
| |
These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node.
This is a step towards reducing the number of intrinsics needed.
A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial.
llvm-svn: 333383
|
|
|
|
|
|
|
|
|
|
| |
to make masking pattern matching easier. Add extra patterns with bitcasts instead.
This basically reverts r280696 in favor of using extra patterns as mentioned as an alternative in that commit message. For now I've only added the cases we have test cases for, but it should be easy to add more in the future.
This will help to convert VPERMI2PS/VPERMT2PS intrinsics to use a single ISD node opcode. And hopefully allow some intrinsics to be removed.
llvm-svn: 333365
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.
Patch by Nikola Prica.
Differential Revision: https://reviews.llvm.org/D45204
llvm-svn: 333093
|
|
|
|
|
|
|
|
|
|
| |
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737)
The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)
llvm-svn: 332745
|
|
|
|
|
|
|
|
|
|
| |
WriteVecLoad/WriteVecStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332718
|
|
|
|
|
|
|
|
|
|
| |
classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.
Fixes BtVer2/SLM which have different behaviours for GPR stores.
llvm-svn: 332714
|
|
|
|
|
|
|
|
| |
FIXME comments.
The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use GPR as input when the load isn't folded. This won't help prevent a partial xmm update.
llvm-svn: 332573
|
|
|
|
|
|
| |
A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332451
|
|
|
|
|
|
|
|
| |
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332376
|
|
|
|
|
|
|
|
|
| |
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency
Allows us to remove the WriteCvtF2FSt conversion store class
llvm-svn: 332357
|
|
|
|
| |
llvm-svn: 332274
|
|
|
|
|
|
| |
intrinsics.
llvm-svn: 332271
|
|
|
|
|
|
| |
uitofp+insertelement instead.
llvm-svn: 332206
|
|
|
|
|
|
|
|
| |
instructions under AVX512.
This matches what we do for sint_to_fp.
llvm-svn: 332205
|
|
|
|
|
|
| |
instructions.
llvm-svn: 332189
|
|
|
|
|
|
| |
clang has used for a very long time.
llvm-svn: 332186
|
|
|
|
|
|
| |
Nothing uses this yet but this will allow us to specialize MMX/XMM/YMM/ZMM vector moves.
llvm-svn: 332090
|
|
|
|
| |
llvm-svn: 332079
|
|
|
|
|
|
|
|
|
|
|
|
| |
from r331958.
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
We may be able to drop some of the old patterns, but I leave that for a future patch.
llvm-svn: 332049
|
|
|
|
|
|
|
|
| |
WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes
Split off XMM classes from the default (MMX) classes.
llvm-svn: 331999
|