| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
| |
VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.
This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.
llvm-svn: 328254
|
|
|
|
|
|
|
|
| |
assigned Port015 instead of Port01.
The VEC ADD and VEC MUL units aren't present on port 5 on SkylakeClient.
llvm-svn: 328241
|
|
|
|
|
|
| |
to as best as I understand how they are implemented.
llvm-svn: 328231
|
|
|
|
|
|
| |
Models were completely overriding all SSE42 strins instructions when the default classes could be used for exactly the same coverage.
llvm-svn: 328203
|
|
|
|
|
|
| |
Models were completely overriding all CLMUL instructions when the WriteCLMUL default classes could be used for exactly the same coverage.
llvm-svn: 328194
|
|
|
|
|
|
|
|
|
| |
Models were completely overriding all AES instructions when the WriteAES default classes could be used for exactly the same coverage.
Removes 6 unnecessary scheduler classes from every model.
Note: Still looking for a way for tblgen to warn when this is happening - often the override is more complete than the default.
llvm-svn: 328192
|
|
|
|
|
|
|
|
|
|
| |
SchedWriteRes group (NFCI) (PR35955)
I've also merged some VEX/non-VEX instregex strings with a (V?) prefix or (Y?) ymm variant - there are still a lot more of these to do.
This reduces the size of the optimized llc binary on my computer by 400K. Presumably because we went from 5000+ scheduler classes per CPU to ~2000.
llvm-svn: 328179
|
|
|
|
|
|
|
|
|
|
| |
llvm-exegesis.
Also restrict to port 0 and 1 for SkylakeClient. It looks like the scheduler models don't account for client not having a full vector ALU on port 5 like server.
Fixes PR36808.
llvm-svn: 328061
|
|
|
|
|
|
|
|
| |
Sandybridge/Haswell/Broadwell/Skylake scheduler models.
Move it from a load+store group on SNB to a load only group, the same group as CMP.
llvm-svn: 327944
|
|
|
|
|
|
|
|
| |
JRCXZ was already present, but not the others.
We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output.
llvm-svn: 327881
|
|
|
|
|
|
| |
models to match their 8-bit counterpart.
llvm-svn: 327874
|
|
|
|
|
|
| |
Correct SchedRW and itinerary for MOV32ri64.
llvm-svn: 327872
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults.
This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases.
I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific.
Differential Revision: https://reviews.llvm.org/D44612
llvm-svn: 327855
|
|
|
|
|
|
|
|
| |
match ADD8i8.
Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm.
llvm-svn: 327820
|
|
|
|
|
|
| |
expressions in scheduler models.
llvm-svn: 327816
|
|
|
|
|
|
|
|
| |
the scheduler models as much as possible. NFCI
This reduces the total number of generated scheduler classes from 5404 to 5316.
llvm-svn: 327815
|
|
|
|
| |
llvm-svn: 327787
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The information was so wildly inaccurate and incomplete its better to just remove it.
MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information.
MMX_MASKMOVQ and MASKMOVDQU were completely missing.
MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction.
Filed PR36780 to track fixing this right.
llvm-svn: 327783
|
|
|
|
|
|
|
|
|
|
|
|
| |
scheduler classes
As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types.
I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428.
Differential Revision: https://reviews.llvm.org/D44471
llvm-svn: 327630
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use
P1.
This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc
This can easily be validated by running perf on the following code:
```
int main(int argc, char**argv) {
int a = argc;
int b = argc;
int c = argc;
int d = argc;
for (int i = 0; i < LOOP_ITERATIONS; ++i) {
asm volatile(
R"(
.rept 10000
imull $0x2, %%edx, %%eax
imull $0x2, %%ecx, %%ebx
imull $0x2, %%eax, %%edx
imull $0x2, %%ebx, %%ecx
.endr
)"
: "+a"(a), "+b"(b), "+c"(c), "+d"(d)
:
:);
}
return a+b+c+d;
}
```
-> test.cc
perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test
Reviewers: craig.topper, RKSimon, gadi.haber
Subscribers: llvm-commits, gchatelet, chandlerc
Differential Revision: https://reviews.llvm.org/D43460
llvm-svn: 326877
|
|
|
|
|
|
|
|
|
|
|
|
| |
to some of them in SkylakeClient model.
The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results.
This changes them all to use explicit instrs instead.
While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.
llvm-svn: 323406
|
|
|
|
| |
llvm-svn: 323403
|
|
|
|
|
|
|
|
|
|
| |
consistency. NFC
MMX instrutions all start with MMX_ so the 64 isn't needed for disambigutation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.
llvm-svn: 323402
|
|
|
|
|
|
|
|
| |
expressions.
These were treated as optional suffixes, but the regular expressions are already prefix matches so this is unnecessary. It breaks the binary search optimization in tablegen due to the top level question mark.
llvm-svn: 323401
|
|
|
|
|
|
| |
for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI
llvm-svn: 323352
|
|
|
|
|
|
| |
The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark.
llvm-svn: 323351
|
|
|
|
|
|
|
|
| |
Intel scheduler models.
Combine all the separate condition codes into a singular expression when possible.
llvm-svn: 322924
|
|
|
|
| |
llvm-svn: 322615
|
|
|
|
|
|
| |
We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions.
llvm-svn: 320924
|
|
|
|
|
|
|
|
| |
scheduler models. Combine into single InstrRW entries.
The reduces the number of scheduler groups in subtarget info.
llvm-svn: 320923
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the number of scheduler entries needed for FMA3 instructions."
I've hopefully sidestepped the MSVC issue that caused it to be reverted. We no longer include the Sched enum from X86GenInstrInfo.inc on the X86 target. So hopefully MSVC's preprocessor will skip over it and nothing will notice the 11000 character enum name.
Original commit message:
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
llvm-svn: 320655
|
|
|
|
|
|
|
|
|
|
| |
[X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions.
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
llvm-svn: 320470
|
|
|
|
|
|
|
|
|
|
| |
scheduler entries needed for FMA3 instructions.
When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models.
This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do.
llvm-svn: 320461
|
|
|
|
|
|
|
|
| |
This matches AVX512 version and is more consistent overall. And improves our scheduler models.
In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses.
llvm-svn: 320325
|
|
|
|
|
|
| |
The VEX versions were present but not the legacy SSE versions.
llvm-svn: 320294
|
|
|
|
|
|
| |
Sandybridge,Haswell,Broadwell,Skylake
llvm-svn: 320293
|
|
|
|
|
|
| |
Sandy Bridge is also missing it, but it has other issues. See PR35590.
llvm-svn: 320292
|
|
|
|
|
|
| |
Similar for all sizes of AND/OR/XOR/SUB/ADC/SBB/CMP.
llvm-svn: 320291
|
|
|
|
|
|
| |
Somehow CMPSSrr/rm was there and the VEX version was there, but this was consistently missing.
llvm-svn: 320289
|
|
|
|
|
|
|
|
|
|
| |
should be outside of multicharacter parenthesized expressions
If the question mark is inside the parentheses it only applies to the single character proceeding it.
I had to make a few additional cleanups to fix some duplicate warnings that were exposed by fixing this.
llvm-svn: 320279
|
|
|
|
|
|
| |
sandybridge,haswell,broadwell,skylakeclient scheduler models.
llvm-svn: 320277
|
|
|
|
|
|
|
|
|
|
| |
As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general).
Annoyingly all scheduler models need to define WriteFMA (now that its actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes.
Differential Revision: https://reviews.llvm.org/D40351
llvm-svn: 319016
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Updated the scheduling information for the SkylakeClient target with the following changes:
1. regrouped the instructions after adding load and store latencies.
2. regrouped the instructions after adding identified missing ports in several groups.
The changes were made after revisiting the latencies impact of all the load and store uOps.
Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D38727
Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f
llvm-svn: 315978
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
For the register®ister form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register®ister form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
I believe this supercedes D38025 which was trying to switch the register®ister form back to pre-PR22995.
Reviewers: aymanmus, RKSimon, zvi
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38120
llvm-svn: 314639
|
|
|
|
| |
llvm-svn: 313863
|
|
This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena
Differential Revision: https://reviews.llvm.org/D37294
llvm-svn: 313613
|