path: root/llvm/lib/Target/X86/X86SchedSkylakeClient.td
Commit message [Author, Date, Files, Lines]
...
* [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD [Craig Topper, 2018-03-22, 1 file, -4/+4]
  This makes the Y position consistent with other instructions. This should have been NFC, but while refactoring the multiclass I noticed that the VROUNDPD memory forms were using the register itinerary. llvm-svn: 328254
* [X86][SkylakeClient] Fix a bunch of instructions that were incorrectly assigned Port015 instead of Port01. [Craig Topper, 2018-03-22, 1 file, -111/+93]
  The VEC ADD and VEC MUL units aren't present on port 5 on SkylakeClient. llvm-svn: 328241
* [X86] Correct the scheduling data for some of the 32 and 64 bit multiplies to the best of my understanding of how they are implemented. [Craig Topper, 2018-03-22, 1 file, -15/+6]
  llvm-svn: 328231
* [X86][SSE42] Use the default PCMPEST/PCMPIST scheduler classes directly. NFCI. [Simon Pilgrim, 2018-03-22, 1 file, -66/+33]
  Models were completely overriding all SSE42 string instructions when the default classes could be used for exactly the same coverage. llvm-svn: 328203
* [X86][CLMUL] Use the default CLMUL scheduler classes directly. NFCI. [Simon Pilgrim, 2018-03-22, 1 file, -20/+8]
  Models were completely overriding all CLMUL instructions when the WriteCLMUL default classes could be used for exactly the same coverage. llvm-svn: 328194
* [X86] Use the default AES scheduler classes directly. NFCI. [Simon Pilgrim, 2018-03-22, 1 file, -57/+25]
  Models were completely overriding all AES instructions when the WriteAES default classes could be used for exactly the same coverage. Removes 6 unnecessary scheduler classes from every model. Note: Still looking for a way for tblgen to warn when this is happening; often the override is more complete than the default. llvm-svn: 328192
* [X86][Skylake] Merge multiple InstRW entries that map to the same SchedWriteRes group (NFCI) (PR35955) [Craig Topper, 2018-03-22, 1 file, -1994/+1935]
  I've also merged some VEX/non-VEX instregex strings with a (V?) prefix or (Y?) ymm variant; there are still a lot more of these to do. This reduces the size of the optimized llc binary on my computer by 400K, presumably because we went from 5000+ scheduler classes per CPU to ~2000. llvm-svn: 328179
* [X86] Change PMULLD to 10 cycles on Skylake per Agner's tables and llvm-exegesis. [Craig Topper, 2018-03-20, 1 file, -6/+24]
  Also restrict it to ports 0 and 1 for SkylakeClient. It looks like the scheduler models don't account for the client parts lacking the full vector ALU on port 5 that the server parts have. Fixes PR36808. llvm-svn: 328061
* [X86] Add TEST16mi/TEST32mi/TEST64mi32 to the Sandybridge/Haswell/Broadwell/Skylake scheduler models. [Craig Topper, 2018-03-20, 1 file, -1/+1]
  Move it from a load+store group on SNB to a load-only group, the same group as CMP. llvm-svn: 327944
* [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models. [Craig Topper, 2018-03-19, 1 file, -1/+1]
  JRCXZ was already present, but not the others. We never generate this instruction in codegen, so this doesn't affect much; it's just an attempt to get them all into a single generated scheduler class in the output. llvm-svn: 327881
* [X86] Add the rest of the TEST with immediate instructions to the scheduler models to match their 8-bit counterpart. [Craig Topper, 2018-03-19, 1 file, -2/+2]
  llvm-svn: 327874
* [X86] Add MOV16ri*/MOV32ri*/MOV64ri* to scheduler models to match MOV8ri. [Craig Topper, 2018-03-19, 1 file, -1/+1]
  Correct SchedRW and itinerary for MOV32ri64. llvm-svn: 327872
* [X86] Generalize schedule classes to support multiple stages [Simon Pilgrim, 2018-03-19, 1 file, -83/+41]
  Currently the WriteResPair-style multiclasses take a single pipeline stage and latency. This patch generalizes them to make it easier to create complex schedules, allowing ResourceCycles and NumMicroOps to be overridden from their defaults. This has already been done for the Jaguar scheduler to remove a number of custom schedule classes, and adding it to the other x86 targets will make things much tidier as we add additional classes in the future to try and replace so many custom cases. I've converted some instructions, but a lot of the models need a bit of cleanup after the patch has been committed: memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific. Differential Revision: https://reviews.llvm.org/D44612 llvm-svn: 327855
* [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to match ADD8i8. [Craig Topper, 2018-03-19, 1 file, -6/+8]
  Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate, since the 8i8 form just has the register forced to AL instead of coming from ModRM. llvm-svn: 327820
* [X86] Merge 32 and 64-bit RORX/SHLX/SARX/SHRX into single regular expressions in scheduler models. [Craig Topper, 2018-03-19, 1 file, -8/+4]
  llvm-svn: 327816
* [X86] Merge 8-bit instructions into instregex with 16/32/64 instructions in the scheduler models as much as possible. NFCI [Craig Topper, 2018-03-19, 1 file, -203/+102]
  This reduces the total number of generated scheduler classes from 5404 to 5316. llvm-svn: 327815
* [X86] Fix a bunch of overlapping regular expressions in the scheduler models. [Craig Topper, 2018-03-18, 1 file, -19/+14]
  llvm-svn: 327787
* [X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models. [Craig Topper, 2018-03-18, 1 file, -9/+0]
  The information was so wildly inaccurate and incomplete that it's better to just remove it. MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information. MMX_MASKMOVQ and MASKMOVDQU were completely missing. MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction. Filed PR36780 to track fixing this properly. llvm-svn: 327783
* [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and WriteStore scheduler classes [Simon Pilgrim, 2018-03-15, 1 file, -0/+8]
  As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from GPR/scalar types. I've minimised the diff here by only moving various basic SSE/AVX vector instructions across; we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428. Differential Revision: https://reviews.llvm.org/D44471 llvm-svn: 327630
* [X86] Add IMUL scheduling info on Sandybridge, fix it on >=Haswell. [Clement Courbet, 2018-03-07, 1 file, -9/+3]
  Summary: Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use P1. This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc
  This can easily be validated by running perf on the following code, saved as test.cc and built into a binary named test:
  ```
  // Not in the original message: an assumed iteration count so the file
  // compiles on its own; override with -DLOOP_ITERATIONS=N.
  #ifndef LOOP_ITERATIONS
  #define LOOP_ITERATIONS 1000
  #endif

  int main(int argc, char** argv) {
    int a = argc;
    int b = argc;
    int c = argc;
    int d = argc;
    for (int i = 0; i < LOOP_ITERATIONS; ++i) {
      // Each repetition emits four register-form 32-bit IMULs with an
      // immediate; the perf port counters show which ports execute them.
      asm volatile(
          R"(
          .rept 10000
          imull $0x2, %%edx, %%eax
          imull $0x2, %%ecx, %%ebx
          imull $0x2, %%eax, %%edx
          imull $0x2, %%ebx, %%ecx
          .endr
          )"
          : "+a"(a), "+b"(b), "+c"(c), "+d"(d)
          :
          :);
    }
    return a + b + c + d;
  }
  ```
  perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u ./test
  Reviewers: craig.topper, RKSimon, gadi.haber Subscribers: llvm-commits, gchatelet, chandlerc Differential Revision: https://reviews.llvm.org/D43460 llvm-svn: 326877
* [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency to some of them in SkylakeClient model. [Craig Topper, 2018-01-25, 1 file, -26/+26]
  The regular expressions and the imul names caused some instructions to be matched by multiple regexs, creating unpredictable results. This changes them all to use explicit instrs instead. While doing this I also found that some instructions in Skylake were missing load latency, so I fixed that too. llvm-svn: 323406
* [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFC [Craig Topper, 2018-01-25, 1 file, -2/+2]
  llvm-svn: 323403
* [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall consistency. NFC [Craig Topper, 2018-01-25, 1 file, -66/+66]
  MMX instructions all start with MMX_, so the 64 isn't needed for disambiguation. SSE/AVX1 instructions are assumed to be 128-bit, so we don't need to say 128. AVX2 instructions should use a Y to indicate 256 bits. llvm-svn: 323402
* [X86] Remove unnecessary '_alt' and '_Int' from scheduler model regular expressions. [Craig Topper, 2018-01-25, 1 file, -1/+1]
  These were treated as optional suffixes, but the regular expressions are already prefix matches, so this is unnecessary. It breaks the binary search optimization in tablegen due to the top-level question mark. llvm-svn: 323401
* [X86] Adjust names of PINSRW/PEXTRW instructions between MMX/SSE/AVX/AVX512 for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI [Craig Topper, 2018-01-24, 1 file, -11/+9]
  llvm-svn: 323352
* [X86] Remove '(_REV)?' from a bunch of scheduler regular expressions. NFC [Craig Topper, 2018-01-24, 1 file, -41/+41]
  The regexs are treated as prefix matches already, so checking for optional text at the end provides no value. Instead, it prevents the binary search optimization in tablegen from kicking in due to the top-level question mark. llvm-svn: 323351
* [X86] Make better use of instregex for cmovcc/setcc/jcc instructions in the Intel scheduler models. [Craig Topper, 2018-01-19, 1 file, -96/+10]
  Combine all the separate condition codes into a single expression when possible. llvm-svn: 322924
* [X86] Remove duplicate lines from scheduler models. NFC [Craig Topper, 2018-01-17, 1 file, -2/+0]
  llvm-svn: 322615
* [X86] Combine some more scheduler model entries using regular expressions. [Craig Topper, 2017-12-16, 1 file, -48/+24]
  We had a lot of separate 32-bit and 64-bit instructions that had the same scheduling data. This merges them into the same regular expression, which is consistent with how a lot of other instructions are handled. llvm-svn: 320924
* [X86] Use instrs instead of instregex for gather/scatter instructions in the scheduler models. Combine into single InstRW entries. [Craig Topper, 2017-12-16, 1 file, -30/+16]
  This reduces the number of scheduler groups in subtarget info. llvm-svn: 320923
* Recommit r320461 "[X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions." [Craig Topper, 2017-12-13, 1 file, -192/+10]
  I've hopefully sidestepped the MSVC issue that caused it to be reverted. We no longer include the Sched enum from X86GenInstrInfo.inc on the X86 target, so hopefully MSVC's preprocessor will skip over it and nothing will notice the 11000-character enum name. Original commit message: When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed not to share a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files, particularly if a lot of the InstRW lines only match single instructions, which is true of a large number of the Intel scheduler models. This change alone reduces the number of instruction groups from ~6000 to ~5500, and there's a lot more we could do. llvm-svn: 320655
* Revert r320461: causing ICE in Windows builds [Simon Pilgrim, 2017-12-12, 1 file, -10/+192]
  [X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions. When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed not to share a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files, particularly if a lot of the InstRW lines only match single instructions, which is true of a large number of the Intel scheduler models. This change alone reduces the number of instruction groups from ~6000 to ~5500, and there's a lot more we could do. llvm-svn: 320470
* [X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions. [Craig Topper, 2017-12-12, 1 file, -192/+10]
  When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed not to share a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files, particularly if a lot of the InstRW lines only match single instructions, which is true of a large number of the Intel scheduler models. This change alone reduces the number of instruction groups from ~6000 to ~5500, and there's a lot more we could do. llvm-svn: 320461
* [X86] Rename some instructions that start with Int_ to have the _Int at the end. [Craig Topper, 2017-12-10, 1 file, -4/+4]
  This matches the AVX512 versions and is more consistent overall. It also improves our scheduler models. In some cases this adds _Int to instructions that didn't have any Int_ before; it's a side effect of the adjustments made to some of the multiclasses. llvm-svn: 320325
* [X86] Add MOVQI2PQIrm, MOVSDmr, and MOVSDrm to scheduler information [Craig Topper, 2017-12-10, 1 file, -0/+3]
  The VEX versions were present but not the legacy SSE versions. llvm-svn: 320294
* [X86] Add LEA64_32r to scheduler models for Sandybridge, Haswell, Broadwell, Skylake [Craig Topper, 2017-12-10, 1 file, -1/+1]
  llvm-svn: 320293
* [X86] Add IN16/OUT16 to scheduling information for Haswell, Broadwell, Skylake [Craig Topper, 2017-12-10, 1 file, -4/+4]
  Sandy Bridge is also missing it, but it has other issues. See PR35590. llvm-svn: 320292
* [X86] Fix scheduler models to support ADD32ri in addition to ADD32ri8. [Craig Topper, 2017-12-10, 1 file, -16/+16]
  Similar for all sizes of AND/OR/XOR/SUB/ADC/SBB/CMP. llvm-svn: 320291
* [X86] Add CMPSDrr/rm to the scheduler models. [Craig Topper, 2017-12-10, 1 file, -0/+2]
  Somehow CMPSSrr/rm was there and the VEX version was there, but this was consistently missing. llvm-svn: 320289
* [X86] Fix bad regular expressions in the scheduler models. Question marks should be outside of multicharacter parenthesized expressions [Craig Topper, 2017-12-10, 1 file, -53/+51]
  If the question mark is inside the parentheses, it only applies to the single character preceding it. I had to make a few additional cleanups to fix some duplicate warnings that were exposed by fixing this. llvm-svn: 320279
* [X86] Add the commutable floating point min/max pseudo instructions to Sandybridge, Haswell, Broadwell, SkylakeClient scheduler models. [Craig Topper, 2017-12-10, 1 file, -40/+40]
  llvm-svn: 320277
* [X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule class [Simon Pilgrim, 2017-11-27, 1 file, -1/+1]
  As mentioned on PR17367, many instructions are missing scheduling tags, preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4, which is one of the bigger offenders (along with AVX512 in general). Annoyingly, all scheduler models need to define WriteFMA (now that it's actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes. Differential Revision: https://reviews.llvm.org/D40351 llvm-svn: 319016
* [X86][SKL] Updated scheduling information for the SkylakeClient target [Gadi Haber, 2017-10-17, 1 file, -3072/+3271]
  Updated the scheduling information for the SkylakeClient target with the following changes:
  1. Regrouped the instructions after adding load and store latencies.
  2. Regrouped the instructions after adding identified missing ports in several groups.
  The changes were made after revisiting the latency impact of all the load and store uOps. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D38727 Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f llvm-svn: 315978
* [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDestMem [Craig Topper, 2017-10-01, 1 file, -2/+2]
  Summary: Intel documentation shows the memory operand as the first operand, but we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order, and the isel matching is commutable. For the register&register form, order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem. I believe this supersedes D38025, which was trying to switch the register&register form back to pre-PR22995. Reviewers: aymanmus, RKSimon, zvi Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38120 llvm-svn: 314639
* [X86] Remove execute permissions from a couple of files. [Craig Topper, 2017-09-21, 1 file, -0/+0]
  llvm-svn: 313863
* [X86][Skylake] Adding the scheduling information for the SkylakeClient target [Gadi Haber, 2017-09-19, 1 file, -0/+4011]
  This patch adds the instruction scheduling information for the SkylakeClient (SKL) architecture target by adding the file X86SchedSkylakeClient.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r307529 and r310792 and for HSW in r311879. Please expect some performance fluctuations due to code alignment effects. Reviewers: craig.topper, zvi, chandlerc, igorb, aymanmus, RKSimon, delena Differential Revision: https://reviews.llvm.org/D37294 llvm-svn: 313613
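Several of these commits rely on how instregex patterns select instructions and how InstRW entries split instructions into scheduling groups. The following is a small Python model of that behaviour, a sketch under stated assumptions rather than TableGen code: it assumes, as the commit messages describe, that each instregex acts as a prefix match, and it approximates grouping by keying each instruction on its default scheduling class plus the set of InstRW patterns that match it. Python's `re` stands in for LLVM's own regex engine, which should agree on these simple patterns. The helpers `instregex_matches`, `matched` and `group_instructions`, and the instruction lists, are hypothetical illustrations.

```python
import re
from collections import defaultdict


def instregex_matches(pattern: str, name: str) -> bool:
    """Hypothetical helper: treat an instregex as a match anchored only at the
    start of the instruction name (a prefix match)."""
    return re.match("(?:" + pattern + ")", name) is not None


def matched(pattern, names):
    """Names selected by one pattern, in input order."""
    return [n for n in names if instregex_matches(pattern, n)]


def group_instructions(default_class, instrw_patterns):
    """Toy grouping model (an assumption, not TableGen's real algorithm):
    instructions share a group only if they have the same default scheduling
    class and are matched by exactly the same set of InstRW patterns."""
    groups = defaultdict(list)
    for name, sched_class in default_class.items():
        matched_by = frozenset(
            i for i, pat in enumerate(instrw_patterns) if instregex_matches(pat, name)
        )
        groups[(sched_class, matched_by)].append(name)
    return groups


if __name__ == "__main__":
    # Prefix matching: the trailing (_REV)? adds nothing.
    adds = ["ADD32rr", "ADD32rr_REV", "ADD32rm"]
    print(matched("ADD32rr(_REV)?", adds))  # ['ADD32rr', 'ADD32rr_REV']
    print(matched("ADD32rr", adds))         # ['ADD32rr', 'ADD32rr_REV']

    # '?' inside a multi-character group binds to the preceding character only.
    padds = ["PADDBrr", "MMX_PADDBirr"]
    print(matched("(MMX_?)PADDB", padds))   # ['MMX_PADDBirr']
    print(matched("(MMX_)?PADDB", padds))   # ['PADDBrr', 'MMX_PADDBirr']

    # Fewer, broader InstRW entries produce fewer groups in this model.
    fma = {n: "WriteFMA" for n in ["VFMADD132PSr", "VFMADD213PSr", "VFMADD231PSr"]}
    print(len(group_instructions(fma, ["VFMADD132PSr", "VFMADD213PSr", "VFMADD231PSr"])))  # 3
    print(len(group_instructions(fma, ["VFMADD(132|213|231)PSr"])))  # 1
```

Under this model, merging entries with alternation or an outer optional group, such as `(V?)` or `(Y?)`, lowers the group count without changing which instructions are covered, provided the question mark sits outside any multi-character parentheses.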