path: root/llvm/lib/Target/X86/X86SchedSkylakeServer.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classesSimon Pilgrim2018-03-271-10/+5
| | | | | | | | Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer. Differential Revision: https://reviews.llvm.org/D44924 llvm-svn: 328664
* [X86] Add WriteCRC32 scheduler classSimon Pilgrim2018-03-261-0/+1
| | | | | | | | Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582
* [X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes ↵Simon Pilgrim2018-03-261-14/+10
| | | | | | | | | | | | (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566
* [X86] Merge the SSE and AVX versions of fp divs and sqrts in the ↵Craig Topper2018-03-261-3/+3
| | | | | | | | SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473
* [X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 ↵Craig Topper2018-03-251-8/+8
| | | | | | | | for Skylake. This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX. llvm-svn: 328465
* [X86] Use WriteResPair for WriteIDiv to cleanup sched defs. NFCI.Simon Pilgrim2018-03-251-10/+4
| | | | llvm-svn: 328460
* [X86][SkylakeServer] Merge multiple instregex. NFCISimon Pilgrim2018-03-251-7/+7
| | | | llvm-svn: 328452
* [X86] Add the ability to override memory folding latency to schedules and ↵Simon Pilgrim2018-03-251-5/+6
| | | | | | | | | | add 1uop for memory folds for Intel models The Intel models need an extra 1uop for memory folded instructions, plus a lot of instructions take a non-default memory latency which should allow us to use the multiclass a lot more to tidy things up. Differential Revision: https://reviews.llvm.org/D44840 llvm-svn: 328446
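The memory-fold rule described in this commit can be sketched in Python rather than TableGen. This is a minimal illustration, not the actual multiclass: the names `SchedEntry` and `fold_load`, the port labels, and the default load latency of 5 cycles are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SchedEntry:
    """Hypothetical stand-in for one scheduler write class."""
    ports: tuple
    latency: int
    num_uops: int = 1


def fold_load(reg, load_ports=("P23",), load_latency=5, extra_uops=1):
    """Derive the memory-folded entry from the register-register entry.

    load_latency is the overridable memory latency (5 is an assumed default);
    extra_uops models the additional micro-op charged for the folded load.
    """
    return replace(
        reg,
        ports=reg.ports + tuple(load_ports),
        latency=reg.latency + load_latency,
        num_uops=reg.num_uops + extra_uops,
    )


# Register form: 1 cycle, 1 uop. Folded form with the default load latency.
alu_rr = SchedEntry(ports=("P0156",), latency=1)
alu_rm = fold_load(alu_rr)

# An entry whose folded form needs a non-default memory latency.
div_rm = fold_load(SchedEntry(ports=("P0",), latency=20), load_latency=6)
```

Making the load latency a parameter with a default is what lets one helper cover both the common case and the instructions whose folded form deviates from it.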
* [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and ↵Craig Topper2018-03-221-4/+4
| | | | | | | | | | VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD This makes the Y position consistent with other instructions. This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary. llvm-svn: 328254
* [X86] Correct the scheduling data for some of the 32 and 64 bit multiplies ↵Craig Topper2018-03-221-17/+6
| | | | | | to match, as best as I understand, how they are implemented. llvm-svn: 328231
* [X86][SSE42] Use the default PCMPEST/PCMPIST scheduler classes directly. NFCI.Simon Pilgrim2018-03-221-69/+32
| | | | | | Models were completely overriding all SSE42 strins instructions when the default classes could be used for exactly the same coverage. llvm-svn: 328203
* [X86][CLMUL] Use the default CLMUL scheduler classes directly. NFCI.Simon Pilgrim2018-03-221-20/+8
| | | | | | Models were completely overriding all CLMUL instructions when the WriteCLMUL default classes could be used for exactly the same coverage. llvm-svn: 328194
* [X86] Use the default AES scheduler classes directly. NFCI.Simon Pilgrim2018-03-221-61/+25
| | | | | | | | | Models were completely overriding all AES instructions when the WriteAES default classes could be used for exactly the same coverage. Removes 6 unnecessary scheduler classes from every model. Note: Still looking for a way for tblgen to warn when this is happening - often the override is more complete than the default. llvm-svn: 328192
* [X86][Skylake] Merge multiple InstrRW entries that map to the same ↵Craig Topper2018-03-221-4160/+4122
| | | | | | | | SchedWriteRes group (NFCI) (PR35955) I've also merged some VEX/non-VEX instregex strings with a (V?) prefix or (Y?) ymm variant - there are still a lot more of these to do. This reduces the size of the optimized llc binary on my computer by 400K, presumably because we went from 5000+ scheduler classes per CPU to ~2000. llvm-svn: 328179
* [X86] Change PMULLD to 10 cycles on Skylake per Agner's tables and ↵Craig Topper2018-03-201-12/+30
| | | | | | | | | | llvm-exegesis. Also restrict to port 0 and 1 for SkylakeClient. It looks like the scheduler models don't account for client not having a full vector ALU on port 5 like server. Fixes PR36808. llvm-svn: 328061
* [X86] Add TEST16mi/TEST32mi/TEST64mi32 to the ↵Craig Topper2018-03-201-1/+1
| | | | | | | | Sandybridge/Haswell/Broadwell/Skylake scheduler models. Move it from a load+store group on SNB to a load only group, the same group as CMP. llvm-svn: 327944
* [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.Craig Topper2018-03-191-1/+1
| | | | | | | | JRCXZ was already present, but not the others. We never codegen this instruction, so this doesn't affect much; it just gets them all into a single generated scheduler class in the output. llvm-svn: 327881
* [X86] Add the rest of the TEST with immediate instructions to the scheduler ↵Craig Topper2018-03-191-2/+2
| | | | | | models to match their 8-bit counterpart. llvm-svn: 327874
* [X86] Add MOV16ri*/MOV32ri*/MOV64ri* to scheduler models to match MOV8ri. ↵Craig Topper2018-03-191-1/+1
| | | | | | Correct SchedRW and itinerary for MOV32ri64. llvm-svn: 327872
* [X86] Generalize schedule classes to support multiple stagesSimon Pilgrim2018-03-191-83/+41
| | | | | | | | | | | | Currently the WriteResPair style multi-classes take a single pipeline stage and latency; this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps overridden from their defaults. This has already been done for the Jaguar scheduler to remove a number of custom schedule classes, and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases. I've converted some instructions, but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific. Differential Revision: https://reviews.llvm.org/D44612 llvm-svn: 327855
* [X86] Merge XADD8rr regular expression with XADD16rr/XADD32rr/XADD64rr in a ↵Craig Topper2018-03-191-2/+1
| | | | | | couple scheduler models. llvm-svn: 327821
* [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to ↵Craig Topper2018-03-191-6/+8
| | | | | | | | match ADD8i8. Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since the 8i8 form just has the register forced to AL instead of coming from modrm. llvm-svn: 327820
* [X86] Merge 8-bit instructions into instregex with 16/32/64 instructions in ↵Craig Topper2018-03-191-198/+99
| | | | | | | | the scheduler models as much as possible. NFCI This reduces the total number of generated scheduler classes from 5404 to 5316. llvm-svn: 327815
* [X86] Fix a bunch of overlapping regular expressions in the scheduler models.Craig Topper2018-03-181-22/+15
| | | | llvm-svn: 327787
* [X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models.Craig Topper2018-03-181-9/+0
| | | | | | | | | | | | | | The information was so wildly inaccurate and incomplete it's better to just remove it. MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information. MMX_MASKMOVQ and MASKMOVDQU were completely missing. MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction. Filed PR36780 to track fixing this right. llvm-svn: 327783
* [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and WriteStore ↵Simon Pilgrim2018-03-151-0/+8
| | | | | | | | | | | | scheduler classes As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types. I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428. Differential Revision: https://reviews.llvm.org/D44471 llvm-svn: 327630
* [X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell.Clement Courbet2018-03-071-8/+3
    Summary:
    Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use P1.

    This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc

    This can easily be validated by running perf on the following code:

    ```
    int main(int argc, char**argv) {
      int a = argc;
      int b = argc;
      int c = argc;
      int d = argc;
      for (int i = 0; i < LOOP_ITERATIONS; ++i) {
        asm volatile(
        R"(
          .rept 10000
          imull $0x2, %%edx, %%eax
          imull $0x2, %%ecx, %%ebx
          imull $0x2, %%eax, %%edx
          imull $0x2, %%ebx, %%ecx
          .endr
        )"
        : "+a"(a), "+b"(b), "+c"(c), "+d"(d)
        :
        :);
      }
      return a+b+c+d;
    }
    ```
    -> test.cc

    perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test

    Reviewers: craig.topper, RKSimon, gadi.haber
    Subscribers: llvm-commits, gchatelet, chandlerc
    Differential Revision: https://reviews.llvm.org/D43460
    llvm-svn: 326877
* [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency ↵Craig Topper2018-01-251-23/+23
| | | | | | | | | | | | to some of them in SkylakeClient model. The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results. This changes them all to use explicit instrs instead. While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too. llvm-svn: 323406
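The overlap problem described here can be illustrated with a short Python sketch. It assumes, as a simplification, that each scheduler regular expression behaves like a prefix match (modelled with `re.match`); the helper `find_overlaps` and the sample pattern list are hypothetical and are not taken from the model file.

```python
import re


def find_overlaps(patterns, instr_names):
    """Return {name: [patterns]} for every name matched by more than one pattern.

    Each pattern is treated as a prefix match, which is what re.match gives:
    anchored at the start of the name, unanchored at the end.
    """
    compiled = [(p, re.compile(p)) for p in patterns]
    overlaps = {}
    for name in instr_names:
        hits = [p for p, rx in compiled if rx.match(name)]
        if len(hits) > 1:
            overlaps[name] = hits
    return overlaps


# Sample instruction names and an assumed set of regex entries.
names = ["IMUL32rr", "IMUL32rri", "IMUL32rri8", "MUL32r"]
patterns = ["IMUL32rr", "IMUL32rri", "MUL32r"]

# "IMUL32rr" is also a prefix of the rri and rri8 forms, so those two
# instructions are claimed by two different entries.
overlaps = find_overlaps(patterns, names)
```

Listing explicit instruction names instead of regexes, as the commit does, removes this ambiguity because an exact name can only collide with another entry if it is literally listed twice.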
* [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFCCraig Topper2018-01-251-2/+2
| | | | llvm-svn: 323403
* [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall ↵Craig Topper2018-01-251-66/+66
| | | | | | | | consistency. NFC MMX instructions all start with MMX_ so the 64 isn't needed for disambiguation. SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128. AVX2 instructions should use a Y to indicate 256-bits. llvm-svn: 323402
* [X86] Remove unnecessary '_alt' and '_Int' from scheduler model regular ↵Craig Topper2018-01-251-1/+1
| | | | | | | | expressions. These were treated as optional suffixes, but the regular expressions are already prefix matches so this is unnecessary. It breaks the binary search optimization in tablegen due to the top level question mark. llvm-svn: 323401
* [X86] Adjust names of PINSRW/PEXTRW instructions between MMX/SSE/AVX/AVX512 ↵Craig Topper2018-01-241-11/+9
| | | | | | for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI llvm-svn: 323352
* [X86] Remove '(_REV)?' from a bunch of scheduler regular expressions. NFCCraig Topper2018-01-241-69/+69
| | | | | | The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark. llvm-svn: 323351
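A small Python sketch of why the optional suffix adds nothing under prefix matching, and why a plain literal prefix permits a binary search over sorted names. This models the behaviour described in the commit message only; `instregex_matches` and `prefix_range` are hypothetical helpers, the sample names are illustrative, and none of this is tablegen's actual implementation.

```python
import bisect
import re


def instregex_matches(pattern, name):
    """Assumed model of a prefix match: anchored at the start only."""
    return re.match(pattern, name) is not None


def prefix_range(sorted_names, prefix):
    """Binary-search the contiguous run of names starting with a literal prefix."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # Appending the largest code point gives an upper bound for the prefix run.
    hi = bisect.bisect_left(sorted_names, prefix + "\U0010ffff")
    return sorted_names[lo:hi]


names = sorted(["VMOVAPSrr", "VMOVAPSrr_REV", "VMOVAPSrm", "VMOVUPSrr"])

# With and without the optional suffix, the same names are selected.
with_suffix = {n for n in names if instregex_matches(r"VMOVAPSrr(_REV)?", n)}
without_suffix = {n for n in names if instregex_matches(r"VMOVAPSrr", n)}

# The suffix-free pattern is a pure literal, so a sorted lookup suffices.
by_search = prefix_range(names, "VMOVAPSrr")
```

Both forms select the same instructions, but only the suffix-free form is a pure literal that can be located without running a regex over every instruction name.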
* [X86] Make better use of instregex for cmovcc/setcc/jcc instructions in the ↵Craig Topper2018-01-191-96/+10
| | | | | | | | Intel scheduler models. Combine all the separate condition codes into a singular expression when possible. llvm-svn: 322924
* [X86] Remove duplicate lines from scheduler models. NFCCraig Topper2018-01-171-2/+0
| | | | llvm-svn: 322615
* [X86] Combine some more scheduler model entries using regular expressions.Craig Topper2017-12-161-56/+28
| | | | | | We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions. llvm-svn: 320924
* [X86] Use instrs instead of instregex for gather/scatter instructions in the ↵Craig Topper2017-12-161-76/+76
| | | | | | scheduler models. Combine into single InstrRW entries. This reduces the number of scheduler groups in subtarget info. llvm-svn: 320923
* Recommit r320461 "[X86] Use regular expressions more aggressively to reduce ↵Craig Topper2017-12-131-456/+21
| | | | | | | | | | | | | | the number of scheduler entries needed for FMA3 instructions." I've hopefully sidestepped the MSVC issue that caused it to be reverted. We no longer include the Sched enum from X86GenInstrInfo.inc on the X86 target. So hopefully MSVC's preprocessor will skip over it and nothing will notice the 11000 character enum name. Original commit message: When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models. This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do. llvm-svn: 320655
* Revert r320461 - causing ICE in windows buildssSimon Pilgrim2017-12-121-21/+456
| | | | | | | | | | [X86] Use regular expressions more aggressively to reduce the number of scheduler entries needed for FMA3 instructions. When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models. This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do. llvm-svn: 320470
* [X86] Use regular expressions more aggressively to reduce the number of ↵Craig Topper2017-12-121-456/+21
| | | | | | | | | | scheduler entries needed for FMA3 instructions. When the scheduler tables are generated by tablegen, the instructions are divided up into groups based on their default scheduling information and how they are referenced by groups for each processor. For any set of instructions that are matched by a specific InstRW line, that group of instructions is guaranteed to not be in a group with any other instructions. So in general, the more InstRW class definitions are created, the more groups we end up with in the generated files. Particularly if a lot of the InstRW lines only match to single instructions, which is true of a large number of the Intel scheduler models. This change alone reduces the number of instructions groups from ~6000 to ~5500. And there's lots more we could do. llvm-svn: 320461
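The grouping rule described in this message can be modelled with a few lines of Python. This is a simplified sketch for a single processor model: instructions land in the same group only if they share a default scheduling class and are matched by the same set of InstRW lines. The function `count_groups`, the sample instruction names, and the class names are illustrative assumptions.

```python
from collections import defaultdict


def count_groups(instrs, instrw_lines):
    """Count instruction groups under a simplified version of the rule above.

    instrs: {instruction name: default scheduling class}
    instrw_lines: list of sets, each set being the names one InstRW line matches
    """
    groups = defaultdict(set)
    for name, default_class in instrs.items():
        matched = tuple(i for i, line in enumerate(instrw_lines) if name in line)
        groups[(default_class, matched)].add(name)
    return len(groups)


instrs = {
    "VFMADD132PSr": "WriteFMA",
    "VFMADD213PSr": "WriteFMA",
    "VFMADD231PSr": "WriteFMA",
    "VADDPSrr": "WriteFAdd",
}

# One InstRW line per instruction: every FMA form gets its own group.
one_per_instr = [{"VFMADD132PSr"}, {"VFMADD213PSr"}, {"VFMADD231PSr"}]

# One regex-style line covering all three forms: they share a single group.
merged = [{"VFMADD132PSr", "VFMADD213PSr", "VFMADD231PSr"}]

groups_before = count_groups(instrs, one_per_instr)
groups_after = count_groups(instrs, merged)
```

In this toy input the count drops from 4 groups to 2, which is the same effect the commit reports at scale when single-instruction entries are folded into broader regular expressions.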
* [X86] Add VCOMISDZrr, VCOMISSZrr, VUCOMISDZrr, and VUCOMISSZrr to the ↵Craig Topper2017-12-101-4/+4
| | | | | | skylake server scheduler model llvm-svn: 320326
* [X86] Rename some instructions that start with Int_ to have the _Int at the end.Craig Topper2017-12-101-4/+4
| | | | | | This matches the AVX512 version, is more consistent overall, and improves our scheduler models. In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses. llvm-svn: 320325
* [X86] Rename some instructions from 'rb' to 'rrb' to make 'b' a proper ↵Craig Topper2017-12-101-8/+8
| | | | | | suffix. Fix the scheduling information for some of them. Some of the scheduling information was only present for the 'rb' version and not the 'rr' version. Now we match 'rr(b?)'. llvm-svn: 320320
* [X86] Add VCVTQQ2PS to the skylake server scheduler models.Craig Topper2017-12-101-0/+6
| | | | llvm-svn: 320319
* [X86] Add VPMULLWZ256 to the skylake server scheduler modelCraig Topper2017-12-101-0/+2
| | | | llvm-svn: 320318
* [X86] Add 256/512-bit EVEX VPSADBW instructions to skylake server scheduler ↵Craig Topper2017-12-101-2/+4
| | | | | | model. llvm-svn: 320317
* [X86] Fix a few instructions that were named Z512 instead of just Z.Craig Topper2017-12-101-4/+4
| | | | | | This makes things consistent with our normal instruction naming. llvm-svn: 320316
* [X86] Add VPSRLWZrr to skylake server scheduler model.Craig Topper2017-12-101-0/+1
| | | | llvm-svn: 320315
* [X86] Add VPUNPCKLWDZrr to skylake server scheduler model.Craig Topper2017-12-101-0/+1
| | | | llvm-svn: 320314
* [X86] Fix duplicate entries in skylake server scheduler model by changing ↵Craig Topper2017-12-101-8/+8
| | | | | | | | Z128 to Z256 Based on the fact that the 'Y' version of the instruction is next to this, I assume Z256 is the intended value. llvm-svn: 320295