summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86SchedSandyBridge.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Fix the SNB scheduler for BLENDVB.Craig Topper2018-03-201-1/+2
| | | | | | PBLENDVBrr0 was with the memory version of VBLENDVB and PBLENDVBrm0 was missing. llvm-svn: 327937
* [X86] Add JMP16r and JMP32r to Sandybridge scheduler model.Craig Topper2018-03-191-1/+1
| | | | | | Fixes PR36010 llvm-svn: 327883
* [X86] Remove OUT32rr/OUT8rr/OUT32ri/OUT8ri from Sandybridge scheduler model.Craig Topper2018-03-191-4/+0
| | | | | | PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong. llvm-svn: 327882
* [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.Craig Topper2018-03-191-1/+1
| | | | | | | | JRCXZ was already present, but not the others. We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output. llvm-svn: 327881
* [X86] Add the rest of the TEST with immediate instructions to the scheduler ↵Craig Topper2018-03-191-2/+2
| | | | | | models to match their 8-bit counterpart. llvm-svn: 327874
* [X86] Add MOV16ri*/MOV32ri*/MOV64ri* to scheduler models to match MOV8ri. ↵Craig Topper2018-03-191-1/+1
| | | | | | Correct SchedRW and itinerary for MOV32ri64. llvm-svn: 327872
* [X86] Generalize schedule classes to support multiple stagesSimon Pilgrim2018-03-191-86/+43
| | | | | | | | | | | | Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults. This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases. I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific. Differential Revision: https://reviews.llvm.org/D44612 llvm-svn: 327855
* [X86] Merge XADD8rr regular expression with XADD16rr/XADD32rr/XADD64rr in a ↵Craig Topper2018-03-191-2/+1
| | | | | | couple scheduler models. llvm-svn: 327821
* [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to ↵Craig Topper2018-03-191-8/+8
| | | | | | | | match ADD8i8. Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm. llvm-svn: 327820
* [X86] Merge 8-bit instructions into instregex with 16/32/64 instructions in ↵Craig Topper2018-03-191-156/+78
| | | | | | | | the scheduler models as much as possible. NFCI This reduces the total number of generated scheduler classes from 5404 to 5316. llvm-svn: 327815
* [X86] Fix a bunch of overlapping regular expressions in the scheduler models.Craig Topper2018-03-181-7/+7
| | | | llvm-svn: 327787
* [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and Writetore ↵Simon Pilgrim2018-03-151-0/+8
| | | | | | | | | | | | scheduler classes As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types. I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428. Differential Revision: https://reviews.llvm.org/D44471 llvm-svn: 327630
* [X86] Add IMUL scheduling info on sandybridge, fix it on >=haswell.Clement Courbet2018-03-071-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Only IMUL16rri uses an extra P0156. IMUL32* and IMUL16rr only use P1. This was computed using https://github.com/google/EXEgesis/blob/master/exegesis/tools/compute_itineraries.cc This can easily be validated by running perf on the following code: ``` int main(int argc, char**argv) { int a = argc; int b = argc; int c = argc; int d = argc; for (int i = 0; i < LOOP_ITERATIONS; ++i) { asm volatile( R"( .rept 10000 imull $0x2, %%edx, %%eax imull $0x2, %%ecx, %%ebx imull $0x2, %%eax, %%edx imull $0x2, %%ebx, %%ecx .endr )" : "+a"(a), "+b"(b), "+c"(c), "+d"(d) : :); } return a+b+c+d; } ``` -> test.cc perf stat -x, -e cycles --pfm-events=uops_executed_port:port_0:u,uops_executed_port:port_1:u,uops_executed_port:port_2:u,uops_executed_port:port_3:u,uops_executed_port:port_4:u,uops_executed_port:port_5:u,uops_executed_port:port_6:u,uops_executed_port:port_7:u test Reviewers: craig.topper, RKSimon, gadi.haber Subscribers: llvm-commits, gchatelet, chandlerc Differential Revision: https://reviews.llvm.org/D43460 llvm-svn: 326877
* [X86] Expand IMUL/MUL instregexs in Intel scheduler models. Add load latency ↵Craig Topper2018-01-251-4/+4
| | | | | | | | | | | | to some of them in SkylakeClient model. The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results. This changes them all to use explicit instrs instead. While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too. llvm-svn: 323406
* [X86] Name the MMX phaddd instruction with 3 Ds instead of just 2. NFCCraig Topper2018-01-251-2/+2
| | | | llvm-svn: 323403
* [X86] Remove 64/128/256 from MMX/SSE/AVX instruction names for overall ↵Craig Topper2018-01-251-56/+56
| | | | | | | | | | consistency. NFC MMX instrutions all start with MMX_ so the 64 isn't needed for disambigutation. SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128. AVX2 instructions should use a Y to indicate 256-bits. llvm-svn: 323402
* [X86] Adjust names of PINSRW/PEXTRW intructions between MMX/SSE/AVX/AVX512 ↵Craig Topper2018-01-241-6/+6
| | | | | | for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI llvm-svn: 323352
* [X86] Make better use of instregex for cmovcc/setcc/jcc instructions in the ↵Craig Topper2018-01-191-96/+10
| | | | | | | | Intel scheduler models. Combine all the separate condition codes into a singular expression when possible. llvm-svn: 322924
* [X86] Rename some instructions that start with Int_ to have the _Int at the end.Craig Topper2017-12-101-8/+8
| | | | | | | | This matches AVX512 version and is more consistent overall. And improves our scheduler models. In some cases this adds _Int to instructions that didn't have any Int_ before. It's a side effect of the adjustments made to some of the multiclasses. llvm-svn: 320325
* [X86] Add MOVQI2PQIrm, MOVSDmr, and MOVSDrm to scheduler informationCraig Topper2017-12-101-0/+3
| | | | | | The VEX versions were present but not the legacy SSE versions. llvm-svn: 320294
* [X86] Add LEA64_32r to scheduler models for ↵Craig Topper2017-12-101-1/+1
| | | | | | Sandybridge,Haswell,Broadwell,Skylake llvm-svn: 320293
* [X86] Fix scheduler models to support ADD32ri in addition to ADD32ri8. ↵Craig Topper2017-12-101-16/+16
| | | | | | Similar for all sizes of AND/OR/XOR/SUB/ADC/SBB/CMP. llvm-svn: 320291
* [X86] Add CMPSDrr/rm to the scheduler models.Craig Topper2017-12-101-0/+2
| | | | | | Somehow CMPSSrr/rm was there and the VEX version was there, but this was consistently missing. llvm-svn: 320289
* [X86] Add the commutable floating point min/max pseudo instructions to ↵Craig Topper2017-12-101-40/+40
| | | | | | sandybridge,haswell,broadwell,skylakeclient scheduler models. llvm-svn: 320277
* [X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule classSimon Pilgrim2017-11-271-1/+2
| | | | | | | | | | As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general). Annoyingly all scheduler models need to define WriteFMA (now that its actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes. Differential Revision: https://reviews.llvm.org/D40351 llvm-svn: 319016
* Avoid unecessary opsize byte in segment move to memoryNirav Dave2017-11-211-1/+1
| | | | | | | | | | | | | | | | | Segment moves to memory are always 16-bit. Remove invalid 32 and 64 bit variants. Recommiting with missing clang inline assembly test change. Fixes PR34478. Reviewers: rnk, craig.topper Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39847 llvm-svn: 318797
* Revert r318678 to fix Clang testRichard Trieu2017-11-211-1/+1
| | | | | | r318678 caused the Clang test CodeGen/ms-inline-asm.c to start failing. llvm-svn: 318710
* [X86] Avoid unecessary opsize byte in segment move to memoryNirav Dave2017-11-201-1/+1
| | | | | | | | | | | | | | | | | Summary: Segment moves to memory are always 16-bit. Remove invalid 32 and 64 bit variants. Fixes PR34478. Reviewers: rnk, craig.topper Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39847 llvm-svn: 318678
* [X86] Add CBW/CDQ/CDQE/CQO/CWD/CWDE to WriteALU schedule classSimon Pilgrim2017-11-151-0/+1
| | | | | | | | Some CPUs are already overriding these sign extension instructions but we should be able to use the WriteALU schedule class by default. Differential Revision: https://reviews.llvm.org/D39899 llvm-svn: 318308
* [X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMemCraig Topper2017-10-011-2/+2
| | | | | | | | | | | | | | | | | | | Summary: Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable. For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem. I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995. Reviewers: aymanmus, RKSimon, zvi Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38120 llvm-svn: 314639
* [X86] Remove windows line endings.Craig Topper2017-09-211-919/+919
| | | | llvm-svn: 313862
* [X86][SandyBridge] Additional updates to the SNB instructions scheduling ↵Gadi Haber2017-08-131-824/+988
| | | | | | | | | | | | | | | | information This is a continuation patch for commit r307529 which completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target (see also https://reviews.llvm.org/D35019). In this patch we added the scheduling information of additional SNB instructions that were missing from the patch commit r307529, fixed the scheduling of several resource groups that include only port0 instead of port05 (i.e., port0 OR port5) and fixed several incorrect instructions' scheduling in the r307529 commit. The patch also includes the X87 instructions which were missing in previous patch commit r307529 as reported in bugzilla bug 34080. Reviewers: zvi, RKSimon, chandlerc, igorb, m_zuckerman, craig.topper, aymanmus, dim Differential Revision: https://reviews.llvm.org/D36388 llvm-svn: 310792
* This patch completely replaces the scheduling information for the ↵Gadi Haber2017-07-101-30/+2442
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target. The SandyBridge architects have provided us with a more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information. Please note that the patch extensively affects the X86 MC instr scheduling for SNB. Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX. The updated and extended information about each instruction includes the following details: •static latency of the instruction •number of uOps from which the instruction consists of •all ports used by the instruction's' uOPs For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5: def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> { let Latency = 9; let NumMicroOps = 6; let ResourceCycles = [1,2,2,1]; } def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>; def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>; def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>; def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>; Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script. Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb Differential Revision: https://reviews.llvm.org/D35019#inline-304691 llvm-svn: 307529
* Reverting commit 306414 on behalf of @gadi.haberMichael Zuckerman2017-06-281-2281/+27
| | | | llvm-svn: 306532
* Updated and extended the information about each instruction in HSW and SNB ↵Gadi Haber2017-06-271-27/+2281
| | | | | | | | | | | | | | | | | | | to include the following data: •static latency •number of uOps from which the instructions consists •all ports used by the instruction Reviewers:  RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414
* Add scheduler classes to integer/float horizontal operations.Andrew V. Tischenko2017-06-081-0/+25
| | | | | | | This patch will close PR32801. Differential Revision: https://reviews.llvm.org/D33203 llvm-svn: 304986
* [X86][SchedModel] SSE reciprocal square root instruction latencies.Andrea Di Biagio2014-09-261-0/+1
| | | | | | | | | | | | | | | | | The SSE rsqrt instruction (a fast reciprocal square root estimate) was grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very slow) SSE sqrt instruction. For code which uses rsqrt (possibly with newton-raphson iterations) this poor scheduling was affecting performances. This patch splits off the rsqrt instruction from the sqrt instruction scheduling classes and creates new IIC_SSE_RSQER* classes with latency values based on Agner's table. Differential Revision: http://reviews.llvm.org/D5370 Patch by Simon Pilgrim. llvm-svn: 218517
* Move late partial-unrolling thresholds into the processor definitionsHal Finkel2014-05-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | The old method used by X86TTI to determine partial-unrolling thresholds was messy (because it worked by testing target features), and also would not correctly identify the target CPU if certain target features were disabled. After some discussions on IRC with Chandler et al., it was decided that the processor scheduling models were the right containers for this information (because it is often tied to special uop dispatch-buffer sizes). This does represent a small functionality change: - For generic x86-64 (which uses the SB model and, thus, will get some unrolling). - For AMD cores (because they still currently use the SB scheduling model) - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump the default threshold to 50; we're working on a test case for this). Otherwise, nothing has changed for any other targets. The logic, however, has been moved into BasicTTI, so other targets may now also opt-in to this functionality simply by setting LoopMicroOpBufferSize in their processor model definitions. llvm-svn: 208289
* [X86][SchedModel] Add missing scheduling model for SSE related instructions.Quentin Colombet2014-02-241-0/+115
| | | | | | | | | | The patch defines new or refines existing generic scheduling classes to match the behavior of the SSE instructions. It also maps those scheduling classes on the related SSE instructions. <rdar://problem/15607571> llvm-svn: 202065
* Mark the x86 machine model as incomplete. PR17367.Andrew Trick2013-09-251-0/+4
| | | | | | | | | | | | Ideally, the machinel model is added at the time the instructions are defined. But many instructions in X86InstrSSE.td still need a model. Without this workaround the scheduler asserts because x86 already has itinerary classes for these instructions, indicating they should be modeled by the scheduler. Since we use the new machine model for other instructions, it expects a new machine model for these too. llvm-svn: 191391
* Fix IMULX machine model. Multiple def operands require multiple SchedWrites.Andrew Trick2013-06-211-0/+1
| | | | llvm-svn: 184566
* Support BufferSize on ProcResGroup for unified MOp schedulers.Andrew Trick2013-06-151-0/+5
| | | | | | And add Sandybridge/Haswell resource buffers. llvm-svn: 184034
* Update machine models. Specify buffer sizes for OOO processors.Andrew Trick2013-06-151-1/+1
| | | | llvm-svn: 184033
* Machine Model: Add MicroOpBufferSize and resource BufferSize.Andrew Trick2013-06-151-1/+0
| | | | | | | | | | | | | Replace the ill-defined MinLatency and ILPWindow properties with with straightforward buffer sizes: MCSchedMode::MicroOpBufferSize MCProcResourceDesc::BufferSize These can be used to more precisely model instruction execution if desired. Disabled some misched tests temporarily. They'll be reenabled in a few commits. llvm-svn: 184032
* X86 machine model: reduce SandyBridge and Haswell ILPWindow.Andrew Trick2013-04-131-1/+1
| | | | | | | | | | | | | | | The initial values were arbitrary. I want them to be more conservative. This represents the number of latency cycles hidden by OOO execution. In practice, I think it should be within a small factor of the complex floating point operation latency so the scheduler can make some attempt to hide latency even for smallish blocks. These are by no means the best values, just a starting point for tuning heuristics. Some benchmarks such as TSVC run faster with this lower value for SandyBridge. I haven't run anything on Haswell, but it's shouldn't be 2x SB. llvm-svn: 179450
* The divide unit is not pipeline, but it is still buffered.Andrew Trick2013-04-021-2/+2
| | | | | | | | | | | | | | | | | | | | Buffered means a later divide may be executed out-of-order while a prior divide is sitting (buffered) in a reservation station. You can tell it's not pipelined, because operations that use it reserve it for more than one cycle: def : WriteRes<WriteIDiv, [HWPort0, HWDivider]> { let Latency = 25; let ResourceCycles = [1, 10]; } We don't currently distinguish between an unpipeline operation and one that is split into multiple micro-ops requiring the same unit. Except that the later may have NumMicroOps > 1 if they also consume issue/dispatch resources. llvm-svn: 178519
* Remove the unused port from the SandyBridge machine modelNadav Rotem2013-03-281-1/+0
| | | | llvm-svn: 178300
* Add a scheduling model for Intel Sandy Bridge microarchitecture.Jakob Stoklund Olesen2013-03-251-0/+123
The model isn't hooked up by this patch because the instruction set isn't fully annotated yet. llvm-svn: 177942
OpenPOWER on IntegriCloud