summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86SchedSandyBridge.td
Commit message (Collapse)AuthorAgeFilesLines
* [X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions.Andrea Di Biagio2019-09-021-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On BtVer2 conditional SIMD stores are heavily microcoded. The latency is directly proportional to the number of packed elements extracted from the input vector. Also, according to micro-benchmarks, most of the computation seems to be done in the integer unit. Only a minority of the uOPs is executed by the FPU. The observed behaviour on the FPU looks similar to this: - The input MASK value is moved to the Integer Unit -- [ a VMOVMSK-like uOP-executed on JFPU0]. - In parallel, each element of the input XMM/YMM is extracted and then sent to the IntegerUnit through JFPU1. As expected, a (conditional) store is executed for every extracted element. Interestingly, a (speculative) load is executed for every extracted element too. It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the integer unit for every contionally stored element. VMASKMOVDQU is a special case: the number of speculative loads is always 2 (presumably, one load per quadword). That means, extra shifts and masking is performed on (one of) the loaded quadwords before each conditional store (that also explains the big number of non-FP uOPs retired). This patch replaces the existing writes for conditional SIMD stores (i.e. WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes: WriteFMaskedStore32 [ XMM Packed Single ] WriteFMaskedStore32Y [ YMM Packed Single ] WriteFMaskedStore64 [ XMM Packed Double ] WriteFMaskedStore64Y [ YMM Packed Double ] Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe both RM and MR variants for conditional SIMD moves in a single tablegen definition. Instances of that class are then passed in input to multiclass avx_movmask_rm when constructing MASKMOVPS/PD definitions. Since this patch introduces new writes, I had to update all the X86 scheduling models. Differential Revision: https://reviews.llvm.org/D66801 llvm-svn: 370649
* [X86] Add zero idioms to the haswell, broadwell, and skylake schedule ↵Craig Topper2019-05-251-7/+13
| | | | | | | | | | models. Add 256-bit fp xor to sandybridge zero idioms This copies the Sandy Bridge zero idiom support to later CPUs. Adding the AVX2 and AVX512F/VL instructions as appropriate. Differential Revision: https://reviews.llvm.org/D62360 llvm-svn: 361690
* [X86] Merge the different SETcc instructions for each condition code into ↵Craig Topper2019-04-051-14/+26
| | | | | | | | | | | | | | | | | | | | | single instructions that store the condition code as an operand. Summary: This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between SETcc instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. Reviewers: andreadb, courbet, RKSimon, spatel, lebedev.ri Reviewed By: andreadb Subscribers: hiraditya, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60138 llvm-svn: 357801
* [X86] Merge the different CMOV instructions for each condition code into ↵Craig Topper2019-04-051-1/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | single instructions that store the condition code as an immediate. Summary: Reorder the condition code enum to match their encodings. Move it to MC layer so it can be used by the scheduler models. This avoids needing an isel pattern for each condition code. And it removes translation switches for converting between CMOV instructions and condition codes. Now the printer, encoder and disassembler take care of converting the immediate. We use InstAliases to handle the assembly matching. But we print using the asm string in the instruction definition. The instruction itself is marked IsCodeGenOnly=1 to hide it from the assembly parser. This does complicate the scheduler models a little since we can't assign the A and BE instructions to a separate class now. I plan to make similar changes for SETcc and Jcc. Reviewers: RKSimon, spatel, lebedev.ri, andreadb, courbet Reviewed By: RKSimon Subscribers: gchatelet, hiraditya, kristina, lebedev.ri, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60041 llvm-svn: 357800
* [MC][X86] Correctly model additional operand latency caused by transfer ↵Andrea Di Biagio2019-01-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | delays from the integer to the floating point unit. This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means, this patch is only a functional change for the Jaguar cpu model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the actual latency is actually contributed by the so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is speified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes with the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [X86] Fix VZEROUPPER scheduling info on SNB,HSW,BDW,SXL,SKX.Clement Courbet2018-11-091-0/+7
| | | | | | | | | | | | | | | | | | | | | | | Summary: Starting from SNB, VZEROUPPER is handled by the renamer and uses no proc resources. After HSW, it also has zero latency. This fixes PR35606. To reproduce: Uops: llvm-exegesis -mode=uops -opcode-name=VZEROUPPER Latency: echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper' | /tmp/llvm-exegesis -mode=latency -snippets-file=- echo -e '#LLVM-EXEGESIS-DEFREG XMM0 1\n#LLVM-EXEGESIS-DEFREG XMM1 1\nvzeroupper\naddps %xmm0, %xmm1' | /tmp/llvm-exegesis -mode=latency -snippets-file=- Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D54107 llvm-svn: 346482
* [X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957)Simon Pilgrim2018-10-051-1/+7
| | | | | | | | | | | | Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957). This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level. I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place. Differential Revision: https://reviews.llvm.org/D52886 llvm-svn: 343868
* [X86] Remove unnecessary BT(C/R/S)m(i/r) scheduler overridesSimon Pilgrim2018-10-021-8/+5
| | | | | | Some SchedAlias remain due to some badly setup RMW tags - but at least the overrides are all removed llvm-svn: 343586
* [X86] Create schedule classes for BT(C|R|S)mi and BT(C|R|S)mr instructionsSimon Pilgrim2018-10-011-5/+7
| | | | llvm-svn: 343490
* [X86] Remove unnecessary BTmi/BTmr scheduler overridesSimon Pilgrim2018-10-011-9/+2
| | | | llvm-svn: 343487
* [X86] Create schedule classes for BTmi and BTmr instructionsSimon Pilgrim2018-10-011-3/+6
| | | | llvm-svn: 343478
* [X86][Sched] Update scheduling information for VZEROALL on HWS, BDW, SKX, SNB.Clement Courbet2018-10-011-0/+7
| | | | | | | | | | | | | | | | | Summary: While looking at PR35606, I found out that the scheduling info is incorrect. One can check that it's really a P5+P6 and not a 2*P56 with: echo -e 'vzeroall\nvandps %xmm1, %xmm2, %xmm3' | ./bin/llvm-exegesis -mode=uops -snippets-file=- (vandps executes on P5 only) Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52541 llvm-svn: 343447
* [X86] Split BT and BTC/BTR/BTS scheduler classesSimon Pilgrim2018-09-271-2/+3
| | | | llvm-svn: 343233
* Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overridesSimon Pilgrim2018-09-251-7/+29
| | | | | | | | As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342969
* [X86] Remove shift/rotate by CL memory (RMW) overridesSimon Pilgrim2018-09-241-29/+7
| | | | | | The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342916
* [X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)Simon Pilgrim2018-09-241-61/+14
| | | | | | | | Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892
* [X86] Split WriteShift/WriteRotate schedule classes by CL usage.Simon Pilgrim2018-09-231-13/+4
| | | | | | Variable Shifts/Rotates using the CL register have different behaviours to the immediate instructions - split accordingly to help remove yet more repeated overrides from the schedule models. llvm-svn: 342852
* [X86] Remove unnecessary WriteRotate override. NFCI.Simon Pilgrim2018-09-231-4/+2
| | | | | | SNB was the last override for ROT(L|R)r(1|i) - they now all use WriteRotate correctly. llvm-svn: 342848
* Fix line ending mismatches. NFCI.Simon Pilgrim2018-09-231-6/+6
| | | | llvm-svn: 342847
* [X86] Added missing RCL/RCR schedule overrides to the generic SNB modelSimon Pilgrim2018-09-231-0/+24
| | | | | | | | | | | | The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults. I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well. This is necessary to allow WriteRotate to be updated to remove other rotate overrides. It'd probably be a good idea to investigate a WriteRotateCarry class at some point but its not high priority given the unusualness of these instructions. llvm-svn: 342842
* [X86] Add WriteRotate schedule class, splitting off from WriteShift.Simon Pilgrim2018-09-231-1/+3
| | | | | | | | NFCI for now, but it should make it easier to remove a lot of unnecessary overrides in a future commit. Now that funnel shift intrinsics are coming online we need to get this cleaned up to make vectorization costs from scalar rotate patterns more straightforward. llvm-svn: 342837
* [X86][Sched] Add zero idiom sched data to the SNB model.Clement Courbet2018-09-211-1/+51
| | | | | | | | | | | | | | | | | | | | Summary: On SNB, renamer-based zeroing does not work for: - 16 and 8-bit GPRs[1]. - MMX [2]. - ANDN variants [3] [1] echo 'sub %ax, %ax' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [2] echo 'pxor %mm0, %mm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [3] echo 'andnps %xmm0, %xmm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- Reviewers: RKSimon, andreadb Subscribers: gbedwell, craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52358 llvm-svn: 342736
* [X86][BMI1] Add scheduler class for BLSI/BLSMSK/BLSR BMI1 instructionsSimon Pilgrim2018-09-141-2/+3
| | | | llvm-svn: 342234
* [X86] Improved sched model for X86 CMPXCHG* instructions.Andrew V. Tischenko2018-08-301-13/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D50070 llvm-svn: 341024
* [X86] Replace all single match schedule class instregexs with instrs entriesSimon Pilgrim2018-08-181-34/+35
| | | | | | Helps reduce cost of instrw collection llvm-svn: 340124
* [X86] Merge shift/rotate schedule class instregexsSimon Pilgrim2018-08-181-14/+7
| | | | | | Helps reduce cost of instrw collection llvm-svn: 340123
* [X86] Improved sched models for X86 XCHG*rr and XADD*rr instructions.Andrew V. Tischenko2018-08-091-9/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D49861 llvm-svn: 339321
* [X86] Improved sched models for X86 BT*rr instructions.Andrew V. Tischenko2018-08-011-8/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D49243 llvm-svn: 338507
* [X86] WriteBSWAP sched classes are reg-reg only.Simon Pilgrim2018-07-311-2/+2
| | | | | | | Don't declare them as X86SchedWritePair when the folded class will never be used. Note: MOVBE (load/store endian conversion) instructions tend to have a very different behaviour to BSWAP. llvm-svn: 338412
* Revert r338365: [X86] Improved sched models for X86 BT*rr instructions.Simon Pilgrim2018-07-311-1/+8
| | | | | | | | https://reviews.llvm.org/D49243 Contains WIP code that should not have been included. llvm-svn: 338369
* [X86] Improved sched models for X86 BT*rr instructions.Andrew V. Tischenko2018-07-311-8/+1
| | | | | | https://reviews.llvm.org/D49243 llvm-svn: 338365
* [X86] Improved sched models for X86 SHLD/SHRD* instructions.Andrew V. Tischenko2018-07-311-33/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D9611 llvm-svn: 338359
* Improved sched model for X86 BSWAP* instrs.Andrew V. Tischenko2018-07-201-14/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D49477 llvm-svn: 337537
* [X86][Nearly NFC] Split SHLD/SHRD into their own WriteShiftDouble classRoman Lebedev2018-07-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: {F6603964} While there is still some discrepancies within that new group, it is clearly separate from the other shifts. And Agner's tables agree, these double shifts are clearly different from the normal shifts/rotates. I'm guessing `FeatureSlowSHLD` is related. Indeed, a basic sched pair is *not* the /best/ match. But keeping it in the WriteShift is /clearly/ not ideal either. This can and likely will be fine-tuned later. This is purely mechanical change, it does not change any numbers, as the [lack of the change of] mca tests show. Reviewers: craig.topper, RKSimon, andreadb Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49015 llvm-svn: 336515
* [X86][Basically NFC] Sched: split WriteBitScan into WriteBSF/WriteBSR.Roman Lebedev2018-07-081-4/+5
| | | | | | | | | | | | | | | | | | Summary: Motivation: {F6597954} This only does the mechanical splitting, does not actually change any numbers, as the tests added in previous revision show. Reviewers: craig.topper, RKSimon, courbet Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48998 llvm-svn: 336511
* [X86] Add sched class WriteLAHFSAHF and fix values.Clement Courbet2018-06-201-1/+1
| | | | | | | | | | | | | | | Summary: I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB. Atom is from Agner and SLM is a guess. I've left AMD processors alone. Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48079 llvm-svn: 335097
* [X86] Fix skylake server scheduling info.Clement Courbet2018-06-111-0/+45
| | | | | | | | | | | | | | Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff. Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47721 llvm-svn: 334407
* [X86] Explicitly mark unsupported classes in scheduling models.Clement Courbet2018-06-111-4/+8
| | | | | | | | | | | | | Summary: In preparation for D47721. HSW and SNB still define unsupported classes as they are used by KNL and generic models respectively. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47763 llvm-svn: 334389
* [X86] Introduce WriteFLDC for x87 constant loads.Clement Courbet2018-05-311-0/+1
| | | | | | | | | | | | | | | | | Summary: {FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded. - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form InstRWs. - For SLM and BtVer2, I've guessed some values :( Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47585 llvm-svn: 333656
* [X86] Extract latency of fldz/fld1 in separate classes.Clement Courbet2018-05-311-0/+2
| | | | | | | | | | | | | | | | | Summary: - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form `InstRW`s. - For SLM and BtVer2, values are from Agner. This is split off from https://reviews.llvm.org/D47377 Reviewers: RKSimon, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47523 llvm-svn: 333642
* [X86][Sched] Add InstRW for CLC on Intel after SNB.Clement Courbet2018-05-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: After SNB, Intel CPUs can rename CF independently of other EFLAGS, so the renamer can zero it for free. Note that STC still consumes resources. To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC` On SNB: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: sandybridge llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.0014, debug_string: SBPort0 } - { key: '4', value: 0.0013, debug_string: SBPort1 } - { key: '5', value: 0.0003, debug_string: SBPort4 } - { key: '6', value: 0.0029, debug_string: SBPort5 } - { key: '10', value: 0.0003, debug_string: SBPort23 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` On HSW: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.001, debug_string: HWPort0 } - { key: '4', value: 0.0009, debug_string: HWPort1 } - { key: '5', value: 0.0004, debug_string: HWPort2 } - { key: '6', value: 0.0006, debug_string: HWPort3 } - { key: '7', value: 0.0002, debug_string: HWPort4 } - { key: '8', value: 0.0012, debug_string: HWPort5 } - { key: '9', value: 0.0022, debug_string: HWPort6 } - { key: '10', value: 0.0001, debug_string: HWPort7 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` Reviewers: craig.topper, RKSimon Subscribers: gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D47362 llvm-svn: 333392
* [X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286)Simon Pilgrim2018-05-251-9/+1
| | | | | | | | As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models...... llvm-svn: 333271
* [X86] Add GPR<->XMM Schedule TagsSimon Pilgrim2018-05-181-10/+2
| | | | | | | | | | BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737) The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1: SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM) Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner) llvm-svn: 332745
* [X86] Split WriteCMOV + WriteCMOV2 scheduler classesSimon Pilgrim2018-05-171-14/+1
| | | | | | Handle SNB+ targets which treat CMOVA/CMOVBE specially due to partial EFLAGS handling. llvm-svn: 332626
* [X86] Split WriteADC/WriteADCRMW scheduler classesSimon Pilgrim2018-05-171-17/+3
| | | | | | For integer ALU instructions taking eflags as an input (ADC/SBB/ADCX/ADOX) llvm-svn: 332605
* [X86][SNB] Minor scheduler cleanupSimon Pilgrim2018-05-171-7/+3
| | | | | | Merge 2 instregex and explain the VMOVDQArr/MOVDQArr difference llvm-svn: 332591
* [X86][SNB] Remove unnecessary CVT InstRW overridesSimon Pilgrim2018-05-161-82/+24
| | | | llvm-svn: 332536
* [X86] Split WriteCvtI2F/WriteCvtF2I into I<->F32 and I<->F64 scheduler classesSimon Pilgrim2018-05-161-3/+13
| | | | | | A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332451
* [X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classesSimon Pilgrim2018-05-151-27/+16
| | | | | | | | BtVer2 - Fixes schedules for (V)CVTPS2PD instructions A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332376
OpenPOWER on IntegriCloud