summaryrefslogtreecommitdiffstats
path: root/llvm/test/tools/llvm-mca
Commit message (Collapse)AuthorAgeFilesLines
...
* [llvm-mca] Make sure not to end the test files with an empty line.Roman Lebedev2018-06-04197-197/+0
| | | | | | | | | | | | | | | | | | | Summary: It's super irritating. [properly configured] git client then complains about that double-newline, and you have to use `--force` to ignore the warning, since even if you fix it manually, it will be reintroduced the very next runtime :/ Reviewers: RKSimon, andreadb, courbet, craig.topper, javed.absar, gbedwell Reviewed By: gbedwell Subscribers: javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47697 llvm-svn: 333887
* [X86] Introduce WriteFLDC for x87 constant loads.Clement Courbet2018-05-317-77/+77
| | | | | | | | | | | | | | | | | Summary: {FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded. - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form InstRWs. - For SLM and BtVer2, I've guessed some values :( Reviewers: RKSimon, craig.topper, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47585 llvm-svn: 333656
* [X86] Extract latency of fldz/fld1 in separate classes.Clement Courbet2018-05-317-33/+33
| | | | | | | | | | | | | | | | | Summary: - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake. - For ZnVer1 and Atom, values were transferred form `InstRW`s. - For SLM and BtVer2, values are from Agner. This is split off from https://reviews.llvm.org/D47377 Reviewers: RKSimon, andreadb Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47523 llvm-svn: 333642
* [X86][Sched] Add InstRW for CLC on Intel after SNB.Clement Courbet2018-05-299-4/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: After SNB, Intel CPUs can rename CF independently of other EFLAGS, so the renamer can zero it for free. Note that STC still consumes resources. To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC` On SNB: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: sandybridge llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.0014, debug_string: SBPort0 } - { key: '4', value: 0.0013, debug_string: SBPort1 } - { key: '5', value: 0.0003, debug_string: SBPort4 } - { key: '6', value: 0.0029, debug_string: SBPort5 } - { key: '10', value: 0.0003, debug_string: SBPort23 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` On HSW: ``` --- key: opcode_name: CLC mode: uops config: '' cpu_name: haswell llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 10000 measurements: - { key: '3', value: 0.001, debug_string: HWPort0 } - { key: '4', value: 0.0009, debug_string: HWPort1 } - { key: '5', value: 0.0004, debug_string: HWPort2 } - { key: '6', value: 0.0006, debug_string: HWPort3 } - { key: '7', value: 0.0002, debug_string: HWPort4 } - { key: '8', value: 0.0012, debug_string: HWPort5 } - { key: '9', value: 0.0022, debug_string: HWPort6 } - { key: '10', value: 0.0001, debug_string: HWPort7 } error: '' info: 'instruction is serial, repeating a random one. Snippet: CLC ' ... ``` Reviewers: craig.topper, RKSimon Subscribers: gchatelet, llvm-commits Differential Revision: https://reviews.llvm.org/D47362 llvm-svn: 333392
* [X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286)Simon Pilgrim2018-05-251-5/+5
| | | | | | | | As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models...... llvm-svn: 333271
* [UpdateTestChecks] Improved update_mca_test_checks block analysisGreg Bedwell2018-05-2424-816/+726
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously update_mca_test_checks worked entirely at "block" level where a block is some sequence of lines delimited by at least one empty line. This generally worked well, but could sometimes lead to excessive repetition of check lines for various prefixes if some block was almost identical between prefixes, but not quite (for example, due to a different dispatch width in the otherwise identical summary views). This new analyis attempts to split blocks further in the case where the following conditions are met: a) There is some prefix common to every RUN line (typically 'ALL'). b) The first line of the block is common to the output with every prefix. c) The block has the same number of lines for the output with every prefix. Also, regenerated all llvm-mca test files with the following command: update_mca_test_checks.py "../test/tools/llvm-mca/*/*.s" "../test/tools/llvm-mca/*/*/*.s" The new analysis showed a "multiple lines not disambiguated by prefixes" warning for test "AArch64/Exynos/scheduler-queue-usage.s" so I've also added some explicit prefixes to each of the RUN lines in that test. Differential Revision: https://reviews.llvm.org/D47321 llvm-svn: 333204
* [llvm-mca] Print the "Block RThroughput" in the SummaryView.Andrea Di Biagio2018-05-2320-100/+120
| | | | | | | | | | | | | | | | | | | This patch implements the "block reciprocal throughput" computation in the SummaryView. The block reciprocal throughput is computed as the MAX of: - NumMicroOps / DispatchWidth - Resource Cycles / #Units (for every resource consumed). The block throughput is bounded from above by the hardware dispatch throughput. That is because the DispatchWidth is an upper bound on how many opcodes can be part of a single dispatch group. The block throughput is also limited by the amount of hardware parallelism. The number of available resource units affects how the resource pressure is distributed, and also how many blocks can be delivered every cycle. llvm-svn: 333095
* [llvm-mca] Removed an empty line generated by the timeline view. NFC.Andrea Di Biagio2018-05-2127-165/+637
| | | | | | Also, regenerate all tests. llvm-svn: 332853
* [X86][BtVer2] Add a 'J' prefix to the PRF/RCU defs. NFCAndrea Di Biagio2018-05-219-18/+18
| | | | | | | This is to keep the Jaguar model's naming convention. Processor resources all have a 'J' prefix in the BtVer2 scheduling model. llvm-svn: 332851
* [X86] Add GPR<->XMM Schedule TagsSimon Pilgrim2018-05-185-41/+41
| | | | | | | | | | BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737) The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1: SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM) Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner) llvm-svn: 332745
* [X86][BtVer2] Improve simulation of (V)PINSR valuesSimon Pilgrim2018-05-183-16/+16
| | | | | | Include the 6cy delay transferring from the GPR to FPU. llvm-svn: 332737
* [X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latencySimon Pilgrim2018-05-184-18/+18
| | | | llvm-svn: 332722
* [X86][SSE] Ensure vector partial load/stores use the ↵Simon Pilgrim2018-05-185-18/+18
| | | | | | | | | | WriteVecLoad/WriteVecStore scheduler classes Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332718
* [X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler ↵Simon Pilgrim2018-05-185-27/+27
| | | | | | | | | | classes Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332714
* [llvm-mca][X86] Add CMOV test filesSimon Pilgrim2018-05-179-0/+2928
| | | | llvm-svn: 332622
* [X86][BtVer2] ADC/SBB take 2cy on an ALU pipe, not 1cy like ADD/SUBSimon Pilgrim2018-05-171-91/+91
| | | | llvm-svn: 332616
* [llvm-mca] add flag -all-views and flag -all-stats.Andrea Di Biagio2018-05-174-0/+442
| | | | | | | Flag -all-views enables all the views. Flag -all-stats enables all the views that print hardware statistics. llvm-svn: 332602
* [llvm-mca][X86] Add ADX test filesSimon Pilgrim2018-05-174-0/+234
| | | | llvm-svn: 332595
* [X86] Fix typo in instregex for CVTSI642SDrrSimon Pilgrim2018-05-162-4/+4
| | | | llvm-svn: 332510
* [llvm-mca] Regenerate tests after r332381 and r332361. NFCAndrea Di Biagio2018-05-16157-42121/+42121
| | | | llvm-svn: 332447
* [X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classesSimon Pilgrim2018-05-152-9/+9
| | | | | | | | BtVer2 - Fixes schedules for (V)CVTPS2PD instructions A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first llvm-svn: 332376
* [X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classesSimon Pilgrim2018-05-152-4/+4
| | | | | | | | | Btver2 - VCVTPH2PSYrm needs to double pump the AGU Broadwell - missing VCVTPS2PH*mr stores extra latency Allows us to remove the WriteCvtF2FSt conversion store class llvm-svn: 332357
* [llvm-mca][x86] Add F16C instruction testsSimon Pilgrim2018-05-155-0/+302
| | | | llvm-svn: 332347
* [llvm-mca][X86] Add missing SSE4A test fileSimon Pilgrim2018-05-141-0/+55
| | | | llvm-svn: 332270
* [X86][BtVer2] Fix MMX/YMM integer vector nt store schedulesSimon Pilgrim2018-05-142-2/+2
| | | | | | MMX was missing and YMM was tagged as a fp nt store llvm-svn: 332269
* [llvm-mca][x86] Add scalar nt-store instruction testsSimon Pilgrim2018-05-149-9/+72
| | | | llvm-svn: 332262
* [llvm-mca][x86] Add and/not/or/xor instruction testsSimon Pilgrim2018-05-149-9/+2772
| | | | llvm-svn: 332257
* [X86][BtVer2] Model ymm move as double pumped instructionsSimon Pilgrim2018-05-111-13/+13
| | | | | | We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions llvm-svn: 332109
* [X86][MMX] Tag MMX Move/Load/Store as WriteVec schedule classesSimon Pilgrim2018-05-114-9/+9
| | | | | | Fixes an issue on SLM/Btver2 where we had instructions were being treated as scalar loads/stores llvm-svn: 332104
* [X86][SLM] Vector stores only use the MEC port.Simon Pilgrim2018-05-112-12/+12
| | | | | | | | Confirmed by both Agner and Intel's AOM - the IEC/FPC are not required for pure load/stores (even if its a partial update). Can't fix WriteStore until all RMW instructions are cleaned up though.... llvm-svn: 332096
* [X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector widthSimon Pilgrim2018-05-111-2/+2
| | | | | | Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions llvm-svn: 332094
* [X86][SNB] Fix typo in PEXTRDmr instregex, was missing VPEXTRDmr.Simon Pilgrim2018-05-101-3/+3
| | | | llvm-svn: 332002
* [X86] Cleanup WriteFStore/WriteVecStore schedulesSimon Pilgrim2018-05-098-14/+14
| | | | | | | | MOVNTPD/MOVNTPS should be WriteFStore Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction.... llvm-svn: 331864
* [X86] Split off WriteIMul64 from WriteIMul schedule class (PR36931)Simon Pilgrim2018-05-081-13/+13
| | | | | | | This fixes a couple of BtVer2 missing instructions that weren't been handled in the override. NOTE: There are still a lot of overrides that still need cleaning up! llvm-svn: 331770
* [llvm][x86] SandyBridge/IvyBridge don't support BMI1/BMI2Simon Pilgrim2018-05-082-256/+0
| | | | llvm-svn: 331769
* [X86] Split WriteIDiv into div/idiv 8/16/32/64 implementations (PR36930)Simon Pilgrim2018-05-081-33/+33
| | | | | | | I've created the necessary classes but there are still a lot of overrides that need cleaning up. NOTE: The Znver1 model was missing some div/idiv variants in the instregex patterns and wasn't setting the resource cycles at all in the overrides. llvm-svn: 331767
* [llvm-mca][x86] Add div/idiv, mul/imul and inc/dec/neg/nop instruction testsSimon Pilgrim2018-05-089-9/+2295
| | | | llvm-svn: 331765
* [llvm-mca][x86] Remove addsubpd from SSE2 testsSimon Pilgrim2018-05-079-72/+9
| | | | llvm-svn: 331678
* [X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classesSimon Pilgrim2018-05-072-6/+6
| | | | | | | | Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions. Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour. llvm-svn: 331672
* [X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256Simon Pilgrim2018-05-071-37/+37
| | | | | | | | These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents. Differential Revision: https://reviews.llvm.org/D46229 llvm-svn: 331659
* [X86][Znver1] Remove WriteFMul/WriteFRcp InstRW overrides/aliases.Simon Pilgrim2018-05-071-17/+17
| | | | | | Fixes x87 schedules to more closely match Agner - AMD doesn't tend to "special case" x87 instructions as much as Intel. llvm-svn: 331645
* [X86] Split WriteFDiv schedule classes to support single/double scalar, XMM ↵Simon Pilgrim2018-05-072-50/+50
| | | | | | | | and YMM/ZMM instructions. This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values. llvm-svn: 331643
* [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classesSimon Pilgrim2018-05-0711-63/+63
| | | | | | | | | | | | | WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions. NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner. NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80. llvm-svn: 331629
* [X86] Add WriteEMMS scheduler classSimon Pilgrim2018-05-044-12/+12
| | | | | | Filled in the missing values from Btver2 SoG or Agner llvm-svn: 331546
* [X86] Add SchedWriteFRnd fp rounding scheduler classesSimon Pilgrim2018-05-042-18/+18
| | | | | | | | Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515
* [X86][Znver1] Use SchedAlias to tag microcoded scheduler classesSimon Pilgrim2018-05-031-3/+3
| | | | | | | | Avoids extra entries in the class tables. Found a typo that missed the MMX_PHSUBSW instruction. llvm-svn: 331488
* [X86][SNB] Fix scheduling of MMX integer multiply instructions.Simon Pilgrim2018-05-024-8/+8
| | | | | | The entries were being bound to the wrong class. llvm-svn: 331388
* [X86] Fix scheduling info for (V?)SQRTPDm on silvermont.Clement Courbet2018-05-021-2/+2
| | | | | | https://reviews.llvm.org/D46356 llvm-svn: 331356
* [llvm-mca] Correctly handle zero-latency stores that consume pipeline resources.Andrea Di Biagio2018-04-301-0/+44
| | | | | | | | | | | | | | | | This fixes PR37293. We can have scheduling classes with no write latency entries, that still consume processor resources. We don't want to treat those instructions as zero-latency instructions; they still have to be issued to the underlying pipelines, so they still consume resource cycles. This is likely to be a regression which I have accidentally introduced at revision 330807. Now, if an instruction has a non-empty set of write processor resources, we conservatively treat it as a normal (i.e. non zero-latency) instruction. llvm-svn: 331193
* [llvm-mca] Regenerate test Atom/resources-sse3.s. NFCAndrea Di Biagio2018-04-301-47/+41
| | | | | | Before this change, it wrongly specified -mcpu=slm instead of -mcpu=atom. llvm-svn: 331170
OpenPOWER on IntegriCloud