summaryrefslogtreecommitdiffstats
path: root/llvm/test/tools/llvm-mca
Commit message (Collapse)AuthorAgeFilesLines
...
* [LLVM-MCA][X86] Add missing VCMPESTR/VCMPESTR testsSimon Pilgrim2018-09-308-7/+231
| | | | llvm-svn: 343421
* [LLVM-MCA][X86] Add some AVX512 testsSimon Pilgrim2018-09-303-1/+899
| | | | | | These are going to be necessary to check I don't mess up when I start cleaning up all the remaining vector integer overrides llvm-svn: 343414
* [X86][Btver2] Fix PCmpIStrI/PCmpIStrM schedulesSimon Pilgrim2018-09-301-5/+5
| | | | | | | | Missing JFPU0 pipe and double JFPU1 pipe (to match JVALU1) resources Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343413
* [llvm-mca] Add a test for zero-idiom VPERM2F128rr. NFCAndrea Di Biagio2018-09-281-0/+75
| | | | | | | | | | | We don't correctly model the latency and resource usage information for zero-idiom VPERM2F128rr on Jaguar. This is demonstrated by the incorrect numbers in the resource pressure view, and the timeline view. A follow up patch will fix this problem. llvm-svn: 343346
* [utils] Stricter checking from update_mca_test_checks.pyGreg Bedwell2018-09-282-28/+28
| | | | | | | | | | | | | | | | If any prefixes have been specified on the RUN lines that do not end up ever actually getting printed, raise an Error. This is either an indication that the run lines just need cleaning up, or that something is more fundamentally wrong with the test. Also raise an Error if there are any blocks which cannot be checked because they are not uniquely covered by a prefix. Fixed up a couple of tests where the extra checking flagged up issues. Differential Revision: https://reviews.llvm.org/D48276 llvm-svn: 343332
* [utils] Allow better identification of matching blocks in ↵Greg Bedwell2018-09-284-151/+127
| | | | | | | | | | | | | | update_mca_test_checks.py Insert empty blocks to cause the positions of matching blocks to match across lists where possible so that later stages of the algorithm can actually identify them as being identical. Regenerated all tests with this change. Differential Revision: https://reviews.llvm.org/D52560 llvm-svn: 343331
* [X86][Btver2] PSUBS/PSUBUS instructions are zero-idiomsSimon Pilgrim2018-09-281-148/+148
| | | | | | Noticed during llvm-exegesis tests, the PSUBS/PSUBUS instructions have the same zero-idiom behaviour to PSUB llvm-svn: 343321
* [X86][Btver2] Add zero-idiom tests for PSUBS/PSUBUS instructionsSimon Pilgrim2018-09-281-88/+172
| | | | | | Noticed during llvm-exegesis tests, the PSUBS/PSUBUS instructions have the same zero-idiom behaviour to PSUB llvm-svn: 343319
* [X86][Btver2] CVTSS2I/CVTSD2I - add missing JFPU0 pipeSimon Pilgrim2018-09-283-35/+35
| | | | | | | | We issue JFPU1->JSTC then JFPU0->JFPA then -> JALU0 (integer pipe) Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343314
* [X86][Btver2] Fix BSF/BSR scheduleSimon Pilgrim2018-09-282-43/+43
| | | | | | | | Double throughput to account for 2 pipes + fix BSF's latency/uop counts Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343311
* [X86][BtVer2] Fix PHMINPOS schedule resources typoSimon Pilgrim2018-09-282-8/+8
| | | | | | PHMINPOS can run on either JFPU pipe llvm-svn: 343299
* [X86][Btver2] (V)MPSADBW instructions take 3uops not 1Simon Pilgrim2018-09-272-4/+4
| | | | llvm-svn: 343238
* [X86][Btver2] BTC/BTR/BTS instructions take 2uops not 1Simon Pilgrim2018-09-271-18/+18
| | | | llvm-svn: 343234
* [X86][Btver2] BLSI/BLSMSK/BLSR instructions take 2uops not 1 (same as TZCNT)Simon Pilgrim2018-09-271-12/+12
| | | | llvm-svn: 343227
* [X86][Btver2] TZCNT instructions take 2uops not 1Simon Pilgrim2018-09-271-4/+4
| | | | llvm-svn: 343200
* Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overridesSimon Pilgrim2018-09-251-8/+8
| | | | | | | | As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342969
* [X86] Remove shift/rotate by CL memory (RMW) overridesSimon Pilgrim2018-09-241-8/+8
| | | | | | The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342916
* [X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)Simon Pilgrim2018-09-245-25/+25
| | | | | | | | Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892
* [X86] ROR*mCL instruction models should match ROL*mCL etc.Simon Pilgrim2018-09-234-36/+36
| | | | | | | | Confirmed with Craig Topper - fix a typo that was missing a Port4 uop for ROR*mCL instructions on some Intel models. Yet another step on the scheduler model cleanup marathon...... llvm-svn: 342846
* [X86] Added missing RCL/RCR schedule overrides to the generic SNB modelSimon Pilgrim2018-09-232-194/+194
| | | | | | | | | | | | The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults. I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well. This is necessary to allow WriteRotate to be updated to remove other rotate overrides. It'd probably be a good idea to investigate a WriteRotateCarry class at some point but its not high priority given the unusualness of these instructions. llvm-svn: 342842
* [X86][Sched] Add zero idiom sched data to the SNB model.Clement Courbet2018-09-211-0/+348
| | | | | | | | | | | | | | | | | | | | Summary: On SNB, renamer-based zeroing does not work for: - 16 and 8-bit GPRs[1]. - MMX [2]. - ANDN variants [3] [1] echo 'sub %ax, %ax' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [2] echo 'pxor %mm0, %mm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- [3] echo 'andnps %xmm0, %xmm0' | /tmp/llvm-exegesis -mode=uops -snippets-file=- Reviewers: RKSimon, andreadb Subscribers: gbedwell, craig.topper, llvm-commits Differential Revision: https://reviews.llvm.org/D52358 llvm-svn: 342736
* [X86][BtVer2] Fix latency and resource cycles of AVX 256-bit zero-idioms.Andrea Di Biagio2018-09-211-42/+42
| | | | | | | | | | | | | | | | This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S|D)Yrr and VANDNP(S|D)Yrr. This is a follow-up of r342555. On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at register-renaming stage. The other opcode has to be executed to set the upper half of the destination YMM. Same for VANDNP(S|D)Yrr. Differential Revision: https://reviews.llvm.org/D52347 llvm-svn: 342728
* [llvm-mca][BtVer2] Modify ANDN tests in zero-idioms-avx-256.s. NFCAndrea Di Biagio2018-09-201-42/+42
| | | | | | | Two test cases should have tested 256-bit variants of VANDN zero-idioms instead of the 128-bit variants. llvm-svn: 342655
* [TableGen][SubtargetEmitter] Add the ability for processor models to ↵Andrea Di Biagio2018-09-191-0/+322
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | describe dependency breaking instructions. This patch adds the ability for processor models to describe dependency breaking instructions. Different processors may specify a different set of dependency-breaking instructions. That means, we cannot assume that all processors of the same target would use the same rules to classify dependency breaking instructions. The main goal of this patch is to provide the means to describe dependency breaking instructions directly via tablegen, and have the following TargetSubtargetInfo hooks redefined in overrides by tabegen'd XXXGenSubtargetInfo classes (here, XXX is a Target name). ``` virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const { return false; } virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const { return isZeroIdiom(MI); } ``` An instruction MI is a dependency-breaking instruction if a call to method isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to true. Similarly, an instruction MI is a special case of zero-idiom dependency breaking instruction if a call to STI.isZeroIdiom(MI) returns true. The extra APInt is used for those targets that may want to select which machine operands have their dependency broken (see comments in code). Note that by default, subtargets don't know about the existence of dependency-breaking. In the absence of external information, those method calls would always return false. A new tablegen class named STIPredicate has been added by this patch to let processor models classify instructions that have properties in common. The idea is that, a MCInstrPredicate definition can be used to "generate" an instruction equivalence class, with the idea that instructions of a same class all have a property in common. STIPredicate definitions are essentially a collection of instruction equivalence classes. Also, different processor models can specify a different variant of the same STIPredicate with different rules (i.e. predicates) to classify instructions. Tablegen backends (in this particular case, the SubtargetEmitter) will be able to process STIPredicate definitions, and automatically generate functions in XXXGenSubtargetInfo. This patch introduces two special kind of STIPredicate classes named IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a definition for those in the BtVer2 scheduling model only. This patch supersedes the one committed at r338372 (phabricator review: D49310). The main advantages are: - We can describe subtarget predicates via tablegen using STIPredicates. - We can describe zero-idioms / dep-breaking instructions directly via tablegen in the scheduling models. In future, the STIPredicates framework can be used for solving other problems. Examples of future developments are: - Teach how to identify optimizable register-register moves - Teach how to identify slow LEA instructions (each subtarget defining its own concept of "slow" LEA). - Teach how to identify instructions that have undocumented false dependencies on the output registers on some processors only. It is also (in my opinion) an elegant way to expose knowledge to both external tools like llvm-mca, and codegen passes. For example, machine schedulers in LLVM could reuse that information when internally constructing the data dependency graph for a code region. This new design feature is also an "opt-in" feature. Processor models don't have to use the new STIPredicates. It has all been designed to be as unintrusive as possible. Differential Revision: https://reviews.llvm.org/D52174 llvm-svn: 342555
* [X86][BMI1] Fix BLSI/BLSMSK/BLSR BMI1 scheduling on btver2Simon Pilgrim2018-09-141-25/+25
| | | | | | These have the same behaviour as tzcnt on btver2 - confirmed with AMD 16h SOG, Agner and instlatx64. llvm-svn: 342235
* [X86] Remove wrong ReadAdvance from multiclass sse_fp_unop_s.Andrea Di Biagio2018-09-031-87/+89
| | | | | | | | | | | | | | | | | | | | | | A ReadAdvance was incorrectly added to the SchedReadWrite list associated with the following SSE instructions: sqrtss sqrtsd rsqrtss rcpss As a consequence, a wrong operand latency was computed for the register operand used as the base address of the folded load operand. This patch removes the wrong ReadAdvance, and updates the llvm-mca test cases. There is still a problem with correctly modeling partial register writes on XMM registers This other problem is currently tracked here: https://bugs.llvm.org/show_bug.cgi?id=38813 Differential Revision: https://reviews.llvm.org/D51542 llvm-svn: 341326
* [X86][BtVer2] Remove wrong ReadAdvance from AVX vbroadcast(ss|sd|f128) ↵Andrea Di Biagio2018-08-311-10/+10
| | | | | | | | | | | | | | | | | | | instructions. The presence of a ReadAdvance for input operand #0 is problematic because it changes the input latency of the register used as the base address for the folded load. A broadcast cannot start executing if the load address hasn't been computed yet. In the llvm-mca example, the VBROADCASTSS is dependent on the address generated by the LEAQ. That means, it cannot start until LEAQ reaches the write-back stage. If we apply ReadAdvance, then we wrongly assume that the load can start 3 cycles in advance. Differential Revision: https://reviews.llvm.org/D51534 llvm-svn: 341222
* [X86] Add llvm-mca tests that show how operand latency is wrongly computed ↵Andrea Di Biagio2018-08-311-0/+206
| | | | | | | | | | | for SSE sqrtss/sd and rcpss. According to the timeline view, sqrtss/sd/rcpss start executing before the load address for the memory operand is available. This problem is caused by the presence of a ReadAfterLd (a ReadAdvance). Those unary operations should not specify a ReadAdvance at all. llvm-svn: 341213
* [X86][BtVer2] Add an llvm-mca test that shows how the read latency of AVX ↵Andrea Di Biagio2018-08-311-0/+73
| | | | | | broadcastss on ymm registers is incorrectly set. llvm-svn: 341197
* [X86][BtVer2] Fix WriteFShuffle256 schedule write info.Andrea Di Biagio2018-08-311-11/+11
| | | | | | | | | | | | | | | This patch fixes the number of micro opcodes, and processor resource cycles for the following AVX instructions: vinsertf128rr/rm vperm2f128rr/rm vbroadcastf128 Tests have been regenerated using the usual scripts in the llvm/utils directory. Differential Revision: https://reviews.llvm.org/D51492 llvm-svn: 341185
* [llvm-mca] Report the number of dispatched micro opcodes in the ↵Andrea Di Biagio2018-08-3010-20/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DispatchStatistics view. This patch introduces the following changes to the DispatchStatistics view: * DispatchStatistics now reports the number of dispatched opcodes instead of the number of dispatched instructions. * The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of stall cycles against the total simulated cycles. This change allows users to easily compare dispatch group sizes with the processor DispatchWidth. Before this change, it was difficult to correlate the two numbers, since DispatchStatistics view reported numbers of instructions (instead of opcodes). DispatchWidth defines the maximum size of a dispatch group in terms of number of micro opcodes. The other change introduced by this patch is related to how DispatchStage generates "instruction dispatch" events. In particular: * There can be multiple dispatch events associated with a same instruction * Each dispatch event now encapsulates the number of dispatched micro opcodes. The number of micro opcodes declared by an instruction may exceed the processor DispatchWidth. Therefore, we cannot assume that instructions are always fully dispatched in a single cycle. DispatchStage knows already how to handle instructions declaring a number of opcodes bigger that DispatchWidth. However, DispatchStage always emitted a single instruction dispatch event (during the first simulated dispatch cycle) for instructions dispatched. With this patch, DispatchStage now correctly notifies multiple dispatch events for instructions that cannot be dispatched in a single cycle. A few views had to be modified. Views can no longer assume that there can only be one dispatch event per instruction. Tests (and docs) have been updated. Differential Revision: https://reviews.llvm.org/D51430 llvm-svn: 341055
* [X86] Improved sched model for X86 CMPXCHG* instructions.Andrew V. Tischenko2018-08-302-18/+20
| | | | | | Differential Revision: https://reviews.llvm.org/D50070 llvm-svn: 341024
* [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report ↵Andrea Di Biagio2018-08-2976-176/+514
| | | | | | | | | | | | | | | | | | | | | | | | | | | generated by the SummaryView. This patch adds two new fields to the perf report generated by the SummaryView. Fields are now logically organized into two small groups; only the second group contains throughput indicators. Example: ``` Iterations: 100 Instructions: 300 Total Cycles: 414 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.69 IPC: 0.72 Block RThroughput: 4.0 ``` This patch also updates the docs for llvm-mca. Due to the nature of this change, several tests in the tools/llvm-mca directory were affected, and had to be updated using script `update_mca_test_checks.py`. llvm-svn: 340946
* [llvm-mca] Don't disable the SummaryView if flag `-all-stats` is false.Andrea Di Biagio2018-08-292-6/+78
| | | | llvm-svn: 340945
* [llvm-mca][TimelineView] Force the same number of executions for every entry ↵Andrea Di Biagio2018-08-283-19/+19
| | | | | | | | | | | | | | | | | | | in the 'wait-times' table. This patch also uses colors to highlight problematic wait-time entries. A problematic entry is an entry with an high wait time that tends to match (or exceed) the size of the scheduler's buffer. Color RED is used if an instruction had to wait an average number of cycles which is bigger than (or equal to) the size of the underlying scheduler's buffer. Color YELLOW is used if the time (in cycles) spend waiting for the operands or pipeline resources is bigger than half the size of the underlying scheduler's buffer. Color MAGENTA is used if an instruction does not consume buffer resources according to the scheduling model. llvm-svn: 340825
* [llvm-mca] Improved report generated by the SchedulerStatistics view.Andrea Di Biagio2018-08-276-35/+119
| | | | | | | | | | | | | Before this patch, the SchedulerStatistics only printed the maximum number of buffer entries consumed in each scheduler's queue at a given point of the simulation. This patch restructures the reported table, and adds an extra field named "Average number of used buffer entries" to it. This patch also uses different colors to help identifying bottlenecks caused by high scheduler's buffer pressure. llvm-svn: 340746
* [llvm-mca] Fix PR38575: Avoid an invalid implicit truncation of a processor ↵Andrea Di Biagio2018-08-151-0/+82
| | | | | | | | | | | | | | | | | resource mask (an uint64_t value) to unsigned. This patch fixes a regression introduced at revision 338702. A processor resource mask was incorrectly implicitly truncated to an unsigned quantity. Later on, the truncated mask was used to initialize an element of a vector of processor resource descriptors. On targets with more than 32 processor resources, some elements of the vector are left uninitialized. As a consequence, this bug might have eventually caused a crash due to null dereference in the Scheduler. This patch fixes PR38575, and adds a test for it. llvm-svn: 339768
* [X86] MCA tests for XCHG*, XADD* and CMPXCHG* instructionsAndrew V. Tischenko2018-08-0710-10/+940
| | | | | | Differential Revision: https://reviews.llvm.org/D49912 llvm-svn: 339145
* [llvm-mca][x86] Add CMPXCHG instruction resource testsSimon Pilgrim2018-08-0110-0/+372
| | | | | | I've put CMPXCHG8B/CMPXCHG16B in the same file, even though technically they are under separate CPUID bits all targets seem to support both (or neither). llvm-svn: 338595
* [llvm-mca][x86] Add PREFETCHW instruction resource testsSimon Pilgrim2018-08-017-0/+268
| | | | | | These aren't just available via 3DNow! so test for them separately as well. llvm-svn: 338584
* [llvm-mca][x86] Add PCLMUL instruction resource testsSimon Pilgrim2018-08-019-0/+300
| | | | | | Renamed the btver2 file that already contained them - the other targets were only testing the AVX versions llvm-svn: 338583
* [llvm-mca] Correctly update the rank in `Scheduler::select()`.Andrea Di Biagio2018-08-011-0/+112
| | | | | | Found by inspection. llvm-svn: 338579
* [llvm-mca][x86] Add SET/TEST instruction resource testsSimon Pilgrim2018-08-0110-10/+1800
| | | | llvm-svn: 338576
* [llvm-mca][x86] Add LEA instruction resource testsSimon Pilgrim2018-08-019-0/+3939
| | | | | | We already added these to btver2, now add them to other targets, even though none of their models treat them specially (yet). llvm-svn: 338565
* [llvm-mca][x86] Add more x86-64 system instruction resource testsSimon Pilgrim2018-08-0110-9/+919
| | | | | | CPUID, IN/OUT, INS/OUTS, INT, PAUSE, SCAS, UD2, XLAT llvm-svn: 338563
* [llvm-mca][x86] Add CLFLUSHOPT instruction resource testsSimon Pilgrim2018-08-014-0/+140
| | | | llvm-svn: 338550
* [llvm-mca][x86] Add CMPS/LODS/MOVS/STOS string instruction resource testsSimon Pilgrim2018-08-0110-9/+529
| | | | llvm-svn: 338532
* [llvm-mca][x86] Add STC + STD instruction resource testsSimon Pilgrim2018-08-0110-10/+80
| | | | llvm-svn: 338514
* [llvm-mca][x86] Add 32-bit instruction resource testsSimon Pilgrim2018-07-3110-0/+792
| | | | | | These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?). llvm-svn: 338404
* [llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms.Andrea Di Biagio2018-07-314-104/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch teaches llvm-mca how to identify dependency breaking instructions on btver2. An example of dependency breaking instructions is the zero-idiom XOR (example: `XOR %eax, %eax`), which always generates zero regardless of the actual value of the input register operands. Dependency breaking instructions don't have to wait on their input register operands before executing. This is because the computation is not dependent on the inputs. Not all dependency breaking idioms are also zero-latency instructions. For example, `CMPEQ %xmm1, %xmm1` is independent on the value of XMM1, and it generates a vector of all-ones. That instruction is not eliminated at register renaming stage, and its opcode is issued to a pipeline for execution. So, the latency is not zero. This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis interface. That method takes as input an instruction (i.e. MCInst) and a MCSubtargetInfo. The default implementation of isDependencyBreaking() conservatively returns false for all instructions. Targets may override the default behavior for specific CPUs, and return a value which better matches the subtarget behavior. In future, we should teach to Tablegen how to automatically generate the body of isDependencyBreaking from scheduling predicate definitions. This would allow us to expose the knowledge about dependency breaking instructions to the machine schedulers (and, potentially, other codegen passes). Differential Revision: https://reviews.llvm.org/D49310 llvm-svn: 338372
OpenPOWER on IntegriCloud