summaryrefslogtreecommitdiffstats
path: root/llvm/test/tools/llvm-mca
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86] Add tests to verify the presence of "ReadAfterLd" after r328823.Andrea Di Biagio2018-03-302-0/+53
| | | | | | | This change adds a couple of tests to verify the change introduced by revision 328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI). llvm-svn: 328859
* [llvm-mca] Correctly set the ReadAdvance information for register use operands.Andrea Di Biagio2018-03-291-0/+46
| | | | | | | | | | | | The tool was passing the wrong operand index to method MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and not the operand index. This was found when testing X86 code where instructions had a memory folded operand. This patch fixes the issue and adds test read-advance-1.s to ensure that the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used. llvm-svn: 328790
* [X86][BtVer2] Fix the number of micro opcodes for AES[ENC|DEC] and other YMM ↵Andrea Di Biagio2018-03-281-22/+22
| | | | | | | | | | | instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698
* [X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions.Andrea Di Biagio2018-03-282-14/+709
| | | | | | | | | | | | | The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694
* [llvm-mca] pass the correct set of used registers in checkRAT.Andrea Di Biagio2018-03-271-0/+33
| | | | | | | | | We were incorrectly initializing the array of used registers in method checkRAT. As a consequence, the number of register file stalls was misreported. Added a test to cover this case. llvm-svn: 328629
* [X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costsSimon Pilgrim2018-03-263-16/+16
| | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573
* [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costsSimon Pilgrim2018-03-261-4/+4
| | | | | | We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551
* [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costsSimon Pilgrim2018-03-262-8/+8
| | | | llvm-svn: 328541
* [X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of ↵Simon Pilgrim2018-03-265-30/+30
| | | | | | JALU0 for GPR PRF write) llvm-svn: 328536
* [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costsSimon Pilgrim2018-03-263-36/+45
| | | | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505
* [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costsSimon Pilgrim2018-03-261-12/+12
| | | | | | These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501
* [llvm-mca] Fix how views are added to the InstructionTables.Andrea Di Biagio2018-03-261-0/+103
| | | | | | | This should fix the stack-use-after-scope reported by the asan buildbots after revision 328493. llvm-svn: 328499
* [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costsSimon Pilgrim2018-03-262-8/+8
| | | | | | The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497
* [llvm-mca] Add a flag -instruction-info to enable/disable the instruction ↵Andrea Di Biagio2018-03-261-0/+23
| | | | | | info view. llvm-svn: 328493
* [X86][Btver2] Double the AGU and schedule pipe resources for YMMSimon Pilgrim2018-03-262-105/+105
| | | | | | Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491
* [llvm-mca] Add flag -instruction-tables to print the theoretical resource ↵Andrea Di Biagio2018-03-2610-706/+707
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pressure distribution for instructions (PR36874) The goal of this patch is to address most of PR36874. To fully fix PR36874 we need to split the "InstructionInfo" view from the "SummaryView". That would make easy to check the latency and rthroughput as well. The patch reuses all the logic from ResourcePressureView to print out the "instruction tables". We have an entry for every instruction in the input sequence. Each entry reports the theoretical resource pressure distribution. Resource pressure is uniformly distributed across all the processor resource units of a group. At the moment, the backend pipeline is not configurable, so the only way to fix this is by creating a different driver that simply sends instruction events to the resource pressure view. That means, we don't use the Backend interface. Instead, it is simpler to just have a different code-path for when flag -instruction-tables is specified. Once Clement addresses bug 36663, then we can port the "instruction tables" logic into a stage of our configurable pipeline. Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag -instruction-tables to each modified test. Differential Revision: https://reviews.llvm.org/D44839 llvm-svn: 328487
* [X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function ↵Simon Pilgrim2018-03-232-35/+35
| | | | | | unit llvm-svn: 328343
* [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unitSimon Pilgrim2018-03-233-298/+298
| | | | | | Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338
* [X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU ↵Simon Pilgrim2018-03-232-303/+303
| | | | | | function unit llvm-svn: 328331
* [X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and ↵Simon Pilgrim2018-03-235-415/+415
| | | | | | JSAGU/JSTC function units llvm-svn: 328328
* [X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function unitsSimon Pilgrim2018-03-232-81/+81
| | | | llvm-svn: 328324
* [X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructionsSimon Pilgrim2018-03-232-56/+56
| | | | | | This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops. llvm-svn: 328320
* [X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to ↵Simon Pilgrim2018-03-231-9/+9
| | | | | | | | correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318
* [X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler ↵Simon Pilgrim2018-03-237-520/+520
| | | | | | pipe and JFPX/JVALU function unit as well as the AGUs llvm-svn: 328304
* [X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. ↵Craig Topper2018-03-231-66/+66
| | | | | | Change pblendvb/blendvps/blendvpd to use WriteFVarBlend llvm-svn: 328294
* [X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version.Craig Topper2018-03-231-55/+55
| | | | llvm-svn: 328293
* [X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are ↵Simon Pilgrim2018-03-224-269/+269
| | | | | | scheduled through the JFPU1 pipe llvm-svn: 328226
* [X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipeSimon Pilgrim2018-03-223-77/+77
| | | | | | The ymm instructions are double pumped as well. llvm-svn: 328222
* [X86][Btver2] FMUL ymm instructions are double pumped on the JFPM functional ↵Simon Pilgrim2018-03-222-255/+255
| | | | | | pipe llvm-svn: 328217
* [llvm-mca] Move the logic that computes the register file usage to the ↵Andrea Di Biagio2018-03-212-0/+58
| | | | | | | | | | | | | | | | | | | | | | | BackendStatistics view. With this patch, the "instruction dispatched" event now provides information related to the number of microarchitectural registers used in each register file. Similarly, the "instruction retired" event is now able to tell how may registers are freed in each register file. Currently, the BackendStatistics view is the only consumer of register usage/pressure information. BackendStatistics uses that info to print out a few general statistics (i.e. max number of mappings used; total mapping created). Before this patch, the BackendStatistics was forced to query the Backend to obtain the register pressure information. This helps removes that dependency. Now views are completely independent from the Backend. As a consequence, it should be easier to address PR36663 and further modularize the pipeline. Added a couple of test cases in the BtVer2 specific directory. llvm-svn: 328129
* [X86][Btver2] Fix crc32 schedule costsSimon Pilgrim2018-03-181-11/+11
| | | | | | The default is currently FAdd for some reason llvm-svn: 327807
* [X86][Btver2] Add crc32 resource testsSimon Pilgrim2018-03-181-1/+26
| | | | llvm-svn: 327805
* [X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA ↵Simon Pilgrim2018-03-182-46/+46
| | | | | | functional pipe llvm-svn: 327804
* [X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX ↵Simon Pilgrim2018-03-181-27/+27
| | | | | | (JFPA/JFPM) functional pipes llvm-svn: 327803
* [X86][Btver2] F16C instructions are performed on the JSTC functional pipeSimon Pilgrim2018-03-181-8/+8
| | | | llvm-svn: 327801
* [X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the ↵Simon Pilgrim2018-03-181-4/+4
| | | | | | JVALU0/JVALU1 functional pipes llvm-svn: 327794
* [X86][Btver2] Modelled float bitwise instructions as being performed on the ↵Simon Pilgrim2018-03-183-40/+40
| | | | | | float cluster (FPA/FPM) not the integer. llvm-svn: 327793
* [X86][Btver2] Correctly distinguish between scheduling pipe and functional ↵Simon Pilgrim2018-03-1810-828/+828
| | | | | | | | | | unit for JWriteResFpuPair defs Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit. This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later. llvm-svn: 327791
* [X86][Btver2] Add llvm-mca tests to show pipe resource usage of most vector ↵Simon Pilgrim2018-03-1811-0/+3206
| | | | | | | | instructions Hopefully these tests can be easily reused should any other subtarget get in depth llvm-mca coverage (we can either copy the tests or move them into a common dir and run it with multiple prefixes). llvm-svn: 327788
* [X86][Btver2] Tweak pipes test to remove register dependenciesSimon Pilgrim2018-03-151-34/+34
| | | | | | It gives us a better view of pipe usage in the timeline which is what the test is trying to show. llvm-svn: 327685
* [X86][Btver2] Fix ymm div/sqrt to use fmul unitSimon Pilgrim2018-03-151-27/+26
| | | | | | | | YMM FDiv/FSqrt are dispatched on pipe JFPU1 but should be performed on the JFPM unit - that is where most of the cycles are spent. This matches the pipes for WriteFSqrt/WriteFDiv definitions. llvm-svn: 327682
* [X86][Btver2] Add test to show timeline of fpu instructions on different ↵Simon Pilgrim2018-03-151-0/+114
| | | | | | | | pipes/units Try to demonstrate the scheduling from fpu0/fpu1 pipes to the valu0/vimul/fpa or valu1/stc/fpm functional units llvm-svn: 327676
* [llvm-mca] BackendStatistics: early exit from method printSchedulerUsage if theAndrea Di Biagio2018-03-102-0/+41
| | | | | | no scheduler resources were consumed. llvm-svn: 327215
* [llvm-mca] Fix handling of zero-latency instructions.Andrea Di Biagio2018-03-083-0/+100
| | | | | | | | | | | | | | | | | | | | This patch fixes a problem found when testing zero latency instructions on target AArch64 -mcpu=exynos-m3 / -mcpu=exynos-m1. On Exynos-m3/m1, direct branches are zero-latency instructions that don't consume any processor resources. The DispatchUnit marks zero-latency instructions as "executed", so that no scheduling is required. The event of instruction executed is then notified to all the listeners, and the reorder buffer (managed by the RetireControlUnit) is updated. In particular, the entry associated to the zero-latency instruction in the reorder buffer is marked as executed. Before this patch, the DispatchUnit forgot to assign a retire control unit token (RCUToken) to the zero-latency instruction. As a consequence, the RCUToken was used uninitialized. This was causing a crash in the RetireControlUnit logic. Fixes PR36650. llvm-svn: 327056
* [llvm-mca] Emit the 'Instruction Info' table before the resource pressure view.Andrea Di Biagio2018-03-085-72/+78
| | | | | | | In future, both the summary information and the 'instruction info' table should be moved into a separate "Summary" view. llvm-svn: 327010
* [llvm-mca] LLVM Machine Code Analyzer.Andrea Di Biagio2018-03-0817-0/+431
llvm-mca is an LLVM based performance analysis tool that can be used to statically measure the performance of code, and to help triage potential problems with target scheduling models. llvm-mca uses information which is already available in LLVM (e.g. scheduling models) to statically measure the performance of machine code in a specific cpu. Performance is measured in terms of throughput as well as processor resource consumption. The tool currently works for processors with an out-of-order backend, for which there is a scheduling model available in LLVM. The main goal of this tool is not just to predict the performance of the code when run on the target, but also help with diagnosing potential performance issues. Given an assembly code sequence, llvm-mca estimates the IPC (instructions per cycle), as well as hardware resources pressure. The analysis and reporting style were mostly inspired by the IACA tool from Intel. This patch is related to the RFC on llvm-dev visible at this link: http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html Differential Revision: https://reviews.llvm.org/D43951 llvm-svn: 326998
OpenPOWER on IntegriCloud