summaryrefslogtreecommitdiffstats
path: root/llvm/test/tools/llvm-mca/X86
Commit message (Collapse)AuthorAgeFilesLines
...
* [llvm-mca] Do not separate iterations with a newline in the timeline view.Andrea Di Biagio2018-04-066-17/+18
| | | | | | | Also, update a few tests to minimize the diff in D45369. No functional change intended. llvm-svn: 329403
* [MC][Tablegen] Allow models to describe the retire control unit for llvm-mca. Andrea Di Biagio2018-04-053-49/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td). A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle". Models can optionally specify an instance of RetireControlUnit. There can only be up-to one RetireControlUnit definition per scheduling model. Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp). This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304
* [X86][Btver2] Strip unnecessary check prefixes from resources testsSimon Pilgrim2018-04-0411-11/+11
| | | | llvm-svn: 329192
* [llvm-mca] Move the logic that prints register file statistics to its own ↵Andrea Di Biagio2018-04-035-5/+5
| | | | | | | | | | | | | view. NFCI Before this patch, the "BackendStatistics" view was responsible for printing the register file usage (as well as many other statistics). Now users can enable register file usage statistics using the command line flag `-register-file-stats`. By default, the tool doesn't print register file statistics. llvm-svn: 329083
* [MC][Tablegen] Allow the definition of processor register files in the ↵Andrea Di Biagio2018-04-035-13/+217
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | scheduling model for llvm-mca This patch allows the description of register files in processor scheduling models. This addresses PR36662. A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows to specify: - The total number of physical registers. - Which target registers are accessible through the register file. - The cost of allocating a register at register renaming stage. Example (from this patch - see file X86/X86ScheduleBtVer2.td) def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]> Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register. The syntax allows to specify an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers". This patch is structured in two parts. * Part 1 - MC/Tablegen * A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make easier for targets to get rid of the extra processor information if they don't want it. * Part 2 - llvm-mca related * The second part of this patch is related to changes to llvm-mca. The main differences are: 1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage. 2) Point 1. triggered a minor refactoring which lef to the removal of the "maximum 32 register files" restriction. 3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor. The effect of point 3. is also visible in tests register-files-[1..5].s. Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067
* [llvm-mca] Do not assume that implicit reads cannot be associated with ↵Andrea Di Biagio2018-04-021-0/+26
| | | | | | | | | | ReadAdvance entries. Before, the instruction builder incorrectly assumed that only explicit reads could have been associated with ReadAdvance entries. This patch fixes the issue and adds a test to verify it. llvm-svn: 328972
* [X86] Add SchedRW for PMULLDCraig Topper2018-03-313-29/+28
| | | | | | | | | | | | | | | | | | | Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914
* [X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts andAndrea Di Biagio2018-03-301-8/+8
| | | | | | | | | VSQRT instructions. There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now. llvm-svn: 328892
* [X86][BtVer2] Fix the number of uOps for horizontal operations.Andrea Di Biagio2018-03-302-12/+12
| | | | llvm-svn: 328886
* [X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds andAndrea Di Biagio2018-03-303-12/+10
| | | | | | | | | most vector logic instructions. Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input operand. llvm-svn: 328867
* [X86][BtVer2] Add tests that show how ReadAfterLd is missing for someAndrea Di Biagio2018-03-304-0/+92
| | | | | | | | | | | | | instructions. In the Btver2 model, there are a few InstRW overrides that don't specify a ReadAfterLd for the register input operand. As a result, a few AVX variants of horizontal operations and most vector logic operations with a folded memory operand don't have a ReadAdvance info associated to their input register operands. llvm-svn: 328865
* [X86] Add llvm-mca tests for r328834.Andrea Di Biagio2018-03-304-0/+120
| | | | | | | | | Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend instructions. As Craig pointed out in D44726, some Intel models still have to be fixed. llvm-svn: 328861
* [X86] Add tests to verify the presence of "ReadAfterLd" after r328823.Andrea Di Biagio2018-03-302-0/+53
| | | | | | | This change adds a couple of tests to verify the change introduced by revision 328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI). llvm-svn: 328859
* [llvm-mca] Correctly set the ReadAdvance information for register use operands.Andrea Di Biagio2018-03-291-0/+46
| | | | | | | | | | | | The tool was passing the wrong operand index to method MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and not the operand index. This was found when testing X86 code where instructions had a memory folded operand. This patch fixes the issue and adds test read-advance-1.s to ensure that the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used. llvm-svn: 328790
* [X86][BtVer2] Fix the number of micro opcodes for AES[ENC|DEC] and other YMM ↵Andrea Di Biagio2018-03-281-22/+22
| | | | | | | | | | | instructions. Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698
* [X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions.Andrea Di Biagio2018-03-282-14/+709
| | | | | | | | | | | | | The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694
* [llvm-mca] pass the correct set of used registers in checkRAT.Andrea Di Biagio2018-03-271-0/+33
| | | | | | | | | We were incorrectly initializing the array of used registers in method checkRAT. As a consequence, the number of register file stalls was misreported. Added a test to cover this case. llvm-svn: 328629
* [X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costsSimon Pilgrim2018-03-263-16/+16
| | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573
* [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costsSimon Pilgrim2018-03-261-4/+4
| | | | | | We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551
* [X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costsSimon Pilgrim2018-03-262-8/+8
| | | | llvm-svn: 328541
* [X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of ↵Simon Pilgrim2018-03-265-30/+30
| | | | | | JALU0 for GPR PRF write) llvm-svn: 328536
* [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costsSimon Pilgrim2018-03-263-36/+45
| | | | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505
* [X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costsSimon Pilgrim2018-03-261-12/+12
| | | | | | These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501
* [llvm-mca] Fix how views are added to the InstructionTables.Andrea Di Biagio2018-03-261-0/+103
| | | | | | | This should fix the stack-use-after-scope reported by the asan buildbots after revision 328493. llvm-svn: 328499
* [X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costsSimon Pilgrim2018-03-262-8/+8
| | | | | | The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497
* [llvm-mca] Add a flag -instruction-info to enable/disable the instruction ↵Andrea Di Biagio2018-03-261-0/+23
| | | | | | info view. llvm-svn: 328493
* [X86][Btver2] Double the AGU and schedule pipe resources for YMMSimon Pilgrim2018-03-262-105/+105
| | | | | | Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491
* [llvm-mca] Add flag -instruction-tables to print the theoretical resource ↵Andrea Di Biagio2018-03-2610-706/+707
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pressure distribution for instructions (PR36874) The goal of this patch is to address most of PR36874. To fully fix PR36874 we need to split the "InstructionInfo" view from the "SummaryView". That would make easy to check the latency and rthroughput as well. The patch reuses all the logic from ResourcePressureView to print out the "instruction tables". We have an entry for every instruction in the input sequence. Each entry reports the theoretical resource pressure distribution. Resource pressure is uniformly distributed across all the processor resource units of a group. At the moment, the backend pipeline is not configurable, so the only way to fix this is by creating a different driver that simply sends instruction events to the resource pressure view. That means, we don't use the Backend interface. Instead, it is simpler to just have a different code-path for when flag -instruction-tables is specified. Once Clement addresses bug 36663, then we can port the "instruction tables" logic into a stage of our configurable pipeline. Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag -instruction-tables to each modified test. Differential Revision: https://reviews.llvm.org/D44839 llvm-svn: 328487
* [X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function ↵Simon Pilgrim2018-03-232-35/+35
| | | | | | unit llvm-svn: 328343
* [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unitSimon Pilgrim2018-03-233-298/+298
| | | | | | Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338
* [X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU ↵Simon Pilgrim2018-03-232-303/+303
| | | | | | function unit llvm-svn: 328331
* [X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and ↵Simon Pilgrim2018-03-235-415/+415
| | | | | | JSAGU/JSTC function units llvm-svn: 328328
* [X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function unitsSimon Pilgrim2018-03-232-81/+81
| | | | llvm-svn: 328324
* [X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructionsSimon Pilgrim2018-03-232-56/+56
| | | | | | This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops. llvm-svn: 328320
* [X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to ↵Simon Pilgrim2018-03-231-9/+9
| | | | | | | | correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318
* [X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler ↵Simon Pilgrim2018-03-237-520/+520
| | | | | | pipe and JFPX/JVALU function unit as well as the AGUs llvm-svn: 328304
* [X86] Match vpblendvb/vblendvps/vblendvpd itineraries to the SSE equivalent. ↵Craig Topper2018-03-231-66/+66
| | | | | | Change pblendvb/blendvps/blendvpd to use WriteFVarBlend llvm-svn: 328294
* [X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version.Craig Topper2018-03-231-55/+55
| | | | llvm-svn: 328293
* [X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are ↵Simon Pilgrim2018-03-224-269/+269
| | | | | | scheduled through the JFPU1 pipe llvm-svn: 328226
* [X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipeSimon Pilgrim2018-03-223-77/+77
| | | | | | The ymm instructions are double pumped as well. llvm-svn: 328222
* [X86][Btver2] FMUL ymm instructions are double pumped on the JFPM functional ↵Simon Pilgrim2018-03-222-255/+255
| | | | | | pipe llvm-svn: 328217
* [llvm-mca] Move the logic that computes the register file usage to the ↵Andrea Di Biagio2018-03-212-0/+58
| | | | | | | | | | | | | | | | | | | | | | | BackendStatistics view. With this patch, the "instruction dispatched" event now provides information related to the number of microarchitectural registers used in each register file. Similarly, the "instruction retired" event is now able to tell how may registers are freed in each register file. Currently, the BackendStatistics view is the only consumer of register usage/pressure information. BackendStatistics uses that info to print out a few general statistics (i.e. max number of mappings used; total mapping created). Before this patch, the BackendStatistics was forced to query the Backend to obtain the register pressure information. This helps removes that dependency. Now views are completely independent from the Backend. As a consequence, it should be easier to address PR36663 and further modularize the pipeline. Added a couple of test cases in the BtVer2 specific directory. llvm-svn: 328129
* [X86][Btver2] Fix crc32 schedule costsSimon Pilgrim2018-03-181-11/+11
| | | | | | The default is currently FAdd for some reason llvm-svn: 327807
* [X86][Btver2] Add crc32 resource testsSimon Pilgrim2018-03-181-1/+26
| | | | llvm-svn: 327805
* [X86][Btver2] FADD/FHADD ymm instructions are double pumped on the JFPA ↵Simon Pilgrim2018-03-182-46/+46
| | | | | | functional pipe llvm-svn: 327804
* [X86][Btver2] Float bitwise ymm instructions are double pumped on the JFPX ↵Simon Pilgrim2018-03-181-27/+27
| | | | | | (JFPA/JFPM) functional pipes llvm-svn: 327803
* [X86][Btver2] F16C instructions are performed on the JSTC functional pipeSimon Pilgrim2018-03-181-8/+8
| | | | llvm-svn: 327801
* [X86][Btver2] SSE4A EXTRQ/INSERTQ instructions are performed on the ↵Simon Pilgrim2018-03-181-4/+4
| | | | | | JVALU0/JVALU1 functional pipes llvm-svn: 327794
* [X86][Btver2] Modelled float bitwise instructions as being performed on the ↵Simon Pilgrim2018-03-183-40/+40
| | | | | | float cluster (FPA/FPM) not the integer. llvm-svn: 327793
* [X86][Btver2] Correctly distinguish between scheduling pipe and functional ↵Simon Pilgrim2018-03-1810-828/+828
| | | | | | | | | | unit for JWriteResFpuPair defs Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit. This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later. llvm-svn: 327791
OpenPOWER on IntegriCloud