summaryrefslogtreecommitdiffstats
path: root/llvm/test/tools/llvm-mca/X86/BtVer2/resources-sse1.s
Commit message (Collapse)AuthorAgeFilesLines
* [X86][BtVer2] Improved latency and throughput of float/vector loads and stores.Andrea Di Biagio2019-10-141-3/+3
| | | | | | | | | | | | | | | | | | | | This patch introduces the following changes to the btver2 scheduling model: - The number of micro opcodes for YMM loads and stores is now 2 (it was incorrectly set to 1 for both aligned and misaligned loads/stores). - Increased the number of AGU resource cycles for YMM loads and stores to 2cy (instead of 1cy). - Removed JFPU01 and JFPX from the list of resources consumed by pure float/vector loads (no MMX). I verified with llvm-exegesis that pure XMM/YMM loads are no-pipe. Those are dispatched to the FPU but not really issues on JFPU01. Differential Revision: https://reviews.llvm.org/D68871 llvm-svn: 374765
* [X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsrClement Courbet2019-06-191-2/+2
| | | | | | | | | | | | | | | | Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773
* [X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly ↵Craig Topper2019-05-061-4/+4
| | | | | | | | | | | | | | printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when its unneeded. llvm-svn: 360085
* [llvm-mca][x86] Fix MMX PMOVMSKB testSimon Pilgrim2019-04-291-3/+3
| | | | | | This is defined as part of SSE1, XMM PMOVMSKB doesn't appear until SSE2 llvm-svn: 359477
* [X86] Remove the _alt forms of (V)CMP instructions. Use a combination of ↵Craig Topper2019-03-181-8/+8
| | | | | | | | | | custom printing and custom parsing to achieve the same result and more Similar to previous change done for VPCOM and VPCMP Differential Revision: https://reviews.llvm.org/D59468 llvm-svn: 356384
* [X86][Btver2] Improved latency/throughput model for scalar int-to-float ↵Andrea Di Biagio2019-01-291-4/+4
| | | | | | | | | | | | | | conversions. Account for bypass delays when computing the latency of scalar int-to-float conversions. On Jaguar we need to account for an extra 6cy latency (see AMD fam16h SOG). This patch also fixes the number of micropcodes for the register-memory variants of scalar int-to-float conversions. Differential Revision: https://reviews.llvm.org/D57148 llvm-svn: 352518
* [X86][BtVer2] Update the WriteLoad latency.Andrea Di Biagio2019-01-211-5/+5
| | | | | | | | | | | | | | | | | | r327630 introduced new write definitions for float/vector loads. Before that revision, WriteLoad was used by both integer/float (scalar/vector) load. So, WriteLoad had to conservatively declare a latency to 5cy. That is because the load-to-use latency for float/vector load is 5cy. Now that we have dedicated writes for float/vector loads, there is no reason why we should keep the latency of WriteLoad to 5cy. At the moment, WriteLoad is only used by scalar integer loads only; we can assume an optimstic 3cy latency for them. This patch changes that latency from 5cy to 3cy, and regenerates the affected scheduling/mca tests. Differential Revision: https://reviews.llvm.org/D56922 llvm-svn: 351742
* [X86][Btver2] CVTSS2I/CVTSD2I - add missing JFPU0 pipeSimon Pilgrim2018-09-281-9/+9
| | | | | | | | We issue JFPU1->JSTC then JFPU0->JFPA then -> JALU0 (integer pipe) Match AMD Fam16h SOG + llvm-exegesis tests llvm-svn: 343314
* [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions.Andrea Di Biagio2018-07-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was inferred directly from the "default" pattern associated with the instruction definition. r336728 removed special node X86Movlps, and all the patterns associated to it. Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the 'mayLoad/hasSideEffects' flags are left unset. When the instruction info is emitted by tablegen, method CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a pattern, and flags are undefined. So, it conservatively sets the "hasSideEffects" flag for it. As a consequence, we were losing the 'mayLoad' flag, and we were gaining a 'hasSideEffect' flag in its place. This patch fixes the issue (originally reported by Michael Holmen). The mca tests show the differences in the instruction info flags. Instructions that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm. Differential Revision: https://reviews.llvm.org/D49182 llvm-svn: 336818
* [llvm-mca] Use a different character to flag instructions with side-effects ↵Andrea Di Biagio2018-07-111-7/+7
| | | | | | | | | | | | | | | | | | | | | in the Instruction Info View. NFC This makes easier to identify changes in the instruction info flags. It also helps spotting potential regressions similar to the one recently introduced at r336728. Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and spaces. A change in position of the flag marker may not trigger a test failure. This patch only changes the character used for flag `hasSideEffects`. The reason why I didn't touch other flags is because I want to avoid spamming the mailing because of the massive diff due to the numerous tests affected by this change. In future, each instruction flag should be associated with a different character in the Instruction Info View. llvm-svn: 336797
* [llvm-mca] Make sure not to end the test files with an empty line.Roman Lebedev2018-06-041-1/+0
| | | | | | | | | | | | | | | | | | | Summary: It's super irritating. [properly configured] git client then complains about that double-newline, and you have to use `--force` to ignore the warning, since even if you fix it manually, it will be reintroduced the very next runtime :/ Reviewers: RKSimon, andreadb, courbet, craig.topper, javed.absar, gbedwell Reviewed By: gbedwell Subscribers: javed.absar, tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D47697 llvm-svn: 333887
* [X86][BtVer2] Improve simulation of (V)PINSR valuesSimon Pilgrim2018-05-181-2/+2
| | | | | | Include the 6cy delay transferring from the GPR to FPU. llvm-svn: 332737
* [X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latencySimon Pilgrim2018-05-181-3/+3
| | | | llvm-svn: 332722
* [X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler ↵Simon Pilgrim2018-05-181-5/+5
| | | | | | | | | | classes Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc. Fixes BtVer2/SLM which have different behaviours for GPR stores. llvm-svn: 332714
* [llvm-mca] Regenerate tests after r332381 and r332361. NFCAndrea Di Biagio2018-05-161-264/+264
| | | | llvm-svn: 332447
* [X86][BtVer2] Fix MMX/YMM integer vector nt store schedulesSimon Pilgrim2018-05-141-1/+1
| | | | | | MMX was missing and YMM was tagged as a fp nt store llvm-svn: 332269
* [X86][MMX] Tag MMX Move/Load/Store as WriteVec schedule classesSimon Pilgrim2018-05-111-2/+2
| | | | | | Fixes an issue on SLM/Btver2 where we had instructions were being treated as scalar loads/stores llvm-svn: 332104
* [llvm-mca][X86] Add prefetch instruction resource testsSimon Pilgrim2018-04-191-1/+14
| | | | llvm-svn: 330371
* [llvm-mca][X86] Add mmx instruction to btver2 resource testsSimon Pilgrim2018-04-191-7/+110
| | | | | | Useful to see scheduler class deltas against xmm equivalents llvm-svn: 330335
* [UpdateTestChecks] Add update_mca_test_checks.py scriptGreg Bedwell2018-04-181-3/+6
| | | | | | | | | | | This script can be used to regenerate tests in the test/tools/llvm-mca directory (PR36904). Regenerated a number of tests using the pattern: test/tools/llvm-mca/*/*/*.s Differential Revision: https://reviews.llvm.org/D45369 llvm-svn: 330246
* [X86][Btver2] Strip unnecessary check prefixes from resources testsSimon Pilgrim2018-04-041-1/+1
| | | | llvm-svn: 329192
* [X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costsSimon Pilgrim2018-03-261-4/+4
| | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573
* [X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costsSimon Pilgrim2018-03-261-4/+4
| | | | | | We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551
* [X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of ↵Simon Pilgrim2018-03-261-1/+1
| | | | | | JALU0 for GPR PRF write) llvm-svn: 328536
* [X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costsSimon Pilgrim2018-03-261-16/+16
| | | | | | | | Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505
* [llvm-mca] Fix how views are added to the InstructionTables.Andrea Di Biagio2018-03-261-0/+103
| | | | | | | This should fix the stack-use-after-scope reported by the asan buildbots after revision 328493. llvm-svn: 328499
* [llvm-mca] Add flag -instruction-tables to print the theoretical resource ↵Andrea Di Biagio2018-03-261-27/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pressure distribution for instructions (PR36874) The goal of this patch is to address most of PR36874. To fully fix PR36874 we need to split the "InstructionInfo" view from the "SummaryView". That would make easy to check the latency and rthroughput as well. The patch reuses all the logic from ResourcePressureView to print out the "instruction tables". We have an entry for every instruction in the input sequence. Each entry reports the theoretical resource pressure distribution. Resource pressure is uniformly distributed across all the processor resource units of a group. At the moment, the backend pipeline is not configurable, so the only way to fix this is by creating a different driver that simply sends instruction events to the resource pressure view. That means, we don't use the Backend interface. Instead, it is simpler to just have a different code-path for when flag -instruction-tables is specified. Once Clement addresses bug 36663, then we can port the "instruction tables" logic into a stage of our configurable pipeline. Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag -instruction-tables to each modified test. Differential Revision: https://reviews.llvm.org/D44839 llvm-svn: 328487
* [X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unitSimon Pilgrim2018-03-231-3/+3
| | | | | | Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338
* [X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and ↵Simon Pilgrim2018-03-231-2/+2
| | | | | | JSAGU/JSTC function units llvm-svn: 328328
* [X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler ↵Simon Pilgrim2018-03-231-9/+9
| | | | | | pipe and JFPX/JVALU function unit as well as the AGUs llvm-svn: 328304
* [X86][Btver2] Conversion, MaskedLoad/MaskedStore and NTStores all are ↵Simon Pilgrim2018-03-221-1/+1
| | | | | | scheduled through the JFPU1 pipe llvm-svn: 328226
* [X86][Btver2] FCMP (inc FMAX/FMIN) instructions use the JFPA functional pipeSimon Pilgrim2018-03-221-14/+14
| | | | | | The ymm instructions are double pumped as well. llvm-svn: 328222
* [X86][Btver2] Modelled float bitwise instructions as being performed on the ↵Simon Pilgrim2018-03-181-14/+14
| | | | | | float cluster (FPA/FPM) not the integer. llvm-svn: 327793
* [X86][Btver2] Correctly distinguish between scheduling pipe and functional ↵Simon Pilgrim2018-03-181-58/+58
| | | | | | | | | | unit for JWriteResFpuPair defs Jaguar's FPU has 2 scheduler pipes (JFPU0/JFPU1) which forward to multiple functional sub-units each. We need to model that an micro-op will both consume the scheduler pipe and a functional unit. This patch just handles the ops defined through JWriteResFpuPair, I'll go through the custom cases later. llvm-svn: 327791
* [X86][Btver2] Add llvm-mca tests to show pipe resource usage of most vector ↵Simon Pilgrim2018-03-181-0/+245
instructions Hopefully these tests can be easily reused should any other subtarget get in depth llvm-mca coverage (we can either copy the tests or move them into a common dir and run it with multiple prefixes). llvm-svn: 327788
OpenPOWER on IntegriCloud