bcm5719-llvm/llvm/test/tools/llvm-mca/X86/BdVer2, branch meklort-10.0.1

[MCA] Show aggregate over Average Wait times for the whole snippet (PR43219)

2019-10-10T14:46:21+00:00

Summary: As discussed in https://bugs.llvm.org/show_bug.cgi?id=43219, I believe it may be somewhat useful to show //some// aggregates over all the sea of statistics provided. Example:
```
Average Wait times (based on the timeline view):
[0]: Executions
[1]: Average time spent waiting in a scheduler's queue
[2]: Average time spent waiting in a scheduler's queue while ready
[3]: Average time elapsed from WB until retire stage

      [0]    [1]    [2]    [3]
0.     3     1.0    1.0    4.7       vmulps   %xmm0, %xmm1, %xmm2
1.     3     2.7    0.0    2.3       vhaddps  %xmm2, %xmm2, %xmm3
2.     3     6.0    0.0    0.0       vhaddps  %xmm3, %xmm3, %xmm4
       3     3.2    0.3    2.3
```
I.e. we average the averages. Reviewers: andreadb, mattd, RKSimon Reviewed By: andreadb Subscribers: gbedwell, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68714 llvm-svn: 374361
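
The summary row in the example above can be reproduced by hand. The following is a minimal sketch, not the llvm-mca implementation: the function name `average_of_averages` is hypothetical, and it assumes the aggregate is an unweighted mean of the per-instruction averages in columns [1] to [3], rounded to one decimal place.

```python
# Per-instruction averages copied from the example table: columns [1], [2], [3].
rows = [
    (1.0, 1.0, 4.7),  # vmulps  %xmm0, %xmm1, %xmm2
    (2.7, 0.0, 2.3),  # vhaddps %xmm2, %xmm2, %xmm3
    (6.0, 0.0, 0.0),  # vhaddps %xmm3, %xmm3, %xmm4
]


def average_of_averages(rows):
    """Return the unweighted mean of each column, rounded to one decimal."""
    columns = zip(*rows)
    return [round(sum(col) / len(rows), 1) for col in columns]


summary = average_of_averages(rows)
print(summary)
```

Under these assumptions the result matches the last row of the example (3.2, 0.3, 2.3). Column [0] is left out because it is an execution count rather than an average.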

[MCA][LSUnit] Track loads and stores until retirement.

2019-10-08T10:46:01+00:00

Before this patch, loads and stores were only tracked by their corresponding queues in the LSUnit from dispatch until execute stage. In practice we should be more conservative and assume that memory opcodes leave their queues at retirement stage. Basically, loads should leave the load queue only when they have completed and delivered their data. We conservatively assume that a load is completed when it is retired. Stores should be tracked by the store queue from dispatch until retirement. In practice, stores can only leave the store queue if their data can be written to the data cache. This is mostly a mechanical change. With this patch, the retire stage notifies the LSUnit when a memory instruction is retired. That triggers the release of LDQ/STQ entries. The only visible change is in memory tests for the bdver2 model. That is because bdver2 is the only model that defines the load/store queue size. This patch partially addresses PR39830. Differential Revision: https://reviews.llvm.org/D68266 llvm-svn: 374034
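
The effect of moving the release point can be illustrated with a toy queue model. This is only a sketch under stated assumptions: `ToyLoadQueue`, its methods, and the stage names are hypothetical and do not correspond to the LSUnit API. It shows that a bounded queue stays full for longer when entries are released at retirement instead of at execute.

```python
class ToyLoadQueue:
    """Hypothetical bounded queue; not the real LSUnit interface."""

    def __init__(self, size, release_at):
        self.size = size
        self.release_at = release_at  # "execute" (old) or "retire" (new)
        self.entries = set()

    def can_dispatch(self):
        return len(self.entries) < self.size

    def on_dispatch(self, inst_id):
        if not self.can_dispatch():
            raise RuntimeError("queue full")
        self.entries.add(inst_id)

    def on_event(self, inst_id, stage):
        # Free the entry only when the configured stage is reached.
        if stage == self.release_at:
            self.entries.discard(inst_id)


def occupancy_after_execute(release_at):
    """Dispatch two loads into a 2-entry queue, execute both, report state."""
    queue = ToyLoadQueue(size=2, release_at=release_at)
    queue.on_dispatch(0)
    queue.on_dispatch(1)
    queue.on_event(0, "execute")
    queue.on_event(1, "execute")
    return len(queue.entries), queue.can_dispatch()


old_behaviour = occupancy_after_execute("execute")
new_behaviour = occupancy_after_execute("retire")
print(old_behaviour, new_behaviour)
```

In this toy model, a third load could be dispatched right after execute under the old policy, while under the new policy it has to wait until an earlier load retires. That is consistent with the change being visible only for a model that defines a queue size.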

[MCA][X86] Add tests for LOCK variants of standard X86 arithmetic ops

2019-08-20T11:13:20+00:00

D66424 adds the base support for LOCK, so we should be able to add special case support for all these cases in future patches. llvm-svn: 369367

[X86] Move scheduling tests for CMPXCHG to the corresponding resources-x86_64.s files. NFC

2019-08-19T18:20:30+00:00

In D66424 it has been requested to move all the new tests added by r369278 into resources-x86_64.s. That is because only the 8b/16b ops should be tested by resources-cmpxchg.s. This partially reverts r369278. llvm-svn: 369288

[X86] Added extensive scheduling model tests for all the CMPXCHG variants. NFC

2019-08-19T17:07:26+00:00

Addresses a review comment in D66424 llvm-svn: 369279

[X86] Limit vpermil2pd/vpermil2ps immediates to 4 bits in the assembly parser.

2019-08-07T05:34:27+00:00

The upper 4 bits of the immediate byte are used to encode a register. We need to limit the explicit immediate to fit in the remaining 4 bits. Fixes PR42899. llvm-svn: 368123
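
The constraint can be sketched as a small packing function. This is an illustration only, not the assembly parser or encoder code: `encode_imm_byte` is a hypothetical helper, and the layout assumed is the one stated above (register index in the upper 4 bits, explicit immediate in the lower 4 bits).

```python
def encode_imm_byte(reg_index, imm):
    """Pack a register index and an explicit immediate into one byte.

    Hypothetical helper: the register goes in bits [7:4] and the
    immediate in bits [3:0], so each value must fit in 4 bits.
    """
    if not 0 <= reg_index <= 15:
        raise ValueError("register index must fit in 4 bits")
    if not 0 <= imm <= 15:
        raise ValueError("immediate must fit in 4 bits")
    return (reg_index << 4) | imm


encoded = encode_imm_byte(3, 2)
print(hex(encoded))
```

Without the range check, an immediate of 16 or more would spill into the upper nibble and silently change which register is encoded.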

[NFC][X86][MCA] BdVer2: add load-store-throughput test

2019-06-19T08:53:28+00:00

llvm-svn: 363774

[X86] Add missing properties on llvm.x86.sse.{st,ld}mxcsr

2019-06-19T08:44:31+00:00

Summary: llvm.x86.sse.stmxcsr only writes to memory. llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE. Reviewers: craig.topper, RKSimon Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62896 llvm-svn: 363773

[NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits

2019-06-15T16:12:13+00:00

llvm-svn: 363498

[X86] AMD Piledriver (BdVer2): major cleanup (mainly inverse throughput)

2019-05-09T13:54:51+00:00

I've started this cleanup several times now, but got sidetracked elsewhere, e.g. by llvm-exegesis problems. Not this time, finally! This is mainly cleaning up the inverse throughput values, and a few latencies/uops, based on the llvm-exegesis measured values. This is not complete by any means, though; there's certainly more cleanup to be done. The performance numbers (I've only checked with the RawSpeed benchmark) aren't really surprising - overall this *slightly* (< -1%) improves perf. llvm-svn: 360341