bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on ↵	Roman Lebedev	2018-07-23	5	-95/+97
	partially written registers.
	Summary: Pretty mechanical follow-up for D49196.
	As microarchitecture.pdf notes, "20 AMD Ryzen pipeline", "20.8 Register renaming and out-of-order schedulers": The integer register file has 168 physical registers of 64 bits each. The floating point register file has 160 registers of 128 bits each.
	"20.14 Partial register access": The processor always keeps the different parts of an integer register together. ... An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it.
	Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh
	Reviewed By: GGanesh
	Subscribers: gbedwell, llvm-commits
	Differential Revision: https://reviews.llvm.org/D49393
	llvm-svn: 337676
*	[NFC][MCA] ZnVer1: add partial-reg-update tests	Roman Lebedev	2018-07-23	7	-0/+460
	Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh
	Reviewed By: GGanesh
	Subscribers: gbedwell, llvm-commits
	Differential Revision: https://reviews.llvm.org/D49392
	llvm-svn: 337675
*	[llvm-mca][x86] Add movsx/movzx instructions to general x86_64 resource tests	Simon Pilgrim	2018-07-20	10	-10/+700
	llvm-svn: 337586
*	[X86][BtVer2] correctly model the latency/throughput of LEA instructions.	Andrea Di Biagio	2018-07-19	1	-171/+171
	This patch fixes the latency/throughput of LEA instructions in the BtVer2 scheduling model.
	On Jaguar, a 3-operand LEA has a latency of 2cy and a reciprocal throughput of 1. That is because it uses one cycle of SAGU followed by 1cy of ALU1. An LEA with a "Scale" operand is also slow, and it has the same latency profile as the 3-operand LEA. An LEA16r has a latency of 3cy and a throughput of 0.5 (i.e. RThroughput of 2.0).
	This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td. The tablegen backend (for instruction-info) expands that definition into this (file X86GenInstrInfo.inc):
	```
	static bool isThreeOperandsLEA(const MachineInstr &MI) {
	  return (
	    (
	      MI.getOpcode() == X86::LEA32r
	      || MI.getOpcode() == X86::LEA64r
	      || MI.getOpcode() == X86::LEA64_32r
	      || MI.getOpcode() == X86::LEA16r
	    )
	    && MI.getOperand(1).isReg()
	    && MI.getOperand(1).getReg() != 0
	    && MI.getOperand(3).isReg()
	    && MI.getOperand(3).getReg() != 0
	    && (
	      (
	        MI.getOperand(4).isImm()
	        && MI.getOperand(4).getImm() != 0
	      )
	      || (MI.getOperand(4).isGlobal())
	    )
	  );
	}
	```
	A similar method is generated in the X86_MC namespace, and included into X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h).
	Back to the BtVer2 scheduling model: A new scheduling predicate named JSlowLEAPredicate now checks if either the instruction is a three-operand LEA, or it is an LEA with a Scale value different from 1. A variant scheduling class uses that new predicate to correctly select the appropriate latency profile.
	Differential Revision: https://reviews.llvm.org/D49436
	llvm-svn: 337469
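The slow-LEA check described in this message can be sketched roughly as follows. This is a minimal illustration and not the generated code: `LeaOperands`, `LeaProfile`, `isSlowLea` and `classifyLea` are hypothetical names, and treating LEA16r as its own profile ahead of the slow check is an assumption about how the three cases combine.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of an LEA's memory operands
// (base, scale, index, displacement); not an LLVM type.
struct LeaOperands {
  bool HasBase;       // base register is present (non-zero)
  bool HasIndex;      // index register is present (non-zero)
  int64_t Scale;      // 1, 2, 4 or 8
  int64_t Disp;       // immediate displacement
  bool DispIsGlobal;  // displacement is a global address
  bool Is16Bit;       // instruction is LEA16r
};

// Latency profiles named in the commit message; no cycle counts are
// attached to the fast case because the message does not state them.
enum class LeaProfile { Fast, Slow, Lea16 };

// Mirrors the shape of the generated predicate: base and index are
// both present, and the displacement is non-zero or a global.
bool isThreeOperandLea(const LeaOperands &Op) {
  return Op.HasBase && Op.HasIndex && (Op.Disp != 0 || Op.DispIsGlobal);
}

// Slow if it is a three-operand LEA or uses a scale other than 1.
bool isSlowLea(const LeaOperands &Op) {
  assert(Op.Scale == 1 || Op.Scale == 2 || Op.Scale == 4 || Op.Scale == 8);
  return isThreeOperandLea(Op) || Op.Scale != 1;
}

// Assumption: LEA16r is checked first, then the slow predicate.
LeaProfile classifyLea(const LeaOperands &Op) {
  if (Op.Is16Bit)
    return LeaProfile::Lea16;
  return isSlowLea(Op) ? LeaProfile::Slow : LeaProfile::Fast;
}
```

In the real model the selection is made by a variant scheduling class in the BtVer2 .td file; the function form above only makes the decision order explicit.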
*	[llvm-mca][x86] Add extend, carry-flag and CMP instructions to general ↵	Simon Pilgrim	2018-07-17	10	-10/+1200
	x86_64 resource tests
	llvm-svn: 337306
*	[llvm-mca][x86] Add MOVBE resource tests to all supporting targets	Simon Pilgrim	2018-07-17	9	-0/+462
	SNB doesn't support MOVBE but the numbers in Generic (which uses the SNB model) look sane.
	llvm-svn: 337305
*	[llvm-mca][x86] Add BSWAP resource tests	Simon Pilgrim	2018-07-17	10	-10/+80
	llvm-svn: 337302
*	[llvm-mca][x86] Add displacement-only and additional scale=1 LEA tests	Simon Pilgrim	2018-07-17	1	-1/+82
	llvm-svn: 337298
*	[llvm-mca][x86] Add LEA resource tests (PR32326)	Simon Pilgrim	2018-07-17	1	-0/+362
	Add llvm-mca tests demonstrating how LEA instructions are currently modelled. Once this is working on btver2 I'll copy the test file to the other target directories.
	llvm-svn: 337297
*	[llvm-mca] Regenerate X86 specific tests. NFC	Andrea Di Biagio	2018-07-15	13	-15/+15
	Not all tests were correctly updated by the update script after r336797.
	llvm-svn: 337124
*	[llvm-mca][BtVer2] teach how to identify false dependencies on partially written	Andrea Di Biagio	2018-07-15	5	-62/+64
	registers.
	The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions perform partial register writes.
	On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware. When the code contains partial register writes, the IPC (instructions per cycle) estimated by llvm-mca tends to diverge quite significantly from the observed IPC (using perf).
	Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf: "The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it."
	This patch is a first important step towards improving the analysis of partial register updates. It changes the semantics of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependences in the presence of partial register writes (for more details: see the new code comments in include/Target/TargetSchedule.h - class RegisterFile).
	This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register. On Intel chips, high8 registers (AH/BH/CH/DH) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage).
	This is a very interesting article on the subject with a very informative answer from Peter Cordes: https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to
	In future, the definition of RegisterFile can be extended with extra information that may be used to identify delays caused by merge opcodes triggered by a dirty read of a partial write.
	Differential Revision: https://reviews.llvm.org/D49196
	llvm-svn: 337123
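The false dependence quoted above can be illustrated with a small tracker. This is only a sketch under stated assumptions: `PartialWriteTracker` and its methods are hypothetical and do not reflect llvm-mca's RegisterFile class; registers are keyed by name, and each sub-register maps directly to one full-width register.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical tracker: records the last writer of each full-width
// register and reports which earlier write a partial write depends on.
class PartialWriteTracker {
public:
  // Declare that Reg is a part of the full-width register Super.
  void addAlias(const std::string &Reg, const std::string &Super) {
    SuperOf[Reg] = Super;
  }

  // Record a write by instruction InstrIndex. Returns the index of the
  // previous write that this one falsely depends on, or -1 if none.
  // Only partial writes inherit a dependence: parts of a register are
  // kept together, so the old value has to be merged in.
  int recordWrite(const std::string &Reg, int InstrIndex, bool IsPartial) {
    assert(InstrIndex >= 0);
    const std::string Root = rootOf(Reg);
    auto It = LastWrite.find(Root);
    const int Prev = (IsPartial && It != LastWrite.end()) ? It->second : -1;
    LastWrite[Root] = InstrIndex;
    return Prev;
  }

private:
  std::string rootOf(const std::string &Reg) const {
    auto It = SuperOf.find(Reg);
    return It == SuperOf.end() ? Reg : It->second;
  }

  std::map<std::string, std::string> SuperOf;  // part -> full register
  std::map<std::string, int> LastWrite;        // full register -> writer
};
```

With AL and AH both aliased to RAX, a write to AH after a write to AL reports the AL write as its dependence, which is the behaviour the quoted passage describes for processors that do not rename partial registers.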
*	[llvm-mca][BtVer2] Add tests for dependency breaking instructions.	Andrea Di Biagio	2018-07-13	6	-1/+412
	llvm-svn: 337024
*	[X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions.	Andrea Di Biagio	2018-07-11	18	-18/+18
	Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was inferred directly from the "default" pattern associated with the instruction definition.
	r336728 removed special node X86Movlps, and all the patterns associated with it. Now instruction (V)MOVLPSrm doesn't have a pattern associated with it, and the 'mayLoad/hasSideEffects' flags are left unset.
	When the instruction info is emitted by tablegen, method CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a pattern, and flags are undefined. So, it conservatively sets the "hasSideEffects" flag for it.
	As a consequence, we were losing the 'mayLoad' flag, and we were gaining a 'hasSideEffects' flag in its place.
	This patch fixes the issue (originally reported by Michael Holmen). The mca tests show the differences in the instruction info flags. Instructions that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.
	Differential Revision: https://reviews.llvm.org/D49182
	llvm-svn: 336818
*	[llvm-mca] Use a different character to flag instructions with side-effects ↵	Andrea Di Biagio	2018-07-11	212	-2025/+2025
	in the Instruction Info View. NFC
	This makes it easier to identify changes in the instruction info flags. It also helps spot potential regressions similar to the one recently introduced at r336728.
	Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and spaces. A change in position of the flag marker may not trigger a test failure.
	This patch only changes the character used for flag `hasSideEffects`. I didn't touch the other flags because I want to avoid spamming the mailing list with the massive diff due to the numerous tests affected by this change.
	In future, each instruction flag should be associated with a different character in the Instruction Info View.
	llvm-svn: 336797
*	[llvm-mca] Add tests for partial register writes.	Andrea Di Biagio	2018-07-11	4	-0/+310
	llvm-mca doesn't know that on modern AMD processors, portions of a general purpose register are not treated independently. So, a partial register write has a false dependency on the super-register. The issue with partial register writes will be addressed by a follow-up patch.
	llvm-svn: 336778
*	[llvm-mca] report an error if the assembly sequence contains an unsupported ↵	Andrea Di Biagio	2018-07-09	1	-0/+6
	instruction.
	This is a short-term fix for PR38093. For now, we llvm::report_fatal_error if the instruction builder finds an unsupported instruction in the instruction stream.
	We need to revisit this fix once we start addressing PR38101. Essentially, we need a better framework for error handling.
	llvm-svn: 336543
*	[MCA][X86][NFC] Add BSF/BSR resource tests	Roman Lebedev	2018-07-08	10	-10/+400
	Reviewers: RKSimon, andreadb, courbet
	Reviewed By: RKSimon
	Subscribers: gbedwell, llvm-commits
	Differential Revision: https://reviews.llvm.org/D48997
	llvm-svn: 336510
*	[llvm-mca] improve the instruction issue logic implemented by the Scheduler.	Andrea Di Biagio	2018-07-06	3	-60/+256
	This patch modifies the Scheduler heuristic used to select the next instruction to issue to the pipelines.
	The motivating example is test X86/BtVer2/add-sequence.s, for which llvm-mca wrongly reported an estimated IPC of 1.50. According to perf, the actual IPC for that test should have been ~2.00. It turns out that an IPC of 2.00 for test add-sequence.s cannot possibly be predicted by a Scheduler that only prioritizes instructions based on their "age". A similar issue also affected test X86/BtVer2/dependent-pmuld-paddd.s, for which llvm-mca wrongly estimated an IPC of 0.84 instead of an IPC of 1.00.
	Instructions in the ReadyQueue are now ranked based on two factors:
	 - The "age" of an instruction.
	 - The number of unique users of writes associated with an instruction.
	The new logic still prioritizes older instructions over younger instructions to minimize the pressure on the reorder buffer. However, the number of users of an instruction now also affects the overall rank. This potentially increases the ability of the Scheduler to extract instruction level parallelism.
	This patch fixes the problem with the wrong IPC reported for test add-sequence.s and test dependent-pmuld-paddd.s.
	llvm-svn: 336420
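A two-factor ranking of this kind could look like the sketch below. The message does not give the formula, so the rank used here (source index minus number of users, lower is better, ties going to the older instruction) is an assumption for illustration; `ReadyEntry`, `computeRank` and `selectNext` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ready-queue entry.
struct ReadyEntry {
  int SourceIndex;  // position in program order; smaller means older
  int NumUsers;     // unique users of the writes of this instruction
};

// Assumed combination of the two factors: older instructions and
// instructions with more users both get a lower (better) rank.
int computeRank(const ReadyEntry &E) { return E.SourceIndex - E.NumUsers; }

// Returns the position in ReadyQueue of the entry to issue next.
std::size_t selectNext(const std::vector<ReadyEntry> &ReadyQueue) {
  assert(!ReadyQueue.empty());
  std::size_t Best = 0;
  for (std::size_t I = 1; I < ReadyQueue.size(); ++I) {
    const int Rank = computeRank(ReadyQueue[I]);
    const int BestRank = computeRank(ReadyQueue[Best]);
    // On equal rank, prefer the older instruction.
    if (Rank < BestRank ||
        (Rank == BestRank &&
         ReadyQueue[I].SourceIndex < ReadyQueue[Best].SourceIndex))
      Best = I;
  }
  return Best;
}
```

Under this rank, a younger instruction overtakes an older one only when it has enough additional users to offset the age difference, so age still dominates between instructions with similar fan-out.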
*	[X86][BtVer2][MCA][NFC] Add CMPEQ dependency-breaking one-idioms tests	Roman Lebedev	2018-07-04	2	-0/+157
	Summary: As per `Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions)`, these, like zero-idioms, are dependency-breaking, although they produce ones and still consume resources.
	FIXME: as discussed in D48877, llvm-mca handling is broken for these.
	Reviewers: andreadb
	Reviewed By: andreadb
	Subscribers: gbedwell, RKSimon, llvm-commits
	Differential Revision: https://reviews.llvm.org/D48876
	llvm-svn: 336292
*	Replace unused output filenames with /dev/null in tests	Fangrui Song	2018-07-02	4	-4/+4
	Similar to rLLD336129
	llvm-svn: 336131
*	[llvm-mca][x86] Add FMA4 resource tests	Simon Pilgrim	2018-06-28	1	-0/+349
	We should be ensuring we have (near) complete test coverage of instructions, at least for the generic model.
	llvm-svn: 335870
*	[llvm-mca][x86] Add 3dnow! resource tests	Simon Pilgrim	2018-06-28	1	-0/+208
	We should be ensuring we have (near) complete test coverage of instructions, at least for the generic model.
	llvm-svn: 335869
*	[llvm-mca][X86] Teach how to identify register writes that implicitly clear ↵	Andrea Di Biagio	2018-06-20	7	-167/+171
	the upper portion of a super-register.
	This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register.
	On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits". Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.
	This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register. The rest of the patch teaches llvm-mca how to use that new method to obtain the information, and update the register dependencies accordingly.
	I compared the kernels from tests clear-super-register-1.s and clear-super-register-2.s against the output from perf on btver2. Previously there was a large discrepancy between the estimated IPC and the measured IPC. Now the differences are mostly in the noise.
	Differential Revision: https://reviews.llvm.org/D48225
	llvm-svn: 335113
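The two rules quoted in this message can be written down as a small predicate. This is a sketch only: `RegWrite`, `clearsSuperRegister` and `needsOldSuperValue` are hypothetical names and are not the signature of the MCInstrAnalysis method; the sketch covers only the GPR and XMM/YMM cases mentioned above.

```cpp
#include <cassert>

enum class RegClass { GPR, Vector };

// Hypothetical description of one register write.
struct RegWrite {
  RegClass Class;
  unsigned WidthBits;       // width of the register actually written
  unsigned SuperWidthBits;  // width of the widest aliasing register
  bool IsAVX;               // write performed by an AVX instruction
};

// True if the write implicitly zeroes the bits above WidthBits.
bool clearsSuperRegister(const RegWrite &W) {
  assert(W.WidthBits <= W.SuperWidthBits);
  if (W.WidthBits == W.SuperWidthBits)
    return false;  // full-width write: there are no upper bits to clear
  if (W.Class == RegClass::GPR)
    return W.WidthBits == 32;  // 32-bit GPR writes zero-extend to 64 bits
  return W.IsAVX;  // AVX writes to XMM zero the upper part of YMM
}

// A narrower write that does not clear the upper bits has to preserve
// them, so it depends on the previous value of the super-register.
bool needsOldSuperValue(const RegWrite &W) {
  return W.WidthBits < W.SuperWidthBits && !clearsSuperRegister(W);
}
```

A write for which `clearsSuperRegister` holds can be treated like a write to the whole super-register when dependencies are updated, which is the effect the patch describes.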
*	[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity.	Roman Lebedev	2018-06-20	2	-0/+11
	Summary: First off: I do not have any access to that processor, so this is purely theoretical, no benchmarks.
	I have been looking into the bdver2 scheduling profile, and while cross-referencing the existing btver2 and znver1 profiles and the reference docs (`Software Optimization Guide for AMD Family {15,16,17}h Processors`), I have noticed that only the btver2 scheduling profile specifies these. Also, there is no mca test coverage.
	Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb
	Reviewed By: GGanesh
	Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits
	Differential Revision: https://reviews.llvm.org/D47676
	llvm-svn: 335099
*	[X86] Fix r335097	Clement Courbet	2018-06-20	1	-1/+5
	Missed `Generic` test in llvm-mca.
	llvm-svn: 335098
*	[X86] Add sched class WriteLAHFSAHF and fix values.	Clement Courbet	2018-06-20	10	-9/+85
	Summary: I ran llvm-exegesis on SKX, SKL, BDW, HSW, SNB. Atom is from Agner and SLM is a guess. I've left AMD processors alone.
	Reviewers: RKSimon, craig.topper
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D48079
	llvm-svn: 335097
*	[MCA][NFC] Add generic XOP resource tests	Roman Lebedev	2018-06-19	1	-0/+534
	Summary: Based on
	* [[ https://support.amd.com/TechDocs/43479.pdf | AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 Instructions ]],
	* [[ https://support.amd.com/TechDocs/24594.pdf | AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions ]],
	* https://en.wikipedia.org/wiki/XOP_instruction_set
	Appears to be only supported in AMD's 15h generation, so only in bdver[1-4], for which currently llvm has no scheduling profiles.
	Reviewers: RKSimon, craig.topper, andreadb, spatel
	Reviewed By: RKSimon
	Subscribers: gbedwell, llvm-commits
	Differential Revision: https://reviews.llvm.org/D48264
	llvm-svn: 335034
*	[MCA][NFC] Add generic TBM resource tests	Roman Lebedev	2018-06-19	1	-0/+169
	Summary: Based on https://support.amd.com/TechDocs/24594.pdf, https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets#TBM_(Trailing_Bit_Manipulation)
	Appears to be only supported in AMD's 15h generation, so only in bdver[1-4], for which currently llvm has no scheduling profiles.
	Reviewers: RKSimon, craig.topper, simark, andreadb
	Reviewed By: RKSimon
	Subscribers: gbedwell, llvm-commits
	Differential Revision: https://reviews.llvm.org/D48252
	llvm-svn: 335033
*	[llvm-mca] Use an ordered map to collect hardware statistics. NFC.	Andrea Di Biagio	2018-06-18	6	-5/+6
	Histogram entries are now ordered by key. This should improve their readability when statistics are printed.
	llvm-svn: 334961
*	[llvm-mca] Add tests for XOP and AVX512 instructions that implicitly clear ↵	Andrea Di Biagio	2018-06-18	5	-0/+430
	the upper portion of a super-register.
	When the destination register of an XOP instruction is an XMM register, bits [255:128] of the corresponding YMM register are cleared. When the destination register of an EVEX-encoded instruction is an XMM/YMM register, the upper bits of the corresponding ZMM are cleared. On processors that feature AVX512, a write to an XMM register always clears the upper portion of the corresponding ZMM register if the instruction is VEX or EVEX encoded.
	These new tests show some interesting cases which aren't correctly analyzed by llvm-mca. The lack of knowledge related to the implicit update on the super-registers is addressed by D48225.
	llvm-svn: 334945
*	[X86] Fix NOOP sched overrides on BDW/HSW/SKL.	Clement Courbet	2018-06-18	3	-31/+31
	Summary: Noop certainly does not use resources.
	Reviewers: RKSimon, craig.topper, andreadb
	Subscribers: gbedwell, llvm-commits, gchatelet
	Differential Revision: https://reviews.llvm.org/D48028
	llvm-svn: 334927
*	[llvm-mca][X86] Add some avx512f/avx512vl resource test placeholders	Simon Pilgrim	2018-06-17	2	-0/+514
	There are a lot of instructions to add under these ISAs (and the other AVX512 variants), but this should demonstrate how to test for the EVEX instructions with different maskings.
	llvm-svn: 334907
*	[llvm-mca][x86] Add Generic cpu resource tests	Simon Pilgrim	2018-06-15	21	-0/+9934
	Added a Generic x86 cpu set of resource tests to allow us to check all ISAs.
	We currently use SandyBridge as our generic CPU model, but it's better to duplicate these tests in case we change the model; it also means we don't end up polluting the SandyBridge folder with tests for ISAs it doesn't support.
	llvm-svn: 334853
*	[MCA] Add -summary-view option	Roman Lebedev	2018-06-15	4	-72/+18
	Summary: While that is indeed a quite interesting summary stat, there are cases where it does not really add anything other than consuming extra lines. Declutters the output of D48190.
	Reviewers: RKSimon, andreadb, courbet, craig.topper
	Reviewed By: andreadb
	Subscribers: javed.absar, gbedwell, llvm-commits
	Differential Revision: https://reviews.llvm.org/D48209
	llvm-svn: 334833
*	[MCA][x86][NFC] Add tests for -register-file-stats, -scheduler-stats	Roman Lebedev	2018-06-15	2	-0/+134
	Summary: There do not seem to be any other tests for this. Split off from D47676.
	Reviewers: RKSimon, craig.topper, courbet, andreadb
	Reviewed By: andreadb
	Subscribers: javed.absar, gbedwell, llvm-commits
	Differential Revision: https://reviews.llvm.org/D48190
	llvm-svn: 334832
*	[llvm-mca] Add tests for instructions that implicitly clear the upper ↵	Andrea Di Biagio	2018-06-14	2	-0/+176
	portion of a super-register.
	On x86-64, a write to register EAX implicitly clears the upper half of RAX. 128-bit AVX instructions clear the upper 128 bits of the YMM register that aliases the XMM definition register.
	llvm-mca doesn't know about register writes that implicitly clear the upper portion of an aliasing super-register. This issue will be fixed in a future patch.
	llvm-svn: 334742
*	[llvm-mca] Add another test for partial register stalls.	Andrea Di Biagio	2018-06-14	1	-0/+43
	This test checks that a physical register is correctly allocated for the partial write to register BX. The ADD instruction has to wait for the write to RBX (and BX) before being executed.
	llvm-svn: 334730
*	[llvm-mca] Fixed a bug in the logic that checks if a memory operation is ↵	Andrea Di Biagio	2018-06-13	1	-0/+41
	ready to execute.
	Fixes PR37790.
	In some (very rare) cases, the LSUnit (Load/Store unit) was wrongly marking a load (or store) as "ready to execute" effectively bypassing older memory barrier instructions.
	To reproduce this bug, the memory barrier must be the first instruction in the input assembly sequence, and it doesn't have to perform any register writes.
	llvm-svn: 334633
*	[X86] Fix skylake server scheduling info.	Clement Courbet	2018-06-11	10	-721/+721
	Summary: This fixes most of the scheduling info for SKX vector operations. I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM. The before/after llvm-exegesis analysis are in the phabricator diff.
	Subscribers: llvm-commits
	Differential Revision: https://reviews.llvm.org/D47721
	llvm-svn: 334407
*	[X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that ↵	Simon Pilgrim	2018-06-08	1	-127/+127
	should match the dependency-breaking 'zero-idiom'
	As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits).
	llvm-svn: 334303
*	[X86][BtVer2] Remove SBB tests that were accidentally added in rL334296	Simon Pilgrim	2018-06-08	1	-130/+120
	These aren't true zero-idiom instructions (just dependency breaking).
	llvm-svn: 334297
*	[X86][BtVer2] Add tests for scalar SUB/XOR instructions that should match ↵	Simon Pilgrim	2018-06-08	1	-119/+143
	the dependency-breaking 'zero-idiom'
	As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions).
	llvm-svn: 334296
*	[X86][BtVer2] Limit zero idiom tests to a single iteration.	Simon Pilgrim	2018-06-08	1	-216/+109
	Reduces output size, and we only want to check that the instructions are fast-path'd (just Dispatch+Retire) anyhow.
	llvm-svn: 334292
*	[X86][BtVer2] Add support for all vector instructions that should match the ↵	Simon Pilgrim	2018-06-06	1	-288/+300
	dependency-breaking 'zero-idiom'
	As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register.
	llvm-svn: 334119
*	[llvm-mca][x86] Fix all resources-x86_64.s tests to use different registers ↵	Simon Pilgrim	2018-06-06	9	-1755/+1755
	in reg-reg cases
	I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2. Fixed in all targets to keep them in sync.
	llvm-svn: 334110
*	[X86][BtVer2] Add tests for all vector instructions that should match the ↵	Simon Pilgrim	2018-06-06	1	-66/+348
	dependency-breaking 'zero-idiom'
	As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), all these instructions are dependency breaking and zero the destination register.
	TODO: Scalar instructions still need to be tested (need to check EFLAGS handling).
	llvm-svn: 334104
*	[CodeGen] assume max/default throughput for unspecified instructions	Sanjay Patel	2018-06-05	23	-282/+284
	This is a fix for the problem arising in D47374 (PR37678): https://bugs.llvm.org/show_bug.cgi?id=37678
	We may not have throughput info because it's not specified in the model or it's not available with variant scheduling, so assume that those instructions can execute/complete at max-issue-width.
	Differential Revision: https://reviews.llvm.org/D47723
	llvm-svn: 334055
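The fallback described here amounts to a default of one over the issue width. A minimal sketch, assuming a hypothetical `SchedInfo` record rather than the actual scheduling-model classes:

```cpp
#include <cassert>

// Hypothetical per-instruction scheduling record.
struct SchedInfo {
  bool HasThroughput;   // false if the model gives no usable value
  double RThroughput;   // reciprocal throughput in cycles, if present
};

// Returns the reciprocal throughput to use for an instruction. When
// the model has no value, assume the instruction can issue at the
// maximum issue width, i.e. IssueWidth instructions per cycle.
double reciprocalThroughput(const SchedInfo &Info, unsigned IssueWidth) {
  assert(IssueWidth > 0);
  if (Info.HasThroughput)
    return Info.RThroughput;
  return 1.0 / IssueWidth;
}
```

With an issue width of 4, an instruction without throughput info is assigned a reciprocal throughput of 0.25 cycles in this sketch, the most optimistic value the machine could sustain.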
*	[llvm-mca] Correctly update the CyclesLeft of a register read in the ↵	Andrea Di Biagio	2018-06-05	2	-1/+44
	presence of partial register updates.
	This patch fixes the logic in ReadState::cycleEvent(). That method was not correctly updating field `TotalCycles`.
	Added extra code comments in class ReadState to better describe each field.
	llvm-svn: 334028
*	[RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca.	Andrea Di Biagio	2018-06-04	1	-0/+153
	This patch is the last of a sequence of three patches related to LLVM-dev RFC "MC support for variant scheduling classes". http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html
	This fixes PR36672.
	The main goal of this patch is to teach llvm-mca how to solve variant scheduling classes. This patch does that, plus it adds new variant scheduling classes to the BtVer2 scheduling model to identify so-called zero-idioms (i.e. so-called dependency breaking instructions that are known to generate zero, and that are optimized out in hardware at register renaming stage).
	Without the BtVer2 change, this patch would not have had any meaningful tests.
	This patch is effectively the union of two changes:
	 1) a change that teaches llvm-mca how to resolve variant scheduling classes.
	 2) a change to the BtVer2 scheduling model that allows us to special-case packed XOR zero-idioms (this partially fixes PR36671).
	Differential Revision: https://reviews.llvm.org/D47374
	llvm-svn: 333909
*	[llvm-mca] Regenerate a test to remove a double newline	Greg Bedwell	2018-06-04	1	-1/+0
	Command used: py update_mca_test_checks.py ..\test\tools\llvm-mca\\.s ..\test\tools\llvm-mca\\\*.s
	llvm-svn: 333893