bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[llvm-mca][x86] Remove addsubpd from SSE2 tests	Simon Pilgrim	2018-05-07	1	-8/+1
	llvm-svn: 331678
*	[X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes	Simon Pilgrim	2018-05-07	1	-3/+3
	WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions. WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions. This removes all InstrRW overrides for these instructions. NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner. NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80. llvm-svn: 331629
*	[X86] Add WriteEMMS scheduler class	Simon Pilgrim	2018-05-04	1	-3/+3
	Filled in the missing values from Btver2 SoG or Agner. llvm-svn: 331546
*	[X86] Add SchedWriteFRnd fp rounding scheduler classes	Simon Pilgrim	2018-05-04	2	-18/+18
	Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions. Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA. llvm-svn: 331515
*	[llvm-mca][X86] Add BT resource tests to all models	Simon Pilgrim	2018-04-29	1	-1/+148
	llvm-svn: 331144
*	[llvm-mca][X86] Add add/adc + sub/sbb resource tests to all models	Simon Pilgrim	2018-04-29	1	-1/+377
	llvm-svn: 331140
*	[llvm-mca][X86] Add double shift resource tests to all relevant models	Simon Pilgrim	2018-04-28	1	-1/+76
	llvm-svn: 331109
*	[llvm-mca][X86] Add shift/rotate resource tests to all relevant models	Simon Pilgrim	2018-04-28	1	-0/+552
	I intend to add further instruction tests to the resources-x86_64.s test file as required, but this initial commit is to help remove a number of unnecessary InstRW overrides in a future patch. llvm-svn: 331108
*	[X86] Split off PHMINPOSUW to their own schedule class	Simon Pilgrim	2018-04-24	2	-6/+6
	This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. llvm-svn: 330756
*	[X86][BtVer2] Fix VCVTPS2PHmr/VCVTPS2PHYmr latencies	Simon Pilgrim	2018-04-24	1	-11/+5
	These are stores, not loads, so don't need to account for load latency. llvm-svn: 330735
*	[llvm-mca][X86] Add BMI/LZCNT/POPCNT resource tests to all relevant models	Simon Pilgrim	2018-04-22	2	-0/+177
	The SandyBridge BMI tests are actually run on IvyBridge, as that's the lowest CPU that actually supports the ISAs (but still uses the SandyBridge model). llvm-svn: 330556
*	[llvm-mca][X86] Add POPCNT resource test	Simon Pilgrim	2018-04-22	1	-0/+57
	llvm-svn: 330540
*	[llvm-mca][X86] Add X87 resource tests	Simon Pilgrim	2018-04-21	1	-0/+528
	llvm-svn: 330499
*	[llvm-mca][X86] Add prefetch instruction resource tests	Simon Pilgrim	2018-04-19	1	-1/+14
	llvm-svn: 330371
*	[llvm-mca][X86] Add mmx instruction to btver2 resource tests	Simon Pilgrim	2018-04-19	3	-8/+569
	Useful to see scheduler class deltas against xmm equivalents. llvm-svn: 330335
*	[llvm-mca][X86] Add mmx versions of SSSE3 instructions	Simon Pilgrim	2018-04-18	2	-23/+135
	Move PABS instructions incorrectly tested under SSE2. llvm-svn: 330295
*	[UpdateTestChecks] Add update_mca_test_checks.py script	Greg Bedwell	2018-04-18	31	-138/+989
	This script can be used to regenerate tests in the test/tools/llvm-mca directory (PR36904). Regenerated a number of tests using the pattern: test/tools/llvm-mca///*.s Differential Revision: https://reviews.llvm.org/D45369 llvm-svn: 330246
*	[X86] Add separate scheduling class for PSADBW instruction.	Craig Topper	2018-04-17	1	-2/+2
	llvm-svn: 330204
*	[llvm-mca] Ensure that instructions with a schedule read-advance are always issued in the right order.	Andrea Di Biagio	2018-04-13	1	-0/+44
	Normally, the Scheduler prioritizes older instructions over younger instructions during the instruction issue stage. In one particular case where a dependent instruction had a schedule read-advance associated to one of the input operands, this rule was not correctly applied. This patch fixes the issue and adds a test to verify that we don't regress that particular case. llvm-svn: 330032
*	[llvm-mca] Renamed BackendStatistics to RetireControlUnitStatistics.	Andrea Di Biagio	2018-04-11	1	-0/+56
	Also, removed flag -verbose in favor of flag -retire-stats. llvm-svn: 329794
*	[llvm-mca] Move the logic that prints scheduler statistics from BackendStatistics to its own view.	Andrea Di Biagio	2018-04-11	1	-2/+26
	Added flag -scheduler-stats to print scheduler related statistics. llvm-svn: 329792
*	[llvm-mca] Move the logic that prints dispatch unit statistics from BackendStatistics to its own view.	Andrea Di Biagio	2018-04-10	5	-5/+5
	This patch moves the logic that collects and analyzes dispatch events to the DispatchStatistics view. Added flag -dispatch-stats to print statistics related to the dispatch logic. llvm-svn: 329708
*	[llvm-mca] Increase the default number of iterations to 100.	Andrea Di Biagio	2018-04-10	2	-25/+50
	llvm-svn: 329694
*	Reapply "[llvm-mca] Do not separate iterations with a newline in the timeline view."	Andrea Di Biagio	2018-04-10	6	-17/+18
	This reapplies r329403 with a fix for the floating point rounding issue. llvm-svn: 329680
*	Revert r329403 "[llvm-mca] Do not separate iterations with a newline in the timeline view."	Hans Wennborg	2018-04-09	6	-18/+17
	This made AArch64/CortexA57/direct-branch.s fail on Windows, e.g. http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/11251 > Also, update a few tests to minimize the diff in D45369. > No functional change intended. llvm-svn: 329569
*	[X86][Btver2] Add vector extract costs	Simon Pilgrim	2018-04-08	3	-30/+30
	llvm-svn: 329524
*	[llvm-mca] Do not separate iterations with a newline in the timeline view.	Andrea Di Biagio	2018-04-06	6	-17/+18
	Also, update a few tests to minimize the diff in D45369. No functional change intended. llvm-svn: 329403
*	[MC][Tablegen] Allow models to describe the retire control unit for llvm-mca.	Andrea Di Biagio	2018-04-05	3	-49/+47
	This patch adds the ability to describe properties of the hardware retire control unit. Tablegen class RetireControlUnit has been added for this purpose (see TargetSchedule.td).
	A RetireControlUnit specifies the size of the reorder buffer, as well as the maximum number of opcodes that can be retired every cycle. A zero (or negative) value for the reorder buffer size means: "the size is unknown". If the size is unknown, then llvm-mca defaults it to the value of field SchedMachineModel::MicroOpBufferSize. A zero or negative number of opcodes retired per cycle means: "there is no restriction on the number of instructions that can be retired every cycle".
	Models can optionally specify an instance of RetireControlUnit. There can be at most one RetireControlUnit definition per scheduling model.
	Information related to the RCU (RetireControlUnit) is stored in (two new fields of) MCExtraProcessorInfo. llvm-mca loads that information when it initializes the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).
	This patch fixes PR36661. Differential Revision: https://reviews.llvm.org/D45259 llvm-svn: 329304
*	[X86][Btver2] Strip unnecessary check prefixes from resources tests	Simon Pilgrim	2018-04-04	11	-11/+11
	llvm-svn: 329192
*	[llvm-mca] Move the logic that prints register file statistics to its own view. NFCI	Andrea Di Biagio	2018-04-03	5	-5/+5
	Before this patch, the "BackendStatistics" view was responsible for printing the register file usage (as well as many other statistics). Now users can enable register file usage statistics using the command line flag `-register-file-stats`. By default, the tool doesn't print register file statistics. llvm-svn: 329083
*	[MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca	Andrea Di Biagio	2018-04-03	5	-13/+217
	This patch allows the description of register files in processor scheduling models. This addresses PR36662.
	A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows specifying:
	- The total number of physical registers.
	- Which target registers are accessible through the register file.
	- The cost of allocating a register at register renaming stage.
	Example (from this patch - see file X86/X86ScheduleBtVer2.td): def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
	Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register.
	The syntax allows an empty set of register classes to be specified. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. Register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers".
	This patch is structured in two parts.
	* Part 1 - MC/Tablegen *
	The first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files. Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make it easier for targets to get rid of the extra processor information if they don't want it.
	* Part 2 - llvm-mca related *
	The second part of this patch is related to changes to llvm-mca. The main differences are:
	1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at register renaming stage.
	2) Point 1. triggered a minor refactoring which led to the removal of the "maximum 32 register files" restriction.
	3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor.
	The effect of point 3. is also visible in tests register-files-[1..5].s.
	Differential Revision: https://reviews.llvm.org/D44980 llvm-svn: 329067
*	[llvm-mca] Do not assume that implicit reads cannot be associated with ReadAdvance entries.	Andrea Di Biagio	2018-04-02	1	-0/+26
	Before, the instruction builder incorrectly assumed that only explicit reads could have been associated with ReadAdvance entries. This patch fixes the issue and adds a test to verify it. llvm-svn: 328972
*	[X86] Add SchedRW for PMULLD	Craig Topper	2018-03-31	3	-29/+28
	Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914
*	[X86][BtVer2] Fixed the number of micro opcodes for AVX vector converts and VSQRT instructions.	Andrea Di Biagio	2018-03-30	1	-8/+8
	There were still a few AVX instructions with an incorrect number of opcodes. These should be fixed now. llvm-svn: 328892
*	[X86][BtVer2] Fix the number of uOps for horizontal operations.	Andrea Di Biagio	2018-03-30	2	-12/+12
	llvm-svn: 328886
*	[X86][BtVer2] Add missing ReadAfterLd to RM variants of AVX horizontal adds and most vector logic instructions.	Andrea Di Biagio	2018-03-30	3	-12/+10
	Fixed a few InstRW that forgot to specify a ReadAfterLd for the register input operand. llvm-svn: 328867
*	[X86][BtVer2] Add tests that show how ReadAfterLd is missing for some instructions.	Andrea Di Biagio	2018-03-30	4	-0/+92
	In the Btver2 model, there are a few InstRW overrides that don't specify a ReadAfterLd for the register input operand. As a result, a few AVX variants of horizontal operations and most vector logic operations with a folded memory operand don't have a ReadAdvance info associated to their input register operands. llvm-svn: 328865
*	[llvm-mca] Correctly set the ReadAdvance information for register use operands.	Andrea Di Biagio	2018-03-29	1	-0/+46
	The tool was passing the wrong operand index to method MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and not the operand index. This was found when testing X86 code where instructions had a memory folded operand. This patch fixes the issue and adds test read-advance-1.s to ensure that the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used. llvm-svn: 328790
*	[X86][BtVer2] Fix the number of micro opcodes for AES[ENC|DEC] and other YMM instructions.	Andrea Di Biagio	2018-03-28	1	-22/+22
	Similar to r328694. The number of micro opcodes should be 2 for those instructions. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328698
*	[X86][BtVer2] Fix the number of micro opcodes for a bunch of YMM instructions.	Andrea Di Biagio	2018-03-28	2	-14/+709
	The Jaguar backend natively supports 128-bit data types. Operations on YMM registers are split into two COPs (complex operations). Each COP consumes a slot in the dispatch group, and in the reorder buffer. The scheduling model for Jaguar should mark those instructions as `let NumMicroOps = 2`. This was found when testing AVX code for BtVer2 using llvm-mca. llvm-svn: 328694
*	[llvm-mca] pass the correct set of used registers in checkRAT.	Andrea Di Biagio	2018-03-27	1	-0/+33
	We were incorrectly initializing the array of used registers in method checkRAT. As a consequence, the number of register file stalls was misreported. Added a test to cover this case. llvm-svn: 328629
*	[X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs	Simon Pilgrim	2018-03-26	3	-16/+16
	Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write). llvm-svn: 328573
*	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs	Simon Pilgrim	2018-03-26	1	-4/+4
	We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551
*	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs	Simon Pilgrim	2018-03-26	2	-8/+8
	llvm-svn: 328541
*	[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write)	Simon Pilgrim	2018-03-26	5	-30/+30
	llvm-svn: 328536
*	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs	Simon Pilgrim	2018-03-26	3	-36/+45
	Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write). This also adds missing vcvttss2si tests. llvm-svn: 328505
*	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs	Simon Pilgrim	2018-03-26	1	-12/+12
	These should match the YMM MOVDUP/PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501
*	[llvm-mca] Fix how views are added to the InstructionTables.	Andrea Di Biagio	2018-03-26	1	-0/+103
	This should fix the stack-use-after-scope reported by the asan buildbots after revision 328493. llvm-svn: 328499
*	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs	Simon Pilgrim	2018-03-26	2	-8/+8
	The xmm sd/pd versions were using the WriteFSQRT default, which is modelled on sqrtss/sqrtps. llvm-svn: 328497
*	[llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view.	Andrea Di Biagio	2018-03-26	1	-0/+23
	llvm-svn: 328493
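
Illustration for the RegisterFile entry in this log (llvm-svn: 329067). That commit message describes a register file with a total number of physical registers, a per-register-class allocation cost that defaults to 1, and a size of 0 meaning "unbounded". The Python sketch below models only that accounting rule. It is not llvm-mca code: the class name RegisterFileSketch and its methods are hypothetical, and the values mirror the FpuPRF example quoted in the commit message.

```python
class RegisterFileSketch:
    """Hypothetical model of cost-based physical register accounting."""

    def __init__(self, num_phys_regs, costs, default_cost=1):
        # num_phys_regs == 0 is treated as "unbounded", as the commit message states.
        self.num_phys_regs = num_phys_regs
        # Map of register class name -> physical registers consumed per definition.
        self.costs = dict(costs)
        self.default_cost = default_cost
        self.used = 0

    def cost_of(self, reg_class):
        # Register classes without an explicit cost fall back to the default of 1.
        return self.costs.get(reg_class, self.default_cost)

    def try_allocate(self, reg_class):
        """Reserve physical registers for one definition; False means a stall."""
        cost = self.cost_of(reg_class)
        bounded = self.num_phys_regs > 0
        if bounded and self.used + cost > self.num_phys_regs:
            return False
        self.used += cost
        return True

    def release(self, reg_class):
        """Give back the physical registers held by one definition."""
        self.used = max(0, self.used - self.cost_of(reg_class))


# Mirrors: def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
fpu_prf = RegisterFileSketch(72, {"VR64": 1, "VR128": 1, "VR256": 2})
ymm_defs = sum(1 for _ in range(40) if fpu_prf.try_allocate("VR256"))
print(ymm_defs, fpu_prf.used)
```

With a cost of 2 per YMM definition, the 72-entry file in this sketch fills after 36 YMM definitions, which is the effect the per-class cost is meant to capture for Jaguar.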