bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[CostModel][X86] Add missing scalar i64->f32 uitofp costs	Simon Pilgrim	2020-01-06	1	-11/+11
\|
*	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵	Fangrui Song	2019-12-24	1	-1/+1
\| \| \| \|	as cleanups after D56351
*	[AMDGPU] Implemented fma cost analysis	Stanislav Mekhanoshin	2019-12-18	1	-0/+120
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D71676
*	[AMDGPU] Fixed cost model for packed 16 bit ops	Stanislav Mekhanoshin	2019-12-17	7	-102/+258
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D71622
*	[ARM] Teach the Arm cost model that a Shift can be folded into other ↵	David Green	2019-12-09	1	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966
*	[ARM] Additional tests and minor formatting. NFC	David Green	2019-12-09	1	-0/+96
\| \| \| \| \| \|	This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.
*	[x86] add cost model special-case for insert/extract from element 0	Sanjay Patel	2019-12-06	4	-62/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023
*	[PowerPC] Separate Features that are known to be Power9 specific from Future CPU	Stefan Pintilie	2019-11-27	1	-0/+16
\| \| \| \| \| \| \| \|	The Power 9 CPU has some features that are unlikely to be passed on to future versions of the CPU. This patch separates this out so that future CPU does not inherit them. Differential Revision: https://reviews.llvm.org/D70466
*	[x86] make SLM extract vector element more expensive than default	Sanjay Patel	2019-11-27	4	-255/+1197
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
*	AMDGPU: Split test functions to avoid dependency on subtarget	Matt Arsenault	2019-11-19	1	-57/+155
\| \| \| \| \|	Prepare this test for moving the denormal setting out of the subtarget features.
*	[X86] Remove setOperationAction for FP_TO_SINT v8i16.	Craig Topper	2019-11-12	1	-12/+12
\| \| \| \| \| \| \| \|	This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.
*	[CostModel] Fixed isExtractSubvectorMask for undef index off end	Tim Renouf	2019-11-08	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ShuffleVectorInst::isExtractSubvectorMask, introduced in [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) erroneously thought that %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef> is a subvector extract, even though it goes off the end of the parent vector with the undef index. That then caused an assert in BasicTTIImplBase::getExtractSubvectorOverhead. This commit fixes that, by not considering the above a subvector extract. Differential Revision: https://reviews.llvm.org/D70005 Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf
*	[AMDGPU] Fix bug introduced in 47a5c36b37f0	dfukalov	2019-11-07	1	-4/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: [AMDGPU] Fix bug introduced in 47a5c36b37f0 Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69915
*	[CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLM	Simon Pilgrim	2019-11-06	2	-12/+12
\| \| \| \|	As noted on D59710 we weren't handling the high costs of these operations on SLM.
*	[CostModel][X86] Add add/fadd reduction tests for SLM	Simon Pilgrim	2019-11-06	2	-0/+256
\|
*	[AMDGPU] Improve code size cost model (part 2)	dfukalov	2019-11-06	10	-1/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629
*	[X86] Lower the cost of avx512 horizontal bool and/or reductions to ↵	Craig Topper	2019-11-04	2	-42/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	2*log2(bitwidth)+1 for legal types. This better represents the kshift+binop we'd get for each stage before the final extract. It's likely we'll do even better by doing a kmov and a cmp with a GPR, but this is a good start. The default handling was costing a worst case single source permute shuffle of the vector before the binop. This worst case assumes the shuffle might have to be emulated with extracts and inserts. But since we know we're doing a reduction we can assume we'll get kshift lowering. There's still some room for improvement here, but this is much better than it was.
*	[AMDGPU] Improve code size cost model	Daniil Fukalov	2019-10-17	3	-18/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Added estimation for zero size insertelement, extractelement and llvm.fabs operators. Updated inline/unroll parameters default values. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68881 llvm-svn: 375109
*	[CostModel][X86] Add CTLZ scalar costs	Simon Pilgrim	2019-10-14	1	-31/+64
\| \| \| \| \| \| \| \|	Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it. For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs. llvm-svn: 374786
*	[CostModel][X86] Add CTPOP scalar costs (PR43656)	Simon Pilgrim	2019-10-14	1	-4/+4
\| \| \| \| \| \| \| \|	Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have). For targets supporting POPCNT, we provide overrides that assume 1cy costs. llvm-svn: 374775
*	[CostModel][X86] Improve sum reduction costs.	Simon Pilgrim	2019-10-12	2	-336/+141
\| \| \| \| \| \| \| \|	I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674. llvm-svn: 374655
*	[CostModel][X86] Add tests for insertelement to non-immediate vector element ↵	Simon Pilgrim	2019-10-09	1	-0/+74
\| \| \| \| \| \|	indices llvm-svn: 374161
*	[CostModel][X86] Add tests for extractelement from non-immediate vector ↵	Simon Pilgrim	2019-10-09	1	-0/+74
\| \| \| \| \| \|	element indices llvm-svn: 374160
*	[CostModel][X86] Fix SLM <2 x i64> icmp costs	Simon Pilgrim	2019-09-26	6	-55/+285
\| \| \| \| \| \| \| \|	SLM is 2 x slower for <2 x i64> comparison ops than other vector types, we should account for this like we do for SLM <2 x i64> add/sub/mul costs. This should remove some of the SLM codegen diffs in D43582 llvm-svn: 372954
*	[Cost][X86] Add more missing vector truncation costs	Simon Pilgrim	2019-09-22	1	-49/+49
\| \| \| \| \| \|	The AVX512 cases still need some work to correctly recognise the PMOV truncation cases. llvm-svn: 372514
*	[Cost][X86] Add v2i64 truncation costs	Simon Pilgrim	2019-09-22	4	-105/+105
\| \| \| \| \| \| \| \|	We are missing costs for a lot of truncation cases, I'm hoping to address all the 'zero cost' cases in trunc.ll. I thought this was a vector widening side effect, but even before this we had some interesting LV decisions (notably over indvars) being made due to these zero costs. llvm-svn: 372498
*	[SystemZ] Support z15 processor name	Ulrich Weigand	2019-09-20	3	-41/+41
\| \| \| \| \| \| \| \| \| \| \|	The recently announced IBM z15 processor implements the architecture already supported as "arch13" in LLVM. This patch adds support for "z15" as an alternate architecture name for arch13. The patch also uses z15 in a number of places where we used arch13 as long as the official name was not yet announced. llvm-svn: 372435
*	[CostModel][X86] Add scalar sext/zext cost tests	Simon Pilgrim	2019-09-02	1	-0/+158
\| \| \| \|	llvm-svn: 370684
*	[CostModel] Model all `extractvalue`s as free.	Roman Lebedev	2019-08-29	2	-48/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed in https://reviews.llvm.org/D65148#1606412, `extractvalue`s don't actually generate any code, so we should treat them as free. Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen Reviewed By: jmolloy Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66098 llvm-svn: 370339
*	Recommit [PowerPC] Update P9 vector costs for insert/extract	Roland Froese	2019-08-26	1	-24/+24
\| \| \| \| \| \| \|	Now that the v1i128 smin regression has been fixed, recommit the P9 cost updates from D60160. llvm-svn: 369952
*	[X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening ↵	Craig Topper	2019-08-22	2	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	legalization. I don't really understand the costs we're using for fp_to_sint, but prior to widening legalization we used 20 as the cost for this via the v2i64->v2f64 entry. That number seems better than the 40 we got with widening legalization. So now we need either a v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether AVX is enabled or not since we skip the first SSE2 table look up under AVX. llvm-svn: 369628
*	[ARM] MVE sext costs	David Green	2019-08-19	1	-16/+52
\| \| \| \| \| \| \| \| \|	This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244
*	[ARM] MVE sext of a load is free	David Green	2019-08-16	1	-6/+6
\| \| \| \| \| \| \| \| \|	MVE also has some sext of loads, which will be free just as scalar instructions are. Differential Revision: https://reviews.llvm.org/D66008 llvm-svn: 369118
*	[X86] Improve cost model for subvector extraction of less than 128-bit vectors	Craig Topper	2019-08-15	1	-615/+853
\| \| \| \| \| \| \| \|	Now that we're using widening legalization, we need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements. Differential Revision: https://reviews.llvm.org/D65892 llvm-svn: 369022
*	[X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than ↵	Craig Topper	2019-08-14	3	-58/+58
\| \| \| \| \| \| \| \| \| \| \| \|	128-bit inputs Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG. For AVX2, when the destination type is legal it's clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about AVX1 costs, but I think the ones I changed were definitely too high, but they might still be too high. Differential Revision: https://reviews.llvm.org/D66169 llvm-svn: 368858
*	[X86] Add missing regular 512-bit vXi8 extract subvector cost model tests	Simon Pilgrim	2019-08-14	1	-73/+421
\| \| \| \| \| \|	These tests don't cover many cases where the subvectors don't start on aligned indices, but that can be added later. llvm-svn: 368839
*	[ARM] Add MVE beats vector cost model	David Green	2019-08-13	7	-665/+979
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MVE architecture has the idea of "beats", where a vector instruction can be executed over several ticks of the architecture. This adds a similar system into the Arm backend cost model, multiplying the cost of all vector instructions by a factor. This factor essentially becomes the expected difference between scalar code and vector code, on average. MVE Vector instructions can also overlap so the true cost of them is often lower. But equally scalar instructions can in some situations be dual issued, or have other optimisations such as unrolling or make use of dsp instructions. The default is chosen as 2. This should not prevent vectorisation in most cases (as the vector instructions will still be doing at least 4 times the work), but it will help prevent over vectorising in cases where the benefits are less likely. This adds things so far to the obvious places in ARMTargetTransformInfo, and updates a few related costs like not treating float instructions as cost 2 just because they are floats. Differential Revision: https://reviews.llvm.org/D66005 llvm-svn: 368733
*	[X86] Add some vXi8 extract subvector cost model tests	Simon Pilgrim	2019-08-13	1	-0/+367
\| \| \| \| \| \|	We don't have full 512-bit test coverage yet - but there's enough to help test D65892 llvm-svn: 368716
*	[CostModel][X86][AArch64] Check all 3 cost kinds in aggregates.ll	Roman Lebedev	2019-08-12	2	-50/+182
\| \| \| \|	llvm-svn: 368595
*	[ARM] sext of a load is free	David Green	2019-08-12	1	-14/+14
\| \| \| \| \| \| \| \| \|	This teaches the cost model that the sext or zext of a load is going to be free. Differential Revision: https://reviews.llvm.org/D66006 llvm-svn: 368593
*	[ARM] MVE shuffle broadcast costs	David Green	2019-08-12	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \|	A VDUP will perform a vector broadcast in a single instruction. Update the cost model for MVE accordingly. Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63448 llvm-svn: 368589
*	[ARM] Put some of the TTI costmodel behind hasNeon calls.	David Green	2019-08-12	7	-317/+317
\| \| \| \| \| \| \| \| \|	This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon() checks, now that we have MVE, and updates all the tests accordingly. Differential Revision: https://reviews.llvm.org/D63447 llvm-svn: 368587
*	[ARM] Add or update a number of costmodel tests. NFC	David Green	2019-08-12	7	-793/+2486
\| \| \| \| \| \| \|	This adds a number of cost model tests for ARM, useful for MVE. It also re-jigs some of the existing tests to make them easier to update and read. llvm-svn: 368586
*	[CostModel][X86][AArch64] Add some tests for extractvalue	Roman Lebedev	2019-08-12	2	-0/+152
\| \| \| \| \| \| \|	In https://reviews.llvm.org/D65148 it is suggested that it should have zero cost, always. llvm-svn: 368548
*	Recommit r368081 "[X86] Add more extract subvector cost model tests for ↵	Craig Topper	2019-08-07	1	-7/+488
\| \| \| \| \| \|	smaller element sizes and smaller than 128-bit vectors." llvm-svn: 368185
*	Recommit r367901 "[X86] Enable ↵	Craig Topper	2019-08-07	23	-828/+470
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183
*	Revert "[X86] Add more extract subvector cost model tests for smaller ↵	Mitch Phillips	2019-08-06	1	-488/+7
\| \| \| \| \| \| \| \| \| \| \|	element sizes and smaller than 128-bit vectors." This reverts commit fc33e33776b7a7ce22e539f0ec2e3bfdb09ad361. This commit depends on the rolled back commit rL367901, and thus needs to be rolled back. llvm-svn: 368109
*	Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."	Mitch Phillips	2019-08-06	23	-470/+828
\| \| \| \| \| \| \| \| \|	This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107
*	[X86] Add more extract subvector cost model tests for smaller element sizes ↵	Craig Topper	2019-08-06	1	-7/+488
\| \| \| \| \| \| \| \| \|	and smaller than 128-bit vectors. With the switch to widening legalization, we need to do a better job of costing extractions of less than 128-bits. llvm-svn: 368081
*	[X86] Remove tests for -x86-experimental-vector-widening-legalization from ↵	Craig Topper	2019-08-06	18	-8109/+0
\| \| \| \| \| \| \| \| \|	test/Analysis/CostModel/X86/ This flag is now the default behavior so we don't need separate tests. llvm-svn: 368080
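
Note on the 2019-11-08 entry "[CostModel] Fixed isExtractSubvectorMask for undef index off end": the rule that message describes can be illustrated with a small standalone check. The sketch below is not LLVM's code. The function name isExtractSubvectorMaskSketch, the kUndef constant, and the single-source simplification are all assumptions made for illustration. It only encodes the idea stated in the commit message: a mask is a subvector extract when its defined indices are consecutive from some start, and the whole mask, including undef lanes, still fits inside the source vector.

```cpp
#include <vector>

// Hypothetical stand-in for an undef mask lane (assumption for this sketch).
constexpr int kUndef = -1;

// Returns true and sets `index` to the start of the extracted subvector when
// `mask` selects consecutive elements of a single source of `numSrcElts`
// elements. Undef lanes match anything, but the full mask width must still
// fit inside the source.
bool isExtractSubvectorMaskSketch(const std::vector<int> &mask, int numSrcElts,
                                  int &index) {
  const int numMaskElts = static_cast<int>(mask.size());
  // A mask as wide as the source is an identity or permute, not an extract.
  if (numMaskElts >= numSrcElts)
    return false;

  int start = -1; // start offset implied by the defined lanes seen so far
  for (int i = 0; i < numMaskElts; ++i) {
    const int m = mask[i];
    if (m == kUndef)
      continue;
    // Simplification: only one source vector is modelled here.
    if (m < 0 || m >= numSrcElts)
      return false;
    const int offset = m - i;
    if (offset < 0)
      return false;
    // Every defined lane must agree on the same start offset.
    if (start >= 0 && start != offset)
      return false;
    start = offset;
  }

  // An all-undef mask has no start. A mask that runs off the end of the
  // source (the case described in the commit message) is rejected too.
  if (start < 0 || start + numMaskElts > numSrcElts)
    return false;

  index = start;
  return true;
}
```

With a 4-element source, the mask <2, 3, undef> from the commit message is rejected because lanes 2..4 would run past the end of the source, while <2, 3> is accepted with a start index of 2.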