bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[X86] Add silvermont fp arithmetic cost model tests	Simon Pilgrim	2018-03-05	1	-0/+73
\| \| \| \| \| \|	Add silvermont to existing high coverage tests instead of repeating in slm-arith-costs.ll llvm-svn: 326747
*	[X86][SSE] Reduce FADD/FSUB/FMUL costs on later targets (PR36280)	Simon Pilgrim	2018-02-26	2	-85/+85
\| \| \| \| \| \| \| \| \| \|	Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark. Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch. Differential Revision: https://reviews.llvm.org/D43733 llvm-svn: 326133
*	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)	Sanjay Patel	2018-02-21	3	-168/+168
\| \| \| \| \| \| \| \|	There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658
*	[TTI CostModel] change default cost of FP ops to 1 (PR36280)	Sanjay Patel	2018-02-19	3	-168/+168
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515
*	[X86][SSE] Increase PMULLD costs to better match hardware	Simon Pilgrim	2018-02-10	2	-18/+18
\| \| \| \| \| \|	Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823
*	[X86] Make v2i1 and v4i1 legal types without VLX	Craig Topper	2018-01-07	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue where we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out, so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967
*	[X86] Use mattr instead of mcpu in some of the cost model tests.	Craig Topper	2017-12-18	4	-39/+33
\| \| \| \| \| \| \| \|	Based on the names of the check lines, features seem more appropriate than cpu. Spotted while prototyping my patch to make 512-bit vectors illegal on SKX sometimes. llvm-svn: 320959
*	[X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization.	Craig Topper	2017-11-29	1	-2/+2
\| \| \| \|	llvm-svn: 319266
*	[LV][X86] Support of AVX2 Gathers code generation and update the LV with this	Mohammed Agabaria	2017-11-20	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641
*	[TTI][X86] update costs of interleaved load\store of i64\double	Mohammed Agabaria	2017-11-16	2	-0/+80
\| \| \| \| \| \| \| \| \| \| \| \|	This patch contains a more accurate cost of interleaved load\store of stride 2 for the types int64\double on AVX2. Reviewers: delena, RKSimon, craig.topper, dorit Reviewed By: dorit Differential Revision: https://reviews.llvm.org/D40008 llvm-svn: 318385
*	[LV][X86] update the cost of interleaving mem. access of floats	Mohammed Agabaria	2017-11-06	1	-0/+141
\| \| \| \| \| \| \| \| \| \|	Recommit: This patch updates the costs of interleaved loads of v8f32 of stride 3 and 8. Fixed the location of the lit test; it now works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471
*	[AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8}	Michael Zuckerman	2017-10-18	3	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds accurate instruction costs. The formula presents two cases (stride 3 and stride 4) and calculates the cost according to the VF and stride. Reviewers: 1. delena 2. Farhana 3. zvi 4. dorit 5. Ayal Differential Revision: https://reviews.llvm.org/D38762 Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7 llvm-svn: 316072
*	Revert r314923: "Recommit : Use the basic cost if a GEP is not used as addressing mode"	Daniel Jasper	2017-10-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Significantly reduces performance (~30%) of gipfeli (https://github.com/google/gipfeli). I have not yet managed to reproduce this regression with the open-source version of the benchmark on github, but will work with others to get a reproducer to you later today. llvm-svn: 315680
*	[TargetTransformInfo] Check if function pointer is valid before calling isLoweredToCall	Guozhi Wei	2017-10-04	1	-0/+4
\| \| \| \| \| \| \| \| \| \|	Function isLoweredToCall can only accept a non-null function pointer, but a function pointer can be null for an indirect function call. So check it before calling isLoweredToCall from getInstructionLatency. Differential Revision: https://reviews.llvm.org/D38204 llvm-svn: 314927
*	Recommit : Use the basic cost if a GEP is not used as addressing mode	Jun Bum Lim	2017-10-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recommitting r314517 with the fix for handling ConstantExpr. Original commit message: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as part of an actual addressing mode. For example, if a user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. llvm-svn: 314923
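The user check described in this message can be illustrated with a small standalone model. This is only a sketch, not the code from the patch: `ToyUserKind`, `ToyCost` and `getToyGEPCost` are hypothetical names, and the real getGEPCost() works on LLVM IR users rather than an enum. It assumes a GEP is free only when the addressing mode is legal and every user consumes it as a load or store address.

```cpp
#include <cassert>
#include <vector>

// Toy cost constants; the names mirror TCC_FREE from the commit message,
// but this enum is a hypothetical stand-in, not the LLVM definition.
enum ToyCost { TCC_Free = 0, TCC_Basic = 1 };

// Hypothetical classification of how a user consumes the GEP result.
enum class ToyUserKind { LoadAddress, StoreAddress, CallArgument, Other };

// Returns TCC_Free only if the GEP is a legal addressing mode AND every
// user can fold it into a memory access. Any other user (for example a
// call taking the pointer as a parameter) forces the basic cost.
// Assumption: a GEP with no users is treated as free in this sketch.
inline int getToyGEPCost(bool IsLegalAddressingMode,
                         const std::vector<ToyUserKind> &Users) {
  if (!IsLegalAddressingMode)
    return TCC_Basic;
  for (ToyUserKind U : Users) {
    if (U != ToyUserKind::LoadAddress && U != ToyUserKind::StoreAddress)
      return TCC_Basic;
  }
  return TCC_Free;
}
```

In this model a GEP feeding a load and a store stays free, while the same GEP also passed to a call is charged the basic cost, which is the case the commit message calls out.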
*	[X86] Add AVX512 check lines to the cost model truncate test.	Craig Topper	2017-10-03	1	-0/+13
\| \| \| \|	llvm-svn: 314758
*	Revert "Use the basic cost if a GEP is not used as addressing mode"	Alex Shlyapnikov	2017-09-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r314517. This commit crashes sanitizer bots, for example: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/4167 Stack snippet: ... /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Support/Casting.h:255:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getGEPCost(llvm::GEPOperator const, llvm::ArrayRef<llvm::Value const>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:742:0 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getUserCost(llvm::User const, llvm::ArrayRef<llvm::Value const>) /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:782:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/lib/Analysis/TargetTransformInfo.cpp:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:116:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:343:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/ADT/SmallVector.h:864:0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/include/llvm/Analysis/TargetTransformInfo.h:285:0 ... llvm-svn: 314560
*	Use the basic cost if a GEP is not used as addressing mode	Jun Bum Lim	2017-09-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target. However, since it doesn't check its actual users, it will return FREE even in cases where the GEP cannot be folded away as part of an actual addressing mode. For example, if a user of the GEP is a call instruction taking the GEP as a parameter, then the GEP may not be folded in isel. Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng Reviewed By: hfinkel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38085 llvm-svn: 314517
*	Check for overflows when calculating the offset in GetGEPCost.	Justin Lebar	2017-09-27	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This avoids C++ UB if the GEP is weird and the calculation overflows int64_t, and it's also observable in the cost model's results. Such GEPs are almost surely not valid pointers, but LLVM nonetheless generates them sometimes. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38337 llvm-svn: 314362
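A minimal sketch of overflow-checked offset accumulation, to illustrate the kind of check this message describes. It is not the code from the patch: `ToyIndex` and `accumulateOffset` are hypothetical names, and the use of the GCC/Clang builtins `__builtin_mul_overflow` and `__builtin_add_overflow` is an assumption made for brevity; the actual change may use a different helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one GEP index: the index value and the
// size in bytes of the element it steps over.
struct ToyIndex {
  int64_t Index;
  int64_t ElementSize;
};

// Adds Index * ElementSize for every index to Base. Returns false and
// leaves Result untouched if any intermediate step would overflow
// int64_t, so no signed overflow (C++ UB) is ever evaluated.
inline bool accumulateOffset(int64_t Base,
                             const std::vector<ToyIndex> &Indices,
                             int64_t &Result) {
  int64_t Offset = Base;
  for (const ToyIndex &Idx : Indices) {
    assert(Idx.ElementSize >= 0 && "element sizes are non-negative");
    int64_t Scaled = 0;
    if (__builtin_mul_overflow(Idx.Index, Idx.ElementSize, &Scaled))
      return false; // multiplication overflowed
    int64_t Next = 0;
    if (__builtin_add_overflow(Offset, Scaled, &Next))
      return false; // addition overflowed
    Offset = Next;
  }
  Result = Offset;
  return true;
}
```

A caller that gets `false` back could fall back to a conservative non-free cost instead of trusting a wrapped offset; that fallback policy is likewise an assumption of this sketch.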
*	[TargetTransformInfo] Handle intrinsic call in getInstructionLatency()	Guozhi Wei	2017-09-22	1	-0/+6
\| \| \| \| \| \| \| \|	Usually an intrinsic is a simple target instruction, so it should have a small latency. A real function call has much larger latency. So handle the intrinsic call in function getInstructionLatency(). Differential Revision: https://reviews.llvm.org/D38104 llvm-svn: 314003
*	[TargetTransformInfo] Static alloca has 0 cost	Guozhi Wei	2017-09-15	1	-0/+8
\| \| \| \| \| \| \| \|	Static alloca usually doesn't generate any machine instructions, so it has 0 cost. Differential Revision: https://reviews.llvm.org/D37879 llvm-svn: 313410
*	[TargetTransformInfo] Detect 0 latency instructions	Guozhi Wei	2017-09-14	1	-0/+18
\| \| \| \| \| \| \| \|	Instructions that are unlikely to generate machine instructions should also have 0 latency. Differential Revision: https://reviews.llvm.org/D37833 llvm-svn: 313288
*	[TargetTransformInfo] Add a new public interface getInstructionCost	Guozhi Wei	2017-09-08	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TargetTransformInfo currently supports a throughput cost model and a code size model, but some optimizations also need an instruction latency cost model. Hal suggested we need a single public interface to query the different costs of an instruction, so I proposed the following interface: enum TargetCostKind { TCK_RecipThroughput, ///< Reciprocal throughput. TCK_Latency, ///< The latency of instruction. TCK_CodeSize ///< Instruction code size. }; int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const; All clients should mainly use this function to query the cost of an instruction; the parameter <kind> specifies the desired cost model. This patch also provides a simple default implementation of getInstructionLatency. The default getInstructionLatency provides latency numbers for only a small number of instruction classes, and those latency numbers are only reasonable for modern OOO processors. It can be extended in the following ways: Add more detail into this function. Add getXXXLatency function and call it from here. Implement target specific getInstructionLatency function. Differential Revision: https://reviews.llvm.org/D37170 llvm-svn: 312832
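The single-entry-point design proposed here can be sketched as a standalone toy. The `TargetCostKind` enumerators are taken from the message above; everything else is an assumption for illustration: `ToyInstruction` is a hypothetical stand-in for `llvm::Instruction`, its fields hold made-up per-kind costs, and the free function below is not the TargetTransformInfo member.

```cpp
#include <cassert>

// Cost kinds as named in the commit message.
enum TargetCostKind {
  TCK_RecipThroughput, ///< Reciprocal throughput.
  TCK_Latency,         ///< The latency of instruction.
  TCK_CodeSize         ///< Instruction code size.
};

// Hypothetical stand-in for llvm::Instruction: it simply stores one
// precomputed cost per kind so the dispatch can be shown in isolation.
struct ToyInstruction {
  int RecipThroughput;
  int Latency;
  int CodeSize;
};

// One public query function; the Kind parameter selects the cost model,
// so clients do not call a separate function per model.
inline int getInstructionCost(const ToyInstruction *I, TargetCostKind Kind) {
  assert(I && "instruction must not be null");
  switch (Kind) {
  case TCK_RecipThroughput:
    return I->RecipThroughput;
  case TCK_Latency:
    return I->Latency;
  case TCK_CodeSize:
    return I->CodeSize;
  }
  return -1; // not reached for valid enumerators
}
```

A client interested in latency would call `getInstructionCost(&Inst, TCK_Latency)`; switching to another cost model only changes the second argument.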
*	X86: Improve AVX512 fptoui lowering	Zvi Rackover	2017-09-07	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add patterns for fptoui <16 x float> to <16 x i8> fptoui <16 x float> to <16 x i16> Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704
*	[CostModel][X86][XOP] Improve costs for XOP shuffles	Simon Pilgrim	2017-08-16	2	-0/+46
\| \| \| \| \| \|	VPPERM/VPERMIL2PD/VPERMIL2PS all provide more effective 2-input shuffles than regular AVX instructions llvm-svn: 311005
*	[CostModel][X86] Add SSE2 two-src shuffle costs	Simon Pilgrim	2017-08-10	2	-12/+12
\| \| \| \|	llvm-svn: 310654
*	[CostModel][X86] Add avx1 two-src shuffle costs	Simon Pilgrim	2017-08-10	2	-26/+26
\| \| \| \|	llvm-svn: 310650
*	[CostModel][X86] Add avx2 two-src shuffle costs	Simon Pilgrim	2017-08-10	2	-34/+34
\| \| \| \|	llvm-svn: 310645
*	[CostModel][X86] Extend two src shuffle cost tests	Simon Pilgrim	2017-08-10	1	-17/+195
\| \| \| \| \| \|	Cover most 128/256/512/1024-bit cases for vXf64/vXi64, vXf32/vXi32, vXi16 + vXi8 llvm-svn: 310641
*	[CostModel][X86] Add avx512vbmi broadcast/reverse/single-src shuffle cost tests	Simon Pilgrim	2017-08-10	3	-6/+18
\| \| \| \|	llvm-svn: 310633
*	[CostModel][X86] Improve single src shuffle costs	Simon Pilgrim	2017-08-10	1	-60/+60
\| \| \| \| \| \|	Add missing SK_PermuteSingleSrc costs for AVX2 targets and earlier, also added some of the simpler SK_PermuteTwoSrc costs to support splitting of SK_PermuteSingleSrc shuffles llvm-svn: 310632
*	[CostModel][X86] Added v2f64/v2i64 single src shuffle model tests	Simon Pilgrim	2017-08-10	1	-4/+21
\| \| \| \| \| \|	Fixed label checks for all prefixes llvm-svn: 310606
*	[X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch	Mohammed Agabaria	2017-07-02	1	-7/+21
\| \| \| \| \| \| \| \| \|	This patch updates the cost of addq\subq (add\subtract of vectors of 64 bits) based on the performance numbers of SLM arch. Differential Revision: https://reviews.llvm.org/D33983 llvm-svn: 306974
*	[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2	Dorit Nuzman	2017-06-25	2	-0/+183
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for the most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads. There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties, which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, which may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the above-mentioned work to extend the X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238
*	[CostModel][X86] Add scalar arithmetic cost tests	Simon Pilgrim	2017-06-20	1	-7/+55
\| \| \| \|	llvm-svn: 305810
*	[CostModel][X86] Declare costs variables based on type	Simon Pilgrim	2017-06-20	1	-470/+470
\| \| \| \| \| \|	The alphabetical progression isn't that useful llvm-svn: 305808
*	Fix line-endings.	Simon Pilgrim	2017-05-19	1	-1/+1
\| \| \| \|	llvm-svn: 303448
*	[X86][AVX512] Add 512-bit vector ctpop costs + tests	Simon Pilgrim	2017-05-18	1	-0/+63
\| \| \| \|	llvm-svn: 303342
*	[X86][AVX512] Add 512-bit vector ctlz costs + tests	Simon Pilgrim	2017-05-17	1	-6/+150
\| \| \| \|	llvm-svn: 303300
*	[X86][AVX512] Add 512-bit vector cttz costs + tests	Simon Pilgrim	2017-05-17	1	-6/+125
\| \| \| \|	llvm-svn: 303293
*	[X86] Split ctpop/ctlz/cttz cost tests	Simon Pilgrim	2017-05-17	4	-587/+599
\| \| \| \| \| \|	This will make things a lot easier to test all the permutations of avx512 llvm-svn: 303290
*	[X86][AVX512] Add 512-bit vector bitreverse costs + tests	Simon Pilgrim	2017-05-17	1	-0/+69
\| \| \| \|	llvm-svn: 303283
*	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts	Simon Pilgrim	2017-05-14	3	-52/+52
\| \| \| \|	llvm-svn: 303023
*	[X86][AVX2] Fix costs for v4i64 ashr by splat	Simon Pilgrim	2017-05-14	1	-2/+2
\| \| \| \|	llvm-svn: 303022
*	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat	Simon Pilgrim	2017-05-14	3	-38/+38
\| \| \| \|	llvm-svn: 303021
*	[X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences	Simon Pilgrim	2017-05-14	1	-16/+16
\| \| \| \| \| \|	llvm-svn: 303017
*	[X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask.	Simon Pilgrim	2017-05-14	2	-6/+6
\| \| \| \| \| \| \| \|	Tweak cost model to match what lowering actually does. llvm-svn: 303013
*	[X86][SSE] Account for cost of extract/insert of v32i8 vector shifts	Simon Pilgrim	2017-05-14	3	-12/+12
\| \| \| \|	llvm-svn: 303012
*	[X86][XOP] Account for cost of extract/insert of 256-bit vector shifts	Simon Pilgrim	2017-05-14	3	-98/+98
\| \| \| \|	llvm-svn: 303010
*	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics.	Simon Pilgrim	2017-05-07	2	-24/+24
\| \| \| \| \| \|	Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378