bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch	Mohammed Agabaria	2017-07-02	1	-4/+9
	This patch updates the cost of addq\subq (add\subtract of vectors of 64 bits) based on the performance numbers of the SLM arch. Differential Revision: https://reviews.llvm.org/D33983 llvm-svn: 306974
*	[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2	Dorit Nuzman	2017-06-25	1	-0/+112
	The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for the most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note 1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads. There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties, which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend the X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238
*	[x86] enable CGP memcmp() expansion for 2/4/8 byte sizes	Sanjay Patel	2017-06-20	1	-0/+6
	There are a couple of potential improvements as seen in the IR and asm: 1. We're unnecessarily extending to a larger type to compare values. 2. The codegen for (select cond, 1, -1) could avoid a cmov. (or we could change the order of the compares, so we have a select with 0 operand) llvm-svn: 305802
*	Revert r304824 "Fix PR23384 (part 3 of 3)"	Hans Wennborg	2017-06-19	1	-11/+0
	This seems to be interacting badly with ASan somehow, causing false reports of heap-buffer overflows: PR33514. > Summary: > The patch makes instruction count the highest priority for > LSR solution for X86 (previously registers had highest priority). > > Reviewers: qcolombet > > Differential Revision: http://reviews.llvm.org/D30562 > > From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 305720
*	Fix PR23384 (part 3 of 3)	Evgeny Stupachenko	2017-06-06	1	-0/+11
	Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824
*	[Atomics][LoopIdiom] Recognize unordered atomic memcpy	Anna Thomas	2017-06-06	1	-0/+2
	Summary: Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference between recognizing an unordered atomic memcpy and a normal memcpy is that the loads and/or stores involved are unordered atomic operations. Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html Patch by Daniel Neilson! Reviewers: reames, anna, skatkov Reviewed By: reames, anna Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33243 llvm-svn: 304806
*	[X86][AVX512] Add 512-bit vector ctpop costs + tests	Simon Pilgrim	2017-05-18	1	-0/+6
	llvm-svn: 303342
*	[X86][AVX512] Add 512-bit vector ctlz costs + tests	Simon Pilgrim	2017-05-17	1	-0/+24
	llvm-svn: 303300
*	[X86][AVX512] Add 512-bit vector cttz costs + tests	Simon Pilgrim	2017-05-17	1	-0/+6
	llvm-svn: 303293
*	[X86][AVX512] Add 512-bit vector bitreverse costs + tests	Simon Pilgrim	2017-05-17	1	-0/+18
	llvm-svn: 303283
*	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts	Simon Pilgrim	2017-05-14	1	-49/+49
	llvm-svn: 303023
*	[X86][AVX2] Fix costs for v4i64 ashr by splat	Simon Pilgrim	2017-05-14	1	-0/+5
	llvm-svn: 303022
*	[X86][AVX1] Account for cost of extract/insert of 256-bit shifts by splat	Simon Pilgrim	2017-05-14	1	-12/+12
	llvm-svn: 303021
*	[X86][AVX1] Account for cost of extract/insert of 256-bit SDIV/UDIV by mul sequences	Simon Pilgrim	2017-05-14	1	-17/+17
	llvm-svn: 303017
*	[X86][XOP] XOP's general v16i8 shifts will be used instead of v8i16 shift + mask.	Simon Pilgrim	2017-05-14	1	-3/+6
	Tweak cost model to match what lowering actually does. llvm-svn: 303013
*	[X86][SSE] Account for cost of extract/insert of v32i8 vector shifts	Simon Pilgrim	2017-05-14	1	-3/+3
	llvm-svn: 303012
*	[X86][XOP] Account for cost of extract/insert of 256-bit vector shifts	Simon Pilgrim	2017-05-14	1	-12/+12
	llvm-svn: 303010
*	[X86][AVX1] Improve 256-bit vector costs for integer unary intrinsics.	Simon Pilgrim	2017-05-07	1	-16/+16
	Account for subvector extraction/insertion, helps prevent the vectorizers from selecting 256-bit vectors that will have to be split anyhow on AVX1 targets. llvm-svn: 302378
*	[SystemZ] TargetTransformInfo cost functions implemented.	Jonas Paulsson	2017-04-12	1	-4/+6
	getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052
*	[X86 TTI] Implement LSV hook	Keno Fischer	2017-04-05	1	-1/+5
	Summary: LSV wants to know the maximum size that can be loaded to a vector register. On X86, this always matches the maximum register width. Implement this accordingly and add a test to make sure that LSV can vectorize up to the maximum permissible width on X86. Reviewers: delena, arsenm Reviewed By: arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D31504 llvm-svn: 299589
*	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars	Simon Pilgrim	2017-03-15	1	-0/+19
	Prep work for PR31810 llvm-svn: 297876
*	Align cost model columns. NFCI.	Simon Pilgrim	2017-03-15	1	-4/+4
	llvm-svn: 297824
*	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved	Jonas Paulsson	2017-03-14	1	-4/+5
	getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Test updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
*	[X86] Add costs for non-AVX512 single-source permutation integer shuffles	Michael Kuperstein	2017-02-02	1	-3/+16
	Differential Revision: https://reviews.llvm.org/D29416 llvm-svn: 293932
*	[TargetTransformInfo] Refactor and improve getScalarizationOverhead()	Jonas Paulsson	2017-01-26	1	-14/+0
	Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and adds extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155
*	[X86] enable memory interleaving for X86\SLM arch.	Mohammed Agabaria	2017-01-25	1	-1/+1
	Differential Revision: https://reviews.llvm.org/D28547 llvm-svn: 293040
*	Remove trailing whitespace. NFCI.	Simon Pilgrim	2017-01-20	1	-1/+1
	llvm-svn: 292613
*	[CostModel][X86] Removed unused cost. NFCI.	Simon Pilgrim	2017-01-20	1	-1/+0
	SHL v8i32 is already handled in the SSE41 cost table llvm-svn: 292612
*	[CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types	Simon Pilgrim	2017-01-15	1	-0/+8
	We already have patterns in place to support 128/256-bit shifts without AVX512VL llvm-svn: 292077
*	[CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed	Simon Pilgrim	2017-01-14	1	-0/+8
	llvm-svn: 292023
*	[X86][AVX512BW] Vectorize v64i8 vector shifts	Simon Pilgrim	2017-01-11	1	-0/+4
	Differential Revision: https://reviews.llvm.org/D28447 llvm-svn: 291665
*	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch.	Mohammed Agabaria	2017-01-11	1	-3/+50
	Updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. Special optimization case that replaces pmulld with a pmullw\pmulhw\pshuf sequence when the real operand bitwidth is <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657
*	[CostModel][X86] Fixed vXi8 uniform shift costs.	Simon Pilgrim	2017-01-08	1	-6/+16
	The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation). Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled. Added missing AVX2/AVX512BW costs as well. llvm-svn: 291391
*	[CostModel][X86] Moved legal uniform shift costs earlier.	Simon Pilgrim	2017-01-08	1	-24/+39
	XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts. llvm-svn: 291390
*	[CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs	Simon Pilgrim	2017-01-07	1	-0/+2
	SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq. llvm-svn: 291372
*	[CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.	Simon Pilgrim	2017-01-07	1	-2/+15
	llvm-svn: 291366
*	[CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above	Simon Pilgrim	2017-01-07	1	-45/+44
	We were matching against general vector shift costs before the uniform splat costs llvm-svn: 291365
*	[CostModel][X86] Generalized cost calculation of SHL by constant -> MUL conversion.	Simon Pilgrim	2017-01-07	1	-21/+10
	llvm-svn: 291364
*	[CostModel][X86] Merge separate AVX1 cost LUTs. NFCI.	Simon Pilgrim	2017-01-07	1	-38/+30
	llvm-svn: 291355
*	[CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.	Simon Pilgrim	2017-01-07	1	-0/+4
	llvm-svn: 291354
*	[CostModel][X86] Added missing AVX2 arithmetic costs.	Simon Pilgrim	2017-01-07	1	-23/+33
	Allows us to correctly fall through to the lower AVX1 costs if the lookup fails. llvm-svn: 291353
*	[CostModel][X86] Reordered AVX1 arithmetic cost LUT into descending target order. NFCI.	Simon Pilgrim	2017-01-07	1	-27/+27
	llvm-svn: 291352
*	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)	Simon Pilgrim	2017-01-07	1	-2/+1
	llvm-svn: 291347
*	[CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.	Simon Pilgrim	2017-01-06	1	-16/+18
	Set the costs on the lowest target that supports the type. llvm-svn: 291229
*	[CostModel][X86] Tidyup arithmetic costs code. NFCI.	Simon Pilgrim	2017-01-05	1	-28/+15
	Remove unnecessary braces, remove single-use variables and keep LUTs to a similar naming convention. llvm-svn: 291187
*	[CostModel][X86] Move vXi32 MUL costs into existing tables. NFCI.	Simon Pilgrim	2017-01-05	1	-6/+5
	llvm-svn: 291165
*	Remove trailing whitespace. NFCI.	Simon Pilgrim	2017-01-05	1	-3/+3
	llvm-svn: 291163
*	[CostModel][X86] Reordered SSE42 arithmetic cost LUT into descending order. NFCI.	Simon Pilgrim	2017-01-05	1	-13/+11
	llvm-svn: 291162
*	[CostModel][X86] Move vXi64 MUL costs into existing tables. NFCI.	Simon Pilgrim	2017-01-05	1	-11/+3
	Removes need for yet another LUT. llvm-svn: 291158
*	[CostModel][X86] Strip unused 256-bit vector shift costs. NFCI.	Simon Pilgrim	2017-01-05	1	-8/+0
	Remove SSE2 256-bit entries - AVX targets will have used the SSE42 costs instead. llvm-svn: 291152