bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for fptosi v2f64/2f32 to 2i32	Simon Pilgrim	2016-10-18	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \|	As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459
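The instruction behaviour this commit relies on can be modelled in a few lines. This is a minimal Python sketch, not LLVM code; the name `cvttpd2dq_model` and the list-of-lanes representation are assumptions made for illustration. It truncates two doubles toward zero into 32-bit lanes, returns the integer-indefinite value 0x80000000 (as a signed number) for NaN or out-of-range inputs, and zeroes the upper 64 bits.

```python
import math

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def cvttpd2dq_model(a, b):
    """Hypothetical model of cvttpd2dq on a 128-bit register (4 x i32 lanes)."""
    def trunc_lane(x):
        # NaN and infinities yield the "integer indefinite" value.
        if not math.isfinite(x):
            return INT32_MIN
        t = int(x)  # int() truncates toward zero, like the extra "t" in cvtt
        # Values that do not fit in a signed 32-bit lane are also indefinite.
        return t if INT32_MIN <= t <= INT32_MAX else INT32_MIN
    # The low 64 bits hold the two results; the upper 64 bits are zeroed.
    return [trunc_lane(a), trunc_lane(b), 0, 0]
```

The two trailing zero lanes are what allows the lowering to treat the upper half of the xmm result as known-zero.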
*	[x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics ↵	Sanjay Patel	2016-10-03	1	-28/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(PR30433) This should fix: https://llvm.org/bugs/show_bug.cgi?id=30433 There are a couple of open questions about the codegen: 1. Should we let scalar ops be scalars and avoid vector constant loads/splats? 2. Should we have a pass to combine constants such as the inverted pair that we have here? Differential Revision: https://reviews.llvm.org/D25165 llvm-svn: 283119
*	[CostModel][X86] Added tests for current fptosi/fptoui costs	Simon Pilgrim	2016-10-01	2	-0/+501
\| \| \| \|	llvm-svn: 283047
*	[CostModel][X86] Added fcopysign costs	Simon Pilgrim	2016-10-01	1	-0/+65
\| \| \| \|	llvm-svn: 283044
*	[CostModel][X86] Added fabs costs	Simon Pilgrim	2016-10-01	1	-0/+65
\| \| \| \|	llvm-svn: 283042
*	[CostModel][X86] Added scalar float op costs	Simon Pilgrim	2016-09-18	1	-252/+340
\| \| \| \|	llvm-svn: 281864
*	[CostModel][X86] Removed shift tests	Simon Pilgrim	2016-08-21	1	-80/+0
\| \| \| \| \| \|	There are more thorough tests found in vshift-*-cost.ll llvm-svn: 279406
*	[CostModel][X86] Added costs for vXi16 and vXi8 vectors for ↵	Simon Pilgrim	2016-08-21	1	-114/+372
\| \| \| \| \| \|	add/sub/mul/and/or/xor tests llvm-svn: 279405
*	[CostModel][X86] Replaced SSSE3 with SSE2 costs to create a better baseline	Simon Pilgrim	2016-08-21	1	-43/+43
\| \| \| \|	llvm-svn: 279404
*	[CostModel][X86] Added fsqrt and fma costs	Simon Pilgrim	2016-08-21	1	-2/+104
\| \| \| \|	llvm-svn: 279403
*	[CostModel][X86] Split off float arithmetic cost tests	Simon Pilgrim	2016-08-21	2	-215/+224
\| \| \| \|	llvm-svn: 279402
*	[CostModel][X86] Added sub, or, and, fadd and fsub costs and missing 512-bit ↵	Simon Pilgrim	2016-08-19	1	-0/+227
\| \| \| \| \| \|	mul costs llvm-svn: 279301
*	[CostModel][X86] Added some AVX512 and 512-bit vector cost tests	Simon Pilgrim	2016-08-19	1	-0/+129
\| \| \| \|	llvm-svn: 279291
*	[CostModel][X86] Add fdiv + frem cost tests	Simon Pilgrim	2016-08-19	1	-2/+32
\| \| \| \|	llvm-svn: 279283
*	[LV, X86] Be more optimistic about vectorizing shifts.	Michael Kuperstein	2016-08-04	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shifts with a uniform but non-constant count were considered very expensive to vectorize, because the splat of the uniform count and the shift would tend to appear in different blocks. That made the splat invisible to ISel, and we'd scalarize the shift at codegen time. Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we are able to select the appropriate vector shifts. This updates the cost model to take this into account by making shifts by a uniform count cheap again. Differential Revision: https://reviews.llvm.org/D23049 llvm-svn: 277782
*	[X86] Dropped XOP ctbits checks - they match the AVX checks	Simon Pilgrim	2016-08-04	1	-62/+2
\| \| \| \|	llvm-svn: 277718
*	[X86][SSE] Add initial costs for vector CTTZ/CTLZ	Simon Pilgrim	2016-08-04	2	-99/+163
\| \| \| \|	llvm-svn: 277716
*	[X86][SSE] Add cost model values for CTPOP of vectors	Simon Pilgrim	2016-07-20	1	-24/+40
\| \| \| \| \| \| \| \|	This patch adds costs for the vectorized implementations of CTPOP. The default values were seriously underestimating these costs and were encouraging vectorization on targets where serialized use of POPCNT would be much better. Differential Revision: https://reviews.llvm.org/D22456 llvm-svn: 276104
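The effect of an underestimated vector cost can be illustrated with a toy decision rule. This is a simplified sketch under stated assumptions, not the loop vectorizer's actual logic: the function `prefers_vector` is hypothetical, and the costs passed to it below are placeholder numbers rather than values from LLVM's tables.

```python
def prefers_vector(vector_cost, scalar_cost, vf):
    """Hypothetical rule: vectorize only if one vector op beats vf scalar ops.

    vector_cost: modelled cost of a single vector ctpop
    scalar_cost: modelled cost of a single scalar POPCNT
    vf:          vectorization factor (number of lanes)
    """
    return vector_cost < scalar_cost * vf
```

With an optimistic default such as `prefers_vector(1, 1, 4)` the rule picks the vector form, while a more realistic vector cost such as `prefers_vector(10, 1, 4)` keeps the serialized POPCNT sequence.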
*	[X86] Add CTPOP/CTLZ/CTTZ scalar cost tests	Simon Pilgrim	2016-07-17	1	-6/+171
\| \| \| \|	llvm-svn: 275725
*	[X86] Make some cast costs more precise	Michael Kuperstein	2016-07-11	3	-35/+35
\| \| \| \| \| \| \| \| \|	Make some AVX and AVX512 cast costs more precise. Based on part of a patch by Elena Demikhovsky (D15604). Differential Revision: http://reviews.llvm.org/D22064 llvm-svn: 275106
*	[x86] fix cost of SINT_TO_FP for i32 --> float (PR21356, PR28434)	Sanjay Patel	2016-07-06	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is "cvtdq2ps" which does not appear to be particularly slow on any CPU according to Agner's tables. Choosing "5" as a cost here as suggested in: https://llvm.org/bugs/show_bug.cgi?id=21356 ...but it seems very conservative given that the instruction is fully pipelined, and I think these costs are supposed to model throughput. Note that related costs are also most likely too high, but this fixes PR21356 and partly fixes PR28434. llvm-svn: 274658
*	[TTI] The cost model should not assume vector casts get completely scalarized	Michael Kuperstein	2016-07-06	4	-210/+210
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cost model should not assume vector casts get completely scalarized, since on targets that have vector support, the common case is a partial split up to the legal vector size. So, when a vector cast gets split, the resulting casts end up legal and cheap. Instead of pessimistically assuming scalarization, base TTI can use the costs the concrete TTI provides for the split vector, plus a fudge factor to account for the cost of the split itself. This fudge factor is currently 1 by default, except on AMDGPU where inserts and extracts are considered free. Differential Revision: http://reviews.llvm.org/D21251 llvm-svn: 274642
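The split-based estimate described above can be sketched as a short recursion. This is an illustration under assumptions, not the code from the patch: the function name `split_cast_cost`, its parameters, and the halving strategy are read off the description, and the costs used are placeholders.

```python
def split_cast_cost(num_elts, legal_elts, legal_cost, split_overhead=1):
    """Hypothetical cost of a vector cast that is legalized by repeated halving.

    num_elts:       element count of the (possibly illegal) vector type
    legal_elts:     widest element count the target handles directly
    legal_cost:     the concrete target's cost for a legal-width cast
    split_overhead: the fudge factor charged for each split
    """
    if num_elts <= legal_elts:
        return legal_cost
    # One split yields two half-width casts, plus the overhead of the split itself.
    half = split_cast_cost(num_elts // 2, legal_elts, legal_cost, split_overhead)
    return split_overhead + 2 * half
```

For a 16-element cast on a target whose legal width is 4 elements with unit cost, this gives 7 with the default overhead of 1, and 4 when the overhead is 0, which corresponds to the case where inserts and extracts are considered free.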
*	[PowerPC] - Legalize vector types by widening instead of integer promotion	Nemanja Ivanovic	2016-07-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch corresponds to review: http://reviews.llvm.org/D20443 It changes the legalization strategy for illegal vector types from integer promotion to widening. This only applies for vectors with elements of width that is a multiple of a byte since we have hardware support for vectors with 1, 2, 4, 8 and 16 byte elements. Integer promotion for vectors is quite expensive on PPC due to the sequence of breaking apart the vector, extending the elements and reconstituting the vector. Two of these operations are expensive. This patch causes between minor and major improvements in performance on most benchmarks. There are very few benchmarks whose performance regresses. These regressions can be handled in a subsequent patch with a DAG combine (similar to how this patch handles int -> fp conversions of illegal vector types). llvm-svn: 274535
*	Support arbitrary addrspace pointers in masked load/store intrinsics	Artur Pilipenko	2016-06-28	1	-29/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 274043
*	Revert -r273892 "Support arbitrary addrspace pointers in masked load/store ↵	Artur Pilipenko	2016-06-27	1	-27/+29
\| \| \| \| \| \|	intrinsics" since some of the clang tests don't expect to see the updated signatures. llvm-svn: 273895
*	Support arbitrary addrspace pointers in masked load/store intrinsics	Artur Pilipenko	2016-06-27	1	-29/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a resubmission of the 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details). This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 273892
*	[X86] Make arithmetic operations cost model test saner. NFC.	Michael Kuperstein	2016-06-21	1	-83/+128
\| \| \| \|	llvm-svn: 273316
*	[X86][SSE] Add cost model for BSWAP of vectors	Simon Pilgrim	2016-06-20	2	-38/+38
\| \| \| \| \| \| \| \|	The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets; we should reflect the typical cost of this to encourage vectorization. Differential Revision: http://reviews.llvm.org/D21521 llvm-svn: 273217
*	[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets	Simon Pilgrim	2016-06-11	1	-24/+24
\| \| \| \| \| \|	To account for the fast PSHUFB implementation now available llvm-svn: 272484
*	[X86] Add costs for SSE zext/sext to v4i64 to TTI	Michael Kuperstein	2016-06-10	1	-0/+79
\| \| \| \| \| \| \| \| \|	The costs are somewhat hand-wavy, but should be much closer to the truth than what we get from BasicTTI. Differential Revision: http://reviews.llvm.org/D21156 llvm-svn: 272406
*	[CostModel][X86][XOP] Added XOP costmodel for BITREVERSE	Simon Pilgrim	2016-05-24	1	-12/+12
\| \| \| \| \| \|	Now that we have a nice fast VPPERM solution. Added framework for future intrinsic costs as well. llvm-svn: 270537
*	[CostModel][X86] Tidied up checks	Simon Pilgrim	2016-05-17	1	-125/+45
\| \| \| \|	llvm-svn: 269770
*	[CostModel][X86] Added scalar bitreverse tests	Simon Pilgrim	2016-05-15	1	-0/+51
\| \| \| \|	llvm-svn: 269594
*	[X86][SSE] Improve cost model for i64 vector comparisons on pre-SSE42 targets	Simon Pilgrim	2016-05-09	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \|	As discussed on PR24888, until SSE42 we don't have access to PCMPGTQ for v2i64 comparisons, but the cost models don't reflect this, resulting in over-optimistic vectorization. This patch adds SSE2 'base level' costs that match what a typical target is capable of and only reduces the v2i64 costs at SSE42. Technically SSE41 provides a PCMPEQQ v2i64 equality test, but as getCmpSelInstrCost doesn't give us a way to discriminate between comparison test types we can't easily make use of this, otherwise we could split the cost of integer equality and greater-than tests to give better costings of each. Differential Revision: http://reviews.llvm.org/D20057 llvm-svn: 268972
*	[CostModel][X86] Extended comparison instruction cost model tests to include ↵	Simon Pilgrim	2016-05-08	1	-33/+113
\| \| \| \| \| \|	SSE2/SSE3/SSSE3/SSE41/SSE42 targets llvm-svn: 268877
*	[CostModel][X86] Split BSWAP/BITREVERSE cost tests from CTPOP/CTLZ/CTTZ 'bit ↵	Simon Pilgrim	2016-05-07	3	-176/+188
\| \| \| \| \| \|	count' cost tests llvm-svn: 268859
*	[CostModel][X86] Tweak 'SSE2-only' test CPU as it was only disabling SSE41 ↵	Simon Pilgrim	2016-05-06	1	-1/+1
\| \| \| \| \| \|	not SSE3/SSSE3 etc. llvm-svn: 268763
*	[CostModel][X86] Added ctlz/cttz undef-zero costmodel tests	Simon Pilgrim	2016-05-06	1	-32/+209
\| \| \| \|	llvm-svn: 268761
*	[CostModel][X86] Added costmodel tests for vector ↵	Simon Pilgrim	2016-05-06	1	-0/+481
\| \| \| \| \| \|	ctpop/ctlz/cttz/bitreverse/bswap llvm-svn: 268738
*	[X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode.	Ashutosh Nema	2016-04-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so the cost of truncate needs to change, but it appears to have been changed only in the SSE2 table. The change also applies to SSE4.1, so the cost of truncate needs to change there as well. The cost of “TRUNCATE v16i32 to v16i8” and “TRUNCATE v16i16 to v16i8” should be the same in the SSE4.1 and SSE2 tables. This removes their costs from the SSE4.1 table so that they fall back to SSE2. Reviewers: Simon Pilgrim llvm-svn: 267123
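The fall-back behaviour mentioned above can be sketched as an ordered table lookup. This is a hypothetical Python model, not the X86 TTI implementation: `COST_TABLES`, `lookup_cost`, the feature names, and every number in the tables are placeholders chosen for illustration.

```python
# Hypothetical cost tables, most specific feature level first.
# The numbers are placeholders, not the values in LLVM's X86 cost tables.
COST_TABLES = [
    ("sse41", {("trunc", "v8i32", "v8i16"): 3}),
    ("sse2", {
        ("trunc", "v16i32", "v16i8"): 7,
        ("trunc", "v16i16", "v16i8"): 3,
    }),
]

def lookup_cost(op, src, dst, features):
    """Return the first matching entry from the most specific available table."""
    for level, table in COST_TABLES:
        if level in features and (op, src, dst) in table:
            return table[(op, src, dst)]
    return None  # no target-specific entry; a generic estimate would be used
```

Because the SSE4.1 table in this sketch has no entry for the v16i32 to v16i8 truncate, a target with both feature levels resolves it through the SSE2 table, so the two levels cannot drift apart for that cast.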
*	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"	Adam Nemet	2016-04-14	1	-27/+29
\| \| \| \| \| \| \| \|	This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. llvm-svn: 266282
*	Support arbitrary addrspace pointers in masked load/store intrinsics	Artur Pilipenko	2016-04-12	1	-29/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a resubmission of the 263158 change. This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 266086
*	[TTI] Let the cost model estimate ctpop costs based on legality	Benjamin Kramer	2016-03-31	2	-8/+19
\| \| \| \| \| \| \| \| \|	PPC has a vector popcount, this lets the vectorizer use the correct cost for it. Tweak X86 test to use an intrinsic that's actually scalarized (we have a somewhat efficient lowering for vector popcount using SSE, the cost model finds that now). llvm-svn: 265005
*	AMDGPU: Cost model for basic integer operations	Matt Arsenault	2016-03-25	4	-0/+343
\| \| \| \| \| \| \|	This resolves bug 21148 by preventing promotion to i64 induction variables. llvm-svn: 264376
*	AMDGPU: Partially implement getArithmeticInstrCost for FP ops	Matt Arsenault	2016-03-25	4	-0/+358
\| \| \| \|	llvm-svn: 264374
*	TTI: Report 0 cost for free addrspacecasts	Matt Arsenault	2016-03-25	1	-0/+45
\| \| \| \|	llvm-svn: 264369
*	TTI: Use 0 for cost of fabs if free	Matt Arsenault	2016-03-25	1	-0/+97
\| \| \| \| \| \| \|	Ideally this would also happen for fneg, but that isn't a distinct operation in the IR. llvm-svn: 264368
*	AMDGPU: TTI: Make insertelement free.	Matt Arsenault	2016-03-25	1	-0/+37
\| \| \| \| \| \|	We don't want to have a cost to scalarizing operations. llvm-svn: 264364
*	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"	Matthias Braun	2016-03-22	1	-27/+29
\| \| \| \| \| \| \| \| \| \| \|	This commit broke LTO builds. Reverting it to unbreak the bots while the issue is investigated. See also: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160321/341002.html This reverts r263158 llvm-svn: 264088
*	Support arbitrary addrspace pointers in masked load/store intrinsics	Artur Pilipenko	2016-03-10	1	-29/+27
\| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 263158