bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[CostModel][X86] Fixed vXi8 uniform shift costs.	Simon Pilgrim	2017-01-08	5	-39/+45
\| \| \| \| \| \| \| \| \| \|	The 'fast' costs should only work for shifts by uniform constants (uniform non-constant are lowered using the slow default implementation). Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled. Added missing AVX2/AVX512BW costs as well. llvm-svn: 291391
*	[CostModel][X86] Moved legal uniform shift costs earlier.	Simon Pilgrim	2017-01-08	2	-8/+5
\| \| \| \| \| \|	XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts. llvm-svn: 291390
*	[CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs	Simon Pilgrim	2017-01-07	1	-12/+12
\| \| \| \| \| \|	SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq. llvm-svn: 291372
*	[CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.	Simon Pilgrim	2017-01-07	2	-12/+16
\| \| \| \|	llvm-svn: 291366
*	[CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above	Simon Pilgrim	2017-01-07	3	-20/+20
\| \| \| \| \| \|	We were matching against general vector shift costs before the uniform splat costs llvm-svn: 291365
*	[CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.	Simon Pilgrim	2017-01-07	3	-12/+12
\| \| \| \|	llvm-svn: 291354
*	[X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI ↵	Simon Pilgrim	2017-01-07	1	-1/+1
\| \| \| \| \| \|	v64i8 shuffles (PR31470) llvm-svn: 291347
*	[CostModel][X86] Add AVX512 and 512-bit vector shift cost tests.	Simon Pilgrim	2017-01-06	3	-18/+758
\| \| \| \|	llvm-svn: 291269
*	[AArch64] Reduce vector insert/extract cost for Falkor.	Chad Rosier	2017-01-06	1	-0/+26
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D28403 llvm-svn: 291254
*	[CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.	Simon Pilgrim	2017-01-06	1	-4/+2
\| \| \| \| \| \|	Set the costs on the lowest target that supports the type. llvm-svn: 291229
*	[CostModel][X86] Add SDIV/UDIV cost tests for a wider range of targets	Simon Pilgrim	2017-01-06	1	-18/+50
\| \| \| \| \| \|	Added a test demonstrating bug in AVX512 division costs llvm-svn: 291228
*	[CostModel][X86] Include the cost of 256-bit upper subvector ↵	Simon Pilgrim	2017-01-05	1	-2/+2
\| \| \| \| \| \| \| \|	extract/insertion in AVX1 v4i64 MUL Matches other MUL/ADD/SUB 256-bit case on AVX1 llvm-svn: 291149
*	[AArch64][CostModel] Add coverage for bswap intrinsics.	Chad Rosier	2017-01-05	1	-0/+70
\| \| \| \|	llvm-svn: 291140
*	[CostModel][X86] Add support for broadcast shuffle costs	Simon Pilgrim	2017-01-05	1	-2/+138
\| \| \| \| \| \| \| \|	Currently only for broadcasts with input and output of the same width. Differential Revision: https://reviews.llvm.org/D27811 llvm-svn: 291122
*	[AArch64] Remove mcpu option as this test is not target specific. NFC.	Chad Rosier	2017-01-05	1	-1/+1
\| \| \| \|	llvm-svn: 291117
*	[AArch64] Remove unused arguments from tests. NFC.	Chad Rosier	2017-01-05	1	-32/+32
\| \| \| \|	llvm-svn: 291112
*	[CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs	Simon Pilgrim	2017-01-04	1	-22/+22
\| \| \| \| \| \|	Actual codegen is much better than the extract+insert patterns that was assumed. llvm-svn: 290962
*	Fixed shuffle-reverse cost on AVX-512.	Elena Demikhovsky	2017-01-02	1	-1/+2
\| \| \| \| \| \|	(This changed was approved in https://reviews.llvm.org/D28118, but Simon asked to submit it separately). llvm-svn: 290812
*	AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.	Elena Demikhovsky	2017-01-02	9	-9/+762
\| \| \| \| \| \| \| \| \| \| \| \|	X86 target does not provide any target specific cost calculation for interleave patterns.It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost. In this patch I calculate the cost on AVX-512. It will allow to compare interleave pattern with gather/scatter and choose a better solution (PR31426). * Shiffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
*	[X86][SSE] Improve lowering of vXi64 multiplies	Simon Pilgrim	2016-12-21	1	-24/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As mentioned on PR30845, we were performing our vXi64 multiplication as: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi, 32)+ psllqi(AhiBlo, 32); when we could avoid one of the upper shifts with: AloBlo = pmuludq(a, b); AloBhi = pmuludq(a, psrlqi(b, 32)); AhiBlo = pmuludq(psrlqi(a, 32), b); return AloBlo + psllqi(AloBhi + AhiBlo, 32); This matches the lowering on gcc/icc. Differential Revision: https://reviews.llvm.org/D27756 llvm-svn: 290267
*	[AArch64] Guard Misaligned 128-bit store penalty by subtarget feature	Matthew Simpson	2016-12-15	1	-6/+12
\| \| \| \| \| \| \| \| \|	This patch checks that the SlowMisaligned128Store subtarget feature is set when penalizing such stores in getMemoryOpCost. Differential Revision: https://reviews.llvm.org/D27677 llvm-svn: 289845
*	[CostModel][X86] Updated reverse shuffle costs	Simon Pilgrim	2016-12-15	1	-32/+56
\| \| \| \|	llvm-svn: 289819
*	[CostModel] Fix long standing bug with reverse shuffle mask detection	Simon Pilgrim	2016-12-15	1	-0/+31
\| \| \| \| \| \|	Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles llvm-svn: 289811
*	[CostModel][X86] Add tests for reverse shuffle costs	Simon Pilgrim	2016-12-15	1	-0/+143
\| \| \| \|	llvm-svn: 289800
*	[AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode()	Haicheng Wu	2016-12-07	1	-1/+97
\| \| \| \| \| \| \| \|	In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511]. Differential Revision: https://reviews.llvm.org/D27480 llvm-svn: 288876
*	[TTI/CostModel] Correct the way getGEPCost() calls isLegalAddressingMode()	Haicheng Wu	2016-12-03	2	-11/+207
\| \| \| \| \| \| \| \|	Fix a bug when we call isLegalAddressingMode() from getGEPCost(). Differential Revision: https://reviews.llvm.org/D27357 llvm-svn: 288569
*	[ppc] Correctly compute the cost of loading 32/64 bit memory into VSR	Guozhi Wei	2016-12-03	1	-0/+19
\| \| \| \| \| \| \| \|	VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial. Differential Revision: https://reviews.llvm.org/D26713 llvm-svn: 288560
*	[SLP] Fixed cost model for horizontal reduction.	Alexey Bataev	2016-12-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently when cost of scalar operations is evaluated the vector type is used for scalar operations. Patch fixes this issue and fixes evaluation of the vector operations cost. Several test showed that vector cost model is too optimistic. It allowed vectorization of 8 or less add/fadd operations, though scalar code is faster. Actually, only for 16 or more operations vector code provides better performance. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398
*	[SLP] Additional tests with the cost of vector operations.	Alexey Bataev	2016-12-01	1	-0/+2
\| \| \| \|	llvm-svn: 288377
*	Revert "[SLP] Additional tests with the cost of vector operations."	Alexey Bataev	2016-12-01	1	-2/+0
\| \| \| \| \| \|	This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e. llvm-svn: 288371
*	[SLP] Additional tests with the cost of vector operations.	Alexey Bataev	2016-12-01	1	-0/+2
\| \| \| \|	llvm-svn: 288369
*	[X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on ↵	Simon Pilgrim	2016-11-24	3	-9/+15
\| \| \| \| \| \| \| \|	AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287882
*	[X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on ↵	Simon Pilgrim	2016-11-23	2	-4/+6
\| \| \| \| \| \| \| \|	AVX512DQ-only targets Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances llvm-svn: 287762
*	[CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs	Simon Pilgrim	2016-11-23	1	-3/+3
\| \| \| \|	llvm-svn: 287760
*	[CostModel][X86] Add v2f32 -> v2i64 fptosi/fptoui cost tests	Simon Pilgrim	2016-11-23	2	-0/+14
\| \| \| \|	llvm-svn: 287756
*	[CostModel][X86] Updated sitofp/uitofp scalar/vector cost tests	Simon Pilgrim	2016-11-22	2	-1379/+492
\| \| \| \| \| \| \| \|	Better coverage of all legal types + special cases. Removed old fptoui tests which are all handled in fptoui.ll llvm-svn: 287678
*	[AVX-512] Support FCOPYSIGN for v16f32 and v8f64	Craig Topper	2016-11-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This extends FCOPYSIGN support to 512-bit vectors. I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads. Reviewers: delena, zvi, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26791 llvm-svn: 287298
*	[CostModel][X86] Added mul costs for vXi8 vectors	Simon Pilgrim	2016-11-14	1	-16/+18
\| \| \| \| \| \|	More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result llvm-svn: 286838
*	[X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets	Simon Pilgrim	2016-11-14	1	-8/+8
\| \| \| \| \| \| \| \|	Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit) llvm-svn: 286832
*	[VectorLegalizer] Expansion of CTLZ using CTPOP when possible	Simon Pilgrim	2016-11-08	1	-16/+16
\| \| \| \| \| \| \| \| \| \|	This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available. This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful. Differential Revision: https://reviews.llvm.org/D25910 llvm-svn: 286233
*	Improved cost model for FDIV and FSQRT, by Andrew Tischenko	Alexey Bataev	2016-10-31	1	-82/+82
\| \| \| \| \| \| \| \| \| \|	There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564
*	[X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets	Simon Pilgrim	2016-10-27	1	-2/+2
\| \| \| \|	llvm-svn: 285329
*	[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64	Simon Pilgrim	2016-10-27	1	-4/+13
\| \| \| \| \| \| \| \| \| \|	With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304
*	[CostModel][X86] Added tests for current integer signed/unsigned remainder costs	Simon Pilgrim	2016-10-23	1	-0/+116
\| \| \| \|	llvm-svn: 284940
*	[X86][SSE] Add SSE41/AVX1 costs for vector shifts.	Simon Pilgrim	2016-10-23	3	-109/+109
\| \| \| \| \| \|	We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939
*	[CostModel][X86] Added tests for current integer trunc costs	Simon Pilgrim	2016-10-23	1	-0/+141
\| \| \| \|	llvm-svn: 284938
*	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 ↵	Simon Pilgrim	2016-10-20	1	-36/+32
\| \| \| \| \| \| \| \|	bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755
*	[CostModel][X86] Added tests for sdiv/udiv costs for uniform const and ↵	Simon Pilgrim	2016-10-20	1	-0/+264
\| \| \| \| \| \| \| \|	uniform const power-of-2 Shows poor costings in AVX1/AVX512BW for certain vector types llvm-svn: 284748
*	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit ↵	Simon Pilgrim	2016-10-20	1	-76/+24
\| \| \| \| \| \| \| \| \| \|	integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744
*	[CostModel][X86] Added tests for sdiv/udiv costs for scalar and 128/256/512 ↵	Simon Pilgrim	2016-10-20	1	-29/+165
\| \| \| \| \| \| \| \|	bit integer vectors Shows current bug in AVX1/AVX512BW costs for 256 bit vector types llvm-svn: 284723