llvm-svn: 284837
llvm-svn: 284833
As discussed in D24815, let's start the process of killing off the broken fast-math global
state housed in TargetOptions and eliminate the need for function-level fast-math attributes.
Here we enable two similar folds that are possible when we don't care about signed-zero:
fadd nsz x, 0 --> x
fsub nsz 0, x --> -x
Note that although the test cases include a 'sin' function call, I'm side-stepping the
FMF-on-calls question (and lack of support in the DAG) for now. It's not needed for these
tests - isNegatibleForFree/GetNegatedExpression just look through an ISD::FSIN node.
Also, when we create an FNEG node and propagate the Flags of the FSUB to it, this doesn't
actually do anything today because Flags are silently dropped for any node that is not a
binary operator.
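For illustration, here is a hypothetical IR example (not taken from the commit's tests) of inputs these two folds apply to once 'nsz' is present:

; With 'nsz', adding +0.0 is a no-op and subtracting from +0.0 is a plain negation.
define double @fold_fadd_zero(double %x) {
  %r = fadd nsz double %x, 0.0        ; folds to %x
  ret double %r
}

define double @fold_fsub_zero(double %x) {
  %r = fsub nsz double 0.0, %x        ; folds to the negation of %x
  ret double %r
}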
Differential Revision: https://reviews.llvm.org/D25297
llvm-svn: 284824
shuffle decode comments
llvm-svn: 284821
missed folding opportunities
llvm-svn: 284816
instructions.
Commuting will be added in a future commit.
llvm-svn: 284808
Summary:
The original heuristic for breaking critical edges during machine sinking is relatively conservative: when there is only one instruction sinkable to the critical edge, the machine sink pass will likely not break the critical edge. This leads to many speculative instructions being executed at runtime. However, with profile info, we can model the benefit of splitting: if the critical edge has a 50% taken rate, it is always beneficial to split it and avoid the speculatively executed instructions. This patch uses profile information to guide critical edge splitting in the machine sink pass.
The performance impact on SPEC CPU2006 on Intel Sandy Bridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
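As an illustrative sketch (hypothetical IR, not taken from the patch), %v below is needed only along the critical edge a -> join, so without splitting that edge it is executed speculatively whenever a -> b is taken; the edge's profile-taken rate decides whether splitting pays off:

define i32 @sink_example(i32 %x, i32 %y, i1 %c0, i1 %c1) {
entry:
  br i1 %c0, label %a, label %d
a:                                     ; %v is computed here...
  %v = mul i32 %x, %y
  br i1 %c1, label %join, label %b     ; ...but only needed when a -> join is taken
b:
  ret i32 0
d:
  br label %join
join:                                  ; a -> join is critical: a has two successors
  %p = phi i32 [ %v, %a ], [ 0, %d ]   ; and join has two predecessors
  ret i32 %p
}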
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284757
Summary:
While promoting *_EXTEND_VECTOR_INREG nodes whose inputs are already
promoted, perform the appropriate sign extension for the promoted node
before doing the *_EXTEND_VECTOR_INREG operation. If not, the undefined
high-order bits of the promoted operand may (a) be garbage (in case of zext) or
(b) contribute the wrong sign bit (in case of sext).
Updated the promote-vec3.ll test after this change. The diff shows
explicit zeroing in case of zext and intermediate sign extension in case
of sext.
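A minimal example of the kind of IR exercised here (assumed for illustration, not copied from promote-vec3.ll): extending a <3 x i8> operand forces the input to be promoted, so the high bits of the promoted lanes must be cleared (zext) or sign-extended (sext) before the *_EXTEND_VECTOR_INREG node consumes them.

define <3 x i16> @zext_v3i8(<3 x i8> %x) {
  %e = zext <3 x i8> %x to <3 x i16>   ; promoted input must have its high bits zeroed
  ret <3 x i16> %e
}

define <3 x i16> @sext_v3i8(<3 x i8> %x) {
  %e = sext <3 x i8> %x to <3 x i16>   ; promoted input must carry the correct sign bit
  ret <3 x i16> %e
}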
Reviewers: RKSimon
Subscribers: llvm-commits, srhines
Differential Revision: https://reviews.llvm.org/D25790
llvm-svn: 284752
(and x, cst2)
We already supported a scalar constant / splatted constant vector - this now accepts any (non-opaque) constant scalar / vector.
llvm-svn: 284717
-> (add (shl x, c2), c1 << c2)
We already supported a scalar constant / splatted constant vector - this now accepts any (non-opaque) constant scalar / vector.
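The fold's left-hand side is truncated above; assuming the usual reassociation (x + c1) << c2 --> (x << c2) + (c1 << c2), a hypothetical instance with a non-splat constant vector that can now be combined looks like:

define <4 x i32> @shl_of_add(<4 x i32> %x) {
  %a = add <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
  %s = shl <4 x i32> %a, <i32 4, i32 4, i32 4, i32 4>
  ; expected to combine to (shl %x, 4) + <i32 16, i32 32, i32 48, i32 64>
  ret <4 x i32> %s
}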
llvm-svn: 284613
This code used a regular map when it should have used a multimap.
llvm-svn: 284612
Use mask and negate for legalization of i1 source type with SIGN_EXTEND_INREG.
With the mask, this should be no worse than 2 shifts. The mask can be eliminated
in some cases, so that should be better than 2 shifts.
This change exposed some missing folds related to negation:
https://reviews.llvm.org/rL284239
https://reviews.llvm.org/rL284395
There may be others, so please let me know if you see any regressions.
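For reference, here is a hedged IR sketch of the two lowerings being compared (hypothetical functions, not the committed tests): sign-extending the low bit of %x is equivalent to masking it and negating, so -(x & 1) can replace the shl/ashr-by-31 pair.

define i32 @sext_inreg_i1_shifts(i32 %x) {
  %s = shl i32 %x, 31                  ; old lowering: two shifts
  %r = ashr i32 %s, 31
  ret i32 %r
}

define i32 @sext_inreg_i1_mask_neg(i32 %x) {
  %m = and i32 %x, 1                   ; new lowering: mask the low bit...
  %r = sub i32 0, %m                   ; ...and negate (gives 0 or -1)
  ret i32 %r
}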
Differential Revision: https://reviews.llvm.org/D25485
llvm-svn: 284611
-> (and x, (shl -1, c1))
We already supported a scalar constant / splatted constant vector - this now accepts any (non-opaque) constant scalar / vector.
llvm-svn: 284608
-> (mul x, c1 << c2)
We already supported a scalar constant / splatted constant vector - this now accepts any (non-opaque) constant scalar / vector.
llvm-svn: 284607
llvm-svn: 284576
non-opaque constant or constant vector
llvm-svn: 284574
both halves of a 512-bit vector can be combined into a larger subvector broadcast.
Summary:
This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors.
New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions.
There are also fallback patterns for when the load can't be folded. These patterns are a little complex: we first need to insert the lower 128 bits into the second 128 bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then we use another zmm subvector insert to take those 256 bits and insert them into the upper bits. Since we used a zmm insert to create the 256 bits, we also need an extract_subreg to get just the lower 256 bits to pass to the second insert.
The outer insert for the fallback patterns should have its type correct because eventually we should also support masked operations here. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns.
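A hypothetical IR example of the kind of broadcast this enables (names and types chosen for illustration): a 128-bit vector load repeated across all four 128-bit lanes of a 512-bit result.

define <16 x float> @broadcast_v4f32_to_v16f32(<4 x float>* %p) {
  %v = load <4 x float>, <4 x float>* %p
  ; repeat the four loaded lanes across the whole 512-bit vector
  %b = shufflevector <4 x float> %v, <4 x float> undef,
       <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3,
                   i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  ret <16 x float> %b
}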
Reviewers: RKSimon, delena, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25651
llvm-svn: 284567
MBPI exposed by the patch.
Also update section.ll to fix a non-x86 failure.
llvm-svn: 284563
Summary:
The original heuristic for breaking critical edges during machine sinking is relatively conservative: when there is only one instruction sinkable to the critical edge, the machine sink pass will likely not break the critical edge. This leads to many speculative instructions being executed at runtime. However, with profile info, we can model the benefit of splitting: if the critical edge has a 50% taken rate, it is always beneficial to split it and avoid the speculatively executed instructions. This patch uses profile information to guide critical edge splitting in the machine sink pass.
The performance impact on SPEC CPU2006 on Intel Sandy Bridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284545
llvm-svn: 284544
Summary:
The original heuristic for breaking critical edges during machine sinking is relatively conservative: when there is only one instruction sinkable to the critical edge, the machine sink pass will likely not break the critical edge. This leads to many speculative instructions being executed at runtime. However, with profile info, we can model the benefit of splitting: if the critical edge has a 50% taken rate, it is always beneficial to split it and avoid the speculatively executed instructions. This patch uses profile information to guide critical edge splitting in the machine sink pass.
The performance impact on SPEC CPU2006 on Intel Sandy Bridge machines:
spec/2006/fp/C++/444.namd 25.3 +0.26%
spec/2006/fp/C++/447.dealII 45.96 -0.10%
spec/2006/fp/C++/450.soplex 41.97 +1.49%
spec/2006/fp/C++/453.povray 36.83 -0.96%
spec/2006/fp/C/433.milc 23.81 +0.32%
spec/2006/fp/C/470.lbm 41.17 +0.34%
spec/2006/fp/C/482.sphinx3 48.13 +0.69%
spec/2006/int/C++/471.omnetpp 22.45 +3.25%
spec/2006/int/C++/473.astar 21.35 -2.06%
spec/2006/int/C++/483.xalancbmk 36.02 -2.39%
spec/2006/int/C/400.perlbench 33.7 -0.17%
spec/2006/int/C/401.bzip2 22.9 +0.52%
spec/2006/int/C/403.gcc 32.42 -0.54%
spec/2006/int/C/429.mcf 39.59 +0.19%
spec/2006/int/C/445.gobmk 26.98 -0.00%
spec/2006/int/C/456.hmmer 24.52 -0.18%
spec/2006/int/C/458.sjeng 28.26 +0.02%
spec/2006/int/C/462.libquantum 55.44 +3.74%
spec/2006/int/C/464.h264ref 46.67 -0.39%
geometric mean +0.20%
Manually checked 473 and 471 to verify the diff is in the noise range.
Reviewers: rengolin, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24818
llvm-svn: 284541
This doesn't cover all combines in DAGCombiner::visitSRL/visitSHL yet, but identifies several cases where we fail to combine vectors (in particular non-splatted vectors).
llvm-svn: 284518
This doesn't cover all combines in DAGCombiner::visitSRA yet, but identifies several cases where we fail to combine vectors (in particular non-splatted vectors).
llvm-svn: 284498
>>u (log2(pow2)+y)
llvm-svn: 284491
decode comments
llvm-svn: 284488
vXi64 will benefit more from lowering to shifts than multiplies
llvm-svn: 284461
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
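The conversions in question, as a minimal IR sketch (hypothetical functions, not the committed tests):

define <2 x i32> @fptosi_2f64_2i32(<2 x double> %a) {
  %c = fptosi <2 x double> %a to <2 x i32>   ; can now lower to cvttpd2dq
  ret <2 x i32> %c
}

define <2 x i32> @fptosi_2f32_2i32(<2 x float> %a) {
  %c = fptosi <2 x float> %a to <2 x i32>    ; 2f32 -> 2i32 is also handled
  ret <2 x i32> %c
}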
Differential Revision: https://reviews.llvm.org/D23808
llvm-svn: 284459
r284450.
This is harder to do for vpermilpd, as shuffle combining turns the constant vector into an immediate, since all vpermilpd inputs with a constant vector can also be encoded with the immediate form.
llvm-svn: 284455
a different type than the shuffle itself.
This is especially important for 32-bit targets with 64-bit shuffle elements.
llvm-svn: 284453
entry has a different type than the shuffle itself.
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements. This is similar to how PSHUFB and VPERMIL handle the same problem.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25666
llvm-svn: 284451
llvm-svn: 284427
The scalar version of this pattern was noted in:
https://reviews.llvm.org/D25485
and fixed with:
https://reviews.llvm.org/rL284395
More refactoring of the constant/splat helpers is needed and will happen in follow-up patches.
Differential Revision: https://reviews.llvm.org/D25685
llvm-svn: 284424
This came up as part of:
https://reviews.llvm.org/D25485
Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.
llvm-svn: 284395
llvm-svn: 284394
llvm-svn: 284393
existing support for vpermt2var.
llvm-svn: 284357
Combining will be added in a future commit.
llvm-svn: 284356
an insert_subvector into a subvector broadcast.
Differential Revision: https://reviews.llvm.org/D25650
llvm-svn: 284353
other vpermi2var intrinsics.
llvm-svn: 284329
llvm-svn: 284328
computeKnownBits only returns the bits common to every vector element, instead of considering only the elements that are actually used
llvm-svn: 284308
llvm-svn: 284306
llvm-svn: 284305
X86. The pass optimizes the entire wide load + shuffles pattern produced by
interleaved vectorization as a unit. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
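A sketch of the one pattern handled initially (hypothetical IR; the committed tests may differ): a wide load of four interleaved <4 x i64> vectors followed by strided shuffles that de-interleave it.

define <4 x i64> @deinterleave_first_lane(<16 x i64>* %p) {
  %wide = load <16 x i64>, <16 x i64>* %p
  ; stride-4 shuffle extracting elements 0, 4, 8, 12; the pass rewrites the
  ; wide load + shuffles as a unit into cheaper target shuffles
  %v0 = shufflevector <16 x i64> %wide, <16 x i64> undef,
        <4 x i32> <i32 0, i32 4, i32 8, i32 12>
  ret <4 x i64> %v0
}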
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
ORing comparisons to zero.
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
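The source pattern, shown as a hypothetical IR function: on a target with fast LZCNT, (x == 0) | (y == 0) can be computed as (ctlz(x) | ctlz(y)) >> log2(bitwidth), since ctlz yields the full bit width only for a zero input.

define i32 @either_is_zero(i32 %x, i32 %y) {
  %cx = icmp eq i32 %x, 0
  %cy = icmp eq i32 %y, 0
  %or = or i1 %cx, %cy
  %r = zext i1 %or to i32      ; becomes (ctlz(x) | ctlz(y)) >> 5 for i32 inputs
  ret i32 %r
}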
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
The same folds exist in InstCombine already.
This came up as part of:
https://reviews.llvm.org/D25485
llvm-svn: 284239
llvm-svn: 284238
vectors are less than half of the output vector size.
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64, which requires a sign/zero_extend_vector_inreg to be created, which in turn requires v8i8 to be concatenated up to v64i8 and goes through this code.
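The motivating case, sketched as hypothetical IR: extending v8i8 all the way to v8i64 goes through a sign/zero_extend_vector_inreg whose input must first be widened (conceptually concatenated with undef) up to v64i8.

define <8 x i64> @sext_v8i8_v8i64(<8 x i8> %x) {
  %e = sext <8 x i8> %x to <8 x i64>   ; needs sign_extend_vector_inreg on a widened input
  ret <8 x i64> %e
}

define <8 x i64> @zext_v8i8_v8i64(<8 x i8> %x) {
  %e = zext <8 x i8> %x to <8 x i64>   ; likewise for zero extension
  ret <8 x i64> %e
}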
llvm-svn: 284204
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that we
promote the 32-bit floating point operations to their 64-bit equivalents.
llvm-svn: 284173