llvm-svn: 226663
Loading two 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.
In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.
The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.
This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.
This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.
I've included a test case.
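To make the pattern concrete, here is a minimal IR sketch of the problematic shape (hypothetical function name, not the committed test case):
define <8 x float> @concat_two_halves(<2 x float>* %a, <2 x float>* %b) {
  %va = load <2 x float>, <2 x float>* %a
  %vb = load <2 x float>, <2 x float>* %b
  ; 2f32 + 2f32 -> 8f32, with the upper half undef
  %v = shufflevector <2 x float> %va, <2 x float> %vb, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
  ret <8 x float> %v
}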
Radar link: rdar://problem/19287012
Fix for PR 21943.
From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 226360
utils/sort_includes.py.
I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.
llvm-svn: 225974
In case folding a node ends up with a NaN as an operand for the select,
the folding of the condition of the selectcc node returns "UNDEF".
Differential Revision: http://reviews.llvm.org/D6889
llvm-svn: 225952
llvm-svn: 225836
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.
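For reference, the kind of node the hook protects looks like this in IR (illustrative only):
define float @sqrt_f32(float %x) {
  %sqrt = call float @llvm.sqrt.f32(float %x)
  ret float %sqrt
}
declare float @llvm.sqrt.f32(float)
With the hook, this stays a single v_sqrt_f32 rather than being rewritten in terms of v_rsq_f32 and v_rcp_f32.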
llvm-svn: 225828
guarded with that hook.
llvm-svn: 225795
Also require unsafe FP math for now, since there isn't a way to
test for signed zeros.
llvm-svn: 225744
As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA,
we need to extend the incoming operands so that the resulting node will really
be legal. This is currently enabled only for PowerPC, and it happens to work
there regardless, but this should fix the functionality for anyone else
who wishes to use it.
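A sketch of the pattern in question (illustrative, not the committed test):
define double @fma_elide_ext(float %a, float %b, double %c) {
  ; (fadd (fpext (fmul a, b)), c) may be combined into an FMA;
  ; the f32 operands must then be extended to f64 for the FMA node to be legal
  %m = fmul float %a, %b
  %e = fpext float %m to double
  %r = fadd double %e, %c
  ret double %r
}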
llvm-svn: 225492
As pointed out by Aditya (and Owen), there are two things wrong with this code.
First, it adds patterns which elide FP extends when forming FMAs, and that might
not be profitable on all targets (it belongs behind the pre-existing
aggressive-FMA-formation flag). This is fixed by this change.
Second, the resulting nodes might have operands of different types (the
extensions need to be re-added). That will be fixed in the follow-up commit.
llvm-svn: 225485
type (in addition to the memory type).
The *LoadExt* legalization handling used to only have one type, the
memory type. This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.
However, this isn't always the case. For instance, on X86, with AVX,
this is legal:
v4i32 load, zext from v4i8
but this isn't:
v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.
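In IR terms, the now-expressible illegal case looks like this (illustrative):
define <4 x i64> @zext_load_v4i8_v4i64(<4 x i8>* %p) {
  %v = load <4 x i8>, <4 x i8>* %p
  %w = zext <4 x i8> %v to <4 x i64>   ; extload legality now depends on both types
  ret <4 x i64> %w
}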
Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.
Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.
Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior. The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)
No functional change intended.
Differential Revision: http://reviews.llvm.org/D6532
llvm-svn: 225421
llvm-svn: 225380
llvm-svn: 225368
into the assert.
llvm-svn: 225160
This reverts commit r224611. This change causes crashes
in X86 DAG->DAG Instruction Selection.
llvm-svn: 225031
Right now, DAG Combine checks the validity of the returned type
only when -debug is given on the command line. However, the test
cases in validation usually do not use -debug.
An Assert build should always check this.
llvm-svn: 224779
elements
This partially fixes PR21943.
For AVX, we go from:
vmovq (%rsi), %xmm0
vmovq (%rdi), %xmm1
vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
To the expected:
vmovq (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq
Fixing this for AVX2 is still open.
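The kind of IR that exercises this looks roughly like (a sketch; the committed test may differ):
define <4 x float> @merge_two_halves(<2 x float>* %a, <2 x float>* %b) {
  %va = load <2 x float>, <2 x float>* %a
  %vb = load <2 x float>, <2 x float>* %b
  %v = shufflevector <2 x float> %va, <2 x float> %vb, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  ret <4 x float> %v
}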
Differential Revision: http://reviews.llvm.org/D6749
llvm-svn: 224759
Add a path to DAGCombiner::MergeConsecutiveStores()
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).
For the new test case, codegen improves from:
vmovss %xmm0, (%rdi)
vextractps $1, %xmm0, 4(%rdi)
vextractps $2, %xmm0, 8(%rdi)
vextractps $3, %xmm0, 12(%rdi)
vextractf128 $1, %ymm0, %xmm0
vmovss %xmm0, 16(%rdi)
vextractps $1, %xmm0, 20(%rdi)
vextractps $2, %xmm0, 24(%rdi)
vextractps $3, %xmm0, 28(%rdi)
vzeroupper
retq
To:
vmovups %ymm0, (%rdi)
vzeroupper
retq
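The scalar-store pattern being merged looks roughly like this in IR (hypothetical names, shortened to two lanes):
define void @store_extracted(<4 x float> %v, float* %p) {
  %e0 = extractelement <4 x float> %v, i32 0
  %e1 = extractelement <4 x float> %v, i32 1
  %p1 = getelementptr float, float* %p, i64 1
  store float %e0, float* %p
  store float %e1, float* %p1   ; consecutive extract+store chain -> one vector store
  ret void
}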
Patch reviewed by Nadav Rotem.
Differential Revision: http://reviews.llvm.org/D6698
llvm-svn: 224611
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.
This fixes PR15872 and provides a partial fix for PR21711.
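A sketch of the pattern (illustrative):
define <4 x i32> @upper_half(<8 x i32> %v) {
  %e4 = extractelement <8 x i32> %v, i32 4
  %e5 = extractelement <8 x i32> %v, i32 5
  %e6 = extractelement <8 x i32> %v, i32 6
  %e7 = extractelement <8 x i32> %v, i32 7
  %b0 = insertelement <4 x i32> undef, i32 %e4, i32 0
  %b1 = insertelement <4 x i32> %b0, i32 %e5, i32 1
  %b2 = insertelement <4 x i32> %b1, i32 %e6, i32 2
  %b3 = insertelement <4 x i32> %b2, i32 %e7, i32 3
  ret <4 x i32> %b3   ; now a shuffle feeding on extract_subvector, not scalarized
}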
Differential Revision: http://reviews.llvm.org/D6678
llvm-svn: 224429
Add an option to disable the optimization that shrinks truncated
larger-type loads into smaller-type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.
llvm-svn: 224084
parity with F32 and F64.
llvm-svn: 223760
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:
OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
Since it's just a one-liner, I've also added BSWAP to the DAGCombiner equivalent:
fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))
Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
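For example (illustrative IR):
define i32 @bswap_or(i32 %x, i32 %y) {
  %bx = call i32 @llvm.bswap.i32(i32 %x)
  %by = call i32 @llvm.bswap.i32(i32 %y)
  %r = or i32 %bx, %by   ; folds to bswap(or %x, %y)
  ret i32 %r
}
declare i32 @llvm.bswap.i32(i32)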
Differential Revision: http://reviews.llvm.org/D6407
llvm-svn: 223349
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 223348
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
declare <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 222632
operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
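A sketch of such a build_vector (illustrative):
define <4 x i32> @mix_with_zeros(<4 x i32> %a) {
  %e0 = extractelement <4 x i32> %a, i32 0
  %e2 = extractelement <4 x i32> %a, i32 2
  %v0 = insertelement <4 x i32> undef, i32 %e0, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 0, i32 1
  %v2 = insertelement <4 x i32> %v1, i32 %e2, i32 2
  %v3 = insertelement <4 x i32> %v2, i32 0, i32 3
  ret <4 x i32> %v3   ; a shuffle of %a with a zero vector, mask <0, 4, 2, 4>
}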
llvm-svn: 222536
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.
llvm-svn: 222522
divisor into FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip )
A hook is added to allow the target to control whether it needs to do such a combine.
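In IR, the input would look like (illustrative; fast-math flags required):
define void @two_divs(float %a, float %b, float %D, float* %p, float* %q) {
  %x = fdiv fast float %a, %D   ; both divisions share the divisor %D
  %y = fdiv fast float %b, %D   ; -> one reciprocal, two multiplies
  store float %x, float* %p
  store float %y, float* %q
  ret void
}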
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This led to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.
This is a follow-up to D6210/r221693.
llvm-svn: 222123
This patch teaches the DAGCombiner how to combine shuffles according to rules:
shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)
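For instance, the second rule in IR (illustrative):
define <4 x i32> @fold_shuffles(<4 x i32> %A, <4 x i32> %B) {
  %s0 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 0, i32 5, i32 2, i32 7>
  %s1 = shufflevector <4 x i32> %s0, <4 x i32> %B, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
  ; combines to: shufflevector %B, %A, <i32 4, i32 0, i32 6, i32 2>
  ret <4 x i32> %s1
}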
llvm-svn: 222090
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.
llvm-svn: 221693
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.
This allows for example to simplify the following code:
define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
%1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
%2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
%3 = or <4 x i32> %1, %2
ret <4 x i32> %3
}
Before this patch llc (-mcpu=corei7) generated:
andps LCPI1_0(%rip), %xmm0, %xmm0
andps LCPI1_1(%rip), %xmm1, %xmm1
orps %xmm1, %xmm0, %xmm0
retq
With this patch we generate a single 'vpblendw'.
llvm-svn: 221343
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.
Commit includes the test that found this, plus another one that got
missed in the original optnone work.
llvm-svn: 221168
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such as alignment, invariance,
and temporality. Apparently in one case I made a copy-paste error and swapped
alignment and invariance.
Tests included.
rdar://18816719
llvm-svn: 220933
llvm-svn: 220857
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two-constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate().
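For reference, one common formulation of that refinement step (presumably the two constants in question) is:
est = est * (1.5 - 0.5 * x * est * est)
where est is the running estimate of 1/sqrt(x); each iteration roughly doubles the number of correct bits.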
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
llvm-svn: 220570
llvm-svn: 220412
llvm-svn: 220342
v2: use dyn_cast
fixup comments
v3: use cast
Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220044
This patch changes the fast-math implementation for calculating sqrt(x) from:
y = 1 / (1 / sqrt(x))
to:
y = x * (1 / sqrt(x))
This has two benefits: less (and faster) code, and one fewer estimate
instruction that may lose precision.
The only target that will be affected (until http://reviews.llvm.org/D5658 is approved)
is PPC. The difference in codegen for PPC is 2 fewer flops for a single-precision sqrtf
or vector sqrtf and 4 fewer flops for a double-precision sqrt.
We also eliminate a constant load and extra register usage.
Differential Revision: http://reviews.llvm.org/D5682
llvm-svn: 219445
llvm-svn: 219368
the DAG combiner.
llvm-svn: 219367
The patch's author points out that, despite the function's documentation,
getSetCCResultType is only used to get the SETCC result type (with one
here-removed problematic exception). In one case, getSetCCResultType was being
used to get the predicate type to use for a SELECT node, and then
SIGN_EXTENDing (or truncating) to get the input predicate to match that type.
Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new
SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was
wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the
extension/truncation seems unnecessary here: SELECT is defined as:
Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1
then the high bits must conform to getBooleanContents.
So here we remove this use of getSetCCResultType and update
getSetCCResultType's documentation to reflect its actual uses.
Patch by deadal nix!
llvm-svn: 219141
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:
float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);
Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:
addis 3, 2, .LCPI4_2@toc@ha
lfs 4, .LCPI4_2@toc@l(3)
addis 3, 2, .LCPI4_1@toc@ha
lfs 0, .LCPI4_1@toc@l(3)
fcmpu 0, 1, 4
beq 0, .LBB4_2
# BB#1:
frsqrtes 4, 1
addis 3, 2, .LCPI4_0@toc@ha
lfs 5, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 5, 1
fmuls 6, 4, 4
fmadds 1, 13, 6, 5
fmuls 1, 4, 1
fres 4, 1 <--- reciprocal of reciprocal square root
fnmsubs 1, 1, 4, 0
fmadds 4, 4, 1, 4
.LBB4_2:
fmuls 1, 4, 2
fres 2, 1
fnmsubs 0, 1, 2, 0
fmadds 0, 2, 0, 2
fmuls 1, 3, 0
blr
After the patch, this simplifies to:
frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr
Differential Revision: http://reviews.llvm.org/D5628
llvm-svn: 219139
that are unused.
This allows the combiner to delete math feeding shuffles where the math
isn't actually necessary. This improves some of the vperm2x128 tests
that regressed when the vector shuffle lowering started actually
generating vperm instructions rather than forcibly decomposing them.
Sadly, this isn't enough to get this *really* right because we still
form a completely unnecessary permutation. To fix that, we also need to
fold shuffles which just rearrange concatenated or inserted subvectors.
llvm-svn: 219086
refinement of an estimate. NFC.
llvm-svn: 218700
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use its 'step' instructions. There also
don't appear to have been any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...
llvm-svn: 218698
to convert it into a shuffle.
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows an immediate early exit from method
'visitBUILD_VECTOR' if the node type is known to be illegal.
No functional change intended.
llvm-svn: 218677
If a store is followed by a store of the same value to the same location, then the second store is dead (a no-op) and can be removed.
This problem is found in spec2006-197.parser.
For example,
stur w10, [x11, #-4]
stur w10, [x11, #-4]
Then one of the two stur instructions can be removed.
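The same situation in IR (illustrative):
define void @dup_store(i32* %p, i32 %v) {
  store i32 %v, i32* %p
  store i32 %v, i32* %p   ; same value, same location: the second store is a no-op
  ret void
}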
Patch by David Xu!
llvm-svn: 218569