path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message | Author | Age | Files | Lines
* [DAGCombine] Remove SIGN_EXTEND-related inf-loop (Hal Finkel, 2014-10-06; 1 file, -6/+2)
  The patch's author points out that, despite the function's
  documentation, getSetCCResultType is only used to get the SETCC result
  type (with one here-removed problematic exception). In one case,
  getSetCCResultType was being used to get the predicate type to use for
  a SELECT node, and then SIGN_EXTENDing (or truncating) to get the input
  predicate to match that type. Unfortunately, this was happening inside
  visitSIGN_EXTEND, and creating new SIGN_EXTEND nodes was causing an
  infinite loop. In addition, this behavior was wrong if a target was not
  using ZeroOrNegativeOneBooleanContent. Lastly, the extension/truncation
  seems unnecessary here: SELECT is defined as:

    Select(COND, TRUEVAL, FALSEVAL)

  If the type of the boolean COND is not i1 then the high bits must
  conform to getBooleanContents.

  So here we remove this use of getSetCCResultType and update
  getSetCCResultType's documentation to reflect its actual uses.

  Patch by deadal nix!

  llvm-svn: 219141
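  To make the getBooleanContents constraint concrete, here is a minimal
  C++ sketch (illustrative only; the function names are invented for the
  example and this is not LLVM API code). Under
  ZeroOrNegativeOneBooleanContent, a boolean widened past i1 must be
  all-ones when true, which is exactly what lets a select be done
  bitwise:

    #include <cassert>
    #include <cstdint>

    // Widen an i1-style boolean: true -> all ones, false -> 0, matching
    // ZeroOrNegativeOneBooleanContent.
    int32_t widenBool(bool B) { return B ? -1 : 0; }

    // With zero-or-negative-one booleans, select reduces to bit math;
    // this only works because the high bits of Cond conform.
    int32_t selectVal(int32_t Cond, int32_t T, int32_t F) {
      return (Cond & T) | (~Cond & F);
    }

    int main() {
      assert(selectVal(widenBool(true), 10, 20) == 10);
      assert(selectVal(widenBool(false), 10, 20) == 20);
    }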
* Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y) (Sanjay Patel, 2014-10-06; 1 file, -0/+22)
  The motivation is to recognize code such as this from
  /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

    float distance = sqrt(dx * dx + dy * dy + dz * dz);
    float mag = dt / (distance * distance * distance);

  Without this patch, we don't match the sqrt as a reciprocal sqrt, so
  for PPC the new testcase in this patch produces:

      addis 3, 2, .LCPI4_2@toc@ha
      lfs 4, .LCPI4_2@toc@l(3)
      addis 3, 2, .LCPI4_1@toc@ha
      lfs 0, .LCPI4_1@toc@l(3)
      fcmpu 0, 1, 4
      beq 0, .LBB4_2
    # BB#1:
      frsqrtes 4, 1
      addis 3, 2, .LCPI4_0@toc@ha
      lfs 5, .LCPI4_0@toc@l(3)
      fnmsubs 13, 1, 5, 1
      fmuls 6, 4, 4
      fmadds 1, 13, 6, 5
      fmuls 1, 4, 1
      fres 4, 1        <--- reciprocal of reciprocal square root
      fnmsubs 1, 1, 4, 0
      fmadds 4, 4, 1, 4
    .LBB4_2:
      fmuls 1, 4, 2
      fres 2, 1
      fnmsubs 0, 1, 2, 0
      fmadds 0, 2, 0, 2
      fmuls 1, 3, 0
      blr

  After the patch, this simplifies to:

      frsqrtes 0, 1
      addis 3, 2, .LCPI4_1@toc@ha
      fres 5, 2
      lfs 4, .LCPI4_1@toc@l(3)
      addis 3, 2, .LCPI4_0@toc@ha
      lfs 7, .LCPI4_0@toc@l(3)
      fnmsubs 13, 1, 4, 1
      fmuls 6, 0, 0
      fnmsubs 2, 2, 5, 7
      fmadds 1, 13, 6, 4
      fmadds 2, 5, 2, 5
      fmuls 0, 0, 1
      fmuls 0, 0, 2
      fmuls 1, 3, 0
      blr

  Differential Revision: http://reviews.llvm.org/D5628

  llvm-svn: 219139
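  A rough C++ sketch of why the fold is profitable (illustrative only;
  std::sqrt's reciprocal stands in for a hardware estimate plus
  refinement, and the fold is only legal under fast-math because FP
  reassociation changes rounding):

    #include <cassert>
    #include <cmath>

    // x / (y * sqrt(z)) rewritten as x * (rsqrt(z) / y): the divide by
    // sqrt(z) becomes a multiply by a cheap reciprocal-sqrt estimate.
    float originalForm(float x, float y, float z) {
      return x / (y * std::sqrt(z));
    }

    float foldedForm(float x, float y, float z) {
      float rsqrtZ = 1.0f / std::sqrt(z); // stands in for rsqrte + refinement
      return x * (rsqrtZ / y);
    }

    int main() {
      float a = originalForm(3.0f, 2.0f, 16.0f); // 3 / (2 * 4) = 0.375
      float b = foldedForm(3.0f, 2.0f, 16.0f);
      assert(std::fabs(a - b) < 1e-6f);
    }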
* [x86, dag] Teach the DAG combiner to prune inputs to a vector_shuffle that are unused (Chandler Carruth, 2014-10-05; 1 file, -0/+93)
  This allows the combiner to delete math feeding shuffles where the math
  isn't actually necessary. This improves some of the vperm2x128 tests
  that regressed when the vector shuffle lowering started actually
  generating vperm instructions rather than forcibly decomposing them.

  Sadly, this isn't enough to get this *really* right because we still
  form a completely unnecessary permutation. To fix that, we also need to
  fold shuffles which just rearrange concatenated or inserted subvectors.

  llvm-svn: 219086
* Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC. (Sanjay Patel, 2014-09-30; 1 file, -60/+62)
  llvm-svn: 218700
* Split the estimate() interface into separate functions for each type. NFC. (Sanjay Patel, 2014-09-30; 1 file, -2/+2)
  It was hacky to use an opcode as a switch because it won't always match
  (rsqrte != sqrte), and it looks like we'll need to add more special
  casing per arch than I had hoped for. Eg, x86 will prefer a different
  NR estimate implementation, and ARM will want to use its 'step'
  instructions. There also don't appear to be any new estimate
  instructions in any arch in a long, long time. Altivec vloge and vexpte
  may have been the first and last in that field...

  llvm-svn: 218698
* [DAG] Check in advance if a build_vector has a legal type before attempting to convert it into a shuffle (Andrea Di Biagio, 2014-09-30; 1 file, -4/+4)
  Currently, the DAG Combiner only tries to convert type-legal
  build_vector nodes into shuffles. This patch simply moves the logic
  that checks if a build_vector has a legal value type up before we even
  start analyzing the operands. This allows us to exit immediately from
  method 'visitBUILD_VECTOR' if the node type is known to be illegal.

  No functional change intended.

  llvm-svn: 218677
* [AArch64] Redundant store instructions should be removed as dead code (James Molloy, 2014-09-27; 1 file, -0/+11)
  If there is a store followed by a store with the same value to the same
  location, then the store is dead/noop. It can be removed. This problem
  is found in spec2006-197.parser.

  For example,

    stur w10, [x11, #-4]
    stur w10, [x11, #-4]

  Then one of the two stur instructions can be removed.

  Patch by David Xu!

  llvm-svn: 218569
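  A minimal C++ sketch of the idea (hypothetical data types; the real
  check walks the SDNode chain and compares base pointer, offset, and
  stored value rather than a flat list):

    #include <cstdint>
    #include <vector>

    // Hypothetical record of a store: (address, value).
    struct Store { uint64_t Addr; uint64_t Val; };

    // Drop a store when the previous store on the chain wrote the same
    // value to the same location and nothing intervened.
    void pruneRedundantStores(std::vector<Store> &Chain) {
      std::vector<Store> Out;
      for (const Store &S : Chain) {
        if (!Out.empty() && Out.back().Addr == S.Addr && Out.back().Val == S.Val)
          continue; // dead/noop store
        Out.push_back(S);
      }
      Chain = Out;
    }

    int main() {
      std::vector<Store> Chain = {{0x1000, 42}, {0x1000, 42}, {0x1008, 7}};
      pruneRedundantStores(Chain);
      return Chain.size() == 2 ? 0 : 1; // the duplicate store was dropped
    }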
* Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2) (Sanjay Patel, 2014-09-26; 1 file, -28/+142)
  This is purely refactoring. No functional changes intended. PowerPC is
  the only target that is currently using this interface.

  The ultimate goal is to allow targets other than PowerPC (certainly X86
  and Aarch64) to turn this:

    z = y / sqrt(x)

  into:

    z = y * rsqrte(x)

  And:

    z = y / x

  into:

    z = y * rcpe(x)

  using whatever HW magic they can use. See
  http://llvm.org/bugs/show_bug.cgi?id=20900 .

  There is one hook in TargetLowering to get the target-specific opcode
  for an estimate instruction along with the number of refinement steps
  needed to make the estimate usable.

  Differential Revision: http://reviews.llvm.org/D5484

  llvm-svn: 218553
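  For context, the usual refinement scheme around such an estimate is
  Newton-Raphson; a C++ sketch of the math for the reciprocal case
  (illustrative only, not the committed code):

    #include <cassert>
    #include <cmath>

    // Refine a reciprocal estimate e ~= 1/d with Newton-Raphson:
    // e' = e * (2 - d * e). Each step roughly doubles the correct bits,
    // which is why targets report a "number of refinement steps".
    float refineRecip(float d, float e, int steps) {
      for (int i = 0; i < steps; ++i)
        e = e * (2.0f - d * e);
      return e;
    }

    int main() {
      float e = refineRecip(3.0f, 0.3f, 3); // crude seed for 1/3
      assert(std::fabs(e - 1.0f / 3.0f) < 1e-6f);
    }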
* Revert patch of r218493 (David Xu, 2014-09-26; 1 file, -14/+0)
  llvm-svn: 218494
* Redundant store instructions should be removed as dead code (David Xu, 2014-09-26; 1 file, -0/+14)
  llvm-svn: 218493
* Use SDValue bool operator to reduce code. No functional change. (Sanjay Patel, 2014-09-23; 1 file, -9/+6)
  llvm-svn: 218314
* Refactor reciprocal square root estimate into target-independent function; NFC. (Sanjay Patel, 2014-09-21; 1 file, -17/+62)
  This is purely a plumbing patch. No functional changes intended.

  The ultimate goal is to allow targets other than PowerPC (certainly X86
  and Aarch64) to turn this:

    z = y / sqrt(x)

  into:

    z = y * rsqrte(x)

  using whatever HW magic they can use. See
  http://llvm.org/bugs/show_bug.cgi?id=20900 .

  The first step is to add a target hook for RSQRTE, take the already
  target-independent code selfishly hoarded by PPC, and put it into
  DAGCombiner.

  Next steps:

    The code in DAGCombiner::BuildRSQRTE() should be refactored further;
    tests that exercise that logic need to be added.

    Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into
    DAGCombiner.

    X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be
    added.

  Differential Revision: http://reviews.llvm.org/D5425

  llvm-svn: 218219
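  The refinement around an rsqrte-style estimate is likewise typically
  Newton-Raphson; a minimal C++ sketch of the math (not the DAG-level
  code):

    #include <cassert>
    #include <cmath>

    // Refine a reciprocal-square-root estimate e ~= 1/sqrt(d) with
    // Newton-Raphson: e' = e * (1.5 - 0.5 * d * e * e). This is the kind
    // of refinement emitted around a target's rsqrte instruction.
    float refineRsqrt(float d, float e, int steps) {
      for (int i = 0; i < steps; ++i)
        e = e * (1.5f - 0.5f * d * e * e);
      return e;
    }

    int main() {
      float e = refineRsqrt(16.0f, 0.2f, 4); // crude seed for 1/sqrt(16)
      assert(std::fabs(e - 0.25f) < 1e-6f);
    }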
* Optionally enable more-aggressive FMA formation in DAGCombine (Hal Finkel, 2014-09-19; 1 file, -5/+10)
  The heuristic used by DAGCombine to form FMAs checks that the FMUL has
  only one use, but this is overly-conservative on some systems.
  Specifically, if the FMA and the FADD have the same latency (and the
  FMA does not compete for resources with the FMUL any more than the FADD
  does), there is no need for the restriction, and furthermore, forming
  the FMA leaving the FMUL can still allow for higher overall throughput
  and decreased critical-path length.

  Here we add a new TLI callback, enableAggressiveFMAFusion, false by
  default, to elide the hasOneUse check. This is enabled for PowerPC by
  default, as most PowerPC systems will benefit.

  Patch by Olivier Sallenave, thanks!

  llvm-svn: 218120
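  A C++ sketch of the trade-off (std::fma stands in for the fused node;
  illustrative only):

    #include <cmath>

    // One fmul with two uses. Fusing both adds duplicates the multiply:
    // with equal FMA/FADD latency and spare FMA throughput, the
    // duplication can still shorten the critical path, which is what
    // enableAggressiveFMAFusion opts into.
    void unfused(float a, float b, float c, float d, float &r1, float &r2) {
      float t = a * b; // single fmul, two uses
      r1 = t + c;
      r2 = t + d;
    }

    void fused(float a, float b, float c, float d, float &r1, float &r2) {
      r1 = std::fma(a, b, c); // multiply duplicated into each FMA
      r2 = std::fma(a, b, d);
    }

    int main() {
      float r1, r2, s1, s2;
      unfused(1.5f, 2.0f, 0.5f, 1.0f, r1, r2);
      fused(1.5f, 2.0f, 0.5f, 1.0f, s1, s2);
      return (r1 == s1 && r2 == s2) ? 0 : 1;
    }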
* Replace dead links to "Hacker's Delight" with general references. NFC. (Sanjay Patel, 2014-09-15; 1 file, -4/+4)
  llvm-svn: 217814
* Add DAG combine for shl + add of constants (Matt Arsenault, 2014-09-11; 1 file, -32/+12)
  Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)

  This is already done for multiplies, but since multiplies by powers of
  two are turned into shifts, we also need to handle it here.

  This might want checks for isLegalAddImmediate to avoid transforming an
  add of a legal immediate with one that isn't.

  llvm-svn: 217610
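  The identity being exploited, as a quick C++ sketch (constants chosen
  arbitrarily for the example):

    #include <cassert>
    #include <cstdint>

    // (x + c1) << c2  ==  (x << c2) + (c1 << c2): hoisting the shift
    // exposes the shifted constant to further folding.
    uint32_t before(uint32_t x) { return (x + 7u) << 3; }
    uint32_t after(uint32_t x)  { return (x << 3) + (7u << 3); }

    int main() {
      for (uint32_t x = 0; x < 1000; ++x)
        assert(before(x) == after(x));
    }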
* Combine fmul vector FP constants when unsafe math is allowed (Sanjay Patel, 2014-09-11; 1 file, -6/+22)
  This is an extension of the change made with r215820:
  http://llvm.org/viewvc/llvm-project?view=revision&revision=215820

  That patch allowed combining of splatted vector FP constants that are
  multiplied. This patch allows combining non-uniform vector FP constants
  too by relaxing the check on the type of vector.

  Also, canonicalize a vector fmul in the same way that we already do for
  scalars - if only one operand of the fmul is a constant, make it
  operand 1. Otherwise, we miss potential folds.

  This fold is also done by -instcombine, but it's possible that extra
  fmuls may have been generated during lowering.

  Differential Revision: http://reviews.llvm.org/D5254

  llvm-svn: 217599
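  The scalar shape of the fold, sketched in C++ (reassociation like this
  is only legal under unsafe-math because it can change rounding; the
  constants here are arbitrary):

    #include <cassert>

    // (x * c1) * c2 reassociated to x * (c1 * c2); canonicalizing the
    // constant to operand 1 is what makes the nested constant visible.
    float twoMuls(float x) { return (x * 4.0f) * 0.5f; }
    float oneMul(float x)  { return x * (4.0f * 0.5f); } // folds to x * 2.0f

    int main() { assert(twoMuls(3.0f) == oneMul(3.0f)); }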
* Build correct vector filled with undef nodes (David Xu, 2014-09-11; 1 file, -4/+20)
  llvm-svn: 217570
* Group unsafe fmul math folds together for easier reading. No functional change. (Sanjay Patel, 2014-09-08; 1 file, -6/+10)
  llvm-svn: 217399
* Fix the FIXME that was just added in r217390 - remove a bunch of redundant fold permutations (Sanjay Patel, 2014-09-08; 1 file, -43/+2)
  The testcases for these folds already exist in
  test/CodeGen/X86/fp-fast.ll.

  llvm-svn: 217393
* group unsafe math folds together for easier reading (Sanjay Patel, 2014-09-08; 1 file, -150/+142)
  Also added a FIXME regarding redundant folds for non-canonicalized
  constants.

  llvm-svn: 217390
* Allow vector fsub ops with constants to get the same optimizations as scalars (Sanjay Patel, 2014-09-05; 1 file, -2/+2)
  This problem is bigger than just fsub, but this is the minimum fix to
  solve fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ),
  and we solve zero subtraction with the same change.

  llvm-svn: 217286
* clean up; NFC (Sanjay Patel, 2014-09-05; 1 file, -2/+2)
  llvm-svn: 217278
* Fix interference caused by fmul 2, x -> fadd x, x (Matt Arsenault, 2014-09-02; 1 file, -8/+21)
  If an fmul was introduced by lowering, it wouldn't be folded into a
  multiply by a constant since the earlier combine would have replaced
  the fmul with the fadd.

  llvm-svn: 216932
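  A small C++ sketch of the interference (illustrative only; the function
  names are invented):

    #include <cassert>

    // fmul x, 2.0 is canonicalized to fadd x, x early, so a later
    // multiply-by-constant fold must also recognize (x + x) * c as
    // x * (2 * c).
    float loweredThenScaled(float x) { return (x + x) * 3.0f; } // was (x * 2) * 3
    float foldedForm(float x) { return x * 6.0f; }

    int main() { assert(loweredThenScaled(2.0f) == foldedForm(2.0f)); }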
* Fix comment and unnecessary check for FP build_vectors (Matt Arsenault, 2014-09-02; 1 file, -5/+1)
  This was copy-paste from the integer version, but FP build_vectors
  don't truncate.

  llvm-svn: 216928
* Enable splitting indexing from loads with TargetConstants (Hal Finkel, 2014-09-02; 1 file, -8/+21)
  When I recommitted r208640 (in r216898) I added an exclusion for
  TargetConstant offsets, as there is no guarantee that a backend can
  handle them on generic ADDs (even if it generates them during
  address-mode matching) -- and, specifically, applying this
  transformation directly with TargetConstants caused a self-hosting
  failure on PPC64. Ignoring all TargetConstants, however, is less than
  ideal. Instead, for non-opaque constants, we can convert them into
  regular constants for use with the generated ADD (or SUB).

  llvm-svn: 216908
* Revert "Revert '[DAGCombiner] Split up an indexed load if only the base ↵Hal Finkel2014-09-021-4/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pointer value is live'" I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The underlying cause of the failure is that pre-inc loads with increments represented by ISD::TargetConstants were being transformed into ISD:::ADDs with ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they were selected as invalid r+r adds. This recommits r208640, rebased and with an exclusion for ISD::TargetConstant increments. This behavior seems correct, although in the future we might want to ask the target to split out the indexing that uses ISD::TargetConstants. Unfortunately, I don't yet have small test case where the relevant invalid 'add' instruction is not itself dead (and thus eliminated by DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things) Original commit message (by Adam Nemet): Right now the load may not get DCE'd because of the side-effect of updating the base pointer. This can happen if we lower a read-modify-write of an illegal larger type (e.g. i48) such that the modification only affects one of the subparts (the lower i32 part but not the higher i16 part). See the testcase. In order to spot the dead load we need to revisit it when SimplifyDemandedBits decided that the value of the load is masked off. This is the CommitTargetLoweringOpt piece. I checked compile time with ARM64 by sending SPEC bitcode files through llc. No measurable change. Fixes <rdar://problem/16031651> llvm-svn: 216898
* Move FNEG next to FABS and make them more similar, so that they can be refactored more easily. NFC. (Sanjay Patel, 2014-08-28; 1 file, -43/+46)
  llvm-svn: 216688
* Do not introduce new shuffle patterns after operation legalization if SHUFFLE_VECTOR was marked custom (Owen Anderson, 2014-08-28; 1 file, -2/+1)
  The target independent DAG combine has no way to know if the shuffles
  it is introducing are ones that the target could support or not.

  llvm-svn: 216678
* Janitorial services: "Don’t duplicate function or class name at the beginning of the comment." (Sanjay Patel, 2014-08-28; 1 file, -134/+119)
  llvm-svn: 216674
* Remove local TLI vars that are just duplicates of the class var. No functional change. (Sanjay Patel, 2014-08-28; 1 file, -2/+0)
  llvm-svn: 216673
* Use local vars to improve readability. No functional change. (Sanjay Patel, 2014-08-28; 1 file, -42/+37)
  Completes what was started in r216611 and r216623. Used const refs
  instead of pointers; not sure if one is preferable to the other.

  llvm-svn: 216672
* Use local variable in visitFADD. No functional change. (Sanjay Patel, 2014-08-27; 1 file, -13/+11)
  llvm-svn: 216623
* Group unsafe-math optimizations for fsub into one block. No functional change. (Sanjay Patel, 2014-08-27; 1 file, -14/+17)
  llvm-svn: 216616
* Use local variable to improve readability. No functional change intended. (Sanjay Patel, 2014-08-27; 1 file, -15/+10)
  llvm-svn: 216611
* Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created (Craig Topper, 2014-08-27; 1 file, -8/+4)
  llvm-svn: 216525
* Use range based for loops to avoid needing to re-mention SmallPtrSet size. (Craig Topper, 2014-08-24; 1 file, -4/+3)
  llvm-svn: 216351
* name change: isPow2DivCheap -> isPow2SDivCheap (Sanjay Patel, 2014-08-21; 1 file, -1/+1)
  isPow2DivCheap

  That name doesn't specify signed or unsigned.

  Lazy as I am, I eventually read the function and variable comments. It
  turns out that this is strictly about signed div. But I discovered that
  the comments are wrong: srl/add/sra is not the general sequence for
  signed integer division by power-of-2. We need one more 'sra':

    sra/srl/add/sra

  That's the sequence produced in DAGCombiner. The first 'sra' may be
  removed when dividing by exactly '2', but that's a special case.

  This patch corrects the comments, changes the name of the flag bit, and
  changes the name of the accessor methods. No functional change
  intended.

  Differential Revision: http://reviews.llvm.org/D5010

  llvm-svn: 216237
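  The sra/srl/add/sra expansion, written out as a C++ sketch (assumes
  arithmetic right shift for signed ints, as on mainstream targets; k = 3
  chosen for the example):

    #include <cassert>
    #include <cstdint>

    // Signed division by 2^k (rounding toward zero) without a divide.
    int32_t sdivPow2(int32_t x) {
      const int k = 3;
      int32_t Sign = x >> 31;                     // sra: 0 or -1
      uint32_t Bias = (uint32_t)Sign >> (32 - k); // srl: 0 or 2^k - 1
      int32_t Adj = x + (int32_t)Bias;            // add: bias negatives up
      return Adj >> k;                            // sra: the actual divide
    }

    int main() {
      for (int32_t x = -64; x <= 64; ++x)
        assert(sdivPow2(x) == x / 8);
    }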
* DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments (Benjamin Kramer, 2014-08-21; 1 file, -1/+6)
  PR20677

  llvm-svn: 216175
* Fix fmul combines with constant splat vectors (Matt Arsenault, 2014-08-16; 1 file, -7/+26)
  Fixes things like fmul x, 2 -> fadd x, x

  llvm-svn: 215820
* [DAGCombiner] Improve the folding of target independent shuffles to Undef (Andrea Di Biagio, 2014-08-16; 1 file, -0/+16)
  When combining a pair of shuffle nodes, check if the combined shuffle
  mask is trivially Undef. In that case, immediately fold that pair of
  shuffles to Undef.

  The lack of checks for undef masks was the root-cause of a poor-codegen
  bug in the dag combiner.

  Example:

    %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
    %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
    %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>

  Before this patch, on x86 (with -mcpu=corei7) we failed to fold the
  entire sequence to Undef value and therefore we generated:

    shufps $-123, %xmm1, $xmm0
    pshufd $-46, %xmm0, %xmm0

  With this patch, the entire shuffle sequence is folded to Undef and no
  shuffles are generated in the output assembly.

  Added new test cases to test 'combine-vec-shuffle-5.ll'.

  llvm-svn: 215797
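  A C++ sketch of the mask composition behind this (simplified to
  single-input masks; the real combiner also tracks which input each lane
  comes from):

    #include <cassert>
    #include <vector>

    // Element i of the outer shuffle reads lane M2[i] of the inner
    // result, i.e. lane M1[M2[i]] of the original input. -1 denotes
    // undef; if every composed lane is undef, the chain folds to undef.
    std::vector<int> composeMasks(const std::vector<int> &M1,
                                  const std::vector<int> &M2) {
      std::vector<int> Out(M2.size());
      for (size_t i = 0; i < M2.size(); ++i)
        Out[i] = (M2[i] < 0) ? -1 : M1[M2[i]];
      return Out;
    }

    int main() {
      // Inner mask leaves lanes 0 and 2 undef; outer mask reads only those.
      std::vector<int> M1 = {-1, 1, -1, 3};
      std::vector<int> M2 = {0, 2, 0, 2};
      for (int Lane : composeMasks(M1, M2))
        assert(Lane == -1); // trivially undef -> fold the pair to undef
    }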
* optimize vector fneg of bitcasted integer value (Sanjay Patel, 2014-08-14; 1 file, -9/+14)
  This patch allows a vector fneg of a bitcasted integer value to be
  optimized in the same way that we already optimize a scalar fneg. If
  the integer variable is a constant, we can precompute the result and
  not require any logic ops.

  This patch is very similar to a fabs patch committed at r214892.

  Differential Revision: http://reviews.llvm.org/D4852

  llvm-svn: 215646
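  The bit-level trick, as a C++ sketch (scalar for brevity; the patch
  applies the same idea per vector lane):

    #include <cassert>
    #include <cstdint>
    #include <cstring>

    // fneg on a bitcasted integer is just flipping the sign bit, so for
    // a constant input the result can be precomputed with no logic ops.
    float fnegViaInt(float f) {
      uint32_t Bits;
      std::memcpy(&Bits, &f, sizeof(Bits));
      Bits ^= 0x80000000u; // flip the IEEE-754 sign bit
      float R;
      std::memcpy(&R, &Bits, sizeof(R));
      return R;
    }

    int main() {
      assert(fnegViaInt(1.5f) == -1.5f);
      assert(fnegViaInt(-2.0f) == 2.0f);
    }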
* [SDAG] Fix a bug in the DAG combiner where we would fail to return the input node after manually adding it to the worklist and using CombineTo (Chandler Carruth, 2014-08-14; 1 file, -5/+1)
  Once we use CombineTo the input node may have been deleted. Despite
  this being *completely confusing* and somewhat broken, the only way to
  "correctly" return from a DAG combine after potentially deleting the
  input node is to return *that exact node*....

  But really, this code should just never have used CombineTo. It won't
  do what it wants (returning the node as mentioned above just causes the
  combine to infloop). The correct way to combine away a casted load to a
  load of the correct type is to RAUW the chain directly and then return
  the loaded value to replace the actual value node.

  I managed to find this with the vector shuffle fuzzer even though it
  clearly has nothing at all to do with vector shuffles and rather those
  happen to trigger a load of a constant pool that hits this combine
  *just right*. I've included the test as it is small and a nice stress
  test that the infrastructure isn't asserting.

  llvm-svn: 215622
* [DAGCombiner] Improved target independent vector shuffle combine rule (Andrea Di Biagio, 2014-08-13; 1 file, -10/+24)
  This patch improves the existing algorithm in DAGCombiner that attempts
  to fold shuffles according to rule:

    shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)

  Before this change, there were cases where the DAGCombiner
  conservatively avoided folding shuffles even if the resulting mask
  would have been legal. That is because the algorithm wrongly assumed
  that commuting an illegal shuffle mask would always produce an illegal
  mask.

  With this change, we now correctly compute the commuted shuffle mask
  before calling method 'isShuffleMaskLegal' on it. On X86, this improves
  for example the codegen for the following function:

    define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
      %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
      %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
      ret <4 x i32> %2
    }

  Before this change the X86 backend (-mcpu=corei7) generated the
  following assembly code for function @test:

    shufps $-23, %xmm0, %xmm1  # xmm1 = xmm1[1,2],xmm0[2,3]
    movhlps %xmm1, %xmm1       # xmm1 = xmm1[1,1]
    movaps %xmm1, %xmm0

  Now we produce:

    movhlps %xmm0, %xmm0       # xmm0 = xmm0[1,1]

  Added extra test cases in combine-vec-shuffle-2.ll to verify that we
  correctly fold according to the above-mentioned rule.

  llvm-svn: 215555
* [x86] Fold extract_vector_elt of a load into the Load's address computation (Michael J. Spencer, 2014-08-11; 1 file, -90/+125)
  llvm-svn: 215409
* Optimize vector fabs of bitcasted constant integer values (Sanjay Patel, 2014-08-05; 1 file, -9/+15)
  Allow vector fabs operations on bitcasted constant integer values to be
  optimized in the same way that we already optimize scalar fabs.

  So for code like this:

    %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
    %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
    %ret = bitcast <2 x float> %fabs to i64

  Instead of generating something like this:

    movabsq (constant pool load of mask for sign bits)
    vmovq   (move from integer register to vector/fp register)
    vandps  (mask off sign bits)
    vmovq   (move vector/fp register back to integer return register)

  We should generate:

    mov     (put constant value in return register)

  I have also removed a redundant clause in the first 'if' statement:

    N0.getOperand(0).getValueType().isInteger()

  is the same thing as:

    IntVT.isInteger()

  Testcases for x86 and ARM added to existing files that deal with vector
  fabs. One existing testcase for x86 removed because it is no longer
  ideal.

  For more background, please see:
  http://reviews.llvm.org/D4770

  And:
  http://llvm.org/bugs/show_bug.cgi?id=20354

  Differential Revision: http://reviews.llvm.org/D4785

  llvm-svn: 214892
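  The underlying bit trick, as a C++ sketch (scalar for brevity; for a
  constant input the masked value is computable at compile time, which is
  why no mask load or 'and' is needed at runtime):

    #include <cassert>
    #include <cstdint>
    #include <cstring>

    // fabs on a bitcasted integer clears the sign bit.
    float fabsViaInt(float f) {
      uint32_t Bits;
      std::memcpy(&Bits, &f, sizeof(Bits));
      Bits &= 0x7FFFFFFFu; // clear the IEEE-754 sign bit
      float R;
      std::memcpy(&R, &Bits, sizeof(R));
      return R;
    }

    int main() {
      assert(fabsViaInt(-3.25f) == 3.25f);
      assert(fabsViaInt(3.25f) == 3.25f);
    }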
* [SDAG] Fix a really, really terrible bug in the DAG combiner (Chandler Carruth, 2014-08-04; 1 file, -2/+2)
  This code is completely wrong. It is also dead, as if it were to *ever*
  run, it would crash. Fortunately, after my work to the combiner, it is
  at least *possible* to reach the code, and llvm-stress has found a test
  case. Thanks to Patrick for reporting.

  It would be really good if anyone who remembers how this code works and
  what it was intended to do could add some more obvious test coverage
  instead of my completely contrived and reduced test case. My test case
  was so brittle I left a bread crumb comment in it to help the next
  person to stumble on it and not know what it was actually testing for.

  llvm-svn: 214785
* Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. (Eric Christopher, 2014-08-04; 1 file, -3/+10)
  llvm-svn: 214781
* [x86] Don't add nodes to the combined set (and prune subsequent combines) until they are legal (Chandler Carruth, 2014-08-03; 1 file, -8/+8)
  Doing it the old way could, when the stars align *just* right, cause a
  node to get into the combine set prior to being legalized. Then, when
  the same node showed up as an operand to another node later on (but not
  so much later on that it had been deleted as dead) we would fail to add
  it back to the worklist thinking it had already been combined. This
  would in turn cause it to not be legalized. Fortunately, we can also
  walk the operands looking for uncombined (and thus potentially
  un-legalized) nodes late. It will still ensure that we walk all
  operands of all nodes and send all of them through both the legalizer
  without changes and the combiner at least once. (Which was the original
  goal of this).

  I have a test case for this bug, but it is terribly brittle. For
  example, it will stop finding the bug the moment I enable the new
  shuffle lowering. I don't yet have any test case that reliably
  exercises this bug, and it isn't clear that it will be possible to
  craft one. It is entirely possible that with the new shuffle lowering
  the two forms of doing this are precisely equivalent. That doesn't mean
  we shouldn't take the more conservative approach of insisting on things
  in the combined set having survived the legalizer.

  llvm-svn: 214673
* fix for PR20354 - Miscompile of fabs due to vectorization (Sanjay Patel, 2014-08-03; 1 file, -1/+5)
  This is intended to be the minimal change needed to fix PR20354
  ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector
  operation was wrong; we need to check that the fabs itself is not a
  vector operation.

  This patch will not generate the optimal code. A constant pool load and
  'and' op will be generated instead of just returning a value that we
  can calculate in advance (as we do for the scalar case). I've put a
  'TODO' comment for that here and expect to have that patch ready soon.

  There is a very similar optimization that we can do in visitFNEG, so
  I've put another 'TODO' there and expect to have another patch for that
  too.

  llvm-svn: 214670
* [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available (James Molloy, 2014-08-02; 1 file, -0/+7)
  The combiner was creating Q-register loads and stores, which then had
  to be spilled because there are no callee-save Q registers!

  llvm-svn: 214634