bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SDAG] Fix Stale SDNode usage in visitAND	Nirav Dave	2017-03-28	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reorder CombineTo Calls to prevent potential use of deleted node. Fixes PR32372. Reviewers: jnspaulsson, RKSimon, uweigand, jonpa Reviewed By: jonpa Subscribers: jonpa, llvm-commits Differential Revision: https://reviews.llvm.org/D31346 llvm-svn: 298920
*	[SDAG] Minor cleanup of variable usage. NFC.	Nirav Dave	2017-03-28	1	-2/+2
\| \| \| \|	llvm-svn: 298916
*	[SDAG] Fix zeroExtend assertion error	Nirav Dave	2017-03-23	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move CombineTo preventing deleted node from being returned in visitZERO_EXTEND. Fixes PR32284. Reviewers: RKSimon, bogner Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31254 llvm-svn: 298604
*	Rename AttributeSet to AttributeList	Reid Kleckner	2017-03-21	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393
*	DAG: Fold bitcast/extract_vector_elt of undef to undef	Matt Arsenault	2017-03-21	1	-0/+6
\| \| \| \| \| \|	Fixes not eliminating store when intrinsic is lowered to undef. llvm-svn: 298385
*	[SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.	Jonas Paulsson	2017-03-16	1	-28/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't scalarize VSELECT->SETCC when operands/results needs to be widened, or when the type of the SETCC operands are different from those of the VSELECT. (VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled. The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is no longer needed and has been removed. Updated tests: test/CodeGen/ARM/vuzp.ll test/CodeGen/NVPTX/f16x2-instructions.ll test/CodeGen/X86/2011-10-19-widen_vselect.ll test/CodeGen/X86/2011-10-21-widen-cmp.ll test/CodeGen/X86/psubus.ll test/CodeGen/X86/vselect-pcmp.ll Review: Eli Friedman, Simon Pilgrim https://reviews.llvm.org/D29489 llvm-svn: 297930
*	[DAGCombine] Bail out if can't create a vector with at least two elements	Zvi Rackover	2017-03-15	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fixes pr32278 Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30978 llvm-svn: 297878
*	[SelectionDAG] Add a signed integer absolute ISD node	Simon Pilgrim	2017-03-14	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \|	Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780
*	[DAG] vector div/rem with any zero element in divisor is undef	Sanjay Patel	2017-03-14	1	-9/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the backend counterpart to: https://reviews.llvm.org/rL297390 https://reviews.llvm.org/rL297409 and follow-up to: https://reviews.llvm.org/rL297384 It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those someday. Differential Revision: https://reviews.llvm.org/D30826 llvm-svn: 297762
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2017-03-14	1	-370/+390
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695
*	[DAGCombiner] Do various combine on uaddo.	Amaury Sechet	2017-03-09	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This essentially does the same transform as for ADC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30417 llvm-svn: 297416
*	[DAGCombiner] Do various combine on usubo.	Amaury Sechet	2017-03-09	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This essentially does the same transform as for SUBC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30437 llvm-svn: 297404
*	[DAG] recognize div/rem by 0 as undef before trying constant folding	Sanjay Patel	2017-03-09	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed in the review thread for rL297026, this is actually 2 changes that would independently fix all of the test cases in the patch: 1. Return undef in FoldConstantArithmetic for div/rem by 0. 2. Move basic undef simplifications for div/rem (simplifyDivRem()) before foldBinopIntoSelect() as a matter of efficiency. I will handle the case of vectors with any zero element as a follow-up. That change is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic(). I'm deleting the test for PR30693 because it does not test for the actual bug any more (dangers of using bugpoint). Differential Revision: https://reviews.llvm.org/D30741 llvm-svn: 297384
*	DAG: Check no signed zeros instead of unsafe math attribute	Matt Arsenault	2017-03-09	1	-2/+2
\| \| \| \|	llvm-svn: 297354
*	[DAGCombine] Simplify ISD::AND in GetDemandedBits.	Eli Friedman	2017-03-08	1	-0/+11
\| \| \| \| \| \| \| \| \|	This helps in cases involving bitfields where an AND is exposed by legalization. Differential Revision: https://reviews.llvm.org/D30472 llvm-svn: 297249
*	[DAG] refactor related div/rem folds; NFCI	Sanjay Patel	2017-03-06	1	-28/+32
\| \| \| \| \| \| \| \| \| \| \|	This is known incomplete and not called in the right order relative to other folds, but that's the current behavior. I'm just trying to clean this up before making actual functional changes to make the patch smaller. The logic here should mimic the IR equivalents that are in InstSimplify's simplifyDivRem(). llvm-svn: 297086
*	[DAGCombiner] simplify div/rem-by-0	Sanjay Patel	2017-03-06	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactoring of duplicated code and more fixes to follow. This is motivated by the post-commit comments for r296699: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html Ie, we can crash if we're missing obvious simplifications like this that exist in the IR simplifier or if these occur later than expected. The x86 change for non-splat division shows a potential opportunity to improve vector codegen: we assumed that since only one lane had meaningful results, we should do the math in scalar. But that means moving back and forth from vector registers. llvm-svn: 297026
*	[DAG] fix formatting; NFC	Sanjay Patel	2017-03-06	1	-2/+1
\| \| \| \|	llvm-svn: 297015
*	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)	Sanjay Patel	2017-03-04	1	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977
*	Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creation	Simon Pilgrim	2017-03-03	1	-1/+1
\| \| \| \| \| \|	Avoids all the unnecessary extra bitrange creation/shift stages. llvm-svn: 296871
*	[SDAG] Revert r296476 (and r296486, r296668, r296690).	Chandler Carruth	2017-03-03	1	-378/+369
\| \| \| \| \| \| \| \| \| \|	This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862
*	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)	Taewook Oh	2017-03-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 llvm-svn: 296825
*	[DAGCombiner] avoid assertion when folding binops with opaque constants	Sanjay Patel	2017-03-02	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). llvm-svn: 296768
*	[DAGCombiner] fold binops with constant into select-of-constants	Sanjay Patel	2017-03-01	1	-0/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699
*	[DAGCombiner] Remove non-ascii character and reflow comment.	Benjamin Kramer	2017-03-01	1	-5/+4
\| \| \| \|	llvm-svn: 296690
*	[DAG] Prevent Stale nodes from entering worklist	Nirav Dave	2017-03-01	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add check that deleted nodes do not get added to worklist. This can occur when a node's operand is simplified to an existing node. This fixes PR32108. Reviewers: jyknight, hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30506 llvm-svn: 296668
*	[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in load combine	Artur Pilipenko	2017-03-01	1	-8/+19
\| \| \| \| \| \| \| \| \| \| \| \|	Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336). Support {a\|s}ext, {a\|z\|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 296651
*	[DAGCombiner] use dyn_cast values in foldSelectOfConstants(); NFC	Sanjay Patel	2017-02-28	1	-6/+8
\| \| \| \|	llvm-svn: 296502
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2017-02-28	1	-369/+373
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296476
*	Revert "DAG: Check if extract_vector_elt is legal or custom"	Matt Arsenault	2017-02-27	1	-1/+1
\| \| \| \| \| \| \|	This reverts r295782. This could potentially result in some legalization loops and I avoided the need for this. llvm-svn: 296393
*	[X86][SSE] Attempt to extract vector elements through target shuffles	Simon Pilgrim	2017-02-27	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381
*	[DAGCombine] Fix for a load combine bug with non-zero offset patterns on BE ↵	Artur Pilipenko	2017-02-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	targets This pattern is essentially a i16 load from p+1 address: %p1.i16 = bitcast i8* %p to i16* %p2.i8 = getelementptr i8, i8* %p, i64 2 %v1 = load i16, i16* %p1.i16 %v2.i8 = load i8, i8* %p2.i8 %v2 = zext i8 %v2.i8 to i16 %v1.shl = shl i16 %v1, 8 %res = or i16 %v1.shl, %v2 Current implementation would identify %v1 load as the first byte load and would mistakenly emit a i16 load from %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load address. This way this address can be used as the base address for the combined value. Otherwise just give up combining. llvm-svn: 296336
*	[DAGCombine] NFC. MatchLoadCombine extract MemoryByteOffset lambda helper	Artur Pilipenko	2017-02-27	1	-9/+13
\| \| \| \| \| \|	This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296332
*	[DAGCombine] NFC. MatchLoadCombine remember the first byte provider, not the ↵	Artur Pilipenko	2017-02-27	1	-3/+5
\| \| \| \| \| \| \| \|	load node This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets. llvm-svn: 296331
*	Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵	Nirav Dave	2017-02-26	1	-373/+369
\| \| \| \| \| \| \| \|	UseAA is enabled." This reverts commit r296252 until 256-bit operations are more efficiently generated in X86. llvm-svn: 296279
*	No need to copy the variable [NFC]	Artyom Skrobov	2017-02-25	1	-2/+1
\| \| \| \|	llvm-svn: 296259
*	In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵	Nirav Dave	2017-02-25	1	-369/+373
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252
*	[DAGCombiner] add missing folds for scalar select of {-1,0,1}	Sanjay Patel	2017-02-24	1	-3/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The motivation for filling out these select-of-constants cases goes back to D24480, where we discussed removing an IR fold from add(zext) --> select. And that goes back to: https://reviews.llvm.org/rL75531 https://reviews.llvm.org/rL159230 The idea is that we should always canonicalize patterns like this to a select-of-constants in IR because that's the smallest IR and the best for value tracking. Note that we currently do the opposite in some cases (like the cases in this patch). Ie, the proposed folds in this patch already exist in InstCombine today: https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151 As this patch shows, most targets generate better machine code for simple ext/add/not ops rather than a select of constants. So the follow-up steps to make this less of a patchwork of special-case folds and missing IR canonicalization: 1. Have DAGCombiner convert any select of constants into ext/add/not ops. 2 Have InstCombine canonicalize in the other direction (create more selects). Differential Revision: https://reviews.llvm.org/D30180 llvm-svn: 296137
*	[DAG] add convenience function to get -1 constant; NFCI	Sanjay Patel	2017-02-23	1	-32/+15
\| \| \| \|	llvm-svn: 296004
*	[DAGCombiner] revert r295336	Bill Seurer	2017-02-22	1	-19/+8
\| \| \| \| \| \| \| \| \| \| \|	r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849
*	DAG: Check if extract_vector_elt is legal or custom	Matt Arsenault	2017-02-21	1	-1/+1
\| \| \| \| \| \| \|	Avoids test regressions in future AMDGPU commits when more vector types are custom lowered. llvm-svn: 295782
*	Strip trailing whitespace.	Simon Pilgrim	2017-02-20	1	-1/+1
\| \| \| \|	llvm-svn: 295653
*	[DAGCombiner] split i1 select-of-constants from non-i1 case; NFCI	Sanjay Patel	2017-02-17	1	-9/+25
\| \| \| \| \| \|	I can't find any tests of the non-i1 code path, so it may be unnecessary at this point. llvm-svn: 295463
*	Fix signed/unsigned comparison warning.	Simon Pilgrim	2017-02-17	1	-2/+2
\| \| \| \|	llvm-svn: 295453
*	[DAGCombine] Recognise any_extend_vector_inreg and truncation style shuffle ↵	Simon Pilgrim	2017-02-17	1	-0/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	masks During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing). This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types. The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712). Differential Revision: https://reviews.llvm.org/D29454 llvm-svn: 295451
*	[DAGCombiner] improve readability; NFCI	Sanjay Patel	2017-02-17	1	-44/+32
\| \| \| \|	llvm-svn: 295447
*	[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in load combine	Artur Pilipenko	2017-02-16	1	-8/+19
\| \| \| \| \| \| \| \| \| \| \| \|	Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a\|s}ext, {a\|z\|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336
*	Rever -r295314 "[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in ↵	Artur Pilipenko	2017-02-16	1	-19/+8
\| \| \| \| \| \| \| \|	load combine" This change causes some of AMDGPU and PowerPC tests to fail. llvm-svn: 295316
*	[DAGCombiner] Support {a\|s}ext, {a\|z\|s}ext load nodes in load combine	Artur Pilipenko	2017-02-16	1	-8/+19
\| \| \| \| \| \| \| \| \| \|	Support {a\|s}ext, {a\|z\|s}ext load nodes as a part of load combine patters. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295314
*	[DAG] Don't try to create an INSERT_SUBVECTOR with an illegal source	Michael Kuperstein	2017-02-15	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \|	We currently can't legalize those, but we should really not be creating them in the first place, since legalization would probably look similar to the way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD. This fixes PR311956. Differential Revision: https://reviews.llvm.org/D29961 llvm-svn: 295213