summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [SDAG] Fix visitAND optimization to deal with vector extract case again.Nirav Dave2017-04-061-1/+1
| | | | | | | | | | | | | | | Summary: Fix case elided by rL298920. Fixes PR32545. Reviewers: eli.friedman, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31759 llvm-svn: 299688
* [DAGCombine] Support FMF contract in fused multiple-and-sub tooAdam Nemet2017-04-051-28/+34
| | | | | | | | | This is a follow-on to r299096 which added support for fmadd. Subtract does not have the case where with two multiply operands we commute in order to fuse with the multiply with the fewer uses. llvm-svn: 299572
* [DAGCombine] Remove commented-out code from r299096Adam Nemet2017-04-051-1/+1
| | | | llvm-svn: 299571
* [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to ↵Sanjay Patel2017-04-051-0/+15
| | | | | | | | | | | | | | | | | bitwise logic+setcc (PR32401) This is a generic combine enabled via target hook to reduce icmp logic as discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401 It's likely that other targets will want to enable this hook for scalar transforms, and there are probably other patterns that can use bitwise logic to reduce comparisons. Note that we are missing an IR canonicalization for these patterns, and we will probably prefer the pair-of-compares form in IR (shorter, more likely to fold). Differential Revision: https://reviews.llvm.org/D31483 llvm-svn: 299542
* [DAGCombine][InstCombine] Fix inverted if condition in equivalent comments ↵Craig Topper2017-04-031-1/+1
| | | | | | in DAGCombine and InstCombine. NFC llvm-svn: 299378
* Revert "[DAGCombine] A shuffle of a splat is always the splat itself"Zvi Rackover2017-04-031-6/+0
| | | | | | | | | | This reverts commit r299047 which is incorrect because the simplification may result in incorrect propogation of undefs to users of the folded shuffle. Thanks to Andrea Di Biagio for pointing this out. llvm-svn: 299368
* [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt ↵Craig Topper2017-04-031-2/+2
| | | | | | | | | | class. Implement them without memory allocation for multiword This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation. Differential Revision: https://reviews.llvm.org/D31565 llvm-svn: 299362
* [DAGCombiner] Check limits before accessing array element (PR32502)Simon Pilgrim2017-04-031-1/+1
| | | | llvm-svn: 299361
* [DAGCombiner] enable vector transforms for any/all {sign} bits set/clearSanjay Patel2017-04-011-13/+17
| | | | | | | | The code already allowed vector types in via "isInteger" (which might want a more specific name), so use splat-friendly constant predicates to match those types. llvm-svn: 299304
* [DAGCombiner] Fix fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, ↵Craig Topper2017-04-011-1/+1
| | | | | | | | | | B, Mask) to explicitly ensure that only one of the inputs of each shuffle is a zero vector. This can only happen when we have a mix of zero and undef elements and the two vectors have a different arrangement of zeros/undefs. The shuffle should eventually be constant folded to all zeros. Fixes PR32484. llvm-svn: 299291
* [DAGCombiner] refactor and/or-of-setcc to get rid of duplicated code; NFCISanjay Patel2017-03-311-90/+39
| | | | llvm-svn: 299266
* [DAGCombiner] add fold for 'All sign bits set?'Sanjay Patel2017-03-311-2/+4
| | | | | | | | | | (and (setlt X, 0), (setlt Y, 0)) --> (setlt (and X, Y), 0) We have 7 similar folds, but this one got away. The fact that the x86 test with a branch didn't change is probably a separate bug. We may also be missing this and the related folds in instcombine. llvm-svn: 299252
* [DAGCombiner] remove redundant code and add comments; NFCISanjay Patel2017-03-311-10/+13
| | | | llvm-svn: 299241
* [DAGCombiner] Initial support for the fast-math flag contractAdam Nemet2017-03-301-19/+31
| | | | | | | | | | | | | | | | Now alternatively to the TargetOption.AllowFPOpFusion global flag, FMUL->FADD can also use the per operation FMF to allow fusion. The idea here is not to port everything to the new scheme (e.g. fused multiply-and-sub will be ported later) but that this work all the way from clang. The transformation is conditionalized on *both* the FADD and the FMUL having the FMF contract flag. Differential Revision: https://reviews.llvm.org/D31169 llvm-svn: 299096
* [DAGCombiner] add helper function for visitORLike; NFCISanjay Patel2017-03-301-55/+75
| | | | | | | | | | | | | | | | This combines all of the equivalent clean-ups for foldAndOfSetCCs: https://reviews.llvm.org/rL298938 https://reviews.llvm.org/rL298940 https://reviews.llvm.org/rL298944 https://reviews.llvm.org/rL298949 https://reviews.llvm.org/rL298950 https://reviews.llvm.org/rL299002 https://reviews.llvm.org/rL299013 The sins of code duplication are on full display here: each function is missing a fold that wasn't copied over from its logical sibling. llvm-svn: 299091
* [DAGCombine] A shuffle of a splat is always the splat itselfZvi Rackover2017-03-301-0/+6
| | | | | | | | | | | | | | | | | | | | Summary: Add a simplification: shuffle (splat-shuffle), undef, M --> splat-shuffle Fixes pr32449 Patch by Sanjay Patel Reviewers: eli.friedman, RKSimon, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31426 llvm-svn: 299047
* [DAGCombiner] Remove else after return. NFCI.Davide Italiano2017-03-291-7/+4
| | | | llvm-svn: 299022
* [DAGCombiner] unify type checks and add asserts; NFCISanjay Patel2017-03-291-52/+58
| | | | | | We had a mix of type checks and usage that wasn't very clear. llvm-svn: 299013
* [DAGCombiner] reduce code duplication by rearranging checks; NFCISanjay Patel2017-03-291-44/+38
| | | | llvm-svn: 299002
* [DAGCombiner] reduce code duplication with local variables; NFCISanjay Patel2017-03-281-21/+21
| | | | llvm-svn: 298954
* [DAGCombiner] remove redundant conditions and duplicated code; NFCISanjay Patel2017-03-281-10/+8
| | | | llvm-svn: 298949
* [DAGCombiner] rename variables in foldAndOfSetCCs for easier reading; NFCISanjay Patel2017-03-281-32/+30
| | | | llvm-svn: 298944
* [DAGCombiner] clean up foldAndOfSetCCs; NFCISanjay Patel2017-03-281-77/+75
| | | | | | | | 1. Fix bogus comment. 2. Early exit to reduce indent. 3. Change node pointer param to what it really is: an SDLoc. llvm-svn: 298940
* [DAGCombiner] add helper function for and-of-setcc folds; NFCSanjay Patel2017-03-281-25/+37
| | | | | | This is just a cut and paste followed by clang-format. Clean up to follow. llvm-svn: 298938
* [SDAG] Deal with deleted node in PromoteIntShiftOpNirav Dave2017-03-281-5/+11
| | | | | | | | | | | | | | | Deal with case that initial node is deleted during dag-combine leading to an assertional failure in promoteIntShiftOp. Fixes PR32420. Reviewers: spatel, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31403 llvm-svn: 298931
* [SDAG] Avoid deleted SDNodes PromoteIntBinOpNirav Dave2017-03-281-20/+19
| | | | | | | | | | | | | | | Reorder work in PromoteIntBinOp to prevent stale (deleted) nodes from being used. Fixes PR32340 and PR32345. Reviewers: hfinkel, dbabokin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31148 llvm-svn: 298923
* [SDAG] Fix Stale SDNode usage in visitANDNirav Dave2017-03-281-4/+4
| | | | | | | | | | | | | | | Reorder CombineTo Calls to prevent potential use of deleted node. Fixes PR32372. Reviewers: jnspaulsson, RKSimon, uweigand, jonpa Reviewed By: jonpa Subscribers: jonpa, llvm-commits Differential Revision: https://reviews.llvm.org/D31346 llvm-svn: 298920
* [SDAG] Minor cleanup of variable usage. NFC.Nirav Dave2017-03-281-2/+2
| | | | llvm-svn: 298916
* [SDAG] Fix zeroExtend assertion errorNirav Dave2017-03-231-1/+2
| | | | | | | | | | | | | | | | | Move CombineTo preventing deleted node from being returned in visitZERO_EXTEND. Fixes PR32284. Reviewers: RKSimon, bogner Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31254 llvm-svn: 298604
* Rename AttributeSet to AttributeListReid Kleckner2017-03-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393
* DAG: Fold bitcast/extract_vector_elt of undef to undefMatt Arsenault2017-03-211-0/+6
| | | | | | Fixes not eliminating store when intrinsic is lowered to undef. llvm-svn: 298385
* [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.Jonas Paulsson2017-03-161-28/+0
| | | | | | | | | | | | | | | | | | | | | | | | Don't scalarize VSELECT->SETCC when operands/results needs to be widened, or when the type of the SETCC operands are different from those of the VSELECT. (VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled. The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is no longer needed and has been removed. Updated tests: test/CodeGen/ARM/vuzp.ll test/CodeGen/NVPTX/f16x2-instructions.ll test/CodeGen/X86/2011-10-19-widen_vselect.ll test/CodeGen/X86/2011-10-21-widen-cmp.ll test/CodeGen/X86/psubus.ll test/CodeGen/X86/vselect-pcmp.ll Review: Eli Friedman, Simon Pilgrim https://reviews.llvm.org/D29489 llvm-svn: 297930
* [DAGCombine] Bail out if can't create a vector with at least two elementsZvi Rackover2017-03-151-2/+5
| | | | | | | | | | | | | | | | Summary: Fixes pr32278 Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30978 llvm-svn: 297878
* [SelectionDAG] Add a signed integer absolute ISD nodeSimon Pilgrim2017-03-141-0/+29
| | | | | | | | | | | | Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780
* [DAG] vector div/rem with any zero element in divisor is undefSanjay Patel2017-03-141-9/+1
| | | | | | | | | | | | | | | | This is the backend counterpart to: https://reviews.llvm.org/rL297390 https://reviews.llvm.org/rL297409 and follow-up to: https://reviews.llvm.org/rL297384 It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those someday. Differential Revision: https://reviews.llvm.org/D30826 llvm-svn: 297762
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-03-141-370/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting with compiler time improvements Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 297695
* [DAGCombiner] Do various combine on uaddo.Amaury Sechet2017-03-091-0/+35
| | | | | | | | | | | | Summary: This essentially does the same transform as for ADC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30417 llvm-svn: 297416
* [DAGCombiner] Do various combine on usubo.Amaury Sechet2017-03-091-0/+34
| | | | | | | | | | | | Summary: This essentially does the same transform as for SUBC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30437 llvm-svn: 297404
* [DAG] recognize div/rem by 0 as undef before trying constant foldingSanjay Patel2017-03-091-11/+11
| | | | | | | | | | | | | | | | | | | | As discussed in the review thread for rL297026, this is actually 2 changes that would independently fix all of the test cases in the patch: 1. Return undef in FoldConstantArithmetic for div/rem by 0. 2. Move basic undef simplifications for div/rem (simplifyDivRem()) before foldBinopIntoSelect() as a matter of efficiency. I will handle the case of vectors with any zero element as a follow-up. That change is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic(). I'm deleting the test for PR30693 because it does not test for the actual bug any more (dangers of using bugpoint). Differential Revision: https://reviews.llvm.org/D30741 llvm-svn: 297384
* DAG: Check no signed zeros instead of unsafe math attributeMatt Arsenault2017-03-091-2/+2
| | | | llvm-svn: 297354
* [DAGCombine] Simplify ISD::AND in GetDemandedBits.Eli Friedman2017-03-081-0/+11
| | | | | | | | | This helps in cases involving bitfields where an AND is exposed by legalization. Differential Revision: https://reviews.llvm.org/D30472 llvm-svn: 297249
* [DAG] refactor related div/rem folds; NFCISanjay Patel2017-03-061-28/+32
| | | | | | | | | | | This is known incomplete and not called in the right order relative to other folds, but that's the current behavior. I'm just trying to clean this up before making actual functional changes to make the patch smaller. The logic here should mimic the IR equivalents that are in InstSimplify's simplifyDivRem(). llvm-svn: 297086
* [DAGCombiner] simplify div/rem-by-0Sanjay Patel2017-03-061-1/+10
| | | | | | | | | | | | | | | | | Refactoring of duplicated code and more fixes to follow. This is motivated by the post-commit comments for r296699: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html Ie, we can crash if we're missing obvious simplifications like this that exist in the IR simplifier or if these occur later than expected. The x86 change for non-splat division shows a potential opportunity to improve vector codegen: we assumed that since only one lane had meaningful results, we should do the math in scalar. But that means moving back and forth from vector registers. llvm-svn: 297026
* [DAG] fix formatting; NFCSanjay Patel2017-03-061-2/+1
| | | | llvm-svn: 297015
* [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)Sanjay Patel2017-03-041-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977
* Use APInt::getOneBitSet instead of APInt::getBitsSet for sign bit mask creationSimon Pilgrim2017-03-031-1/+1
| | | | | | Avoids all the unnecessary extra bitrange creation/shift stages. llvm-svn: 296871
* [SDAG] Revert r296476 (and r296486, r296668, r296690).Chandler Carruth2017-03-031-378/+369
| | | | | | | | | | This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862
* [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)Taewook Oh2017-03-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 llvm-svn: 296825
* [DAGCombiner] avoid assertion when folding binops with opaque constantsSanjay Patel2017-03-021-3/+4
| | | | | | | | | | | | | This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). llvm-svn: 296768
* [DAGCombiner] fold binops with constant into select-of-constantsSanjay Patel2017-03-011-0/+112
| | | | | | | | | | | | | | | | | | This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699
OpenPOWER on IntegriCloud