summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [DAGCombiner] Enable AND combines of splatted constant vectorsSimon Pilgrim2016-09-081-7/+7
| | | | | | | | Allow AND combines to use a vector splatted constant as well as a constant scalar. Preliminary part of D24253. llvm-svn: 280926
* [DAGCombine] More fixups to SETCC legality checking (visitANDLike/visitORLike)Hal Finkel2016-09-061-28/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I might have called this "r246507, the sequel". It fixes the same issue, as the issue has cropped up in a few more places. The underlying problem is that isSetCCEquivalent can pick up select_cc nodes with a result type that is not legal for a setcc node to have, and if we use that type to create new setcc nodes, nothing fixes that (and so we've violated the contract that the infrastructure has with the backend regarding setcc node types). Fixes PR30276. For convenience, here's the commit message from r246507, which explains the problem is greater detail: [DAGCombine] Fixup SETCC legality checking SETCC is one of those special node types for which operation actions (legality, etc.) is keyed off of an operand type, not the node's value type. This makes sense because the value type of a legal SETCC node is determined by its operands' value type (via the TLI function getSetCCResultType). When the SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value type, or directly with the value type provided by TLI.getSetCCResultType. The first problem being fixed here is that DAGCombine had several places querying TLI.isOperationLegal on SETCC, but providing the return of getSetCCResultType, instead of the operand type directly. This does not mean what the author thought, and "luckily", most in-tree targets have SETCC with Custom lowering, instead of marking them Legal, so these checks return false anyway. The second problem being fixed here is that two of the DAGCombines could create SETCC nodes with arbitrary (integer) value types; specifically, those that would simplify: (setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3 (which is possible for some combinations of (op1, op2)) If the operands of the and|or node are actual setcc nodes, then this is not an issue (because the and|or must share the same type), but, the relevant code in DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls DAGCombiner::isSetCCEquivalent on each operand, and that function will recognise setcc-like select_cc nodes with other return types. And, thus, when creating new SETCC nodes, we need to be careful to respect the value-type constraint. This is even true before type legalization, because it is quite possible for the SELECT_CC node to have a legal type that does not happen to match the corresponding TLI.getSetCCResultType type. To be explicit, there is nothing that later fixes the value types of SETCC nodes (if the type is legal, but does not happen to match TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to work only because, either MVT::i1 is not legal, or it is what TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change, however. For the time being, restrict the relevant transformations to produce only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1 prior to type legalization). Fixes PR24636. llvm-svn: 280767
* Split the store of a wide value merged from an int-fp pair into multiple stores.Wei Mi2016-09-021-0/+103
| | | | | | | | | | | | | | For the store of a wide value merged from a pair of values, especially int-fp pair, sometimes it is more efficent to split it into separate narrow stores, which can remove the bitwise instructions or sink them to colder places. Now the feature is only enabled on x86 target, and only store of int-fp pair is splitted. It is possible that the application scope gets extended with perf evidence support in the future. Differential Revision: https://reviews.llvm.org/D22840 llvm-svn: 280505
* [DAGcombiner] Fix incorrect sinking of a truncate into the operand of a shift.Andrea Di Biagio2016-09-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression introduced by revision 268094. Revision 268094 added the following dag combine rule: // trunc (shl x, K) -> shl (trunc x), K => K < vt.size / 2 That rule converts a truncate of a shift-by-constant into a shift of a truncated value. We do this only if the shift count is less than half the size in bits of the truncated value (K < vt.size / 2). The problem is that the constraint on the shift count is incorrect, so the rule doesn't work well in some cases involving vector types. The combine rule should have been written instead like this: // trunc (shl x, K) -> shl (trunc x), K => K < vt.getScalarSizeInBits() Basically, if K is smaller than the "scalar size in bits" of the truncated value then we know that by "sinking" the truncate into the operand of the shift we would never accidentally make the shift undefined. This patch fixes the check on the shift count, and adds test cases to make sure that we don't regress the behavior. Differential Revision: https://reviews.llvm.org/D24154 llvm-svn: 280482
* [DAGCombine] Don't fold a trunc if it feeds an anyextMichael Kuperstein2016-09-011-0/+4
| | | | | | | | | | | | | | | Legalization tends to create anyext(trunc) patterns. This should always be combined - into either a single trunc, a single ext, or nothing if the types match exactly. But if we happen to combine the trunc first, we may pull the trunc away from the anyext or make it implicit (e.g. the truncate(extract) -> extract(bitcast) fold). To prevent this, we can avoid doing the fold, similarly to how we already handle fpround(fpextend). Differential Revision: https://reviews.llvm.org/D23893 llvm-svn: 280386
* Use SDValue::getOpcode() helper instead of via SDValue::getNode()Simon Pilgrim2016-08-201-2/+2
| | | | llvm-svn: 279381
* Replace a few more "fall through" comments with LLVM_FALLTHROUGHJustin Bogner2016-08-171-1/+1
| | | | | | Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970
* Use the range variant of find instead of unpacking begin/endDavid Majnemer2016-08-111-2/+1
| | | | | | | | | If the result of the find is only used to compare against end(), just use is_contained instead. No functionality change is intended. llvm-svn: 278433
* Use range algorithms instead of unpacking begin/endDavid Majnemer2016-08-111-4/+3
| | | | | | No functionality change is intended. llvm-svn: 278417
* [DAGCombine] Avoid INSERT_SUBVECTOR reinsertions (PR28678)Simon Pilgrim2016-08-101-1/+10
| | | | | | | | | | | If the input vector to INSERT_SUBVECTOR is another INSERT_SUBVECTOR, and this inserted subvector replaces the last insertion, then insert into the common source vector. i.e. INSERT_SUBVECTOR( INSERT_SUBVECTOR( Vec, SubOld, Idx ), SubNew, Idx ) --> INSERT_SUBVECTOR( Vec, SubNew, Idx ) Differential Revision: https://reviews.llvm.org/D23330 llvm-svn: 278211
* [DAGCombiner] Better support for shifting large value type by constantsSimon Pilgrim2016-08-091-17/+42
| | | | | | | | | | As detailed on D22726, much of the shift combining code assume constant values will fit into a uint64_t value and calls ConstantSDNode::getZExtValue where it probably shouldn't (leading to asserts). Using APInt directly avoids this problem but we encounter other assertions if we attempt to compare/operate on 2 APInt of different bitwidths. This patch adds a helper function to ensure that 2 APInt values are zero extended as required so that they can be safely used together. I've only added an initial example use for this to the '(SHIFT (SHIFT x, c1), c2) --> (SHIFT x, (ADD c1, c2))' combines. Further cases can easily be added as required. Differential Revision: https://reviews.llvm.org/D23007 llvm-svn: 278141
* [X86] Heuristic to selectively build Newton-Raphson SQRT estimationNikolai Bozhenov2016-08-041-2/+6
| | | | | | | | | | | | | | | | | | | | | On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725
* Typo fix in comment. NFCDiana Picus2016-08-041-1/+1
| | | | llvm-svn: 277704
* [DAGCombine] Make sext(setcc) combine respect getBooleanContentsMichael Kuperstein2016-08-011-9/+23
| | | | | | | | | | | We used to combine "sext(setcc x, y, cc) -> (select (setcc x, y, cc), -1, 0)" Instead, we should combine to (select (setcc x, y, cc), T, 0) where the value of T is 1 or -1, depending on the type of the setcc, and getBooleanContents() for the type if it is not i1. This fixes PR28504. llvm-svn: 277371
* MachineFunction: Return reference for getFrameInfo(); NFCMatthias Braun2016-07-281-3/+3
| | | | | | | getFrameInfo() never returns nullptr so we should use a reference instead of a pointer. llvm-svn: 277017
* [DAGCombiner] Use APInt directly to detect out of range shift constantsSimon Pilgrim2016-07-271-3/+3
| | | | | | | | Using getZExtValue() will assert if the value doesn't fit into uint64_t - SHL was already doing this, I've just updated ASHR/LSHR to match As mentioned on D22726 llvm-svn: 276855
* [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, ↵Justin Lebar2016-07-151-106/+83
| | | | | | | | | | | | | | | | | | | | | | | getStore, and friends. Summary: Instead, we take a single flags arg (a bitset). Also add a default 0 alignment, and change the order of arguments so the alignment comes before the flags. This greatly simplifies many callsites, and fixes a bug in AMDGPUISelLowering, wherein the order of the args to getLoad was inverted. It also greatly simplifies the process of adding another flag to getLoad. Reviewers: chandlerc, tstellarAMD Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D22249 llvm-svn: 275592
* [DAG] make isConstantSplatVector() available to the rest of loweringSanjay Patel2016-07-101-32/+12
| | | | llvm-svn: 275025
* reformat, fix comments/names; NFCISanjay Patel2016-07-101-27/+22
| | | | llvm-svn: 275015
* Reapply r274829 with fix for FP vectorsMatt Arsenault2016-07-081-2/+4
| | | | llvm-svn: 274937
* Revert r274829, it caused PR28472.Nico Weber2016-07-081-1/+1
| | | | llvm-svn: 274916
* Do not expand SDIV when compiling for minimum code sizeSjoerd Meijer2016-07-081-0/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D22139 llvm-svn: 274855
* Addressing post-commit comments regarding not expanding UDIV;Sjoerd Meijer2016-07-081-2/+2
| | | | | | we don't expand only when compiling for minimum code size. llvm-svn: 274847
* Code size optimisation: don't expand a div to a mul and and a shift sequence.Sjoerd Meijer2016-07-081-0/+5
| | | | | | | | | As a result, the urem instruction will not be expanded to a sequence of umull, lsrs, muls and sub instructions, but just a call to __aeabi_uidivmod. Differential Revision: http://reviews.llvm.org/D22131 llvm-svn: 274843
* Bug 28444: Fix assertion when extract_vector_elt has mismatched typeMatt Arsenault2016-07-081-1/+1
| | | | | | | For some reason extract_vector_elt is sometimes allowed to have a different result type than the vector element type. llvm-svn: 274829
* [DAGCombiner] Fix visitSTORE to continue processing current SDNode, if ↵Tim Shen2016-07-061-6/+12
| | | | | | | | | | | | | | | | | findBetterNeighborChains doesn't actually CombineTo it. Summary: findBetterNeighborChains may or may not find a better chain for each node it finds, which include the node ("St") that visitSTORE is currently processing. If no better chain is found for St, visitSTORE should continue instead of return SDValue(St, 0), as if it's CombinedTo'ed. This fixes bug 28130. There might be other ways to make the test pass (see D21409). I think both of the patches are fixing actual bugs revealed by the same testcase. Reviewers: echristo, wschmidt, hfinkel, kbarton, amehsan, arsenm, nemanjai, bogner Subscribers: mehdi_amini, nemanjai, llvm-commits Differential Revision: http://reviews.llvm.org/D21692 llvm-svn: 274644
* Revert r259387: "AArch64: Implement missed conditional compare sequences."Balaram Makam2016-07-051-2/+2
| | | | | | | This reverts commit r259387 because it inserts illegal code after legalization in some backends where i64 OR type is illegal for example. llvm-svn: 274573
* DAGCombiner: Fold away vector extract of insert with the same indexMatt Arsenault2016-07-051-0/+8
| | | | | | | This only really matters when the index is non-constant since the constant case already gets taken care of by other combines. llvm-svn: 274569
* [CodeGen] Teach OR combine of shuffles involving zero vectors to better ↵Craig Topper2016-07-031-5/+10
| | | | | | | | handle undef indices. Undef indices can now be treated as zeros. Or if its undef ORed with zero, we will keep the undef. llvm-svn: 274472
* [CodeGen,Target] Remove the version of DAG.getVectorShuffle that takes a ↵Craig Topper2016-07-011-11/+9
| | | | | | | | pointer to a mask array. Convert all callers to use the ArrayRef version. No functional change intended. For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array. llvm-svn: 274337
* [DAGCombine] Teach DAG combine to handle ORs of shuffles involving zero ↵Craig Topper2016-06-291-45/+52
| | | | | | vectors where the zero vector is the first operand to the shuffle instead of the second. llvm-svn: 274097
* DAGCombiner: Don't narrow volatile vector loads + extractMatt Arsenault2016-06-271-3/+8
| | | | llvm-svn: 273909
* [SelectionDAG] Use DAG.getCommutedVectorShuffle instead of reimplementing it.Craig Topper2016-06-261-15/+2
| | | | llvm-svn: 273802
* Preserve DebugInfo when replacing values in DAGCombinerNirav Dave2016-06-231-2/+0
| | | | | | | | | | | | | | | | | | | | | Recommiting after correcting over-eager Debug Value transfer fixing PR28270. [DAG] Previously debug values would transfer debuginfo for the selected start node for a replacement which allows for debug to be dropped. Push debug value transfer to occur with node/value replacement in SelectionDAG, remove now extraneous transfers of debug values. This refixes PR9817 which was being incompletely checked in the testsuite. Reviewers: jyknight Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D21037 llvm-svn: 273585
* Revert r273456, "Preserve DebugInfo when replacing values in DAGCombiner" as ↵Peter Collingbourne2016-06-231-0/+2
| | | | | | it caused pr28270. llvm-svn: 273518
* Preserve DebugInfo when replacing values in DAGCombinerNirav Dave2016-06-221-2/+0
| | | | | | | | | | | | | | | | | | | | | Recommiting after fixing over-aggressive assertion [DAG] Previously debug values would transfer debuginfo for the selected start node for a replacement which allows for debug to be dropped. Push debug value transfer to occur with node/value replacement in SelectionDAG, remove now extraneous transfers of debug values. This refixes PR9817 which was being incompletely checked in the testsuite. Reviewers: jyknight Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D21037 llvm-svn: 273456
* Replace silly uses of 'signed' with 'int'David Majnemer2016-06-211-1/+1
| | | | llvm-svn: 273244
* [DAG] Remove redundant FMUL in Newton-Raphson SQRT codeSanjay Patel2016-06-161-53/+86
| | | | | | | | | | | | | | | | | When calculating a square root using Newton-Raphson with two constants, a naive implementation is to use five multiplications (four muls to calculate reciprocal square root and another one to calculate the square root itself). However, after some reassociation and CSE the same result can be obtained with only four multiplications. Unfortunately, there's no reliable way to do such a reassociation in the back-end. So, the patch modifies NR code itself so that it directly builds optimal code for SQRT and doesn't rely on any further reassociation. Patch by Nikolai Bozhenov! Differential Revision: http://reviews.llvm.org/D21127 llvm-svn: 272920
* Revert "Preserve DebugInfo when replacing values in DAGCombiner"Nirav Dave2016-06-151-0/+2
| | | | | | | | | Reverting due to assertion failure in lib/CodeGen/SelectionDAG/InstrEmitter.cpp This reverts commit r272792. llvm-svn: 272799
* Preserve DebugInfo when replacing values in DAGCombinerNirav Dave2016-06-151-2/+0
| | | | | | | | | | | | | | | | | | | [DAG] Previously debug values would transfer debuginfo for the selected start node for a replacement which allows for debug to be dropped. Push debug value transfer to occur with node/value replacement in SelectionDAG, remove now extraneous transfers of debug values. This refixes PR9817 which was being incompletely checked in the testsuite. Reviewers: jyknight Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D21037 llvm-svn: 272792
* Pass DebugLoc and SDLoc by const ref.Benjamin Kramer2016-06-121-35/+33
| | | | | | | | This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512
* Avoid copies of std::strings and APInt/APFloats where we only read from itBenjamin Kramer2016-06-081-3/+3
| | | | | | | | As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126
* transform obscured FP sign bit ops into a fabs/fneg using TLI hookSanjay Patel2016-06-021-0/+46
| | | | | | | | | | | | | | | | | | | This is effectively a revert of: http://reviews.llvm.org/rL249702 - [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886) and: http://reviews.llvm.org/rL249701 - [ValueTracking] teach computeKnownBits that a fabs() clears sign bits and a reimplementation as a DAG combine for targets that have IEEE754-compliant fabs/fneg instructions. This is intended to resolve the objections raised on the dev list: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html and: https://llvm.org/bugs/show_bug.cgi?id=24886#c4 In the interest of patch minimalism, I've only partly enabled AArch64. PowerPC, MIPS, x86 and others can enable later. Differential Revision: http://reviews.llvm.org/D19391 llvm-svn: 271573
* [DAG] use getBitcast() to reduce codeSanjay Patel2016-06-021-33/+25
| | | | | | | | | | Although this was intended to be NFC, the test case wiggle shows a change in code scheduling/RA caused by a difference in the SDLoc() generation. Depending on how you look at it, this is the (dis)advantage of exact checking in regression tests. llvm-svn: 271526
* DAGCombiner: Fix broken size check in isAliasMatt Arsenault2016-06-011-1/+1
| | | | | | | | | | | | | | | This should have been converting the size to bytes, but wasn't really. These should probably all be using getStoreSize instead. I haven't been able to come up with a meaningful testcase for this. I can trigger it using combinations of struct loads and stores, but can't observe a difference in non-broken testcases. isAlias is only really used during store merging, so I'm not sure how to get into the vector splitting situation the comment describes since store merging is only done before type legalization. llvm-svn: 271356
* [X86] Detect SAD patterns and emit psadbw instructions.Michael Kuperstein2016-05-271-7/+11
| | | | | | | | This recommits r267649 with a fix for PR27539. Differential Revision: http://reviews.llvm.org/D20598 llvm-svn: 271033
* Simplify std::all_of predicate (to one line) by using llvm::all_of. NFCI.Simon Pilgrim2016-05-251-3/+1
| | | | llvm-svn: 270747
* reduce indentation; NFCISanjay Patel2016-05-191-9/+7
| | | | llvm-svn: 270007
* Fix unused variable warning.Simon Pilgrim2016-05-071-1/+0
| | | | llvm-svn: 268867
* [SelectionDAG] Added bitreverse(bitreverse(v)) --> vSimon Pilgrim2016-05-071-0/+12
| | | | | | Added bitreverse creation testing llvm-svn: 268865
OpenPOWER on IntegriCloud