summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "Revert "Fix merges of non-zero vector stores""Matt Arsenault2015-06-161-6/+20
| | | | | | | | Reapply r239539. Don't assume the collected number of stores is the same vector size. Just take the first N stores to fill the vector. llvm-svn: 239825
* [DAGCombiner] Added BSWAP(BSWAP(x)) -> x combine pattern.Simon Pilgrim2015-06-131-0/+3
| | | | llvm-svn: 239682
* [DAGCombiner] Added BSWAP vector constant folding support.Simon Pilgrim2015-06-131-0/+12
| | | | llvm-svn: 239675
* Stripped trailing whitespace. NFC.Simon Pilgrim2015-06-131-5/+5
| | | | llvm-svn: 239674
* Revert "Fix merges of non-zero vector stores"Reid Kleckner2015-06-111-19/+6
| | | | | | | | This reverts commit r239539. It was causing SDAG assertions while building freetype. llvm-svn: 239543
* Fix merges of non-zero vector storesMatt Arsenault2015-06-111-6/+19
| | | | | | | | | | Now actually stores the non-zero constant instead of 0. I somehow forgot to include this part of r238108. The test change was just an independent instruction order swap, so just add another check line to satisfy CHECK-NEXT. llvm-svn: 239539
* [DAGCombiner] Added CTLZ vector constant folding support.Simon Pilgrim2015-06-081-2/+2
| | | | llvm-svn: 239305
* [DAGCombiner] Added CTTZ vector constant folding support.Simon Pilgrim2015-06-081-2/+2
| | | | llvm-svn: 239293
* [DAGCombiner] Added CTPOP vector constant folding support.Simon Pilgrim2015-06-071-1/+1
| | | | | | Added tests to the existing SSE/AVX test files. llvm-svn: 239252
* DAGCombiner: don't duplicate (fmul x, c) in visitFNEG if fneg is freeFiona Glaser2015-06-051-1/+2
| | | | | | | | | | For targets with a free fneg, this fold is always a net loss if it ends up duplicating the multiply, so definitely avoid it. This might be true for some targets without a free fneg too, but I'll leave that for future investigation. llvm-svn: 239167
* Simplify code; NFC.Andrea Di Biagio2015-06-051-7/+7
| | | | | | | | Also, moved test cases from CodeGen/X86/fold-buildvector-bug.ll into CodeGen/X86/buildvec-insertvec.ll and regenerated CHECK lines using update_llc_test_checks.py. llvm-svn: 239142
* [DAGCombiner] Fix wrong folding of a build_vector into a blend with zero.Andrea Di Biagio2015-06-041-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Method 'visitBUILD_VECTOR' in the DAGCombiner knows how to combine a build_vector of a bunch of extract_vector_elt nodes and constant zero nodes into a shuffle blend with a zero vector. However, method 'visitBUILD_VECTOR' forgot that a floating point build_vector may contain negative zero as well as positive zero. Example: define <2 x double> @example(<2 x double> %A) { entry: %0 = extractelement <2 x double> %A, i32 0 %1 = insertelement <2 x double> undef, double %0, i32 0 %2 = insertelement <2 x double> %1, double -0.0, i32 1 ret <2 x double> %2 } Before this patch, llc (with -mattr=+sse4.1) wrongly generated movq %xmm0, %xmm0 # xmm0 = xmm0[0],zero So, the sign bit of the negative zero was effectively lost. This patch fixes the problem by adding explicit checks for positive zero. With this patch, llc produces the following code for the example above: movhpd .LCPI0_0(%rip), %xmm0 where .LCPI0_0 referes to a 'double -0'. llvm-svn: 239070
* Pass address space to isLegalAddressingMode in DAGCombinerMatt Arsenault2015-06-041-1/+5
| | | | | | | No test because I don't know of a target that makes use of address spaces and indexed load / store. llvm-svn: 239051
* Add target hook to allow merging stores of nonzero constantsMatt Arsenault2015-05-241-3/+10
| | | | | | | | | | On GPU targets, materializing constants is cheap and stores are expensive, so only doing this for zero vectors was silly. Most of the new testcases aren't optimally merged, and are for later improvements. llvm-svn: 238108
* [X86][SSE] Improve support for 128-bit vector sign extensionSimon Pilgrim2015-05-211-3/+20
| | | | | | | | | | This patch improves support for sign extension of the lower lanes of vectors of integers by making use of the SSE41 pmovsx* sign extension instructions where possible, and optimizing the sign extension by shifts on pre-SSE41 targets (avoiding the use of i64 arithmetic shifts which require scalarization). It converts SIGN_EXTEND nodes to SIGN_EXTEND_VECTOR_INREG where necessary, that more closely matches the pmovsx* instruction than the default approach of using SIGN_EXTEND_INREG which splits the operation (into an ANY_EXTEND lowered to a shuffle followed by shifts) making instruction matching difficult during lowering. Necessary support for SIGN_EXTEND_VECTOR_INREG has been added to the DAGCombiner. Differential Revision: http://reviews.llvm.org/D9848 llvm-svn: 237885
* DAGCombiner: Continue combining if FoldConstantArithmetic() fails.Matthias Braun2015-05-201-73/+95
| | | | | | | | | | | | DAG.FoldConstantArithmetic() can fail even though both operands are Constants if OpaqueConstants are involved. Continue trying other combine possibilities in tis case. Differential Revision: http://reviews.llvm.org/D6946 Somewhat related to PR21801 / rdar://19211454 llvm-svn: 237822
* use 'auto *' for pointers; clearer usage, no deep copyingSanjay Patel2015-05-191-2/+2
| | | | llvm-svn: 237719
* tidy upSanjay Patel2015-05-191-5/+5
| | | | | | | | 1. remove duplicate local variable 2. add local variable with name to match comment 3. remove useless comment llvm-svn: 237715
* use range-based for-loopSanjay Patel2015-05-191-7/+4
| | | | llvm-svn: 237711
* use range-based for loopSanjay Patel2015-05-191-5/+5
| | | | llvm-svn: 237705
* DAGCombiner: Factor common pattern into isOneConstant() function. NFCMatthias Braun2015-05-191-35/+32
| | | | llvm-svn: 237645
* DAGCombiner: Factor common pattern into isAllOnesConstant() function. NFCMatthias Braun2015-05-191-46/+43
| | | | llvm-svn: 237644
* DAGCombiner: Use isNullConstant() where possibleMatthias Braun2015-05-191-19/+13
| | | | llvm-svn: 237643
* Revert accidental change in r237633Matthias Braun2015-05-181-1/+1
| | | | llvm-svn: 237635
* DAGCombiner: Factor common pattern into isNullConstant() function. NFCMatthias Braun2015-05-181-71/+59
| | | | llvm-svn: 237633
* [DAGCombine] Be more pedantic about use iteration in ↵Hal Finkel2015-05-181-11/+12
| | | | | | | | | | | | | | | | | | | | | | | CombineToPreIndexedLoadStore In CombineToPreIndexedLoadStore, when the offset is a constant, we have code that looks for other uses of the pointer which are constant offset computations so that they can be rewritten in terms of the updated pointer so that we don't need to keep a copy of the base pointer to compute these constant offsets. Unfortunately, when it iterated over the uses, it did so by SDNodes, and so we could confuse ourselves if the base pointer was produced by a node that had multiple results (because we would not immediately exclude uses of the other node results). This was reported as PR22755. Unfortunately, we don't have a test case (and I've also been unable to produce one thus far), but at least the mistake is clear. The right way to fix this problem is to make use of the information contained in the use iterators to filter out any uses of other results of the node producing the base pointer. This should be mostly NFC, but should also fix PR22755 (for which, unfortunately, we have no in-tree test case). llvm-svn: 237576
* Revert r237046. See the testcase on the thread where r237046 was committed.Nick Lewycky2015-05-131-2/+5
| | | | llvm-svn: 237317
* propagate IR-level fast-math-flags to DAG nodes; 2nd try; NFCSanjay Patel2015-05-111-5/+2
| | | | | | | | | | | | | | | | | | | | | | | This is a less ambitious version of: http://reviews.llvm.org/rL236546 because that was reverted in: http://reviews.llvm.org/rL236600 because it caused memory corruption that wasn't related to FMF but was actually due to making nodes with 2 operands derive from a plain SDNode rather than a BinarySDNode. This patch adds the minimum plumbing necessary to use IR-level fast-math-flags (FMF) in the backend without actually using them for anything yet. This is a follow-on to: http://reviews.llvm.org/rL235997 ...which split the existing nsw / nuw / exact flags and FMF into their own struct. llvm-svn: 237046
* Fix MergeConsecutiveStore for non-byte-sized memory accesses.James Y Knight2015-05-091-0/+4
| | | | | | | | | | | | | | | | | | | The bug showed up as a compile-time assertion failure: Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed when building msan tests on x86-64. Prior to r236850, this bug was masked due to a bogus alignment check, which also accidentally rejected non-byte-sized accesses. Afterwards, an invalid ElementSizeBytes == 0 got further into the function, and triggered the assertion failure. It would probably be a good idea to allow it to handle merging stores of unusual widths as well, but for now, to un-break it, I'm just making the minimal fix. Differential Revision: http://reviews.llvm.org/D9626 llvm-svn: 236927
* Fix alignment checks in MergeConsecutiveStores.James Y Knight2015-05-081-36/+52
| | | | | | | | | | | | | | | 1) check whether the alignment of the memory is sufficient for the *merged* store or load to be efficient. Not doing so can result in some ridiculously poor code generation, if merging creates a vector operation which must be aligned but isn't. 2) DON'T check that the alignment of each load/store is equal. If you're merging 2 4-byte stores, the first *might* have 8-byte alignment, but the second certainly will have 4-byte alignment. We do want to allow those to be merged. llvm-svn: 236850
* Revert r236546, "propagate IR-level fast-math-flags to DAG nodes (NFC)"NAKAMURA Takumi2015-05-061-3/+6
| | | | | | It caused undefined behavior. llvm-svn: 236600
* propagate IR-level fast-math-flags to DAG nodes (NFC)Sanjay Patel2015-05-051-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds the minimum plumbing necessary to use IR-level fast-math-flags (FMF) in the backend without actually using them for anything yet. This is a follow-on to: http://reviews.llvm.org/rL235997 ...which split the existing nsw / nuw / exact flags and FMF into their own struct. There are 2 structural changes here: 1. The main diff is that we're preparing to extend the optimization flags to affect more than just binary SDNodes. Eg, IR intrinsics ( https://llvm.org/bugs/show_bug.cgi?id=21290 ) or non-binop nodes that don't even exist in IR such as FMA, FNEG, etc. 2. The other change is that we're actually copying the FP fast-math-flags from the IR instructions to SDNodes. Differential Revision: http://reviews.llvm.org/D8900 llvm-svn: 236546
* [DAGCombiner] Account for getVectorIdxTy() when narrowing vector loadUlrich Weigand2015-05-051-2/+3
| | | | | | | | | | | This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert the element number from getVectorIdxTy() to PtrTy before doing pointer arithmetic on it. This is needed on z, where element numbers are i32 but pointers are i64. Original patch by Richard Sandiford. llvm-svn: 236530
* [DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BEUlrich Weigand2015-05-051-7/+0
| | | | | | | | | | | | | For little-endian, the function would convert (extract_vector_elt (load X), Y) to X + Y*sizeof(elt). For big-endian it would instead use X + sizeof(vec) - Y*sizeof(elt). The big-endian case wasn't right since vector index order always follows memory/array order, even for big-endian. (Note that the current handling has to be wrong for Y==0 since it would access beyond the end of the vector.) Original patch by Richard Sandiford. llvm-svn: 236529
* [DAGCombiner] Enabled vector float/double -> int constant foldingSimon Pilgrim2015-05-021-4/+2
| | | | llvm-svn: 236387
* Masked gather and scatter - added DAGCombine visitorsElena Demikhovsky2015-04-301-0/+142
| | | | | | | | | and AVX-512 instruction selection patterns. All other patches, including tests will follow. http://reviews.llvm.org/D7665 llvm-svn: 236211
* Semantically revert r236031, which is not a good idea for in-order targets.Owen Anderson2015-04-301-31/+0
| | | | | | | | | | | | At the least it should be guarded by some kind of target hook. It also introduced catastrophic compile time and code quality regressions on some out of tree targets (test case still being reduced/sanitized). Sanjay agreed with reverting this patch until these issues can be resolved. llvm-svn: 236199
* generalize binop reassociation; NFCSanjay Patel2015-04-291-17/+30
| | | | | | | | | | | | | Move the fold introduced in r236031: http://reviews.llvm.org/rL236031 to its own helper function, so we can use it for other binops. This is a preliminary step before partially solving: https://llvm.org/bugs/show_bug.cgi?id=21768 https://llvm.org/bugs/show_bug.cgi?id=23116 llvm-svn: 236171
* tidy up; NFCSanjay Patel2015-04-291-41/+28
| | | | llvm-svn: 236156
* too much space again; NFCSanjay Patel2015-04-291-4/+0
| | | | llvm-svn: 236150
* too much space; NFCSanjay Patel2015-04-291-4/+0
| | | | llvm-svn: 236147
* transform fadd chains to increase parallelismSanjay Patel2015-04-281-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a compromise: with this simple patch, we should always handle a chain of exactly 3 operations optimally, but we're not generating the optimal balanced binary tree for a longer sequence. In general, this transform will reduce the dependency chain for a sequence of instructions using N operands from a worst case N-1 dependent operations to N/2 dependent operations. The optimal balanced binary tree would reduce the chain to log2(N). The trade-off for not dealing with longer sequences is: (1) we have less complexity in the compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we don't need to worry about the increased register pressure required to parallelize longer sequences. It also seems unlikely that we would ever encounter really long strings of dependent ops like that in the wild, but I'm not sure how to verify that speculation. FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math and this patch. We can extend this patch to cover other associative operations such as fmul, fmax, fmin, integer add, integer mul. This is a partial fix for: https://llvm.org/bugs/show_bug.cgi?id=17305 and if extended: https://llvm.org/bugs/show_bug.cgi?id=21768 https://llvm.org/bugs/show_bug.cgi?id=23116 The issue also came up in: http://reviews.llvm.org/D8941 Differential Revision: http://reviews.llvm.org/D9232 llvm-svn: 236031
* move IR-level optimization flags into their own structSanjay Patel2015-04-281-3/+4
| | | | | | | | | | | | | | | | | | | | | | This is a preliminary step to using the IR-level floating-point fast-math-flags in the SDAG (D8900). In this patch, we introduce the optimization flags as their own struct. As noted in the TODO comment, we should eventually share this data between the IR passes and the backend. We also switch the existing nsw / nuw / exact bit functionality of the BinaryWithFlagsSDNode class to use the new struct. The tradeoff is that instead of using the free but limited space of SDNode's SubclassData, we add a data member to the subclass. This means we don't have to repeat all of the get/set methods per flag, but we're potentially adding size to all nodes of this subclassi type. In practice on 64-bit systems (measured on Linux and MacOS X), there is no size difference between an SDNode and BinaryWithFlagsSDNode after this change: they're both 80 bytes. This means that we had at least one free byte to play with due to struct alignment. Differential Revision: http://reviews.llvm.org/D9325 llvm-svn: 235997
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"Sergey Dmitrouk2015-04-281-355/+490
| | | | | | | | | | | | | | | | | | | | | | | | | [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989
* Revert "[DebugInfo] Add debug locations to constant SD nodes"Daniel Jasper2015-04-281-490/+355
| | | | | | | This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodesSergey Dmitrouk2015-04-281-355/+490
| | | | | | | | | | | | | | | | | | | | | | | This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977
* [DAGCombiner] Fix the type used in canFoldInAddressingMode to account for theQuentin Colombet2015-04-241-2/+2
| | | | | | | | | | | | | | | | | | right scaling. In the function canFoldInAddressingMode, VT is computed as the type of the destination/source of a LOAD/STORE operations, instead of the memory type of the operation. On targets with a scaling factor on the offset of the LOAD/STORE operations, the function may return false for actually valid cases. This may then prevent the selection of profitable pre or post indexed load/store operations, and instead select pre or post indexed load/store for unprofitable cases. Patch by Francois de Ferriere <francois.de-ferriere@st.com>! Differential Revision: http://reviews.llvm.org/D9146 llvm-svn: 235780
* [DAGCombiner] Remove extra bitcasts surrounding vector shuffles Simon Pilgrim2015-04-231-0/+45
| | | | | | | | Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) Differential Revision: http://reviews.llvm.org/D9097 llvm-svn: 235578
* Fixed logic to enable complex FMA formation.Olivier Sallenave2015-04-221-2/+2
| | | | llvm-svn: 235508
* [DAGCombine] Disable select(c, load,load) for indexed loadsHal Finkel2015-04-221-0/+3
| | | | | | | | | This turned up after r235333, but was a pre-existing bug. The optimization which transforms select(c, load, load) into a load of a select of the addresses does not handle indexed loads (pre/post inc/dec). However, it did not check for them either, leading to a crash if it tried to transform one of them. llvm-svn: 235497
OpenPOWER on IntegriCloud