path: root/llvm/lib/CodeGen/SelectionDAG
Commit message | Author | Age | Files | Lines
...
* [DAGCombiner] Teach createBuildVecShuffle to handle cases where input vectors are less than half of the output vector size. | Craig Topper | 2016-10-14 | 1 | -5/+9
  This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64, which requires a sign/zero_extend_vector_inreg to be created, which requires v8i8 to be concatenated up to v64i8 and goes through this code.
  llvm-svn: 284204
* [DAG] hoist DL(N) and fix formatting; NFC | Sanjay Patel | 2016-10-13 | 1 | -24/+31
  llvm-svn: 284170
* LegalizeDAG: Implement PROMOTE for ISD::BITREVERSE | Tom Stellard | 2016-10-13 | 1 | -1/+2
  Summary: This operation is promoted the same way as ISD::BSWAP. This will prevent a regression in test/Target/AMDGPU/bitreverse.ll when i16 support is implemented.
  Reviewers: bogner, hfinkel
  Subscribers: hfinkel, wdng, llvm-commits
  Differential Revision: https://reviews.llvm.org/D25202
  llvm-svn: 284163
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-10-131-120/+271
| | | | | | | | | UseAA is enabled." This reverts commit r284151 which appears to be triggering a LTO failures on Hexagon llvm-svn: 284157
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. | Nirav Dave | 2016-10-13 | 1 | -271/+120
  Retrying after upstream changes.

  Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).

  Additional minor changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.

  Many tests required some changes, as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations.

  Noteworthy tests:
  CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
  CodeGen/AArch64/arm64-memset-inline.ll, CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.
  CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
  CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
  CodeGen/X86/dag-merge-fast-accesses.ll, CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll, CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores.
  CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls.
  CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores.
  CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.
  CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing, but the code looks better due to the memory operations being recognized as non-aliasing.
  CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
  CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior.

  Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

  Differential Revision: https://reviews.llvm.org/D14834

  llvm-svn: 284151
* [DAGCombiner] Add vector support to (mul (shl X, Y), Z) -> (shl (mul X, Z), Y) style combines | Simon Pilgrim | 2016-10-13 | 1 | -7/+6
  llvm-svn: 284122
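  The arithmetic identity behind this combine can be sanity-checked in isolation; a minimal standalone C++ sketch (not LLVM's combiner code), using unsigned types so wraparound is well defined:

      // (x << y) * z == (x * z) << y  (mod 2^32): both sides equal
      // x * z * 2^y modulo 2^32.
      #include <cassert>
      #include <cstdint>
      int main() {
        for (uint32_t x : {0u, 1u, 7u, 0x80000000u})
          for (uint32_t y : {0u, 1u, 5u, 31u})
            for (uint32_t z : {0u, 3u, 0xFFFFFFFFu})
              assert(((x << y) * z) == ((x * z) << y));
      }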
* [DAGCombiner] Add vector support to C2-(A+C1) -> (C2-C1)-A folding | Simon Pilgrim | 2016-10-13 | 1 | -5/+5
  llvm-svn: 284117
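  The underlying identity here is likewise easy to check standalone (a sketch, not the DAGCombiner code); all operations are modulo 2^32, so the rearrangement is exact:

      // C2 - (A + C1) == (C2 - C1) - A  (mod 2^32).
      #include <cassert>
      #include <cstdint>
      int main() {
        for (uint32_t a : {0u, 1u, 0xDEADBEEFu})
          for (uint32_t c1 : {0u, 5u, 0xFFFFFFFFu})
            for (uint32_t c2 : {0u, 9u, 0x80000000u})
              assert(c2 - (a + c1) == (c2 - c1) - a);
      }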
* [DAGCombiner] Add vector support to (sub -1, x) -> (xor x, -1) canonicalization | Simon Pilgrim | 2016-10-13 | 1 | -1/+12
  Improves commutation potential.
  llvm-svn: 284113
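  The canonicalization rests on the two's-complement fact that subtracting from all-ones never borrows, so -1 - x is exactly ~x; a standalone check, written with unsigned all-ones to avoid signed-overflow questions:

      // (-1 - x) == (x ^ -1) in two's complement.
      #include <cassert>
      #include <cstdint>
      int main() {
        for (uint32_t x : {0u, 1u, 42u, 0x80000000u, 0xFFFFFFFFu})
          assert(0xFFFFFFFFu - x == (x ^ 0xFFFFFFFFu));
      }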
* Create llvm.addressofreturnaddress intrinsic | Albert Gutowski | 2016-10-12 | 3 | -0/+6
  Summary: We need a new LLVM intrinsic to implement the MS _AddressOfReturnAddress builtin on 64-bit Windows.
  Reviewers: majnemer, rnk
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D25293
  llvm-svn: 284061
* [DAGCombiner] Update most ADD combines to support general vector combines | Simon Pilgrim | 2016-10-12 | 1 | -12/+54
  Add a number of helper functions to match scalar or vector equivalent constant/splat values, to allow most of the combine patterns to be used by vectors.
  Differential Revision: https://reviews.llvm.org/D25374
  llvm-svn: 284015
* [DAGCombiner] Do not remove the load of stored values when optimizations are disabled | Konstantin Zhuravlyov | 2016-10-12 | 1 | -1/+2
  This combiner breaks the debug experience and should not be run when optimizations are disabled. For example:

      int main() {
        int j = 0;
        j += 2;
        if (j == 2)
          return 0;
        return 5;
      }

  When debugging this code compiled at /O0, it should be valid to break at the line "j += 2;" and edit the value of j. It should change the return value of the function.
  Differential Revision: https://reviews.llvm.org/D19268
  llvm-svn: 284014
* [DAG] Fix crash in build_vector -> vector_shuffle combine | Michael Kuperstein | 2016-10-11 | 1 | -0/+5
  Fixes a crash in the build_vector -> vector_shuffle combine when the first vector input is twice as wide as the output, and the second input vector is even wider.
  llvm-svn: 283953
* Silence -Wunused-but-set-variable warning | Arnold Schwaighofer | 2016-10-11 | 1 | -0/+1
  llvm-svn: 283927
* [DAG] add fold for masked negated sign-extended bool | Sanjay Patel | 2016-10-11 | 1 | -5/+11
  This enhances the fold added with https://reviews.llvm.org/rL283900.
  llvm-svn: 283905
* [DAG] add fold for masked negated extended bool | Sanjay Patel | 2016-10-11 | 1 | -2/+15
  The non-obvious motivation for adding this fold (which already happens in InstCombine) is that we want to canonicalize IR towards select instructions and canonicalize DAG nodes towards boolean math. So we need to recreate some folds in the DAG to handle that change in direction.

  An interesting implementation difference for cases like this is that InstCombine generally works top-down while the DAG goes bottom-up. That means we need to detect different patterns. In this case, the SimplifyDemandedBits fold prevents us from performing a zext-to-sext fold that would then be recognized as a negation of a sext.
  llvm-svn: 283900
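  The commit message doesn't spell out the pattern; on my reading of the matching InstCombine fold, it is the and-of-negated-zext form, whose arithmetic checks out standalone:

      // Assumed pattern: and (sub 0, zext(b)), 1 --> zext(b) for a bool b,
      // since negating 0 gives 0 and negating 1 gives all-ones.
      #include <cassert>
      #include <cstdint>
      int main() {
        for (bool b : {false, true}) {
          uint32_t z = b; // zext i1 -> i32
          assert(((0u - z) & 1u) == z);
        }
      }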
* [DAG] simplify logic; NFC | Sanjay Patel | 2016-10-11 | 1 | -8/+6
  llvm-svn: 283885
* [DAG] hoist DL(N) and fix formatting; NFC | Sanjay Patel | 2016-10-11 | 1 | -25/+32
  llvm-svn: 283884
* [DAG] fix formatting; NFC | Sanjay Patel | 2016-10-11 | 1 | -72/+68
  llvm-svn: 283878
* [SelectionDAGBuilder] Support llvm.flt.rounds on targets where i32 is not legal | Hal Finkel | 2016-10-10 | 2 | -0/+15
  Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal type.
  Patch by Edward Jones, thanks!
  Differential Revision: https://reviews.llvm.org/D24459
  llvm-svn: 283797
* DAG: Setting Masked-Expand-Load as a variant of Masked-Load node | Elena Demikhovsky | 2016-10-09 | 2 | -7/+7
  A masked-expand-load node represents a load operation that loads a variable number of elements from memory, according to the number of "true" bits in the mask, and expands the loaded elements according to their position in the mask vector.
  Right now, the node is used in intrinsics for VEXPAND* instructions. The work is done towards implementation of the masked.expandload and masked.compressstore intrinsics.
  Differential Revision: https://reviews.llvm.org/D25322
  llvm-svn: 283694
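  A scalar reference model of the expand-load semantics described above may help; this is a sketch of the behavior only, not the SelectionDAG node, and the pass-through value for false lanes is an assumption of the model:

      #include <cassert>
      #include <cstddef>
      #include <vector>
      // Consecutive elements are consumed from memory only for true mask
      // lanes; false lanes keep the pass-through value.
      std::vector<int> expandLoad(const int *Mem, const std::vector<bool> &Mask,
                                  int PassThru) {
        std::vector<int> Out(Mask.size(), PassThru);
        size_t MemIdx = 0; // advances only on true lanes
        for (size_t i = 0; i < Mask.size(); ++i)
          if (Mask[i])
            Out[i] = Mem[MemIdx++];
        return Out;
      }
      int main() {
        int Mem[] = {10, 20, 30};
        auto Out = expandLoad(Mem, {true, false, true, true}, 0);
        assert(Out == (std::vector<int>{10, 0, 20, 30}));
      }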
* swifterror: Don't compute swifterror vregs during instruction selection | Arnold Schwaighofer | 2016-10-07 | 4 | -152/+184
  The code used llvm basic block predecessors to decide where to insert phi nodes. Instruction selection can and will liberally insert new machine basic block predecessors, and there is no guaranteed one-to-one mapping between predecessor llvm basic blocks and machine basic blocks. Therefore the current approach does not work: it assumes we can mark a predecessor machine basic block as needing a copy, and needs to know the set of all predecessor machine basic blocks to decide when to insert phis.

  Instead of computing the swifterror vregs as we select instructions, propagate them at the end of instruction selection, when the MBB CFG is complete. When an instruction needs a swifterror vreg and we don't know the value yet, generate a new vreg and remember this "upward exposed" use, and reconcile it at the end of instruction selection. This will only happen if the target supports promoting swifterror parameters to registers and the swifterror attribute is used.

  rdar://28300923
  llvm-svn: 283617
* [DAG] clean up foldSelectOfConstants(); NFCI | Sanjay Patel | 2016-10-07 | 1 | -19/+14
  Rename variables, simplify logic. It's not clear yet why we don't also handle a target with ZeroOrNegativeOneBooleanContent.
  llvm-svn: 283613
* [DAG] move fold (select C, 0, 1 -> xor C, 1) to a helper function; NFC | Sanjay Patel | 2016-10-07 | 1 | -16/+31
  We're missing at least 3 other similar folds, based on what we have in InstCombine.
  llvm-svn: 283596
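  For a boolean C known to be 0 or 1, the fold is plain arithmetic; a standalone check:

      // (c ? 0 : 1) == (c ^ 1) for c in {0, 1}.
      #include <cassert>
      int main() {
        for (int c : {0, 1})
          assert((c ? 0 : 1) == (c ^ 1));
      }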
* Delete some dead code in SelectionDAG (NFC) | Vedant Kumar | 2016-10-06 | 3 | -59/+0
  Differential Revision: https://reviews.llvm.org/D24435
  llvm-svn: 283505
* Handle *_EXTEND_VECTOR_INREG during Integer Legalization | Pirama Arumuga Nainar | 2016-10-06 | 2 | -0/+25
  Summary: These nodes need legalization for 3-element vectors. This commit handles the legalization and adds tests for zext and sext. This fixes PR30614.
  Reviewers: RKSimon, srhines
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D25268
  llvm-svn: 283496
* [DAG] Generalize build_vector -> vector_shuffle combine for more than 2 inputs | Michael Kuperstein | 2016-10-06 | 1 | -135/+191
  This generalizes the build_vector -> vector_shuffle combine to support any number of inputs. The idea is to create a binary tree of shuffles, where the first layer performs pairwise shuffles of the input vectors placing each input element into the correct lane, and the rest of the tree blends these shuffles together.

  This doesn't try to be smart and create any sort of "optimal" shuffles. The assumption is that even a "poor" shuffle sequence is better than extracting and inserting the elements one by one.

  Differential Revision: https://reviews.llvm.org/D24683
  llvm-svn: 283480
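  A toy model of the tree strategy (a sketch of the idea only, not the DAGCombiner code): treat each input as a partial vector whose elements already sit in their output lanes, then blend the partial vectors pairwise up a binary tree:

      #include <cassert>
      #include <cstddef>
      #include <vector>
      using Vec = std::vector<int>; // -1 marks an undefined lane
      // Lane-wise blend: take A's lane if defined, else B's.
      static Vec blend(const Vec &A, const Vec &B) {
        Vec R(A.size(), -1);
        for (size_t i = 0; i < A.size(); ++i)
          R[i] = A[i] != -1 ? A[i] : B[i];
        return R;
      }
      int main() {
        // Four partial vectors, each defining different lanes of the result.
        std::vector<Vec> Work = {{10, -1, -1, -1}, {-1, 20, -1, -1},
                                 {-1, -1, 30, -1}, {-1, -1, -1, 40}};
        while (Work.size() > 1) { // one blend level per iteration
          std::vector<Vec> Next;
          for (size_t i = 0; i + 1 < Work.size(); i += 2)
            Next.push_back(blend(Work[i], Work[i + 1]));
          if (Work.size() % 2)
            Next.push_back(Work.back());
          Work = Next;
        }
        assert(Work[0] == (Vec{10, 20, 30, 40}));
      }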
* FastISel: Remove unused/un-overridden entry points. NFCI. | Peter Collingbourne | 2016-10-05 | 1 | -23/+0
  llvm-svn: 283366
* [DAG] Teach computeKnownBits and ComputeNumSignBits in SelectionDAG to look through EXTRACT_VECTOR_ELT | Bjorn Pettersson | 2016-10-05 | 1 | -0/+34
  Summary: Both computeKnownBits and ComputeNumSignBits can now do a simple look-through of EXTRACT_VECTOR_ELT. It will compute the result based on the known bits (or known sign bits) for the vector that the element is extracted from.
  Reviewers: bogner, tstellarAMD, mkuper
  Subscribers: wdng, RKSimon, jyknight, llvm-commits, nhaehnle
  Differential Revision: https://reviews.llvm.org/D25007
  llvm-svn: 283347
* Use StringRef in FastISel API (NFC) | Mehdi Amini | 2016-10-05 | 1 | -1/+1
  llvm-svn: 283291
* [SelectionDAG] Fix calling convention in expansion of ?MULO. | whitequark | 2016-10-04 | 1 | -2/+12
  The SMULO/UMULO DAG nodes, when not directly supported by the target, expand to a multiplication twice as wide. In case the resulting type is not legal, an __mul?i3 intrinsic is used. Since the type is not legal, the legalizer cannot directly call the intrinsic with the wide arguments; instead, it "pre-lowers" them by splitting them in halves.

  The "pre-lowering" code in essence made assumptions about the calling convention, specifically that i(N*2) values will be split into two iN values and passed in consecutive registers in little-endian order. This, naturally, breaks on a big-endian system, such as our OR1K out-of-tree backend.

  Thanks to James Miller <james@aatch.net> for help in debugging.

  Differential Revision: https://reviews.llvm.org/D25223
  llvm-svn: 283203
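  The endianness hazard can be illustrated with a hypothetical helper (not the legalizer's actual code): splitting an i64 into two i32 halves must follow the target's ordering rather than assume little-endian:

      #include <cassert>
      #include <cstdint>
      #include <utility>
      // Returns the halves in the order a hypothetical ABI would pass them.
      std::pair<uint32_t, uint32_t> splitI64(uint64_t V, bool BigEndian) {
        uint32_t Lo = uint32_t(V), Hi = uint32_t(V >> 32);
        return BigEndian ? std::make_pair(Hi, Lo) : std::make_pair(Lo, Hi);
      }
      int main() {
        auto LE = splitI64(0x1122334455667788ULL, /*BigEndian=*/false);
        assert(LE.first == 0x55667788u && LE.second == 0x11223344u);
        auto BE = splitI64(0x1122334455667788ULL, /*BigEndian=*/true);
        assert(BE.first == 0x11223344u && BE.second == 0x55667788u);
      }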
* fix formatting; NFC | Sanjay Patel | 2016-10-03 | 1 | -8/+5
  llvm-svn: 283115
* Revert "In visitSTORE, always use FindBetterChain, rather than only when ↵Nirav Dave2016-09-281-120/+271
| | | | | | | | UseAA is enabled." This reverts commit r282600 due to test failues with MCJIT llvm-svn: 282604
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. | Nirav Dave | 2016-09-28 | 1 | -271/+120
  Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through a simplified store-merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates the non-interfering loads/stores from the store-merging logic.

  When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions).

  Additional minor changes:
  1. Finishes removing unused AliasLoad code.
  2. Unifies the chain aggregation in the merged stores across code paths.
  3. Re-adds the Store node to the worklist after calling SimplifyDemandedBits.
  4. Increases GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests.

  This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.

  Many tests required some changes, as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations.

  Noteworthy tests:
  CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right.
  CodeGen/AArch64/arm64-memset-inline.ll, CodeGen/AArch64/arm64-stur.ll, CodeGen/ARM/memset-inline.ll - The backend now generates *worse* code due to store merging succeeding, as we do not do a 16-byte constant-zero store efficiently.
  CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself?
  CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding.
  CodeGen/X86/dag-merge-fast-accesses.ll, CodeGen/X86/MergeConsecutiveStores.ll, CodeGen/X86/stores-merging.ll, CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores.
  CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls.
  CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores.
  CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores *CAN* be moved past volatile loads, and now are.
  CodeGen/X86/vector-idiv.ll, CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing, but the code looks better due to the memory operations being recognized as non-aliasing.
  CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged.
  CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll - This test appears to work but no longer exhibits the spill behavior.

  Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
  Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

  Differential Revision: https://reviews.llvm.org/D14834

  llvm-svn: 282600
* [DAG] Remove isVectorClearMaskLegal() check from vector_build dagcombine | Michael Kuperstein | 2016-09-28 | 1 | -7/+0
  This check currently doesn't seem to do anything useful on any in-tree target:
  On non-x86, it always evaluates to false, so we never hit the code path that creates the shuffle with zero.
  On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to query in general, but doesn't make sense if only restricted to zero blends.
  Differential Revision: https://reviews.llvm.org/D24625
  llvm-svn: 282567
* Add support to optionally limit the size of jump tables. | Evandro Menezes | 2016-09-26 | 1 | -12/+26
  Many high-performance processors have a dedicated branch predictor for indirect branches, commonly used with jump tables. As sophisticated as such branch predictors are, they tend to have well-defined limits beyond which their effectiveness is hampered or even nullified. One such limit is the number of possible destinations for a given indirect branch that such branch predictors can handle.

  This patch adds a limit, which a target may set, on the number of destination addresses in a jump table.

  Patch by: Evandro Menezes <e.menezes@samsung.com>, Aditya Kumar <aditya.k7@samsung.com>, Sebastian Pop <s.pop@samsung.com>.

  Differential revision: https://reviews.llvm.org/D21940
  llvm-svn: 282412
* [X86][avx512] Fix bug in masked compress store. | Ayman Musa | 2016-09-26 | 1 | -3/+3
  Differential Revision: https://reviews.llvm.org/D23984
  llvm-svn: 282381
* [DAG] Fix incorrect alignment of ext load. | Nirav Dave | 2016-09-22 | 1 | -1/+1
  Correctly derive the alignment from the loaded size, not the output value size.
  Reviewers: jyknight, tstellarAMD, arsenm
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D23356
  llvm-svn: 282177
* Disable tail calls if there is a swifterror argument | Arnold Schwaighofer | 2016-09-21 | 1 | -0/+5
  ISel does not handle them correctly yet, i.e. we crash trying to emit tail call code.
  radar://28407842
  llvm-svn: 282088
* [AVX-512] Don't lower CVTPD2PS intrinsics to ISD::FP_ROUND with an X86 rounding mode encoding in the second operand | Craig Topper | 2016-09-18 | 1 | -1/+2
  This immediate should only be 0 or 1 and indicates if the truncation loses precision. Also enhance an assert in SelectionDAG::getNode to flag this sort of problem in the future.
  llvm-svn: 281868
* [X86][SSE] Improve recognition of uitofp conversions that can be performed as sitofp | Simon Pilgrim | 2016-09-18 | 1 | -4/+0
  With D24253 we can now use SelectionDAG::SignBitIsZero with vector operations.

  This patch uses SelectionDAG::SignBitIsZero to recognise that a zero sign bit means that we can use a sitofp instead of a uitofp (which is not directly supported on pre-AVX512 hardware).

  While AVX512 does provide support for uitofp, the conversion to sitofp should not cause any regressions.

  Differential Revision: https://reviews.llvm.org/D24343
  llvm-svn: 281852
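  The rewrite is safe because a value with a zero sign bit denotes the same number under signed and unsigned interpretation, so the two conversions agree; a standalone scalar check:

      #include <cassert>
      #include <cstdint>
      int main() {
        for (uint32_t x : {0u, 1u, 12345u, 0x7FFFFFFFu}) // sign bit clear
          assert(static_cast<double>(x) == static_cast<double>(int32_t(x)));
      }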
* Change the order of the split store from high - low to low - high. | Wei Mi | 2016-09-18 | 1 | -2/+2
  It is a trivial change which could make the testcase easier to reuse for the store splitting in CodeGenPrepare.
  llvm-svn: 281846
* Make analyzeBranch family of instruction names consistent | Matt Arsenault | 2016-09-14 | 1 | -1/+1
  analyzeBranch was renamed to use a lowercase first letter; rename the related set to match.
  llvm-svn: 281506
* getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(), round 2; NFCI | Sanjay Patel | 2016-09-14 | 2 | -5/+4
  llvm-svn: 281498
* getVectorElementType().getSizeInBits() -> getScalarSizeInBits(); NFCI | Sanjay Patel | 2016-09-14 | 6 | -16/+14
  llvm-svn: 281495
* getValueType().getSizeInBits() -> getValueSizeInBits(); NFCI | Sanjay Patel | 2016-09-14 | 12 | -62/+54
  llvm-svn: 281493
* getValueType().getScalarSizeInBits() -> getScalarValueSizeInBits(); NFCI | Sanjay Patel | 2016-09-14 | 3 | -46/+33
  llvm-svn: 281490
* getScalarType().getSizeInBits() -> getScalarSizeInBits(); NFCI | Sanjay Patel | 2016-09-14 | 6 | -70/+70
  llvm-svn: 281489
* [CodeGen] Fix invalid shift in mul expansion | Pawel Bylica | 2016-09-13 | 1 | -6/+11
  Summary: When expanding mul in type legalization, make sure the type for the shift amount can actually fit the value. This fixes PR30354 (https://llvm.org/bugs/show_bug.cgi?id=30354).
  Reviewers: hfinkel, majnemer, RKSimon
  Subscribers: RKSimon, llvm-commits
  Differential Revision: https://reviews.llvm.org/D24478
  llvm-svn: 281403
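  The constraint being fixed, shown on a scalar model of the expansion (a sketch, not the legalizer code): widening a 64-bit multiply into 32-bit halves uses shifts by 32, so a shift-amount type too narrow to represent 32 would be invalid:

      #include <cassert>
      #include <cstdint>
      uint64_t mul64(uint64_t a, uint64_t b) {
        uint64_t alo = uint32_t(a), ahi = a >> 32;
        uint64_t blo = uint32_t(b), bhi = b >> 32;
        // The cross terms are shifted by 32 -- the amount the shift-amount
        // type must be able to hold.
        return alo * blo + ((alo * bhi + ahi * blo) << 32);
      }
      int main() {
        assert(mul64(0x123456789ULL, 0x0FEDCBA987ULL) ==
               0x123456789ULL * 0x0FEDCBA987ULL);
      }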
* [DAG] Allow build-to-shuffle combine to combine builds from two wide vectors. | Michael Kuperstein | 2016-09-13 | 1 | -27/+53
  This allows us to, in some cases, create a vector_shuffle out of a build_vector, when the inputs to the build are extract_elements from two different vectors, at least one of which is wider than the output. (E.g. a <8 x i16> being constructed out of elements from a <16 x i16> and a <8 x i16>.)
  Differential Revision: https://reviews.llvm.org/D24491
  llvm-svn: 281402
* [DAGCombiner] Use APInt directly in (shl (zext (srl x, C)), C) combine range test | Simon Pilgrim | 2016-09-13 | 1 | -2/+2
  To avoid an assertion, we must ensure that the inner shift constant is within range before calling ConstantSDNode::getZExtValue(). We already know that the outer shift constant is in range.
  Followup to D23007.
  llvm-svn: 281362
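  The identity behind the combine, checked standalone for an i16 value zero-extended to i32; the range check matters because a shift count at or above the bit width is undefined:

      // (zext(x >> c) << c) == zext(x) & (~0u << c) for c < 16.
      #include <cassert>
      #include <cstdint>
      int main() {
        for (uint16_t x : {uint16_t(0), uint16_t(0xABCD), uint16_t(0xFFFF)})
          for (unsigned c : {0u, 1u, 7u, 15u})
            assert((uint32_t(uint16_t(x >> c)) << c) ==
                   (uint32_t(x) & (~0u << c)));
      }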