summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [CodeGen,Target] Remove the version of DAG.getVectorShuffle that takes a ↵Craig Topper2016-07-011-3/+2
| | | | | | | | pointer to a mask array. Convert all callers to use the ArrayRef version. No functional change intended. For the most part this simplifies all callers. There were two places in X86 that needed an explicit makeArrayRef to shorten a statically sized array. llvm-svn: 274337
* [SelectionDAG] Attempt to split BITREVERSE vector legalization into BSWAP ↵Simon Pilgrim2016-05-121-5/+32
| | | | | | | | | | | | | | and BITREVERSE stages For BITREVERSE, bit shifting/masking every bit in a vector element is a very lengthy procedure. If the input vector type is a whole multiple of bytes wide then we can split this into a BSWAP shuffle stage (to reverse at the byte level) and then a BITREVERSE stage applied to each byte. Most vector capable targets can efficiently BSWAP using shuffles resulting in a considerable reduction in instructions. With this patch targets would only need to implement a target specific vXi8 BITREVERSE implementation to efficiently reverse most legal vector types. Differential Revision: http://reviews.llvm.org/D19978 llvm-svn: 269290
* [SelectionDAG] BITREVERSE vector legalization of bit operations (REAPPLIED)Simon Pilgrim2016-05-041-2/+2
| | | | | | | | Some vector bit operations are promoted instead of having custom lowering. This patch changes the isOperationLegalOrCustom tests for vector AND/OR operations to use a new TLI helper isOperationLegalOrCustomOrPromote instead, allowing the SSE implementations to stay on the simd unit. Differential Revision: http://reviews.llvm.org/D19805 llvm-svn: 268561
* Revert r268504Simon Pilgrim2016-05-041-2/+2
| | | | llvm-svn: 268526
* [SelectionDAG] BITREVERSE vector legalization of bit operationsSimon Pilgrim2016-05-041-2/+2
| | | | | | | | Vector bit operations are typically promoted instead of having custom lowering. This patch changes the isOperationLegalOrCustom tests for vector AND/OR operations to use isOperationLegalOrPromote instead, allowing the SSE implementations to stay on the simd unit. Differential Revision: http://reviews.llvm.org/D19805 llvm-svn: 268504
* [SelectionDAG] Teach LegalizeVectorOps to directly Expand ↵Craig Topper2016-04-211-3/+5
| | | | | | | | CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to CTTZ/CTLZ directly if those ops are Legal/Custom instead of deferring it to LegalizeOps. This is needed to support CTTZ/CTLZ Custom correctly since LegalizeOps would be too late to do the custom lowering. llvm-svn: 266951
* LegalizeDAG: Don't replace vector store with integer if not legalMatt Arsenault2016-03-301-41/+27
| | | | | | | | | | | For the same reason as the corresponding load change. Note that ExpandStore is completely broken for non-byte sized element vector stores, but preserve the current broken behavior which has tests for it. The behavior should be the same, but now introduces a new typed store that is incorrectly split later rather than doing it directly. llvm-svn: 264928
* LegalizeDAG: Don't replace vector load with integer unless legalMatt Arsenault2016-03-301-28/+21
| | | | | | | | | | | | | On AMDGPU we want to be able to promote i64/f64 loads to v2i32. If the access is unaligned, this would conclude that since i64 is legal, it would convert it back to i64 and there is an endless legalization loop. Extract the logic for scalarizing the load into a new TargetLowering function, where this can also replace the custom function AMDGPU has for this. llvm-svn: 264927
* [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ↵Simon Pilgrim2016-03-101-1/+1
| | | | | | | | | | | | ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159
* [CodeGen] Document and use getConstant's splat-building feature. NFC.Ahmed Bougacha2016-02-151-4/+1
| | | | | | Differential Revision: http://reviews.llvm.org/D17229 llvm-svn: 260901
* [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.Ahmed Bougacha2016-02-091-2/+1
| | | | llvm-svn: 260316
* [SelectionDAG] Teach LegalizeVectorOps to not unroll CTLZ_ZERO_UNDEF and ↵Craig Topper2015-12-271-0/+14
| | | | | | CTTZ_ZERO_UNDEF if the non-ZERO_UNDEF form is legal or custom. Will be used to simplify X86 code in a follow on commit. llvm-svn: 256476
* AMDGPU: Use generic bitreverse intrinsicMatt Arsenault2015-12-141-1/+24
| | | | | | Also fix bug in vector legalization for bitreverse. llvm-svn: 255512
* Revert r248483, r242546, r242545, and r242409 - absdiff intrinsicsHal Finkel2015-12-111-34/+0
| | | | | | | | | | | | | | | | | | | | After much discussion, ending here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html it has been decided that, instead of having the vectorizer directly generate special absdiff and horizontal-add intrinsics, we'll recognize the relevant reduction patterns during CodeGen. Accordingly, these intrinsics are not needed (the operations they represent can be pattern matched, as is already done in some backends). Thus, we're backing these out in favor of the current development work. r248483 - Codegen: Fix llvm.*absdiff semantic. r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation llvm-svn: 255387
* AVX-512: Fixed masked load / store instruction selection for KNL.Elena Demikhovsky2015-12-071-1/+4
| | | | | | | | | | | | | | | Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store. This intrinsic comes with all legal types: <8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru), but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only. All data operands should be widened to 512-bit vector. The mask operand should be widened to v16i1 with zeroes. Differential Revision: http://reviews.llvm.org/D15265 llvm-svn: 254909
* Fix some places where we were assuming that memory type had been legalizedEric Christopher2015-11-251-1/+1
| | | | | | | | | | to a simple type when lowering a truncating store of a vector type. In this case for an EVT we'll return Expand as we should in all of the cases anyhow. The testcase triggered at the one in VectorLegalizer::LegalizeOp, inspection found the rest. llvm-svn: 254061
* Do not use "else" when both branches return (NFC)Mehdi Amini2015-10-271-2/+1
| | | | | From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 251398
* Two switch blocks in VectorLegalizer::LegalizeOp already have aArtyom Skrobov2015-10-201-0/+1
| | | | | | | | | | default: llvm_unreachable("This action is not supported yet!"); -- so I'm adding one to the third switch block, too. This is a follow-up fix for http://reviews.llvm.org/D13862 llvm-svn: 250830
* Combining DIV+REM->DIVREM doesn't belong in LegalizeDAG; move it over into ↵Artyom Skrobov2015-10-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | DAGCombiner. Summary: In addition to moving the code over, this patch amends the DIV,REM -> DIVREM combining to run on all affected nodes at once: if the nodes are converted to DIVREM one at a time, then the resulting DIVREM may get legalized by the backend into something target-specific that we won't be able to recognize and correlate with the remaining nodes. The motivation is to "prepare terrain" for D13862: when we set DIV and REM to be legalized to libcalls, instead of the DIVREM, we otherwise lose the ability to combine them together. To prevent this, we need to take the DIV,REM -> DIVREM combining out of the lowering stage. Reviewers: RKSimon, eli.friedman, rengolin Subscribers: john.brawn, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13733 llvm-svn: 250825
* SelectionDAG: Remove implicit ilist iterator conversions, NFCDuncan P. N. Exon Smith2015-10-131-1/+1
| | | | llvm-svn: 250214
* Codegen: Fix llvm.*absdiff semantic.Mohammad Shahid2015-09-241-16/+22
| | | | | | | | Fixes the overflow case of llvm.*absdiff intrinsic also updats the tests and LangRef.rst accordingly. Differential Revision: http://reviews.llvm.org/D11678 llvm-svn: 248483
* propagate fast-math-flags on DAG nodesSanjay Patel2015-09-161-2/+4
| | | | | | | | | | | | | | | | | | | After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is one test case in this patch to prove that point. This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF ( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes. This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the current global settings. Differential Revision: http://reviews.llvm.org/D12095 llvm-svn: 247815
* Add new ISD nodes: ISD::FMINNAN and ISD::FMAXNANJames Molloy2015-08-111-0/+2
| | | | | | | | | | | | | | | | | | The intention of these is to be a corollary to ISD::FMINNUM/FMAXNUM, differing only on how NaNs are treated. FMINNUM returns the non-NaN input (when given one NaN and one non-NaN), FMINNAN returns the NaN input instead. This patch includes support for scalarizing, widening and splitting vectors, but not expansion or softening. The reason is that these should never be needed - FMINNAN nodes are only going to be created in one place (SDAGBuilder::visitSelect) and there we'll check if the node is legal or custom. I could preemptively add expand and soften code, but I'm fairly opposed to adding code I can't test. It's bad enough I can't create tests with this patch, but at least this code will be exercised by the ARM and AArch64 backends fairly shortly. llvm-svn: 244581
* [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute ↵James Molloy2015-07-161-0/+28
| | | | | | | | | | | | | difference operation This adds new intrinsics "*absdiff" for absolute difference ops to facilitate efficient code generation for "sum of absolute differences" operation. The patch also contains the introduction of corresponding SDNodes and basic legalization support.Sanity of the generated code is tested on X86. This is 1st of the three patches. Patch by Shahid Asghar-ahmad! llvm-svn: 242409
* Make TargetLowering::getShiftAmountTy() taking DataLayout as an argumentMehdi Amini2015-07-091-5/+8
| | | | | | | | | | | | | | | | Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11037 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241776
* Make TargetLowering::getPointerTy() taking DataLayout as an argumentMehdi Amini2015-07-091-8/+12
| | | | | | | | | | | | | | | | Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D11028 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241775
* Redirect DataLayout from TargetMachine to Module in SelectionDAGMehdi Amini2015-07-071-2/+2
| | | | | | | | | | | | | | | | | | | | Summary: SelectionDAG itself is not invoking directly the DataLayout in the TargetMachine, but the "TargetLowering" class is still using it. I'll address it in a following commit. This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11000 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241618
* Convert a bunch of loops to foreach. NFC.Pete Cooper2015-06-261-2/+2
| | | | | | This uses the new SDNode::op_values() iterator range committed in r240805. llvm-svn: 240815
* Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)Alexander Kornienko2015-06-231-1/+1
| | | | | | Apparently, the style needs to be agreed upon first. llvm-svn: 240390
* Fixed/added namespace ending comments using clang-tidy. NFCAlexander Kornienko2015-06-191-1/+1
| | | | | | | | | | | | | The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
* Add SDNodes for umin, umax, smin and smax.James Molloy2015-05-151-0/+4
| | | | | | | | | | | | | This adds new SDNodes for signed/unsigned min/max. These nodes are built from select/icmp pairs matched at SDAGBuilder stage. This patch adds the nodes, as well as legalization support and sets them to be "expand" for all targets. NFC for now; this will be tested when I switch AArch64 to using these new nodes. llvm-svn: 237423
* Masked gather and scatter intrinsics - enabled codegen for KNL.Elena Demikhovsky2015-05-031-2/+6
| | | | llvm-svn: 236394
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"Sergey Dmitrouk2015-04-281-25/+28
| | | | | | | | | | | | | | | | | | | | | | | | | [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989
* Revert "[DebugInfo] Add debug locations to constant SD nodes"Daniel Jasper2015-04-281-28/+25
| | | | | | | This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodesSergey Dmitrouk2015-04-281-25/+28
| | | | | | | | | | | | | | | | | | | | | | | This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977
* fix typo and 80-col; NFCSanjay Patel2015-03-271-2/+2
| | | | llvm-svn: 233427
* [SDAG] Handle LowerOperation returning its input consistentlyHal Finkel2015-02-241-3/+7
| | | | | | | | | | | | | | | | | | | For almost all node types, if the target requested custom lowering, and LowerOperation returned its input, we'd treat the original node as legal. This did not work, however, for many loads and stores, because they follow slightly different code paths, and we did not account for the possibility of LowerOperation returning its input at those call sites. I think that we now handle this consistently everywhere. At the call sites in LegalizeDAG, we used to assert in this case, so there's no functional change for any existing code there. For the call sites in LegalizeVectorOps, this really only affects whether or not we set Changed = true, but I think makes the semantics clearer. No test case here, but it will be covered by an upcoming PowerPC commit adding QPX support. llvm-svn: 230332
* [SDAG] Use correct alignments on expanded vector trunc-store/ext-loadsHal Finkel2015-02-221-4/+7
| | | | | | | | | | | | | When expanding a truncating store or extending load using vector extracts or inserts and scalar stores and loads, we were giving each of these scalar stores or loads the same alignment as the original vector operation. While this will often be right (most vector operations, especially those produced by autovectorization, have the alignment of the underlying scalar type), the vector operation could certainly have a larger alignment. No test case (yet); noticed by inspection. llvm-svn: 230175
* [SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotionsHal Finkel2015-02-121-3/+5
| | | | | | | | | | The PowerPC backend has long promoted some floating-point vector operations (such as select) to integer vector operations. Unfortunately, this behavior was broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check that both the old and new types are floating-point types. Otherwise, we must use BITCAST as we did prior to r216555 for everything. llvm-svn: 228969
* Fixes a bug in vector load legalization that confused bits and bytes.Michael Kuperstein2015-02-041-3/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D7400 llvm-svn: 228168
* [SelectionDAG] Allow targets to specify legality of extloads' resultAhmed Bougacha2015-01-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type (in addition to the memory type). The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
* Add minnum / maxnum codegenMatt Arsenault2014-10-211-0/+2
| | | | llvm-svn: 220342
* Teach the AArch64 backend about v4f16 and v8f16Oliver Stannard2014-08-271-6/+17
| | | | | | | | This teaches the AArch64 backend to deal with the operations required to deal with the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations. llvm-svn: 216555
* Make sure no loads resulting from load->switch DAGCombine are marked invariantLouis Gerbarg2014-07-311-3/+3
| | | | | | | | | | | | | | Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reoarder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449
* [x86] Make vector legalization of extloads work more like the "normal"Chandler Carruth2014-07-241-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vector operation legalization with support for custom target lowering and fallback to expand when it fails, and use this to implement sext and anyext load lowering for x86 in a more principled way. Previously, the x86 backend relied on a target DAG combine to "combine away" sextload and extload nodes prior to legalization, or would expand them during legalization with terrible code. This is particularly problematic because the DAG combine relies on running over non-canonical DAG nodes at just the right time to match several common and important patterns. It used a combine rather than lowering because we didn't have good lowering support, and to expose some tricks being employed to more combine phases. With this change it becomes a proper lowering operation, the backend marks that it can lower these nodes, and I've added support for handling the canonical forms that don't have direct legal representations such as sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases for this behavior continue to pass even after the DAG combiner beigns running more systematically over every node. There is some noise caused by this in the test suite where we actually use vector extends instead of subregister extraction. This doesn't really seem like the right thing to do, but is unlikely to be a critical regression. We do regress in one case where by lowering to the target-specific patterns early we were able to combine away extraneous legal math nodes. However, this regression is completely addressed by switching to a widening based legalization which is what I'm working toward anyways, so I've just switched the test to that mode. Differential Revision: http://reviews.llvm.org/D4654 llvm-svn: 213897
* AA metadata refactoring (introduce AAMDNodes)Hal Finkel2014-07-241-5/+5
| | | | | | | | | | | | | | | | | | | | In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode*). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode* in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859
* [x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogousChandler Carruth2014-07-101-0/+66
| | | | | | | | | | | | | | | | | | | | to the zero-extend-vector-inreg node introduced previously for the same purpose: manage the type legalization of widened extend operations, especially to support the experimental widening mode for x86. I'm adding both because sign-extend is expanded in terms of any-extend with shifts to propagate the sign bit. This removes the last fundamental scalarization from vec_cast2.ll (a test case that hit many really bad edge cases for widening legalization), although the trunc tests in that file still appear scalarized because the the shuffle legalization is scalarizing. Funny thing, I've been working on that. Some initial experiments with this and SSE2 scenarios is showing moderately good behavior already for sign extension. Still some work to do on the shuffle combining on X86 before we're generating optimal sequences, but avoiding scalarization is a huge step forward. llvm-svn: 212714
* Make it possible for ints/floats to return different values from ↵Daniel Sanders2014-07-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 llvm-svn: 212697
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when wideningChandler Carruth2014-07-091-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610
* [cleanup] Hoist an if-else chain on ISD opcodes (really designed forChandler Carruth2014-07-021-17/+28
| | | | | | | switches) into a switch, and sink them into a dispatch function that can return the result rather than awkward variable setting with breaks. llvm-svn: 212166
OpenPOWER on IntegriCloud