summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Masked gather and scatter intrinsics - enabled codegen for KNL.Elena Demikhovsky2015-05-031-2/+6
| | | | llvm-svn: 236394
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes"Sergey Dmitrouk2015-04-281-25/+28
| | | | | | | | | | | | | | | | | | | | | | | | | [DebugInfo] Add debug locations to constant SD nodes This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235989
* Revert "[DebugInfo] Add debug locations to constant SD nodes"Daniel Jasper2015-04-281-28/+25
| | | | | | | This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodesSergey Dmitrouk2015-04-281-25/+28
| | | | | | | | | | | | | | | | | | | | | | | This adds debug location to constant nodes of Selection DAG and updates all places that create constants to pass debug locations (see PR13269). Can't guarantee that all locations are correct, but in a lot of cases choice is obvious, so most of them should be. At least all tests pass. Tests for these changes do not cover everything, instead just check it for SDNodes, ARM and AArch64 where it's easy to get incorrect locations on constants. This is not complete fix as FastISel contains workaround for wrong debug locations, which drops locations from instructions on processing constants, but there isn't currently a way to use debug locations from constants there as llvm::Constant doesn't cache it (yet). Although this is a bit different issue, not directly related to these changes. Differential Revision: http://reviews.llvm.org/D9084 llvm-svn: 235977
* fix typo and 80-col; NFCSanjay Patel2015-03-271-2/+2
| | | | llvm-svn: 233427
* [SDAG] Handle LowerOperation returning its input consistentlyHal Finkel2015-02-241-3/+7
| | | | | | | | | | | | | | | | | | | For almost all node types, if the target requested custom lowering, and LowerOperation returned its input, we'd treat the original node as legal. This did not work, however, for many loads and stores, because they follow slightly different code paths, and we did not account for the possibility of LowerOperation returning its input at those call sites. I think that we now handle this consistently everywhere. At the call sites in LegalizeDAG, we used to assert in this case, so there's no functional change for any existing code there. For the call sites in LegalizeVectorOps, this really only affects whether or not we set Changed = true, but I think makes the semantics clearer. No test case here, but it will be covered by an upcoming PowerPC commit adding QPX support. llvm-svn: 230332
* [SDAG] Use correct alignments on expanded vector trunc-store/ext-loadsHal Finkel2015-02-221-4/+7
| | | | | | | | | | | | | When expanding a truncating store or extending load using vector extracts or inserts and scalar stores and loads, we were giving each of these scalar stores or loads the same alignment as the original vector operation. While this will often be right (most vector operations, especially those produced by autovectorization, have the alignment of the underlying scalar type), the vector operation could certainly have a larger alignment. No test case (yet); noticed by inspection. llvm-svn: 230175
* [SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotionsHal Finkel2015-02-121-3/+5
| | | | | | | | | | The PowerPC backend has long promoted some floating-point vector operations (such as select) to integer vector operations. Unfortunately, this behavior was broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check that both the old and new types are floating-point types. Otherwise, we must use BITCAST as we did prior to r216555 for everything. llvm-svn: 228969
* Fixes a bug in vector load legalization that confused bits and bytes.Michael Kuperstein2015-02-041-3/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D7400 llvm-svn: 228168
* [SelectionDAG] Allow targets to specify legality of extloads' resultAhmed Bougacha2015-01-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type (in addition to the memory type). The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
* Add minnum / maxnum codegenMatt Arsenault2014-10-211-0/+2
| | | | llvm-svn: 220342
* Teach the AArch64 backend about v4f16 and v8f16Oliver Stannard2014-08-271-6/+17
| | | | | | | | This teaches the AArch64 backend to deal with the operations required to deal with the operations on v4f16 and v8f16 which are exposed by NEON intrinsics, plus the add, sub, mul and div operations. llvm-svn: 216555
* Make sure no loads resulting from load->switch DAGCombine are marked invariantLouis Gerbarg2014-07-311-3/+3
| | | | | | | | | | | | | | Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reoarder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449
* [x86] Make vector legalization of extloads work more like the "normal"Chandler Carruth2014-07-241-5/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vector operation legalization with support for custom target lowering and fallback to expand when it fails, and use this to implement sext and anyext load lowering for x86 in a more principled way. Previously, the x86 backend relied on a target DAG combine to "combine away" sextload and extload nodes prior to legalization, or would expand them during legalization with terrible code. This is particularly problematic because the DAG combine relies on running over non-canonical DAG nodes at just the right time to match several common and important patterns. It used a combine rather than lowering because we didn't have good lowering support, and to expose some tricks being employed to more combine phases. With this change it becomes a proper lowering operation, the backend marks that it can lower these nodes, and I've added support for handling the canonical forms that don't have direct legal representations such as sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases for this behavior continue to pass even after the DAG combiner beigns running more systematically over every node. There is some noise caused by this in the test suite where we actually use vector extends instead of subregister extraction. This doesn't really seem like the right thing to do, but is unlikely to be a critical regression. We do regress in one case where by lowering to the target-specific patterns early we were able to combine away extraneous legal math nodes. However, this regression is completely addressed by switching to a widening based legalization which is what I'm working toward anyways, so I've just switched the test to that mode. Differential Revision: http://reviews.llvm.org/D4654 llvm-svn: 213897
* AA metadata refactoring (introduce AAMDNodes)Hal Finkel2014-07-241-5/+5
| | | | | | | | | | | | | | | | | | | | In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode*). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode* in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859
* [x86,SDAG] Introduce any- and sign-extend-vector-inreg nodes analogousChandler Carruth2014-07-101-0/+66
| | | | | | | | | | | | | | | | | | | | to the zero-extend-vector-inreg node introduced previously for the same purpose: manage the type legalization of widened extend operations, especially to support the experimental widening mode for x86. I'm adding both because sign-extend is expanded in terms of any-extend with shifts to propagate the sign bit. This removes the last fundamental scalarization from vec_cast2.ll (a test case that hit many really bad edge cases for widening legalization), although the trunc tests in that file still appear scalarized because the the shuffle legalization is scalarizing. Funny thing, I've been working on that. Some initial experiments with this and SSE2 scenarios is showing moderately good behavior already for sign extension. Still some work to do on the shuffle combining on X86 before we're generating optimal sequences, but avoiding scalarization is a huge step forward. llvm-svn: 212714
* Make it possible for ints/floats to return different values from ↵Daniel Sanders2014-07-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 llvm-svn: 212697
* [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when wideningChandler Carruth2014-07-091-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vector types to be legal and a ZERO_EXTEND node is encountered. When we use widening to legalize vector types, extend nodes are a real challenge. Either the input or output is likely to be legal, but in many cases not both. As a consequence, we don't really have any way to represent this situation and the prior code in the widening legalization framework would just scalarize the extend operation completely. This patch introduces a new DAG node to represent doing a zero extend of a vector "in register". The core of the idea is to allow legal but different vector types in the input and output. The output vector must have fewer lanes but wider elements. The operation is defined to zero extend the low elements of the input to the size of the output elements, and drop all of the high elements which don't have a corresponding lane in the output vector. It also includes generic expansion of this node in terms of blending a zero vector into the high elements of the vector and bitcasting across. This in turn yields extremely nice code for x86 SSE2 when we use the new widening legalization logic in conjunction with the new shuffle lowering logic. There is still more to do here. We need to support sign extension, any extension, and potentially int-to-float conversions. My current plan is to continue using similar synthetic nodes to model each of these transitions with generic lowering code for each one. However, with this patch LLVM already reaches performance parity with GCC for the core C loops of the x264 code (assuming you disable the hand-written assembly versions) when compiling for SSE2 and SSE3 architectures and enabling the new widening and lowering logic for vectors. Differential Revision: http://reviews.llvm.org/D4405 llvm-svn: 212610
* [cleanup] Hoist an if-else chain on ISD opcodes (really designed forChandler Carruth2014-07-021-17/+28
| | | | | | | switches) into a switch, and sink them into a dispatch function that can return the result rather than awkward variable setting with breaks. llvm-svn: 212166
* [cleanup] Remove dead 'break;' statements that I meant to nuke inChandler Carruth2014-07-021-2/+0
| | | | | | | | r212158 but missed. Thanks to Craig for spotting the goof! llvm-svn: 212159
* [cleanup] Hoist the promotion dispatch logic into the promote functionChandler Carruth2014-07-021-23/+22
| | | | | | | so that we can use return to express it more cleanly and avoid so many nested switch statements. llvm-svn: 212158
* [cleanup] Nuke the 'VectorOp' bit of the promote method names.Chandler Carruth2014-07-021-9/+9
| | | | | | | This doesn't add any information for methods in the VectorLegalizer class that clearly take SDAG operations to legalize. llvm-svn: 212157
* [x86] Clean up and modernize the doxygen and API comments for the vectorChandler Carruth2014-07-021-24/+38
| | | | | | operation legalization code. llvm-svn: 212155
* SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the ↵Benjamin Kramer2014-05-191-0/+27
| | | | | | | | | | bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123
* Convert more SelectionDAG functions to use ArrayRef.Craig Topper2014-04-281-2/+1
| | | | llvm-svn: 207397
* Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.Craig Topper2014-04-261-10/+7
| | | | llvm-svn: 207327
* [VectorLegalizer/X86] Don't unvectorize fp_to_uint for v8f32->v8i16Adam Nemet2014-03-171-7/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX. For that to work properly, we also need to teach the legalizer about the specific promotion required here. The default vector promotion uses bitcasting to a vector type of the same total size. We want to promote the vector element type, effectively widening the operation and then truncating the result. This is analogous to the current logic of how int_to_fp is promoted. The change also factors out some code from the int_to_fp promotion code to ValueType::widenIntegerVectorElementType. This is now shared between int_to_fp and fp_to_int. There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in X86. It can now go through the new target-independent fp_to_*int promotion logic. I also checked that no other target uses Promote for these ops yet, so there shouldn't be any unexpected change in behavior. Fixes <rdar://problem/16202247> llvm-svn: 204058
* [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.Benjamin Kramer2014-03-021-2/+2
| | | | | | Remove the old functions. llvm-svn: 202636
* Expand vector bswap in LegalizeVectorOpsHal Finkel2014-02-031-0/+1
| | | | | | | ISD::BSWAP was missing from the list of node types that should be expanded element-wise. llvm-svn: 200705
* Keep TBAA info when rewriting SelectionDAG loads and storesRichard Sandiford2013-10-281-4/+7
| | | | | | | | | | | | | | | | | Most SelectionDAG code drops the TBAA info when creating a new form of a load and store (e.g. during legalization, or when converting a plain load to an extending one). This patch tries to catch all cases where the TBAA information can legitimately be carried over. The patch adds alternative forms of getLoad() and getExtLoad() that take a MachineMemOperand instead of individual fields. (The corresponding getTruncStore() already exists.) The idea is to use the MachineMemOperand forms when all fields are carried over (size, pointer info, isVolatile, isNonTemporal, alignment and TBAA info). If some adjustment is being made, e.g. to narrow the load, then we still pass the individual fields but also pass the TBAA info. llvm-svn: 193517
* Remove pointless assertion after r190376Matt Arsenault2013-09-121-2/+0
| | | | llvm-svn: 190565
* Don't use getSetCCResultType for creating a vselectMatt Arsenault2013-09-101-2/+1
| | | | | | | | | | | | | | | | | | | The vselect mask isn't a setcc. This breaks in the case when the result of getSetCCResultType is larger than the vector operands e.g. %tmp = select i1 %cmp <2 x i8> %a, <2 x i8> %b when getSetCCResultType returns <2 x i32>, the assertion that the (MaskTy.getSizeInBits() == Op1.getValueType().getSizeInBits()) is hit. No test since I don't think I can hit this with any of the current targets. The R600/SI implementation would break, since it returns a vector of i1 for this, but it doesn't reach ExpandSELECT for other reasons. llvm-svn: 190376
* SelectionDAG: Remove unnecessary uses of TargetLowering::getPointerTy()Tom Stellard2013-08-261-3/+3
| | | | | | | | | | | | If we have a binary operation like ISD:ADD, we can set the result type equal to the result type of one of its operands rather than using TargetLowering::getPointerTy(). Also, any use of DAG.getIntPtrConstant(C) as an operand for a binary operation can be replaced with: DAG.getConstant(C, OtherOperand.getValueType()); llvm-svn: 189227
* SelectionDAG: Make sure stores are always added to the LegalizedNodes listTom Stellard2013-08-211-1/+1
| | | | | | | | | | | | | | | | When truncated vector stores were being custom lowered in VectorLegalizer::LegalizeOp(), the old (illegal) and new (legal) node pair was not being added to LegalizedNodes list. Instead of the legalized result being passed to VectorLegalizer::TranslateLegalizeResult(), the result was being passed back into VectorLegalizer::LegalizeOp(), which ended up adding a (new, new) pair to the list instead. This was causing an assertion failure when a custom lowered truncated vector store was the last instruction a basic block and the VectorLegalizer was unable to find it in the LegalizedNodes list when updating the DAG root. llvm-svn: 188953
* Add a llvm.copysign intrinsicHal Finkel2013-08-191-0/+1
| | | | | | | | | | | | | | | | | | | | | This adds a llvm.copysign intrinsic; We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-values FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728
* Add ISD::FROUND for libm round()Hal Finkel2013-08-071-0/+1
| | | | | | | | | | | | | | | All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926
* TargetLowering: Add getVectorIdxTy() function v2Tom Stellard2013-08-051-3/+3
| | | | | | | | | | | | | | | | | | | | | This virtual function can be implemented by targets to specify the type to use for the index operand of INSERT_VECTOR_ELT, EXTRACT_VECTOR_ELT, INSERT_SUBVECTOR, EXTRACT_SUBVECTOR. The default implementation returns the result from TargetLowering::getPointerTy() The previous code was using TargetLowering::getPointerTy() for vector indices, because this is guaranteed to be legal on all targets. However, using TargetLowering::getPointerTy() can be a problem for targets with pointer sizes that differ across address spaces. On such targets, when vectors need to be loaded or stored to an address space other than the default 'zero' address space (which is the address space assumed by TargetLowering::getPointerTy()), having an index that is a different size than the pointer can lead to inefficient pointer calculations, (e.g. 64-bit adds for a 32-bit address space). There is no intended functionality change with this patch. llvm-svn: 187748
* Remove trailing whitespace from SelectionDAG/*.cppStephen Lin2013-07-081-1/+1
| | | | llvm-svn: 185780
* Introduce getSelect usage and use more getSelectCCMatt Arsenault2013-06-141-5/+5
| | | | llvm-svn: 184012
* Remove double semicolons.Benjamin Kramer2013-05-281-9/+9
| | | | llvm-svn: 182778
* Track IR ordering of SelectionDAG nodes 2/4.Andrew Trick2013-05-251-10/+10
| | | | | | | Change SelectionDAG::getXXXNode() interfaces as well as call sites of these functions to pass in SDLoc instead of DebugLoc. llvm-svn: 182703
* Add LLVMContext argument to getSetCCResultTypeMatt Arsenault2013-05-181-2/+3
| | | | llvm-svn: 182180
* Fix vselect when getSetCCResultType returns a different type from the operandsMatt Arsenault2013-05-071-3/+8
| | | | llvm-svn: 181348
* SelectionDAG compile time improvement.Nadav Rotem2013-02-221-0/+19
| | | | | | | | One of the phases of SelectionDAG is LegalizeVectors. We don't need to sort the DAG and copy nodes around if there are no vector ops. Speeds up the compilation time of SelectionDAG on a big scalar workload by ~8%. llvm-svn: 175929
* Fix PR15267Michael Liao2013-02-201-14/+119
| | | | | | | | | - When extloading from a vector with non-byte-addressable element, e.g. <4 x i1>, the current logic breaks. Extend the current logic to fix the case where the element type is not byte-addressable by loading all bytes, bit-extracting/packing each element. llvm-svn: 175642
* This patch aims to reduce compile time in LegalizeTypes by using SmallDenseMap,Preston Gurd2013-01-251-1/+1
| | | | | | | | | | | | | with an initial number of elements, instead of DenseMap, which has zero initial elements, in order to avoid the copying of elements when the size changes and to avoid allocating space every time LegalizeTypes is run. This patch will not affect the memory footprint, because DenseMap will increase the element size to 64 when the first element is added. Patch by Wan Xiaofei. llvm-svn: 173448
* When lowering an inreg sext first shift left, then right arithmetically.Benjamin Kramer2013-01-121-3/+3
| | | | | | | Shifting right two times will only yield zero. Should fix SingleSource/UnitTests/SignlessTypes/factor. llvm-svn: 172322
* PPC: Implement efficient lowering of sign_extend_inreg.Nadav Rotem2013-01-111-1/+25
| | | | llvm-svn: 172269
* Change TargetLowering::getTypeToPromoteTo to take and return MVTs,Patrik Hagglund2012-12-191-2/+2
| | | | | | instead of EVTs. llvm-svn: 170529
* Change TargetLowering::getTruncStoreAction to take MVTs, instead of EVTs.Patrik Hagglund2012-12-191-2/+2
| | | | llvm-svn: 170510
OpenPOWER on IntegriCloud