path: root/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
Commit message | Author | Age | Files | Lines
...
* Remove unnecessary copying or replace it with moves in a bunch of places.Benjamin Kramer2014-10-041-1/+2
| | | | | | NFC. llvm-svn: 219061
* Replace dead links to "Hacker's Delight" with general references. NFC.Sanjay Patel2014-09-151-2/+2
| | | | llvm-svn: 217814
* [SDAG] Re-instate r215611 with a fix to a pesky X86 DAG combine.Chandler Carruth2014-08-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This combine is essentially combining target-specific nodes back into target independent nodes that it "knows" will be combined yet again by a target independent DAG combine into a different set of target-independent nodes that are legal (not custom though!) and thus "ok". This seems... deeply flawed. The crux of the problem is that we don't combine un-legalized shuffles that are introduced by legalizing other operations, and thus we don't see a very profitable combine opportunity. So the backend just forces the input to that combine to re-appear. However, for this to work, the conditions detected to re-form the unlegalized nodes must be *exactly* right. Previously, failing this would have caused poor code (if you're lucky) or a crasher when we failed to select instructions. After r215611 we would fall back into the legalizer. In some cases, this just "fixed" the crasher by producing bad code. But in the test case added it caused the legalizer and the dag combiner to iterate forever. The fix is to make the alignment checking in the x86 side of things match the alignment checking in the generic DAG combine exactly. This isn't really a satisfying or principled fix, but it at least makes the code work as intended. It also highlights that it would be nice to detect the availability of under-aligned loads for a given type rather than bailing on this optimization. I've left a FIXME to document this. Original commit message for r215611 which covers the rest of the change: [SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it *stops* being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 216537
* Revert r215611 because it caused the infinite loop in bug 20736. There is a ↵Nick Lewycky2014-08-231-2/+0
| | | | | | reduced testcase in that bug. llvm-svn: 216307
* [ARM] Enable DP copy, load and store instructions for FPv4-SPOliver Stannard2014-08-211-6/+20
| | | | | | | | | | | | | | | | | The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers and load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention. This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers. This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM. llvm-svn: 216172
* Teach the AArch64 backend to handle f16Oliver Stannard2014-08-181-0/+3
| | | | | | | This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv operations on f16 (half-precision) types by promoting to f32. llvm-svn: 215891
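A target normally requests this kind of promotion in its TargetLowering constructor; a minimal sketch of the idea (illustrative only, not the exact AArch64 change):

    // Inside the target's TargetLowering subclass constructor:
    // promote half-precision arithmetic to single precision.
    setOperationAction(ISD::FADD, MVT::f16, Promote);
    setOperationAction(ISD::FSUB, MVT::f16, Promote);
    setOperationAction(ISD::FMUL, MVT::f16, Promote);
    setOperationAction(ISD::FDIV, MVT::f16, Promote);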
* [SDAG] Fix a case where we would iteratively legalize a node duringChandler Carruth2014-08-141-0/+2
| | | | | | | | | | | | | | | | | | combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it *stops* being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 215611
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-041-1/+3
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* [SDAG] Begin simplifying the way in which the legalizer deletes nodes.Chandler Carruth2014-08-011-41/+25
| | | | | | | | | | | | | | | | | | | | This lifts the (very few) places the legalizer would delete dead nodes into the outer loop around the legalizer. This is significantly simpler because it doesn't require the legalizer itself to manage the iterator validity, and it doesn't require the legalizer to be a DAG update listener in order to remove things from the legalized set. It also makes the interface much less contrived for the case of the legalizer running inside the last phase of DAG combining. I'm working on centralizing the deletion of nodes during both legalizing and combining as much as possible. My hope is to remove the need for DAG update listeners from the combiner next, which would remove a costly virtual dispatch chain on every deletion. This in turn should allow us to more aggressively delete DAG nodes during combining which will in turn allow us to combine more aggressively by exposing the actual nodes which have single users to the combine phases. llvm-svn: 214546
* Make sure no loads resulting from load->switch DAGCombine are marked invariantLouis Gerbarg2014-07-311-18/+24
| | | | | | | | | | | | | | Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reorder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449
* [SDAG] Add DEBUG logging to the legalizer, fixing a "bug" found byChandler Carruth2014-07-281-3/+19
| | | | | | | | | | | | | | | | | | | | | | | | | inspection in the process, and shuffle the logging in the DAG combiner around a bit. With this it is much easier to follow what the legalizer is doing. It should even accurately present most of the strange legalization operations where a single node is replaced by multiple nodes, etc. There is still some information lost (we log SDNodes not SDValues so we don't log which result is used for which thing), but I think this is much closer to a usable system. Notably, this will make it *much* more apparent when legalization is actually happening inside the combiner, or when there is a cycle caused by interactions of the legalizer and the combiner. I'm not sure the "bug" I fixed here is remotely possible to trigger. We were only adding one of the nodes in a replacement to the updated set rather than all of the nodes in the replacement. Realistically, the worst result of this is nodes not getting back onto the worklist in the DAG combiner. I doubt it is possible to trigger this today, and I certainly don't have any ideas about how, but this at least brings the code into alignment with the principled operation of the routine. llvm-svn: 214105
* Add alignment value to allowsUnalignedMemoryAccessMatt Arsenault2014-07-271-8/+12
| | | | | | | | | | Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055
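With the alignment (and address space) available, a target override can make a per-access decision; a rough sketch, assuming a hook signature of roughly this shape (it has been renamed and extended across LLVM versions, and MyTargetLowering is hypothetical):

    bool MyTargetLowering::allowsMisalignedMemoryAccesses(EVT VT, unsigned AddrSpace,
                                                          unsigned Alignment,
                                                          bool *Fast) const {
      // Illustrative policy: 8- and 16-byte accesses are fine with 4-byte alignment.
      if (VT.getStoreSize() >= 8 && Alignment >= 4) {
        if (Fast)
          *Fast = true;
        return true;
      }
      return false;
    }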
* [SDAG] Add an assert that we don't mess up the number of values whenChandler Carruth2014-07-261-0/+3
| | | | | | | | replacing nodes in the legalizer. This caught a number of bugs for me during development. llvm-svn: 214022
* [SDAG] Simplify the code for handling single-value nodes and addChandler Carruth2014-07-261-8/+12
| | | | | | a missing transfer of debug information (without which tests fail). llvm-svn: 214021
* [SDAG] When performing post-legalize DAG combining, run the legalizerChandler Carruth2014-07-261-57/+93
| | | | | | | | | | | | | | | | | | | | | | over each node in the worklist prior to combining. This allows the combiner to produce new nodes which need to go back through legalization. This is particularly useful when generating operands to target specific nodes in a post-legalize DAG combine where the operands are significantly easier to express as pre-legalized operations. My immediate use case will be PSHUFB formation where we need to build a constant shuffle mask with a build_vector node. This also refactors the relevant functionality in the legalizer to support this, and updates relevant tests. I've spoken to the R600 folks and these changes look like improvements to them. The avx512 change needs to be investigated, I suspect there is a disagreement between the legalizer and the DAG combiner there, but it seems a minor issue so leaving it to be re-evaluated after this patch. Differential Revision: http://reviews.llvm.org/D4564 llvm-svn: 214020
* AA metadata refactoring (introduce AAMDNodes)Hal Finkel2014-07-241-27/+27
| | | | | | | | | | | | | | | | | | | | In order to enable the preservation of noalias function parameter information after inlining, and the representation of block-level __restrict__ pointer information (etc.), additional kinds of aliasing metadata will be introduced. This metadata needs to be carried around in AliasAnalysis::Location objects (and MMOs at the SDAG level), and so we need to generalize the current scheme (which is hard-coded to just one TBAA MDNode*). This commit introduces only the necessary refactoring to allow for the introduction of other aliasing metadata types, but does not actually introduce any (that will come in a follow-up commit). What it does introduce is a new AAMDNodes structure to hold all of the aliasing metadata nodes associated with a particular memory-accessing instruction, and uses that structure instead of the raw MDNode* in AliasAnalysis::Location, etc. No functionality change intended. llvm-svn: 213859
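The shape of the new aggregate, in simplified form (the member set shown here is approximate and has grown since):

    // Sketch: the bundle of aliasing metadata replacing the raw TBAA MDNode*.
    struct AAMDNodes {
      MDNode *TBAA = nullptr;    // !tbaa
      MDNode *Scope = nullptr;   // !alias.scope
      MDNode *NoAlias = nullptr; // !noalias
    };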
* CodeGen: generate single libcall for fptrunc -> f16 operations.Tim Northover2014-07-171-6/+6
| | | | | | | | | | | | Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM. I've followed the much more generic "__truncST2" naming, as opposed to the odd name for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to. llvm-svn: 213252
* CodeGen: extend f16 conversions to permit types > float.Tim Northover2014-07-171-3/+20
| | | | | | | | | | | | | | | | | | | This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248
* ARM: Allow __fp16 as a function arg or return type for AArch64Oliver Stannard2014-07-111-1/+1
| | | | | | | ACLE 2.0 allows __fp16 to be used as a function argument or return type. This enables this for AArch64. llvm-svn: 212812
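In source terms (ACLE), this permits signatures like the following hypothetical example when targeting AArch64:

    // __fp16 used directly as argument and return type; the arithmetic is
    // performed after promotion to float.
    __fp16 scale(__fp16 x, __fp16 y) {
      return x * y;
    }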
* SelectionDAG: Factor FP_TO_SINT lower code out of DAGLegalizerJan Vesely2014-07-101-58/+3
| | | | | | | | | | | Move the code to a helper function to allow calls from TypeLegalizer. No functionality change intended Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> Reviewed-by: Owen Anderson <resistor@mac.com> llvm-svn: 212772
* Make it possible for ints/floats to return different values from ↵Daniel Sanders2014-07-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | getBooleanContents() Summary: On MIPS32r6/MIPS64r6, floating point comparisons return 0 or -1 but integer comparisons return 0 or 1. Updated the various uses of getBooleanContents. Two simplifications had to be disabled when float and int boolean contents differ: - ScalarizeVecRes_VSELECT except when the kind of boolean contents is trivially discoverable (i.e. when the condition of the VSELECT is a SETCC node). - visitVSELECT (select C, 0, 1) -> (xor C, 1). Come to think of it, this one could test for the common case of 'C' being a SETCC too. Preserved existing behaviour for all other targets and updated the affected MIPS32r6/MIPS64r6 tests. This also fixes the pi benchmark where the 'low' variable was counting in the wrong direction because it thought it could simply add the result of the comparison. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: hfinkel, jholewinski, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D4389 llvm-svn: 212697
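A target that needs this distinction states it when configuring its lowering; a sketch of the MIPS32r6-style setup (the exact setter overload used here is an assumption):

    // Integer compares produce 0/1, floating-point compares produce 0/-1.
    setBooleanContents(ZeroOrOneBooleanContent,           // integer setcc results
                       ZeroOrNegativeOneBooleanContent);  // floating-point setcc results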
* [DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI.Juergen Ributzka2014-07-011-8/+9
| | | | | | | | The argument list vector is never used after it has been passed to the CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it. llvm-svn: 212135
* SelectionDAG: Expand i64 = FP_TO_SINT i32Tom Stellard2014-06-171-0/+59
| | | | llvm-svn: 211108
* LegalizeDAG: make sure cast is unsigned before using FP_TO_UINT.Tim Northover2014-06-151-1/+4
| | | | | | | | | | | | It's valid to use FP_TO_SINT when asking for a smaller type (e.g. all "unsigned int16" values fit into a "signed int32"), but the reverse isn't true. Unfortunately, I'm not actually aware of any architecture with asymmetric FP_TO_SINT and FP_TO_UINT handling and the logic happens to work in the symmetric case, so I can't actually write a test for this. llvm-svn: 210986
* IR: add "cmpxchg weak" variant to support permitted failure.Tim Northover2014-06-131-8/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903
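Per the note above, a backend that wants to handle the two-result node itself opts in with setOperationAction; a sketch (the types a real target registers will differ):

    // Handle the cmpxchg-with-success-flag node directly instead of letting it
    // be expanded back to ISD::ATOMIC_CMP_SWAP during legalization.
    setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i32, Custom);
    setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, MVT::i64, Custom);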
* SelectionDAG: Expand SELECT_CC to SELECT + SETCCTom Stellard2014-06-101-1/+17
| | | | | | | | This consolidates code from the Hexagon, R600, and XCore targets. No functionality change intended. llvm-svn: 210539
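The expansion itself is small; a sketch of the equivalent DAG construction (CCVT stands for the target's preferred setcc result type and is assumed here):

    // select_cc lhs, rhs, tval, fval, cc
    //   ==>  cond = setcc lhs, rhs, cc ;  select cond, tval, fval
    SDValue Cond = DAG.getSetCC(dl, CCVT, LHS, RHS, CC);
    SDValue Res  = DAG.getSelect(dl, TrueV.getValueType(), Cond, TrueV, FalseV);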
* Fix wrong setcc result type when legalizing uaddo/usuboMatt Arsenault2014-05-281-5/+11
| | | | | | | | | No test because no in-tree targets change the bitwidth of the setcc type depending on the bitwidth of the compared type. Patch by Ke Bai llvm-svn: 209771
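The fix amounts to asking the target for the setcc result type of the value being compared rather than assuming it; roughly (a sketch, and the getSetCCResultType signature has varied across releases):

    // VT is the operands' value type. uaddo overflows iff the truncated sum
    // is less than either operand.
    EVT SetCCVT = TLI.getSetCCResultType(*DAG.getContext(), VT);
    SDValue Sum      = DAG.getNode(ISD::ADD, dl, VT, LHS, RHS);
    SDValue Overflow = DAG.getSetCC(dl, SetCCVT, Sum, LHS, ISD::SETULT);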
* Target: remove old constructors for CallLoweringInfoSaleem Abdulrasool2014-05-171-45/+38
| | | | | | | | | | This is mostly a mechanical change changing all the call sites to the newer chained-function construction pattern. This removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors is intended. llvm-svn: 209082
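At a typical call site (e.g. a libcall expansion in LegalizeDAG) the chained pattern looks roughly like this (a sketch; the exact setCallee parameters differ between revisions):

    TargetLowering::CallLoweringInfo CLI(DAG);
    CLI.setDebugLoc(dl)
       .setChain(Chain)
       .setCallee(CallingConv::C, RetTy, Callee, std::move(Args), 0);
    std::pair<SDValue, SDValue> CallResult = TLI.LowerCallTo(CLI);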
* Use a logical not when inverting SetCC. This unfortunately doesn't fire on ↵Pete Cooper2014-05-121-3/+3
| | | | | | | | | | | | any targets so I couldn't find a test case to trigger it. The problem occurs when a non-i1 setcc is inverted. For example 'i8 = setcc' will get 'xor 0xff' to invert this. This is clearly wrong when the boolean contents are ZeroOrOne. This patch introduces getLogicalNOT and updates SetCC legalisation to use it. Reviewed by Hal Finkel. llvm-svn: 208641
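The helper makes the inversion respect the target's boolean contents; conceptually (sketch):

    // XOR with an all-ones constant flips every bit, which is wrong for
    // ZeroOrOne booleans wider than i1. getLogicalNOT instead XORs with the
    // value the target actually uses for "true".
    SDValue Inverted = DAG.getLogicalNOT(dl, SetCC, SetCC.getValueType());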
* Implementing named register intrinsicsRenato Golin2014-05-061-0/+7
| | | | | | | | | | | This patch implements the infrastructure to use named register constructs in programs that need access to specific registers (bare metal, kernels, etc). So far, only the stack pointer is supported as a technology preview, but as it is, the intrinsic can already support all non-allocatable registers from any architecture. llvm-svn: 208104
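At the source level this backs the GNU named register variable syntax; a hypothetical AArch64 usage (only the stack pointer is supported at this stage):

    // Global named register variable, lowered via the new read-register intrinsic.
    register unsigned long current_sp asm("sp");

    unsigned long read_stack_pointer() {
      return current_sp;
    }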
* We already calculate WideVT above, just reuse it.Eric Christopher2014-04-281-2/+1
| | | | | | Patch by Jan Vesely <jan.vesely@rutgers.edu>. llvm-svn: 207455
* Convert SelectionDAG::getNode methods to use ArrayRef<SDValue>.Craig Topper2014-04-261-10/+5
| | | | llvm-svn: 207327
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵Craig Topper2014-04-141-3/+3
| | | | | | instead of comparing to nullptr. llvm-svn: 206142
* SelectionDAG: Use helper function to improve legalization of ISD::MULTom Stellard2014-04-111-0/+17
| | | | | | | | The TargetLowering::expandMUL() helper contains lowering code extracted from the DAGTypeLegalizer and allows the SelectionDAGLegalizer to expand more ISD::MUL patterns without having to use a library call. llvm-svn: 206037
* Add an optional ability to expand larger BUILD_VECTORs with shufflesHal Finkel2014-03-311-20/+117
| | | | | | | | | | | | | | | | | | | | | | | This adds the ability to expand large (meaning with more than two unique defined values) BUILD_VECTOR nodes in terms of SCALAR_TO_VECTOR and (legal) vector shuffles. There is now no limit on the size we are capable of expanding this way, although we don't currently do this for vectors with many unique values because of the default implementation of TLI's shouldExpandBuildVectorWithShuffles function. There is currently no functional change to any existing targets because the new capabilities are not used unless some target overrides the TLI shouldExpandBuildVectorWithShuffles function. As a result, I've not included a test case for the new functionality in this commit, but regression tests will (at least) be added soon when I commit support for the PPC QPX vector instruction set. The benefit of committing this now is that it makes the shouldExpandBuildVectorWithShuffles callback, which had to be added for other reasons regardless, fully functional. I suspect that other targets will also benefit from tuning the heuristic. llvm-svn: 205243
* Add a TLI hook to control when BUILD_VECTOR might be expanded using shufflesHal Finkel2014-03-311-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two general methods for expanding a BUILD_VECTOR node: 1. Use SCALAR_TO_VECTOR on the defined scalar values and then shuffle them together. 2. Build the vector on the stack and then load it. Currently, we use a fixed heuristic: If there are only one or two unique defined values, then we attempt an expansion in terms of SCALAR_TO_VECTOR and vector shuffles (provided that the required shuffle mask is legal). Otherwise, always expand via the stack. Even when SCALAR_TO_VECTOR is not legal, this can still be a good idea depending on what tricks the target can play when lowering the resulting shuffle. If the target can't do anything special, however, and if SCALAR_TO_VECTOR is expanded via the stack, this heuristic leads to sub-optimal code (two stack loads instead of one). Because only the target knows whether the SCALAR_TO_VECTORs and shuffles for a build vector of a particular type are likely to be optimal, this adds a new TLI function: shouldExpandBuildVectorWithShuffles which takes the vector type and the count of unique defined values. If this function returns true, then method (1) will be used, subject to the constraint that all of the necessary shuffles are legal (as determined by isShuffleMaskLegal). If this function returns false, then method (2) is always used. This commit does not enhance the current code to support expanding a build_vector with more than two unique values using shuffles, but I'll commit an implementation of the more-general case shortly. llvm-svn: 205230
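A target tunes the heuristic by overriding the new hook; for example (hypothetical target and threshold, shown only to illustrate the interface):

    bool MyTargetLowering::shouldExpandBuildVectorWithShuffles(
        EVT VT, unsigned DefinedValues) const {
      // Prefer SCALAR_TO_VECTOR + shuffles over a stack round-trip when the
      // number of unique defined values is small.
      return DefinedValues <= 4;
    }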
* Make use of previously generated stores in ↵Hal Finkel2014-03-301-4/+33
| | | | | | | | | | | | | | | | | SelectionDAGLegalize::ExpandExtractFromVectorThroughStack When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire vector and then load the piece we want. This is fine in isolation, but generating a new store (and corresponding stack slot) for each extraction ends up producing code of poor quality. When we scalarize a vector operation (using SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT for each element in the vector. This used to generate one stored copy of the vector for each element in the vector. Now we search the uses of the vector for a suitable store before generating a new one, which results in much more efficient scalarization code. llvm-svn: 205153
* SelectionDAG: Allow promotion of SELECT nodes from float to int typesTom Stellard2014-03-241-1/+2
| | | | | | | | And vice-versa, as long as the types are the same width. There are a few R600 tests that will cover this. llvm-svn: 204616
* IR: add a second ordering operand to cmpxchg for failureTim Northover2014-03-111-0/+1
| | | | | | | | | | | | | | | The syntax for "cmpxchg" should now look something like: cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic where the second ordering argument gives the required semantics in the case that no exchange takes place. It should be no stronger than the first ordering constraint and cannot be either "release" or "acq_rel" (since no store will have taken place). rdar://problem/15996804 llvm-svn: 203559
* Fix non 2-space indentation.Matt Arsenault2014-03-111-73/+73
| | | | llvm-svn: 203514
* [C++11] Add 'override' keyword to virtual methods that override their base ↵Craig Topper2014-03-081-2/+2
| | | | | | class. llvm-svn: 203339
* [Layering] Move DebugInfo.h into the IR library where its implementationChandler Carruth2014-03-061-1/+1
| | | | | | already lives. llvm-svn: 203046
* Pass address space to allowsUnalignedMemoryAccessesMatt Arsenault2014-02-051-7/+15
| | | | llvm-svn: 200888
* Add support for legalizing SETNE/SETEQ by inverting the condition code and ↵Daniel Sanders2013-11-211-14/+54
| | | | | | | | | | | | | | | | | | | | | | | | | the result of the comparison. Summary: LegalizeSetCCCondCode can now legalize SETEQ and SETNE by returning the inverse condition and requesting that the caller invert the result of the condition. The caller of LegalizeSetCCCondCode must handle the inverted CC, and they do so as follows: SETCC, BR_CC: Invert the result of the SETCC with SelectionDAG::getNOT() SELECT_CC: Swap the true/false operands. This is necessary for MSA which lacks an integer SETNE instruction. Reviewers: resistor CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2229 llvm-svn: 195355
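On the caller side, handling the inverted condition for a plain SETCC looks roughly like (sketch):

    // The target cannot select SETNE directly: legalize as SETEQ, then invert
    // the boolean result with SelectionDAG::getNOT(), as described above.
    SDValue EQ = DAG.getSetCC(dl, CCVT, LHS, RHS, ISD::SETEQ);
    SDValue NE = DAG.getNOT(dl, EQ, CCVT);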
* long lines and white space correctionJack Carter2013-11-191-29/+34
| | | | llvm-svn: 195170
* Use more getZExtOrTruncsMatt Arsenault2013-11-171-5/+1
| | | | llvm-svn: 194945
* Use getZExtOrTrunc instead of repeating the same logic.Matt Arsenault2013-11-171-5/+1
| | | | llvm-svn: 194944
* Fix CodeGen for unaligned loads with address spacesMatt Arsenault2013-10-301-2/+5
| | | | llvm-svn: 193721
* Keep TBAA info when rewriting SelectionDAG loads and storesRichard Sandiford2013-10-281-37/+42
| | | | | | | | | | | | | | | | | Most SelectionDAG code drops the TBAA info when creating a new form of a load and store (e.g. during legalization, or when converting a plain load to an extending one). This patch tries to catch all cases where the TBAA information can legitimately be carried over. The patch adds alternative forms of getLoad() and getExtLoad() that take a MachineMemOperand instead of individual fields. (The corresponding getTruncStore() already exists.) The idea is to use the MachineMemOperand forms when all fields are carried over (size, pointer info, isVolatile, isNonTemporal, alignment and TBAA info). If some adjustment is being made, e.g. to narrow the load, then we still pass the individual fields but also pass the TBAA info. llvm-svn: 193517
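When every field carries over unchanged, the new MachineMemOperand form does it in one step; a sketch (WideVT is a placeholder for whatever wider result type the caller wants, and the parameter order is approximate):

    LoadSDNode *LD = cast<LoadSDNode>(Node);
    // The MMO already bundles size, pointer info, volatility, alignment and
    // the TBAA tag, so nothing is dropped when rebuilding the load.
    SDValue NewLoad = DAG.getExtLoad(ISD::EXTLOAD, dl, WideVT, LD->getChain(),
                                     LD->getBasePtr(), LD->getMemoryVT(),
                                     LD->getMemOperand());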
* LegalizeDAG: allow libcalls for max/min atomic operationsTim Northover2013-10-251-0/+40
| | | | | | | | | | | ARM processors without ldrex/strex need to be able to make libcalls for all atomic operations, including the newer min/max versions. The alternative would probably be expanding these operations in terms of cmpxchg (as x86 does always), but in the configurations where this matters code-size tends to be paramount so the libcall is more desirable. llvm-svn: 193398