path: root/llvm/lib/CodeGen/SelectionDAG
Commit message | Author | Date | Files | Lines
* [CodeGen] Fix remaining zext() assertions in SelectionDAG (Scott Linder, 2018-09-04; 2 files, -16/+14)
  Fix remaining cases not committed in https://reviews.llvm.org/D49574
  Differential Revision: https://reviews.llvm.org/D50659
  llvm-svn: 341380
* DAG: Handle extract_vector_elt in isKnownNeverNaN (Matt Arsenault, 2018-09-03; 1 file, -0/+3)
  llvm-svn: 341317
* [DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle inverted pattern (Roman Lebedev, 2018-09-02; 1 file, -4/+18)
  Summary: A follow-up for D49266 / rL337166 + D49497 / rL338044. This is
  still the same pattern to check for the [lack of] signed truncation, but
  in this case the constants and the predicate are negated.
  https://rise4fun.com/Alive/BDV
  https://rise4fun.com/Alive/n7Z
  Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D51532
  llvm-svn: 341287
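  The shape of the two forms, as a minimal C++ illustration (an i16 value
  checked against i8 is assumed here; the exact constants and predicates the
  combine matches are spelled out in the linked Alive proofs):

      #include <cstdint>

      // Signed truncation check: does x survive a round trip through i8
      // sign extension? Canonical form: (int8_t)x == x.
      bool fitsInI8(int16_t x) {
        return (uint16_t)(x + 128) < 256; // add + unsigned-compare form
      }

      // The inverted pattern this patch handles: constants and predicate
      // negated, checking for the *lack* of signed truncation.
      bool doesNotFitInI8(int16_t x) {
        return (uint16_t)(x + 128) > 255;
      }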
* [DAGCombiner] Fix bad indentation. NFC (Craig Topper, 2018-08-30; 1 file, -1/+1)
  llvm-svn: 341103
* [NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis (Nicolai Haehnle, 2018-08-30; 2 files, -2/+2)
  Summary: This is patch 1 of the new DivergenceAnalysis
  (https://reviews.llvm.org/D50433). The purpose of this patch is to free up
  the name DivergenceAnalysis for the new generic implementation. The
  generic implementation class will be shared by specialized divergence
  analysis classes.
  Patch by: Simon Moll
  Reviewed By: nhaehnle
  Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D50434
  Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
  llvm-svn: 341071
* DAG: Don't use ABI copies in some contexts (Matt Arsenault, 2018-08-30; 1 file, -2/+3)
  If an ABI-like value is used in a different block, the type split used is
  not necessarily the same as the call's ABI. The value is used through
  intermediate copies to virtual registers from the other block. This
  resulted in copies with inconsistent sizes later.
  Fixes regressions since r338197, when AMDGPU started splitting vector
  types for calls.
  llvm-svn: 341018
* [DAGCombiner] Add X / X -> 1 & X % X -> 0 folds (Simon Pilgrim, 2018-08-29; 1 file, -1/+18)
  Adds more divrem folds to try and get in sync with InstructionSimplify.
  Differential Revision: https://reviews.llvm.org/D50636
  llvm-svn: 340919
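  A hedged sketch of what such a fold looks like in DAGCombiner style (the
  helper and variable names here are assumed, not the patch's own; X == 0
  needs no guard because division by zero is undefined):

      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      static SDValue foldDivRemOfSelf(unsigned Opcode, SDValue N0, SDValue N1,
                                      EVT VT, const SDLoc &DL,
                                      SelectionDAG &DAG) {
        if (N0 != N1)
          return SDValue();
        if (Opcode == ISD::SDIV || Opcode == ISD::UDIV)
          return DAG.getConstant(1, DL, VT); // X / X -> 1
        if (Opcode == ISD::SREM || Opcode == ISD::UREM)
          return DAG.getConstant(0, DL, VT); // X % X -> 0
        return SDValue();
      }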
* [X86] Support v2i32 gather/scatter indices with -x86-experimental-vector-widening-legalization (Craig Topper, 2018-08-29; 3 files, -21/+46)
  Summary: This is split out from D41062 to cover the code in
  LegalizeVectorTypes.cpp.
  Reviewers: RKSimon, spatel, efriedma
  Reviewed By: efriedma
  Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51337
  llvm-svn: 340891
* [DAGCombine] Rework MERGE_VALUES to inline in single pass. NFCI. (Nirav Dave, 2018-08-28; 1 file, -1/+4)
  Avoid hyperlinear cost of inlining MERGE_VALUES nodes by constructing a
  temporary vector and doing a single replacement.
  llvm-svn: 340853
* [DAG] Avoid recomputing Divergence checks. NFCI. (Nirav Dave, 2018-08-28; 1 file, -6/+10)
  When making multiple updates to the same SDNode, recompute node divergence
  only once after all changes have been made.
  llvm-svn: 340852
* [DAG] Fix updateDivergence calculation (Nirav Dave, 2018-08-28; 1 file, -1/+1)
  Check the correct SDNode when deciding whether to update the divergence
  property.
  llvm-svn: 340851
* [DAGCombiner][AMDGPU][Mips] Fold bitcast with volatile loads if the resulting load is legal for the target. (Craig Topper, 2018-08-28; 1 file, -3/+12)
  Summary: I'm not sure if this patch is correct or if it needs more
  qualifying somehow. Bitcast shouldn't change the size of the load, so it
  should be OK? We already do something similar for stores: we'll change the
  type of a volatile store if the resulting store is Legal or Custom. I'm
  not sure we should be allowing Custom there...
  I was playing around with converting X86 atomic loads/stores (except
  seq_cst) into regular volatile loads and stores during lowering. This
  would allow some special RMW isel patterns in X86InstrCompiler.td to be
  removed. But there are some floating point patterns in there that didn't
  work, because we don't fold (f64 (bitconvert (i64 volatile load))) or
  (f32 (bitconvert (i32 volatile load))).
  Reviewers: efriedma, atanasyan, arsenm
  Reviewed By: efriedma
  Subscribers: jvesely, arsenm, sdardis, kzhuravl, wdng, yaxunl, dstuttard,
  tpr, t-tye, arichardson, jrtc27, atanasyan, jfb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D50491
  llvm-svn: 340797
* DAG: Check transformed type for forming fminnum/fmaxnum from vselect (Matt Arsenault, 2018-08-27; 1 file, -2/+3)
  Follow-up to r340655 to fix vector types which are split.
  llvm-svn: 340766
* [SelectionDAG] add helper query for binops; NFC (Sanjay Patel, 2018-08-27; 1 file, -11/+2)
  We will also use this in a planned enhancement for vector insertelement.
  llvm-svn: 340741
* [SelectionDAG][x86] turn insertelement into undef with variable index into splat (Sanjay Patel, 2018-08-26; 1 file, -3/+10)
  I noticed this along with the patterns in D51125, but when the index is
  variable, we don't convert insertelement into a build_vector. For x86,
  that means these get expanded at legalization time into the
  loading/spilling code that we see in the tests. I think it's always better
  to avoid going to memory on these, and we get the optimal 'broadcast' if
  it's available. I suspect other targets may want to look at enabling the
  hook. AArch64 and AMDGPU have regression tests that would be affected
  (although I did not check what would happen in those cases). In the most
  basic cases shown here, AArch64 would probably do much better with a
  splat.
  Differential Revision: https://reviews.llvm.org/D51186
  llvm-svn: 340705
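  The core of the transform, sketched with assumed names (the refinement is
  legal because every lane other than the written one is undef and may take
  any value, including a copy of the inserted element):

      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      // insert_vector_elt(undef, Val, VariableIdx) --> splat(Val)
      static SDValue insertEltToSplat(SDValue Vec, SDValue Val, SDValue Idx,
                                      EVT VT, const SDLoc &DL,
                                      SelectionDAG &DAG) {
        if (Vec.isUndef() && !isa<ConstantSDNode>(Idx))
          return DAG.getSplatBuildVector(VT, DL, Val);
        return SDValue();
      }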
* [IR] Replace `isa<TerminatorInst>` with `isTerminator()`. (Chandler Carruth, 2018-08-26; 3 files, -6/+6)
  This is a bit awkward in a handful of places where we didn't even have an
  instruction and now we have to see if we can build one. But on the whole,
  this seems like a win and at worst a reasonable cost for removing
  `TerminatorInst`. All of this is part of the removal of `TerminatorInst`
  from the `Instruction` type hierarchy.
  llvm-svn: 340701
* [SelectionDAG][X86] Reorder the operands of the MaskedStoreSDNode to put the value first. (Craig Topper, 2018-08-25; 3 files, -35/+18)
  Summary: Previously the value being stored was the last operand of the
  SDNode. This caused the type legalizer to visit the mask operand before
  the value operand, which made the type legalizer more complicated, since
  we want the type of the value to drive the decisions.
  This patch moves the value to be the first operand, so we visit it first
  during type legalization. It also simplifies the type legalization code
  accordingly.
  X86 is currently the only in-tree target that uses this SDNode. Not sure
  if there are any users out of tree.
  Reviewers: RKSimon, delena, hfinkel, eli.friedman
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D50402
  llvm-svn: 340689
* DAG: Allow matching fminnum/fmaxnum from vselect (Matt Arsenault, 2018-08-24; 1 file, -8/+27)
  llvm-svn: 340655
* [DAGCombiner][Mips] Don't combine bitcast+store after LegalOperations when the store is volatile, if the resulting store isn't Legal (Craig Topper, 2018-08-24; 1 file, -1/+1)
  Previously we allowed the store to be Custom. But without knowing for sure
  that the Custom handling won't split the store, we shouldn't convert a
  volatile store. We also probably shouldn't be creating a store that
  requires custom handling after LegalizeOps. This could lead to an infinite
  loop if the custom handling were to insert a bitcast, though I guess
  isStoreBitCastBeneficial could be used to block such a loop.
  The test changes here are due to the volatile part of this: the stores in
  the test are all volatile, and i32 stores are marked Custom, so we are no
  longer converting them.
  This is related to D50491, where I was trying to allow some bitcasting of
  volatile loads.
  Differential Revision: https://reviews.llvm.org/D50578
  llvm-svn: 340626
* [SDAG] Add versions of computeKnownBits that return a value (Justin Bogner, 2018-08-24; 1 file, -93/+81)
  Having the KnownBits as an output parameter is kind of awkward to use and
  a holdover from when it was two separate APInts. Instead, just return a
  KnownBits object. I'm leaving the existing interface in place for now,
  since updating the callers all at once would be thousands of lines of
  diff.
  llvm-svn: 340594
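  Roughly how a call site changes (a sketch; both signatures coexist after
  this patch, per the note above):

      #include "llvm/CodeGen/SelectionDAG.h"
      #include "llvm/Support/KnownBits.h"
      using namespace llvm;

      void example(SelectionDAG &DAG, SDValue Op, unsigned Depth) {
        // Old style: output parameter, a holdover from two separate APInts.
        KnownBits KnownOut;
        DAG.computeKnownBits(Op, KnownOut, Depth);
        // New style: the KnownBits object is simply returned.
        KnownBits Known = DAG.computeKnownBits(Op, Depth);
        (void)Known;
        (void)KnownOut;
      }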
* [SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527) (Sanjay Patel, 2018-08-22; 1 file, -7/+26)
  This solves the motivating case from:
  https://bugs.llvm.org/show_bug.cgi?id=38527
  If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics
  that mimic libm calls, but we're going to end up with scalar libcalls for
  that vector type anyway, then we should unroll the vector op into scalars
  before widening. This avoids libcalls because we've lost the knowledge
  that some of the scalar elements are undef.
  Differential Revision: https://reviews.llvm.org/D50791
  llvm-svn: 340469
* [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available. (Eli Friedman, 2018-08-22; 1 file, -0/+15)
  The inline sequence is very long (about 70 bytes on Thumb1), so it's not
  really a good idea to inline it, especially when optimizing for size.
  Differential Revision: https://reviews.llvm.org/D47917
  llvm-svn: 340458
* [WebAssembly] Don't make wasm cleanuppads into funclet entries (Heejin Ahn, 2018-08-21; 1 file, -3/+8)
  Summary: Catchpads and cleanuppads are not funclet entries; they are only
  EH scope entries. We already don't set `isEHFuncletEntry` for catchpads.
  This patch does the same thing for cleanuppads.
  Reviewers: dschuff
  Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D50654
  llvm-svn: 340330
* [DAGCombiner] Reduce load widths of shifted masks (Sam Parker, 2018-08-21; 1 file, -8/+41)
  During combining, ReduceLoadWidth is used to combine AND nodes that mask
  loads into narrow loads. This patch allows the mask to be a shifted
  constant. This results in a narrow load which is then left-shifted to
  compensate for the new offset.
  Differential Revision: https://reviews.llvm.org/D50432
  llvm-svn: 340261
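  The intent, illustrated in plain C++ (a little-endian layout and byte
  offset are assumed; the real combine rewrites DAG nodes, not source):

      #include <cstdint>

      // Before: wide load masked by a shifted constant.
      uint32_t maskedWide(const uint32_t *p) {
        return *p & 0x00FF0000u; // (and (load i32), 0xFF << 16)
      }

      // After: narrow load of just the masked byte, shifted left to
      // compensate for the new offset.
      uint32_t maskedNarrow(const uint32_t *p) {
        const uint8_t *b = (const uint8_t *)p;
        return (uint32_t)b[2] << 16; // (shl (zext (load i8, p+2)), 16)
      }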
* [TargetLowering] Add BuildSDiv support for division by one or negone. (Simon Pilgrim, 2018-08-21; 1 file, -15/+27)
  This reduces most of the sdiv stages (the MULHS, shifts, etc.) to just
  zero/identity values, and uses the numerator scale factor to multiply by
  +1/-1.
  llvm-svn: 340260
* [FPEnv] Support constrained FREM intrinsic (Cameron McInally, 2018-08-20; 3 files, -0/+7)
  Differential Revision: https://reviews.llvm.org/D50975
  llvm-svn: 340201
* [TargetLowering] Disable BuildSDiv division by one or negone. (Simon Pilgrim, 2018-08-20; 1 file, -1/+2)
  Fuzz tests have detected an issue; currently working on a fix.
  llvm-svn: 340195
* [SelectionDAG] Reuse the Op's VT. NFCI. (Simon Pilgrim, 2018-08-20; 1 file, -2/+2)
  llvm-svn: 340173
* [SelectionDAG] Add partial sign-bit support to ComputeNumSignBits for BITCAST nodes (Simon Pilgrim, 2018-08-20; 1 file, -2/+16)
  Only adds support to the existing 'large element' scalar/vector to 'small
  element' vector bitcasts. Handles the case where the sign bit extends to
  only part of the small elements.
  llvm-svn: 340169
* [SelectionDAG] Add basic demanded elements support to ComputeNumSignBits for BITCAST nodes (Simon Pilgrim, 2018-08-19; 1 file, -1/+8)
  Only adds support to the existing 'large element' scalar/vector to 'small
  element' vector bitcasts. The next step would be to support cases where
  the large elements aren't all sign bits, and determine the small element
  equivalent based on the demanded elements.
  llvm-svn: 340143
* [DebugInfo] In FastISel, convert llvm.dbg.label to DBG_LABEL MI. (Hsiangkai Wang, 2018-08-18; 1 file, -0/+12)
  Convert llvm.dbg.label(!label_metadata) to DBG_LABEL !label_metadata.
  Differential Revision: https://reviews.llvm.org/D50622
  llvm-svn: 340122
* [DAGCombiner] Allow divide by constant optimization on opaque constants. (Craig Topper, 2018-08-18; 1 file, -2/+2)
  Summary: I believe this restores the behavior we had before r339147.
  Fixes PR38622.
  Reviewers: RKSimon, chandlerc, spatel
  Reviewed By: chandlerc
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D50936
  llvm-svn: 340120
* DAG: Fix isKnownNeverNaN for basic non-sNaN cases (Matt Arsenault, 2018-08-17; 1 file, -12/+7)
  fadd/fsub/fmul need to worry about infinities as well as fdiv.
  llvm-svn: 340085
* [DAGCombiner] extractShiftForRotate - fix out of range shift issue (Simon Pilgrim, 2018-08-17; 1 file, -2/+2)
  Don't just check for negative shift amounts.
  Fixes OSS-Fuzz #9935:
  https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935
  llvm-svn: 340015
* [DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) folding (Simon Pilgrim, 2018-08-17; 1 file, -17/+16)
  Add support for cases where only some c1+c2 results exceed the max
  bitshift, clamping accordingly.
  Differential Revision: https://reviews.llvm.org/D35722
  llvm-svn: 340010
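  A sketch of the clamped fold (names assumed, shift-amount type
  simplified). An arithmetic shift by BitWidth-1 or more just replicates the
  sign bit, so an out-of-range c1+c2 can be clamped instead of blocking the
  fold:

      #include <algorithm>
      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      static SDValue foldSraOfSra(SDValue X, unsigned C1, unsigned C2, EVT VT,
                                  const SDLoc &DL, SelectionDAG &DAG) {
        unsigned BitWidth = VT.getScalarSizeInBits();
        unsigned Sum = std::min(C1 + C2, BitWidth - 1); // clamp, don't bail
        return DAG.getNode(ISD::SRA, DL, VT, X, DAG.getConstant(Sum, DL, VT));
      }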
* [MISC] Fix wrong usage of std::equal() (Chen Zheng, 2018-08-17; 1 file, -1/+1)
  Differential Revision: https://reviews.llvm.org/D49958
  llvm-svn: 340000
* [DAGCombiner] Don't reassociate operations that have the vector reduction flag set. (Craig Topper, 2018-08-16; 1 file, -9/+13)
  When nodes are reassociated, the vector-reduction flag gets lost. The test
  case here is what would happen if you had a sum-of-absolute-differences
  loop that started with a non-zero but constant sum and that loop was
  unrolled. The vectorizer will generate a constant vector for the initial
  value, and DAGCombiner's reassociate tries to move it down the addition
  tree, erasing the vector-reduction flag. Interestingly, this moves
  constants in the opposite direction of the reassociate IR pass.
  I've chosen to just punt on the reassociate, but I suppose we could maybe
  preserve the flag if both nodes have it set.
  Differential Revision: https://reviews.llvm.org/D50827
  llvm-svn: 339946
* [MI] Change the array of `MachineMemOperand` pointers to be a generically extensible collection of extra info attached to a `MachineInstr`. (Chandler Carruth, 2018-08-16; 1 file, -6/+3)
  The primary change here is cleaning up the APIs used for setting and
  manipulating the `MachineMemOperand` pointer arrays so that we can change
  how they are allocated.
  Then we introduce an extra info object that uses the trailing object
  pattern to attach some number of MMOs but also other extra info. The
  design of this is specifically so that this extra info has a fixed
  necessary cost (the header tracking what extra info is included) and
  everything else can be tail allocated. This pattern works especially well
  with a `BumpPtrAllocator`, which we use here.
  I've also added the basic scaffolding for putting interesting pointers
  into this, namely pre- and post-instruction symbols. These aren't used
  anywhere yet; they're just there to ensure I've actually gotten the data
  structure types correct. I'll flesh out support for these in a subsequent
  patch (MIR dumping, parsing, the works).
  Finally, I've included an optimization where we store any single pointer
  inline in the `MachineInstr` to avoid the allocation overhead. This is
  expected to be the overwhelmingly most common case and so should avoid any
  memory usage growth due to slightly less clever / dense allocation when
  dealing with >1 MMO. This did require several ergonomic improvements to
  the `PointerSumType` to reasonably support the various usage models.
  This also has a side effect of freeing up 8 bits within the `MachineInstr`
  which could be repurposed for something else.
  The suggested direction here came largely from Hal Finkel. I hope it was
  worth it. ;] It does hopefully clear a path for subsequent extensions w/o
  nearly as much leg work. Lots of thanks to Reid and Justin for careful
  reviews and ideas about how to do all of this.
  Differential Revision: https://reviews.llvm.org/D50701
  llvm-svn: 339940
* [SelectionDAG] Improve the legalisation lowering of UMULO. (Eli Friedman, 2018-08-16; 1 file, -17/+48)
  There is no way in the universe that doing a full-width division in
  software will be faster than doing overflowing multiplication in software
  in the first place, especially given that this same full-width
  multiplication needs to be done anyway.
  This patch replaces the previous implementation with a direct lowering
  into an overflowing multiplication algorithm based on half-width
  operations.
  Correctness of the algorithm was verified by exhaustively checking the
  output of this algorithm for overflowing multiplication of 16-bit integers
  against an obviously correct widening multiplication. Barring any
  oversights introduced by porting the algorithm to DAG, confidence in the
  correctness of this algorithm is extremely high.
  The following table shows the change in both t = runtime and s = space,
  expressed as a multiplier of the original, so anything under 1 is better
  and anything above 1 is worse.
  +-------+-----------+-----------+-------------+-------------+
  | Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
  +-------+-----------+-----------+-------------+-------------+
  | X64   |     -     |     -     |    ~0.5     |    ~0.64    |
  | i686  |   ~0.5    |  ~0.6666  |    ~0.05    |    ~0.9     |
  | armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
  +-------+-----------+-----------+-------------+-------------+
  Performance numbers were collected by running overflowing multiplication
  in a loop under `perf` on two x86_64 machines (one Intel Haswell, the
  other AMD Ryzen). Size numbers were collected by looking at the size of a
  function containing an overflowing multiply in a loop.
  All in all, both performance and size have improved, except in the case of
  armv7, where code size has regressed for 128-bit multiply. u128*u128
  overflowing multiply on 32-bit platforms seems to benefit from this change
  a lot, taking only 5% of the time of the original algorithm to calculate
  the same thing.
  The final benefit of this change is that LLVM is now capable of lowering
  the overflowing unsigned multiply for integers of any bit width as long as
  the target is capable of lowering regular multiplication for the same bit
  width. Previously, 128-bit overflowing multiply was the widest possible.
  Patch by Simonas Kazlauskas!
  Differential Revision: https://reviews.llvm.org/D50310
  llvm-svn: 339922
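  The flavor of the half-width approach, sketched in standalone C++ for a
  64-bit multiply built from 32-bit pieces (the names and exact carry
  handling here are assumptions; the real lowering emits DAG nodes):

      #include <cstdint>

      bool umulo64(uint64_t A, uint64_t B, uint64_t &Out) {
        uint64_t AL = A & 0xffffffffu, AH = A >> 32;
        uint64_t BL = B & 0xffffffffu, BH = B >> 32;
        // A*B = (AH*BH << 64) + ((AH*BL + AL*BH) << 32) + AL*BL
        uint64_t LL = AL * BL;
        uint64_t HL = AH * BL;
        uint64_t Cross = HL + AL * BH;
        bool Ovf = AH != 0 && BH != 0; // the AH*BH term always overflows
        Ovf |= Cross < HL;             // the cross-term sum itself wrapped
        Ovf |= (Cross >> 32) != 0;     // cross terms spill past bit 63
        Out = LL + (Cross << 32);
        Ovf |= Out < LL;               // carry out of the final add
        return Ovf;
      }

  As the commit notes, an algorithm of this shape can be verified
  exhaustively at 16 bits by comparing every product against a plain
  widening multiplication.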
* [TargetLowering] Add support for non-uniform vectors to BuildSDIV (Simon Pilgrim, 2018-08-16; 1 file, -10/+24)
  This patch refactors the existing TargetLowering::BuildSDIV base
  implementation to support non-uniform constant vector denominators.
  This is the last patch necessary to close PR36545.
  Differential Revision: https://reviews.llvm.org/D50765
  llvm-svn: 339908
* [TargetLowering] Refactor BuildSDIV in preparation for D50765. NFCI. (Simon Pilgrim, 2018-08-16; 1 file, -24/+36)
  Pull out magic factor calculators into a helper function; use a 0/+1/-1
  multiplication factor to (optionally) add/sub the numerator.
  llvm-svn: 339898
* DAG: Use getObjectOffset helper (Matt Arsenault, 2018-08-15; 1 file, -4/+1)
  llvm-svn: 339813
* DAG: Try to custom lower when promoting float operands (Matt Arsenault, 2018-08-15; 1 file, -0/+5)
  For some reason this wasn't done for floats as it is for integers.
  llvm-svn: 339811
* [TargetLowering] Minor cleanup of TargetLowering::BuildSDIV. NFCI. (Simon Pilgrim, 2018-08-15; 1 file, -21/+20)
  Pull out some types to match the layout in TargetLowering::BuildUDIV. An
  early step towards adding non-uniform vector support.
  llvm-svn: 339763
* [TargetLowering] Minor refactor to TargetLowering::BuildUDIV to merge scalar/vector magic value collection. NFCI. (Simon Pilgrim, 2018-08-15; 1 file, -41/+31)
  Use the same ISD::matchUnaryPredicate pattern that was used in D50392.
  llvm-svn: 339758
* [DagCombiner] Don't bother adding to the work list if TLI.BuildSDIVPow2 failed. NFCI. (Simon Pilgrim, 2018-08-15; 1 file, -4/+6)
  Matches the code in BuildSDIV/BuildUDIV.
  llvm-svn: 339757
* [TargetLowering] Add support for non-uniform vectors to BuildExactSDIV (Simon Pilgrim, 2018-08-15; 1 file, -12/+24)
  This patch refactors the existing BuildExactSDIV implementation to support
  non-uniform constant vector denominators.
  Differential Revision: https://reviews.llvm.org/D50392
  llvm-svn: 339756
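  For context, an exact sdiv can avoid the usual magic-number sequence
  because "exact" promises a zero remainder: dividing by d = q * 2^k (q odd)
  is an arithmetic shift by k followed by a multiply with q's multiplicative
  inverse modulo 2^N. A worked C++ illustration (the helper names are mine,
  not LLVM's):

      #include <cstdint>

      uint32_t inverseMod2_32(uint32_t q) { // q must be odd
        uint32_t x = q;                     // correct to 3 bits: q*q == 1 (mod 8)
        for (int i = 0; i < 4; ++i)
          x *= 2 - q * x;                   // each Newton step doubles the bits
        return x;                           // q * x == 1 (mod 2^32)
      }

      int32_t exactSdivBy12(int32_t n) {    // n promised to be a multiple of 12
        uint32_t shifted = (uint32_t)(n >> 2);         // arithmetic shift: n/4
        return (int32_t)(shifted * inverseMod2_32(3)); // * 3^-1 (mod 2^32)
      }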
* [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` (Chandler Carruth, 2018-08-14; 5 files, -39/+44)
  Instead, have the `SelectionDAG` fully manage the memory for this array.
  Prior to this change, the memory management was deeply confusing here: the
  way the MI was built relied on the `SelectionDAG` allocating memory for
  these arrays of pointers using the `MachineFunction`'s allocator so that
  the raw pointer to the array could be blindly copied into an eventual
  `MachineInstr`. This creates a hard coupling between how `MachineInstr`s
  allocate their array of `MachineMemOperand` pointers and how the
  `MachineSDNode` does.
  This change is motivated in large part by a change I am making to how
  `MachineFunction` allocates these pointers, but it seems like a layering
  improvement as well.
  This would run the risk of increasing allocations overall, but I've
  implemented an optimization that should avoid that by storing a single
  `MachineMemOperand` pointer directly instead of allocating anything. This
  is expected to be a net win because the vast majority of uses of these
  only need a single pointer.
  As a side effect, this makes the API for updating a `MachineSDNode` and a
  `MachineInstr` reasonably different, which seems nice to avoid unexpected
  coupling of these two layers. We can map between them, but we shouldn't be
  *surprised* at where that occurs. =]
  Differential Revision: https://reviews.llvm.org/D50680
  llvm-svn: 339740
* [FPEnv] Scalarize StrictFP vector operations (Cameron McInally, 2018-08-14; 2 files, -0/+50)
  Add a helper function to scalarize constrained FP operations as needed.
  Differential Revision: https://reviews.llvm.org/D50720
  llvm-svn: 339735
* [ARM] Make PerformSHLSimplify add nodes to the DAG worklist correctly. (Eli Friedman, 2018-08-14; 1 file, -2/+3)
  Intentionally excluding nodes from the DAGCombine worklist is likely to
  lead to weird optimizations and infinite loops, so it's generally a bad
  idea.
  To avoid the infinite loops, fix DAGCombine to use the
  isDesirableToCommuteWithShift target hook before performing the transforms
  in question, and implement the target hook in the ARM backend to disable
  the transforms in question.
  Fixes https://bugs.llvm.org/show_bug.cgi?id=38530
  (I don't have a reduced testcase for that bug. But we should have
  sufficient test coverage for PerformSHLSimplify given that we're not
  playing weird tricks with the worklist. I can try to bugpoint it if
  necessary, though.)
  Differential Revision: https://reviews.llvm.org/D50667
  llvm-svn: 339734