summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen/SelectionDAG
Commit message (Collapse)AuthorAgeFilesLines
...
* Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"Nikita Popov2019-01-142-28/+4
| | | | | | | | | This reverts commit r351125. I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now. llvm-svn: 351129
* [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectorsNikita Popov2019-01-142-4/+28
| | | | | | | | | | | Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351125
* Reland "Refactor GetRegistersForValue. NFCI."Nirav Dave2019-01-141-55/+44
| | | | | | Remove over-strictification class membership check. llvm-svn: 351074
* [DAGCombiner] Add (sub_sat x, x) -> 0 combineSimon Pilgrim2019-01-141-0/+4
| | | | llvm-svn: 351073
* [DAGCombiner] Enable sub saturation constant foldingSimon Pilgrim2019-01-142-1/+8
| | | | llvm-svn: 351072
* [DAGCombiner] Add add/sub saturation undef handlingSimon Pilgrim2019-01-142-0/+14
| | | | | | | | Match ConstantFolding.cpp: (add_sat x, undef) -> -1 (sub_sat x, undef) -> 0 llvm-svn: 351070
* [DAGCombiner] Enable add saturation constant foldingSimon Pilgrim2019-01-142-2/+5
| | | | llvm-svn: 351060
* [DAGCombiner] Add add saturation constant folding tests.Simon Pilgrim2019-01-141-2/+3
| | | | | | Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now. llvm-svn: 351057
* [SelectionDAG] Add type sanity assertions for add/sub saturation node creation.Simon Pilgrim2019-01-141-0/+4
| | | | llvm-svn: 351055
* [DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)Simon Pilgrim2019-01-131-0/+4
| | | | | NOTE: We need more powerful signed overflow detection in computeOverflowKind llvm-svn: 351026
* Fix unused variable warning. NFCI.Simon Pilgrim2019-01-131-1/+0
| | | | llvm-svn: 351025
* [DAGCombiner] Some very basic add/sub saturation combines.Simon Pilgrim2019-01-131-0/+64
| | | | | | Handle combines with zero and constant canonicalization for adds. llvm-svn: 351024
* [LegalizeDAG] Remove 'NeedInvert' code from expansion of BR_CC. Replace with ↵Craig Topper2019-01-131-4/+1
| | | | | | | | | | | | an assert. I accidentally triggered this code while doing some experiments and it doesn't look lke it could possibly work. It calls 'getNOT' on a node that should be a CondCode. I think to do this right we would need to swap the branch target and the fallthrough target. But that's not easy to do. Or we could create an explicit SetCC and feed that into a new BR_CC? llvm-svn: 351022
* [X86] Rename overly verbose method; NFCNikita Popov2019-01-133-8/+5
| | | | | | As suggested on D56636. llvm-svn: 351021
* [DAGCombiner] fold insert_subvector of insert_subvectorSanjay Patel2019-01-121-0/+8
| | | | | | | | | | | | | | | | | | | This pattern: t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0> t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0> ...shows up in PR33758: https://bugs.llvm.org/show_bug.cgi?id=33758 ...although this patch doesn't make any difference to the final result on that yet. In the affected tests here, it looks like it just makes RA wiggle. But we might as well squash this to prevent it interfering with other pattern-matching. Differential Revision: https://reviews.llvm.org/D56604 llvm-svn: 351008
* Use getShiftAmountTy for shift amounts.Simon Pilgrim2019-01-121-1/+2
| | | | llvm-svn: 351005
* [X86][AARCH64] Improve ISD::ABS supportSimon Pilgrim2019-01-123-0/+43
| | | | | | | | This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types. Differential Revision: https://reviews.llvm.org/D56544 llvm-svn: 350998
* [Legalizer] Use correct ValueType of SELECT_CC node during Float promotionPirama Arumuga Nainar2019-01-111-3/+3
| | | | | | | | | | | | | | | | | Summary: When legalizing the result of a SELECT_CC node by promoting the floating-point type, use the promoted-to type rather than the original type. Fix PR40273. Reviewers: efriedma, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D56566 llvm-svn: 350951
* Revert "[SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI."Martin Storsjo2019-01-111-42/+60
| | | | | | | This reverts commit r350841, as it actually had functional changes and broke compilation. See PR40290. llvm-svn: 350921
* [DAGCombiner] simplify code; NFCSanjay Patel2019-01-101-11/+11
| | | | llvm-svn: 350844
* [SelectionDAGBuilder] Refactor GetRegistersForValue. NFCI.Nirav Dave2019-01-101-60/+42
| | | | llvm-svn: 350841
* [SelectionDAGBuilder] Fix formatting. NFC.Nirav Dave2019-01-101-1/+2
| | | | llvm-svn: 350839
* [SelectionDAGBuilder] Refactor visitInlineAsm. NFC.Nirav Dave2019-01-101-45/+24
| | | | llvm-svn: 350837
* [opaque pointer types] Remove some calls to generic Type subtype accessors.James Y Knight2019-01-101-7/+7
| | | | | | | | | | | | That is, remove many of the calls to Type::getNumContainedTypes(), Type::subtypes(), and Type::getContainedType(N). I'm not intending to remove these accessors -- they are useful/necessary in some cases. However, removing the pointee type from pointers would potentially break some uses, and reducing the number of calls makes it easier to audit. llvm-svn: 350835
* Remove check for single use in ShrinkDemandedConstantStanislav Mekhanoshin2019-01-091-3/+0
| | | | | | | | | | | | | | | This removes check for single use from general ShrinkDemandedConstant to the BE because of the AArch64 regression after D56289/rL350475. After several hours of experiments I did not come up with a testcase failing on any other targets if check is not performed. Moreover, direct call to ShrinkDemandedConstant is not really needed and superceed by SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D56406 llvm-svn: 350684
* [TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes ↵Craig Topper2019-01-071-50/+0
| | | | | | | | | | | | | | a User and OpIdx. Stop using it in AMDGPU target for simplifyI24. As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions. Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node. This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simplifications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input. Differential Revision: https://reviews.llvm.org/D56087 llvm-svn: 350560
* [LegalizeVectorOps] Add FSHL/FSHR to the list of vector operations that ↵Craig Topper2019-01-061-0/+2
| | | | | | | | should be handled. The FSHL/FSHR nodes are handled in the expand function, but they need to also be listed in the code that queries for the operation action too. llvm-svn: 350490
* Added single use check to ShrinkDemandedConstantStanislav Mekhanoshin2019-01-051-0/+3
| | | | | | | | | Fixes cvt_f32_ubyte combine. performCvtF32UByteNCombine() could shrink source node to demanded bits only even if there are other uses. Differential Revision: https://reviews.llvm.org/D56289 llvm-svn: 350475
* [X86] Add INSERT_SUBVECTOR to ComputeNumSignBitsCraig Topper2019-01-041-1/+35
| | | | | | | | | | This adds support for calculating sign bits of insert_subvector. I based it on the computeKnownBits. My motivating case is propagating sign bits information across basic blocks on AVX targets where concatenating using insert_subvector is common. Differential Revision: https://reviews.llvm.org/D56283 llvm-svn: 350432
* [DAGCombiner][x86] scalarize binop followed by extractelementSanjay Patel2019-01-031-5/+44
| | | | | | | | | | | | | | | | | | | | As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354
* [DAGCombiner] After performing the division by constant optimization for a ↵Craig Topper2019-01-021-2/+29
| | | | | | | | | | | | DIV or REM node, replace the users of the corresponding REM or DIV node if it exists. Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239
* [LegalizeIntegerTypes] When promoting the result of an extract_vector_elt ↵Craig Topper2019-01-021-2/+20
| | | | | | | | | | | | | | also promote the input type if necessary By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined. By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend. This fixes the regression on X86 in D56156. Differential Revision: https://reviews.llvm.org/D56176 llvm-svn: 350236
* [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold ↵Craig Topper2019-01-021-2/+5
| | | | | | | | | | | | (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them. If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235
* Reversing the commit in revision 350186. Revision causes regression in 4Ayonam Ray2019-01-012-33/+53
| | | | | | tests. llvm-svn: 350187
* Omit range checks from jump tables when lowering switches with unreachableAyonam Ray2019-01-012-53/+33
| | | | | | | | | | | | | | | default During the lowering of a switch that would result in the generation of a jump table, a range check is performed before indexing into the jump table, for the switch value being outside the jump table range and a conditional branch is inserted to jump to the default block. In case the default block is unreachable, this conditional jump can be omitted. This patch implements omitting this conditional branch for unreachable defaults. Review Reference: D52002 llvm-svn: 350186
* [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits.Craig Topper2018-12-311-1/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D56168 llvm-svn: 350179
* [DAGCombiner] Add missing one use check on the shuffle in the ↵Craig Topper2018-12-311-1/+1
| | | | | | | | bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform. Found while trying out some other changes so I don't really have a test case. llvm-svn: 350172
* [PowerPC] Fix ADDE, SUBE do not know how to promote operatorKang Zhang2018-12-301-0/+5
| | | | | | | | | | | | | | Summary: This patch is created to fix the Bugzilla bug 39815: https://bugs.llvm.org/show_bug.cgi?id=39815 This patch is to support promotion integer result for the instruction ADDE, SUBE. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D56119 llvm-svn: 350161
* Add vtable anchor to classes.Richard Trieu2018-12-291-0/+2
| | | | llvm-svn: 350142
* [NVPTX] Allow libcalls that are defined in the current module.Justin Lebar2018-12-261-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds a possibility to make library calls on NVPTX. An important thing about library functions - they must be defined within the current module. This basically should guarantee that we produce a valid PTX assembly (without calls to not defined functions). The one who wants to use the libcalls is probably will have to link against compiler-rt or any other implementation. Currently, it's completely impossible to make library calls because of error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can lower ExternalSymbol to TargetExternalSymbol and verify if the function definition is available. Also, there was an issue with a DAG during legalisation. When we expand instruction into libcall, the inner call-chain isn't being "integrated" into outer chain. Since the last "data-flow" (call retval load) node is located in call-chain earlier than CALLSEQ_END node, the latter becomes a leaf and therefore a dead node (and is being removed quite fast). Proposed here solution relies on another data-flow pseudo nodes (ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and instruction selection phases - we remove the pseudo instructions before register scheduling phase. Patch by Denys Zariaiev! Differential Revision: https://reviews.llvm.org/D34708 llvm-svn: 350069
* [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ.Craig Topper2018-12-241-0/+9
| | | | | | | | | | | | | | This is an alternative to what I attempted in D56057. GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users. I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits. Based on a patch that Simon Pilgrim sent me in email. Fixes PR40142. llvm-svn: 350059
* [SelectionDAGBuilder] Use ::precise LocationSizes; NFCGeorge Burgess IV2018-12-241-11/+23
| | | | | | | | | | | | | | | More migration so we can disable the implicit int -> LocationSize conversion. All of these are either scatter/gather'ed vector instructions, or direct loads. Hence, they're all precise. Perhaps if we see way more getTypeStoreSize calls, we can make a getTypeStoreLocationSize (or similar) as a wrapper that applies this ::precise. Doesn't appear that it's a good idea to make getTypeStoreSize return a LocationSize itself, however. llvm-svn: 350042
* [DAGCombiner] limit shuffle to extend transform (PR40146)Sanjay Patel2018-12-231-4/+5
| | | | | | | | | | It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034
* [DAGCombiner] allow hoisting vector bitwise logic ahead of extendsSanjay Patel2018-12-231-6/+5
| | | | llvm-svn: 350032
* [DAGCombiner] allow narrowing of add followed by truncateSanjay Patel2018-12-221-2/+1
| | | | | | | | | | | | | | | trunc (add X, C ) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006
* [DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFCSanjay Patel2018-12-211-6/+5
| | | | llvm-svn: 349958
* [SelectionDAG] Always use the version of computeKnownBits that returns a ↵Simon Pilgrim2018-12-215-27/+16
| | | | | | | | value. NFCI. Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output paramater version. llvm-svn: 349907
* [ARM] Complete the Thumb1 shift+and->shift+shift transforms.Eli Friedman2018-12-201-1/+2
| | | | | | | | | | | | | | This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857
* [SelectionDAGBuilder] Enable funnel shift building to custom rotatesSimon Pilgrim2018-12-201-4/+2
| | | | | | | | | | This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations. AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK. Differential Revision: https://reviews.llvm.org/D55747 llvm-svn: 349765
* [DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.Craig Topper2018-12-201-1/+1
| | | | llvm-svn: 349726
OpenPOWER on IntegriCloud