bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[DAGCombiner] Add add saturation constant folding tests.	Simon Pilgrim	2019-01-14	1	-2/+3
	Exposes an issue with sadd_sat for computeOverflowKind, so I've disabled it for now. llvm-svn: 351057
*	[DAGCombiner] If add_sat(x,y) can't overflow -> add(x,y)	Simon Pilgrim	2019-01-13	1	-0/+4
	NOTE: We need more powerful signed overflow detection in computeOverflowKind llvm-svn: 351026
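The fold above can be sketched on plain integers. This is a minimal model, assuming an 8-bit unsigned saturating add; `uadd_sat`, `wrapping_add`, and `cannot_overflow` are hypothetical helper names rather than LLVM APIs, and the simple range check only stands in for whatever computeOverflowKind is able to prove.

```python
BITS = 8                      # assumed element width for this sketch
MAX_VAL = (1 << BITS) - 1     # largest unsigned value of that width


def uadd_sat(x, y):
    """Unsigned saturating add: clamp the sum at MAX_VAL."""
    return min(x + y, MAX_VAL)


def wrapping_add(x, y):
    """Ordinary two's-complement add: keep only the low BITS bits."""
    return (x + y) & MAX_VAL


def cannot_overflow(max_x, max_y):
    """Hypothetical stand-in for an overflow query on known upper bounds."""
    return max_x + max_y <= MAX_VAL


# If both operands are known to be below 128, the sum always fits in 8 bits,
# so the saturating add and the plain add agree on every input.
assert cannot_overflow(127, 127)
assert all(uadd_sat(x, y) == wrapping_add(x, y)
           for x in range(128) for y in range(128))
```

When the bound cannot be proven (for example 200 + 100), the two operations differ, which is why the fold has to be guarded.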
*	Fix unused variable warning. NFCI.	Simon Pilgrim	2019-01-13	1	-1/+0
	llvm-svn: 351025
*	[DAGCombiner] Some very basic add/sub saturation combines.	Simon Pilgrim	2019-01-13	1	-0/+64
	Handle combines with zero and constant canonicalization for adds. llvm-svn: 351024
*	[DAGCombiner] fold insert_subvector of insert_subvector	Sanjay Patel	2019-01-12	1	-0/+8
	This pattern:
	  t33: v8i32 = insert_subvector undef:v8i32, t35, Constant:i64<0>
	  t21: v16i32 = insert_subvector undef:v16i32, t33, Constant:i64<0>
	...shows up in PR33758: https://bugs.llvm.org/show_bug.cgi?id=33758 ...although this patch doesn't make any difference to the final result on that yet. In the affected tests here, it looks like it just makes RA wiggle. But we might as well squash this to prevent it interfering with other pattern-matching. Differential Revision: https://reviews.llvm.org/D56604 llvm-svn: 351008
*	[DAGCombiner] simplify code; NFC	Sanjay Patel	2019-01-10	1	-11/+11
	llvm-svn: 350844
*	[DAGCombiner][x86] scalarize binop followed by extractelement	Sanjay Patel	2019-01-03	1	-5/+44
	As noted in PR39973 and D55558: https://bugs.llvm.org/show_bug.cgi?id=39973 ...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine: // extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index) We want to have this in the DAG too because as we can see in some of the test diffs (reductions), the pattern may not be visible in IR. Given that this is already an IR canonicalization, any backend that would prefer a vector op over a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's a realistic expectation though). The transform is limited with a TLI hook because there's an existing transform in CodeGenPrepare that tries to do the opposite transform. Differential Revision: https://reviews.llvm.org/D55722 llvm-svn: 350354
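The scalarization identity quoted in this message can be illustrated with vectors modeled as Python lists. This is only a sketch: `binop_then_extract` and `extract_then_binop` are hypothetical names, and the model ignores the legality and TLI-hook checks the real combine performs.

```python
import operator


def binop_then_extract(op, x, y, index):
    """extelt (binop X, Y), Index: do the whole vector op, keep one lane."""
    return [op(a, b) for a, b in zip(x, y)][index]


def extract_then_binop(op, x, y, index):
    """binop (extelt X, Index), (extelt Y, Index): one scalar op only."""
    return op(x[index], y[index])


x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
for op in (operator.add, operator.mul, operator.xor):
    for i in range(len(x)):
        # Lane-wise binops never mix lanes, so both orders agree.
        assert binop_then_extract(op, x, y, i) == extract_then_binop(op, x, y, i)
```

The scalar form does one operation instead of one per lane, which is the motivation for preferring it when only a single element is used.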
*	[DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.	Craig Topper	2019-01-02	1	-2/+29
	Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced. Improves the test case from PR38217. There may be additional opportunities after this. Differential Revision: https://reviews.llvm.org/D56145 llvm-svn: 350239
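To see why a DIV and its matching REM benefit from sharing one expansion, here is a small sketch of the well-known multiply-and-shift sequence for unsigned 32-bit division by 10. The names `udiv10` and `udivrem10` are hypothetical, and the code only models the arithmetic, not what DAGCombiner emits for any particular target.

```python
MASK32 = 0xFFFFFFFF
MAGIC_DIV10 = 0xCCCCCCCD   # ceil(2**35 / 10), the usual magic number for u32 / 10


def udiv10(x):
    """x / 10 for a 32-bit unsigned x, using a multiply and a shift."""
    return ((x & MASK32) * MAGIC_DIV10) >> 35


def udivrem10(x):
    """Quotient and remainder that share a single multiply-and-shift."""
    q = udiv10(x)
    r = (x - q * 10) & MASK32   # the remainder reuses the quotient
    return q, r


for value in (0, 9, 10, 99, 12345, 0x7FFFFFFF, 0xFFFFFFFF):
    assert udivrem10(value) == divmod(value, 10)
```

If the quotient and remainder were expanded independently and then optimized differently, the multiply-and-shift part would appear twice instead of being shared as it is in `udivrem10`.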
*	[DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.	Craig Topper	2019-01-02	1	-2/+5
	If x has multiple sign bits then it doesn't matter which one we extend from so we can sext from x's msb instead. The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow-up patch to fix this. Differential Revision: https://reviews.llvm.org/D56156 llvm-svn: 350235
*	[DAGCombiner] Add missing one use check on the shuffle in the bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) transform.	Craig Topper	2018-12-31	1	-1/+1
	Found while trying out some other changes so I don't really have a test case. llvm-svn: 350172
*	[DAGCombiner] limit shuffle to extend transform (PR40146)	Sanjay Patel	2018-12-23	1	-4/+5
	It's dangerous to knowingly create an illegal vector type no matter what stage of combining we're in. This prevents the missed folding/scalarization seen in: https://bugs.llvm.org/show_bug.cgi?id=40146 llvm-svn: 350034
*	[DAGCombiner] allow hoisting vector bitwise logic ahead of extends	Sanjay Patel	2018-12-23	1	-6/+5
	llvm-svn: 350032
*	[DAGCombiner] allow narrowing of add followed by truncate	Sanjay Patel	2018-12-22	1	-2/+1
	trunc (add X, C) --> add (trunc X), C' If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type. This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine). This change used to show regressions for x86, but those are gone after D55494. This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) that does almost the same thing. Differential Revision: https://reviews.llvm.org/D55866 llvm-svn: 350006
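The narrowing identity in this message holds because the low bits of a sum depend only on the low bits of its operands. A minimal sketch, assuming a 16-bit source type and an 8-bit destination type (both widths and all function names are illustrative choices, not taken from the patch):

```python
WIDE, NARROW = 16, 8   # assumed source and destination widths


def trunc(value, bits):
    """Keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)


def add(x, y, bits):
    """Wrapping add in a `bits`-wide integer type."""
    return (x + y) & ((1 << bits) - 1)


def wide_add_then_trunc(x, c):
    """trunc (add X, C)"""
    return trunc(add(x, c, WIDE), NARROW)


def narrow_add(x, c):
    """add (trunc X), C' where C' is C truncated to the narrow type."""
    return add(trunc(x, NARROW), trunc(c, NARROW), NARROW)


for x in range(0, 1 << WIDE, 257):
    for c in (0, 1, 0xFF, 0x100, 0x1234, 0xFFFF):
        assert wide_add_then_trunc(x, c) == narrow_add(x, c)
```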
*	[DAGCombiner] simplify code leading to scalarizeExtractedVectorLoad; NFC	Sanjay Patel	2018-12-21	1	-6/+5
	llvm-svn: 349958
*	[SelectionDAG] Always use the version of computeKnownBits that returns a value. NFCI.	Simon Pilgrim	2018-12-21	1	-10/+6
	Continues the work started by @bogner in rL340594 to remove uses of the KnownBits output parameter version. llvm-svn: 349907
*	[ARM] Complete the Thumb1 shift+and->shift+shift transforms.	Eli Friedman	2018-12-20	1	-1/+2
	This saves materializing the immediate. The additional forms are less common (they don't usually show up for bitfield insert/extract), but they're still relevant. I had to add a new target hook to prevent DAGCombine from reversing the transform. That isn't the only possible way to solve the conflict, but it seems straightforward enough. Differential Revision: https://reviews.llvm.org/D55630 llvm-svn: 349857
*	[DAGCombiner] Fix a place that was creating a SIGN_EXTEND with an extra operand.	Craig Topper	2018-12-20	1	-1/+1
	llvm-svn: 349726
*	[SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)	Simon Pilgrim	2018-12-19	1	-4/+4
	Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs. This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument. I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases. Differential Revision: https://reviews.llvm.org/D55822 llvm-svn: 349629
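The bitwise identity behind the updated fold can be checked exhaustively on a small width. This sketch assumes 4-bit values to keep the search small; the function names are hypothetical, and the model says nothing about the extra conditions the real fold places on c1 and c2 or about undef elements.

```python
BITS = 4                  # assumed width, small enough to enumerate fully
VALUES = range(1 << BITS)


def or_of_and(x, c1, c2):
    """(or (and X, c1), c2)"""
    return (x & c1) | c2


def and_of_or(x, c1, c2):
    """(and (or X, c2), c1|c2)"""
    return (x | c2) & (c1 | c2)


# Distributing the AND over the OR and absorbing the c2 terms shows the two
# forms are equal for every input; the loop confirms it for all 4-bit values.
assert all(or_of_and(x, c1, c2) == and_of_or(x, c1, c2)
           for x in VALUES for c1 in VALUES for c2 in VALUES)
```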
*	[TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)	Simon Pilgrim	2018-12-19	1	-4/+9
	As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should be guaranteed to at least have the new upper bits set to zero. SimplifyDemandedVectorElts is the worst offender (and it's the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue. Thanks to @dmgreen for catching this. Differential Revision: https://reviews.llvm.org/D55883 llvm-svn: 349625
*	[DAGCombiner] allow hoisting vector bitwise logic ahead of truncates	Sanjay Patel	2018-12-16	1	-5/+2
	The transform performs a bitwise logic op in a wider type followed by truncate when both inputs are truncated from the same source type: logic_op (truncate x), (truncate y) --> truncate (logic_op x, y) There are a bunch of other checks that should prevent doing this when it might be harmful. We already do this transform for scalars in this spot. The vector limitation was shared with a check for the case when the operands are extended. I'm not sure if that limit is needed either, but that would be a separate patch. Differential Revision: https://reviews.llvm.org/D55448 llvm-svn: 349303
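Bitwise logic ops act on each bit position independently, so truncating before or after the op gives the same low bits. A minimal scalar sketch, assuming a 16-bit source and an 8-bit result (widths and helper names are illustrative only):

```python
import operator

NARROW = 8   # assumed destination width; inputs are treated as 16-bit


def trunc(value, bits=NARROW):
    """Keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)


def logic_of_truncs(op, x, y):
    """logic_op (truncate x), (truncate y)"""
    return op(trunc(x), trunc(y))


def trunc_of_logic(op, x, y):
    """truncate (logic_op x, y)"""
    return trunc(op(x, y))


for op in (operator.and_, operator.or_, operator.xor):
    for x in range(0, 1 << 16, 521):
        for y in range(0, 1 << 16, 1031):
            assert logic_of_truncs(op, x, y) == trunc_of_logic(op, x, y)
```

The hoisted form needs one truncate instead of two, which is the gain when neither truncate has other uses.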
*	[SelectionDAG] Add FSHL/FSHR support to computeKnownBits	Simon Pilgrim	2018-12-16	1	-2/+4
	Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the target's shift amount type). llvm-svn: 349298
*	[DAGCombiner][X86] Prevent visitSIGN_EXTEND from returning N when (sext (setcc)) already has the target desired type for the setcc	Craig Topper	2018-12-14	1	-15/+18
	Summary: If the setcc already has the target desired type we can reach the getSetCC/getSExtOrTrunc after the MatchingVecType check with the exact same types as the nodes we started with. This causes VsetCC to be CSEd to N0 and the getSExtOrTrunc will CSE to N. When we return N, the caller will think that meant we called CombineTo and did our own worklist management. But that's not what happened. This prevents target hooks from being called for the node. To fix this, I've now returned SDValue if the setcc is already the desired type. But to avoid some regressions in X86 I've had to disable one of the target combines that wasn't being reached before in the case of a (sext (setcc)). If we get vector widening legalization enabled that entire function will be deleted anyway so hopefully this is only for the short term. Reviewers: RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55459 llvm-svn: 349137
*	[DAGCombiner] clean up visitEXTRACT_VECTOR_ELT	Sanjay Patel	2018-12-14	1	-138/+129
	This isn't quite NFC, but I don't know how to expose any outward diffs from these changes. Mostly, this was confusing because it used 'VT' to refer to the operand type rather than the usual type of the input node. There's also a large block at the end that is dedicated solely to matching loads, but that wasn't obvious. This could probably be split up into separate functions to make it easier to see. It's still not clear to me when we make certain transforms because the legality and constant conditions are intertwined in a way that might be improved. llvm-svn: 349095
*	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try	Sanjay Patel	2018-12-13	1	-1/+6
	This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from number of uses to an opcode test for DELETED_NODE based on existing similar code. Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349058
*	revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract	Sanjay Patel	2018-12-13	1	-6/+1
	This causes an address sanitizer bot failure: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio llvm-svn: 349056
*	[DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract	Sanjay Patel	2018-12-13	1	-1/+6
	Differential Revision: https://reviews.llvm.org/D55655 llvm-svn: 349051
*	[DAGCombine] Moved X86 rotate_amount % bitwidth == 0 early out to DAGCombiner	Simon Pilgrim	2018-12-13	1	-0/+7
	Remove common code from custom lowering (code is still safe if somehow a zero value gets used). llvm-svn: 349028
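The early out named in the subject relies on a rotate by any multiple of the bit width being the identity. A small sketch, assuming 32-bit values; `rotl` is a hypothetical helper that models the arithmetic, not the DAGCombiner code:

```python
def rotl(x, amount, bits=32):
    """Rotate the `bits`-wide value x left by `amount` positions."""
    mask = (1 << bits) - 1
    x &= mask
    amount %= bits
    if amount == 0:
        # rotate_amount % bitwidth == 0: nothing moves, return the input.
        return x
    return ((x << amount) | (x >> (bits - amount))) & mask


value = 0x80000001
assert rotl(value, 0) == value
assert rotl(value, 32) == value
assert rotl(value, 64) == value
assert rotl(value, 1) == 0x00000003
```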
*	[DAGCombiner] Remove unnecessary recursive DAGCombiner::visitINSERT_SUBVECTOR call.	Simon Pilgrim	2018-12-10	1	-6/+0
	As discussed on D55511, this caused an issue if the inner node deletes a node that the outer node depends upon. As it doesn't affect any lit-tests and I've only been able to expose this with the D55511 change I'm committing this now. llvm-svn: 348781
*	[DAGCombiner] Use the result value type in visitCONCAT_VECTORS	Francis Visoiu Mistrih	2018-12-10	1	-1/+1
	This triggers an assert when combining concat_vectors of a bitcast of merge_values. With asserts disabled, it fails to select:
	  fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
	    0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
	      0x7ff19d000b50: f64 = Register %1
	  In function: d
	Differential Revision: https://reviews.llvm.org/D55507 llvm-svn: 348759
*	[DAGCombiner] re-enable truncation of binops	Sanjay Patel	2018-12-08	1	-12/+7
	This is effectively re-committing the changes from: rL347917 (D54640) rL348195 (D55126) ...which were effectively reverted here: rL348604 ...because the code had a bug that could induce infinite looping or eventual out-of-memory compilation. The bug was that this code did not guard against transforming opaque constants. More details are in the post-commit mailing list thread for r347917. A reduced test for that is included in the x86 bool-math.ll file. (I wasn't able to reduce a PPC backend test for this, but it was almost the same pattern.) Original commit message for r347917: The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. llvm-svn: 348706
*	[DAGCombiner] split trunc from extend in hoistLogicOpWithSameOpcodeHands; NFC	Sanjay Patel	2018-12-07	1	-33/+48
	This duplicates several shared checks, but we need to split this up to fix underlying bugs in smaller steps. llvm-svn: 348627
*	[DAGCombiner] disable truncation of binops by default	Sanjay Patel	2018-12-07	1	-1/+7
	As discussed in the post-commit thread of r347917, this transform is fighting with an existing transform causing an infinite loop or out-of-memory, so this is effectively reverting r347917 and its follow-up r348195 while we investigate the bug. llvm-svn: 348604
*	[DAGCombiner] remove explicit calls to AddToWorkList; NFCI	Sanjay Patel	2018-12-07	1	-6/+0
	As noted in the post-commit thread for rL347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608936.html ...we don't need to repeat these calls because the combiner does it automatically. llvm-svn: 348597
*	[DAGCombiner] use root SDLoc for all nodes created by logic fold	Sanjay Patel	2018-12-07	1	-1/+1
	If this is not a valid way to assign an SDLoc, then we get this wrong all over SDAG. I don't know enough about the SDAG to explain this. IIUC, theoretically, debug info is not supposed to affect codegen. But here it has clearly affected 3 different targets, and the x86 change is an actual improvement. llvm-svn: 348552
*	[DAGCombiner] don't bother saving a SDLoc for a node that's dead; NFCI	Sanjay Patel	2018-12-06	1	-1/+1
	We shouldn't care about the debug location for a node that we're creating, but attaching the root of the pattern should be the best effort. (If this is not true, then we are doing it wrong all over the SDAG). This is no-functional-change-intended, and there are no regression test diffs...and that's what I expected. But there's a similar line above this diff, where those assumptions apparently do not hold. llvm-svn: 348550
*	[DAGCombiner] more clean up in hoistLogicOpWithSameOpcodeHands(); NFC	Sanjay Patel	2018-12-06	1	-41/+34
	This code can still misbehave. llvm-svn: 348547
*	[DAGCombiner] don't group bswap with casts in logic hoisting fold	Sanjay Patel	2018-12-06	1	-6/+15
	This was probably organized as it was because bswap is a unary op. But that's where the similarity to the other opcodes ends. We should not limit this transform to scalars, and we should not try it if either input has other uses. This is another step towards trying to clean this whole function up to prevent it from causing infinite loops and memory explosions. Earlier commits in this series: rL348501 rL348508 rL348518 llvm-svn: 348534
*	[DAGCombiner] reduce indent; NFC	Sanjay Patel	2018-12-06	1	-38/+31
	Unlike some of the folds in hoistLogicOpWithSameOpcodeHands() above this shuffle transform, this has the expected hasOneUse() checks in place. llvm-svn: 348523
*	[DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef.	Andrea Di Biagio	2018-12-06	1	-4/+12
	This patch introduces a new DAGCombiner rule to simplify concat_vectors nodes: concat_vectors( bitcast (scalar_to_vector %A), UNDEF) --> bitcast (scalar_to_vector %A) This patch only partially addresses PR39257. In particular, it is enough to fix one of the two problematic cases mentioned in PR39257. However, it is not enough to fix the original test case posted by Craig; that particular case would probably require a more complicated approach (and knowledge about used bits). Before this patch, we used to generate the following code for function PR39257 (-mtriple=x86_64 , -mattr=+avx):
	  vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
	  vxorps %xmm1, %xmm1, %xmm1
	  vblendps $3, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2,3]
	  vmovaps %ymm0, (%rsi)
	  vzeroupper
	  retq
	Now we generate this:
	  vmovsd (%rdi), %xmm0 # xmm0 = mem[0],zero
	  vmovaps %ymm0, (%rsi)
	  vzeroupper
	  retq
	As a side note: that VZEROUPPER is completely redundant... I guess the vzeroupper insertion pass doesn't realize that the definition of %xmm0 from vmovsd is already zeroing the upper half of %ymm0. Note that on -mcpu=btver2, we don't get that vzeroupper because the vzeroupper insertion pass is disabled. Differential Revision: https://reviews.llvm.org/D55274 llvm-svn: 348522
*	[DAGCombiner] don't hoist logic op if operands have other uses, part 2	Sanjay Patel	2018-12-06	1	-5/+7
	The PPC test with 2 extra uses seems clearly better by avoiding this transform. With 1 extra use, we also prevent an extra register move (although that might be an RA problem). The general rule should be to only make a change here if it is always profitable. The x86 diffs are all neutral. llvm-svn: 348518
*	[DAGCombiner] don't hoist logic op if operands have other uses	Sanjay Patel	2018-12-06	1	-2/+6
	The AVX512 diffs are neutral, but the bswap test shows a clear overreach in hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can increase the instruction count. This could also fight with transforms trying to go in the opposite direction and possibly blow up/infinite loop. This might be enough to solve the bug noted here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html I did not add the hasOneUse() checks to all opcodes because I see a perf regression for at least one opcode. We may decide that's irrelevant in the face of potential compiler crashing, but I'll see if I can salvage that first. llvm-svn: 348508
*	[DAGCombiner] refactor function that hoists bitwise logic; NFCI	Sanjay Patel	2018-12-06	1	-56/+65
	Added FIXME and TODO comments for lack of safety checks. This function is a suspect in out-of-memory errors as discussed in the follow-up thread to r347917: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html llvm-svn: 348501
*	DAGCombiner::visitINSERT_VECTOR_ELT - pull out repeated VT.getVectorNumElements(). NFCI.	Simon Pilgrim	2018-12-06	1	-3/+4
	llvm-svn: 348494
*	[DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)	Sanjay Patel	2018-12-05	1	-10/+14
	Because we're potentially peeking through a bitcast in this transform, we need to use overall bitwidths rather than number of elements to determine when it's safe to proceed. Should fix: https://bugs.llvm.org/show_bug.cgi?id=39893 llvm-svn: 348383
*	[SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)	Simon Pilgrim	2018-12-05	1	-0/+36
	This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions. Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code. Differential Revision: https://reviews.llvm.org/D54698 llvm-svn: 348353
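For readers unfamiliar with funnel shifts, the following sketch models the fshl/fshr semantics as described for the LLVM intrinsics: concatenate the two operands, shift by the amount modulo the bit width, and keep the high (fshl) or low (fshr) half. It assumes 32-bit operands, and the Python functions are illustrative models, not SelectionDAG code.

```python
def fshl(a, b, amount, bits=32):
    """Funnel shift left: high `bits` bits of (a:b) << (amount % bits)."""
    mask = (1 << bits) - 1
    amount %= bits
    concat = ((a & mask) << bits) | (b & mask)
    return ((concat << amount) >> bits) & mask


def fshr(a, b, amount, bits=32):
    """Funnel shift right: low `bits` bits of (a:b) >> (amount % bits)."""
    mask = (1 << bits) - 1
    amount %= bits
    concat = ((a & mask) << bits) | (b & mask)
    return (concat >> amount) & mask


a, b = 0x12345678, 0x9ABCDEF0
assert fshl(a, b, 8) == 0x3456789A
assert fshr(a, b, 8) == 0x789ABCDE
# A zero amount returns the first operand for fshl and the second for fshr.
assert fshl(a, b, 0) == a and fshr(a, b, 0) == b
# With both operands equal, a funnel shift is a rotate.
assert fshl(a, a, 8) == 0x34567812
```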
*	[TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes	Simon Pilgrim	2018-12-04	1	-0/+6
	Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes. The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch. llvm-svn: 348246
*	[DAGCombiner] narrow truncated vector binops when legal	Sanjay Patel	2018-12-03	1	-7/+11
	This is the smallest vector enhancement I could find to D54640. Here, we're allowing narrowing to only legal vector ops because we'll see regressions without that. All of the test diffs are wins from what I can tell. With AVX/AVX512, we can shrink ymm/zmm ops to xmm. x86 vector multiplies are the problem case that we're avoiding due to the patchwork ISA, and it's not clear to me if we can dance around those regressions using TLI hooks or if we need preliminary patches to plug those holes. Differential Revision: https://reviews.llvm.org/D55126 llvm-svn: 348195
*	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.	Craig Topper	2018-12-03	1	-0/+20
	Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158
*	[DAGCombiner] guard against an oversized shift crash	Sanjay Patel	2018-12-02	1	-9/+14
	This change prevents the crash noted in the post-commit comments for rL347478: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089
*	[DAGCombiner] narrow truncated binops	Sanjay Patel	2018-11-29	1	-0/+22
	The motivating case for this is shown in: https://bugs.llvm.org/show_bug.cgi?id=32023 and the corresponding rot16.ll regression tests. Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc sequences that don't get folded in IR. As the TODO comments suggest, there will be regressions if we extend this (for x86, we mostly seem to be missing LEA opportunities, but there are likely vector folds missing too). I think those should be considered existing bugs because this is the same transform that we do as an IR canonicalization in instcombine. We just need more tests to make those visible independent of this patch. Differential Revision: https://reviews.llvm.org/D54640 llvm-svn: 347917