bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[InstCombine] correct test comments; NFC	Sanjay Patel	2018-07-09	1	-2/+2
\| \| \| \|	llvm-svn: 336570
*	[InstCombine] avoid extra poison when moving shift above shuffle	Sanjay Patel	2018-07-09	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	As discussed in D49047 / D48987, shift-by-undef produces poison, so we can't use undef vector elements in that case. Note that we need to extend this for poison-generating flags, and there's a proposal to create poison from FMF in D47963. llvm-svn: 336562
*	[InstCombine] fix shuffle-of-binops transform to avoid poison/undef	Sanjay Patel	2018-07-09	1	-47/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noted in D48987, there are many different ways for this transform to go wrong. In particular, the poison potential for shifts means we have to be more careful with those ops. I added tests to make that behavior visible for all of the different cases that I could find. This is a partial fix. To make this review easier, I did not make changes for the single binop pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations noted with TODO comments. I'll follow up once we're confident that things are correct here. The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely. Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows for some improvements to div/rem patterns, so there are wins along with the missed opportunities and fixes. Differential Revision: https://reviews.llvm.org/D49047 llvm-svn: 336546
*	[PM/Unswitch] Fix a nasty bug in the new PM's unswitch introduced in	Chandler Carruth	2018-07-09	1	-66/+418
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r335553 with the non-trivial unswitching of switches. The code correctly updated most aspects of the CFG and analyses, but missed some crucial aspects: 1) When multiple cases have the same successor, we unswitch that a single time and replace the switch with a direct branch. The CFG here is correct, but the target of this direct branch may have had a PHI node with multiple entries in it. 2) When we still have to clone a successor of the switch into an unswitched copy of the loop, we'll delete potentially multiple edges entering this successor, not just one. 3) We also have to delete multiple edges entering the successors in the original loop when they have to be retained. 4) When the "retained successor" also occurs as a case successor, we just assert failed everywhere. This doesn't happen very easily because it's always valid to simply drop the case -- the retained successor for switches is always the default successor. However, it is likely possible through some contrivance of different loop passes, unrolling, and simplifying for this to occur in practice and certainly there is nothing "invalid" about the IR so this pass needs to handle it. 5) In the case of #4, we also will replace these multiple edges with a direct branch much like in #1 and need to collapse the entries in any PHI nodes to a single entry. All of this stems from the delightful fact that the same successor can show up in multiple parts of the switch terminator, and each of these is considered a distinct edge for the purpose of PHI nodes (and iterating the successors and predecessors) but not for unswitching itself, the dominator tree, or many other things. For the record, I intensely dislike this "feature" of the IR in large part because of the complexity it causes in passes like this.
We already have a ton of logic building sets and handling duplicates, and we just had to add a bunch more. I've added a complex test case that covers all five of the above failure modes. I've also added a variation on it where #4 and #5 occur in loop exit, adding fun where we have an LCSSA PHI node with "multiple entries" despite having dedicated exits. There were no additional issues found by this, but it seems a useful corner case to cover with testing. One thing that working on all of this code has made painfully clear for me as well is how amazingly inefficient our PHI node representation is (in terms of the in-memory data structures and the APIs used to update them). This code has truly marvelous complexity bounds because every time we remove an entry from a PHI node we do a linear scan to find it and then a linear update to the data structure to remove it. We could in theory batch all of the PHI node updates into a single linear walk of the operands making this much more efficient, but the APIs fight hard against this and the fact that we have to handle duplicates in the peculiar manner we do (removing all but one in some cases) makes even implementing that very tedious and annoying. Anyways, none of this is new here or specific to loop unswitching. All code in LLVM that updates PHI node operands suffers from these problems. llvm-svn: 336536
*	[PGOMemOPSize] Preserve the DominatorTree	Chijun Sima	2018-07-09	3	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: PGOMemOPSize only modifies the CFG in a couple of places; thus we can preserve the DominatorTree with little effort. When optimizing SQLite with -O3, this patch reduces the number of nodes traversed by DFS by 3.8% and the number of times DominatorTreeBase::recalculation is called by 5.7%. Reviewers: kuhar, davide, dmgreen Reviewed By: dmgreen Subscribers: mzolotukhin, vsk, llvm-commits Differential Revision: https://reviews.llvm.org/D48914 llvm-svn: 336522
*	[LoopIdiomRecognize] Support for converting loops that use LSHR to CTLZ.	Craig Topper	2018-07-08	1	-0/+162
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support for loops that use the LSHR instruction instead of ASHR has been added to the 'detectCTLZIdiom' function. This supports creating ctlz from the following code: int lzcnt(int x) { int count = 0; while (x > 0) { count++; x = x >> 1; } return count; } Patch by Olga Moldovanova. Differential Revision: https://reviews.llvm.org/D48354 llvm-svn: 336509
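The loop quoted in the commit above counts how many right shifts it takes for a positive value to reach zero, which is the bit width minus the number of leading zeros. The following is a minimal sketch of that equivalence, not the pass's implementation. It assumes a 32-bit `int`, and the names `shift_count_loop`, `ctlz32`, and `shift_count_closed_form` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The loop from the commit message: one iteration per significant bit of x.
int shift_count_loop(int x) {
    int count = 0;
    while (x > 0) {
        count++;
        x = x >> 1;
    }
    return count;
}

// Portable count-leading-zeros for a 32-bit value (hypothetical helper).
// Scans from the most significant bit down; returns 32 for an input of 0.
int ctlz32(std::uint32_t v) {
    int n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (v & mask) == 0; mask >>= 1)
        ++n;
    return n;
}

// Closed form that a ctlz-based rewrite could compute instead of looping.
// Assumption: int is 32 bits wide. Non-positive inputs never enter the loop.
int shift_count_closed_form(int x) {
    return x > 0 ? 32 - ctlz32(static_cast<std::uint32_t>(x)) : 0;
}
```

Both functions agree on every input under the stated assumption; the closed form replaces a data-dependent loop with a single count-leading-zeros operation and a subtraction.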
*	[PM/LoopUnswitch] Fix PR37889, producing the correct loop nest structure	Chandler Carruth	2018-07-07	2	-1/+1162
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	after trivial unswitching. This PR illustrates that a fundamental analysis update was not performed with the new loop unswitch. This update is also somewhat fundamental to the core idea of the new loop unswitch -- we actually update the CFG based on the unswitching. In order to do that, we need to update the loop nest in addition to the domtree. For some reason, when writing trivial unswitching, I thought that the loop nest structure cannot be changed by the transformation. But the PR helps illustrate that it clearly can. I've expanded this to a number of different test cases that try to cover the different cases of this. When we unswitch, we move an exit edge of a loop out of the loop. If this exit edge changes which loop reached by an exit is the innermost loop, it changes the parent of the loop. Essentially, this transformation may hoist the inner loop up the nest. I've added the simple logic to handle this reliably in the trivial unswitching case. This just requires updating LoopInfo and rebuilding LCSSA on the impacted loops. In the trivial case, we don't even need to handle dedicated exits because we're only hoisting the one loop and we just split its preheader. I've also ported all of these tests to non-trivial unswitching and verified that the logic already there correctly handles the loop nest updates necessary. Differential Revision: https://reviews.llvm.org/D48851 llvm-svn: 336477
*	Revert "[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428)."	Tim Shen	2018-07-06	1	-8/+10
\| \| \| \| \| \|	This reverts commit r336140. Our tests show that an LSR assertion fails with it. llvm-svn: 336473
*	[InstCombine] add more tests for potentially poisonous shifts; NFC	Sanjay Patel	2018-07-06	1	-0/+43
\| \| \| \|	llvm-svn: 336454
*	[Local] replaceAllDbgUsesWith: Update debug values before RAUW	Vedant Kumar	2018-07-06	4	-2/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The replaceAllDbgUsesWith utility helps passes preserve debug info when replacing one value with another. This improves upon the existing insertReplacementDbgValues API by: - Updating debug intrinsics in-place, while preventing use-before-def of the replacement value. - Falling back to salvageDebugInfo when a replacement can't be made. - Moving the responsibility for rewriting llvm.dbg.* DIExpressions into common utility code. Along with the API change, this teaches replaceAllDbgUsesWith how to create DIExpressions for three basic integer and pointer conversions: - The no-op conversion. Applies when the values have the same width, or have bit-for-bit compatible pointer representations. - Truncation. Applies when the new value is wider than the old one. - Zero/sign extension. Applies when the new value is narrower than the old one. Testing: - check-llvm, check-clang, a stage2 `-g -O3` build of clang, regression/unit testing. - This resolves a number of mis-sized dbg.value diagnostics from Debugify. Differential Revision: https://reviews.llvm.org/D48676 llvm-svn: 336451
*	[InstCombine] add more tests with poison and undef; NFC	Sanjay Patel	2018-07-06	1	-5/+540
\| \| \| \| \| \| \| \|	As discussed in D48987 and D48893, there are many different ways to go wrong depending on the binop (and as shown here we already do go wrong in some cases). llvm-svn: 336450
*	CallGraphSCCPass: iterate over all functions.	Tim Northover	2018-07-06	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we only iterated over functions reachable from the set of external functions in the module. But since some of the passes under this (notably the always-inliner and coroutine lowerer) are required for correctness, they need to run over everything. This just adds an extra layer of iteration over the CallGraph to keep track of which functions we've already visited and get the next batch of SCCs. Should fix PR38029. llvm-svn: 336419
*	Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms ↵	Max Kazantsev	2018-07-06	5	-24/+34
\| \| \| \| \| \|	are done" llvm-svn: 336410
*	Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN	Matt Arsenault	2018-07-05	1	-3/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Better NaN handling for AMDGCN fmed3. All operands are checked for NaN now. The checks were moved before the canonicalization to provide a better mapping from fclamp. Changed the behaviour of fmed3(x,y,NaN) to return max(x,y) instead of min(x,y) in light of this. Updated tests as a result and added some new cases to cover the fix. Patch by Alan Baker llvm-svn: 336375
*	[X86] Remove X86 specific scalar FMA intrinsics and upgrade to target ↵	Craig Topper	2018-07-05	1	-30/+64
\| \| \| \| \| \|	independent FMA and extractelement/insertelement. llvm-svn: 336315
*	[InstCombine] allow narrowing of min/max/abs	Sanjay Patel	2018-07-04	1	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have bailout hacks based on min/max in various places in instcombine that shouldn't be necessary. The affected test was added for: D48930 ...which is a consequence of the improvement in: D48584 (https://reviews.llvm.org/rL336172) I'm assuming the visitTrunc bailout in this patch was added specifically to avoid a change from SimplifyDemandedBits, so I'm just moving that below the EvaluateInDifferentType optimization. A narrow min/max is still a min/max. llvm-svn: 336293
*	[InstCombine] add value names to test; NFC	Sanjay Patel	2018-07-04	1	-17/+17
\| \| \| \| \| \|	That makes it easier to mix and match lines into other tests. llvm-svn: 336289
*	NFC - Various typo fixes in tests	Gabor Buella	2018-07-04	6	-7/+7
\| \| \| \|	llvm-svn: 336268
*	[NFC] Add test that shows that InstCombine can do better	Max Kazantsev	2018-07-04	1	-0/+24
\| \| \| \|	llvm-svn: 336258
*	[DebugInfo][LoopVectorize] Preserve DL in generated phi instruction	Anastasis Grammenos	2018-07-04	1	-0/+9
\| \| \| \| \| \| \| \| \|	When creating `phi` instructions to resume at the scalar part of the loop, copy the DebugLoc from the original phi over to the new one. Differential Revision: https://reviews.llvm.org/D48769 llvm-svn: 336256
*	[DebugInfo][InstCombine] Preserve DI after combining zext	Anastasis Grammenos	2018-07-04	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \|	When zext is EvaluatedInDifferentType, InstCombine drops the dbg.value intrinsic. This patch tries to preserve said DI, by inserting the zext's old DI in the resulting instruction. (Only for integer type for now) Differential Revision: https://reviews.llvm.org/D48331 llvm-svn: 336254
*	[InstCombine] add tests for shuffle+binop with constant op1; NFC	Sanjay Patel	2018-07-03	1	-4/+25
\| \| \| \| \| \|	This adds coverage for a planned enhancement for ConstantExpr::getBinOpIdentity() noted in D48830. llvm-svn: 336220
*	[Constants] add identity constants for fadd/fmul	Sanjay Patel	2018-07-03	3	-14/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As the test diffs show, the current users of getBinOpIdentity() are InstCombine and Reassociate. SLP vectorizer is a candidate for using this functionality too (D28907). The InstCombine shuffle improvements are part of the planned enhancements noted in D48830. InstCombine actually has several other uses of getBinOpIdentity() via SimplifyUsingDistributiveLaws(), but we don't call that for any FP ops. Fixing that might be another part of removing the custom reassociation in InstCombine that is only done for fadd+fmul. llvm-svn: 336215
*	[Reassociate] add tests for binop with identity constant; NFC	Sanjay Patel	2018-07-03	1	-0/+74
\| \| \| \|	llvm-svn: 336214
*	[Reassociate] regenerate checks; NFC	Sanjay Patel	2018-07-03	1	-77/+115
\| \| \| \|	llvm-svn: 336211
*	[Reassociate] add test for missing FP constant analysis; NFC	Sanjay Patel	2018-07-03	1	-3/+19
\| \| \| \|	llvm-svn: 336208
*	[InstCombine] fold shuffle-with-binop and common value	Sanjay Patel	2018-07-03	1	-9/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the last significant change suggested in PR37806: https://bugs.llvm.org/show_bug.cgi?id=37806#c5 ...though there are several follow-ups noted in the code comments in this patch to complete this transform. It's possible that a binop feeding a select-shuffle has been eliminated by earlier transforms (or the code was just written like this in the 1st place), so we'll fail to match the patterns that have 2 binops from: D48401, D48678, D48662, D48485. In that case, we can try to materialize identity constants for the remaining binop to fill in the "ghost" lanes of the vector (where we just want to pass through the original values of the source operand). I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups. For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor). Differential Revision: https://reviews.llvm.org/D48830 llvm-svn: 336196
*	[DebugInfo] Corrections for salvageDebugInfo	Bjorn Pettersson	2018-07-03	2	-2/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When salvaging a dbg.declare/dbg.addr we should not add DW_OP_stack_value to the DIExpression (see test/Transforms/InstCombine/salvage-dbg-declare.ll). Consider this example %vla = alloca i32, i64 2 call void @llvm.dbg.declare(metadata i32* %vla, metadata !1, metadata !DIExpression()) Instcombine will turn it into %vla1 = alloca [2 x i32] %vla1.sub = getelementptr inbounds [2 x i32], [2 x i32]* %vla1, i64 0, i64 0 call void @llvm.dbg.declare(metadata [2 x i32]* %vla1.sub, metadata !19, metadata !DIExpression()) If the GEP can be eliminated, then the dbg.declare will be salvaged and we should get %vla1 = alloca [2 x i32] call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression()) The problem was that salvageDebugInfo did not recognize dbg.declare as being indirect (%vla1 points to the value, it does not hold the value), so we incorrectly got call void @llvm.dbg.declare(metadata [2 x i32]* %vla1, metadata !19, metadata !DIExpression(DW_OP_stack_value)) I also made sure that llvm::salvageDebugInfo and DIExpression::prependOpcodes do not add DW_OP_stack_value to the DIExpression in case no new operands are added to the DIExpression. That way we avoid unnecessarily turning a register location expression into an implicit location expression in some situations (see test11 in test/Transforms/LICM/sinking.ll). Reviewers: aprantl, vsk Reviewed By: aprantl, vsk Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D48837 llvm-svn: 336191
*	[PM/LoopUnswitch] Fix PR37651 by correctly invalidating SCEV when	Chandler Carruth	2018-07-03	1	-0/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unswitching loops. Original patch trying to address this was sent in D47624, but that didn't quite handle things correctly. There are two key principles used to select whether and how to invalidate SCEV-cached information about loops: 1) We must invalidate any info SCEV has cached before unswitching as we may change (or destroy) the loop structure by the act of unswitching, and make it hard to recover everything we want to invalidate within SCEV. 2) We need to invalidate all of the loops whose CFGs are mutated by the unswitching. Notably, this isn't the entire loop nest, this is every loop contained by the outermost loop reached by an exit block relevant to the unswitch. And we need to do this even when doing trivial unswitching. I've added more focused tests that directly check that SCEV starts off with imprecise information and after unswitching (and simplifying instructions) re-querying SCEV will produce precise information. These tests also specifically work to check that an outer loop's information becomes precise. However, the testing here is still a bit imperfect. Crafting test cases that reliably fail to be analyzed by SCEV before unswitching and succeed afterward proved ... very, very hard. It took me several hours and careful work to build these, and I'm not optimistic about necessarily coming up with more to cover more elaborate possibilities. Fortunately, the code pattern we are testing here in the pass is really straightforward and reliable. Thanks to Max Kazantsev for the initial work on this as well as the review, and to Hal Finkel for helping me talk through approaches to test this stuff even if it didn't come to much. Differential Revision: https://reviews.llvm.org/D47624 llvm-svn: 336183
*	[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done	Max Kazantsev	2018-07-03	4	-33/+21
\| \| \| \| \| \| \| \| \| \| \| \|	This patch changes the order of transforms in InstCombineCompares to avoid performing range-based transforms, which produce complex bit arithmetic, before simpler things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172
*	[SCEV] Strengthen StrengthenNoWrapFlags (reapply r334428).	Tim Shen	2018-07-02	1	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Comment on Transforms/LoopVersioning/incorrect-phi.ll: With the change SCEV is able to prove that the loop doesn't wrap-self (due to zext i16 to i64), disabling the entire loop versioning pass. Removed the zext and just use i64. Reviewers: sanjoy Subscribers: jlebar, hiraditya, javed.absar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D48409 llvm-svn: 336140
*	[SLP] Recognize min/max pattern using instructions producing same values.	Farhana Aleen	2018-07-02	1	-65/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130
*	[InstCombine] reverse canonicalization of add --> or to allow more shuffle ↵	Sanjay Patel	2018-07-02	1	-12/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	folding This extends D48485 to allow another pair of binops (add/or) to be combined either with or without a leading shuffle: or X, C --> add X, C (when X and C have no common bits set) Here, we need value tracking to determine that the 'or' can be reversed into an 'add', and we've added general infrastructure to allow extending to other opcodes or moving to where other passes could use that functionality. Differential Revision: https://reviews.llvm.org/D48662 llvm-svn: 336128
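The `or X, C --> add X, C` rewrite in the commit above relies on the identity x + c = (x ^ c) + 2(x & c): when x and c have no common bits set, the carry term is zero and the sum equals x | c. The following is a minimal sketch that checks this exhaustively over 8-bit values. The helper names `no_common_bits` and `or_matches_add_when_disjoint` are hypothetical, and the sketch does not model how the pass proves the precondition; the commit only states that value tracking is needed for that.

```cpp
#include <cassert>
#include <cstdint>

// True when x and c share no set bits, the precondition named in the commit.
bool no_common_bits(std::uint8_t x, std::uint8_t c) {
    return (x & c) == 0;
}

// Exhaustively compares (x | c) with (x + c) for every 8-bit pair that
// satisfies the precondition. Returns false on the first mismatch.
bool or_matches_add_when_disjoint() {
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned c = 0; c < 256; ++c) {
            const std::uint8_t a = static_cast<std::uint8_t>(x);
            const std::uint8_t b = static_cast<std::uint8_t>(c);
            if (!no_common_bits(a, b))
                continue;  // the rewrite is not claimed for overlapping bits
            const std::uint8_t viaOr = static_cast<std::uint8_t>(a | b);
            const std::uint8_t viaAdd = static_cast<std::uint8_t>(a + b);
            if (viaOr != viaAdd)
                return false;
        }
    }
    return true;
}
```

When the operands do overlap, for example 3 and 1, the two operations differ (3 | 1 is 3, 3 + 1 is 4), which is why the precondition has to be established before reversing the canonicalization.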
*	[SLPVectorizer][X86] Begin adding alternate tests for call operators	Simon Pilgrim	2018-07-02	1	-0/+65
\| \| \| \| \| \|	Alternate opcode handling only supports binary operators; these tests demonstrate a missed opportunity to vectorize ceil/floor calls. llvm-svn: 336125
*	[ValueTracking] allow undef elements when matching vector abs	Sanjay Patel	2018-07-02	1	-4/+4
\| \| \| \|	llvm-svn: 336111
*	[InstCombine] adjust shuffle tests with IR flags; NFC	Sanjay Patel	2018-07-02	1	-4/+3
\| \| \| \| \| \| \| \| \|	Due to current limitations in constant analysis, we need flags on add or mul to show propagation for the potential transform suggested in these tests (no other binops currently report identity constants). llvm-svn: 336101
*	Recommit r328307: [IPSCCP] Use constant range information for comparisons of ↵	Florian Hahn	2018-07-02	1	-13/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	parameters. This version contains a fix to add values for which the state in ParamState changes to the worklist if the state in ValueState did not change. To avoid adding the same value multiple times, mergeInValue returns true if it added the value to the worklist. The value is added to the worklist depending on its state in ValueState. Original message: For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to using ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 336098
*	[InstCombine] add tests for shuffle-binop; NFC	Sanjay Patel	2018-07-02	1	-37/+256
\| \| \| \| \| \|	This is another pattern mentioned in PR37806. llvm-svn: 336096
*	[SLPVectorizer] Fix alternate opcode + shuffle cost function to correctly ↵	Simon Pilgrim	2018-07-02	1	-3/+24
\| \| \| \| \| \| \| \| \| \|	handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095
*	[NFC] Test that shows unprofitability of instcombine with bit ranges	Max Kazantsev	2018-07-02	1	-0/+32
\| \| \| \|	llvm-svn: 336078
*	Implement strip.invariant.group	Piotr Padlewski	2018-07-02	7	-18/+145
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch introduces a new intrinsic, strip.invariant.group, that was described in the RFC: Devirtualization v2. Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits Differential Revision: https://reviews.llvm.org/D47103 Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com> llvm-svn: 336073
*	[InstCombine] add abs tests with undef elts; NFC	Sanjay Patel	2018-07-01	1	-0/+30
\| \| \| \|	llvm-svn: 336065
*	[PatternMatch] allow undef elements in vectors with m_Neg	Sanjay Patel	2018-07-01	1	-5/+3
\| \| \| \| \| \|	This is similar to the m_Not change from D44076. llvm-svn: 336064
*	[UnrollAndJam] New Unroll and Jam pass	David Green	2018-07-01	5	-0/+2482
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062
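The unroll-and-jam shape described in the commit above can be sketched at source level as follows. This is an illustration of the loop structure only, not the pass itself, which works on LLVM IR and must prove safety with dependence analysis. The names `Matrix`, `scaled_matvec_plain`, and `scaled_matvec_unroll_and_jam` are hypothetical, and the sketch assumes every row of `a` has at least `v.size()` elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Original nest: ForeBlocks(i), then the inner loop, then AftBlocks(i).
std::vector<int> scaled_matvec_plain(const Matrix& a, const std::vector<int>& v) {
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = 0;                                  // ForeBlocks(i)
        for (std::size_t j = 0; j < v.size(); ++j)
            out[i] += a[i][j] * v[j];                // SubLoopBlocks(i, j)
        out[i] *= 2;                                 // AftBlocks(i)
    }
    return out;
}

// Outer loop unrolled by 2, with the two inner loops jammed into one.
std::vector<int> scaled_matvec_unroll_and_jam(const Matrix& a, const std::vector<int>& v) {
    std::vector<int> out(a.size());
    std::size_t i = 0;
    for (; i + 1 < a.size(); i += 2) {
        out[i] = 0;                                  // ForeBlocks(i)
        out[i + 1] = 0;                              // ForeBlocks(i+1)
        for (std::size_t j = 0; j < v.size(); ++j) {
            const int vj = v[j];                     // one load shared by both rows
            out[i] += a[i][j] * vj;                  // SubLoopBlocks(i, j)
            out[i + 1] += a[i + 1][j] * vj;          // SubLoopBlocks(i+1, j)
        }
        out[i] *= 2;                                 // AftBlocks(i)
        out[i + 1] *= 2;                             // AftBlocks(i+1)
    }
    for (; i < a.size(); ++i) {                      // remainder loop
        out[i] = 0;
        for (std::size_t j = 0; j < v.size(); ++j)
            out[i] += a[i][j] * v[j];
        out[i] *= 2;
    }
    return out;
}
```

The rewrite is valid here because iterations of the outer loop touch disjoint elements of `out`, so ForeBlocks(i+1) can move ahead of SubLoopBlocks(i, j) and AftBlocks(i). The shared read of `v[j]` is the kind of memory access the commit says can be shared between the jammed loops.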
*	[SLPVectorizer][X86] Add some alternate tests for cast operators	Simon Pilgrim	2018-07-01	1	-0/+169
\| \| \| \| \| \|	Alternate opcode handling only supports binary operators; these tests demonstrate missed opportunities to vectorize some sitofp/uitofp and fptosi/fptoui style casts as well as some (successful) float bits manipulations. llvm-svn: 336060
*	[Evaluator] Improve evaluation of call instruction	Eugene Leviant	2018-07-01	3	-0/+239
\| \| \| \| \| \|	Recommit of r335324 after buildbot failure fix llvm-svn: 336059
*	[InstCombine] add tests for negate vector with undef elts; NFC	Sanjay Patel	2018-06-30	1	-3/+27
\| \| \| \|	llvm-svn: 336050
*	[InstCombine] add more tests for shuffle-binop folds; NFC	Sanjay Patel	2018-06-29	1	-1/+73
\| \| \| \| \| \| \|	The mul+shl tests add coverage for the fold enabled with D48678. The and+or tests are not handled yet; that's D48662. llvm-svn: 335984
*	[InstCombine] enhance shuffle-of-binops to allow different variable ops ↵	Sanjay Patel	2018-06-29	1	-35/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(PR37806) This was discussed in D48401 as another improvement for: https://bugs.llvm.org/show_bug.cgi?id=37806 If we have 2 different variable values, then we shuffle (select) those lanes, shuffle (select) the constants, and then perform the binop. This eliminates a binop. The new shuffle uses the same shuffle mask as the existing shuffle, so there's no danger of creating a difficult shuffle. All of the earlier constraints still apply, but we also check for extra uses to avoid creating more instructions than we'll remove. Additionally, we're disallowing the fold for div/rem because that could expose a UB hole. Differential Revision: https://reviews.llvm.org/D48678 llvm-svn: 335974
*	SCEVExpander::expandAddRecExprLiterally(): check before casting as Instruction	Roman Lebedev	2018-06-29	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: An alternative to D48597. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=37936 \| PR37936 ]]. The problem is as follows: 1. `indvars` marks `%dec` as `NUW`. 2. `loop-instsimplify` runs `instsimplify`, which constant-folds `%dec` to -1 (D47908). 3. `loop-reduce` tries to do some further modification, but crashes with a type assertion in cast, because `%dec` is no longer an `Instruction`. If the runline is split into two, i.e. you first run `-indvars -loop-instsimplify`, store that into a file, and then run `-loop-reduce`, there is no crash. So it looks like the problem is due to `-loop-instsimplify` not discarding SCEV. But in this case we can just not crash if it's not an `Instruction`. This is just a local fix, unlike D48597, so there may very well be other problems. Reviewers: mkazantsev, uabelho, sanjoy, silviu.baranga, wmi Reviewed By: mkazantsev Subscribers: evstupac, javed.absar, spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D48599 llvm-svn: 335950