summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* [Statistics] Add a method to atomically update a statistic that contains a ↵Craig Topper2017-05-181-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | maximum Summary: There are several places in the codebase that try to calculate a maximum value in a Statistic object. We currently do this in one of two ways: MaxNumFoo = std::max(MaxNumFoo, NumFoo); or MaxNumFoo = (MaxNumFoo > NumFoo) ? MaxNumFoo : NumFoo; The first version reads from MaxNumFoo one time and uncontionally rwrites to it. The second version possibly reads it twice depending on the result of the first compare. But we have no way of knowing if the value was changed by another thread between the reads and the writes. This patch adds a method to the Statistic object that can ensure that we only store if our value is the max and the previous max didn't change after we read it. If it changed we'll recheck if our value should still be the max or not and try again. This spawned from an audit I'm trying to do of all places we uses the implicit conversion to unsigned on the Statistics objects. See my previous thread on llvm-dev https://groups.google.com/forum/#!topic/llvm-dev/yfvxiorKrDQ Reviewers: dberlin, chandlerc, hfinkel, dblaikie Reviewed By: chandlerc Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D33301 llvm-svn: 303318
* [Statistics] Use Statistic::operator+= instead of adding and assigning ↵Craig Topper2017-05-172-2/+2
| | | | | | | | separately. I believe this technically fixes a multithreaded race condition in this code. But my primary concern was as part of looking at removing the ability to treat Statistics like a plain unsigned. There are many weird operations on Statistics in the codebase. llvm-svn: 303314
* [InstCombine] handle icmp i1 X, C early to avoid creating an unknown patternSanjay Patel2017-05-171-0/+23
| | | | | | | | | | | The missing optimization for xor-of-icmps still needs to be added, but by being more efficient (not generating unnecessary logic ops with constants) we avoid the bug. See discussion in post-commit comments: https://reviews.llvm.org/D32143 llvm-svn: 303312
* [InstCombine] move icmp bool canonicalizations to helper; NFCSanjay Patel2017-05-171-43/+54
| | | | | | | | As noted in the post-commit comments in D32143, we should be catching the constant operand cases sooner to be more efficient and less likely to expose a missing fold. llvm-svn: 303309
* [InstCombine] add isCanonicalPredicate() helper function and use it; NFCISanjay Patel2017-05-172-31/+32
| | | | | | | | | | | | | | | | | | | | There should be a slight efficiency improvement from handling icmp/fcmp with one matcher and reducing duplicated code. The larger motivation is that there are questions about how predicate canonicalization is handled, and the refactoring should make it easier if we want to change any of that behavior. 1. As noted in the code comment, we've chosen 3 of the 16 FCMP preds as not canonical. Why those 3? It goes back to rL32751 from what I can tell, but I'm not sure if there's a justification for that rule. 2. We currently do not canonicalize integer select conditions. Should we use the same rule that applies to branches for selects? 3. We currently do canonicalize some FP select conditions, and those rules would conflict with the rule shown here. Should one or both be changed? No-functional-change-intended, but adding tests anyway because there's no coverage for most of the predicates. Differential Revision: https://reviews.llvm.org/D33247 llvm-svn: 303261
* [coroutines] Handle spills before catchswitchGor Nishanov2017-05-171-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | If we need to spill the result of the PHI instruction, we insert the spill after all of the PHIs and EHPads, however, in a catchswitch block there is no room to insert the spill. Make room by splitting away catchswitch into a separate block. Before the fix: catch.dispatch: %val = phi i32 [ 1, %if.then ], [ 2, %if.else ] %switch = catchswitch within none [label %catch] unwind label %cleanuppad After: catch.dispatch: %val = phi i32 [ 1, %if.then ], [ 2, %if.else ] %tok = cleanuppad within none [] ; spill goes here cleanupret from %tok unwind label %catch.dispatch.switch catch.dispatch.switch: %switch = catchswitch within none [label %catch] unwind label %cleanuppad https://reviews.llvm.org/D31846 llvm-svn: 303232
* BitVector: add iterators for set bitsFrancis Visoiu Mistrih2017-05-172-4/+2
| | | | | | Differential revision: https://reviews.llvm.org/D32060 llvm-svn: 303227
* [ADT] Fix some Clang-tidy modernize-use-using warnings; other minor fixes (NFC).Eugene Zelenko2017-05-161-9/+28
| | | | llvm-svn: 303221
* [IR] Prefer use_empty() to !hasNUsesOrMore(1) for clarity.Davide Italiano2017-05-162-2/+2
| | | | llvm-svn: 303218
* The patch exclude a case from zero check skip inEvgeny Stupachenko2017-05-161-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CTLZ idiom recognition (r303102). Summary: The following case: i = 1; if(n) while (n >>= 1) i++; use(i); Was converted to: i = 1; if(n) i += builtin_ctlz(n >> 1, false); use(i); Which is not correct. The patch make it: i = 1; if(n) i += builtin_ctlz(n >> 1, true); use(i); From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 303212
* In debug builds non-trivial amount of time is spent in InstCombine processingDmitry Mikulin2017-05-161-1/+4
| | | | | | @llvm.dbg.* calls in visitCallInst(). They can be safely ignored. llvm-svn: 303202
* NewGVN: Only do something in verifyStoreExpressions if assertions are ↵Daniel Berlin2017-05-161-0/+2
| | | | | | enabled, to avoid unused code warnings. llvm-svn: 303201
* NewGVN: Fix PR 33051 by making sure we remove old store expressionsDaniel Berlin2017-05-161-27/+36
| | | | | | from the ExpressionToClass mapping. llvm-svn: 303200
* Revert 303174, 303176, and 303178Matthew Simpson2017-05-161-2/+2
| | | | | | These commits are breaking the bots. Reverting to investigate. llvm-svn: 303182
* [LV] Avoid potentential division by zero when selecting ICMatthew Simpson2017-05-161-2/+2
| | | | llvm-svn: 303174
* [coroutines] Handle unwind edge splittingGor Nishanov2017-05-161-4/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: RewritePHIs algorithm used in building of CoroFrame inserts a placeholder ``` %placeholder = phi [%val] ``` on every edge leading to a block starting with PHI node with multiple incoming edges, so that if one of the incoming values was spilled and need to be reloaded, we have a place to insert a reload. We use SplitEdge helper function to split the incoming edge. SplitEdge function does not deal with unwind edges comping into a block with an EHPad. This patch adds an ehAwareSplitEdge function that can correctly split the unwind edge. For landing pads, we clone the landing pad into every edge block and replace the original landing pad with a PHI collection the values from all incoming landing pads. For WinEH pads, we keep the original EHPad in place and insert cleanuppad/cleapret in the edge blocks. Reviewers: majnemer, rnk Reviewed By: majnemer Subscribers: EricWF, llvm-commits Differential Revision: https://reviews.llvm.org/D31845 llvm-svn: 303172
* [CorrelatedValuePropagation] Don't use -> to call a static method of ↵Craig Topper2017-05-161-6/+4
| | | | | | ConstantRange. NFC llvm-svn: 303147
* NewGVN: Use StoreExpression StoredValue instead of looking it up again, ↵Daniel Berlin2017-05-161-6/+5
| | | | | | since it was already looked up when it was created llvm-svn: 303144
* NewGVN: Formatting fixesDaniel Berlin2017-05-161-2/+3
| | | | llvm-svn: 303143
* Revert "[NewGVN] Replace predicate info leftovers."Davide Italiano2017-05-161-4/+0
| | | | | | It's breaking the bots. llvm-svn: 303142
* [NewGVN] Replace predicate info leftovers.Davide Italiano2017-05-161-0/+4
| | | | | | | | Fixes PR32945. Differential Revision: https://reviews.llvm.org/D33226 llvm-svn: 303141
* IR: Give function GlobalValue::getRealLinkageName() a less misleading name: ↵Peter Collingbourne2017-05-161-1/+1
| | | | | | | | | | | | dropLLVMManglingEscape(). This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name. Differential Revision: https://reviews.llvm.org/D33162 llvm-svn: 303134
* Fix memory leakXinliang David Li2017-05-151-0/+4
| | | | llvm-svn: 303126
* PR32288: Describe a bool parameter's DWARF location with a simple registerDavid Blaikie2017-05-151-28/+23
| | | | | | | | | | | There's no need (& a bit incorrect) to mask off the high bits of the register reference when describing a simple bool value. Reviewers: aprantl Differential Revision: https://reviews.llvm.org/D31062 llvm-svn: 303117
* [SLP] Enable 64-bit wide vectorization on AArch64Adam Nemet2017-05-151-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. *** Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. ** SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116
* [asan] Better workaround for gold PR19002.Evgeniy Stepanov2017-05-151-2/+11
| | | | | | See the comment for more details. Test in a follow-up CFE commit. llvm-svn: 303113
* [NewGVN] Remove unused setDefiningExpr(). NFCI.Davide Italiano2017-05-151-1/+0
| | | | llvm-svn: 303107
* [InstCombine] restrict icmp fold with 2 sdiv exact operands (PR32949)Sanjay Patel2017-05-151-2/+9
| | | | | | | | | | | | | | | | This is the InstCombine counterpart to D32954. I added some comments about the code duplication in: rL302436 Alive-based verification: http://rise4fun.com/Alive/dPw This is a 2nd fix for the problem reported in: https://bugs.llvm.org/show_bug.cgi?id=32949 Differential Revision: https://reviews.llvm.org/D32970 llvm-svn: 303105
* The patch adds CTLZ idiom recognition.Evgeny Stupachenko2017-05-151-1/+294
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The following loops should be recognized: i = 0; while (n) { n = n >> 1; i++; body(); } use(i); And replaced with builtin_ctlz(n) if body() is empty or for CPUs that have CTLZ instruction converted to countable: for (j = 0; j < builtin_ctlz(n); j++) { n = n >> 1; i++; body(); } use(builtin_ctlz(n)); Reviewers: rengolin, joerg Differential Revision: http://reviews.llvm.org/D32605 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 303102
* [NewGVN] Fix verification of MemoryPhis in verifyMemoryCongruency().Davide Italiano2017-05-151-0/+13
| | | | | | | | | | | | verifyMemoryCongruency() filters out trivially dead MemoryDef(s), as we find them immediately dead, before moving from TOP to a new congruence class. This fixes the same problem for PHI(s) skipping MemoryPhis if all the operands are dead. Differential Revision: https://reviews.llvm.org/D33044 llvm-svn: 303100
* [InstCombine] use m_OneUse to reduce code; NFCISanjay Patel2017-05-151-6/+7
| | | | llvm-svn: 303090
* [ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits.Craig Topper2017-05-157-30/+20
| | | | | | | | This patch finishes off the conversion of ComputeSignBit to computeKnownBits. Differential Revision: https://reviews.llvm.org/D33166 llvm-svn: 303035
* [InstCombine] Merge duplicate functionality between InstCombine and ↵Craig Topper2017-05-153-104/+26
| | | | | | | | | | | | | | | | | | | | | | | ValueTracking Summary: Merge overflow computation for signed add, appearing both in InstCombine and ValueTracking. As part of the merge, cleanup the interface for overflow checks in InstCombine. Patch by Yoav Ben-Shalom. Reviewers: craig.topper, majnemer Reviewed By: craig.topper Subscribers: takuto.ikuta, llvm-commits Differential Revision: https://reviews.llvm.org/D32946 llvm-svn: 303029
* [InstCombine] Remove 'return' of a called function that also returned void. NFCCraig Topper2017-05-151-3/+2
| | | | llvm-svn: 303028
* Fix test failure on windows -- do not return deleted funcXinliang David Li2017-05-141-2/+8
| | | | llvm-svn: 302999
* [LoopOptimizer][Fix]PR32859, PR24738Simon Pilgrim2017-05-131-7/+9
| | | | | | | | | | | | | | | | | | | The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form. Here it is: before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ] and after this change we have: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ] Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D33055 llvm-svn: 302988
* [InstCombine] Prevent InstCombine from triggering an extra iteration if ↵Craig Topper2017-05-131-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | something changed in the initial Worklist creation Summary: If the Worklist build causes an IR change this change flag currently factors into the flag for running another iteration of the iteration loop. But only changes during processing should trigger another loop. This patch captures the worklist creation change flag into the outside the loop flag currently used for DbgDeclares and only sends that flag up to the caller. Rerunning the loop only depends on IC.run() now. This uses the debug output of InstCombine to determine if one or two iterations run. I couldn't think of a better way to detect it since the second spurious iteration shoudn't make any visible changes. Just wasted computation. I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this. This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that. Reviewers: davide, majnemer, spatel, chandlerc Reviewed By: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32453 llvm-svn: 302982
* [PartialInlining] Profile based cost analysisXinliang David Li2017-05-121-45/+363
| | | | | | | | | | | | Implemented frequency based cost/saving analysis and related options. The pass is now in a state ready to be turne on in the pipeline (in follow up). Differential Revision: http://reviews.llvm.org/D32783 llvm-svn: 302967
* [KnownBits] Add bit counting methods to KnownBits struct and use them where ↵Craig Topper2017-05-126-11/+11
| | | | | | | | | | | | possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925
* [NewGVN] Improve debug output a bit. NFCI.Davide Italiano2017-05-121-1/+1
| | | | | | | While debugging a predicate info problem, I noticed this was missing a newline, making the debug output slightly less readable. llvm-svn: 302908
* [NewGVN] Format an assertion and fix a typo. NFCI.Davide Italiano2017-05-121-3/+2
| | | | llvm-svn: 302906
* [NewGVN] Don't incorrectly reset the memory leader.Davide Italiano2017-05-121-1/+1
| | | | | | | | | | This code was missing a check for stores, so we were thinking the congruency class didn't have any memory members, and reset the memory leader. Differential Revision: https://reviews.llvm.org/D33056 llvm-svn: 302905
* [PM/Unswitch] Teach the new simple loop unswitch to handle loopChandler Carruth2017-05-121-23/+138
| | | | | | | | | | | | | | | | | | | | | | | | invariant PHI inputs and to rewrite PHI nodes during the actual unswitching. The checking is quite easy, but rewriting the PHI nodes is somewhat surprisingly challenging. This should handle both branches and switches. I think this is now a full featured trivial unswitcher, and more full featured than the trivial cases in the old pass while still being (IMO) somewhat simpler in how it works. Next up is to verify its correctness in more widespread testing, and then to add non-trivial unswitching. Thanks to Davide and Sanjoy for the excellent review. There is one remaining question that I may address in a follow-up patch (see the review thread for details) but it isn't related to the functionality specifically. Differential Revision: https://reviews.llvm.org/D32699 llvm-svn: 302867
* [SLP] Emit optimization remarksAdam Nemet2017-05-111-6/+36
| | | | | | | | | | | | | | | | | | The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense how far the tree is spanning I've include the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811
* [LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFCAyal Zaks2017-05-111-80/+101
| | | | | | | | | | | | | | Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from ILV to LVP. These changes facilitate building VPlans and using them to generate code, following https://reviews.llvm.org/D28975 and its tentative breakdown. Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to improve clarity; it's contents remain intact. Differential Revision: https://reviews.llvm.org/D32200 llvm-svn: 302790
* [msan] Fix PR32842Alexander Potapenko2017-05-111-2/+5
| | | | | | | | | | | | | | | | | | | | It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless. This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero). Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0. This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact. For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code: orq %rcx, %rax jne .LBB0_6 , instead of: orl %ecx, %eax testb $1, %al jne .LBB0_6 llvm-svn: 302787
* Remove spurious cast of nullptr. NFC.Serge Guelton2017-05-112-2/+2
| | | | | | Conversion rules allow automatic casting of nullptr to any pointer type. llvm-svn: 302780
* [InstCombine] remove fold that swaps xor/or with constants; NFCISanjay Patel2017-05-101-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | // (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2) This canonicalization was added at: https://reviews.llvm.org/rL7264 By moving xors out/down, we can more easily combine constants. I'm adding tests that do not change with this patch, so we can verify that those kinds of transforms are still happening. This is no-functional-change-intended because there's a later fold: // (X^C)|Y -> (X|Y)^C iff Y&C == 0 ...and demanded-bits appears to guarantee that any fold that would have hit the fold we're removing here would be caught by that 2nd fold. Similar reasoning was used in: https://reviews.llvm.org/rL299384 The larger motivation for removing this code is that it could interfere with the fix for PR32706: https://bugs.llvm.org/show_bug.cgi?id=32706 Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'. Differential Revision: https://reviews.llvm.org/D33050 llvm-svn: 302733
* [NewGVN] Introduce a definesNoMemory() helper and use it.Davide Italiano2017-05-101-3/+5
| | | | | | | This is nice as is, but it will be used in my next patch to fix a bug. Suggested by Daniel Berlin. llvm-svn: 302714
* Ensure non-null ProfileSummaryInfo passed to ModuleSummaryIndex builderTeresa Johnson2017-05-101-1/+3
| | | | | | | | | | | | This fixes a ubsan bot failure after r302597, which made getProfileCount non-static, but ended up invoking it on a null ProfileSummaryInfo object in some cases from buildModuleSummaryIndex. Most testing passed because the non-static getProfileCount currently doesn't access any member variables, but I found this when testing a follow on patch (D32877) that adds a member variable access. llvm-svn: 302705
OpenPOWER on IntegriCloud