path: root/llvm/lib/Analysis
Commit message (Author, Date, Files changed, Lines -/+)
* [SCEV] Always sort AddRecExprs from different loops by dominance (Max Kazantsev, 2017-05-17, 1 file, -7/+7)
  Sorting of AddRecExprs by loop nesting does not make sense since we only invoke CompareSCEVComplexity for AddRecExprs that are used by one SCEV. This guarantees that there is always a dominance relationship between them. This patch removes the sorting by nesting, which is dead code in the current usage of this function.
  Reviewed By: sanjoy
  Differential Revision: https://reviews.llvm.org/D33228
  llvm-svn: 303235
* [SCEV][NFC] Replace redundant dyn_cast with cast in getAddExpr (Max Kazantsev, 2017-05-17, 1 file, -14/+15)
  Replace a dyn_cast, which is already guaranteed by the isa check just one line above, with cast.
  Differential Revision: https://reviews.llvm.org/D33231
  llvm-svn: 303234
* BitVector: add iterators for set bits (Francis Visoiu Mistrih, 2017-05-17, 1 file, -17/+16)
  Differential revision: https://reviews.llvm.org/D32060
  llvm-svn: 303227
* [InstSimplify] add folds for constant mask of value shifted by constant (Sanjay Patel, 2017-05-16, 1 file, -0/+18)
  We would eventually catch these via demanded bits and computing known bits in InstCombine, but I think it's better to handle the simple cases as soon as possible as a matter of efficiency. This fold allows further simplifications based on distributed ops transforms, e.g.:
    %a = lshr i8 %x, 7
    %b = or i8 %a, 2
    %c = and i8 %b, 1
  InstSimplify can directly fold this now:
    %a = lshr i8 %x, 7
  Differential Revision: https://reviews.llvm.org/D33221
  llvm-svn: 303213
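  A standalone C++ sketch (illustrative, not part of the commit) that exhaustively checks the i8 example above in scalar code:
      #include <cassert>
      #include <cstdint>

      // Verify that ((x >> 7) | 2) & 1 == (x >> 7) for every i8 value,
      // i.e. the or/and pair around the lshr can be folded away.
      int main() {
        for (unsigned v = 0; v < 256; ++v) {
          uint8_t x = static_cast<uint8_t>(v);
          uint8_t a = x >> 7;   // %a = lshr i8 %x, 7
          uint8_t b = a | 2;    // %b = or i8 %a, 2
          uint8_t c = b & 1;    // %c = and i8 %b, 1
          assert(c == a);       // the whole chain simplifies to %a
        }
        return 0;
      }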
* [Inliner] Do not mix callsite and callee hotness based updates. (Easwaran Raman, 2017-05-16, 1 file, -15/+27)
  Update threshold based on callee's hotness only when BFI is not available. Otherwise use only callsite's hotness. This makes it easier to reason about hotness related threshold updates.
  Differential revision: https://reviews.llvm.org/D33157
  llvm-svn: 303210
* Add hasProfileSummary and has{Sample|Instrumentation}Profile methods (Easwaran Raman, 2017-05-16, 1 file, -1/+1)
  ProfileSummaryInfo already checks whether the module has a sample profile in determining profile counts. This will also be useful in the inliner to clean up threshold updates.
  llvm-svn: 303204
* [SCEV] Fix sorting order for AddRecExprs (Max Kazantsev, 2017-05-16, 1 file, -16/+31)
  The existing sorting order defined in CompareSCEVComplexity sorts AddRecExprs by loop depth, but does not pay attention to dominance of loops. This can lead us to the following buggy situation:
    for (...) { // loop1
      op1 = {A,+,B}
    }
    for (...) { // loop2
      op2 = {A,+,B}
      S = add op1, op2
    }
  In this case there is no guarantee that in the operand list of S the op2 comes before op1 (loop depth is the same, so they will be sorted just lexicographically), so we can incorrectly treat S as a recurrence of loop1, which is wrong.
  This patch changes the sorting logic so that it places the dominated recs before the dominating recs. This ensures that when we pick the first recurrence in the operands order, it will be the bottom-most in terms of the dominator tree. The attached test set includes some tests that produce incorrect SCEV estimations and crashes with the old logic.
  Reviewers: sanjoy, reames, apilipenko, anna
  Reviewed By: sanjoy
  Subscribers: llvm-commits, mzolotukhin
  Differential Revision: https://reviews.llvm.org/D33121
  llvm-svn: 303148
* IR: Give function GlobalValue::getRealLinkageName() a less misleading name: dropLLVMManglingEscape(). (Peter Collingbourne, 2017-05-16, 2 files, -2/+2)
  This function gives the wrong answer on some non-ELF platforms in some cases. The function that does the right thing lives in Mangler.h. To try to discourage people from using this function, give it a different name.
  Differential Revision: https://reviews.llvm.org/D33162
  llvm-svn: 303134
* [SLP] Enable 64-bit wide vectorization on AArch64 (Adam Nemet, 2017-05-15, 1 file, -0/+4)
  ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
  *** Performance Analysis
  This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4.
  ** SPEC
  * CFP2006/482.sphinx3 -3.34%: A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482.
  * CFP2000/177.mesa -3.34%
  * CINT2000/256.bzip2 +6.97%: My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement.
  ** LLVM testsuite
  * SingleSource/Benchmarks/Misc/ReedSolomon -10.75%: There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise.
  * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%: The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop.
  * MultiSource/Applications/lambda-0.1.3/lambda -2.68%
  * MultiSource/Benchmarks/Bullet/bullet -2.18%: This is using 3D vectors.
  * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%: Noise, binary is unchanged.
  * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%: There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise.
  * MultiSource/Applications/aha/aha +1.63%
  * MultiSource/Applications/JM/lencod/lencod +1.41%
  * SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
  Differential Revision: https://reviews.llvm.org/D31965
  llvm-svn: 303116
* [InstSimplify] restrict icmp fold with 2 sdiv exact operands (PR32949) (Sanjay Patel, 2017-05-15, 1 file, -2/+11)
  These folds were introduced with https://reviews.llvm.org/rL127064 as part of solving: https://bugs.llvm.org/show_bug.cgi?id=9343
  As shown here: http://rise4fun.com/Alive/C8 ...however, the sdiv exact case needs a stronger predicate. I opted for duplicated code instead of adding another fallthrough because I think that's easier to read (and edit in case we need/want to restrict/loosen the predicates any more).
  This should fix:
  https://bugs.llvm.org/show_bug.cgi?id=32949
  https://bugs.llvm.org/show_bug.cgi?id=32948
  Differential Revision: https://reviews.llvm.org/D32954
  llvm-svn: 303104
* [SCEV] Use copy initialization of APInts instead of direct initialization. (Craig Topper, 2017-05-15, 1 file, -6/+6)
  This is based on post-commit feedback from r302769.
  llvm-svn: 303092
* [ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits. (Craig Topper, 2017-05-15, 4 files, -71/+54)
  This patch finishes off the conversion of ComputeSignBit to computeKnownBits.
  Differential Revision: https://reviews.llvm.org/D33166
  llvm-svn: 303035
* Move some code into ScalarEvolution.cpp; NFC (Sanjoy Das, 2017-05-15, 1 file, -0/+24)
  I need to add some asserts to these constructors that are easier to add once they're in the .cpp file.
  llvm-svn: 303032
* [InstCombine] Merge duplicate functionality between InstCombine and ValueTracking (Craig Topper, 2017-05-15, 1 file, -5/+66)
  Summary: Merge overflow computation for signed add, appearing both in InstCombine and ValueTracking. As part of the merge, clean up the interface for overflow checks in InstCombine.
  Patch by Yoav Ben-Shalom.
  Reviewers: craig.topper, majnemer
  Reviewed By: craig.topper
  Subscribers: takuto.ikuta, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32946
  llvm-svn: 303029
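  A minimal standalone C++ sketch of the kind of signed-add overflow check being merged here; it uses plain integer arithmetic rather than the ValueTracking known-bits machinery, so treat it as an illustration only:
      #include <cassert>
      #include <cstdint>
      #include <limits>

      // Returns true if lhs + rhs would overflow a signed 32-bit integer.
      // This is the scalar analogue of the symbolic "overflow computation
      // for signed add" done in InstCombine/ValueTracking.
      bool signedAddOverflows(int32_t lhs, int32_t rhs) {
        if (rhs > 0)
          return lhs > std::numeric_limits<int32_t>::max() - rhs;
        if (rhs < 0)
          return lhs < std::numeric_limits<int32_t>::min() - rhs;
        return false;
      }

      int main() {
        assert(signedAddOverflows(std::numeric_limits<int32_t>::max(), 1));
        assert(!signedAddOverflows(-1, 1));
        return 0;
      }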
* [InstSimplify] Add patterns for folding (A & B) | (~A ^ B) -> (~A ^ B) and its commuted variants. (Craig Topper, 2017-05-14, 1 file, -0/+18)
  We already had (A & ~B) | (A ^ B), but we missed the cases where the not was part of the xor.
  llvm-svn: 303004
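  A short C++ check of the identity (not the commit's code): a bit of A & B is set only where A == B == 1, and ~A ^ B is already 1 wherever A == B, so the or contributes nothing. Verified exhaustively over all 8-bit pairs:
      #include <cassert>
      #include <cstdint>

      // Check (A & B) | (~A ^ B) == (~A ^ B) for every pair of 8-bit values.
      int main() {
        for (unsigned a = 0; a < 256; ++a)
          for (unsigned b = 0; b < 256; ++b) {
            uint8_t A = a, B = b;
            uint8_t rhs = static_cast<uint8_t>(~A ^ B);
            assert(static_cast<uint8_t>((A & B) | rhs) == rhs);
          }
        return 0;
      }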
* [BasicAA] Alphabetize includes. NFC (Craig Topper, 2017-05-14, 1 file, -1/+1)
  llvm-svn: 303002
* [ValueTracking] Remove const_casts on several calls to computeKnownBits and ComputeSignBit. NFC (Craig Topper, 2017-05-13, 2 files, -6/+3)
  llvm-svn: 302991
* [TLI] Add mapping for various '__<func>_finite' forms of the math routines to SVML routines (Andrew Kaylor, 2017-05-12, 1 file, -0/+24)
  Patch by Chris Chrulski
  Differential Revision: https://reviews.llvm.org/D31789
  llvm-svn: 302957
* [ConstantFolding] Add folding for various math '__<func>_finite' routines generated from -ffast-math (Andrew Kaylor, 2017-05-12, 1 file, -11/+69)
  Patch by Chris Chrulski
  Differential Revision: https://reviews.llvm.org/D31788
  llvm-svn: 302956
* [TLI] Add declarations for various math header file routines from math-finite.h that create '__<func>_finite' as functions (Andrew Kaylor, 2017-05-12, 1 file, -0/+86)
  Patch by Chris Chrulski
  Differential Revision: https://reviews.llvm.org/D31787
  llvm-svn: 302955
* [KnownBits] Add bit counting methods to KnownBits struct and use them where possible (Craig Topper, 2017-05-12, 4 files, -43/+35)
  This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
  Differential Revision: https://reviews.llvm.org/D32931
  llvm-svn: 302925
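  A self-contained C++20 sketch of the min/max counting idea on a simplified 64-bit KnownBits-like struct (Zero/One masks); the field and method names are illustrative and not necessarily the ones added by the patch:
      #include <bit>       // C++20: std::popcount, std::countl_zero
      #include <cassert>
      #include <cstdint>

      // Simplified stand-in for LLVM's KnownBits: Zero/One record bits known to be 0/1.
      struct KnownBits64 {
        uint64_t Zero = 0; // bits known to be zero
        uint64_t One = 0;  // bits known to be one

        // Minimum popcount: count only the bits known to be one.
        unsigned countMinPopulation() const { return std::popcount(One); }
        // Maximum popcount: every bit not known to be zero could be one.
        unsigned countMaxPopulation() const { return std::popcount(~Zero); }
        // Minimum leading zeros: leading bits already known to be zero.
        unsigned countMinLeadingZeros() const { return std::countl_zero(~Zero); }
        // Maximum leading zeros: bounded by the highest bit known to be one.
        unsigned countMaxLeadingZeros() const { return std::countl_zero(One); }
      };

      int main() {
        KnownBits64 K;
        K.Zero = 0xFFFFFFFF00000000ULL; // top 32 bits known zero
        K.One  = 0x0000000000000001ULL; // lowest bit known one
        assert(K.countMinPopulation() == 1);
        assert(K.countMaxPopulation() == 32);
        assert(K.countMinLeadingZeros() == 32);
        assert(K.countMaxLeadingZeros() == 63);
        return 0;
      }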
* [BPI] Ignore remainder while distributing the remaining probability from unreachable (Serguei Katkov, 2017-05-12, 1 file, -8/+3)
  This is a follow up patch for https://reviews.llvm.org/rL300440 to address a comment. To make the implementation consistent with other cases we just ignore the remainder after distribution of the remaining probability between reachable edges.
  If we reduced the probability of some edges coming to unreachable blocks, we should distribute the remaining part across other edges coming to reachable blocks to satisfy the condition that the sum of all probabilities should be equal to one. If this remaining part is not divisible by the number of "reachable" edges then we get this remainder. This remainder probability should be pretty small. Other cases just ignore it if the sum of probabilities is not equal to one, so we do the same.
  Reviewers: chandlerc, sanjoy, vsk, junbuml, reames
  Reviewed By: reames
  Subscribers: reames, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32124
  llvm-svn: 302883
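  A hedged C++ sketch of the arithmetic being described: probabilities are integer numerators over a fixed denominator, leftover weight from unreachable edges is split evenly over the reachable edges, and the division remainder is simply dropped. The names and the denominator below are illustrative only:
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Probabilities are numerators over a fixed denominator.
      constexpr uint32_t Denominator = 1u << 20;

      // Distribute 'Leftover' (weight taken away from unreachable edges) evenly
      // over the reachable edges; the remainder of the division is ignored, so
      // the numerators may sum to slightly less than Denominator.
      void distributeLeftover(std::vector<uint32_t> &ReachableProb, uint32_t Leftover) {
        if (ReachableProb.empty())
          return;
        uint32_t Share = Leftover / static_cast<uint32_t>(ReachableProb.size());
        for (uint32_t &P : ReachableProb)
          P += Share; // remainder Leftover % size() is dropped
      }

      int main() {
        std::vector<uint32_t> Prob = {Denominator / 2, Denominator / 4};
        distributeLeftover(Prob, Denominator / 4 + 1); // +1 creates a remainder
        assert(Prob[0] + Prob[1] <= Denominator);
        return 0;
      }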
* CallGraph: Remove almost-unused field 'Root'. (Peter Collingbourne, 2017-05-11, 1 file, -29/+5)
  llvm-svn: 302852
* Restrict call metadata based hotness detection to Sample PGO mode (Teresa Johnson, 2017-05-11, 1 file, -5/+8)
  Summary: Don't use the metadata on call instructions for determining hotness unless we are in sample PGO mode, where it is needed because profile counts are not accurate. In instrumentation mode this is not necessary and does more harm than good when calls have VP metadata that hasn't been properly scaled after transformations or dropped after constant prop based devirtualization (both should be fixed, but we don't need to do this in the first place for instrumentation PGO).
  This required adjusting a number of tests to distinguish between sample and instrumentation PGO handling, and to add in profile summary metadata so that getProfileCount can get the summary.
  Reviewers: davidxl, danielcdh
  Subscribers: aemerson, rengolin, mehdi_amini, Prazek, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32877
  llvm-svn: 302844
* Decrease inlinecold-threshold to 45 (Easwaran Raman, 2017-05-11, 1 file, -1/+1)
  I ran the test-suite (including SPEC 2006) in PGO mode comparing cold thresholds of 225 and 45. Here are some stats on the text size:
  Out of 904 tests that ran, 197 see a change in text size. The average text size reduction (of all the 904 binaries) is 1.07%. Of the 197 binaries, 19 see a text size increase, as high as 18%, but most of them are small single source benchmarks. There are 3 multisource benchmarks with a >0.5% size increase (0.7, 1.3 and 2.1 are their % increases). On the other side of the spectrum, 31 benchmarks see >10% size reduction and 6 of them are MultiSource.
  I haven't run the test-suite with other values of inlinecold-threshold. Since we have a cold callsite threshold of 45, I picked this value.
  Differential revision: https://reviews.llvm.org/D33106
  llvm-svn: 302829
* [SCEV] Reduce possible APInt allocations a bit. (Craig Topper, 2017-05-11, 1 file, -7/+11)
  llvm-svn: 302769
* [SCEV] Remove unneeded 'using namespace APIntOps'. (Craig Topper, 2017-05-11, 1 file, -37/+34)
  llvm-svn: 302768
* Ensure non-null ProfileSummaryInfo passed to ModuleSummaryIndex builder (Teresa Johnson, 2017-05-10, 1 file, -0/+1)
  This fixes a ubsan bot failure after r302597, which made getProfileCount non-static, but ended up invoking it on a null ProfileSummaryInfo object in some cases from buildModuleSummaryIndex. Most testing passed because the non-static getProfileCount currently doesn't access any member variables, but I found this when testing a follow-on patch (D32877) that adds a member variable access.
  llvm-svn: 302705
* Add a late IR expansion pass for the experimental reduction intrinsics. (Amara Emerson, 2017-05-10, 1 file, -0/+3)
  This pass uses a new target hook to decide whether or not to expand a particular intrinsic to the shufflevector sequence.
  Differential Revision: https://reviews.llvm.org/D32245
  llvm-svn: 302631
* [ProfileSummary] Make getProfileCount a non-static member function. (Easwaran Raman, 2017-05-09, 1 file, -1/+1)
  This change is required because the notion of count is different for sample profiling and getProfileCount will need to determine the underlying profile type.
  Differential revision: https://reviews.llvm.org/D33012
  llvm-svn: 302597
* Introduce experimental generic intrinsics for horizontal vector reductions. (Amara Emerson, 2017-05-09, 2 files, -0/+7)
  - This change allows targets to opt-in to using them instead of the log2 shufflevector algorithm.
  - The SLP and Loop vectorizers have the common code to do shuffle reductions factored out into LoopUtils, and now have a unified interface for generating reductions regardless of the preference of the target. LoopUtils now uses TTI to determine what kind of reductions the target wants to handle.
  - For CodeGen, basic legalization support is added.
  Differential Revision: https://reviews.llvm.org/D30086
  llvm-svn: 302514
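  For reference, a standalone C++ sketch of the "log2 shufflevector" reduction idea mentioned above, written over a plain array: each step adds the upper half onto the lower half until one element remains. This is only an illustration of the algorithm, not the vectorizers' code:
      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Log2-style horizontal add: repeatedly fold the upper half onto the
      // lower half (what the shufflevector-based reduction does lane-wise),
      // halving the active width each step. Size must be a power of two.
      int horizontalAdd(std::vector<int> V) {
        for (std::size_t Width = V.size() / 2; Width >= 1; Width /= 2)
          for (std::size_t I = 0; I < Width; ++I)
            V[I] += V[I + Width];
        return V[0];
      }

      int main() {
        assert(horizontalAdd({1, 2, 3, 4, 5, 6, 7, 8}) == 36);
        return 0;
      }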
* [SCEV] Don't use std::move on both inputs to APInt::operator+ or operator-. It might be confusing to the reader. NFC (Craig Topper, 2017-05-08, 1 file, -4/+4)
  llvm-svn: 302448
* [ValueTracking] Use KnownOnes to provide a better bound on known zeros for ctlz/cttz intrinsics (Craig Topper, 2017-05-08, 1 file, -3/+16)
  This patch uses KnownOnes of the input of ctlz/cttz to bound the value that can be returned from these intrinsics. This makes these intrinsics more similar to the handling for ctpop which already uses known bits to produce a similar bound.
  Differential Revision: https://reviews.llvm.org/D32521
  llvm-svn: 302444
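  A small standalone C++20 illustration (assumed semantics, not the patch's code) of how a known-one bit bounds cttz/ctlz: if some bit is known to be one in every possible input, the count of trailing (or leading) zeros cannot exceed that bit's position (or its distance from the top):
      #include <bit>      // C++20: std::countr_zero, std::countl_zero
      #include <cassert>
      #include <cstdint>

      // If KnownOne has any bit set, every possible input value has that bit set:
      //   cttz(input) <= position of the lowest known-one bit
      //   ctlz(input) <= leading zeros of the highest known-one bit
      unsigned maxTrailingZeros(uint64_t KnownOne) {
        return KnownOne ? std::countr_zero(KnownOne) : 64;
      }
      unsigned maxLeadingZeros(uint64_t KnownOne) {
        return KnownOne ? std::countl_zero(KnownOne) : 64;
      }

      int main() {
        uint64_t KnownOne = 0x10;                // bit 4 is known to be one
        assert(maxTrailingZeros(KnownOne) == 4); // cttz can be at most 4
        assert(maxLeadingZeros(KnownOne) == 59); // ctlz can be at most 59
        return 0;
      }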
* [InstSimplify] fix typo; NFC (Sanjay Patel, 2017-05-08, 1 file, -1/+1)
  llvm-svn: 302439
* [ValueTracking] Introduce a version of computeKnownBits that returns a KnownBits struct. Begin using it to replace internal usages of ComputeSignBit (Craig Topper, 2017-05-08, 1 file, -67/+52)
  This introduces a new interface for computeKnownBits that returns the KnownBits object instead of requiring it to be pre-constructed and passed in by reference. This is a much more convenient interface as it doesn't require the caller to figure out the BitWidth to pre-construct the object. It's so convenient that I believe we can use this interface to remove the special ComputeSignBit flavor of computeKnownBits.
  As a step towards that idea, this patch replaces all of the internal usages of ComputeSignBit with this new interface. As you can see from the patch there were a couple places where we called ComputeSignBit which really called computeKnownBits, and then called computeKnownBits again directly. I've reduced those places to only making one call to computeKnownBits. I bet there are probably external users that do it too. A future patch will update the external users and remove the ComputeSignBit interface. I'm also working on moving more locations to the KnownBits-returning interface for computeKnownBits.
  Differential Revision: https://reviews.llvm.org/D32848
  llvm-svn: 302437
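  A generic C++ sketch of the out-parameter vs. return-value difference being described; the types and signatures are simplified placeholders, not the actual ValueTracking declarations:
      #include <cassert>
      #include <cstdint>

      // Simplified stand-in for LLVM's KnownBits: just a width plus known-one bits.
      struct KnownBits {
        unsigned BitWidth;
        uint64_t One = 0;
        explicit KnownBits(unsigned BW) : BitWidth(BW) {}
      };

      // Old style: the caller pre-constructs the object, so it must know the width.
      void computeKnownBitsInto(uint64_t V, KnownBits &Known) {
        Known.One = V & ((Known.BitWidth == 64) ? ~0ULL : (1ULL << Known.BitWidth) - 1);
      }

      // New style: the analysis knows the width and hands back a finished object.
      KnownBits computeKnownBits(uint64_t V, unsigned BW) {
        KnownBits Known(BW);
        computeKnownBitsInto(V, Known);
        return Known;
      }

      int main() {
        KnownBits Old(8);                           // caller had to pick "8" itself
        computeKnownBitsInto(0x1FF, Old);
        KnownBits New = computeKnownBits(0x1FF, 8); // width handled for the caller
        assert(Old.One == New.One && New.One == 0xFF);
        return 0;
      }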
* [InstCombine/InstSimplify] add comments about code duplication; NFC (Sanjay Patel, 2017-05-08, 1 file, -0/+3)
  llvm-svn: 302436
* InstructionSimplify: Refactor foldIdentityShuffles. NFC. (Zvi Rackover, 2017-05-08, 1 file, -13/+7)
  Summary: Minor refactoring of foldIdentityShuffles() which allows the removal of a ConstantDataVector::get() in SimplifyShuffleVectorInstruction.
  Reviewers: spatel
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D32955
  Conflicts: lib/Analysis/InstructionSimplify.cpp
  llvm-svn: 302433
* IR: Add a shufflevector mask commutation helper function. NFC. (Zvi Rackover, 2017-05-08, 1 file, -7/+1)
  Summary: Following up on Sanjay's suggestion in D32955, move this functionality into ShuffleVectorInstruction.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D32956
  llvm-svn: 302420
* [SCEV] Use APInt::operator*=(uint64_t) to avoid a temporary APInt for a constant. (Craig Topper, 2017-05-08, 1 file, -2/+1)
  llvm-svn: 302404
* [SCEV] Have getRangeForAffineARHelper take StartRange by const reference to avoid a copy in many of the cases. (Craig Topper, 2017-05-08, 1 file, -1/+1)
  llvm-svn: 302398
* InstructionSimplify: Relanding r301766 (Zvi Rackover, 2017-05-07, 1 file, -20/+31)
  Summary: Re-applying r301766 with a fix to a typo and a regression test. The log message for r301766 was:
  ==================================================================================
  InstructionSimplify: Canonicalize shuffle operands. NFC-ish.
  Summary: Apply canonicalization rules:
  1. Input vectors with no elements selected from can be replaced with undef.
  2. If only one input vector is constant it shall be the second one.
  This allows constant-folding to cover more ad-hoc simplifications that were in place and avoid duplication for RHS and LHS checks. There are more rules we may want to add in the future when we see a justification. e.g. mask elements that select undef elements can be replaced with undef.
  ==================================================================================
  Reviewers: spatel, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D32863
  llvm-svn: 302373
* [SCEV] Use move semantics in ScalarEvolution::setRange (Craig Topper, 2017-05-07, 1 file, -3/+3)
  Summary: This makes setRange take ConstantRange by rvalue reference since most callers were passing an unnamed temporary ConstantRange. We can then move that ConstantRange into the DenseMap caches. For the callers that weren't passing a temporary, I've added std::move to the local variable being passed.
  Reviewers: sanjoy, mzolotukhin, efriedma
  Reviewed By: sanjoy
  Subscribers: takuto.ikuta, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32943
  llvm-svn: 302371
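  A generic C++ sketch of the pattern this change uses (rvalue-reference parameter, then moving into a map-backed cache); it substitutes std::unordered_map and a plain struct for DenseMap/ConstantRange so it stays self-contained:
      #include <cstdint>
      #include <unordered_map>
      #include <utility>
      #include <vector>

      // Stand-in for ConstantRange: copying it may allocate, so moves are cheaper.
      struct Range {
        std::vector<uint64_t> Words; // heap-backed payload
      };

      class RangeCache {
        std::unordered_map<unsigned, Range> Cache;

      public:
        // Taking Range by rvalue reference lets callers hand over temporaries
        // (or std::move'd locals) without an extra copy.
        void setRange(unsigned ID, Range &&R) {
          Cache[ID] = std::move(R); // move straight into the cache slot
        }
      };

      int main() {
        RangeCache C;
        C.setRange(1, Range{{0, 255}});  // unnamed temporary binds directly
        Range Local{{10, 20}};
        C.setRange(2, std::move(Local)); // named local must be std::move'd
        return 0;
      }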
* [InstSimplify] use ConstantRange to simplify or-of-icmps (Sanjay Patel, 2017-05-07, 1 file, -19/+47)
  We can simplify (or (icmp X, C1), (icmp X, C2)) to 'true' or one of the icmps in many cases. I had to check some of these with Alive to prove to myself it's right, but everything seems to check out. Eg, the deleted code in instcombine was completely ignoring predicates with mismatched signedness.
  This is a follow-up to:
  https://reviews.llvm.org/rL301260
  https://reviews.llvm.org/D32143
  llvm-svn: 302370
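  A hedged C++ sketch of the range reasoning: model each unsigned icmp-with-constant as an interval of allowed values, and the 'or' simplifies when one interval contains the other or when together they cover every value. The interval type below is illustrative and far simpler than LLVM's ConstantRange (no wrapping, unsigned predicates only):
      #include <cassert>
      #include <cstdint>

      // Inclusive interval of 8-bit values satisfying one unsigned icmp.
      struct Interval {
        uint8_t Lo, Hi;
        bool contains(const Interval &O) const { return Lo <= O.Lo && O.Hi <= Hi; }
      };

      // (x u< 4) -> [0, 3];  (x u> 1) -> [2, 255];  (x u< 3) -> [0, 2]
      int main() {
        Interval Lt4{0, 3}, Gt1{2, 255}, Lt3{0, 2};

        // Union covers [0,255]: (x u< 4) | (x u> 1) simplifies to 'true'.
        assert(Lt4.Lo == 0 && Gt1.Hi == 255 && Gt1.Lo <= Lt4.Hi + 1);

        // One range swallows the other: (x u< 4) | (x u< 3) simplifies to (x u< 4).
        assert(Lt4.contains(Lt3));
        return 0;
      }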
* Remove unnecessary const_cast (Sanjoy Das, 2017-05-07, 1 file, -7/+4)
  llvm-svn: 302368
* Use array_pod_sort instead of std::sort (Sanjoy Das, 2017-05-07, 1 file, -1/+1)
  llvm-svn: 302367
* [SCEV] Remove extra APInt copies from getRangeForAffineARHelper. (Craig Topper, 2017-05-06, 1 file, -13/+10)
  This changes one parameter to be a const APInt& since we only read from it. Use std::move on local APInts once they are no longer needed so we can reuse their allocations. Lastly, use operator+=(uint64_t) instead of adding 1 to an APInt twice creating a new APInt each time.
  llvm-svn: 302335
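  A short C++ sketch of the three copy-avoidance patterns named above, written against llvm::APInt (it assumes the in-tree APInt constructor and operator+=(uint64_t) overload; treat it as illustrative, not code from the patch):
      #include "llvm/ADT/APInt.h"
      #include <utility>
      using llvm::APInt;

      // Read-only wide value: take it by const reference instead of by value,
      // so no copy (and no heap allocation for wide widths) is made.
      APInt addOne(const APInt &Start) {
        APInt Result = Start; // one deliberate copy of the value we will modify
        Result += 1;          // operator+=(uint64_t): no temporary APInt for "1"
        return Result;        // returned by move/NRVO, not copied
      }

      // A caller that no longer needs its local can std::move it in,
      // letting us reuse the existing allocation.
      APInt addTwo(APInt &&Scratch) {
        Scratch += 2;
        return std::move(Scratch);
      }

      int main() {
        APInt Start(8, 5);
        APInt A = addOne(Start);            // Start is not copied at the call site
        APInt B = addTwo(std::move(Start)); // Start's storage is reused
        return (A.getZExtValue() == 6 && B.getZExtValue() == 7) ? 0 : 1;
      }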
* [SCEV] Use std::move to avoid some APInt copies. (Craig Topper, 2017-05-06, 1 file, -5/+5)
  llvm-svn: 302334
* [SCEV] Use APInt's uint64_t operations instead of creating a temporary APInt to hold 1. (Craig Topper, 2017-05-06, 1 file, -3/+2)
  llvm-svn: 302333
* [SCEV] Avoid a couple APInt copies by capturing by reference since the method returns a reference. (Craig Topper, 2017-05-06, 1 file, -2/+2)
  llvm-svn: 302332
* [LazyValueInfo] Avoid unnecessary copies of ConstantRanges (Craig Topper, 2017-05-06, 1 file, -7/+7)
  Summary: ConstantRange contains two APInts which can allocate memory if their width is larger than 64 bits. So we shouldn't copy it when we can avoid it.
  This changes LVILatticeVal::getConstantRange() to return its internal ConstantRange by reference. This allows many places that just need a ConstantRange reference to avoid making a copy. Several places now capture the return value of getConstantRange() by reference so they can call methods on it that don't need a new object. Lastly it adds std::move in one place to move a local ConstantRange into an LVILatticeVal.
  Reviewers: reames, dberlin, sanjoy, anna
  Reviewed By: reames
  Subscribers: grandinj, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32884
  llvm-svn: 302331