path: root/llvm/lib/Analysis
Commit message | Author | Age | Files | Lines
...
* [SCEV] Reduce the scope of a struct; NFC | Sanjoy Das | 2016-09-27 | 1 | -22/+20
    llvm-svn: 282513
* [SCEV] Remove custom RAII wrapper; NFC | Sanjoy Das | 2016-09-27 | 1 | -22/+5
    Instead use the pre-existing `scope_exit` class.
    llvm-svn: 282512
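    A minimal sketch of that pattern, assuming a hand-rolled guard around an
    insert/erase pair; the set and function names below are illustrative, not
    SCEV's actual members:

        #include "llvm/ADT/ScopeExit.h"
        #include "llvm/ADT/SmallPtrSet.h"

        // Instead of writing a one-off RAII struct, register the cleanup with
        // llvm::make_scope_exit; it runs on every exit path, including early
        // returns.
        static bool processOnce(llvm::SmallPtrSetImpl<const void *> &Pending,
                                const void *Key) {
          if (!Pending.insert(Key).second)
            return false; // already in flight; avoid re-entering

          auto Cleanup = llvm::make_scope_exit([&] { Pending.erase(Key); });

          // ... do the actual work while Key is marked as pending ...
          return true;
        }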
* [SCEV] Make PendingLoopPredicates more frugal; NFCI | Sanjoy Das | 2016-09-27 | 1 | -3/+4
    I don't expect `PendingLoopPredicates` to have very many elements (e.g.
    when -O3'ing the sqlite3 amalgamation, `PendingLoopPredicates` has at most
    3 elements). So now we use a `SmallPtrSet` for it instead of the more
    heavyweight `DenseSet`.
    llvm-svn: 282511
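    A hedged sketch of the container swap described above; the element type
    and inline capacity are placeholders rather than SCEV's exact choices:

        #include "llvm/ADT/DenseSet.h"
        #include "llvm/ADT/SmallPtrSet.h"

        struct Example {
          // Before: a DenseSet heap-allocates a bucket array even when only a
          // couple of elements are ever inserted.
          // llvm::DenseSet<const void *> Pending;

          // After: a SmallPtrSet keeps up to 4 pointers in inline storage and
          // only falls back to a hash table beyond that.
          llvm::SmallPtrSet<const void *, 4> Pending;
        };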
* Revert "Output optimization remarks in YAML"Adam Nemet2016-09-271-80/+0
| | | | | | | | This reverts commit r282499. The GCC bots are failing llvm-svn: 282503
* Output optimization remarks in YAMLAdam Nemet2016-09-271-0/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows various presentation of this data using an external tool. This was first recommended here[1]. As an example, consider this module: 1 int foo(); 2 int bar(); 3 4 int baz() { 5 return foo() + bar(); 6 } The inliner generates these missed-optimization remarks today (the hotness information is pulled from PGO): remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30) remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30) Now with -pass-remarks-output=<yaml-file>, we generate this YAML file: --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 } Function: baz Hotness: 30 Args: - Callee: foo - String: will not be inlined into - Caller: baz ... --- !Missed Pass: inline Name: NotInlined DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 } Function: baz Hotness: 30 Args: - Callee: bar - String: will not be inlined into - Caller: baz ... This is a summary of the high-level decisions: * There is a new streaming interface to emit optimization remarks. E.g. for the inliner remark above: ORE.emit(DiagnosticInfoOptimizationRemarkMissed( DEBUG_TYPE, "NotInlined", &I) << NV("Callee", Callee) << " will not be inlined into " << NV("Caller", CS.getCaller()) << setIsVerbose()); NV stands for named value and allows the YAML client to process a remark using its name (NotInlined) and the named arguments (Callee and Caller) without parsing the text of the message. Subsequent patches will update ORE users to use the new streaming API. * I am using YAML I/O for writing the YAML file. YAML I/O requires you to specify reading and writing at once but reading is highly non-trivial for some of the more complex LLVM types. Since it's not clear that we (ever) want to use LLVM to parse this YAML file, the code supports and asserts that we're writing only. On the other hand, I did experiment that the class hierarchy starting at DiagnosticInfoOptimizationBase can be mapped back from YAML generated here (see D24479). * The YAML stream is stored in the LLVM context. * In the example, we can probably further specify the IR value used, i.e. print "Function" rather than "Value". * As before hotness is computed in the analysis pass instead of DiganosticInfo. This avoids the layering problem since BFI is in Analysis while DiagnosticInfo is in IR. [1] https://reviews.llvm.org/D19678#419445 Differential Revision: https://reviews.llvm.org/D24587 llvm-svn: 282499
* [thinlto] Basic thinlto fdo heuristic | Piotr Padlewski | 2016-09-26 | 1 | -13/+41
    Summary: This patch improves the thinlto importer by importing functions
    up to 3x larger than the usual limit when they are called from a hot
    block. I compared performance with trunk on SPEC, and the gains were
    about 2% on povray and 3.33% on milc. These results seem to be consistent
    and match the results Teresa got with her simple heuristic. Some
    benchmarks got slower, but I think they are just noisy (mcf, xalancbmk,
    omnetpp); I am running the benchmarks again with more iterations to
    confirm. The geomean of all benchmarks, including the noisy ones, was
    about +0.02%.

    I see a much better improvement on the google branch with Easwaran's
    patch for PGO callsite inlining (the inliner actually inlines those big
    functions): overall a +0.5% improvement, and +8.65% on povray. So I guess
    we will see a much bigger change when Easwaran's patch lands (it depends
    on the new pass manager), but it is still worth putting this in trunk
    before it.

    Implementation detail changes:
    - Removed CallsiteCount.
    - ProfileCount got replaced by Hotness.
    - hot-import-multiplier is set to 3.0 for now; I didn't have time to tune
      it, but I see that we get most of the interesting functions with 3, so
      there is not much performance difference with higher values, and binary
      size doesn't grow as much as with 10.0.

    Reviewers: eraman, mehdi_amini, tejohnson
    Subscribers: mehdi_amini, llvm-commits
    Differential Revision: https://reviews.llvm.org/D24638
    llvm-svn: 282437
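    A hypothetical sketch of how such a hotness-scaled import threshold can
    be computed; the helper and parameter names are illustrative, while the
    3.0 default and the Hotness notion come from the patch description:

        // Call sites in hot blocks are allowed to import callees up to 3x
        // larger than the normal instruction limit.
        enum class Hotness { Unknown, Cold, Hot };

        static unsigned importThreshold(Hotness H, unsigned ImportInstrLimit,
                                        double HotMultiplier = 3.0) {
          if (H == Hotness::Hot)
            return static_cast<unsigned>(ImportInstrLimit * HotMultiplier);
          return ImportInstrLimit;
        }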
* [SCEV] Fix the order of members in the initializer list. | Chandler Carruth | 2016-09-26 | 1 | -1/+1
    Noticed due to the warning on this line. Sanjoy is on a less-than-awesome
    internet connection, so committing on his behalf.
    llvm-svn: 282380
* [SCEV] Assign LoopPropertiesCache in the move constructor | Sanjoy Das | 2016-09-26 | 1 | -0/+1
    In a previous change I collapsed two different caches into one. When doing
    that I noticed that ScalarEvolution's move constructor was not moving
    those caches. To keep the previous change simple, I've moved that bugfix
    into this separate change.
    llvm-svn: 282376
* [SCEV] Combine two predicates into one; NFC | Sanjoy Das | 2016-09-26 | 1 | -31/+24
    Both `loopHasNoSideEffects` and `loopHasNoAbnormalExits` involve walking
    the loop and maintaining similar sorts of caches. This commit changes SCEV
    to compute both the predicates via a single walk, and maintain a single
    cache instead of two.
    llvm-svn: 282375
* [SCEV] Make it obvious BackedgeTakenInfo's constructor steals storage | Sanjoy Das | 2016-09-26 | 1 | -2/+4
    Specifically, it moves SCEVUnionPredicates from its input into its own
    storage. Make this obvious at the type level.
    llvm-svn: 282374
* [SCEV] Further isolate incidental data structure; NFC | Sanjoy Das | 2016-09-26 | 1 | -4/+7
    llvm-svn: 282373
* [SCEV] Simplify BackedgeTakenInfo::getMax; NFC | Sanjoy Das | 2016-09-26 | 1 | -7/+7
    llvm-svn: 282372
* [SCEV] Reserve space in SmallVector; NFC | Sanjoy Das | 2016-09-25 | 1 | -0/+1
    llvm-svn: 282368
* [SCEV] Have ExitNotTakenInfo keep a pointer to its predicate; NFC | Sanjoy Das | 2016-09-25 | 1 | -11/+15
    SCEVUnionPredicate is a "heavyweight" structure, so it is beneficial to
    store the (optional) data out of line.
    llvm-svn: 282366
* [SCEV] Simplify tracking ExitNotTakenInfo instances; NFC | Sanjoy Das | 2016-09-25 | 1 | -72/+24
    This change simplifies a data structure optimization in the
    `BackedgeTakenInfo` class for loops with exactly one computable exit. I've
    sanity checked that this does not regress compile time performance, using
    sqlite3's amalgamated build.
    llvm-svn: 282365
* [SCEV] Rename a couple of fields; NFC | Sanjoy Das | 2016-09-25 | 1 | -48/+55
    llvm-svn: 282364
* [SCEV] Remove incidental data structure; NFC | Sanjoy Das | 2016-09-25 | 1 | -15/+19
    llvm-svn: 282363
* Analysis: Return early for UndefValue in computeKnownBits | Duncan P. N. Exon Smith | 2016-09-24 | 1 | -0/+8
    There is no benefit in looking through assumptions on UndefValue to guess
    known bits. Return early to avoid walking their use-lists, and assert that
    all instances of ConstantData are handled here for similar reasons
    (UndefValue was the only integer/pointer holdout).
    llvm-svn: 282337
* Analysis: Return early in isKnownNonNullAt for ConstantData | Duncan P. N. Exon Smith | 2016-09-24 | 1 | -0/+4
    Check and return early for ConstantPointerNull and UndefValue specifically
    in isKnownNonNullAt, and assert that ConstantData never make it to
    isKnownNonNullFromDominatingCondition.

    This confirms that isKnownNonNullFromDominatingCondition never walks
    through the use-list of an instance of ConstantData. Given that such
    use-lists cross module boundaries, it never really made sense to do so,
    and was potentially very expensive.
    llvm-svn: 282333
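    Both of these changes use the same early-bail pattern. A minimal sketch,
    not the actual ValueTracking.cpp code:

        #include "llvm/IR/Constants.h"
        #include "llvm/IR/Value.h"
        #include <cassert>
        using namespace llvm;

        // ConstantPointerNull and UndefValue are both ConstantData, whose
        // use-lists span module boundaries. Answering them up front keeps the
        // expensive dominating-condition walk from ever touching them.
        static bool isKnownNonNullSketch(const Value *V) {
          if (isa<ConstantPointerNull>(V) || isa<UndefValue>(V))
            return false;
          assert(!isa<ConstantData>(V) &&
                 "remaining ConstantData should be handled by the caller");
          // ... fall through to the dominating-condition analysis ...
          return false;
        }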
* [TLI] isdigit / isascii / toascii param type should match return type (PR30484) | Sanjay Patel | 2016-09-23 | 1 | -1/+4
    We crash in LibCallSimplifier if we don't check the validity of the
    function signature properly.
    llvm-svn: 282278
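    A hedged sketch of the kind of prototype check this implies, not the
    literal diff; for these functions the single parameter is expected to be
    an i32 that matches the i32 return type:

        #include "llvm/IR/DerivedTypes.h"
        #include "llvm/IR/Function.h"
        using namespace llvm;

        // Reject declarations whose signature doesn't look like int(int), so
        // the simplifier never treats them as the real libc functions.
        static bool hasIsDigitLikePrototype(const Function &F) {
          FunctionType *FTy = F.getFunctionType();
          return FTy->getNumParams() == 1 &&
                 FTy->getReturnType()->isIntegerTy(32) &&
                 FTy->getParamType(0) == FTy->getReturnType();
        }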
* Enhance calcColdCallHeuristics for InvokeInst | Jun Bum Lim | 2016-09-23 | 1 | -0/+10
    Summary: When identifying cold blocks, consider only the edge to the
    normal destination if the terminator is an InvokeInst, and let
    calcInvokeHeuristics() decide edge weights for the InvokeInst.
    Reviewers: mcrosier, hfinkel, davidxl
    Subscribers: mcrosier, llvm-commits
    Differential Revision: https://reviews.llvm.org/D24868
    llvm-svn: 282262
* [LV] When reporting about a specific instruction without debug location use loop's | Adam Nemet | 2016-09-21 | 1 | -1/+4
    This can occur for example if some optimization drops the debug location.
    llvm-svn: 282048
* [InferAttributes] Don't access parameters that don't exist. | Michael Kuperstein | 2016-09-20 | 1 | -2/+2
    Check for the correct number of parameters before querying their type.
    This fixes PR30455.
    llvm-svn: 282038
* move variables closer to their uses; add FIXMEs; NFC | Sanjay Patel | 2016-09-20 | 1 | -10/+10
    llvm-svn: 281972
* [Loop Vectorizer] Consecutive memory access - fixed and simplified | Elena Demikhovsky | 2016-09-18 | 1 | -4/+4
    Amended consecutive memory access detection in the Loop Vectorizer.
    Load/Store were not handled properly without a preceding GEP instruction.
    Differential Revision: https://reviews.llvm.org/D20789
    llvm-svn: 281853
* [InstCombine] allow vector types for constant folding / computeKnownBits (PR24942) | Sanjay Patel | 2016-09-16 | 1 | -3/+4
    computeKnownBits() already works for integer vectors, so allow vector
    types when calling that from InstCombine.

    I don't think the change to use m_APInt in computeKnownBits is strictly
    necessary because we do check for ConstantVector later, but it's more
    efficient to handle the splat case without needing to loop over vector
    elements.

    This should work with InstSimplify, but doesn't yet, so I made that a
    FIXME comment on the test for PR24942:
    https://llvm.org/bugs/show_bug.cgi?id=24942
    Differential Revision: https://reviews.llvm.org/D24677
    llvm-svn: 281777
* Reapplying r278731 after fixing the problem that caused it to be reverted. | David L Kreitzer | 2016-09-16 | 1 | -15/+100
    Enhance SCEV to compute the trip count for some loops with unknown stride.
    Patch by Pankaj Chawla.
    Differential Revision: https://reviews.llvm.org/D22377
    llvm-svn: 281732
* [LCG] Redesign the lazy post-order iteration mechanism for the LazyCallGraph | Chandler Carruth | 2016-09-16 | 1 | -103/+127
    The redesign supports repeated, stable iterations, even in the face of
    graph updates. This is particularly important to allow the CGSCC pass
    manager to walk the RefSCCs (and thus everything else) in a module more
    than once. Lots of unittests and other tests were hard or impossible to
    write because repeated CGSCC pass managers which didn't invalidate the
    LazyCallGraph would conclude the module was empty after the first one. =[
    Really, really bad.

    The interesting thing is that in many ways this simplifies the code. We
    can now re-use the same code for handling reference edge insertion
    updates of the RefSCC graph as we use for handling call edge insertion
    updates of the SCC graph. Outside of adapting to the shared logic for
    this (which isn't trivial, but is *much* simpler than the DFS it
    replaces!), the new code involves putting newly created RefSCCs, when
    deleting a reference edge, into the cached list in the correct way, and
    re-formulating the iterator to be stable and effective even in the face
    of these kinds of updates.

    I've updated the unittests for the LazyCallGraph to re-iterate the
    postorder sequence and verify that this all works. We even check for
    using alternating iterators to trigger the lazy formation of RefSCCs
    after mutation has occurred.

    It's worth noting that there are a reasonable number of likely
    simplifications we can make past this. It isn't clear that we need to
    keep the "LeafRefSCCs" around any more. But I've not removed that mostly
    because I want this to be a more isolated change.

    Differential Revision: https://reviews.llvm.org/D24219
    llvm-svn: 281716
* [PM] Port CFGViewer and CFGPrinter to the new Pass Manager | Sriraman Tallam | 2016-09-15 | 2 | -50/+69
    Differential Revision: https://reviews.llvm.org/D24592
    llvm-svn: 281640
* Add some shortcuts in LazyValueInfo to reduce compile time of Correlated Value Propagation | Wei Mi | 2016-09-15 | 1 | -0/+29
    The patch partially fixes PR10584. Correlated Value Propagation queries
    LVI to check non-null for the pointer params of each callsite. If we know
    the def of a param is an alloca instruction, we know it is non-null and
    can return early from LVI. Similarly, CVP queries LVI to check whether
    the pointer for each mem access is constant. If the def of the pointer is
    an alloca instruction, we know it is not a constant pointer. These
    shortcuts can reduce the cost of CVP significantly.
    Differential Revision: https://reviews.llvm.org/D18066
    llvm-svn: 281586
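    A minimal sketch of the alloca shortcut described above, not LVI's actual
    code; the helper name is made up for illustration:

        #include "llvm/IR/Instructions.h"
        #include "llvm/IR/Value.h"
        using namespace llvm;

        // The address of a stack allocation is trivially non-null, so a query
        // rooted at an alloca can be answered without scanning the CFG.
        static bool isPointerTriviallyNonNull(const Value *V) {
          return isa<AllocaInst>(V->stripPointerCasts());
        }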
* Create a getelementptr instead of sub expr for ValueOffsetPair if the value is a pointer | Wei Mi | 2016-09-14 | 1 | -3/+22
    This patch fixes PR30213. When expanding an expr based on
    ValueOffsetPair, if the value is of pointer type, we can only create a
    getelementptr instead of a sub expr.
    Differential Revision: https://reviews.llvm.org/D24088
    llvm-svn: 281439
* [ConstantFold] Improve the bitcast folding logic for constant vectors. | Andrea Di Biagio | 2016-09-13 | 1 | -2/+13
    The constant folder didn't know how to always fold bitcasts of constant
    integer vectors. In particular, it was unable to handle the case where a
    constant vector had some undef elements, and the resulting (i.e.
    bitcasted) vector type had more elements than the original vector type.

    Example:
        %cast = bitcast <2 x i64><i64 undef, i64 2> to <4 x i32>

    On a little endian target, %cast could have been folded to:
        <4 x i32><i32 undef, i32 undef, i32 2, i32 0>

    This patch improves the folding logic by teaching it how to correctly
    propagate undef elements in the folded vector.
    Differential Revision: https://reviews.llvm.org/D24301
    llvm-svn: 281343
* [LVI] Complete the abstraction of the cache layer [NFCI] | Philip Reames | 2016-09-12 | 1 | -72/+94
    Convert the previously introduced is-a relationship between the LVICache
    and LVIImpl classes into a has-a relationship and hide all the
    implementation details of the cache from the lazy query layer.

    The only slightly concerning change here is removing the addition of a
    queried block into the SeenBlock set in LVIImpl::getBlockValue. As far as
    I can tell, this was effectively dead code. I think it *used* to be the
    case that getCachedValueInfo wasn't const and might end up inserting
    elements in the cache during lookup. That's no longer true and hasn't
    been for a while. I did fix up the const usage to make that more obvious.
    llvm-svn: 281272
* [LVI] Sink a couple more cache manipulation routines into the cache itself [NFCI] | Philip Reames | 2016-09-12 | 1 | -36/+45
    The only interesting bit here is the refactor of the handle callback, and
    even that's pretty straightforward.
    llvm-svn: 281267
* [LVI] Abstract out the actual cache logic [NFCI] | Philip Reames | 2016-09-12 | 1 | -89/+97
    Separate the caching logic from the implementation of the lazy analysis.
    For the moment, the lazy analysis impl has an is-a relationship with the
    cache; this will change to a has-a relationship shortly. This was done as
    two steps merely to keep the changes simple and the diff understandable.
    llvm-svn: 281266
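    These three LVI changes share one refactoring shape: turning an is-a
    relationship between the implementation and its cache into a has-a
    relationship. A toy sketch with made-up names and a placeholder lattice
    value; the real classes live in LazyValueInfo.cpp:

        #include <map>
        #include <utility>

        class LVICacheSketch {
          std::map<std::pair<const void *, const void *>, int> Results;
        public:
          const int *lookup(const void *BB, const void *V) const {
            auto It = Results.find({BB, V});
            return It == Results.end() ? nullptr : &It->second;
          }
          void insert(const void *BB, const void *V, int LatticeVal) {
            Results[{BB, V}] = LatticeVal;
          }
        };

        // Before: `class LVIImpl : public LVICache { ... }` let the query
        // logic reach into cache internals. After: the cache is a private
        // member behind the narrow interface above.
        class LVIImplSketch {
          LVICacheSketch TheCache; // has-a
        public:
          int getValueAt(const void *BB, const void *V) {
            if (const int *Cached = TheCache.lookup(BB, V))
              return *Cached;
            int Computed = 0; // ... run the lazy analysis here ...
            TheCache.insert(BB, V, Computed);
            return Computed;
          }
        };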
* Add handling of !invariant.load to PropagateMetadata. | Justin Lebar | 2016-09-11 | 1 | -6/+6
    Summary: This will let e.g. the load/store vectorizer propagate this
    metadata appropriately.
    Reviewers: arsenm
    Subscribers: tra, jholewinski, hfinkel, mzolotukhin
    Differential Revision: https://reviews.llvm.org/D23479
    llvm-svn: 281153
* Do not widen load for different variable in GVN. | Dehao Chen | 2016-09-09 | 1 | -37/+1
    Summary: Widening a load in GVN happens too early and blocks other
    optimizations like PRE and LICM.
    https://llvm.org/bugs/show_bug.cgi?id=29110

    The SPEC CPU2006 benchmark impact of this patch
    (Reference: o2_nopatch, (1): o2_patched):

        Benchmark                          Base:Reference    (1)
        ---------------------------------------------------------
        spec/2006/fp/C++/444.namd              25.2          -0.08%
        spec/2006/fp/C++/447.dealII            45.92         +1.05%
        spec/2006/fp/C++/450.soplex            41.7          -0.26%
        spec/2006/fp/C++/453.povray            35.65         +1.68%
        spec/2006/fp/C/433.milc                23.79         +0.42%
        spec/2006/fp/C/470.lbm                 41.88         -1.12%
        spec/2006/fp/C/482.sphinx3             47.94         +1.67%
        spec/2006/int/C++/471.omnetpp          22.46         -0.36%
        spec/2006/int/C++/473.astar            21.19         +0.24%
        spec/2006/int/C++/483.xalancbmk        36.09         -0.11%
        spec/2006/int/C/400.perlbench          33.28         +1.35%
        spec/2006/int/C/401.bzip2              22.76         -0.04%
        spec/2006/int/C/403.gcc                32.36         +0.12%
        spec/2006/int/C/429.mcf                41.04         -0.41%
        spec/2006/int/C/445.gobmk              26.94         +0.04%
        spec/2006/int/C/456.hmmer              24.5          -0.20%
        spec/2006/int/C/458.sjeng              28            -0.46%
        spec/2006/int/C/462.libquantum         55.25         +0.27%
        spec/2006/int/C/464.h264ref            45.87         +0.72%
        geometric mean                                       +0.23%

    For most benchmarks it's a wash, but we do see stable improvements on
    some of them, e.g. 447, 453, 482, and 400.

    Reviewers: davidxl, hfinkel, dberlin, sanjoy, reames
    Subscribers: gberry, junbuml
    Differential Revision: https://reviews.llvm.org/D24096
    llvm-svn: 281074
* [LCG] Clean up and make NDEBUG verify calls more rigorous with make_scope_exit | Chandler Carruth | 2016-09-04 | 1 | -32/+38
    Now that we have the make_scope_exit utility, this makes the code much
    clearer and more readable by isolating the check. It also makes it easy
    to go through and make sure all the interesting update routines have a
    start and end verify, so we don't slowly let the graph drift into an
    invalid state.
    llvm-svn: 280619
* [LCG] An NFC refactoring to extract the logic for doing a postorder-sequence based update after edge insertion into a generic helper function | Chandler Carruth | 2016-09-04 | 1 | -111/+184
    This separates the SCC-specific logic into two fairly simple lambdas and
    extracts the rest into a generic helper template function. I think this
    is a net win on its own merits because it disentangles different pieces
    of the algorithm. Now there is one place that does the two-step partition
    to identify a set of newly connected components and, at the same time,
    update the postorder sequence.

    However, I'm also hoping to re-use this in an upcoming patch to update a
    cached post-order sequence of RefSCCs when doing the analogous update to
    the RefSCC graph, and I don't want to have two copies.

    The diff is quite messy, but this really is just moving things around and
    making types generic rather than specific.
    llvm-svn: 280618
* Simplify code a bit. No functional change intended. | Andrea Di Biagio | 2016-09-02 | 1 | -15/+16
    We don't need to call `GetCompareTy(LHS)' every single time true or false
    is returned from function SimplifyFCmpInst, as suggested by Sanjay in
    review D24142.
    llvm-svn: 280491
* [instsimplify] Fix incorrect folding of an ordered fcmp with a vector of all NaN | Andrea Di Biagio | 2016-09-02 | 1 | -1/+1
    This patch fixes a crash caused by an incorrect folding of an ordered
    comparison between a packed floating point vector and a splat vector of
    NaN.

    An ordered comparison between a vector and a constant vector of NaN
    should always be folded into a constant vector where each element is i1
    false.

    Since revision 266175, SimplifyFCmpInst folds the ordered fcmp into a
    scalar 'false'. Later on, this would cause an assertion failure, since
    the value type of the folded value doesn't match the expected value type
    of the uses of the original instruction: "Assertion failed:
    New->getType() == getType() && "replaceAllUses of value with new value of
    different type!".

    This patch fixes the issue and adds a test case to the already existing
    test InstSimplify/floating-point-compares.ll.
    Differential Revision: https://reviews.llvm.org/D24143
    llvm-svn: 280488
* [LoopInfo] Add verification by recomputation. | Michael Zolotukhin | 2016-08-31 | 1 | -3/+6
    Summary: The current implementation of the LI verifier isn't ideal and
    fails to detect some cases where LI is incorrect. For instance, it checks
    that all recorded loops are in a correct form, but it has no way to check
    whether there are other loops in the function that are not recorded in
    LI. This patch adds a way to detect such bugs.
    Reviewers: chandlerc, sanjoy, hfinkel
    Subscribers: llvm-commits, silvas, mzolotukhin
    Differential Revision: https://reviews.llvm.org/D23437
    llvm-svn: 280280
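    A hedged sketch of the "verification by recomputation" idea: build a
    second LoopInfo from scratch and check that every freshly discovered
    top-level loop is also recorded in the incrementally maintained one. The
    real verifier in LoopInfo.cpp performs a more detailed structural
    comparison.

        #include "llvm/Analysis/LoopInfo.h"
        #include "llvm/IR/Dominators.h"
        #include <cassert>
        using namespace llvm;

        static void verifyLoopInfoSketch(const LoopInfo &LI,
                                         const DominatorTree &DT) {
          LoopInfo FreshLI;
          FreshLI.analyze(DT); // recompute loops from the dominator tree
          for (const Loop *L : FreshLI)
            assert(LI.getLoopFor(L->getHeader()) &&
                   "loop found by recomputation is missing from LI");
        }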
* Fix indent. NFC. | Chad Rosier | 2016-08-31 | 1 | -2/+2
    llvm-svn: 280270
* s/static inline/static/ for headers I have changed in r279475. NFC. | Tim Shen | 2016-08-31 | 1 | -1/+1
    llvm-svn: 280257
* [Loads] Properly populate the visited set in isDereferenceableAndAlignedPointer | David Majnemer | 2016-08-31 | 1 | -2/+5
    There were paths where we wouldn't populate the visited set, causing us
    to recurse forever if an SSA variable was defined in terms of itself.

    This fixes PR30210.
    llvm-svn: 280191
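    A minimal sketch of the recursion guard involved, not the actual
    isDereferenceableAndAlignedPointer code; the point is that every value
    must be inserted into the visited set before recursing:

        #include "llvm/ADT/SmallPtrSet.h"
        #include "llvm/IR/Operator.h"
        #include "llvm/IR/Value.h"
        using namespace llvm;

        // If a value is (bogusly) defined in terms of itself, the insert
        // fails on the second visit and we stop instead of recursing forever.
        static bool walkPointerSketch(const Value *V,
                                      SmallPtrSetImpl<const Value *> &Visited) {
          if (!Visited.insert(V).second)
            return false; // already seen: bail out of the cycle

          if (const GEPOperator *GEP = dyn_cast<GEPOperator>(V))
            return walkPointerSketch(GEP->getPointerOperand(), Visited);

          // ... other pointer-producing cases ...
          return true;
        }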
* Fixup r279618, instantiate *AnalysisManagerProxy<*AnalysisManager,LazyCallGraph::SCC>, instead of *AnalysisManagerProxy<*AnalysisManager,LazyCallGraph::SCC,LazyCallGraph&>, for PassID | NAKAMURA Takumi | 2016-08-30 | 1 | -2/+2
    Otherwise they were not instantiated as expected:
        llvm::InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::LazyCallGraph::SCC>::PassID
        llvm::InnerAnalysisManagerProxy<llvm::AnalysisManager<llvm::Function>, llvm::LazyCallGraph::SCC>::PassID
    llvm-svn: 280105
* NFC: add early exit in ModuleSummaryAnalysis | Piotr Padlewski | 2016-08-30 | 1 | -29/+32
    Summary: Changed this code because it was not very readable. The one
    question that came up after changing it is: should we count calls to
    intrinsics? We don't add them to the caller summary, so maybe we
    shouldn't count them either?
    Reviewers: tejohnson, eraman, mehdi_amini
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D23949
    llvm-svn: 280036
* Fix a thinko in r278189. | Easwaran Raman | 2016-08-29 | 1 | -1/+1
    llvm-svn: 280008
* [Loop Vectorizer] Fixed memory conflict checks. | Elena Demikhovsky | 2016-08-28 | 1 | -3/+29
    Fixed a bug in the run-time checks for possible memory conflicts inside a
    loop. The bug was in the Low <-> High boundary calculation: the High
    boundary should be calculated as "last memory access pointer + element
    size".
    Differential revision: https://reviews.llvm.org/D23176
    llvm-svn: 279930
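    A toy illustration of the boundary fix described above; the names and
    types are made up, and only the "last pointer + element size" rule comes
    from the commit:

        #include <cstdint>

        struct AccessRange {
          std::uintptr_t Low;  // address of the first accessed element
          std::uintptr_t High; // one past the last accessed byte
        };

        static AccessRange makeRange(std::uintptr_t FirstPtr,
                                     std::uintptr_t LastPtr,
                                     std::uintptr_t ElementSize) {
          // The fix: High must be LastPtr + ElementSize, not LastPtr itself,
          // or the check misses a conflict on the final element.
          return {FirstPtr, LastPtr + ElementSize};
        }

        static bool mayConflict(const AccessRange &A, const AccessRange &B) {
          // Half-open intervals [Low, High) overlap iff each starts before
          // the other ends.
          return A.Low < B.High && B.Low < A.High;
        }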
* [Inliner] Report when inlining fails because callee's def is unavailable | Adam Nemet | 2016-08-26 | 1 | -10/+13
    Summary: This is obviously an interesting case because it may motivate
    code restructuring or LTO.

    Reporting this requires instantiation of ORE in the loop where the call
    sites are first gathered. I've checked the compile-time overhead *with*
    -Rpass-with-hotness, and the worst slow-down was 6% in mcf, quickly
    tailing off. As before, without -Rpass-with-hotness there is no overhead.

    Because this could be a pretty noisy diagnostic, it is currently
    qualified as 'verbose'. As of this patch, 'verbose' diagnostics are only
    emitted with -Rpass-with-hotness, i.e. when the output is expected to be
    filtered.

    Reviewers: eraman, chandlerc, davidxl, hfinkel
    Subscribers: tejohnson, Prazek, davide, llvm-commits
    Differential Revision: https://reviews.llvm.org/D23415
    llvm-svn: 279860