path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* [ThinLTO] Debug message cleanup (NFC) | Teresa Johnson | 2015-12-10 | 1 | -8/+8
  Added some missing spaces between the module identifier and the start of the debug message. Also added a ":" after the module identifier to make this look a little nicer.
  llvm-svn: 255259
* [DeadStoreElimination] Add support for non-local DSE. | Chad Rosier | 2015-12-10 | 1 | -90/+242
  We extend the search for redundant stores to predecessor blocks that unconditionally lead to the block BB with the current store instruction. That also includes single-block loops that unconditionally lead to BB, and if-then-else blocks where then- and else-blocks unconditionally lead to BB.
  http://reviews.llvm.org/D13363
  Patch by Ivan Baev <ibaev@codeaurora.org>!
  llvm-svn: 255247
* [LLE] Use the PredicatedScalarEvolution interface to query SCEVs for dependences | Silviu Baranga | 2015-12-10 | 1 | -16/+15
  Summary: LAA uses the PredicatedScalarEvolution interface, so it can produce forward/backward dependences having SCEVs that are AddRecExprs only after being transformed by PredicatedScalarEvolution. Use PredicatedScalarEvolution to get the expected expressions.
  Reviewers: anemet
  Subscribers: llvm-commits, sanjoy
  Differential Revision: http://reviews.llvm.org/D15382
  llvm-svn: 255241
* Revert r255137. | Akira Hatanaka | 2015-12-10 | 1 | -39/+0
  This commit broke Apple's internal bot.
  llvm-svn: 255227
* Add arg_begin() and arg_end() to CallInst and InvokeInst; NFCI | Sanjoy Das | 2015-12-10 | 3 | -7/+4
  - This simplifies the CallSite class, arg_begin / arg_end are now simple wrapper getters.
  - In several places, we were creating CallSite instances solely to call arg_begin and arg_end. With this change, that's no longer required.
  llvm-svn: 255226
* [Float2Int] Don't operate on vector instructions | Reid Kleckner | 2015-12-09 | 1 | -0/+2
  This fixes a crash bug. It's also not clear if we'd want to do this transform for vectors.
  llvm-svn: 255155
* Don't assign a temporary string to a StringRef. | Rafael Espindola | 2015-12-09 | 1 | -1/+1
  Should fix the Windows debug and asan bots.
  llvm-svn: 255149
* Use WeakVH to keep track of calls with operand bundles in CloneCodeInfo | Sanjoy Das | 2015-12-09 | 1 | -1/+3
  `CloneAndPruneIntoFromInst` can DCE instructions after cloning them into the new function, and so an AssertingVH is too strong. This change switches CloneCodeInfo to use a std::vector<WeakVH>.
  llvm-svn: 255148
* Delete trailing whitespace; NFC | Sanjoy Das | 2015-12-09 | 1 | -1/+1
  llvm-svn: 255147
* [ThinLTO] FunctionImport pass can take a const index pointer (NFC) | Teresa Johnson | 2015-12-09 | 1 | -3/+3
  llvm-svn: 255140
* [InstCombine] fold bitcasts around an extractelement (2nd try) | Sanjay Patel | 2015-12-09 | 1 | -0/+39
  This is a redo of r255124 (reverted at r255126) with an added check for a scalar destination type and an added test for the failure seen in Clang's test/CodeGen/vector.c. The extra test shows a different missing optimization.
  Original commit message:
  Example:
    bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
    extractelement <2 x float> %X, i32 1
  This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543
  The next step will be to generalize this fold:
    trunc ( lshr ( bitcast X) ) -> extractelement (X)
  I.e., I'm hoping to replace the existing transform of:
    bitcast ( trunc ( lshr ( bitcast X)))
  added by: http://reviews.llvm.org/rL112232
  with 2 less specific transforms to catch the case in the bug report.
  Differential Revision: http://reviews.llvm.org/D14879
  llvm-svn: 255137
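The fold in the example can be checked numerically. The following is a minimal sketch, not the InstCombine code: it assumes little-endian packing and uses Python's struct module as a stand-in for IR bitcasts. The helper names are hypothetical.

```python
import struct

def bitcast_f32x2_to_i32x2(vec):
    # Reinterpret two 32-bit floats as two 32-bit unsigned integers.
    return struct.unpack("<2I", struct.pack("<2f", *vec))

def bitcast_i32_to_f32(bits):
    # Reinterpret one 32-bit unsigned integer as a 32-bit float.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def unfolded(vec, index):
    # bitcast (extractelement (bitcast <2 x float> X to <2 x i32>), index) to float
    return bitcast_i32_to_f32(bitcast_f32x2_to_i32x2(vec)[index])

def folded(vec, index):
    # extractelement <2 x float> X, index (rounded to 32-bit float precision)
    return struct.unpack("<f", struct.pack("<f", vec[index]))[0]
```

Because the element widths match, the two bitcasts cancel and both functions return the same lane.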
* Revert "Revert r253253 and r253126: "Don't recompute LCSSA after loop-unrolling when possible."" | Michael Zolotukhin | 2015-12-09 | 1 | -2/+12
  The bug in IndVarSimplify was fixed in r254976, r254977, so I'm reapplying the original patch for avoiding redundant LCSSA recomputation.
  This reverts commit ffe3b434e505e403146aff00be0c177bb6d13466.
  llvm-svn: 255133
* [PGO] Resubmit "MST based PGO instrumentation infrastructure" (r254021) | Rong Xu | 2015-12-09 | 5 | -1/+939
  This new patch fixes a few bugs that were exposed by the last submission. It also improves the test cases.
  --Original Commit Message--
  This patch implements a minimum spanning tree (MST) based instrumentation for PGO. The use of MST guarantees that the minimum number of CFG edges is instrumented. An additional optimization is to instrument the less executed edges to further reduce the instrumentation overhead.
  The patch contains both the instrumentation and the use of the profile to set the branch weights.
  Differential Revision: http://reviews.llvm.org/D12781
  llvm-svn: 255132
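The spanning-tree idea can be sketched as follows. This is an illustration under stated assumptions, not the pass itself: the CFG is a list of (src, dst, estimated_weight) tuples, the weights are hypothetical, and any extra edges a real implementation may add around function entry and exit are not modelled. Edges are visited from hottest to coldest, so the spanning tree keeps the hot edges and only the remaining, colder edges need counters.

```python
def edges_to_instrument(num_nodes, edges):
    """Return the CFG edges left outside a maximum-weight spanning tree."""
    parent = list(range(num_nodes))  # union-find forest over basic blocks

    def find(v):
        # Find the set representative, halving paths along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    instrumented = []
    # Kruskal-style sweep, heaviest (most executed) edges first.
    for src, dst, _weight in sorted(edges, key=lambda e: e[2], reverse=True):
        a, b = find(src), find(dst)
        if a == b:
            # The edge closes a cycle, so it is not in the tree and needs a counter.
            instrumented.append((src, dst))
        else:
            parent[a] = b  # the edge joins the tree; its count can be derived
    return instrumented
```

For a connected CFG with E edges and V blocks this leaves E - V + 1 edges to instrument; counts on the tree edges can then be recovered from flow conservation.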
* Revert "[InstCombine] fold bitcasts around an extractelement" | Mehdi Amini | 2015-12-09 | 1 | -37/+0
  This reverts commit r255124.
  Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 255126
* [InstCombine] fold bitcasts around an extractelement | Sanjay Patel | 2015-12-09 | 1 | -0/+37
  Example:
    bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
    extractelement <2 x float> %X, i32 1
  This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543
  The next step will be to generalize this fold:
    trunc ( lshr ( bitcast X) ) -> extractelement (X)
  I.e., I'm hoping to replace the existing transform of:
    bitcast ( trunc ( lshr ( bitcast X)))
  added by: http://reviews.llvm.org/rL112232
  with 2 less specific transforms to catch the case in the bug report.
  Differential Revision: http://reviews.llvm.org/D14879
  llvm-svn: 255124
* Re-commit r255115, with the PredicatedScalarEvolution class moved to | Silviu Baranga | 2015-12-09 | 4 | -92/+88
  ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform and Analysis modules:
  [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions
  Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere.
  This also solves a problem involving the result of SCEV expression rewriting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
    P1: {a,+,b} has nsw
    P2: b = 1.
  Applying P1 and then P2 gives us {a,+,1}, while applying P2 and then P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, therefore avoiding this issue.
  The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewrites. This also has the benefit of reducing the overall number of expression rewrites.
  Reviewers: mzolotukhin, anemet
  Subscribers: jmolloy, sanjoy, llvm-commits
  Differential Revision: http://reviews.llvm.org/D14296
  llvm-svn: 255122
* Revert r255115 until we figure out how to fix the bot failures. | Silviu Baranga | 2015-12-09 | 5 | -131/+92
  llvm-svn: 255117
* [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions | Silviu Baranga | 2015-12-09 | 5 | -92/+131
  Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere.
  This also solves a problem involving the result of SCEV expression rewriting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates
    P1: {a,+,b} has nsw
    P2: b = 1.
  Applying P1 and then P2 gives us {a,+,1}, while applying P2 and then P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, therefore avoiding this issue.
  The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewrites. This also has the benefit of reducing the overall number of expression rewrites.
  Reviewers: mzolotukhin, anemet
  Subscribers: jmolloy, sanjoy, llvm-commits
  Differential Revision: http://reviews.llvm.org/D14296
  llvm-svn: 255115
* EarlyCSE: fix typo from rL255054. | JF Bastien | 2015-12-09 | 1 | -1/+1
  llvm-svn: 255102
* The current importing scheme is processing one function at a time, | Mehdi Amini | 2015-12-09 | 1 | -54/+144
  loading the source Module, linking the function in the destination module, and destroying the source Module before repeating with the next function to import (potentially from the same Module).
  Ideally we would keep the source Module alive and import the next Function needed from this Module. Unfortunately this is not possible because the linker does not leave it in a usable state.
  However we can do better by first computing the list of all candidates per Module, and only then loading the source Module and importing all the functions we need from it.
  The trick to process callees is to materialize functions in the source module when building the list of functions to import, and inspect them in their source module, collecting the list of callees for each callee.
  When we move to the actual import, we will import from each source module exactly once. Each source module is loaded exactly once. The only drawback is that it requires having all the lazy-loaded source Modules in memory at the same time.
  Currently this patch already improves the link time considerably; a multithreaded link of llvm-dis on my laptop was:
    real 1m12.175s
    user 6m32.430s
    sys 0m10.529s
  and is now:
    real 0m40.697s
    user 2m10.237s
    sys 0m4.375s
  Note: this is the full link time (linker+Import+Optimizer+CodeGen)
  Differential Revision: http://reviews.llvm.org/D15178
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 255100
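The batching described above can be sketched as two phases: group the candidates per source module, then load each module once and import everything needed from it. This is a hypothetical illustration, not the FunctionImport code. `plan_imports`, `import_all` and the `load_module` callback are assumed names, and a "module" is modelled as a set of function names.

```python
from collections import defaultdict

def plan_imports(candidates):
    """Group (module_path, function_name) candidates by source module."""
    per_module = defaultdict(list)
    for module_path, fn in candidates:
        if fn not in per_module[module_path]:  # skip duplicate requests
            per_module[module_path].append(fn)
    return dict(per_module)

def import_all(candidates, load_module):
    """Load each source module once and import every function needed from it."""
    loads = 0
    imported = []
    for module_path, fns in plan_imports(candidates).items():
        module = load_module(module_path)  # hypothetical loader callback
        loads += 1
        imported.extend((module_path, fn) for fn in fns if fn in module)
    return imported, loads
```

The number of loads equals the number of distinct source modules, regardless of how many functions are requested from each.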
* Test commit access - Fix a few missing '.' in comments of LoopInterchange code. | Vikram TV | 2015-12-09 | 1 | -4/+4
  llvm-svn: 255095
* Return a std::unique_ptr from CloneModule. NFC. | Rafael Espindola | 2015-12-08 | 1 | -13/+15
  llvm-svn: 255078
* [IndVars] Use any_of and foreach instead of explicit for loops; NFC | Sanjoy Das | 2015-12-08 | 1 | -11/+6
  llvm-svn: 255077
* [OperandBundles] Have PruneEH work correctly with operand bundles. | Sanjoy Das | 2015-12-08 | 1 | -2/+7
  For an invoke with operand bundles, the [op_begin(), op_end()-3] range can contain things other than invoke arguments. This change teaches PruneEH to use arg_begin() and arg_end() explicitly.
  llvm-svn: 255073
* Fix/Improve Debug print in FunctionImport pass | Mehdi Amini | 2015-12-08 | 1 | -9/+12
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 255071
* Remove caching in FunctionImport: a Module can't be reused after being linked from | Mehdi Amini | 2015-12-08 | 1 | -20/+13
  The Linker destroys the source module (API change coming to make it explicit)
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 255064
* [OperandBundles] Fix a transform in simplifycfg | Sanjoy Das | 2015-12-08 | 1 | -2/+6
  Reviewers: pcc, majnemer, reames
  Subscribers: reames, llvm-commits
  Differential Revision: http://reviews.llvm.org/D15345
  llvm-svn: 255062
* [EarlyCSE] Value forwarding for unordered atomics | Philip Reames | 2015-12-08 | 1 | -19/+77
  This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics; that will follow in the near future.
  The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether the load/store was atomic, and then only replace atomic loads with ones which were also atomic.
  No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch.
  Differential Revision: http://reviews.llvm.org/D15337
  llvm-svn: 255054
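The "bit in the table" rule can be illustrated with a toy model. This is a rough sketch only: the class and method names are hypothetical and it ignores generations, fences and invalidation, all of which the real pass has to handle. Each entry remembers whether the access that produced it was atomic, and an atomic load is only replaced by a value that came from an atomic access.

```python
class AvailableLoads:
    """Toy model of a pointer -> (value, is_atomic) availability table."""

    def __init__(self):
        self._table = {}

    def record(self, ptr, value, is_atomic):
        # Remember the value produced or stored by an earlier load/store.
        self._table[ptr] = (value, is_atomic)

    def lookup(self, ptr, load_is_atomic):
        # Return a forwardable value, or None if forwarding is not allowed.
        entry = self._table.get(ptr)
        if entry is None:
            return None
        value, was_atomic = entry
        if load_is_atomic and not was_atomic:
            # An atomic load must not be replaced by a non-atomic access.
            return None
        return value
```

A non-atomic load may still reuse a value recorded by an atomic access, which is the direction the commit message allows.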
* [OperandBundles] Remove unnecessary constructor | Sanjoy Das | 2015-12-08 | 1 | -1/+1
  The StringRef constructor is unnecessary (since we're converting to std::string anyway), and having it requires an explicit call to StringRef's or std::string's constructor.
  llvm-svn: 255000
* [IndVars] Have getInsertPointForUses preserve LCSSA | Sanjoy Das | 2015-12-08 | 1 | -14/+28
  Summary: Also add a stricter post-condition for IndVarSimplify.
  Fixes PR25578. Test case by Michael Zolotukhin.
  Reviewers: hfinkel, atrick, mzolotukhin
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D15059
  llvm-svn: 254977
* Reapply 254950 w/fix | Philip Reames | 2015-12-07 | 1 | -44/+51
  254950 ended up being not NFC. The previous code was overriding the flags for whether an instruction read or wrote memory using the target specific flags returned via TTI. I'd missed this in my refactoring. Since I mistakenly built only x86 and didn't notice the number of unsupported tests, I didn't catch that before the original checkin.
  This raises an interesting issue though. Given we have function attributes (i.e. readonly, readnone, argmemonly) which describe the aliasing of intrinsics, why does TTI have this information overriding the instruction definition at all? I see no reason for this, but decided to preserve existing behavior for the moment. The root issue might be that we don't have a "writeonly" attribute.
  Original commit message:
  [EarlyCSE] Simplify and invert ParseMemoryInst [NFCI]
  Restructure ParseMemoryInst - which was introduced to abstract over target specific load and store instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable. The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.
  Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.
  llvm-svn: 254957
* Revert 254950 | Philip Reames | 2015-12-07 | 1 | -45/+44
  It's causing test failures on AArch64. Due to a bad build config on my part, I apparently wasn't running the tests I thought I was.
  llvm-svn: 254954
* [EarlyCSE] Simplify and invert ParseMemoryInst [NFCI] | Philip Reames | 2015-12-07 | 1 | -44/+45
  Restructure ParseMemoryInst - which was introduced to abstract over target specific load and store instructions - to just query the underlying instructions. In theory, this could be slightly slower than caching the results, but in practice, it's very unlikely to be measurable. The simple query scheme makes it far easier to understand, and much easier to extend with new queries. Given I'm about to need to add new query types, doing the cleanup first seemed worthwhile.
  Do we still believe the target specific intrinsic handling is worthwhile in EarlyCSE? It adds quite a bit of complexity and makes the code harder to read. Being able to delete the abstraction entirely would be wonderful.
  llvm-svn: 254950
* [ThinLTO] Support for specifying function index from pass manager | Teresa Johnson | 2015-12-07 | 2 | -11/+33
  Summary: Add a field on the PassManagerBuilder that clang or gold can use to pass down a pointer to the function index in memory to use for importing when the ThinLTO backend is triggered. Add support to supply this to the function import pass.
  Reviewers: joker.eph, dexonsmith
  Subscribers: davidxl, llvm-commits, joker.eph
  Differential Revision: http://reviews.llvm.org/D15024
  llvm-svn: 254926
* Create llvm.global_ctors in the new format. | Rafael Espindola | 2015-12-06 | 1 | -2/+2
  llvm-svn: 254878
* [InstCombine] Call getCmpPredicateForMinMax only with a valid SPF | Sanjoy Das | 2015-12-05 | 1 | -1/+5
  Summary: There are `SelectPatternFlavor`s that don't represent min or max idioms, and we should not be passing those to `getCmpPredicateForMinMax`.
  Fixes PR25745.
  Reviewers: majnemer
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D15249
  llvm-svn: 254869
* [ASAN] Add doFinalization to reset state | Keno Fischer | 2015-12-05 | 1 | -0/+11
  Summary: If the same pass manager is used for multiple modules ASAN complains about GlobalsMD being initialized twice. Fix this by resetting GlobalsMD in a new doFinalization method to allow this use case.
  Reviewers: kcc
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D14962
  llvm-svn: 254851
* Fix a typo in LoopVectorize.cpp. NFC. | Cong Hou | 2015-12-05 | 1 | -1/+1
  llvm-svn: 254813
* [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC) | Philip Reames | 2015-12-05 | 1 | -11/+11
  When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access. Since I'm about to start working on this area, clean up the naming so that patches aren't horribly confusing. Note that the actual implementation was always bailing if the load or store wasn't simple.
  Reminder:
  - "volatile" - C++ volatile, can't remove any memory operations, but in principle unordered
  - "ordered" - imposes ordering constraints on other nearby memory operations
  - "atomic" - can't be split or sheared. In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used.
  - "simple" - a load which is none of the above. These are normal loads and what most of the optimizer works with.
  llvm-svn: 254805
* [SimplifyLibCalls] Optimization for pow(x, n) where n is some constant | Weiming Zhao | 2015-12-04 | 1 | -0/+51
  Summary: In order to avoid calling the pow function we generate repeated fmul when n is a positive or negative whole number.
  For each exponent we pre-compute Addition Chains in order to minimize the number of fmuls.
  Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
  We pre-compute addition chains for exponents up to 32 (which results in a max of 7 fmuls).
  For example:
    4 = 2+2
    5 = 2+3
    6 = 3+3
  and so on.
  Hence, pow(x, 4.0) ==>
    y = fmul x, x
    x = fmul y, y
    ret x
  For negative exponents, we simply compute the reciprocal of the final result.
  Note: This transformation is only enabled under fast-math.
  Patch by Mandeep Singh Grang <mgrang@codeaurora.org>
  Reviewers: weimingz, majnemer, escha, davide, scanon, joerg
  Subscribers: probinson, escha, llvm-commits
  Differential Revision: http://reviews.llvm.org/D13994
  llvm-svn: 254776
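The decomposition shown in the message (4 = 2+2, 5 = 2+3, 6 = 3+3) can be sketched as a recursive split of the exponent into two halves, reusing powers that were already computed. This is an illustrative sketch, not the SimplifyLibCalls code: the halving rule is an assumption that reproduces the three examples but does not always yield the shortest addition chain, and `pow_by_chain` is a hypothetical name.

```python
def pow_by_chain(x, n):
    """Compute x**n for integer n with repeated multiplies; return (value, multiplies)."""
    if n == 0:
        return 1.0, 0
    memo = {1: x}  # powers of x computed so far, keyed by exponent
    muls = 0

    def build(k):
        nonlocal muls
        if k in memo:
            return memo[k]
        lo, hi = k // 2, k - k // 2  # e.g. 5 -> 2 + 3
        value = build(lo) * build(hi)  # one multiply per new exponent
        muls += 1
        memo[k] = value
        return value

    result = build(abs(n))
    if n < 0:
        result = 1.0 / result  # negative exponent: reciprocal of the final result
    return result, muls
```

For n = 4 this performs the same two multiplies as the example in the message: x*x, then the square of that product.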
* [asan] Fix dynamic allocas unpoisoning on PowerPC64. | Yury Gribov | 2015-12-04 | 1 | -2/+17
  For PowerPC64 we cannot just pass SP extracted from @llvm.stackrestore to _asan_allocas_unpoison due to specific ABI requirements (http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#DYNAM-STACK). This patch adds the value returned by @llvm.get.dynamic.area.offset to the stack pointer extracted from @llvm.stackrestore, so that unpoisoning of dynamic allocas works correctly on PowerPC64.
  Patch by Max Ostapenko.
  Differential Revision: http://reviews.llvm.org/D15108
  llvm-svn: 254707
* clang-format FunctionImport after refactoring (NFC) | Mehdi Amini | 2015-12-03 | 1 | -9/+10
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 254585
* Refactor FunctionImporter::importFunctions with a helper function to process the Worklist (NFC) | Mehdi Amini | 2015-12-03 | 1 | -29/+45
  This prepares for some more functional changes to perform bulk imports.
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 254583
* Move EH-specific helper functions to a more appropriate place | David Majnemer | 2015-12-02 | 4 | -4/+4
  No functionality change is intended.
  llvm-svn: 254562
* Fix a typo in LoopVectorize.cpp. NFC. | Cong Hou | 2015-12-02 | 1 | -1/+1
  llvm-svn: 254549
* Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2. | David Majnemer | 2015-12-02 | 1 | -4/+4
  Differential Revision: http://reviews.llvm.org/D14223
  Patch by Amaury SECHET!
  llvm-svn: 254518
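The rewritten comparison can be checked exhaustively for small bit widths. This is a sketch, not the InstCombine implementation. It assumes C1 < C2, so that the single differing bit is clear in C1 and set in C2; that ordering is an assumption of this example, not something the commit message states.

```python
def is_power_of_two(v):
    return v > 0 and (v & (v - 1)) == 0

def original(a, c1, c2):
    # (A == C1 || A == C2)
    return a == c1 or a == c2

def folded(a, c1, c2):
    # (A & ~(C1 ^ C2)) == C1: clear the one differing bit, then compare once.
    mask = c1 ^ c2
    return (a & ~mask) == c1

def check_equivalent(c1, c2, bits=8):
    # Compare both forms for every unsigned value of the given width.
    assert c1 < c2 and is_power_of_two(c1 ^ c2)
    return all(original(a, c1, c2) == folded(a, c1, c2)
               for a in range(1 << bits))
```

With C1 = 4 and C2 = 6 the mask is 2, so both 4 and 6 reduce to 4 after masking and a single comparison replaces two.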
* [AttributeSet] Overload AttributeSet::addAttribute to reduce compile time. | Akira Hatanaka | 2015-12-02 | 2 | -17/+28
  The new overloaded function is used when an attribute is added to a large number of slots of an AttributeSet (for example, to function parameters). This is much faster than calling AttributeSet::addAttribute once per slot, because AttributeSet::getImpl (which calls FoldingSet::FindNodeOrInsertPos) is called only once per function instead of once per slot.
  With this commit, clang compiles a file which used to take over 22 minutes in just 13 seconds.
  rdar://problem/23581000
  Differential Revision: http://reviews.llvm.org/D15085
  llvm-svn: 254491
* Change ModuleLinker to take a set of GlobalValues to import instead of a single one | Mehdi Amini | 2015-12-02 | 1 | -1/+5
  For efficiency reasons, when importing multiple functions for the same Module, we can avoid reparsing it every time.
  Differential Revision: http://reviews.llvm.org/D15102
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 254486
* [sanitizer coverage] when adding a bb trace instrumentation, do it instead of, not in addition to, regular coverage. Do the regular coverage in the run-time instead | Kostya Serebryany | 2015-12-02 | 1 | -15/+10
  llvm-svn: 254482
* Modify FunctionImport to take a callback to load modules | Mehdi Amini | 2015-12-02 | 1 | -4/+7
  When linking a static archive, there are no individual module files to load. Instead they can be mmap'ed and could be initialized from a buffer directly. The callback provides flexibility to override the scheme for loading a module from the summary.
  Differential Revision: http://reviews.llvm.org/D15101
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 254479