summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC)Philip Reames2015-12-051-11/+11
| | | | | | | | | | | | When the notion of target specific memory intrinsics was introduced to EarlyCSE, the commit confused the notions of volatile and simple memory access. Since I'm about to start working on this area, cleanup the naming so that patches aren't horribly confusing. Note that the actual implementation was always bailing if the load or store wasn't simple. Reminder: - "volatile" - C++ volatile, can't remove any memory operations, but in principal unordered - "ordered" - imposes ordering constraints on other nearby memory operations - "atomic" - can't be split or sheared. In LLVM terms, all "ordered" operations are also atomic so the predicate "isAtomic" is often used. - "simple" - a load which is none of the above. These are normal loads and what most of the optimizer works with. llvm-svn: 254805
* [SimplifyLibCalls] Optimization for pow(x, n) where n is some constantWeiming Zhao2015-12-041-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In order to avoid calling pow function we generate repeated fmul when n is a positive or negative whole number. For each exponent we pre-compute Addition Chains in order to minimize the no. of fmuls. Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html We pre-compute addition chains for exponents upto 32 (which results in a max of 7 fmuls). For eg: 4 = 2+2 5 = 2+3 6 = 3+3 and so on Hence, pow(x, 4.0) ==> y = fmul x, x x = fmul y, y ret x For negative exponents, we simply compute the reciprocal of the final result. Note: This transformation is only enabled under fast-math. Patch by Mandeep Singh Grang <mgrang@codeaurora.org> Reviewers: weimingz, majnemer, escha, davide, scanon, joerg Subscribers: probinson, escha, llvm-commits Differential Revision: http://reviews.llvm.org/D13994 llvm-svn: 254776
* [asan] Fix dynamic allocas unpoisoning on PowerPC64.Yury Gribov2015-12-041-2/+17
| | | | | | | | | | | | | | | For PowerPC64 we cannot just pass SP extracted from @llvm.stackrestore to _asan_allocas_unpoison due to specific ABI requirements (http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#DYNAM-STACK). This patch adds the value returned by @llvm.get.dynamic.area.offset to extracted from @llvm.stackrestore stack pointer, so dynamic allocas unpoisoning stuff would work correctly on PowerPC64. Patch by Max Ostapenko. Differential Revision: http://reviews.llvm.org/D15108 llvm-svn: 254707
* clang-format FunctionImport after refactoring (NFC)Mehdi Amini2015-12-031-9/+10
| | | | | From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 254585
* Refactor FunctionImporter::importFunctions with a helper function to process ↵Mehdi Amini2015-12-031-29/+45
| | | | | | | | | the Worklist (NFC) This precludes some more functional changes to perform bulk imports. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 254583
* Move EH-specific helper functions to a more appropriate placeDavid Majnemer2015-12-024-4/+4
| | | | | | No functionality change is intended. llvm-svn: 254562
* Fix a typo in LoopVectorize.cpp. NFC.Cong Hou2015-12-021-1/+1
| | | | llvm-svn: 254549
* Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A ↵David Majnemer2015-12-021-4/+4
| | | | | | | | | | == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2. Differential Revision: http://reviews.llvm.org/D14223 Patch by Amaury SECHET! llvm-svn: 254518
* [AttributeSet] Overload AttributeSet::addAttribute to reduce compileAkira Hatanaka2015-12-022-17/+28
| | | | | | | | | | | | | | | | | | | | time. The new overloaded function is used when an attribute is added to a large number of slots of an AttributeSet (for example, to function parameters). This is much faster than calling AttributeSet::addAttribute once per slot, because AttributeSet::getImpl (which calls FoldingSet::FIndNodeOrInsertPos) is called only once per function instead of once per slot. With this commit, clang compiles a file which used to take over 22 minutes in just 13 seconds. rdar://problem/23581000 Differential Revision: http://reviews.llvm.org/D15085 llvm-svn: 254491
* Change ModuleLinker to take a set of GlobalValues to import instead of a ↵Mehdi Amini2015-12-021-1/+5
| | | | | | | | | | | | single one For efficiency reason, when importing multiple functions for the same Module, we can avoid reparsing it every time. Differential Revision: http://reviews.llvm.org/D15102 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 254486
* [sanitizer coverage] when adding a bb trace instrumentation, do it instead, ↵Kostya Serebryany2015-12-021-15/+10
| | | | | | not in addition to, regular coverage. Do the regular coverage in the run-time instead llvm-svn: 254482
* Modify FunctionImport to take a callback to load modulesMehdi Amini2015-12-021-4/+7
| | | | | | | | | | | | When linking static archive, there is no individual module files to load. Instead they can be mmap'ed and could be initialized from a buffer directly. The callback provide flexibility to override the scheme for loading module from the summary. Differential Revision: http://reviews.llvm.org/D15101 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 254479
* Use references now that it is natural to do so.Rafael Espindola2015-12-011-2/+2
| | | | | | | The linker never takes ownership of a module or changes which module it is refering to, making it natural to use references. llvm-svn: 254449
* [ThinLTO] Wrap dbgs() output in DEBUG macroTeresa Johnson2015-12-011-5/+5
| | | | | | Missed in a couple places. llvm-svn: 254422
* [ThinLTO] Remove stale comment (NFC)Teresa Johnson2015-12-011-4/+0
| | | | | | Stale as of r254036 which added basic profitability check. llvm-svn: 254421
* Bring r254336 back:Rafael Espindola2015-12-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The difference is that now we don't error on out-of-comdat access to internal global values. We copy them instead. This seems to match the expectation of COFF linkers (see pr25686). Original message: Start deciding earlier what to link. A traditional linker is roughly split in symbol resolution and "copying stuff". The two tasks are badly mixed in lib/Linker. This starts splitting them apart. With this patch there are no direct call to linkGlobalValueBody or linkGlobalValueProto. Everything is linked via WapValue. This also includes a few fixes: * A GV goes undefined if the comdat is dropped (comdat11.ll). * We error if an internal GV goes undefined (comdat13.ll). * We don't link an unused comdat. The first two match the behavior of an ELF linker. The second one is equivalent to running globaldce on the input. llvm-svn: 254418
* [LIR] Push check into helper function. NFC.Chad Rosier2015-12-011-4/+4
| | | | llvm-svn: 254416
* [safestack] Protect byval function arguments.Evgeniy Stepanov2015-12-012-48/+117
| | | | | | | Detect unsafe byval function arguments and move them to the unsafe stack. llvm-svn: 254353
* [safestack] Fix handling of array allocas.Evgeniy Stepanov2015-12-011-5/+17
| | | | | | | | The current code does not take alloca array size into account and, as a result, considers any access past the first array element to be unsafe. llvm-svn: 254350
* This reverts commit r254336 and r254344.Rafael Espindola2015-11-301-3/+3
| | | | | | They broke a bot and I am debugging why. llvm-svn: 254347
* Start deciding earlier what to link.Rafael Espindola2015-11-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | A traditional linker is roughly split in symbol resolution and "copying stuff". The two tasks are badly mixed in lib/Linker. This starts splitting them apart. With this patch there are no direct call to linkGlobalValueBody or linkGlobalValueProto. Everything is linked via WapValue. This also includes a few fixes: * A GV goes undefined if the comdat is dropped (comdat11.ll). * We error if an internal GV goes undefined (comdat13.ll). * We don't link an unused comdat. The first two match the behavior of an ELF linker. The second one is equivalent to running globaldce on the input. llvm-svn: 254336
* [SimplifyLibCalls] Transform log(exp2(y)) to y*log(2) under fast-math.Davide Italiano2015-11-301-1/+9
| | | | llvm-svn: 254317
* fix typos in comments; NFCSanjay Patel2015-11-291-6/+8
| | | | llvm-svn: 254266
* [SimplifyLibCalls] Don't crash if the function doesn't have a name.Davide Italiano2015-11-291-3/+2
| | | | llvm-svn: 254265
* [SimplifyLibCalls] Cross out implemented transformations.Davide Italiano2015-11-291-2/+0
| | | | llvm-svn: 254264
* [SimplifyLibCalls] Tranform log(pow(x, y)) -> y*log(x).Davide Italiano2015-11-291-5/+50
| | | | | | | | | | | | | | | | | | This one is enabled only under -ffast-math. There are cases where the difference between the value computed and the correct value is huge even for ffast-math, e.g. as Steven pointed out: x = -1, y = -4 log(pow(-1), 4) = 0 4*log(-1) = NaN I checked what GCC does and apparently they do the same optimization (which result in the dramatic difference). Future work might try to make this (slightly) less worse. Differential Revision: http://reviews.llvm.org/D14400 llvm-svn: 254263
* SamplePGO - Do not use std::to_string in diagnostics.Diego Novillo2015-11-291-12/+17
| | | | | | | | This fixes buildbots in systems that std::to_string is not present. It also tidies the output of the diagnostic to render doubles a bit better (thanks Ben Kramer for help with string streams and format). llvm-svn: 254261
* Remove an intermediate lambda. NFCCraig Topper2015-11-291-3/+2
| | | | llvm-svn: 254246
* [SimplifyLibCalls] Use any_of(). Suggested by David Blaikie!Davide Italiano2015-11-281-4/+3
| | | | llvm-svn: 254239
* [SimplifyLibCalls] Fix inverted condition that lead to an uninitialized ↵Benjamin Kramer2015-11-281-2/+2
| | | | | | | | memory read below. Found by msan! llvm-svn: 254238
* Use range-based for loops. NFCCraig Topper2015-11-281-37/+20
| | | | llvm-svn: 254222
* SamplePGO - Add initial support for inliner annotations.Diego Novillo2015-11-271-1/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds two thresholds to the sample profiler to affect inlining decisions: the concept of global hotness and coldness. Functions that have accumulated more than a certain fraction of samples at runtime, are annotated with the InlineHint attribute. Conversely, functions that accumulate less than a certain fraction of samples, are annotated with the Cold attribute. This is very similar to the hints emitted by Clang when using instrumentation profiles. Notice that this is a very blunt instrument. A function may have globally collected a significant fraction of samples, but that does not necessarily mean that every callsite for that function is hot. Ideally, we would annotate each callsite with the samples collected at that callsite. This way, the inliner can incorporate all these weights into its cost model. Once the inliner offers this functionality, we can change the hints emitted here to a more precise per-callsite annotation. For now, this is providing some measure of speedups with our internal benchmarks. I've observed speedups of up to 23% (though the geo mean is about 3%). I expect these numbers to improve as the inliner gets better annotations. llvm-svn: 254212
* SamplePGO - Fix default threshold for hot callsites.Diego Novillo2015-11-271-3/+4
| | | | | | | | | | | | | | | | | | | | | | | Based on testing of internal benchmarks, I'm lowering this threshold to a value of 0.1%. This means that SamplePGO will respect 99.9% of the original inline decisions when following a profile. The performance difference is noticeable in some tests. With the previous threshold, the speedups over baseline -O2 was about 0.63%. With the new default, the speedups are around 3% on average. The point of this threshold is not to do more aggressive inlining. When an inlined callsite crosses this threshold, SamplePGO will redo the inline decision so that it can better apply the input profile. By respecting most original inline decisions, we can apply more of the input profile because the shape of the code follows the profile more closely. In the next series, I'll be looking at adding some inline hints for the cold callsites and for toplevel functions that are hot/cold as well. llvm-svn: 254211
* Simplify the linking of recursive data.Rafael Espindola2015-11-271-2/+10
| | | | | | | | Now the ValueMapper has two callbacks. The first one maps the declaration. The ValueMapper records the mapping and then materializes the body/initializer. llvm-svn: 254209
* [sanitizer] [dfsan] Unify aarch64 mappingAdhemerval Zanella2015-11-271-16/+21
| | | | | | | | | | | | This patch changes the DFSan instrumentation for aarch64 to instead of using fixes application mask defined by SANITIZER_AARCH64_VMA to read the application shadow mask value from compiler-rt. The value is initialized based on runtime VAM detection. Along with this patch a compiler-rt one will also be added to export the shadow mask variable. llvm-svn: 254196
* [SimplifyLibCalls] Use range-based loop. NFC.Davide Italiano2015-11-271-4/+2
| | | | llvm-svn: 254193
* [LoopVectorize] Use MapVector rather than DenseMap for MinBWs.Charlie Turner2015-11-261-3/+3
| | | | | | | | | | | | | | | | | The order in which instructions are truncated in truncateToMinimalBitwidths effects code generation. Switch to a map with a determinisic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting. Many thanks to David Blaikie for making me aware of MapVector! Fixes PR25490. Differential Revision: http://reviews.llvm.org/D14981 llvm-svn: 254179
* Disallow aliases to available_externally.Rafael Espindola2015-11-261-37/+0
| | | | | | | | | | | | They are as much trouble as aliases to declarations. They are requiring the code generator to define a symbol with the same value as another symbol, but the second symbol is undefined. If representing this is important for some optimization, we could add support for available_externally aliases. They would be *required* to point to a declaration (or available_externally definition). llvm-svn: 254170
* [SimplifyLibCalls] Don't depend on a called function having a name, it might ↵Benjamin Kramer2015-11-261-11/+8
| | | | | | | | be an indirect call. Fixes the crasher in PR25651 and related crashers using the same pattern. llvm-svn: 254145
* [safestack] Fix alignment of dynamic allocas.Evgeniy Stepanov2015-11-251-1/+1
| | | | | | Fixes PR25588. llvm-svn: 254109
* [SCCP] More informative message if we don't know how to handle a terminator.Davide Italiano2015-11-251-1/+1
| | | | llvm-svn: 254093
* [OperandBundles] Extract duplicated code into a helper function, NFCSanjoy Das2015-11-252-10/+2
| | | | llvm-svn: 254047
* [InstCombine] Don't drop operand bundlesSanjoy Das2015-11-251-3/+10
| | | | | | | | | | Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D14857 llvm-svn: 254046
* [PGO] Revert revision r254021,r254028,r254035Rong Xu2015-11-246-933/+2
| | | | | | Revert the above revision due to multiple issues. llvm-svn: 254040
* [ThinLTO] Add option to limit importing based on instruction countTeresa Johnson2015-11-241-0/+12
| | | | | | | | Add a simple initial heuristic to control importing based on the number of instructions recorded in the function's summary. Add option to control the limit, and test using option. llvm-svn: 254036
* SamplePGO - Add test for hot/cold inlined functions.Diego Novillo2015-11-241-17/+47
| | | | | | | | | | | | | | | | | | | | | | | When the original binary is executed and sampled, the resulting profile contains information on the original inline stack. We currently follow the original inline plan if we notice that the inlined callsite has more than 0 samples to it. A better way is to determine whether the callsite is actually worth inlining. If the callsite accumulates a small fraction of the samples spent in the parent function, then we don't want to bother inlining it (as it means that the callsite is actually cold). This patch introduces a threshold expressed in percentage of samples in relation to the parent function. If the callsite uses less than N% of the total samples used by its parent, the original inline decision is not re-applied. I've set the threshold to the very arbitrary value of 5%. I'm yet to do any actual experiments to see what's a good value. I wanted to separate the basic mechanism from the tuning. llvm-svn: 254034
* [PGO] Fix build errors in x86_64-darwinRong Xu2015-11-242-3/+3
| | | | | | Fix buildbot failure for x86_64-darwin due to r254021 llvm-svn: 254028
* [PGO] MST based PGO instrumentation infrastructureRong Xu2015-11-246-2/+933
| | | | | | | | | | | | This patch implements a minimum spanning tree (MST) based instrumentation for PGO. The use of MST guarantees minimum number of CFG edges getting instrumented. An addition optimization is to instrument the less executed edges to further reduce the instrumentation overhead. The patch contains both the instrumentation and the use of the profile to set the branch weights. Differential Revision: http://reviews.llvm.org/D12781 llvm-svn: 254021
* [ThinLTO] Refactor function body scan during importing into helper (NFC)Teresa Johnson2015-11-241-36/+27
| | | | llvm-svn: 254020
* [ThinLTO] Enable iterative importing in FunctionImport passTeresa Johnson2015-11-241-2/+36
| | | | | | | | | | | Analyze imported function bodies and add any new external calls to the worklist for importing. Currently no controls on the importing so this will end up importing everything possible in the call tree below the importing module. Basic profitability checks coming next. Update test to check for iteratively inlined functions. llvm-svn: 254011
OpenPOWER on IntegriCloud