A series of patches beginning with https://reviews.llvm.org/D71898
propose to add an implementation of the coroutine passes to the new pass
manager. As part of these changes, the coroutine passes that implement
the legacy pass manager interface are renamed to `<PassName>Legacy`.
This mirrors similar changes that have been made to many other passes in
LLVM as they've been transitioned to support both old and new pass
managers.
This commit splits out the renaming portion of that patch and commits it
in advance as an NFC (no functional change intended) commit. It renames:
* `CoroEarly` => `CoroEarlyLegacy`
* `CoroSplit` => `CoroSplitLegacy`
* `CoroElide` => `CoroElideLegacy`
* `CoroCleanup` => `CoroCleanupLegacy`
Force `-Werror=strict-prototypes` so that C API tests fail to compile if
we add a non-prototype declaration. This should help avoid regressions
like bddecba4b333f7772029b4937d2c34f9f2fda6ca was fixing.
https://reviews.llvm.org/D70285
rdar://problem/57203137
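
For illustration, the distinction the warning enforces; a minimal sketch, not taken from the patch:

  void f();       /* a declaration, but NOT a prototype: parameters unspecified */
  void g(void);   /* a prototype: takes no arguments */
  void h(int x);  /* a prototype: takes exactly one int */

With -Werror=strict-prototypes, the first form becomes a hard error, so a test
can no longer declare a C API function while silently dropping its parameter list.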
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and the fallbacks in SDAG/GlobalISel and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374784
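
As a hedged sketch (not from the patch), this is the -O0 pattern the lowering makes reliable:

  /* __builtin_constant_p(x) lowers to llvm.is.constant. For a non-constant
     x this pass folds it to false and prunes the dead branch, so codegen
     never sees the "i" (immediate) constraint with a non-constant operand,
     even at -O0. */
  static inline void emit(int x) {
    if (__builtin_constant_p(x))
      __asm__ volatile("# imm: %0" : : "i"(x));
    else
      __asm__ volatile("# reg: %0" : : "r"(x));
  }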
This reverts commit r374743. It broke the build with Ocaml enabled:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218
llvm-svn: 374768
Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.
The new pass replaces the existing lowering in the codegen-prepare pass
and the fallbacks in SDAG/GlobalISel and FastISel. The latter now assert
on the intrinsics.
Differential Revision: https://reviews.llvm.org/D65280
llvm-svn: 374743
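
A hedged sketch of the objectsize half (illustrative only):

  #include <stddef.h>

  /* __builtin_object_size becomes llvm.objectsize; when nothing earlier
     folded it, this pass lowers it to the conservative answer, which for
     type 0 and an unknown object is (size_t)-1. */
  size_t room(char *p) {
    return __builtin_object_size(p, 0);
  }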
The MergeFunctions and DCE passes are missing from the OCaml/C API. This
patch adds them.
Differential Revision: https://reviews.llvm.org/D65071
Reviewers: whitequark, hiraditya, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Tags: #llvm
Authored by: kren1
llvm-svn: 373170
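
A minimal usage sketch; the binding names follow the usual LLVMAdd<Pass>Pass
convention, and their header placement is an assumption, not quoted from the patch:

  #include <llvm-c/Core.h>
  #include <llvm-c/Transforms/IPO.h>     /* assumed home of LLVMAddMergeFunctionsPass */
  #include <llvm-c/Transforms/Scalar.h>  /* assumed home of LLVMAddDCEPass */

  void addNewBindings(LLVMPassManagerRef PM) {
    LLVMAddMergeFunctionsPass(PM);
    LLVMAddDCEPass(PM);
  }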
Summary: Adds a binding to the internalize pass that allows the caller to pass a function pointer that acts as the visibility-preservation predicate. Previously, one could only pass an unsigned value (not LLVMBool?) that directed the pass to consider "main" or not.
Reviewers: whitequark, deadalnix, harlanhaskins
Reviewed By: whitequark, harlanhaskins
Subscribers: kren1, hiraditya, llvm-commits, harlanhaskins
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62456
llvm-svn: 366777
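
A hedged sketch of the predicate form; the callback shape matches the
description above, and keepMain is a hypothetical helper:

  #include <string.h>
  #include <llvm-c/Core.h>
  #include <llvm-c/Transforms/IPO.h>

  /* Return true for globals whose visibility must be preserved. */
  static LLVMBool keepMain(LLVMValueRef global, void *ctx) {
    (void)ctx;
    return strcmp(LLVMGetValueName(global), "main") == 0;
  }

  void internalizeAllButMain(LLVMPassManagerRef PM) {
    LLVMAddInternalizePassWithMustPreservePredicate(PM, NULL, keepMain);
  }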
Summary: Add bindings to create a wrapped "Add Discriminators" pass. Now that we have debug info support, this is a handy transform to have.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: dblaikie, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58624
llvm-svn: 356272
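
Usage is a one-liner; a sketch, assuming the wrapper lives with the other
utility-pass bindings:

  #include <llvm-c/Transforms/Utils.h>

  void addDiscriminators(LLVMPassManagerRef PM) {
    LLVMAddAddDiscriminatorsPass(PM);  /* the doubled "Add" is the naming convention */
  }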
Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This reverts commit c4baf7c2f06ff5459c4f5998ce980346e72bff97.
Broke the bots, and should really be in Transforms/Coroutines
instead.
llvm-svn: 343337
Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: mehdi_amini, modocache, llvm-commits
Differential Revision: https://reviews.llvm.org/D51642
llvm-svn: 343336
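
A hedged usage sketch next to the existing PassManagerBuilder C API (the C
binding name is assumed from the C++ function it wraps):

  #include <llvm-c/Transforms/Coroutines.h>
  #include <llvm-c/Transforms/PassManagerBuilder.h>

  void buildWithCoroutines(LLVMPassManagerBuilderRef PMB, LLVMPassManagerRef MPM) {
    LLVMAddCoroutinePassesToExtensionPoints(PMB);
    LLVMPassManagerBuilderPopulateModulePassManager(PMB, MPM);
  }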
Summary:
Adds LLVMAddUnifyFunctionExitNodesPass to expose
createUnifyFunctionExitNodesPass to the C and OCaml APIs.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52212
llvm-svn: 342476
Summary:
Adds LLVMAddLowerAtomicPass to expose createLowerAtomicPass in the C
and OCaml APIs.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D52211
llvm-svn: 342475
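
This binding and LLVMAddUnifyFunctionExitNodesPass above follow the same
one-call pattern; a minimal sketch, with header placement assumed:

  #include <llvm-c/Transforms/Scalar.h>  /* LLVMAddLowerAtomicPass */
  #include <llvm-c/Transforms/Utils.h>   /* LLVMAddUnifyFunctionExitNodesPass */

  void addBoth(LLVMPassManagerRef PM) {
    LLVMAddLowerAtomicPass(PM);
    LLVMAddUnifyFunctionExitNodesPass(PM);
  }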
llvm-svn: 341461
Differential Revision: https://reviews.llvm.org/D50950
llvm-svn: 340147
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder Loop
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 336062
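
A concrete C sketch of the same shape (illustrative; rowsum is a made-up
example, n is assumed even, and the remainder loop is omitted):

  void rowsum(int n, int m, int a[n][m], int s[n], int out[n]) {
    for (int i = 0; i < n; i++) {
      s[i] = 0;                        /* ForeBlocks(i) */
      for (int j = 0; j < m; j++)
        s[i] += a[i][j];               /* SubLoopBlocks(i, j) */
      out[i] = s[i];                   /* AftBlocks(i) */
    }
  }

  /* After unroll-and-jam by 2: */
  void rowsum_uaj(int n, int m, int a[n][m], int s[n], int out[n]) {
    for (int i = 0; i < n; i += 2) {
      s[i] = 0;
      s[i + 1] = 0;
      for (int j = 0; j < m; j++) {    /* one jammed inner loop: both rows */
        s[i]     += a[i][j];           /* share the j iteration and its    */
        s[i + 1] += a[i + 1][j];       /* address arithmetic               */
      }
      out[i] = s[i];
      out[i + 1] = s[i + 1];
    }
  }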
Summary: It was fully replaced back in 2014, and the implementation was removed 11 months ago by r306797.
Reviewers: hfinkel, chandlerc, whitequark, deadalnix
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47436
llvm-svn: 333378
I'm guessing the tests rely on the ARM backend being built.
llvm-svn: 333359
This is a simple implementation of the unroll-and-jam classical loop
optimisation.
The basic idea is that we take an outer loop of the form:
  for i..
    ForeBlocks(i)
    for j..
      SubLoopBlocks(i, j)
    AftBlocks(i)
Instead of doing normal inner or outer unrolling, we unroll as follows:
  for i... i+=2
    ForeBlocks(i)
    ForeBlocks(i+1)
    for j..
      SubLoopBlocks(i, j)
      SubLoopBlocks(i+1, j)
    AftBlocks(i)
    AftBlocks(i+1)
  Remainder
So we have unrolled the outer loop, then jammed the two inner loops into
one. This can lead to a simpler inner loop if memory accesses can be shared
between the now-jammed loops.
To do this we have to prove that this is all safe, both for the memory
accesses (using dependence analysis) and that ForeBlocks(i+1) can move before
AftBlocks(i) and SubLoopBlocks(i, j).
Differential Revision: https://reviews.llvm.org/D41953
llvm-svn: 333358
(notionally Scalar.h is part of libLLVMScalarOpts, so it shouldn't be
included by InstCombine which doesn't/shouldn't need to depend on
ScalarOpts)
llvm-svn: 330669
I just tried to copy what was done for regular InstCombine. Hopefully I didn't miss anything.
llvm-svn: 330668
To fix layering (so that Scalar.h, a libScalarOpts header, isn't
included from Utils - which libScalarOpts depends on).
llvm-svn: 328839
This is no-functional-change-intended.
This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired).
The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.
Differential Revision: https://reviews.llvm.org/D38631
llvm-svn: 316835
This patch adds a new pass for attaching !callees metadata to indirect call
sites. The pass propagates values to call sites by performing an IPSCCP-like
analysis using the generic sparse propagation solver. For indirect call sites
having a small set of possible callees, the attached metadata indicates what
those callees are. The metadata can be used to facilitate optimizations like
intersecting the function attributes of the possible callees, refining the call
graph, performing indirect call promotion, etc.
Differential Revision: https://reviews.llvm.org/D37355
llvm-svn: 316576
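
A C sketch of a call site where the solver can prove a small callee set
(illustrative names):

  typedef int (*handler_t)(int);

  static int inc(int x) { return x + 1; }
  static int dec(int x) { return x - 1; }

  int run(int up, int v) {
    handler_t h = up ? inc : dec;
    /* This indirect call has exactly two possible callees, so it can be
       annotated with !callees metadata naming inc and dec. */
    return h(v);
  }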
It served us well, helped kick-start much of the vectorization efforts
in LLVM, etc. Its time has come and passed. Back in 2014:
http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html
Time to actually let go and move forward. =]
I've updated the release notes both about the removal and the
deprecation of the corresponding C API.
llvm-svn: 306797
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results in a 2.6%
smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
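
A C sketch of why deferral helps: with the switch still intact after inlining,
constant folding kills it outright, whereas a lookup-table load is much harder
for later passes to see through:

  static int classify(int x) {
    switch (x) {
    case 0: return 10;
    case 1: return 42;
    case 2: return 7;
    default: return 0;
    }
  }

  int caller(void) {
    return classify(1);  /* folds to 42 while classify is still a switch */
  }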
The code has been developed by Daniel Berlin over the years, and
the new implementation goal is that of addressing shortcomings of
the current GVN infrastructure, i.e. long compile time for large
testcases, lack of phi predication, no load/store value numbering
etc...
The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The tests currently living in test/Transforms/NewGVN are a copy
of the ones in GVN, with proper `XFAIL` (missing features in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.
Differential Revision: https://reviews.llvm.org/D26224
llvm-svn: 290346
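
For context, a C sketch of the kind of redundancy value numbering removes
(not from the patch):

  int f(int a, int b, int c) {
    int x, y;
    if (c) {
      x = a + b;
      y = a + b;   /* same value number as x: the second add is redundant */
    } else {
      x = a - b;
      y = a - b;   /* likewise */
    }
    return x - y;  /* value numbering can prove this is 0 on both paths */
  }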
Previous change broke the C API for creating an EarlyCSE pass w/
MemorySSA by adding a bool parameter to control whether MemorySSA was
used or not. This broke the OCaml bindings. Instead, change the old C
API entry point back and add a new one to request an EarlyCSE pass with
MemorySSA.
llvm-svn: 280379
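
A sketch of the resulting pair of entry points, assuming the new one is named
LLVMAddEarlyCSEMemSSAPass after the convention (the message above does not name it):

  #include <llvm-c/Transforms/Scalar.h>

  void addCSE(LLVMPassManagerRef PM) {
    LLVMAddEarlyCSEPass(PM);        /* original entry point, signature restored */
    LLVMAddEarlyCSEMemSSAPass(PM);  /* new: EarlyCSE backed by MemorySSA */
  }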
Summary:
Use MemorySSA, if requested, to do less conservative memory dependency
checking.
This change doesn't enable the MemorySSA enhanced EarlyCSE in the
default pipelines, so should be NFC.
Reviewers: dberlin, sanjoy, reames, majnemer
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19821
llvm-svn: 280279
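
A C sketch of a dependency the MemorySSA-backed walk can see through, where
purely local checking gives up at the intervening store:

  int f(int *restrict p, int *restrict q) {
    *p = 1;
    *q = 2;      /* MemorySSA shows this store cannot clobber *p */
    return *p;   /* so the load can be forwarded from the store of 1 */
  }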
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM. LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.
Differential Revision: http://reviews.llvm.org/D21316
llvm-svn: 272737
Type specific declarations have been moved to Type.h and error handling
routines have been moved to ErrorHandling.h. Both are included in Core.h
so nothing should change for projects directly including the headers,
but transitive dependencies may be affected.
llvm-svn: 255965
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and remove
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
  int __attribute__((const)) foo(int i);

  int bar(int x) {
    x |= (4 & foo(5));
    x |= (8 & foo(3));
    x |= (16 & foo(2));
    x |= (32 & foo(1));
    x |= (64 & foo(0));
    x |= (128 & foo(4));
    return x >> 4;
  }
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
llvm-svn: 217630
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly this
kind of 'misalignment' offset, for which this logic is necessary.
The primary motivation is to fix up alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could). Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).
llvm-svn: 217344
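
A hedged C example of the 'misalignment' offset case that needs the algebra:

  /* Here p == 32*k + 8: 32-byte aligned except for an 8-byte offset.
     Simple value tracking cannot prove &a[3] is 32-byte aligned, but the
     SCEV algebra can: a + 3*sizeof(double) == 32*k + 32. */
  double third(double *p) {
    double *a = __builtin_assume_aligned(p, 32, 8);
    return a[3];
  }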
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
1. To preserve noalias function attribute information when inlining
2. To provide the ability to model block-scope C99 restrict pointers
Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.
What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:
!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }
Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:
... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }
When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.
Note that if the first element of the scope metadata is a string, then it can
be combined across functions and translation units. The string can be replaced
by a self-reference to create globally unique scope identifiers.
[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]
Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functions have the metadata imply that a1 does not alias with b2.
llvm-svn: 213864
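
For motivation (2), a C sketch of the block-scope restrict case this
infrastructure is meant to eventually model:

  void saxpy(int n, float a, const float *restrict x, float *restrict y) {
    for (int i = 0; i < n; i++) {
      /* restrict promises x and y do not overlap; scoped noalias metadata
         is how that promise can survive inlining into a caller. */
      y[i] = a * x[i] + y[i];
    }
  }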
Merges equivalent loads on both sides of a hammock/diamond
and hoists them into the header.
Merges equivalent stores on both sides of a hammock/diamond
and sinks them to the footer.
This can enable if-conversion and better tolerate load misses
and store operand latencies.
llvm-svn: 213396
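
A C-level sketch of the hammock/diamond shape:

  int pick(int c, const int *p) {
    int r;
    if (c)
      r = *p + 1;  /* the equivalent loads of *p on both sides... */
    else
      r = *p - 1;  /* ...can be merged and hoisted into the header */
    return r;
  }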
file that it doesn't use.
llvm-svn: 205755
llvm-svn: 195471
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:
  for (int i = 0; i < 3200; i += 5) {
    a[i]     += alpha * b[i];
    a[i + 1] += alpha * b[i + 1];
    a[i + 2] += alpha * b[i + 2];
    a[i + 3] += alpha * b[i + 3];
    a[i + 4] += alpha * b[i + 4];
  }
and turn them into this:
  for (int i = 0; i < 3200; ++i) {
    a[i] += alpha * b[i];
  }
and loops like this:
  for (int i = 0; i < 500; ++i) {
    x[3*i]   = foo(0);
    x[3*i+1] = foo(0);
    x[3*i+2] = foo(0);
  }
and turn them into this:
  for (int i = 0; i < 1500; ++i) {
    x[i] = foo(0);
  }
There are two motivations for this transformation:
1. Code-size reduction (especially relevant, obviously, when compiling for
code size).
2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.
The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.
This pass is not currently enabled by default at any optimization level.
llvm-svn: 194939
Turn MipsOptimizeMathLibCalls into a target-independent scalar transform
...so that it can be used for z too. Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.
The pass is opt-in because at the moment it only handles sqrt.
llvm-svn: 189097
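
The shape of the transform, shown at the C level (a sketch; the pass does
this on IR):

  #include <math.h>

  double fast_sqrt(double x) {
    double r = __builtin_sqrt(x);  /* emitted as the hardware instruction */
    if (r != r)                    /* NaN, i.e. x was negative */
      return sqrt(x);              /* slow path: call libm so errno is set */
    return r;
  }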
or the C++ files themselves. This enables people to use
just a C compiler to interoperate with LLVM.
llvm-svn: 180063
expose it in the header file.
llvm-svn: 179272
llvm-svn: 178769
Filip Pizlo.
llvm-svn: 178713
llvm-svn: 176793
llvm-svn: 172025
functions static.
llvm-svn: 166376
This gives a lot of love to the docs for the C API. Like Clang's
documentation, the C API is now organized into a Doxygen "module"
(LLVMC). Each C header file is a child of the main module. Some modules
(like Core) have a hierarchy of their own. The produced documentation is
thus better organized (before everything was in one monolithic list).
This patch also includes a lot of new documentation for APIs in Core.h.
It doesn't document them all, but is better than none. Function docs are
missing @param and @return annotations, but the documentation body now
commonly provides help details (like the expected llvm::Value sub-type
to expect).
llvm-svn: 153157
llvm-svn: 149472
This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure.
Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser).
llvm-svn: 149468