| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than using globals use a structure to pass parameters from the
vectorizer. This prepares the class to be moved outside the LoopVectorizer.
It's not great how all this is passed through in LoopAccessAnalysis but this
is all expected to change once the class start servicing the Loop Distribution
pass as well where some of these parameters make no sense.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227754
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move the canVectorizeMemory functionality from LoopVectorizationLegality to a
new class LoopAccessAnalysis and forward users.
Currently the collection of the symbolic stride information is kept with
LoopVectorizationLegality and it becomes an input to LoopAccessAnalysis.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227751
|
| |
|
|
|
|
|
|
|
|
|
| |
These members are moving to LoopAccessAnalysis. The accessors help to hide
this.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227750
|
| |
|
|
|
|
|
|
|
|
|
| |
This class will become public in the new LoopAccessAnalysis header so the name
needs to be more global.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227749
|
| |
|
|
|
|
|
|
|
|
|
| |
The logic in emitAnalysis is duplicated across multiple functions. This
splits it into a function. Another use will be added by the patchset.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227748
|
| |
|
|
|
|
|
|
|
|
|
|
| |
RuntimePointerCheck will be used through LoopAccessAnalysis in
LoopVectorizationLegality. Later in the patchset it will become a local class
of LoopAccessAnalysis.
NFC. This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.
llvm-svn: 227747
|
| |
|
|
|
|
| |
per-function and supports the exact desired interface.
llvm-svn: 227743
|
| |
|
|
|
|
| |
Brings it in line with the other hashes in EarlyCSE.
llvm-svn: 227733
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
getTTI method used to get an actual TTI object.
No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.
The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.
llvm-svn: 227730
|
| |
|
|
|
|
|
|
| |
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.
llvm-svn: 227726
|
| |
|
|
|
|
|
|
| |
I've added RUN lines both to the basic test for EarlyCSE and the
target-specific test, as this serves as a nice test that the TTI layer
in the new pass manager is in fact working well.
llvm-svn: 227725
|
| |
|
|
| |
llvm-svn: 227717
|
| |
|
|
| |
llvm-svn: 227705
|
| |
|
|
|
|
| |
SeparateConstOffsetFromGEP does not change the shape of the control flow graph.
llvm-svn: 227704
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA
driver from unrolling a loop marked with llvm.loop.unroll.disable is not
unrolled by CUDA driver, we need to emit .pragma "nounroll" at the
header of that loop.
This patch also extracts getting unroll metadata from loop ID metadata
into a shared helper function.
Test Plan: test/CodeGen/NVPTX/nounroll.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D7041
llvm-svn: 227703
|
| |
|
|
|
|
|
|
| |
aggregate or scalar, the debug info needs to refer to the absolute offset
(relative to the entire variable) instead of storing the offset inside
the smaller aggregate.
llvm-svn: 227702
|
| |
|
|
| |
llvm-svn: 227684
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
type erased interface and a single analysis pass rather than an
extremely complex analysis group.
The end result is that the TTI analysis can contain a type erased
implementation that supports the polymorphic TTI interface. We can build
one from a target-specific implementation or from a dummy one in the IR.
I've also factored all of the code into "mix-in"-able base classes,
including CRTP base classes to facilitate calling back up to the most
specialized form when delegating horizontally across the surface. These
aren't as clean as I would like and I'm planning to work on cleaning
some of this up, but I wanted to start by putting into the right form.
There are a number of reasons for this change, and this particular
design. The first and foremost reason is that an analysis group is
complete overkill, and the chaining delegation strategy was so opaque,
confusing, and high overhead that TTI was suffering greatly for it.
Several of the TTI functions had failed to be implemented in all places
because of the chaining-based delegation making there be no checking of
this. A few other functions were implemented with incorrect delegation.
The message to me was very clear working on this -- the delegation and
analysis group structure was too confusing to be useful here.
The other reason of course is that this is *much* more natural fit for
the new pass manager. This will lay the ground work for a type-erased
per-function info object that can look up the correct subtarget and even
cache it.
Yet another benefit is that this will significantly simplify the
interaction of the pass managers and the TargetMachine. See the future
work below.
The downside of this change is that it is very, very verbose. I'm going
to work to improve that, but it is somewhat an implementation necessity
in C++ to do type erasure. =/ I discussed this design really extensively
with Eric and Hal prior to going down this path, and afterward showed
them the result. No one was really thrilled with it, but there doesn't
seem to be a substantially better alternative. Using a base class and
virtual method dispatch would make the code much shorter, but as
discussed in the update to the programmer's manual and elsewhere,
a polymorphic interface feels like the more principled approach even if
this is perhaps the least compelling example of it. ;]
Ultimately, there is still a lot more to be done here, but this was the
huge chunk that I couldn't really split things out of because this was
the interface change to TTI. I've tried to minimize all the other parts
of this. The follow up work should include at least:
1) Improving the TargetMachine interface by having it directly return
a TTI object. Because we have a non-pass object with value semantics
and an internal type erasure mechanism, we can narrow the interface
of the TargetMachine to *just* do what we need: build and return
a TTI object that we can then insert into the pass pipeline.
2) Make the TTI object be fully specialized for a particular function.
This will include splitting off a minimal form of it which is
sufficient for the inliner and the old pass manager.
3) Add a new pass manager analysis which produces TTI objects from the
target machine for each function. This may actually be done as part
of #2 in order to use the new analysis to implement #2.
4) Work on narrowing the API between TTI and the targets so that it is
easier to understand and less verbose to type erase.
5) Work on narrowing the API between TTI and its clients so that it is
easier to understand and less verbose to forward.
6) Try to improve the CRTP-based delegation. I feel like this code is
just a bit messy and exacerbating the complexity of implementing
the TTI in each target.
Many thanks to Eric and Hal for their help here. I ended up blocked on
this somewhat more abruptly than I expected, and so I appreciate getting
it sorted out very quickly.
Differential Revision: http://reviews.llvm.org/D7293
llvm-svn: 227669
|
| |
|
|
| |
llvm-svn: 227614
|
| |
|
|
| |
llvm-svn: 227605
|
| |
|
|
|
|
|
| |
instruction and generalize it to optionally dereference the variable.
Follow-up to r227544.
llvm-svn: 227604
|
| |
|
|
|
|
|
|
|
|
|
|
| |
analyses back into the LTO code generator.
The pass manager builder (and the transforms library in general)
shouldn't be referencing the target machine at all.
This makes the LTO population work like the others -- the data layout
and target transform info need to be pre-populated.
llvm-svn: 227576
|
| |
|
|
|
|
| |
covering switch.
llvm-svn: 227575
|
| |
|
|
|
|
|
|
|
|
| |
Previously, only -1 and +1 step values are supported for induction variables. This patch extends LV to support
arbitrary constant steps.
Initial patch by Alexey Volkov. Some bug fixes are added in the following version.
Differential Revision: http://reviews.llvm.org/D6051 and http://reviews.llvm.org/D7193
llvm-svn: 227557
|
| |
|
|
|
|
| |
so we need to move the dbg.declare intrinsics that describe them, too.
llvm-svn: 227544
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The validation algorithm used an incremental approach, building each
iteration's data structures temporarily, validating them, then
adding them to a global set.
This does not scale well to having multiple sets of Root nodes, as the
set of instructions used in each iteration is the union over all
the root nodes. Therefore, refactor the logic to create a single, simple
container to which later logic then refers. This makes it simpler
control-flow wise to make the creation of the container more complex with
the addition of multiple root sets.
llvm-svn: 227499
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities
to allow some simple value range optimizations. But that introduced a bug
when comparing to -0.0 or 0.0: these compare equal even though they are not
bitwise identical.
This patch disallows propagating zero constants in equality comparisons.
Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376
Differential Revision: http://reviews.llvm.org/D7257
llvm-svn: 227491
|
| |
|
|
|
|
|
|
|
| |
reroll() was slightly monolithic and a pain to modify. Refactor
a bunch of its state from local variables to member variables
of a helper class, and do some trivial simplification while we're
there.
llvm-svn: 227439
|
| |
|
|
|
|
|
|
|
|
| |
Patch by: Igor Laevsky <igor@azulsystems.com>
"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."
Differential Revision: http://reviews.llvm.org/D7157
llvm-svn: 227390
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
abomination.
For starters, this API is incredibly slow. In order to lookup the name
of a pass it must take a memory fence to acquire a pointer to the
managed static pass registry, and then potentially acquire locks while
it consults this registry for information about what passes exist by
that name. This stops the world of LLVMs in your process no matter
how little they cared about the result.
To make this more joyful, you'll note that we are preserving many passes
which *do not exist* any more, or are not even analyses which one might
wish to have be preserved. This means we do all the work only to say
"nope" with no error to the user.
String-based APIs are a *bad idea*. String-based APIs that cannot
produce any meaningful error are an even worse idea. =/
I have a patch that simply removes this API completely, but I'm hesitant
to commit it as I don't really want to perniciously break out-of-tree
users of the old pass manager. I'd rather they just have to migrate to
the new one at some point. If others disagree and would like me to kill
it with fire, just say the word. =]
llvm-svn: 227294
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7214
llvm-svn: 227284
|
| |
|
|
|
|
|
|
|
| |
COMDATs must be identically named to the symbol. When support for COMDATs was
introduced, the symbol rewriter was not updated, resulting in rewriting failing
for symbols which were placed into COMDATs. This corrects the behaviour and
adds test cases for this.
llvm-svn: 227261
|
| |
|
|
|
|
|
| |
The rewrite for the pattern based rewrite is unnecessary if the existing name
matches the pattern.
llvm-svn: 227260
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was introduced in a faulty refactoring (r225640, mea culpa):
the tests weren't testing the return values, so, for both
__strcpy_chk and __stpcpy_chk, we would return the end of the
buffer (matching stpcpy) instead of the beginning (for strcpy).
The root cause was the prefix "__" being ignored when comparing,
which made us always pick LibFunc::stpcpy_chk.
Pass the LibFunc::Func directly to avoid this kind of error.
Also, make the testcases as explicit as possible to prevent this.
The now-useful testcases expose another, entangled, stpcpy problem,
with the further simplification. This was introduced in a
refactoring (r225640) to match the original behavior.
However, this leads to problems when successive simplifications
generate several similar instructions, none of which are removed
by the custom replaceAllUsesWith.
For instance, InstCombine (the main user) doesn't erase the
instruction in its custom RAUW. When trying to simplify say
__stpcpy_chk:
- first, an stpcpy is created (fortified simplifier),
- second, a memcpy is created (normal simplifier), but the
stpcpy call isn't removed.
- third, InstCombine later revisits the instructions,
and simplifies the first stpcpy to a memcpy. We now have
two memcpys.
llvm-svn: 227250
|
| |
|
|
|
|
|
|
|
|
|
| |
Splitting a loop to make range checks redundant is profitable only if
the range check "never" fails. Make this fact a part of recognizing a
range check -- a branch is a range check only if it is expected to
pass (via branch_weights metadata).
Differential Revision: http://reviews.llvm.org/D7192
llvm-svn: 227249
|
| |
|
|
|
|
|
|
|
|
|
| |
If a memory access is unaligned, emit __tsan_unaligned_read/write
callbacks instead of __tsan_read/write.
Required to change semantics of __tsan_unaligned_read/write to not do the user memory.
But since they were unused (other than through __sanitizer_unaligned_load/store) this is fine.
Fixes long standing issue 17:
https://code.google.com/p/thread-sanitizer/issues/detail?id=17
llvm-svn: 227231
|
| |
|
|
|
|
|
|
|
|
|
| |
'is_zero_undef' flag.
This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed by
a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared.
Added test InstCombine/select-cmp-cttz-ctlz.ll.
llvm-svn: 227197
|
| |
|
|
|
|
|
| |
Sanitizer coverage constructor must run after asan constructor (for each DSO).
Bump constructor priority to guarantee that.
llvm-svn: 227195
|
| |
|
|
|
|
| |
getSubtarget.
llvm-svn: 227172
|
| |
|
|
|
|
|
|
|
|
| |
LoopRotate wanted to avoid live range interference by looking at the
uses of a Value in the loop latch and seeing if any lied outside of the
loop. We would wrongly perform this operation on Constants.
This fixes PR22337.
llvm-svn: 227171
|
| |
|
|
| |
llvm-svn: 227170
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
object that manages a single run of this pass.
This was already essentially how it worked. Within the run function, it
would point members at *stack local* allocations that were only live for
a single run. Instead, it seems much cleaner to have a utility object
whose lifetime is clearly bounded by the run of the pass over the
function and can use member variables in a more direct way.
This also makes it easy to plumb the analyses used into it from the pass
and will make it re-usable with the new pass manager.
No functionality changed here, its just a refactoring.
llvm-svn: 227162
|
| |
|
|
|
|
|
| |
Phabricator revision: http://reviews.llvm.org/D7121
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
llvm-svn: 227149
|
| |
|
|
|
|
|
|
|
|
|
| |
unreachable
The range check would get optimized away later, but we might as well not emit
them in the first place.
http://reviews.llvm.org/D6471
llvm-svn: 227126
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An unreachable default destination can be exploited by other optimizations and
allows for more efficient lowering. Both the SDag switch lowering and
LowerSwitch can exploit unreachable defaults.
Also make TurnSwitchRangeICmp handle switches with unreachable default.
This is kind of separate change, but it cannot be tested without the change
above, and I don't want to land the change above without this since that would
regress other tests.
Differential Revision: http://reviews.llvm.org/D6471
llvm-svn: 227125
|
| |
|
|
|
|
|
|
| |
Tested by Transforms/SimplifyCFG/switch-to-br.ll's @unreachable function.
Differential Revision: http://reviews.llvm.org/D6471
llvm-svn: 227124
|
| |
|
|
|
|
| |
This fixes PR22306.
llvm-svn: 227077
|
| |
|
|
| |
llvm-svn: 227001
|
| |
|
|
|
|
|
| |
when refactoring for the new pass manager without introducing too many
formatting changes into meaning full diffs.
llvm-svn: 227000
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This just lifts the logic into a static helper function, sinks the
legacy pass to be a trivial wrapper of that helper fuction, and adds
a trivial wrapper for the new PM as well. Not much to see here.
I switched a test case to run in both modes, but we have to strip the
dead prototypes separately as that pass isn't in the new pass manager
(yet).
llvm-svn: 226999
|