| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
| |
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.
llvm-svn: 220341
|
| |
|
|
|
|
|
|
|
| |
This function was complicated by the fact that it tried to perform
canonicalizations that were already preformed by InstSimplify. Remove
this extra code and move the tests over to InstSimplify. Add asserts to
make sure our preconditions hold before we make any assumptions.
llvm-svn: 220314
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.
To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.
These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.
I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.
llvm-svn: 220277
|
| |
|
|
|
|
|
|
|
| |
The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns. Long term, it would be nice to combine these into a single construct. The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull.
Reviewed by: Hal Finkel
Differential Revision: http://reviews.llvm.org/D5220
llvm-svn: 220240
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original code had an implicit assumption that if the test for
allocas or globals was reached, the two pointers were not equal. With my
changes to make the pointer analysis more powerful here, I also had to
guard against circumstances where the results weren't useful. That in
turn violated the assumption and gave rise to a circumstance in which we
could have a store with both the queried pointer and stored pointer
rooted at *the same* alloca. Clearly, we cannot ignore such a store.
There are other things we might do in this code to better handle the
case of both pointers ending up at the same alloca or global, but it
seems best to at least make the test explicit in what it intends to
check.
I've added tests for both the alloca and global case here.
llvm-svn: 220190
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
r220178. First, the creation routine doesn't insert prior to the
terminator of the basic block provided, but really at the end of the
basic block. Instead, get the terminator and insert before that. The
next issue was that we need to ensure multiple PHI node entries for
a single predecessor re-use the same cast instruction rather than
creating new ones.
All of the logic here was without tests previously. I've reduced and
added a test case from the test suite that crashed without both of these
fixes.
llvm-svn: 220186
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.
I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.
This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.
llvm-svn: 220178
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
of InstCombine rather than just the bits enabled when datalayout is
optional.
The primary fixes here are because now things are little endian.
In good news, silliness like this seems like it will be going away as
we've got pretty stong consensus on dropping optional datalayout
entirely.
llvm-svn: 220176
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
loads.
This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.
I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[
llvm-svn: 220165
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.
This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.
llvm-svn: 220164
|
| |
|
|
|
|
|
|
|
|
|
| |
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).
Patch by Ankur Garg!
Differential Revision: http://reviews.llvm.org/D5719
llvm-svn: 220163
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1
Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))
This handles only the equality operators for now. Other operators need
to be handled.
Patch by Ankur Garg!
llvm-svn: 220162
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
by my refactoring of this code.
The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.
This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.
llvm-svn: 220161
|
| |
|
|
|
|
| |
(...))).
llvm-svn: 220141
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
...)) and (load (cast ...)): canonicalize toward the former.
Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.
Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.
There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.
I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.
The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.
llvm-svn: 220138
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This test is pretty awesome. It is claiming to test devirtualization.
However, the code in question is not in fact devirtualized by LLVM. If
you take the original C++ test case and run it through Clang at -O3 we
fail to devirtualize it completely. It also isn't a sufficiently focused
test case.
The *reason* we fail to devirtualize it isn't because of any missing
instcombine though. Instead, it is because we fail to emit an available
externally vtable and thus the vtable is just an external and completely
opaque. If I cause the vtable to be emitted, we successfully
devirtualize things.
Anyways, I'm just removing it because it is providing negative value at
this point: it isn't representative of the output of Clang really, LLVM
isn't doing the transform it claims to be testing, LLVM's failure to do
the transform isn't actually an LLVM bug at all and we shouldn't be
testing for it here, and finally the test is written in such a way that
it will trivially pass even when the point of the test is failing.
llvm-svn: 220137
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cases where the alloca type, the load types, and the store types used
all disagree.
Previously, the only way that vector-based promotion occured was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.
The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.
When looking and loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.
With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.
llvm-svn: 220116
|
| |
|
|
|
|
|
|
|
| |
This reverts commit r219899.
This also updates byval-tail-call.ll to make it clear what was breaking.
Adding r219899 again will cause the load/store to disappear.
llvm-svn: 220093
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DSE's overlap checking contained special logic, used only when no DataLayout
was available, which inferred a complete overwrite when the pointee types were
equal. This logic seems fine for regular loads/stores, but does not work for
memcpy and friends. Instead of fixing this, I'm just removing it.
Philosophically, transformations should not contain enhanced behavior used only
when data layout is lacking (data layout should be strictly additive), and
maintaining these rarely-tested code paths seems not worthwhile at this stage.
Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case
(slightly reduced from that provided by Aliaksei) replaces the original
contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few
other tests have been updated to have a data layout.
llvm-svn: 220035
|
| |
|
|
|
|
| |
These days -std-compile-opts was just a silly alias for -O3.
llvm-svn: 219951
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5832
llvm-svn: 219950
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().
In the simplest case, this:
y = sqrt(x * x);
becomes this:
y = fabs(x);
This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.
Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.
Differential Revision: http://reviews.llvm.org/D5787
llvm-svn: 219944
|
| |
|
|
|
|
|
| |
The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.
llvm-svn: 219902
|
| |
|
|
|
|
|
|
|
| |
Make tail recursion elimination a bit more aggressive. This allows us to get
tail recursion on functions that are just branches to a different function. The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.
llvm-svn: 219899
|
| |
|
|
| |
llvm-svn: 219884
|
| |
|
|
|
|
|
| |
This change breaks the asan buildbots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13468
llvm-svn: 219878
|
| |
|
|
|
|
|
|
|
| |
For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.
llvm-svn: 219876
|
| |
|
|
|
|
|
|
|
|
|
|
| |
If x is known to have the range [a, b) in a loop predicated by (icmp
ne x, a), its range can be sharpened to [a + 1, b). Get
ScalarEvolution and hence IndVars to exploit this fact.
This change triggers an optimization to widen-loop-comp.ll, so it had
to be edited to get it to pass.
phabricator: http://reviews.llvm.org/D5639
llvm-svn: 219834
|
| |
|
|
|
|
|
|
|
| |
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.
rdar://problem/17720004
llvm-svn: 219832
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).
Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).
llvm-svn: 219816
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).
Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).
llvm-svn: 219741
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Eliminate library calls and intrinsic calls to fabs when the input
is a squared value.
Note that no unsafe-math / fast-math assumptions are needed for
this optimization.
Differential Revision: http://reviews.llvm.org/D5777
llvm-svn: 219717
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We assumed that A must be greater than B because the right hand side of
a remainder operator must be nonzero.
However, it is possible for A to be less than B if Pow2 is a power of
two greater than 1.
Take for example:
i32 %A = 0
i32 %B = 31
i32 Pow2 = 2147483648
((Pow2 << 0) >>u 31) is non-zero but A is less than B.
This fixes PR21274.
llvm-svn: 219713
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reapply r216913, a fix for PR20832 by Andrea Di Biagio. The commit was reverted
because of buildbot failures, and credit goes to Ulrich Weigand for isolating
the underlying issue (which can be confirmed by Valgrind, which does helpfully
light up like the fourth of July). Uli explained the problem with the original
patch as:
It seems the problem is calling multiplySignificand with an addend of category
fcZero; that is not expected by this routine. Note that for fcZero, the
significand parts are simply uninitialized, but the code in (or rather, called
from) multiplySignificand will unconditionally access them -- in effect using
uninitialized contents.
This version avoids using a category == fcZero addend within
multiplySignificand, which avoids this problem (the Valgrind output is also now
clean).
Original commit message:
[APFloat] Fixed a bug in method 'fusedMultiplyAdd'.
When folding a fused multiply-add builtin call, make sure that we propagate the
correct result in the case where the addend is zero, and the two other operands
are finite non-zero.
Example:
define double @test() {
%1 = call double @llvm.fma.f64(double 7.0, double 8.0, double 0.0)
ret double %1
}
Before this patch, the instruction simplifier wrongly folded the builtin call
in function @test to constant 'double 7.0'.
With this patch, method 'fusedMultiplyAdd' correctly evaluates the multiply and
propagates the expected result (i.e. 56.0).
Added test fold-builtin-fma.ll with the reproducible from PR20832 plus extra
test cases to verify the behavior of method 'fusedMultiplyAdd' in the presence
of NaN/Inf operands.
This fixes PR20832.
llvm-svn: 219708
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When LazyValueInfo uses @llvm.assume intrinsics to provide edge-value
constraints, we should check for intrinsics that dominate the edge's branch,
not just any potential context instructions. An assumption that dominates the
edge's branch represents a truth on that edge. This is specifically useful, for
example, if multiple predecessors assume a pointer to be nonnull, allowing us
to simplify a later null comparison.
The test case, and an initial patch, were provided by Philip Reames. Thanks!
llvm-svn: 219688
|
| |
|
|
|
|
|
| |
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.
llvm-svn: 219656
|
| |
|
|
|
|
|
|
|
|
|
| |
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number. This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.
This fixes PR21256.
llvm-svn: 219639
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We have a transform that changes:
(x lshr C1) udiv C2
into:
x udiv (C2 << C1)
However, it is unsafe to do so if C2 << C1 discards any of C2's bits.
This fixes PR21255.
llvm-svn: 219634
|
| |
|
|
| |
llvm-svn: 219587
|
| |
|
|
|
|
| |
<u C-1)
llvm-svn: 219585
|
| |
|
|
|
|
|
|
|
|
| |
Consider the case where X is 2. (2 <<s 31)/s-2147483648 is zero but we
would fold to X. Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.
This fixes PR21245.
llvm-svn: 219568
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
consider:
C1 = INT_MIN
C2 = -1
C1 * C2 overflows without a doubt but consider the following:
%x = i32 INT_MIN
This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.
N. B. Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.
This fixes PR21243.
llvm-svn: 219567
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
consider:
mul i32 nsw %x, -2147483648
this instruction will not result in poison if %x is 1
however, if we transform this into:
shl i32 nsw %x, 31
then we will be generating poison because we just shifted into the sign
bit.
This fixes PR21242.
llvm-svn: 219566
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LLVM Lang Ref states for signed/unsigned int to float conversions:
"If the value cannot fit in the floating point value, the results are undefined."
And for FP to signed/unsigned int:
"If the value cannot fit in ty2, the results are undefined."
This matches the C definitions.
The existing behavior pins to infinity or a max int value, but that may just
lead to more confusion as seen in:
http://llvm.org/bugs/show_bug.cgi?id=21130
Returning undef will hopefully lead to a less silent failure.
Differential Revision: http://reviews.llvm.org/D5603
llvm-svn: 219542
|
| |
|
|
|
|
|
|
|
|
|
|
| |
It also makes it more aggressive in querying range information by
adding a call to isKnownPredicateWithRanges to
isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond.
phabricator: http://reviews.llvm.org/D5638
Reviewed by: atrick, hfinkel
llvm-svn: 219532
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ScalarEvolution in the presence of multiple exits. Previously all
loops exits had to have identical counts for a loop trip count to be
considered computable. This pessimization was implemented by calling
getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock)
inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME
in the comments of that function). The pessimization was added to fix
a corner case involving undefined behavior (pr/16130). This patch more
precisely handles the undefined behavior case allowing the pessimization
to be removed.
ControlsExit replaces IsSubExpr to more precisely track the case where
undefined behavior is expected to occur. Because undefined behavior is
tracked more precisely we can remove MustExit from ExitLimit. MustExit
was used to track the case where the limit was computed potentially
assuming undefined behavior even if undefined behavior didn't necessarily
occur.
llvm-svn: 219517
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instead
We used to transform this:
define void @test6(i1 %cond, i8* %ptr) {
entry:
br i1 %cond, label %bb1, label %bb2
bb1:
br label %bb2
bb2:
%ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
store i8 2, i8* %ptr.2, align 8
ret void
}
into this:
define void @test6(i1 %cond, i8* %ptr) {
%ptr.2 = select i1 %cond, i8* null, i8* %ptr
store i8 2, i8* %ptr.2, align 8
ret void
}
because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).
The existing transformation that removes unreachable control flow in simplifycfg
is:
/// If BB has an incoming value that will always trigger undefined behavior
/// (eg. null pointer dereference), remove the branch leading here.
static bool removeUndefIntroducingPredecessor(BasicBlock *BB)
Now we generate:
define void @test6(i1 %cond, i8* %ptr) {
store i8 2, i8* %ptr.2, align 8
ret void
}
I did not see any impact on the test-suite + externals.
rdar://18596215
llvm-svn: 219462
|
| |
|
|
|
|
|
| |
Phabricator Revision: http://reviews.llvm.org/D5674
PR21205
llvm-svn: 219434
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
negative values.
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.
This fixes PR21222.
Differential Revision: http://reviews.llvm.org/D5700
llvm-svn: 219406
|
| |
|
|
|
|
|
|
|
|
| |
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well. This sort of thing is already handled by GlobalOpt/GlobalDCE.
This fixes PR21206.
llvm-svn: 219335
|