| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
order of their operands across instructions. This allows for greater CSE opportunities.
llvm-svn: 156323
|
| |
|
|
|
|
|
|
| |
case when alloca's size is calculated within the "add/sub/... nsw".
Also added fix to 2011-06-13-nsw-alloca.ll test.
llvm-svn: 156231
|
| |
|
|
|
|
|
|
| |
being done for malloc)
fix a few typos found by Chad in my previous commit
llvm-svn: 156110
|
| |
|
|
| |
llvm-svn: 156102
|
| |
|
|
|
|
|
| |
methods. Use a weak value handle to keep up with this.
PR12245
llvm-svn: 155984
|
| |
|
|
|
|
| |
has no exit blocks. Fixes PR12706!
llvm-svn: 155884
|
| |
|
|
|
|
|
|
|
| |
<rdar://problem/11291436>.
This is a second attempt at a fix for this, the first was r155468. Thanks
to Chandler, Bob and others for the feedback that helped me improve this.
llvm-svn: 155866
|
| |
|
|
|
|
| |
known zero in the LHS. Fixes PR12541.
llvm-svn: 155818
|
| |
|
|
|
|
|
|
|
|
|
| |
Allow the "SplitCriticalEdge" function to split the edge to a landing pad. If
the pass is *sure* that it thinks it knows what it's doing, then it may go ahead
and specify that the landing pad can have its critical edge split. The loop
unswitch pass is one of these passes. It will split the critical edges of all
edges coming from a loop to a landing pad not within the loop. Doing so will
retain important loop analysis information, such as loop simplify.
llvm-svn: 155817
|
| |
|
|
|
|
| |
inputs.
llvm-svn: 155809
|
| |
|
|
|
|
|
|
|
|
| |
Target specific types should not be vectorized. As a practical matter,
these types are already register matched (at least in the x86 case),
and codegen does not always work correctly (at least in the ppc case,
and this is not worth fixing because ppc_fp128 is currently broken and
will probably go away soon).
llvm-svn: 155729
|
| |
|
|
|
|
| |
properly with how the code handles all-undef PHI nodes.
llvm-svn: 155721
|
| |
|
|
|
|
|
|
| |
vectors"
It broke stage2 build. stage1/clang sometimes crashed.
llvm-svn: 155699
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instead of getAggregateElement. This has the advantage of being
more consistent and allowing higher-level constant folding to
procede even if an inner extract element cannot be folded.
Make ConstantFoldInstruction call ConstantFoldConstantExpression
on the instruction's operands, making it more consistent with
ConstantFoldConstantExpression itself. This makes sure that
ConstantExprs get TargetData-aware folding before being handed
off as operands for further folding.
This causes more expressions to be folded, but due to a known
shortcoming in constant folding, this currently has the side effect
of stripping a few more nuw and inbounds flags in the non-targetdata
side of constant-fold-gep.ll. This is mostly harmless.
This fixes rdar://11324230.
llvm-svn: 155682
|
| |
|
|
|
|
|
|
|
|
| |
(x & y) | (x ^ y) -> x | y
(x & y) + (x ^ y) -> x | y
Patch by Manman Ren.
rdar://10770603
llvm-svn: 155674
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
elements to minimize the number of multiplies required to compute the
final result. This uses a heuristic to attempt to form near-optimal
binary exponentiation-style multiply chains. While there are some cases
it misses, it seems to at least a decent job on a very diverse range of
inputs.
Initial benchmarks show no interesting regressions, and an 8%
improvement on SPASS. Let me know if any other interesting results (in
either direction) crop up!
Credit to Richard Smith for the core algorithm, and helping code the
patch itself.
llvm-svn: 155616
|
| |
|
|
| |
llvm-svn: 155532
|
| |
|
|
|
|
|
|
| |
in poor taste.
Talking through some alternate solutions with Chandler.
llvm-svn: 155530
|
| |
|
|
|
|
| |
Fix 12592. Patch by Matt Pharr.
llvm-svn: 155480
|
| |
|
|
|
|
| |
<rdar://problem/11291436>.
llvm-svn: 155468
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
constants in C++11 mode. I have no idea why it required such particular
circumstances to get here, the code seems clearly to rely upon unchecked
assumptions.
Specifically, when we decide to form an index into a struct type, we may
have gone through (at least one) zero-length array indexing round, which
would have left the offset un-adjusted, and thus not necessarily valid
for use when indexing the struct type.
This is just an canonicalization step, so the correct thing is to refuse
to canonicalize nonsensical GEPs of this form. Implemented, and test
case added.
Fixes PR12642. Pair debugged and coded with Richard Smith. =] I credit
him with most of the debugging, and preventing me from writing the wrong
code.
llvm-svn: 155466
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Original commit message:
Defer some shl transforms to DAGCombine.
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
llvm-svn: 155362
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
1) Make the checked assertions a bit more precise. We really want the
canonical forms coming out of reassociate to be exactly what is
expected.
2) Remove other passes, and switch the test to actually directly check
that reassociate makes the important transforms and
canonicalizations.
3) Fold in a related test case now that we're using FileCheck. Make the
same tidying changes to it.
llvm-svn: 155311
|
| |
|
|
| |
llvm-svn: 155310
|
| |
|
|
|
|
|
|
|
| |
While the patch was perfect and defect free, it exposed a really nasty
bug in X86 SelectionDAG that caused an llc crash when compiling lencod.
I'll put the patch back in after fixing the SelectionDAG problem.
llvm-svn: 155181
|
| |
|
|
|
|
| |
loop repeatedlt making the same change. This is for rdar://11256239.
llvm-svn: 155160
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.
Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.
These transformations are deferred:
(X >>? C) << C --> X & (-1 << C) (When X >> C has multiple uses)
(X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2) (When C2 > C1)
(X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (When C1 > C2)
The corresponding exact transformations are preserved, just like
div-exact + mul:
(X >>?,exact C) << C --> X
(X >>?,exact C1) << C2 --> X << (C2-C1)
(X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)
The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.
llvm-svn: 155136
|
| |
|
|
| |
llvm-svn: 155081
|
| |
|
|
| |
llvm-svn: 155010
|
| |
|
|
| |
llvm-svn: 155009
|
| |
|
|
|
|
| |
Fixes build on MSVC
llvm-svn: 154970
|
| |
|
|
| |
llvm-svn: 154967
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is mostly to test the waters. I'd like to get results from FNT
build bots and other bots running on non-x86 platforms.
This feature has been pretty heavily tested over the last few months by
me, and it fixes several of the execution time regressions caused by the
inlining work by preventing inlining decisions from radically impacting
block layout.
I've seen very large improvements in yacr2 and ackermann benchmarks,
along with the expected noise across all of the benchmark suite whenever
code layout changes. I've analyzed all of the regressions and fixed
them, or found them to be impossible to fix. See my email to llvmdev for
more details.
I'd like for this to be in 3.1 as it complements the inliner changes,
but if any failures are showing up or anyone has concerns, it is just
a flag flip and so can be easily turned off.
I'm switching it on tonight to try and get at least one run through
various folks' performance suites in case SPEC or something else has
serious issues with it. I'll watch bots and revert if anything shows up.
llvm-svn: 154816
|
| |
|
|
|
|
|
|
|
|
| |
When vectorizing pointer types it is important to realize that potential
pairs cannot be connected via the address pointer argument of a load or store.
This is because even after vectorization, the address is still a scalar because
the address of the higher half of the pair is implicit from the address of the
lower half (it need not be, and should not be, explicitly computed).
llvm-svn: 154735
|
| |
|
|
| |
llvm-svn: 154734
|
| |
|
|
| |
llvm-svn: 154700
|
| |
|
|
|
|
|
| |
their argument as "escape" points for objc_retainBlock optimization.
This fixes rdar://11229925.
llvm-svn: 154682
|
| |
|
|
|
|
|
| |
library return value optimization for phi uses. Even when the
phi itself is not dominated, the specific use may be dominated.
llvm-svn: 154647
|
| |
|
|
|
|
|
| |
optimizing autorelease calls on phi nodes with null operands.
This fixes rdar://11207070.
llvm-svn: 154642
|
| |
|
|
|
|
|
|
| |
Take this opportunity to generalize the indirectbr bailout logic for
loop transformations. CFG transformations will never get indirectbr
right, and there's no point trying.
llvm-svn: 154386
|
| |
|
|
|
|
|
|
|
|
|
|
| |
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).
llvm-svn: 154285
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.
This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly innacurate of debug info was enabled for profiling. =/
Just a heads up.
llvm-svn: 154263
|
| |
|
|
|
|
|
|
|
|
|
| |
simplification has been performed. This is a bit less efficient
(requires another ilist walk of the basic blocks) but shouldn't matter
in practice. More importantly, it's just too much work to keep track of
all the various ways the return instructions can be mutated while
simplifying them. This fixes yet another crasher, reported by Daniel
Dunbar.
llvm-svn: 154179
|
| |
|
|
|
|
| |
Smith for pointing this out in review.
llvm-svn: 154178
|
| |
|
|
|
|
| |
Matt for pointing this out.
llvm-svn: 154158
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
dead code, including dead return instructions in some cases. Otherwise,
we end up having a bogus poniter to a return instruction that blows up
much further down the road.
It turns out that this pattern is both simpler to code, easier to update
in the face of enhancements to the inliner cleanup, and likely cheaper
given that it won't add dead instructions to the list.
Thanks to John Regehr's numerous test cases for teasing this out.
llvm-svn: 154157
|
| |
|
|
|
|
| |
testcase slightly less trivial. This fixes rdar://11171718.
llvm-svn: 154118
|
| |
|
|
|
|
| |
the loop should be unrolled according the value of OptSizeUnrollThreshold.
llvm-svn: 154014
|
| |
|
|
|
|
|
|
| |
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.
llvm-svn: 154011
|
| |
|
|
|
|
|
|
|
|
| |
http://llvm.org/bugs/show_bug.cgi?id=12343
We have not trivial way for splitting edges that are goes from indirect branch. We can do it with some tricks, but it should be additionally discussed. And it is still dangerous due to difficulty of indirect branches controlling.
Fix forbids this case for unswitching.
llvm-svn: 153879
|