Also turn some static functions into class member functions to avoid having to mention the class namespace for enums all the time.
No functionality change intended.
llvm-svn: 179886
llvm-svn: 179826
If the return type is a pointer and the call returns an integer, then insert an inttoptr conversion, and vice versa.
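A minimal sketch of the kind of rewrite involved, in the typed-pointer IR of this era; @g and the value names are hypothetical:
; before: a call through a cast function pointer, where the caller
; expects a pointer but @g actually returns an integer
%p = call i8* bitcast (i64 ()* @g to i8* ()*)()
; after: call @g directly and convert the integer result to a pointer
%i = call i64 @g()
%p = inttoptr i64 %i to i8*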
llvm-svn: 179817
llvm-svn: 179789
limitation that extract is promoted over a cast only if the cast has only one use.
llvm-svn: 179786
it has only two uses: one promoting the vector phi in a loop, and the other an extract of a single element at a constant location.
llvm-svn: 179783
llvm-svn: 179775
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just like any other commutative binary instruction and reduce it.
This appears to help bzip2 by about 1.5% on an iMac12,2.
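A minimal sketch of the select-based pattern named above, with hypothetical value names; the reduction handling can now recognize %min as a signed-min recurrence:
%cmp = icmp slt i32 %red, %val
%min = select i1 %cmp, i32 %red, i32 %val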
radar://12960601
llvm-svn: 179773
Fixes PR15748.
llvm-svn: 179757
It is causing stage2 builds to fail; let's get them running again.
llvm-svn: 179750
Simplify:
(select (icmp eq (and X, C1), 0), Y, (or Y, C2))
Into:
(or (shl (and X, C1), C3), Y)
Where:
C3 = log2(C2) - log2(C1)
If:
C1 and C2 are both powers of two
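A worked instance, assuming C1 = 4 and C2 = 8 (so C3 = log2(8) - log2(4) = 1); all value names are hypothetical:
; before
%and = and i32 %X, 4
%cmp = icmp eq i32 %and, 0
%or = or i32 %Y, 8
%sel = select i1 %cmp, i32 %Y, i32 %or
; after: (X & 4) << 1 is 8 exactly when the bit is set, 0 otherwise
%and = and i32 %X, 4
%shl = shl i32 %and, 1
%sel = or i32 %shl, %Y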
llvm-svn: 179748
outside said for loop in the presence of differing provenance caused by escaping blocks.
This occurs due to an alloca representing a separate ownership from the
original pointer. Thus consider the following pseudo-IR:
objc_retain(%a)
for (...) {
  objc_retain(%a)
  %block <- %a
  F(%block)
  objc_release(%block)
}
objc_release(%a)
From the perspective of the optimizer, %block has a provenance separate from the original %a. Thus the optimizer pairs up the inner retain of %a and the outer release of %a, resulting in segfaults.
This is fixed by noting that the signature of a mismatch of retains/releases inside the for loop is a Use/CanRelease top down with a None bottom up (bottom up, the Retain-CanRelease-Use-Release sequence is completed by the inner objc_retain, but top down, due to the differing provenance from the objc_release, said sequence is not completed). In that case, CheckForCFGHazards now clears the state of %a, implying that no pairing will occur.
Additionally, a test case is included.
rdar://12969722
llvm-svn: 179747
llvm-svn: 179746
SSA identifier.
llvm-svn: 179729
llvm-svn: 179721
EnableCheckForCFGHazards, EnableARCOptimizations.
llvm-svn: 179718
llvm-svn: 179717
Differential Revision: http://llvm-reviews.chandlerc.com/D620
llvm-svn: 179661
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.
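A sketch of one way the wrap can occur, using the hypothetical 2-bit type i2, whose four values 0, 1, -2, -1 all branch to the same successor; the bound max+1 = 4 wraps to 0 in i2, so a replacement like icmp ult i2 %x, 0 is always false, inverting the intended test:
switch i2 %x, label %other [
  i2 0, label %dest
  i2 1, label %dest
  i2 -2, label %dest
  i2 -1, label %dest
]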
Patch by Jed Davis!
llvm-svn: 179587
Two return types are not equivalent if one is a pointer and the other is an integer type. This is because we cannot bitcast a pointer to an integer value.
PR15185
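A short illustration of the constraint (hypothetical names): the verifier rejects a direct bitcast between pointer and integer types, so the conversion must be an explicit instruction instead:
; invalid, rejected by the verifier:
; %v = bitcast i8* %p to i64
; the conversion must be spelled explicitly:
%v = ptrtoint i8* %p to i64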
llvm-svn: 179569
vector-gather sequence out of loops.
llvm-svn: 179562
llvm-svn: 179542
-fslp-vectorize run the slp-vectorizer.
llvm-svn: 179508
llvm-svn: 179505
instructions.
llvm-svn: 179504
One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1
The problem is that there are certain values of C1 and C2 that trigger both transforms, but the first one blocks out the second; this generates suboptimal code.
Reordering the transforms should be better in every case, and allows us to do interesting things like turn:
%shr = lshr i32 %X, 4
%and = and i32 %shr, 15
%add = add i32 %and, -14
%tobool = icmp ne i32 %add, 0
into:
%and = and i32 %X, 240
%tobool = icmp ne i32 %and, 224
llvm-svn: 179493
llvm-svn: 179483
llvm-svn: 179479
and add the cost of extracting values from the roots of the tree.
llvm-svn: 179475
llvm-svn: 179470
There is a Constant with non-constant operands: blockaddress.
llvm-svn: 179460
This is basically the same fix in three different places. We use a set to avoid walking the whole tree of a big ConstantExpr multiple times.
For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here; it may consist of thousands of nodes.
The testcase exercises this by creating an insanely large ConstantExpr out of a loop. It's questionable whether the optimizer should ever create those, but this can be triggered with real C code. Fixes PR15714.
llvm-svn: 179458
Fixes PR15737.
llvm-svn: 179417
perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
llvm-svn: 179414
llvm-svn: 179412
The transform will execute like so:
(A & ~B) == 0 --> (A & B) != 0
(A & ~B) != 0 --> (A & B) == 0
This holds when A is known to be a power of two: A & B is then either 0 or A, so (A & ~B) == 0, which is equivalent to (A & B) == A, is the same as (A & B) != 0.
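A minimal worked instance under that power-of-two assumption, with A = 8 and hypothetical value names:
; before: test bit 3 of %B through its complement
%neg = xor i32 %B, -1
%and = and i32 %neg, 8
%cmp = icmp eq i32 %and, 0
; after: test the bit directly
%and = and i32 %B, 8
%cmp = icmp ne i32 %and, 0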
llvm-svn: 179386
Don't classify idiv/udiv as a reduction operation; integer division is lossy. For example: (1 / 2) * 4 != 4 / 2.
Example:
int a[] = { 2, 5, 2, 2 };
int x = 80;
for (...)
  x /= a[i];
Scalar:
x /= 2 // = 40
x /= 5 // = 8
x /= 2 // = 4
x /= 2 // = 2
Vectorized:
<80, 1> / <2, 5> // = <40, 0>
<40, 0> / <2, 2> // = <20, 0>
20 * 0 = 0, but the scalar result is 2.
radar://13640654
llvm-svn: 179381
Allows LLVM to optimize sequences like the following:
%add = add nsw i32 %x, 1
%cmp = icmp sgt i32 %add, %y
into:
%cmp = icmp sge i32 %x, %y
as well as:
%add1 = add nsw i32 %x, 20
%add2 = add nsw i32 %y, 57
%cmp = icmp sge i32 %add1, %add2
into:
%add = add nsw i32 %y, 37
%cmp = icmp sle i32 %add, %x
llvm-svn: 179316
When trying to collapse sequences of insertelement/extractelement
instructions into single shuffle instructions, there is one specific
case where the Instruction Combiner wrongly updates the resulting
Mask of shuffle indexes.
The problem is in the function CollectShuffleElements.
If we have a sequence of insert/extract element instructions
like the one below:
%tmp1 = extractelement <4 x float> %LHS, i32 0
%tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
%tmp3 = extractelement <4 x float> %RHS, i32 2
%tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3
Where:
- %RHS will have a mask of [4,5,6,7]
- %LHS will have a mask of [0,1,2,3]
The Mask of shuffle indexes is wrongly computed to [4,1,6,7]
instead of [4,0,6,7].
When analyzing %tmp2 in order to compute the Mask for the
resulting shuffle instruction, the algorithm forgets to update
the mask index at position 1 with the index associated to the
element extracted from %LHS by instruction %tmp1.
Patch by Andrea DiBiagio!
llvm-svn: 179291
llvm-svn: 179280
expose it in the header file.
llvm-svn: 179272
function calls when we check if it is safe to sink instructions.
llvm-svn: 179207
llvm-svn: 179206
rather than checking if the source and destination have the same number of
arguments and copying the attributes over directly.
llvm-svn: 179169
llvm-svn: 179132
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:
1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:
void SAXPY(int *x, int *y, int a, int i) {
  x[i]   = a * x[i]   + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}
llvm-svn: 179117
invalidation in Reassociate.
I brazenly think this change is slightly simpler than r178793 because:
- no "state" in the functor
- "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]"
While I can reproduce the problem in Valgrind, it is rather difficult to come up with a standalone test case. The reason is that when an iterator is invalidated, the stale invalidated elements are not yet clobbered by nonsense data, so the optimizer can still proceed successfully.
Thanks to Benjamin for fixing this bug and generously providing the test case.
llvm-svn: 179062
The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.
All of the existing tests continue to pass, and I've added a test from
the PR.
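A small sketch of the distinction, with hypothetical names in the typed-pointer syntax of this era: a loaded value can be widened and then truncated, but a store's width is part of the memory access itself:
%a = alloca i32
%p = bitcast i32* %a to i8*
; this store writes exactly one byte; rewriting it as an i32 store
; would clobber the other three bytes of %a
store i8 7, i8* %p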
llvm-svn: 178974
llvm-svn: 178932
This is the counterpart to commit r160637, except it performs the action in the bottom-up portion of the data-flow analysis.
llvm-svn: 178922