| Commit message | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 185262
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Changing the sign when comparing the base pointer would introduce all
sorts of unexpected things like:
%gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0
%gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0
%cmp.i = icmp ult i8* %gep.i, %gep2.i
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = icmp ne i1 %cmp.i, %cmp.i1
ret i1 %cmp
into:
%cmp.i = icmp slt [1 x i8]* %a, %b
%cmp.i1 = icmp ult [1 x i8]* %a, %b
%cmp = xor i1 %cmp.i, %cmp.i1
ret i1 %cmp
By preserving the original sign, we now get:
ret i1 false
This fixes PR16483.
llvm-svn: 185259
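
To make the sign issue above concrete, here is a minimal Python sketch that models the two predicates on 32-bit values. The 32-bit pointer width and the helper names are assumptions for illustration only; this is not the InstCombine code. Since both GEPs use all-zero indices, they have the same addresses as their base pointers, so the original function always returns false.

```python
BITS = 32  # assumed pointer width for this illustration


def to_signed(x, bits=BITS):
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def icmp_ult(a, b, bits=BITS):
    """Model of 'icmp ult': compare the raw bit patterns as unsigned."""
    mask = (1 << bits) - 1
    return (a & mask) < (b & mask)


def icmp_slt(a, b, bits=BITS):
    """Model of 'icmp slt': compare the bit patterns as signed values."""
    return to_signed(a, bits) < to_signed(b, bits)


def original(a, b):
    # %cmp.i compares the zero-index GEPs, %cmp.i1 compares the bases.
    return icmp_ult(a, b) != icmp_ult(a, b)


def miscompiled(a, b):
    # The bad fold changed the first comparison from ult to slt.
    return icmp_slt(a, b) ^ icmp_ult(a, b)


# Two addresses on opposite sides of the sign bit expose the difference.
a, b = 0x7FFFFFFF, 0x80000000
print(original(a, b), miscompiled(a, b))
```

For this pair of addresses the unsigned comparison is true while the signed one is false, so the miscompiled form returns true where the original returns false.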
|
|
|
|
| |
llvm-svn: 185258
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Real world code sometimes has the denominator of a 'udiv' be a
'select'. LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.). This falls apart if we are dealt a
'udiv' where the code is not symmetric or if the select operands lead us
to more select instructions.
Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently. If we can
simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.
llvm-svn: 185257
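
The arithmetic behind the rewrite above can be checked with a small Python model. It assumes each select operand is a power of two, written as 1 << k and 1 << m; the function names are hypothetical and the sketch says nothing about how InstCombine actually matches the operands.

```python
def udiv_select(x, cond, k, m):
    """x udiv (select cond, 1 << k, 1 << m) on non-negative integers."""
    denom = (1 << k) if cond else (1 << m)
    return x // denom


def lshr_select(x, cond, k, m):
    """x lshr (select cond, k, m): each divisor became its shift amount."""
    shift = k if cond else m
    return x >> shift


# Asymmetric operands: one side is the constant 8 (shift 3), the other is
# a left shift of 1 by a variable amount y.
y = 5
for cond in (True, False):
    print(udiv_select(1000, cond, 3, y), lshr_select(1000, cond, 3, y))
```

Because each operand is turned into a shift amount on its own, the two sides of the select do not need to have the same structure.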
|
|
|
|
| |
llvm-svn: 185251
|
|
|
|
| |
llvm-svn: 185250
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We may, after other optimizations, find ourselves with IR that looks
like:
%shl = shl i32 1, %y
%cmp = icmp ult i32 %shl, 32
Instead, we should just compare the shift count:
%cmp = icmp ult i32 %y, 5
llvm-svn: 185242
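
A quick way to convince yourself of the fold above is to enumerate every in-range shift count. The Python sketch below models i32 arithmetic by masking to 32 bits; shift counts of 32 or more are left out because the result of such a shift is not defined in LLVM IR.

```python
MASK32 = 0xFFFFFFFF


def before(y):
    """%shl = shl i32 1, %y ; %cmp = icmp ult i32 %shl, 32"""
    shl = (1 << y) & MASK32
    return shl < 32


def after(y):
    """%cmp = icmp ult i32 %y, 5"""
    return y < 5


# 32 == 1 << 5, so the comparison on the shifted value reduces to a
# comparison on the shift count.
print(all(before(y) == after(y) for y in range(32)))
```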
|
|
|
|
|
|
|
| |
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.
llvm-svn: 185230
|
|
|
|
|
|
|
|
|
| |
blocks.
In this code we keep track of pointers that we are allowed to read from, if they are accessed by non-predicated blocks.
We use this list to allow vectorization of conditional loads in predicated blocks because we know that these addresses don't segfault.
llvm-svn: 185214
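
The bookkeeping described above can be sketched roughly as follows in Python. The block representation (a pair of a predicated flag and a list of pointer names) and the function names are hypothetical, chosen only to illustrate the idea of collecting pointers that are accessed unconditionally.

```python
def collect_safe_pointers(blocks):
    """Pointers accessed in non-predicated blocks are known to be
    dereferenceable on every iteration, so remember them."""
    safe = set()
    for predicated, pointers in blocks:
        if not predicated:
            safe.update(pointers)
    return safe


def can_vectorize_conditional_load(pointer, safe):
    """A load in a predicated block may be executed unconditionally only
    if the same address is already accessed unconditionally."""
    return pointer in safe


# Hypothetical loop body: 'a' is accessed unconditionally, 'b' only under
# a condition.
blocks = [
    (False, ["a"]),       # loop header, always executed
    (True, ["a", "b"]),   # predicated block
]
safe = collect_safe_pointers(blocks)
print(can_vectorize_conditional_load("a", safe))
print(can_vectorize_conditional_load("b", safe))
```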
|
|
|
|
|
|
| |
- should unbreak Windows builds
llvm-svn: 185198
|
|
|
|
|
|
| |
- missed svn add...
llvm-svn: 185194
|
|
|
|
|
|
|
|
| |
- Build debug metadata for 'bare' Modules using DIBuilder
- DebugIR can be constructed to generate an IR file (to be seen by a debugger)
or not in cases where the user already has an IR file on disk.
llvm-svn: 185193
|
|
|
|
| |
llvm-svn: 185168
|
|
|
|
|
|
|
|
|
| |
I used the class to safely reset the state of the builder's debug location. I
think I have caught all places where we need to set the debug location to a new
one. Therefore, we can replace the class by a function that just sets the debug
location.
llvm-svn: 185165
|
|
|
|
|
|
|
|
|
|
|
| |
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.
Also update test cases to conform to the format of the DI classes.
llvm-svn: 185135
|
|
|
|
|
|
| |
radar://14169017
llvm-svn: 185122
|
|
|
|
| |
llvm-svn: 185121
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
either zero/NaN but allowed you to arbitrarily set the category of the float.""
This reverts commit r185099.
Looks like both the ppc-64 and mips bots are still failing after I reverted this
change.
Since:
1. The mips bot always performs a clean build,
2. The ppc64-bot failed again after a clean build (I asked the ppc-64
maintainers to clean the bot which they did... Thanks Will!),
I think it is safe to assume that this change was not the cause of the failures
that said builders were seeing. Thus I am recommitting.
llvm-svn: 185111
|
|
|
|
|
|
|
|
|
|
|
|
| |
zero/NaN but allowed you to arbitrarily set the category of the float."
This reverts commit r185095. This is causing a FileCheck failure on
the 3dnow intrinsics on at least the mips/ppc bots but not on the x86
bots.
Reverting while I figure out what is going on.
llvm-svn: 185099
|
|
|
|
|
|
|
| |
Otherwise, we end up with an exponential IR blowup.
Fixes PR16472.
llvm-svn: 185097
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
but allowed you to arbitrarily set the category of the float.
The category which an APFloat belongs to should be dependent on the
actual value that the APFloat has, not be arbitrarily passed in by the
user. This will prevent inconsistency bugs where the category and the
actual value in APFloat differ.
I also fixed up all of the references to this constructor (which were
only in LLVM).
llvm-svn: 185095
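
The design point above, that a category should be derived from the value rather than supplied by the caller, can be illustrated with a short Python sketch. The enum and function names are hypothetical and use Python floats; this is not the APFloat interface.

```python
import math
from enum import Enum


class Category(Enum):
    INFINITY = "infinity"
    NAN = "nan"
    NORMAL = "normal"
    ZERO = "zero"


def category_of(value: float) -> Category:
    """Derive the category from the value itself, so the two can never
    disagree."""
    if math.isnan(value):
        return Category.NAN
    if math.isinf(value):
        return Category.INFINITY
    if value == 0.0:
        return Category.ZERO
    return Category.NORMAL


for v in (0.0, -0.0, 1.5, float("inf"), float("nan")):
    print(v, category_of(v).name)
```

With no way to pass a category in, a caller cannot construct a value whose stored category contradicts its bits.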
|
|
|
|
|
|
|
|
|
| |
Use vectorized instruction instead of original instruction anchored in the
original loop.
Fixes PR16452 and t2075.c of PR16455.
llvm-svn: 185081
|
|
|
|
|
|
|
|
|
|
| |
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.
This fixes 3 test cases of PR16455.
llvm-svn: 185051
|
|
|
|
|
|
|
|
| |
The Builtin attribute is an attribute that can be placed on a function call site to signal that even though a function is declared as being a builtin,
rdar://problem/13727199
llvm-svn: 185049
|
|
|
|
| |
llvm-svn: 185047
|
|
|
|
|
|
| |
post-order because we grow chains upwards.
llvm-svn: 185041
|
|
|
|
|
|
| |
outerloops from iterating over the instructions.
llvm-svn: 185040
|
|
|
|
|
|
| |
!APFloat.isDenormal => APFloat.isNormal.
llvm-svn: 185037
|
|
|
|
|
|
| |
This reverts commit r185020
llvm-svn: 185032
|
|
|
|
|
|
|
|
| |
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify.
llvm-svn: 185020
|
|
|
|
| |
llvm-svn: 184969
|
|
|
|
|
|
| |
consider them as a candidate for replacement of instructions to be visited.
llvm-svn: 184966
|
|
|
|
|
|
| |
more than the redzone size
llvm-svn: 184928
|
|
|
|
| |
llvm-svn: 184927
|
|
|
|
|
|
|
| |
debug statements to add a missing newline. Also canonicalize to '\n' instead of
"\n"; the latter calls a function with a loop, whereas the former does not.
llvm-svn: 184897
|
|
|
|
| |
llvm-svn: 184888
|
|
|
|
|
|
|
|
|
|
|
| |
When a 1-element vector alloca is promoted, a store instruction can often be
rewritten without converting the value to a scalar and using an insertelement
instruction to stuff it into the new alloca. This patch just adds a check
to skip that conversion when it is unnecessary. This turns out to be really
important for some ARM Neon operations where <1 x i64> is used to get around
the fact that i64 is not a legal type.
llvm-svn: 184870
|
|
|
|
| |
llvm-svn: 184827
|
|
|
|
| |
llvm-svn: 184749
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.
"LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us not to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSVC loop kernel that speeds up.
radar://13681598"
llvm-svn: 184724
|
|
|
|
|
|
|
| |
We are creating the runtime checks using this set so we need a deterministic
iteration order.
llvm-svn: 184723
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CGSCC pass manager. This should insulate the inlining decisions from the
vectorization decisions, however it may have both compile time and code
size problems so it is just an experimental option right now.
Adding this based on a discussion with Arnold and it seems at least
worth having this flag for us to both run some experiments to see if
this strategy is workable. It may solve some of the regressions seen
with the loop vectorizer.
llvm-svn: 184698
|
|
|
|
|
|
|
|
| |
This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac.
We are seeing a stage2 and stage3 miscompare on some dragonegg bots.
llvm-svn: 184690
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.
We can now vectorize loops with simple constant dependence distances.
for (i = 8; i < 256; ++i) {
a[i] = a[i+4] * a[i+8];
}
for (i = 8; i < 256; ++i) {
a[i] = a[i-4] * a[i-8];
}
We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us not to) in the test suite now. Results on x86-64 are a wash.
I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSVC loop kernel that speeds up.
radar://13681598
llvm-svn: 184685
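
To see why the two loops above tolerate vectorization, the following Python sketch compares a scalar execution with a chunked one in which all loads of a chunk happen before any of its stores. It is a simulation only, not the vectorizer's logic. Two assumptions are added for illustration: the array length of 264 (so that a[i+8] stays in bounds) and the reduction of each product modulo MOD, which only keeps the numbers small.

```python
MOD = 1_000_003  # assumption: bounds the products; not part of the original loops


def scalar(a, off1, off2):
    """Run 'a[i] = a[i+off1] * a[i+off2]' one iteration at a time."""
    a = list(a)
    for i in range(8, 256):
        a[i] = (a[i + off1] * a[i + off2]) % MOD
    return a


def vectorized(a, off1, off2, vf):
    """Run the same loop in chunks of vf lanes, loading before storing."""
    a = list(a)
    for start in range(8, 256, vf):
        lanes = range(start, min(start + vf, 256))
        # All loads of the chunk are performed before any store.
        results = [(a[i + off1] * a[i + off2]) % MOD for i in lanes]
        for i, r in zip(lanes, results):
            a[i] = r
    return a


data = [i + 1 for i in range(264)]

# Reads ahead of the writes: chunking does not change the result.
print(vectorized(data, 4, 8, 4) == scalar(data, 4, 8))
# Reads 4 and 8 elements behind the writes: fine with 4 lanes...
print(vectorized(data, -4, -8, 4) == scalar(data, -4, -8))
# ...but 8 lanes would read elements the same chunk has not stored yet.
print(vectorized(data, -4, -8, 8) == scalar(data, -4, -8))
```

In this model the second loop stays correct only while the chunk width does not exceed the smallest dependence distance, which is 4 elements here.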
|
|
|
|
|
|
|
|
|
|
|
|
| |
This class checks dependences by subtracting two Scalar Evolution access
functions, allowing us to catch very simple linear dependences.
The checker assumes source order in determining whether vectorization is safe.
We currently don't reorder accesses.
Positive true dependencies need to be a multiple of VF; otherwise we impede
store-load forwarding.
llvm-svn: 184684
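
As a rough illustration of the subtraction described above, the Python sketch below models each access function as start + stride * i (in elements) and subtracts the two. The Access type and both helper functions are hypothetical; the real checker works on Scalar Evolution expressions and applies more conditions than shown here.

```python
from typing import NamedTuple, Optional


class Access(NamedTuple):
    start: int   # element index accessed in iteration 0
    stride: int  # element step per loop iteration


def dependence_distance(src: Access, sink: Access) -> Optional[int]:
    """Constant distance between two linear accesses, or None when the
    difference still depends on the induction variable."""
    if src.stride != sink.stride:
        return None  # not a simple constant distance
    return sink.start - src.start


def is_multiple_of_vf(distance: int, vf: int) -> bool:
    """Model of the 'multiple of VF' condition from the text above."""
    return distance % vf == 0


# a[i] = a[i-4] * ...: the load of a[i-4] precedes the store to a[i].
load = Access(start=-4, stride=1)
store = Access(start=0, stride=1)
dist = dependence_distance(load, store)
print(dist, is_multiple_of_vf(dist, 4), is_multiple_of_vf(dist, 8))
```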
|
|
|
|
|
|
|
| |
Sets of dependent accesses are built by unioning sets based on underlying
objects. This class will be used by the upcoming dependence checker.
llvm-svn: 184683
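
The grouping step described above can be sketched with a small union-find structure in Python. The class, the function, and the (access, underlying object) input format are assumptions for illustration rather than the interface of the class added by this commit.

```python
class DisjointSets:
    """Minimal union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_accesses(accesses):
    """accesses: list of (access_name, underlying_object) pairs.
    Accesses that share an underlying object end up in the same set."""
    sets = DisjointSets()
    first_for_object = {}
    for name, obj in accesses:
        sets.find(name)  # register the access
        if obj in first_for_object:
            sets.union(first_for_object[obj], name)
        else:
            first_for_object[obj] = name
    groups = {}
    for name, _ in accesses:
        groups.setdefault(sets.find(name), []).append(name)
    return sorted(groups.values())


accesses = [
    ("load a[i]", "a"),
    ("store a[i+1]", "a"),
    ("load b[i]", "b"),
]
print(group_accesses(accesses))
```

Only accesses within one set need to be checked against each other; accesses in different sets are based on different underlying objects.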
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now we detected the vectorizable tree and evaluated the cost of the
entire tree. With this patch we can decide to trim out branches of the tree
that are not profitable to vectorize.
Also, increase the max depth from 6 to 12. In the worst possible case, where all
of the code is made of diamond-shaped graphs, this can bring the cost to 2**10,
but diamonds are not very common.
llvm-svn: 184681
|
|
|
|
|
|
|
|
| |
sequences.
Make sure that we don't replace and RAUW two sequences if one does not dominate the other.
llvm-svn: 184674
|
|
|
|
|
|
| |
The RAII builder location guard is saving a reference to instructions, so we can't erase instructions during vectorization.
llvm-svn: 184671
|
|
|
|
| |
llvm-svn: 184660
|