The PowerPC A2 core greatly benefits from aggressive loop unrolling;
use the new getUnrollingPreferences to enable this by default when targeting
the PPC A2 core.
llvm-svn: 190549
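As an illustration only, here is a minimal, self-contained model of the idea; it does not use LLVM's actual TargetTransformInfo types, and the struct, field, and function names are stand-ins: a per-target hook fills in unrolling preferences that the generic unroller then consults.
----
#include <cstdio>

// Toy stand-in for a TTI-style UnrollingPreferences struct; names illustrative.
struct UnrollingPreferences {
  bool Partial = false;  // allow partial unrolling
  bool Runtime = false;  // allow runtime (trip-count-checked) unrolling
};

// Stand-in for the target hook: an in-order, deeply pipelined core such as
// the A2 opts in to aggressive unrolling by default.
void getUnrollingPreferences(bool TargetIsPPCA2, UnrollingPreferences &UP) {
  if (TargetIsPPCA2)
    UP.Partial = UP.Runtime = true;
}

int main() {
  UnrollingPreferences UP;
  getUnrollingPreferences(/*TargetIsPPCA2=*/true, UP);
  std::printf("Partial=%d Runtime=%d\n", UP.Partial, UP.Runtime);
  return 0;
}
----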
llvm-svn: 190491
llvm-svn: 190426
LLVM IR doesn't currently allow atomic bool load/store operations, and the
transformation is dubious anyway because it isn't profitable on all platforms.
PR17163.
llvm-svn: 190357
Several architectures use the same instruction to perform both a comparison and
a subtract. The instruction selection framework does not allow considering
different basic blocks to expose such fusion opportunities.
Therefore, these instructions are “merged” by CSE at the MI IR level.
To increase the likelihood that CSE applies in such situations, we reorder the
operands of the comparison, when they have the same complexity, so that they
match the order of the most frequent subtract.
E.g.,
icmp A, B
...
sub B, A
<rdar://problem/14514580>
llvm-svn: 190352
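Roughly, the canonicalization has the shape sketched below. This is a hypothetical helper of mine, not the code that landed; it only assumes the LLVM C++ API's CmpInst::swapOperands(), which swaps the operands and adjusts the predicate so the result is unchanged.
----
#include "llvm/IR/Instructions.h"
using namespace llvm;

// If the comparison reads (A, B) while a nearby subtract computes B - A,
// swap the comparison so both instructions read their operands as (B, A),
// making MI-level CSE (and compare/subtract fusion) more likely to apply.
static void reorderICmpToMatchSub(ICmpInst &Cmp, BinaryOperator &Sub) {
  if (Sub.getOpcode() != Instruction::Sub)
    return;
  if (Cmp.getOperand(0) == Sub.getOperand(1) &&
      Cmp.getOperand(1) == Sub.getOperand(0))
    Cmp.swapOperands(); // also inverts the predicate, preserving semantics
}
----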
The work on this project was left in an unfinished and inconsistent state.
Hopefully someone will eventually get a chance to implement this feature, but
in the meantime, it is better to put things back the way they were. I have
left support in the bitcode reader to handle the case-range bitcode format,
so that we do not lose bitcode compatibility with the llvm 3.3 release.
This reverts the following commits: 155464, 156374, 156377, 156613, 156704,
156757, 156804, 156808, 156985, 157046, 157112, 157183, 157315, 157384, 157575,
157576, 157586, 157612, 157810, 157814, 157815, 157880, 157881, 157882, 157884,
157887, 157901, 158979, 157987, 157989, 158986, 158997, 159076, 159101, 159100,
159200, 159201, 159207, 159527, 159532, 159540, 159583, 159618, 159658, 159659,
159660, 159661, 159703, 159704, 160076, 167356, 172025, 186736
llvm-svn: 190328
Context should be either null or an MDNode.
llvm-svn: 190267
Field 2 of DIType (Context), field 9 of DIDerivedType (TypeDerivedFrom),
field 12 of DICompositeType (ContainingType), fields 2, 7, 12 of DISubprogram
(Context, Type, ContainingType).
llvm-svn: 190205
llvm-svn: 189975
This reverts commit r189886.
I found a corner case where this optimization is not valid:
Say we have a "linkonce_odr unnamed_addr" in two translation units:
* In TU 1 this optimization kicks in and makes it hidden.
* In TU 2 it gets const merged with a constant that is *not* unnamed_addr,
resulting in a non unnamed_addr constant with default visibility.
* The static linker rules for combining visibility then produce a hidden
symbol, which is incorrect from the point of view of the non unnamed_addr
constant.
The one place we can do this is when we know that the symbol is not used from
another TU in the same shared object, i.e., during LTO. I will move it there.
llvm-svn: 189954
"(icmp op i8 A, B)" is equivalent to "(icmp op i8 (A & 0xff), B)" as a
degenerate case. Allowing this as a "masked" comparison when analysing "(icmp)
&/| (icmp)" allows us to combine them in more cases.
rdar://problem/7625728
llvm-svn: 189931
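An illustrative source-level pattern (my example, not one from the commit) that the "(icmp) & (icmp)" combiner can now fold, because the plain comparison is treated as a comparison masked with all ones:
----
#include <cstdint>

bool bothTestsHold(uint8_t a) {
  // "a == 0" is the degenerate masked form "(a & 0xff) == 0", so the pair
  // of tests folds to the single comparison "a == 0".
  return (a & 0x0f) == 0 && a == 0;
}
----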
Even in cases which aren't universally optimisable like "(A & B) != 0 && (A &
C) != 0", the masks can make one of the comparisons completely redundant. In
this case, since we've gone to the effort of spotting masked comparisons we
should combine them.
rdar://problem/7625728
llvm-svn: 189930
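For illustration (my example), one shape this covers: when one mask is a subset of the other, the wider test adds nothing and the pair collapses to a single comparison.
----
#include <cstdint>

bool lowBitOfHighNibble(uint8_t a) {
  // 0x10 is a subset of 0xf0, so (a & 0x10) != 0 already implies
  // (a & 0xf0) != 0; only the first comparison needs to be kept.
  return (a & 0x10) != 0 && (a & 0xf0) != 0;
}
----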
Original message:
If a constant or a function has linkonce_odr linkage and unnamed_addr, mark it
hidden. Being linkonce_odr guarantees that it is available in every dso that
needs it. Being a constant/function with unnamed_addr guarantees that the
copies don't have to be merged.
llvm-svn: 189886
The reason that I am turning off this optimization is that there is an
additional case where a block can escape that has come up: specifically, when
a block is used outside of the scope in which it was created.
This can cause a captured retainable object pointer whose life is preserved by
the objc_retainBlock to be deallocated before the block is invoked.
An example of the code needed to trigger the bug is:
----
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[]) {
  void (^somethingToDoLater)();
  {
    NSObject *obj = [NSObject new];
    somethingToDoLater = ^{
      [obj self]; // Crashes here
    };
  }
  NSLog(@"test.");
  somethingToDoLater();
  return 0;
}
----
In the next commit, I remove all the dead code that results from this.
Once I put in the fixing commit I will bring back the tests that I deleted in
this commit.
rdar://14802782.
rdar://14868830.
llvm-svn: 189869
some missing CHECK-LABELS.
llvm-svn: 189868
This is another one that doesn't matter much,
but uses the right GEP index types in the first
place.
llvm-svn: 189854
1) If the width of a vectorization list candidate is bigger than the vector register width, we break it down to fit the vector register.
2) We do not vectorize widths that are not a power of two.
The performance results show this helps some SPEC benchmarks: mesa improved by 6.97% and ammp by 1.54%.
llvm-svn: 189830
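An illustrative candidate (mine, not from the commit): six independent float adds form a vectorization list of width 6. With 128-bit vectors (four float lanes) the list is wider than a register, so instead of being rejected it can now be split into a power-of-two chunk that fits plus a remainder; aliasing between the pointers still has to be resolved as usual.
----
void add6(float *a, const float *b, const float *c) {
  // Width-6 bundle of isomorphic operations; splits into a <4 x float>
  // chunk plus a width-2 remainder on a 128-bit vector target.
  a[0] = b[0] + c[0];
  a[1] = b[1] + c[1];
  a[2] = b[2] + c[2];
  a[3] = b[3] + c[3];
  a[4] = b[4] + c[4];
  a[5] = b[5] + c[5];
}
----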
available.
The existing code missed some edge cases where, e.g., we're going to emit sqrtf
but only the availability of sqrt was checked. This happens on odd platforms
like Windows.
llvm-svn: 189724
integer overflow.
PR17026. Also avoid undefined shifts and shift amounts larger than 64 bits
(those are always undef because we can't represent integer types that large).
llvm-svn: 189672
llvm-svn: 189547
llvm-svn: 189529
llvm-svn: 189527
When unrolling is disabled in the pass manager, the loop vectorizer should also
not unroll loops. This will allow the -fno-unroll-loops option in Clang to
behave as expected (even for vectorizable loops). The loop vectorizer's
-force-vector-unroll option will (continue to) override the pass-manager
setting (including -force-vector-unroll=0 to force use of the internal
auto-selection logic).
In order to test this, I added a flag to opt (-disable-loop-unrolling) to force
disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also,
this fixes a small bug in opt where the loop vectorizer was enabled only after
the pass manager populated the queue of passes (the global_alias.ll test needed
a slight update to the RUN line as a result of this fix).
llvm-svn: 189499
The builder inserts before the insert point,
not after, so this would insert before the last
instruction in the bundle instead of after it.
I'm not sure if this can actually be a problem
with any of the current insertions.
llvm-svn: 189285
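A hedged sketch of the general pitfall, not the SLP code itself: IRBuilder emits new instructions before its insert point, so placing code after an instruction means pointing the builder at the instruction that follows it (shown here with the 2013-era iterator construction from an Instruction pointer).
----
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Position the builder so that newly created instructions land *after* I.
static void positionAfter(IRBuilder<> &Builder, Instruction *I) {
  BasicBlock::iterator It(I);
  ++It; // step past I; may be end(), which appends at the block's end
  Builder.SetInsertPoint(I->getParent(), It);
}
----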
DICompositeType will have an identifier field at position 14. For now, the
field is set to null in DIBuilder.
For DICompositeTypes where the template argument field (the 13th field)
was optional, modify DIBuilder to make sure the template argument field is set.
Now DICompositeType has 15 fields.
Update DIBuilder to use NULL instead of "i32 0" for the null value of an MDNode.
Update verifier to check that DICompositeType has 15 fields and the last
field is null or a MDString.
Update testing cases to include an extra field for DICompositeType.
The identifier field will be used by type uniquing so a front end can
generate a DICompositeType with a unique identifier.
llvm-svn: 189282
profitable.
This patch enables unrolling of loops when vectorization is legal but not profitable.
We add a new class, InnerLoopUnroller, which extends InnerLoopVectorizer and replaces some of the vector-specific logic with scalars.
This patch does not introduce any runtime regressions and improves the following workloads:
SingleSource/Benchmarks/Shootout/matrix -22.64%
SingleSource/Benchmarks/Shootout-C++/matrix -13.06%
External/SPEC/CINT2006/464_h264ref/464_h264ref -3.99%
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -1.95%
llvm-svn: 189281
llvm-svn: 189248
llvm-svn: 189233
llvm-svn: 189079
stale to the point of not working and more resilient to debug info changes.
The current version of StripDeadDebugInfo became stale and no longer actually
worked since it was expecting an older version of debug info.
This patch updates it to use DebugInfoFinder and the modern DebugInfo classes as
much as possible to make it more resilient to such changes. Additionally, in the
only place where that was avoided (the code where we replace the old sets with
the new), I call verify on the DICompileUnit, implying that if the format changes
and my live-set changes no longer make sense, an assert will be hit. In order to
ensure that that occurs I have included a test case.
The actual stripping of the dead debug info follows the same strategy as was
used before in this class: find the live set and replace the old set in the
given compile unit (which may contain dead global variables/functions) with the
new live one.
llvm-svn: 189078
A single metadata node will not span multiple lines. This also helps me with
my script to automatically update the testing cases.
A debug info testing case should have an llvm.dbg.cu.
Do not use hard-coded id for debug nodes.
llvm-svn: 189033
using GEPs. Previously, it used a number of different heuristics for
analyzing the GEPs. Several of these were conservatively correct, but
failed to fall back to SCEV even when SCEV might have given a reasonable
answer. One was simply incorrect in how it was formulated.
There was good code already to recursively evaluate the constant offsets
in GEPs, look through pointer casts, etc. I gathered this into a form that
code like the SLP code can use in a previous commit, which allows all of
this code to become quite simple.
There is some performance (compile time) concern here at first glance as
we're directly attempting to walk both pointers' constant GEP chains.
However, a couple of thoughts:
1) The very common cases where there is a dynamic pointer, and a second
pointer at a constant offset (usually a stride) from it, this code
will actually not do any unnecessary work.
2) InstCombine and other passes work very hard to collapse constant
GEPs, so it will be rare that we iterate here for a long time.
That said, if there remain performance problems here, there are some
obvious things that can improve the situation immensely. Doing
a vectorizer-pass-wide memoizer for each individual layer of pointer
values, their base values, and the constant offset is likely to be able
to completely remove redundant work and strictly limit the scaling of
the work to scrape these GEPs. Since this optimization was not done on
the prior version (which would still benefit from it), I've not done it
here. But if folks have benchmarks that slow down, it should be
straightforward for them to add.
I've added a test case, but I'm not really confident of the amount of
testing done for different access patterns, strides, and pointer
manipulation.
llvm-svn: 189007
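An illustrative source pattern (my example): two stores whose addresses differ only by a constant GEP offset. Walking the constant offsets directly is enough to prove the accesses consecutive, with SCEV as the fallback for harder cases.
----
struct Pair { double x, y; };

void setPair(Pair *p, double a, double b) {
  p->x = a; // &p->x and &p->y differ by a compile-time constant offset,
  p->y = b; // so the two stores can be bundled into one vector store.
}
----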
llvm-svn: 188980
This will make it easier to turn on struct-path aware TBAA since the metadata
format will change.
llvm-svn: 188944
llvm-svn: 188926
llvm-svn: 188919
llvm-svn: 188917
Update iterator when the SLP vectorizer changes the instructions in the basic
block by restarting the traversal of the basic block.
Patch by Yi Jiang!
Fixes PR 16899.
llvm-svn: 188832
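A generic model of the fix's shape, using a standard-library stand-in rather than the SLP code: once the rewrite at the current position mutates the container being walked, stop and restart the traversal instead of continuing with a possibly stale iterator.
----
#include <list>

// Toy rewrite step: replace a negative value with two zeros.
static bool rewriteAt(std::list<int> &block, std::list<int>::iterator it) {
  if (*it >= 0)
    return false;
  it = block.erase(it);
  block.insert(it, 2, 0);
  return true;
}

void processBlock(std::list<int> &block) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto it = block.begin(); it != block.end(); ++it) {
      if (rewriteAt(block, it)) { // the block was modified...
        changed = true;
        break; // ...so conservatively restart the traversal
      }
    }
  }
}
----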
llvm-svn: 188831
This adds an llvm.copysign intrinsic; we already have LibFunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.
In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.
In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-valued FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).
llvm-svn: 188728
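For illustration (my example): a loop the vectorizer can now handle, since the copysign call maps onto the new intrinsic and, from there, onto FCOPYSIGN or its expansion.
----
#include <cmath>

void applySign(float *out, const float *mag, const float *sgn, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = std::copysign(mag[i], sgn[i]); // vectorizable via llvm.copysign
}
----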
llvm-svn: 188721
Also fix it calculating the wrong value. The struct index
is not a ConstantInt, so it was being interpreted as an array
index.
llvm-svn: 188713
Re-add the inboundsless tests I didn't add originally
llvm-svn: 188710
* pow(x, 0.5) -> fabs(sqrt(x))
* pow(2.0, x) -> exp2(x)
llvm-svn: 188656
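Illustrative source forms (mine) of the two calls being targeted; whether the rewrites actually fire is subject to the usual math-call simplification conditions.
----
#include <cmath>

double halfPower(double x)  { return std::pow(x, 0.5); } // -> fabs(sqrt(x)) form
double powerOfTwo(double x) { return std::pow(2.0, x); } // -> exp2(x)
----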
llvm-svn: 188529
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
llvm-svn: 188513
When both constants are positive or both constants are negative,
InstCombine already simplifies comparisons like this, but when
it's exactly zero and -1, the operand sorting ends up reversed
and the pattern fails to match. Handle that special case.
Follow up for rdar://14689217
llvm-svn: 188512
This path wasn't tested before without a datalayout,
so add some more tests and re-run with and without one.
llvm-svn: 188507
llvm-svn: 188503
the input character is not converted to char before comparing with zero.
The patch was discussed in this thread:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184069.html
llvm-svn: 188489