| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 269511
|
| |
|
|
|
|
|
|
|
|
| |
increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...""
This reverts commit r269395.
Try to reapply with a fix from chapuni.
llvm-svn: 269486
|
| |
|
|
| |
llvm-svn: 269449
|
| |
|
|
| |
llvm-svn: 269447
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Enhancement to: http://reviews.llvm.org/rL269426
With discussion in: http://reviews.llvm.org/D17859
This should complete the fixes for: PR26701, PR26819:
https://llvm.org/bugs/show_bug.cgi?id=26701
https://llvm.org/bugs/show_bug.cgi?id=26819
llvm-svn: 269439
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit.
Reviewers: arsenm, joker.eph, mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20176
llvm-svn: 269433
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR26701, PR26819)
*We don't currently handle the edge case constants (min/max values), so it's not a complete
canonicalization.
To fully solve the motivating bugs, we need to enhance this to recognize a zero vector
too because that's a ConstantAggregateZero which is a ConstantData, not a ConstantVector
or a ConstantDataVector.
Differential Revision: http://reviews.llvm.org/D17859
llvm-svn: 269426
|
| |
|
|
|
|
|
|
|
|
|
| |
tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."
This reverts commit r269388.
It caused some bots to fail, I'm reverting it until I investigate the
issue.
llvm-svn: 269395
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...
Summary:
...loop after the last iteration.
This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.
The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.
Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.
Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.
We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
simplified.
Summary:
Currently we consider such instructions as simplified, which is incorrect,
because if their user isn't simplified, we can't actually simplify them too.
This biases our estimates of profitability: for instance the analyzer expects
much more gains from unrolling memcpy loops than there actually are.
Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17365
llvm-svn: 269387
|
| |
|
|
|
|
|
|
|
| |
Shifts beyond the bitwidth are undef but SCCP resolved them to zero.
Instead, DTRT and resolve them to undef.
This reimplements the transform which caused PR27712.
llvm-svn: 269269
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new verifier rule lets us unambigously pick a calling convention
when creating a new declaration for
`@llvm.experimental.deoptimize.<ty>`. It is also congruent with our
lowering strategy -- since all calls to `@llvm.experimental.deoptimize`
are lowered to calls to `__llvm_deoptimize`, it is reasonable to enforce
a unique calling convention.
Some of the tests that were breaking this verifier rule have had to be
split up into different .ll files.
The inliner was violating this rule as well, and has been fixed to avoid
producing invalid IR.
llvm-svn: 269261
|
| |
|
|
|
|
|
|
| |
defined."
This reverts commit r269105 as it caused PR27712.
llvm-svn: 269252
|
| |
|
|
| |
llvm-svn: 269138
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: In sample profile, some branches may have profile missing due to profile inaccuracy. We want existing branch probability still valid after propagation.
Reviewers: hfinkel, davidxl, spatel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19948
llvm-svn: 269137
|
| |
|
|
| |
llvm-svn: 269131
|
| |
|
|
| |
llvm-svn: 269129
|
| |
|
|
|
|
|
| |
This reverts commit r269125. It was in my tree when I ran "git svn dcommit".
It's really still under review.
llvm-svn: 269127
|
| |
|
|
|
|
|
|
| |
Sort of the BB-local equivalent to idiom-recognizer: if we have a basic-block
that really implements a memcpy operation, backends can benefit from seeing
this.
llvm-svn: 269125
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before r268509, Clang would disable the loop unroll pass when optimizing
for size. That commit enabled it to be able to support unroll pragmas
in -Os builds. However, this regressed binary size in one of Chromium's
DLLs with ~100 KB.
This restores the original behaviour of no unrolling at -Os, but doing it
in LLVM instead of Clang makes more sense, and also allows the pragmas to
keep working.
Differential revision: http://reviews.llvm.org/D20115
llvm-svn: 269124
|
| |
|
|
|
|
|
|
|
| |
This patch extend loopreroll to allow the instruction chain
of loop control only IV has sext.
Differential Revision: http://reviews.llvm.org/D19820
llvm-svn: 269121
|
| |
|
|
| |
llvm-svn: 269119
|
| |
|
|
| |
llvm-svn: 269117
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do simplifications common to all shift instructions based on the amount shifted:
1. If the shift amount is known larger than the bitwidth, the result is undefined.
2. If the valid bits of the shift amount are all known to be 0, it's a shift by zero, so the shift operand is the result.
Note that we could generalize the shift-by-zero transform into a shift-by-constant if all of the valid bits in the shift
amount are known, but that would have to be done in InstCombine rather than here because it would mean we need to create
a new shift instruction.
Differential Revision: http://reviews.llvm.org/D19874
llvm-svn: 269114
|
| |
|
|
|
|
|
|
|
|
| |
This patch adds support for two optimizations:
icmp ugt (udiv C2, X), C1 -> icmp ule X, C2/(C1+1)
icmp ult (udiv C2, X), C1 -> icmp ugt X, C2/C1
Differential Revision: http://reviews.llvm.org/D20123
llvm-svn: 269109
|
| |
|
|
|
|
|
|
|
|
| |
With this patch:
%r1 = lshr i64 -1, 4294967296 -> undef
Before this patch:
%r1 = lshr i64 -1, 4294967296 -> 0
llvm-svn: 269105
|
| |
|
|
|
|
|
|
| |
The LoopPassManager needs to calculate the loops analysis in order to
iterate over the loops at all. Requiring it is redundant and just adds
noise to the RUN lines here.
llvm-svn: 269097
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An oddity of the .ll syntax is that the "@var = " in
@var = global i32 42
is optional. Writing just
global i32 42
is equivalent to
@0 = global i32 42
This means that there is a pretty big First set at the top level. The
current implementation maintains it manually. I was trying to refactor
it, but then started wondering why keep it a all. I personally find the
above syntax confusing. It looks like something is missing.
This patch removes the feature and simplifies the parser.
llvm-svn: 269096
|
| |
|
|
|
|
|
|
|
| |
This patch extend loopreroll to allow the instruction chain
of loop control only IV has sext.
Differential Revision: http://reviews.llvm.org/D19820
llvm-svn: 269093
|
| |
|
|
|
|
| |
Put the test into a target specific directory.
llvm-svn: 269090
|
| |
|
|
|
|
|
| |
This patch extend loopreroll to allow the instruction chain
of loop control only IV has sext.
llvm-svn: 269084
|
| |
|
|
|
|
|
|
| |
This was a fairly simple patch but on closer inspection was seriously flawed and caused PR27690.
This reverts commit r268921.
llvm-svn: 269051
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Loop rotation clones instruction from the old header into the preheader. If
there were uses of values produced by these instructions that were outside
the loop, we have to insert PHI nodes to merge the two values. If the values
are used by DbgIntrinsics they will be used as a MetadataAsValue of a
ValueAsMetadata of the original values, and iterating all of the uses of the
original value will not update the DbgIntrinsics. The new code checks if the
values are used by DbgIntrinsics and if so, updates them using essentially
the same logic as the original code.
The attached testcase demonstrates the issue. Without the fix, the
DbgIntrinic outside the loop uses values computed inside the loop, even
though these values do not dominate the DbgIntrinsic.
Author: Thomas Jablin (tjablin)
Reviewers: dblaikie aprantl kbarton hfinkel cycheng
http://reviews.llvm.org/D19564
llvm-svn: 269034
|
| |
|
|
|
|
|
|
|
|
|
|
| |
When a va_start or va_copy is immediately followed by a va_end (ignoring
debug information or other start/end in between), then it is safe to
remove the pair. As this code shares some commonalities with the lifetime
markers, this has been factored to helper functions.
This InstCombine pattern kicks-in 3 times when running the LLVM test
suite.
llvm-svn: 269033
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
instrumentation generates a COMDAT symbol __llvm_profile_raw_version to overwrite the same symbol in profile run-time to distinguish IR profiles from Clang generated profiles. In MACHO, LinkOnceODR linkage is used due to the lack of COMDAT support."
This reverts commits r268969, r268979 and r268984. They had target specific test
in generic directories without the correct specifiers and made it hard for us to
come up with a good solution by rapidly committing untested changes.
This test needs to be in a target specific directory or have the correct REQUIRED
identifier.
llvm-svn: 269027
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow vectorization when the step is a loop-invariant variable.
This is the loop example that is getting vectorized after the patch:
int int_inc;
int bar(int init, int *restrict A, int N) {
int x = init;
for (int i=0;i<N;i++){
A[i] = x;
x += int_inc;
}
return x;
}
"x" is an induction variable with *loop-invariant* step.
But it is not a primary induction. Primary induction variable with non-constant step is not handled yet.
Differential Revision: http://reviews.llvm.org/D19258
llvm-svn: 269023
|
| |
|
|
|
|
|
|
|
|
| |
Reviewers: apilipenko, majnemer, reames
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20044
llvm-svn: 269008
|
| |
|
|
| |
llvm-svn: 268999
|
| |
|
|
|
|
|
|
|
|
| |
When we encounter unsafe memory dependencies, loop distribution could
help.
Even though, the diagnostics is in LAA, it's only currently emitted in
the vectorizer.
llvm-svn: 268987
|
| |
|
|
| |
llvm-svn: 268984
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D20077
llvm-svn: 268980
|
| |
|
|
| |
llvm-svn: 268979
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
IR instrumentation generates a COMDAT symbol __llvm_profile_raw_version to
overwrite the same symbol in profile run-time to distinguish IR profiles from
Clang generated profiles. In MACHO, LinkOnceODR linkage is used due to the
lack of COMDAT support.
But LinkOnceODR linkage might have .weak_def_can_be_hidden assembly directive,
while the weak variable in run-time has a .weak_definition directive. Linker
will not merge these two symbols even they have the same name. The end result
is IR profiles are not properly flagged in MACHO.
This patch changes the linkage for __llvm_profile_raw_version in each module to
LinkOnceAny so that it has same .weak_definition directive as in the run-time.
Differential Revision: http://reviews.llvm.org/D20078
llvm-svn: 268969
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D20036
llvm-svn: 268960
|
| |
|
|
| |
llvm-svn: 268949
|
| |
|
|
| |
llvm-svn: 268922
|
| |
|
|
|
|
| |
When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more information and allow more truncations.
llvm-svn: 268921
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This moves the code that handles stripping debug info intrinsic from
StripDebugInfo(Module) to StripDebugInfo(Function). The latter is
already walking every instructions so it makes sense to do it at the
same time.
This makes also stripDebugInfo(Function) as an API more useful: it
is really dropping every debug info in the Function.
Finally the existing code is trigerring an assertion when the Module
is not fully materialized.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 268847
|
| |
|
|
|
|
| |
Added 256-bit vector test as well
llvm-svn: 268811
|
| |
|
|
|
|
|
|
|
|
|
| |
Original Commit Message
Extend load/store type canonicalization to handle unordered operations
Extend the type canonicalization logic to work for unordered atomic loads and stores. Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before. Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered. If you see problems, feel free to revert this change, but please make sure you collect a test case.
Note that the concern about lowering is now much less likely. PR27490 proved that we already *were* mucking with the types of ordered atomics and volatiles. As a result, this change doesn't introduce as much new behavior as originally thought.
llvm-svn: 268809
|