| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
| |
creating critical edge.
Differential Revision: https://reviews.llvm.org/D29976
llvm-svn: 295145
|
| |
|
|
|
|
|
|
|
|
|
| |
decides whether to apply them. NFCI.
The idea is that the apply* functions will also be called when importing
devirt optimizations.
Differential Revision: https://reviews.llvm.org/D29745
llvm-svn: 295144
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to set the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.
Differential Revision: https://reviews.llvm.org/D29696
llvm-svn: 295115
|
| |
|
|
| |
llvm-svn: 295111
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).
This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).
Differential Revision: https://reviews.llvm.org/D29744
llvm-svn: 295110
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
created in SplitBlockPredecessors
Summary:
When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of
```
1 typedef struct _node_t {
2 struct _node_t *next;
3 } node_t;
4
5 extern node_t *root;
6
7 int foo() {
8 node_t *node, *tmp;
9 int ret = 0;
10
11 node = tmp = root->next;
12 while (node != root) {
13 while (node) {
14 tmp = node;
15 node = node->next;
16 ret++;
17 }
18 }
19
20 return ret;
21 }
```
, below is the basicblock corresponding to line 12 after Reassociate expressions pass:
```
while.cond: ; preds = %while.cond2, %entry
%node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
%ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
%cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```
As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is
```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```
, which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction.
Reviewers: dblaikie, samsonov, eugenis
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29867
llvm-svn: 295106
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts 295092 (re-applies 295084), with a fix for dangling
references from the array of coverage names passed down from frontends.
I missed this in my initial testing because I only checked test/Profile,
and not test/CoverageMapping as well.
Original commit message:
The profile name variables passed to counter increment intrinsics are dead
after we emit the finalized name data in __llvm_prf_nm. However, we neglect to
erase these name variables. This causes huge size increases in the
__TEXT,__const section as well as slowdowns when linker dead stripping is
disabled. Some affected projects are so massive that they fail to link on
Darwin, because only the small code model is supported.
Fix the issue by throwing away the name constants as soon as we're done with
them.
Differential Revision: https://reviews.llvm.org/D29921
llvm-svn: 295099
|
| |
|
|
|
|
|
|
| |
This reverts commit r295084. There is a test failure on:
http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/
llvm-svn: 295092
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The profile name variables passed to counter increment intrinsics are
dead after we emit the finalized name data in __llvm_prf_nm. However, we
neglect to erase these name variables. This causes huge size increases
in the __TEXT,__const section as well as slowdowns when linker dead
stripping is disabled. Some affected projects are so massive that they
fail to link on Darwin, because only the small code model is supported.
Fix the issue by throwing away the name constants as soon as we're done
with them.
Differential Revision: https://reviews.llvm.org/D29921
llvm-svn: 295084
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.
If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.
Reviewers: chandlerc, eraman
Reviewed By: eraman
Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D29169
llvm-svn: 295075
|
| |
|
|
| |
llvm-svn: 295066
|
| |
|
|
|
|
|
|
|
|
|
| |
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.
Differential Revision: https://reviews.llvm.org/D29847
llvm-svn: 295063
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
back into a vector
Previously the cost of the existing ExtractElement/ExtractValue
instructions was considered as a dead cost only if it was detected that
they have only one use. But these instructions may be considered
dead also if users of the instructions are also going to be vectorized,
like:
```
%x0 = extractelement <2 x float> %x, i32 0
%x1 = extractelement <2 x float> %x, i32 1
%x0x0 = fmul float %x0, %x0
%x1x1 = fmul float %x1, %x1
%add = fadd float %x0x0, %x1x1
```
This can be transformed to
```
%1 = fmul <2 x float> %x, %x
%2 = extractelement <2 x float> %1, i32 0
%3 = extractelement <2 x float> %1, i32 1
%add = fadd float %2, %3
```
because though `%x0` and `%x1` have 2 users each other, these users are
part of the vectorized tree and we can consider these `extractelement`
instructions as dead.
Differential Revision: https://reviews.llvm.org/D29900
llvm-svn: 295056
|
| |
|
|
|
|
|
|
|
| |
accesses"
This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed.
I'm reverting to investigate.
llvm-svn: 295042
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.
This is fixing pr31900.
Reviewers: HaoLiu, mssimpso, mkuper
Reviewed By: mssimpso, mkuper
Subscribers: llvm-commits, efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29717
llvm-svn: 295038
|
| |
|
|
|
|
| |
Removing whitespace.
llvm-svn: 295037
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Function isCompatibleIVType is already used as a guard before the call to
SE.getMinusSCEV(OperExpr, PrevExpr);
in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.
Reviewers: qcolombet, eli.friedman
Reviewed By: qcolombet
Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29885
llvm-svn: 295033
|
| |
|
|
|
|
|
|
| |
functions to merged module.
Differential Revision: https://reviews.llvm.org/D29701
llvm-svn: 295021
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.
Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.
It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following:
while(b) {
load i128 ...
if (can lower i128 atomic)
store atomic i128 ...
else
store i128
}
It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically.
It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result.
Differential Revision: https://reviews.llvm.org/D15592
llvm-svn: 295015
|
| |
|
|
|
|
|
|
|
|
|
| |
specific copy of a function. NFC.
This will later be used by ThinLTOBitcodeWriter to add copies of readnone
functions to the regular LTO module.
Differential Revision: https://reviews.llvm.org/D29695
llvm-svn: 295008
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to its parent function
As discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108182.html
...we should be able to propagate 'nonnull' info from a callsite back to its parent.
The original motivation for this patch is our botched optimization of "dyn_cast" (PR28430),
but this won't solve that problem.
The transform is currently disabled by default while we wait for clang to work-around
potential security problems:
http://lists.llvm.org/pipermail/cfe-dev/2017-January/052066.html
Differential Revision: https://reviews.llvm.org/D27855
llvm-svn: 294998
|
| |
|
|
|
|
|
|
|
|
| |
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.
Differential Revision: https://reviews.llvm.org/D29782
llvm-svn: 294981
|
| |
|
|
|
|
|
|
| |
This reverts commit r294967. This patch caused execution time slowdowns in a
few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot.
I'm reverting to investigate.
llvm-svn: 294973
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch extends the optimization of truncations whose operand is an
induction variable with a constant integer step. Previously we were only
applying this optimization to the primary induction variable. However, the cost
model assumes the optimization is applied to the truncation of all integer
induction variables (even regardless of step type). The transformation is now
applied to the other induction variables, and I've updated the cost model to
ensure it is better in sync with the transformation we actually perform.
Differential Revision: https://reviews.llvm.org/D29847
llvm-svn: 294967
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reductions.
Currently, LLVM supports vectorization of horizontal reduction
instructions with initial value set to 0. Patch supports vectorization
of reduction with non-zero initial values. Also, it supports a
vectorization of instructions with some extra arguments, like:
```
float f(float x[], int a, int b) {
float p = a % b;
p += x[0] + 3;
for (int i = 1; i < 32; i++)
p += x[i];
return p;
}
```
Patch allows vectorization of this kind of horizontal reductions.
Differential Revision: https://reviews.llvm.org/D29727
llvm-svn: 294934
|
| |
|
|
|
|
| |
deadness
llvm-svn: 294926
|
| |
|
|
| |
llvm-svn: 294925
|
| |
|
|
|
|
| |
This reverts commit r294919
llvm-svn: 294923
|
| |
|
|
| |
llvm-svn: 294922
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This adds support for placing predicateinfo such that it affects critical edges.
This fixes the issues mentioned by Nuno on the mailing list.
Depends on D29519
Reviewers: davide, nlopes
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29606
llvm-svn: 294921
|
| |
|
|
| |
llvm-svn: 294920
|
| |
|
|
| |
llvm-svn: 294919
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I found one special case of this transform for 'slt 0', so I removed that and added the general transform.
Alive code to check correctness:
Name: slt_no_overflow
Pre: WillNotOverflowSignedSub(C1, C2)
%a = add nsw i8 %x, C2
%b = icmp slt %a, C1
=>
%b = icmp slt %x, C1 - C2
Name: sgt_no_overflow
Pre: WillNotOverflowSignedSub(C1, C2)
%a = add nsw i8 %x, C2
%b = icmp sgt %a, C1
=>
%b = icmp sgt %x, C1 - C2
http://rise4fun.com/Alive/MH
Differential Revision: https://reviews.llvm.org/D29774
llvm-svn: 294898
|
| |
|
|
| |
llvm-svn: 294851
|
| |
|
|
| |
llvm-svn: 294850
|
| |
|
|
| |
llvm-svn: 294849
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
it is dead or unreachable, as it should be.
This also makes the leader of INITIAL undef, enabling us to handle
irreducibility properly.
Summary:
This lets us verify, more than we do now, that we didn't screw up
value numbering.
Reviewers: davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D29842
llvm-svn: 294844
|
| |
|
|
| |
llvm-svn: 294837
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The patch adds instructions number generated by a solution
to LSR cost under "-lsr-insns-cost" option.
Reviewers: qcolombet, hfinkel
Differential Revision: http://reviews.llvm.org/D28307
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 294821
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
outerloop.
The recommit includes some changes of testcases. No functional change to the patch.
In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr,
and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser
and dropped.
Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only
handle inner loop now so only %for.body2 will be handled.
Using the logic above, formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped
no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related
with outerloop. Only formula like
reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept
because the SCEVAddRecExpr related with outerloop is folded into the initial value of the
SCEVAddRecExpr related with current loop.
But in some cases, we do need to share the basic induction variable
reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction
variables used by LSR, so we don't want to drop the formula like
reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally.
From the existing comment, it tries to avoid considering multiple level loops at the same time.
However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other
than current loop, it is an invariant and will be simple to handle, and the formula doesn't have
to be dropped.
Differential Revision: https://reviews.llvm.org/D26429
llvm-svn: 294814
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was marking the loop for deletion after the loop was deleted. This
almost works, except that when we do any kind of debug logging it starts
reading the name of the loop from deleted memory or otherwise blowing
up. This can fail in a bunch of ways. I recently added a test that
*always* does this, and it started failing on the sanitizer bots.
The fix is to mark the loop as deleted in the loop PM infrastructure
before we remove the loop. We can do this by passing the updater into
the routine. That also lets us simplify a bunch of other interface
components here for a net win.
llvm-svn: 294810
|
| |
|
|
|
|
|
|
| |
This is necessary to avoid warnings from GCC.
InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared
with greater visibility than the type of its field 'PointerReplacer::IC'
llvm-svn: 294794
|
| |
|
|
| |
llvm-svn: 294788
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For function-scope variables with large initialisation list, FE usually
generates a global variable to hold the initializer, then generates
memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst
identifies such allocas which are accessed only by reading and replaces
them with the global variable. This is done by casting the global variable
to the type of the alloca and replacing all references.
However, when the global variable is in a different address space which
is disjoint with addr space 0 (e.g. for IR generated from OpenCL,
global variable cannot be in private addr space i.e. addr space 0), casting
the global variable to addr space 0 results in invalid IR for certain
targets (e.g. amdgpu).
To fix this issue, when the global variable is not in addr space 0,
instead of casting it to addr space 0, this patch chases down the uses
of alloca until reaching the load instructions, then replaces load from
alloca with load from the global variable. If during the chasing
bitcast and GEP are encountered, new bitcast and GEP based on the global
variable are generated and used in the load instructions.
Differential Revision: https://reviews.llvm.org/D27283
llvm-svn: 294786
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
discriminator.
Summary:
This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html
When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations.
The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default.
Reviewers: probinson, aprantl, davidxl, hfinkel, echristo
Reviewed By: hfinkel
Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26420
llvm-svn: 294782
|
| |
|
|
|
|
|
|
|
| |
We previously only created a vector phi node for an induction variable if its
type matched the type of the canonical induction variable.
Differential Revision: https://reviews.llvm.org/D29776
llvm-svn: 294755
|
| |
|
|
|
|
|
|
| |
Chandler mentioned at the last social that the need for BFI in the new pass manager was causing a slight hiccup for this pass. Given this code has been checked in, but off for over a year, it makes sense to just remove it for now.
Note that there's nothing wrong with the general idea - it's actually a quite good one - and once we have the infrastructure in place to implement this without the full recompuation on every loop, we absolutely should.
llvm-svn: 294715
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that the call graph supports efficient replacement of a function and
spurious reference edges, we can port ArgumentPromotion to the new pass
manager very easily.
The old PM-specific bits are sunk into callbacks that the new PM simply
doesn't use. Unlike the old PM, the new PM simply does argument
promotion and afterward does the update to LCG reflecting the promoted
function.
Differential Revision: https://reviews.llvm.org/D29580
llvm-svn: 294667
|
| |
|
|
|
|
|
|
| |
evaluating them.
This was crashing before.
llvm-svn: 294666
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
disturbing the graph or having to update edges.
This is motivated by porting argument promotion to the new pass manager.
Because of how LLVM IR Function objects work, in order to change their
signature a new object needs to be created. This is efficient and
straight forward in the IR but previously was very hard to implement in
LCG. We could easily replace the function a node in the graph
represents. The challenging part is how to handle updating the edges in
the graph.
LCG previously used an edge to a raw function to represent a node that
had not yet been scanned for calls and references. This was the core
of its laziness. However, that model causes this kind of update to be
very hard:
1) The keys to lookup an edge need to be `Function*`s that would all
need to be updated when we update the node.
2) There will be some unknown number of edges that haven't transitioned
from `Function*` edges to `Node*` edges.
All of this complexity isn't necessary. Instead, we can always build
a node around any function, always pointing edges at it and always using
it as the key to lookup an edge. To maintain the laziness, we need to
sink the *edges* of a node into a secondary object and explicitly model
transitioning a node from empty to populated by scanning the function.
This design seems much cleaner in a number of ways, but importantly
there is now exactly *one* place where the `Function*` has to be
updated!
Some other cleanups that fall out of this include having something to
model the *entry* edges more accurately. Rather than hand rolling parts
of the node in the graph itself, we have an explicit `EdgeSequence`
object that gives us exactly the functionality needed. We also have
a consistent place to define the edge iterators and can use them for
both the entry edges and the internal edges of the graph.
The API used to model the separation between a node and its edges is
intentionally very thin as most clients are expected to deal with nodes
that have populated edges. We model this exactly as an optional does
with an additional method to populate the edges when that is
a reasonable thing for a client to do. This is based on API design
suggestions from Richard Smith and David Blaikie, credit goes to them
for helping pick how to model this without it being either too explicit
or too implicit.
The patch is somewhat noisy due to shifting around iterator types and
new syntax for walking the edges of a node, but most of the
functionality change is in the `Edge`, `EdgeSequence`, and `Node` types.
Differential Revision: https://reviews.llvm.org/D29577
llvm-svn: 294653
|