| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
| |
decides whether to apply them. NFCI.
The idea is that the apply* functions will also be called when importing
devirt optimizations.
Differential Revision: https://reviews.llvm.org/D29745
llvm-svn: 295144
|
| |
|
|
|
|
| |
This broke buildbots.
llvm-svn: 295142
|
| |
|
|
| |
llvm-svn: 295138
|
| |
|
|
| |
llvm-svn: 295137
|
| |
|
|
| |
llvm-svn: 295136
|
| |
|
|
|
|
|
|
|
|
| |
This patch corrects the maximum workgroups per CU if we have big
workgroups (more than 128). This calculation contributes to the
occupancy calculation with respect to LDS size.
Differential Revision: https://reviews.llvm.org/D29974
llvm-svn: 295134
|
| |
|
|
| |
llvm-svn: 295132
|
| |
|
|
| |
llvm-svn: 295131
|
| |
|
|
| |
llvm-svn: 295128
|
| |
|
|
| |
llvm-svn: 295126
|
| |
|
|
| |
llvm-svn: 295124
|
| |
|
|
| |
llvm-svn: 295120
|
| |
|
|
| |
llvm-svn: 295119
|
| |
|
|
| |
llvm-svn: 295117
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to assign the maximum of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.
Differential Revision: https://reviews.llvm.org/D29696
llvm-svn: 295115
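A minimal sketch of the "take the max" rule described above, not the inliner's actual code. Blocks are modelled as plain strings and frequencies as integers; `cloneFrequencies`, `BlockName`, and the pair list standing in for the value map are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: a block is identified by its name.
using BlockName = std::string;

// Each pair maps an original callee block to the cloned block that replaced
// it. Several original blocks may share one clone because pruning folds them.
std::map<BlockName, std::uint64_t> cloneFrequencies(
    const std::vector<std::pair<BlockName, BlockName>> &origToClone,
    const std::map<BlockName, std::uint64_t> &origFreq) {
  std::map<BlockName, std::uint64_t> cloneFreq;
  for (const auto &entry : origToClone) {
    auto it = origFreq.find(entry.first);
    if (it == origFreq.end())
      continue; // no recorded frequency for this original block
    std::uint64_t &slot = cloneFreq[entry.second]; // starts at 0
    // Keeping the maximum makes the result independent of iteration order.
    slot = std::max(slot, it->second);
  }
  return cloneFreq;
}
```

Because `std::max` is commutative and associative, the clone receives the same frequency whichever original block is visited first.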
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This helps to avoid signed integer overflow after running a fast fuzz target for several hours, e.g.:
<...>
Done -1097903291 runs in 54001 second(s)
Reviewers: kcc
Reviewed By: kcc
Differential Revision: https://reviews.llvm.org/D29941
llvm-svn: 295112
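As a rough illustration of the wrapped value above, the sketch below assumes the run counter was a 32-bit signed integer that wrapped exactly once; the commit message does not state either detail. `unwrapRuns` and `RunCounter` are hypothetical names, not libFuzzer code.

```cpp
#include <cstdint>

// Reinterprets a wrapped 32-bit signed counter as an unsigned count.
// Signed-to-unsigned conversion is well defined (reduction modulo 2^32).
constexpr std::uint32_t unwrapRuns(std::int32_t printed) {
  return static_cast<std::uint32_t>(printed);
}

// A wider unsigned counter, which does not wrap at this scale.
struct RunCounter {
  std::uint64_t runs = 0;
  void add(std::uint64_t n) { runs += n; }
};
```

Under those assumptions, the printed -1097903291 would correspond to 3197064005 runs.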
|
| |
|
|
| |
llvm-svn: 295111
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).
This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).
Differential Revision: https://reviews.llvm.org/D29744
llvm-svn: 295110
|
| |
|
|
|
|
|
| |
Correct the definition of MIPS16 instructions that act as return instructions
so that isReturn = 1 as expected.
llvm-svn: 295109
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
created in SplitBlockPredecessors
Summary:
When setting the debugloc for instructions created in SplitBlockPredecessors, the current implementation copies the debugloc from the first non-PHI instruction of the original basic block. However, if the first non-PHI instruction is a call to @llvm.dbg.value, the debugloc of that instruction may point to a location outside the block itself. For the example code of
```
1 typedef struct _node_t {
2 struct _node_t *next;
3 } node_t;
4
5 extern node_t *root;
6
7 int foo() {
8 node_t *node, *tmp;
9 int ret = 0;
10
11 node = tmp = root->next;
12 while (node != root) {
13 while (node) {
14 tmp = node;
15 node = node->next;
16 ret++;
17 }
18 }
19
20 return ret;
21 }
```
, below is the basic block corresponding to line 12 after the Reassociate expressions pass:
```
while.cond: ; preds = %while.cond2, %entry
%node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
%ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
%cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```
As you can see, the first non-PHI instruction is a call to @llvm.dbg.value, and its debugloc is
```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```
, which is the definition of the 'ret' variable and lies outside the scope of the basic block itself. However, the current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses the problem by picking up the debugloc from the first non-PHI, non-debug instruction.
Reviewers: dblaikie, samsonov, eugenis
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29867
llvm-svn: 295106
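The selection rule from this message can be sketched with a toy instruction model. This is not LLVM's API: `Kind`, `Inst`, and `firstNonPhiNonDbg` are hypothetical, and a block is just a vector of instructions tagged with a debug line.

```cpp
#include <cstddef>
#include <vector>

enum class Kind { Phi, DbgValue, Other };

struct Inst {
  Kind kind;
  unsigned debugLine; // 0 means "no location attached"
};

// Returns the index of the first instruction that is neither a PHI nor a
// debug intrinsic, or block.size() if the block has no such instruction.
std::size_t firstNonPhiNonDbg(const std::vector<Inst> &block) {
  for (std::size_t i = 0; i < block.size(); ++i)
    if (block[i].kind != Kind::Phi && block[i].kind != Kind::DbgValue)
      return i;
  return block.size();
}
```

Modelled on `while.cond` above, the two PHIs and the two `@llvm.dbg.value` calls are skipped and the `icmp` is chosen, so new instructions would inherit its location rather than the line 9 location carried by `!21`.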
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.
Reviewers: hans, iteratee, haicheng
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29153
llvm-svn: 295105
|
| |
|
|
|
|
| |
It's just an AND-immediate instruction for us, surprisingly simple to select.
llvm-svn: 295104
|
| |
|
|
|
|
|
|
|
| |
This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.
llvm-svn: 295103
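The arithmetic behind such a mask can be sketched on a plain address value. This assumes the immediate is a count of low bits to clear, which the message does not spell out; `clearLowBits` is a hypothetical helper. It works on integers purely for illustration, whereas the instruction exists precisely to avoid that conversion.

```cpp
#include <cstdint>

// Clears the low `bits` bits of an address value.
// Assumes `bits` is smaller than the width of std::uintptr_t.
constexpr std::uintptr_t clearLowBits(std::uintptr_t addr, unsigned bits) {
  return addr & ~((std::uintptr_t{1} << bits) - 1);
}
```

With a statically known bit count the mask is a constant, which is why a single AND with an immediate is enough to select it.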
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts 295092 (re-applies 295084), with a fix for dangling
references from the array of coverage names passed down from frontends.
I missed this in my initial testing because I only checked test/Profile,
and not test/CoverageMapping as well.
Original commit message:
The profile name variables passed to counter increment intrinsics are dead
after we emit the finalized name data in __llvm_prf_nm. However, we neglect to
erase these name variables. This causes huge size increases in the
__TEXT,__const section as well as slowdowns when linker dead stripping is
disabled. Some affected projects are so massive that they fail to link on
Darwin, because only the small code model is supported.
Fix the issue by throwing away the name constants as soon as we're done with
them.
Differential Revision: https://reviews.llvm.org/D29921
llvm-svn: 295099
|
| |
|
|
| |
llvm-svn: 295096
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Store instructions can have more than one memory operand as a result
of optimizations that fold different stores into one.
When we identify spill instructions to generate DBG_VALUE instructions
to record the spilling of a variable, we disregard stores with
multiple memory operands for now. We may miss some relevant spills but
the handling is a bit more complex, so we'll do it in a different patch.
This fixes PR31935.
llvm-svn: 295093
|
| |
|
|
|
|
|
|
| |
This reverts commit r295084. There is a test failure on:
http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/2620/
llvm-svn: 295092
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29918
llvm-svn: 295089
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The profile name variables passed to counter increment intrinsics are
dead after we emit the finalized name data in __llvm_prf_nm. However, we
neglect to erase these name variables. This causes huge size increases
in the __TEXT,__const section as well as slowdowns when linker dead
stripping is disabled. Some affected projects are so massive that they
fail to link on Darwin, because only the small code model is supported.
Fix the issue by throwing away the name constants as soon as we're done
with them.
Differential Revision: https://reviews.llvm.org/D29921
llvm-svn: 295084
|
| |
|
|
|
|
|
|
|
|
| |
To assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc:
one which contains the patterns that are used during selection, and
another containing the include source locations of the patterns.
Enabled through the CMake variable LLVM_ENABLE_DAGISEL_COV.
llvm-svn: 295081
|
| |
|
|
| |
llvm-svn: 295078
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.
If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.
Reviewers: chandlerc, eraman
Reviewed By: eraman
Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D29169
llvm-svn: 295075
|
| |
|
|
| |
llvm-svn: 295073
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
And use it in MachineOptimizationRemarkEmitter. A test will follow on top of
Justin's changes to enable MachineORE in AsmPrinter.
The approach is similar to the IR-level pass. It's a bit simpler because BPI
is immutable at the Machine level so we don't need to make that lazy.
Because of this, a new function mapping is introduced (BPIPassTrait::getBPI).
This function extracts BPI from the pass. In case of the lazy pass, this is
when the calculation of the BFI occurs. For Machine-level, this is the
identity function.
Differential Revision: https://reviews.llvm.org/D29836
llvm-svn: 295072
|
| |
|
|
| |
llvm-svn: 295068
|
| |
|
|
| |
llvm-svn: 295066
|
| |
|
|
| |
llvm-svn: 295065
|
| |
|
|
|
|
|
|
|
|
|
| |
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.
Differential Revision: https://reviews.llvm.org/D29847
llvm-svn: 295063
|
| |
|
|
|
|
| |
Add support for specifying an UNPCK input as UNDEF
llvm-svn: 295061
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29759
llvm-svn: 295060
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
back into a vector
Previously the cost of the existing ExtractElement/ExtractValue
instructions was considered as a dead cost only if it was detected that
they have only one use. But these instructions may be considered
dead also if users of the instructions are also going to be vectorized,
like:
```
%x0 = extractelement <2 x float> %x, i32 0
%x1 = extractelement <2 x float> %x, i32 1
%x0x0 = fmul float %x0, %x0
%x1x1 = fmul float %x1, %x1
%add = fadd float %x0x0, %x1x1
```
This can be transformed to
```
%1 = fmul <2 x float> %x, %x
%2 = extractelement <2 x float> %1, i32 0
%3 = extractelement <2 x float> %1, i32 1
%add = fadd float %2, %3
```
because although `%x0` and `%x1` each have 2 users, these users are
part of the vectorized tree, so we can consider these `extractelement`
instructions dead.
Differential Revision: https://reviews.llvm.org/D29900
llvm-svn: 295056
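The deadness test described above can be sketched as follows. This is a toy model rather than the SLP vectorizer's code: values are identified by name, the vectorized tree is a set of names, and `extractIsDead` is a hypothetical helper.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// An extract is treated as dead when every one of its users is itself part
// of the tree being vectorized, regardless of how many users there are.
bool extractIsDead(const std::vector<std::string> &users,
                   const std::set<std::string> &vectorizedTree) {
  return std::all_of(users.begin(), users.end(),
                     [&](const std::string &user) {
                       return vectorizedTree.count(user) != 0;
                     });
}
```

In the example, both uses of `%x0` are in `%x0x0`, which belongs to the vectorized tree, so the extract adds no cost; a user outside the tree would keep it alive.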
|
| |
|
|
| |
llvm-svn: 295055
|
| |
|
|
|
|
| |
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.
llvm-svn: 295054
|
| |
|
|
| |
llvm-svn: 295053
|
| |
|
|
|
|
|
|
| |
Don't bother setting the V1/V2 operands again for unary shuffles.
Don't bother legalizing the value type unless the match succeeds.
llvm-svn: 295051
|
| |
|
|
|
|
|
|
|
| |
accesses"
This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed.
I'm reverting to investigate.
llvm-svn: 295042
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prevent memory objects in different address spaces from being part of
the same load/store group when analysing interleaved accesses.
This fixes PR31900.
Reviewers: HaoLiu, mssimpso, mkuper
Reviewed By: mssimpso, mkuper
Subscribers: llvm-commits, efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29717
llvm-svn: 295038
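One way to picture the restriction is to bucket accesses by address space before any grouping is attempted. The sketch below is a simplified model under that assumption and not the vectorizer's implementation; `Access` and `bucketByAddressSpace` are hypothetical names.

```cpp
#include <map>
#include <utility>
#include <vector>

struct Access {
  int id;             // hypothetical identifier for the memory access
  unsigned addrSpace; // address space of the accessed pointer
  bool isLoad;        // true for loads, false for stores
};

// Buckets accesses by (address space, load/store) so that the candidates for
// one interleave group never mix address spaces.
std::map<std::pair<unsigned, bool>, std::vector<int>>
bucketByAddressSpace(const std::vector<Access> &accesses) {
  std::map<std::pair<unsigned, bool>, std::vector<int>> buckets;
  for (const Access &a : accesses)
    buckets[{a.addrSpace, a.isLoad}].push_back(a.id);
  return buckets;
}
```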
|
| |
|
|
|
|
| |
Removing whitespace.
llvm-svn: 295037
|
| |
|
|
| |
llvm-svn: 295035
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Function isCompatibleIVType is already used as a guard before the call to
SE.getMinusSCEV(OperExpr, PrevExpr);
in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.
Reviewers: qcolombet, eli.friedman
Reviewed By: qcolombet
Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29885
llvm-svn: 295033
|