| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If there's a callees metadata attached to the indirect call instruction, add CallGraphEdges to the callees mentioned in the metadata when computing FunctionSummary.
* Why this is necessary:
Consider following code example:
```
(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) {
if (x) fptr=f1; return f1(x);
}
int foo(int x) {
(*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}
(bar.c)
int bar(int x) {
return foo(x);
}
```
At LTO time when `foo.o` is imported into `bar.o`, function `foo` might be inlined into `bar` and PGO-guided indirect call promotion will run after that. If the profile data tells that the promotion of `@f1` or `@f2` is beneficial, the optimizer will check if the "promoted" `@f1` or `@f2` (such as `@f1.llvm.0` or `@f2.llvm.0`) is available. Without this patch, importing `!callees` metadata would only add promoted declarations of `@f1` and `@f2` to the `bar.o`, but still the optimizer will assume that the function is available and perform the promotion. The result of that is link failure with `undefined reference to @f1.llvm.0`.
This patch fixes this problem by adding callees in the `!callees` metadata to CallGraphEdges so that their definition would be properly imported into.
One may ask that there already is a logic to add indirect call promotion targets to be added to CallGraphEdges. However, if profile data says "indirect call promotion is only beneficial under a certain inline context", the logic wouldn't work. In the code example above, if profile data is like
```
bar:1000000:100000
1:100000
1: foo:100000
1: 100000 f1:100000
```
, Computing FunctionSummary for `foo.o` wouldn't add `foo->f1` to CallGraphEdges. (Also, it is at least "possible" that one can provide profile data to only link step but not to compilation step).
Reviewers: tejohnson, mehdi_amini, pcc
Reviewed By: tejohnson
Subscribers: inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D44399
llvm-svn: 327358
|
| |
|
|
|
|
| |
input instead of creating new extract_subvectors.
llvm-svn: 327355
|
| |
|
|
|
|
|
| |
I forgot to incorporate these comments into the original revision. This
is just code cleanup addressing the feedback, NFC.
llvm-svn: 327351
|
| |
|
|
|
|
|
| |
Blocks may have function calls, so don’t erase functions
directly to avoid erasing a function that has a user.
llvm-svn: 327340
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the case that the CallInst that is being moved has an associated
operand bundle which is a funclet, the move will construct an invalid
instruction. The new site will have a different token and needs to be
reassociated with the new instruction.
Unfortunately, there is no way to alter the bundle after the
construction of the instruction. Replace the call instruction cloning
with a custom helper to clone the instruction and reassociate the
funclet token.
llvm-svn: 327336
|
| |
|
|
|
|
| |
Moved 'special cases' to be closer to other system classes.
llvm-svn: 327332
|
| |
|
|
|
|
|
|
|
|
|
|
| |
LoopInstSimplify is unused and untested. Reading through the commit
history the pass also seems to have a high maintenance burden.
It would be best to retire the pass for now. It should be easy to
recover if we need something similar in the future.
Differential Revision: https://reviews.llvm.org/D44053
llvm-svn: 327329
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
ProvenanceAnalysis::related(A, B) currently memoizes its results, and on big
tests the cache grows too large, and we're spending most of the time
growing/looking through DenseMap.
This patch reduces the size of the cache by normalizing keys first: we do that
by calling GetUnderlyingObjCPtr on the input values. The results of
GetUnderlyingObjCPtr are also memoized in a separate cache.
The patch doesn't bring noticable changes to compile time on CTMark, however
significantly helps one of our internal tests.
Reviewers: gottesmm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D44270
llvm-svn: 327328
|
| |
|
|
|
|
| |
adding the f128 type
llvm-svn: 327319
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D44355
llvm-svn: 327316
|
| |
|
|
|
|
|
|
|
|
| |
getNumUses is a linear time operation. It traverses the user linked list to the end and counts as it goes. Since we are only interested in small constant counts, we should use hasNUses or hasNUsesMore more that terminate the traversal as soon as it can provide the answer.
There are still two other locations in InstCombine, but changing those would force a rebase of D44266 which if accepted would remove them.
Differential Revision: https://reviews.llvm.org/D44398
llvm-svn: 327315
|
| |
|
|
|
|
|
|
| |
non-zero return from getNumUses
getNumUses is a linear operation. It walks a linked list to get a count. So in this case its better to just ask if there are any users rather than how many.
llvm-svn: 327314
|
| |
|
|
| |
llvm-svn: 327312
|
| |
|
|
| |
llvm-svn: 327311
|
| |
|
|
| |
llvm-svn: 327308
|
| |
|
|
|
|
| |
I love you llvm-mca.
llvm-svn: 327306
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
1) Make sure to discard dangling debug info if the variable (or
variable fragment) is mapped to something new before we had a
chance to resolve the dangling debug info.
2) When resolving debug info, make sure to bump the associated
SDNodeOrder to ensure that the DBG_VALUE is emitted after the
instruction that defines the value used in the DBG_VALUE.
This will avoid a debug-use before def scenario as seen in
https://bugs.llvm.org/show_bug.cgi?id=36417.
The new test case, test/DebugInfo/X86/sdag-dangling-dbgvalue.ll,
show some other limitations in how dangling debug info is
handled in the SelectionDAG. Since we currently only support
having one dangling dbg.value per Value, we will end up dropping
debug info when there are more than one variable that is described
by the same "dangling value".
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: aprantl, eraman, llvm-commits, JDevlieghere
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D44369
llvm-svn: 327303
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds two features: "packets", and "nvj".
Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.
The exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.
llvm-svn: 327302
|
| |
|
|
|
|
| |
MMX buildvectors were improved at rL327247 - new MMX bugs should be raised on bugzilla
llvm-svn: 327300
|
| |
|
|
|
|
|
|
|
| |
See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558
Differential Revision: https://reviews.llvm.org/D43950
Reviewers: artem.tamazov, arsenm
llvm-svn: 327299
|
| |
|
|
|
|
| |
These are all global, so prefix with 'J' to help prevent accidental name clashes with other models.
llvm-svn: 327296
|
| |
|
|
|
|
|
|
| |
MVT belongs to the CodeGen layer, but ShuffleDecode is used by the X86 InstPrinter which is part of the MC layer. This only worked because MVT is completely implemented in a header file with no other library dependencies.
Differential Revision: https://reviews.llvm.org/D44353
llvm-svn: 327292
|
| |
|
|
|
|
|
|
|
|
| |
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.
Differential Revision: https://reviews.llvm.org/D44322
llvm-svn: 327291
|
| |
|
|
|
|
| |
This allows the single resource classes (VarBlend, MPSAD, VarVecShift) to use the JWriteResFpuPair macro.
llvm-svn: 327289
|
| |
|
|
|
|
|
|
| |
cases. NFCI.
These are single pipe and have the default resource/uop counts like JWriteResFpuPair so there's no need to handle them separately.
llvm-svn: 327283
|
| |
|
|
|
|
|
|
|
| |
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252
Differential Revision: https://reviews.llvm.org/D43874
Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
|
| |
|
|
|
|
|
|
|
| |
Invalid user input should not trigger assertions and unreachables. We
already return an Option so we should just return None here.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5532
llvm-svn: 327274
|
| |
|
|
|
|
| |
clang 3.8 emits
llvm-svn: 327273
|
| |
|
|
| |
llvm-svn: 327269
|
| |
|
|
| |
llvm-svn: 327268
|
| |
|
|
| |
llvm-svn: 327267
|
| |
|
|
|
|
|
|
|
|
|
| |
This simplifies tagging instructions with the correct ISA and ASE, albeit making
instruction definitions a bit more verbose.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D44299
llvm-svn: 327265
|
| |
|
|
|
|
|
|
| |
Ports parts of r193000 to the intel parser. Fixes part of PR36676.
https://reviews.llvm.org/D44359
llvm-svn: 327262
|
| |
|
|
| |
llvm-svn: 327259
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After r327219 was landed, the bot with expensive checks on GreenDragon
started failing. The problem was missing symbols `regex_t` and
`regmatch_t` in `xlocale/_regex.h`. The latter was included because
after the change in r327219, `random` is needed, which transitively
includes `xlocale.h.` which in turn conditionally includes
`xlocale/_regex.h` when _REGEX_H_ is defined. Because this is the header
guard in `regex_impl.h` and because `regex_impl.h` was included before
the other LLVM includes, `xlocale/_regex.h` was included without the
necessary types being available.
This commit fixes this by moving the include of `regex_impl.h` all the
way down. I also added a comment to stress the significance of its
position.
llvm-svn: 327256
|
| |
|
|
|
|
|
| |
This wasreverted in r326638 due to link problems and fixed
afterwards
llvm-svn: 327254
|
| |
|
|
|
|
|
|
|
|
|
| |
udiv/urem instructions."
This reverts r326908, originally landed as D44102.
Reverted for causing performance regressions on x86. (These regressions
are not yet understood.)
llvm-svn: 327252
|
| |
|
|
|
|
| |
We called MaskedValueIsZero with two different masks, but underneath that calls computeKnownBits before applying the mask. This means we compute the same known bits twice due to the two calls. Instead just call computeKnownBits directly and apply the two masks ourselves.
llvm-svn: 327251
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we replace the Phi we created with matched ones it is possible that
there are two identical phi nodes in IR. And matcher is smart enough to find that
new created phi matches both of them. So we try to replace our phi node with
matched ones twice and what is bad we delete our phi node twice causing a crash.
As soon as we found that we have two identical Phi nodes it makes sense to do
a clean-up and replace one phi node by other one.
The patch implements it.
Reviewers: john.brawn, reames
Reviewed By: john.brawn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43758
llvm-svn: 327250
|
| |
|
|
|
|
|
|
|
|
|
|
| |
64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.
This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics.
We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover.
Differential Revision: https://reviews.llvm.org/D43618
llvm-svn: 327247
|
| |
|
|
|
|
|
|
| |
v32i8 codegen
XOP was already doing this, and now AVX performs v32i8 variable permutes as well.
llvm-svn: 327245
|
| |
|
|
|
|
| |
vector is wider than the destination type
llvm-svn: 327244
|
| |
|
|
|
|
|
|
| |
variable permutes
Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size.
llvm-svn: 327242
|
| |
|
|
| |
llvm-svn: 327241
|
| |
|
|
|
|
|
|
| |
any number of ops.
I've kept SplitBinaryOpsAndApply as a wrapper to avoid a lot of makeArrayRef code.
llvm-svn: 327240
|
| |
|
|
|
|
|
|
|
|
| |
v8i32/v8f32 and v4i64/v4f64 variable permutes
As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for for lo/hi.
For v4i64/v4f64 this avoids some rather nasty v4i64 multiples on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends.
llvm-svn: 327239
|
| |
|
|
|
|
|
|
| |
variable permutes to v8i64/v8f64
Permutes in the upper elements will be undefined, but they will be discarded anyway.
llvm-svn: 327238
|
| |
|
|
|
|
|
|
| |
Helper function to insert a subvector into the bottom elements of a larger zero/undef vector with the same scalar type.
I've converted a couple of INSERT_SUBVECTOR calls to use it, there are plenty more although in some cases I was worried it might make the code more ambiguous.
llvm-svn: 327236
|
| |
|
|
|
|
| |
StartingAccess is already a MemoryUseOrDef.
llvm-svn: 327235
|
| |
|
|
| |
llvm-svn: 327234
|