diff options
author | Taewook Oh <twoh@fb.com> | 2018-03-13 04:26:58 +0000 |
---|---|---|
committer | Taewook Oh <twoh@fb.com> | 2018-03-13 04:26:58 +0000 |
commit | 7646e7796979c952ac679e0842d94c06b0807724 (patch) | |
tree | 6adbfc504c47162ee5809ac3d6ac6dccd3cb2145 /llvm | |
parent | 33a28363076898487d36f91134c8b8cd6c24c755 (diff) | |
download | bcm5719-llvm-7646e7796979c952ac679e0842d94c06b0807724.tar.gz bcm5719-llvm-7646e7796979c952ac679e0842d94c06b0807724.zip |
[ThinLTO] Add funtions in callees metadata to CallGraphEdges
Summary:
If there's a callees metadata attached to the indirect call instruction, add CallGraphEdges to the callees mentioned in the metadata when computing FunctionSummary.
* Why this is necessary:
Consider following code example:
```
(foo.c)
static int f1(int x) {...}
static int f2(int x);
static int (*fptr)(int) = f2;
static int f2(int x) {
if (x) fptr=f1; return f1(x);
}
int foo(int x) {
(*fptr)(x); // !callees metadata of !{i32 (i32)* @f1, i32 (i32)* @f2} would be attached to this call.
}
(bar.c)
int bar(int x) {
return foo(x);
}
```
At LTO time when `foo.o` is imported into `bar.o`, function `foo` might be inlined into `bar` and PGO-guided indirect call promotion will run after that. If the profile data tells that the promotion of `@f1` or `@f2` is beneficial, the optimizer will check if the "promoted" `@f1` or `@f2` (such as `@f1.llvm.0` or `@f2.llvm.0`) is available. Without this patch, importing `!callees` metadata would only add promoted declarations of `@f1` and `@f2` to the `bar.o`, but still the optimizer will assume that the function is available and perform the promotion. The result of that is link failure with `undefined reference to @f1.llvm.0`.
This patch fixes this problem by adding callees in the `!callees` metadata to CallGraphEdges so that their definition would be properly imported into.
One may ask that there already is a logic to add indirect call promotion targets to be added to CallGraphEdges. However, if profile data says "indirect call promotion is only beneficial under a certain inline context", the logic wouldn't work. In the code example above, if profile data is like
```
bar:1000000:100000
1:100000
1: foo:100000
1: 100000 f1:100000
```
, Computing FunctionSummary for `foo.o` wouldn't add `foo->f1` to CallGraphEdges. (Also, it is at least "possible" that one can provide profile data to only link step but not to compilation step).
Reviewers: tejohnson, mehdi_amini, pcc
Reviewed By: tejohnson
Subscribers: inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D44399
llvm-svn: 327358
Diffstat (limited to 'llvm')
-rw-r--r-- | llvm/lib/Analysis/ModuleSummaryAnalysis.cpp | 12 | ||||
-rw-r--r-- | llvm/test/ThinLTO/X86/Inputs/callees-metadata.ll | 34 | ||||
-rw-r--r-- | llvm/test/ThinLTO/X86/callees-metadata.ll | 22 |
3 files changed, 68 insertions, 0 deletions
diff --git a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp index 4a84816f616..ae01b928e5c 100644 --- a/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp +++ b/llvm/lib/Analysis/ModuleSummaryAnalysis.cpp @@ -291,6 +291,18 @@ computeFunctionSummary(ModuleSummaryIndex &Index, const Module &M, if (!CalledValue || isa<Constant>(CalledValue)) continue; + // Check if the instruction has a callees metadata. If so, add callees + // to CallGraphEdges to reflect the references from the metadata, and + // to enable importing for subsequent indirect call promotion and + // inlining. + if (auto *MD = I.getMetadata(LLVMContext::MD_callees)) { + for (auto &Op : MD->operands()) { + Function *Callee = mdconst::extract_or_null<Function>(Op); + if (Callee) + CallGraphEdges[Index.getOrInsertValueInfo(Callee)]; + } + } + uint32_t NumVals, NumCandidates; uint64_t TotalCount; auto CandidateProfileData = diff --git a/llvm/test/ThinLTO/X86/Inputs/callees-metadata.ll b/llvm/test/ThinLTO/X86/Inputs/callees-metadata.ll new file mode 100644 index 00000000000..1e77b08863a --- /dev/null +++ b/llvm/test/ThinLTO/X86/Inputs/callees-metadata.ll @@ -0,0 +1,34 @@ +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-unknown-linux-gnu" + +@fptr = internal unnamed_addr global i32 (i32)* @f2, align 8 + +define dso_local i32 @foo(i32 %x) local_unnamed_addr { +entry: + %0 = load i32 (i32)*, i32 (i32)** @fptr, align 8 + %call = tail call i32 %0(i32 %x), !callees !0 + ret i32 %call +} + +define internal i32 @f2(i32 %x) { +entry: + %tobool = icmp eq i32 %x, 0 + br i1 %tobool, label %if.end, label %if.then + +if.then: ; preds = %entry + store i32 (i32)* @f1, i32 (i32)** @fptr, align 8 + %sub.i = add nsw i32 %x, -1 + br label %if.end + +if.end: ; preds = %entry, %if.then + %phi.call = phi i32 [ %sub.i, %if.then ], [ -1, %entry ] + ret i32 %phi.call +} + +define internal i32 @f1(i32 %x) { +entry: + %sub = add nsw i32 %x, -1 + ret i32 %sub +} + +!0 = !{i32 (i32)* @f1, i32 (i32)* @f2} diff --git a/llvm/test/ThinLTO/X86/callees-metadata.ll b/llvm/test/ThinLTO/X86/callees-metadata.ll new file mode 100644 index 00000000000..457cdb319fa --- /dev/null +++ b/llvm/test/ThinLTO/X86/callees-metadata.ll @@ -0,0 +1,22 @@ +; Do setup work: generate bitcode and combined index +; RUN: opt -module-summary %s -o %t1.bc +; RUN: opt -module-summary %p/Inputs/callees-metadata.ll -o %t2.bc + +; RUN: llvm-lto2 run %t1.bc %t2.bc -o %t.o -save-temps \ +; RUN: -r=%t1.bc,bar,plx \ +; RUN: -r=%t1.bc,foo,l \ +; RUN: -r=%t2.bc,foo,pl +; RUN: llvm-dis %t.o.1.3.import.bc -o - | FileCheck %s +; CHECK: define {{.*}} i32 @f1.llvm.0 +; CHECK: define {{.*}} i32 @f2.llvm.0 + +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" +target triple = "x86_64-unknown-linux-gnu" + +define dso_local i32 @bar(i32 %x) { +entry: + %call = call i32 @foo(i32 %x) + ret i32 %call +} + +declare dso_local i32 @foo(i32) |