| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The CallAnalyzer::visitSwitchInst has an early exit when the estimated
lower bound of the switch cost will put the overall cost of the inline
above the threshold. However, this code is not correctly estimating the
lower bound for switches that can be transformed into bit tests, leading
to unnecessary lost inlines, and also differing behavior with
optimization remarks enabled.
First, the early exit is controlled by whether ComputeFullInlineCost is
enabled or not, and that in turn is disabled by default but enabled when
enabling -pass-remarks=missed. This by itself wouldn't lead to a
problem, except that as described below, the lower bound can be above
the real lower bound, so we can sometimes get different inline decisions
with inline remarks enabled, which is problematic.
The early exit was added in along with a new switch cost model in D31085.
The reason why this early exit was added is due to a concern one reviewer
raised about compile time for large switches:
https://reviews.llvm.org/D31085?id=94559#inline-276200
However, the code just below there calls
getEstimatedNumberOfCaseClusters, which in turn immediately calls
BasicTTIImpl getEstimatedNumberOfCaseClusters, which in the worst case
does a linear scan of the cases to get the high and low values. The
bit test handling in particular is guarded by whether the number of
cases fits into the max bit width. There is no suggestion that anyone
measured a compile time issue, it appears to be theoretical.
The problem is that the reviewer's comment about the lower bound
calculation is incorrect, specifically in the case of a switch that can
be lowered to a bit test. This isn't followed up on the comment
thread, but the author does add a FIXME to that effect above the early
exit added when they subsequently revised the patch.
As a result, we were incorrectly early exiting and not inlining
functions with switch statements that would be lowered to bit tests in
cases where we were nearing the threshold. Combined with the fact that
this early exit was skipped with opt remarks enabled, this caused
different inlining decisions to be made when -pass-remarks=missed is
enabled to debug the missing inline.
Remove the early exit for the above reasons.
I also copied over an existing AArch64 inlining test to X86, and
adjusted the threshold so that the bit test inline only occurs with the
fix in this patch.
Reviewers: davidxl
Subscribers: eraman, kristof.beyls, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67716
llvm-svn: 372440
|
|
|
|
| |
llvm-svn: 363538
|
|
|
|
|
|
|
|
| |
The reversion apparently deleted the test/Transforms directory.
Will be re-reverting again.
llvm-svn: 358552
|
|
|
|
|
|
|
|
| |
As it's causing some bot failures (and per request from kbarton).
This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda.
llvm-svn: 358546
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
checking for function inline compatibility.
Tuning flags don't have any effect on the available instructions so aren't a good reason to prevent inlining.
There are also some ISA flags that don't have any intrinsics our ABI requirements that we can exclude. I've put only the most basic ones like cmpxchg16b and lahfsahf. These are interesting because they aren't present in all 64-bit CPUs, but we have codegen workarounds when they aren't present.
Loosening these checks can help with scenarios where a caller has a more specific CPU than a callee. The default tuning flags on our generic 'x86-64' CPU can currently make it inline compatible with other CPUs. I've also added an example test for 'nocona' and 'prescott' where 'nocona' is just a 64-bit capable version of 'prescott' but in 32-bit mode they should be completely compatible.
I've based the implementation here of the similar code in AMDGPU.
Differential Revision: https://reviews.llvm.org/D58371
llvm-svn: 354355
|
|
|
|
|
|
|
|
|
|
|
| |
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.
Differential Revision: https://reviews.llvm.org/D34458
llvm-svn: 308076
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
most of the inliner test cases.
The inliner involves a bunch of interesting code and tends to be where
most of the issues I've seen experimenting with the new PM lie. All of
these test cases pass, but I'd like to keep some more thorough coverage
here so doing a fairly blanket enabling.
There are a handful of interesting tests I've not enabled yet because
they're focused on the always inliner, or on functionality that doesn't
(yet) exist in the inliner.
llvm-svn: 290592
|
|
inlining test into it.
llvm-svn: 241223
|