| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
| |
relocations
This only implements the non-dwo part, but loclistx is necessary to use
location lists in DWARFv5, so it's a precursor to that work - and
generally reduces relocations (only using one reloc, then
indexes/relative offsets for all location list references) in non-split
DWARF.
| |
operands.
NFC
| |
shuffle vector
In GISel, LLVM IR 1-element vectors get lowered to scalars. As a
result, a shuffle vector may also produce a scalar.
This patch teaches the shuffle combiner how to deal with scalars when
they are in the destination type of a shuffle vector.
For now, we just support the easy case where this can be lowered to
a plain copy. For other cases, we leave the shuffle vector as is.
This type of IR is seen in O0 pipelines, e.g., as produced with
SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c.
rdar://problem/57198904
| |
https://reviews.llvm.org/D70210
Previously:
Due to sensitivity of the algorithm with gaps, and extra instructions,
when diffing, often we see naming being off by a few. Makes the diff
unreadable even for tests with 7 and 8 instructions respectively.
Naming can change depending on candidates (and order of picking
candidates). Suddenly if there's one extra instruction somewhere, the
entire subtree would be named completely differently.
No consistent naming of similar instructions which occur in different
functions. If we try to do something like count the frequency
distribution of various differences across suite, then the above
sensitivity issues are going to result in poor results.
Instead:
Name instruction based on semantics of the instruction (hash of the
opcode and operands). Essentially for a given instruction that occurs in
any module/function it'll be named similarly (ie semantic). This has
some nice properties
Can easily look at many instructions and just check the hash and if
they're named similarly, then it's the same instruction. Makes it very
easy to spot the same instruction both multiple times, as well as across
many functions (useful for frequency distribution).
Independent of traversal/candidates/depth of graph. No need to keep
track of last index/gaps/skip count etc.
No off by few issues with diffs. I've tried the old vs new
implementation in files ranging from 30 to 700 instructions. In both
cases with the old algorithm, diffs are a sea of red, whereas for the
semantic version, in both cases, the diffs line up beautifully.
Simplified implementation of the main loop (simple iteration), no need to
keep track of what's visited and what's not.
Handle collision just by incrementing a counter. Roughly
bb[N]_hash_[CollisionCount].
Additionally with the new implementation, we can probably avoid doing
the hoisting of instructions to various places, as they'll likely be
named the same resulting in differences only based on collision (ie
regardless of whether the instruction is hoisted or not/close to use or
not, it'll be named the same hash which should result in use of the
instruction be identical with the only change being the collision count)
which is very easy to spot visually.
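A minimal sketch of the naming scheme described above, not the actual implementation. `Instr`, `semanticHash` and `nameInstructions` are hypothetical names, and the FNV-1a style mixing is an assumption chosen only for brevity. It shows that a name depends only on opcode and operands, with a per-block collision counter appended.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction: an opcode plus operand ids.
struct Instr {
  unsigned Opcode;
  std::vector<unsigned> Operands;
};

// Hash only the opcode and operands, so the result does not depend on
// position, traversal order, or neighbouring instructions.
inline std::uint64_t semanticHash(const Instr &I) {
  std::uint64_t H = 1469598103934665603ULL;
  auto Mix = [&H](std::uint64_t V) {
    H ^= V;
    H *= 1099511628211ULL;
  };
  Mix(I.Opcode);
  for (unsigned Op : I.Operands)
    Mix(Op);
  return H;
}

// Name each instruction of block BBNum as bb<N>_<hash>_<CollisionCount>.
inline std::vector<std::string>
nameInstructions(unsigned BBNum, const std::vector<Instr> &Block) {
  std::map<std::uint64_t, unsigned> Seen; // hash -> times seen so far
  std::vector<std::string> Names;
  for (const Instr &I : Block) {
    std::uint64_t H = semanticHash(I);
    unsigned Collision = Seen[H]++;
    Names.push_back("bb" + std::to_string(BBNum) + "_" + std::to_string(H) +
                    "_" + std::to_string(Collision));
  }
  return Names;
}
```

Under these assumptions, inserting an unrelated instruction earlier in a block leaves every other name unchanged, and a repeated instruction differs from its first occurrence only in the trailing counter.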
| |
SUMMARY:
This patch emits assembly code for read-only variables on AIX.
Reviewers: daltenty, Xiangling_Liao
Subscribers: rupprecht, seiyai, hiraditya
Differential Revision: https://reviews.llvm.org/D70182
| |
Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. However, doing so was inconvenient because it
required including the constrained intrinsics definitions even when they
were not needed. The long scope prefix also reduced readability.
This change moves these definitions to the namespace llvm::fp.
No functional changes.
Differential Revision: https://reviews.llvm.org/D69552
| |
The SmallVector reserve() call in
MachineInstrExpressionTrait::getHashValue accounted for over 3% of all
calls to malloc() when I compiled a bunch of graphics shaders for the
AMDGPU target. Its initial size was only enough for machine instructions
with up to 7 operands, but for AMDGPU 8 and 10 operands are very common.
Here's a histogram of number of operands for each call to getHashValue,
gathered from the same collection of shaders:
1 13503
2 254273
3 135781
4 422508
5 614997
6 194953
7 287248
8 1517255
9 31218
10 1191269
11 70731
12 24
13 77
15 84
17 4692
27 16
33 705
49 6
Typical instructions with 8 and 10 operands are floating point
arithmetic and multiply-accumulate instructions like:
%83:vgpr_32 = V_MUL_F32_e64 0, killed %82:vgpr_32, 0, killed %81:vgpr_32, 0, 0, implicit $exec
%330:vgpr_32 = V_MAC_F32_e64 0, killed %327:vgpr_32, 0, killed %329:sgpr_32, 0, %328:vgpr_32(tied-def 0), 0, 0, implicit $exec
Differential Revision: https://reviews.llvm.org/D70301
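The histogram above can be summarized with a little arithmetic. The sketch below is illustrative only: `operandHistogram`, `totalCalls` and `callsAbove` are hypothetical helpers, and it assumes that a call needs a heap allocation exactly when its operand count exceeds the inline capacity.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Histogram copied from the commit message: {operand count, calls}.
inline std::vector<std::pair<unsigned, std::uint64_t>> operandHistogram() {
  return {{1, 13503},  {2, 254273},   {3, 135781}, {4, 422508},
          {5, 614997}, {6, 194953},   {7, 287248}, {8, 1517255},
          {9, 31218},  {10, 1191269}, {11, 70731}, {12, 24},
          {13, 77},    {15, 84},      {17, 4692},  {27, 16},
          {33, 705},   {49, 6}};
}

// Total number of recorded getHashValue calls.
inline std::uint64_t totalCalls() {
  std::uint64_t Sum = 0;
  for (const auto &Entry : operandHistogram())
    Sum += Entry.second;
  return Sum;
}

// Calls whose operand count exceeds the given inline capacity.
inline std::uint64_t callsAbove(unsigned InlineCapacity) {
  std::uint64_t Sum = 0;
  for (const auto &Entry : operandHistogram())
    if (Entry.first > InlineCapacity)
      Sum += Entry.second;
  return Sum;
}
```

With these numbers, `callsAbove(7)` covers roughly 59% of `totalCalls()`, which is consistent with the observation that 8 and 10 operands dominate for AMDGPU.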
| |
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
| |
Allow call site parameter descriptions to reference spill slots. Spill
slots are not visible to high-level LLVM IR, so they can safely be
referenced during entry value evaluation (as they cannot be clobbered by
some other function).
This gives a 5% increase in the number of call site parameter DIEs in an
LTO x86_64 build of the xnu kernel.
This reverts commit eb4c98ca3d2590bad9f6542afbf3a7824d2b53fa (
[DebugInfo] Exclude memory location values as parameter entry values),
effectively reintroducing the portion of D60716 which dealt with memory
locations (authored by Djordje, Nikola, Ananth, and Ivan).
This partially addresses llvm.org/PR43343. However, not all memory
operands forwarded to callees live in spill slots. In the xnu build, it
may be possible to use an escape analysis to increase the number of call
site parameters by another 15% (more details in PR43343).
Differential Revision: https://reviews.llvm.org/D70254
| |
reductions.
We were previously pushing all intrinsics used in a function to the
worklist. This is wasteful for memory in a function with a lot of
intrinsics.
We also ask TTI if we should expand every intrinsic, but we only
have expansion support for the reduction intrinsics. This just
wastes time for the non-reduction intrinsics.
This patch only pushes reduction intrinsics into the worklist and
skips other intrinsics.
Differential Revision: https://reviews.llvm.org/D69470
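The filtering described above can be sketched as follows. This is a simplified model, not the pass itself: `IntrinsicKind`, `isReduction` and `collectWorklist` are hypothetical names, and the set of kinds is made up for illustration.

```cpp
#include <vector>

// Hypothetical intrinsic kinds; only some of them are reductions.
enum class IntrinsicKind { ReduceAdd, ReduceFMax, Memcpy, Sqrt };

inline bool isReduction(IntrinsicKind K) {
  return K == IntrinsicKind::ReduceAdd || K == IntrinsicKind::ReduceFMax;
}

// Build the worklist from the intrinsic calls in a function, keeping only
// reductions, so that the cost query is never made for anything else.
inline std::vector<IntrinsicKind>
collectWorklist(const std::vector<IntrinsicKind> &Calls) {
  std::vector<IntrinsicKind> Worklist;
  for (IntrinsicKind K : Calls)
    if (isReduction(K))
      Worklist.push_back(K);
  return Worklist;
}
```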
| |
I reviewed the diff hunks of 05da2fe52162c80dfa that don't contain
'#include' lines, and found two unintended changes. I deleted a header
banner inadvertently while inserting a header, and changed the
indentation of a constructor in an odd way. Add back the banner, and
reformat the constructor.
| |
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
| |
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
causes lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
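The "recompiles" column in the table above is the product of the other two columns. The following sketch reproduces that arithmetic for the first three rows; `HeaderRow`, `recompileCost` and `topRows` are hypothetical names introduced for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One row of the table: how often a header changed and how many object
// files depend on it.
struct HeaderRow {
  std::string Header;
  std::uint64_t Touches;
  std::uint64_t AffectedFiles;
};

// Estimated recompiles = touches * affected files, as described in the text.
inline std::uint64_t recompileCost(const HeaderRow &Row) {
  return Row.Touches * Row.AffectedFiles;
}

// First three rows copied from the table in the commit message.
inline std::vector<HeaderRow> topRows() {
  return {
      {"llvm/include/llvm/ADT/STLExtras.h", 95, 3604},
      {"llvm/include/llvm/InitializePasses.h", 234, 1345},
      {"llvm/include/llvm/ADT/APInt.h", 118, 2602},
  };
}
```

As a rough projection only: holding the touch count at 234, the reported drop from 1345 to 550 dependent files would bring the product for InitializePasses.h down to 128700.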
| |
This method is private and only called from this file and doesn't need
to be inline. Saves a TargetMachine.h include in MachineFunction.h, a
popular header. The include was introduced in 98603a8153086 despite the
forward decl of LLVMTargetMachine.
| |
type break down for v256i1 and other types to be stored correctly
v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.
This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.
Differential Revision: https://reviews.llvm.org/D70138
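The truncation described above is easy to reproduce in isolation. The sketch below uses a hypothetical `RegCountTable` template rather than the real table; it shows 256 wrapping to 0 in a byte-sized entry, and a checked store into a 16-bit entry in the spirit of the added assert.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical miniature of a per-type register-count table.
template <typename CountT> struct RegCountTable {
  CountT NumRegisters[4] = {};

  // Store without any check, as a byte-sized table would have done.
  void storeUnchecked(unsigned VT, unsigned N) {
    NumRegisters[VT] = static_cast<CountT>(N);
  }

  // Store and assert that the value survives the narrowing conversion.
  void storeChecked(unsigned VT, unsigned N) {
    NumRegisters[VT] = static_cast<CountT>(N);
    assert(NumRegisters[VT] == N && "register count does not fit in entry");
  }
};
```

With `RegCountTable<std::uint8_t>`, `storeUnchecked(0, 256)` leaves 0 in the entry; with `RegCountTable<std::uint16_t>`, `storeChecked(0, 256)` keeps 256 and the assertion holds.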
| |
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done updating the live-intervals, and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information about what the live-ranges that are being updated look like.
This is problematic for updates that rely on the IR to accurately
represent the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need for out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about what the code will
look like after the coalescer has rewritten the registers.
Essentially this captures how a subregister index will be offset
to match its position in a new register class.
E.g., let's say we want to merge:
V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
V2: sub0 sub1 sub2 sub3
V1: <offset> sub0 sub1
This offset will look like a composed subregidx in the class:
V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
=> V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.
Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompile as we would drop some live-ranges and
thus, miss some interferences.
For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)
This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.
looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.
None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3: Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1
and #2, so this patch also artificially enables subreg liveness for ARM,
so that a nice test case can be attached.
| |
Summary:
Entry values are considered for parameters that have register-described
DBG_VALUEs in the entry block (along with other conditions).
If a parameter's value has been propagated from the caller to the
callee, then the parameter's DBG_VALUE in the entry block may be
described using a register defined by some instruction, and entry values
should not be emitted for the parameter, which can currently occur.
One such case was seen in the attached test case, in which the second
parameter, which is described by a redefinition of the first parameter's
register, would incorrectly get an entry value using the first
parameter's register. This commit intends to solve such cases by keeping
track of register defines, and ignoring DBG_VALUEs in the entry block
that are described by such registers.
In a RelWithDebInfo build of clang-8, the average size of the set was
27, and in a RelWithDebInfo+ASan build it was 30.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D69889
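The register-define tracking described above can be sketched like this. It is a simplified model under stated assumptions: `EntryInstr` and `entryValueCandidates` are hypothetical, registers are plain integers, and every non-debug instruction is assumed to define exactly one register.

```cpp
#include <set>
#include <vector>

// Hypothetical entry-block instruction. A DBG_VALUE describes parameter
// ParamIdx with register Reg; any other instruction defines Reg.
struct EntryInstr {
  bool IsDbgValue;
  unsigned Reg;
  unsigned ParamIdx; // only meaningful when IsDbgValue is true
};

// Return the parameters that remain candidates for entry values.
inline std::vector<unsigned>
entryValueCandidates(const std::vector<EntryInstr> &EntryBlock) {
  std::set<unsigned> DefinedRegs; // registers written so far in the block
  std::vector<unsigned> Candidates;
  for (const EntryInstr &I : EntryBlock) {
    if (!I.IsDbgValue) {
      DefinedRegs.insert(I.Reg);
      continue;
    }
    // A DBG_VALUE using a register that was redefined earlier in the block
    // no longer describes the value the caller passed in.
    if (DefinedRegs.count(I.Reg) == 0)
      Candidates.push_back(I.ParamIdx);
  }
  return Candidates;
}
```

In the scenario from the commit message, the first parameter's DBG_VALUE precedes the redefinition and stays a candidate, while the second parameter's DBG_VALUE, which uses the redefined register, is skipped.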
| |
Summary:
The conditions that determine whether entry values should be
emitted for a parameter are numerous, and will grow slightly
in a follow-up commit, so move them to a helper function, as was
suggested in the code review for D69889.
Reviewers: djtodoro, NikolaPrica
Reviewed By: djtodoro
Subscribers: probinson, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69955
| |
This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.
Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70080
| |
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by
```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.
Updates the MSP430 target with a custom implementation.
This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this.
Existing tests apply, a few more have been added.
Reviewers: asl, spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70042
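To illustrate why a predicate over (type, amount) is more flexible than a single threshold, here is a hedged sketch. `SimpleVT`, `BaseLowering`, `OneBitShiftTarget` and `mayUseShift` are hypothetical stand-ins, not LLVM classes, and the cost policy is made up: it treats amounts up to 2 as cheap, plus a shift by exactly 8 on a 16-bit type.

```cpp
// Hypothetical stand-in for a value type; only the bit width matters here.
struct SimpleVT {
  unsigned Bits;
};

struct BaseLowering {
  virtual ~BaseLowering() = default;
  // Default policy assumed in this sketch: no shift is considered expensive.
  virtual bool shouldAvoidTransformToShift(SimpleVT VT, unsigned Amount) const {
    (void)VT;
    (void)Amount;
    return false;
  }
};

// Hypothetical target where most multi-bit shifts are expensive, but one
// specific amount is cheap. A single threshold cannot express this.
struct OneBitShiftTarget : BaseLowering {
  bool shouldAvoidTransformToShift(SimpleVT VT, unsigned Amount) const override {
    if (VT.Bits == 16 && Amount == 8)
      return false; // assumed cheap special case
    return Amount > 2;
  }
};

// A combine would consult the hook before rewriting code into a shift.
inline bool mayUseShift(const BaseLowering &TLI, SimpleVT VT, unsigned Amount) {
  return !TLI.shouldAvoidTransformToShift(VT, Amount);
}
```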
| |
In MachineCopyPropagation, when propagating the source of a copy into
the operand of a later instruction, bail if a destination overlaps
(partly defines) the copy source. If the instruction where the
substitution is happening is also a copy, allowing the propagation
confuses the tracking mechanism.
Differential Revision: https://reviews.llvm.org/D69953
Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
| |
Summary:
This patch redefines the freeze instruction from a UnaryOperator to a subclass of UnaryInstruction.
ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
A `freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor now has visitFreeze because freeze is no longer a unary operator.
Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri
Reviewed By: craig.topper, lebedev.ri
Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69932
| |
For XCOFF, globals mapped into the .bss section are linked as COMMON
definitions. This behaviour is incorrect for zero initialized data, so
emit those to the .data section instead.
Differential Revision: https://reviews.llvm.org/D69528
| |
The current Hoist() function of the machine LICM pass does not check the frequencies of the source and destination basic blocks that an instruction is hoisted from/to.
As a result, an instruction may be hoisted from a cold basic block to a hot one.
This patch adds options to disable machine instruction hoisting if the destination block is hotter.
Differential Revision: https://reviews.llvm.org/D63676
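The check described above amounts to comparing two block frequencies. A minimal sketch, assuming a hypothetical `BlockInfo` summary and a boolean switch standing in for the new options (the real option names and values are not shown here):

```cpp
#include <cstdint>

// Hypothetical block summary: only the profile frequency matters here.
struct BlockInfo {
  std::uint64_t Frequency;
};

// Return true if hoisting from Src to Dst should be blocked because the
// destination is hotter and the (hypothetical) switch is enabled.
inline bool isHoistBlocked(const BlockInfo &Src, const BlockInfo &Dst,
                           bool DisableHoistingToHotter) {
  return DisableHoistingToHotter && Dst.Frequency > Src.Frequency;
}
```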
| |
The new experimental expansion has a problem when a value has a data
dependency with an instruction from a previous stage. This is due to
the way we peel out the kernel. To fix that I'm changing the way we
peel out the kernel. We now peel the kernel NumberStage - 1 times.
The code would be correct at this point if we didn't have to handle
cases where the loop iteration count is smaller than the number of stages.
To handle this case we move instructions between different epilogues
based on their stage and remap the PHI instructions correctly.
Differential Revision: https://reviews.llvm.org/D69538
| |
Simple change to call the target hook analyzeLoopForPipelining before
changing the loop. After peeling, analyzing the loop may be more
complicated for targets that don't have a loop instruction. This doesn't
affect Hexagon and PPC as they have hardware loop instructions.
Differential Revision: https://reviews.llvm.org/D69912
| |
For example:
long long test(long long a, long long b) {
  if (a << b > 0)
    return b;
  if (a << b < 0)
    return a;
  return a*b;
}
Produces:
sld. 5, 3, 4
ble 0, .LBB0_2
mr 3, 4
blr
.LBB0_2: # %if.end
cmpldi 5, 0
li 5, 1
isel 4, 4, 5, 2
mulld 3, 4, 3
blr
But the compare (cmpldi 5, 0) is redundant and can be removed (CR0 already
contains the result of that comparison).
The root cause of this is that LLVM converts signed comparisons into equality
comparisons based on dominance. Equality comparisons are unsigned by default, so
we get either a record-form or cmp (without the l for logical) feeding a cmpl.
That is the situation we want to avoid here.
Differential Revision: https://reviews.llvm.org/D60506
| |
The tail-call-kind-ness is known by the ObjCARC analysis and can be
enforced while lowering the intrinsics to calls.
This allows us to get the requested tail calls at -O0 without trying to
preserve the attributes throughout passes that change code even at -O0,
like the Always Inliner, where the ObjCOpt pass doesn't run.
Differential Revision: https://reviews.llvm.org/D69980
| |
Summary:
Additional filtering of undesired shifts for targets that do not support them efficiently.
Related to D69116 and D69120.
Applies the TLI.getShiftAmountThreshold hook to prevent undesired generation of shifts for the following IR code:
```
define i16 @testShiftBits(i16 %a) {
entry:
  %and = and i16 %a, -64
  %cmp = icmp eq i16 %and, 64
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}
define i16 @testShiftBits_11(i16 %a) {
entry:
  %cmp = icmp ugt i16 %a, 63
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}
define i16 @testShiftBits_12(i16 %a) {
entry:
  %cmp = icmp ult i16 %a, 64
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}
```
The attached diff file shows the piece of code in TargetLowering that is responsible for the generation of shifts in relation to the IR above.
Before applying this patch, shifts will be generated to replace non-legal icmp immediates. However, shifts may be undesired if they are even more expensive for the target.
For all my previous patches in this series (cited above) I added test cases for the MSP430 target. However, in this case, the target is not suitable for showing improvements related to this patch, because the MSP430 does not implement "isLegalICmpImmediate". The default implementation always returns true, therefore the patched code in TargetLowering is never reached for that target. Targets implementing both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit from this.
The differential effect of this patch can only be shown for the MSP430 by temporarily implementing "isLegalICmpImmediate" to return false for large immediates. This is simulated with the implementation of a command line flag that was incorporated in D69975.
This patch belongs to an initiative to "relax" the generation of shifts by LLVM for targets requiring it.
Reviewers: spatel, lebedev.ri, asl
Reviewed By: spatel
Subscribers: lenary, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69326
| |
These checks fall out naturally from the current implementation without
needing to be explicitly considered anymore.
| |
macros
Patch based on Sourabh Singh's D69839 patch.
| |
This was arbitrarily appearing in only the last section emitted - which
made tests more sensitive than they needed to be (removing the last
section - like the macinfo section change that's coming after this)
would, surprisingly, move the blank line to the previous section.
| |
The macinfo support was broken for LTO situations, by terminating
macinfo lists only once - multiple macinfo contributions were correctly
labeled, but they all continued/flowed into later contributions until
only one terminator appeared at the end of the section.
Correctly terminate each contribution & fix the parsing to handle this
situation too. The parsing fix is also necessary for dumping linked
binaries - the previous code would stop at the end of the first
contribution - missing all later contributions in a linked binary.
It'd be nice to improve the dumping to print the offsets of each
contribution so it'd be easier to know which CU AT_macro_info refers to
which macinfo contribution.
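The parsing change described above can be pictured with a deliberately simplified model. In this sketch every entry is a single nonzero byte and a 0 byte terminates a list; real macinfo entries carry more data, so this encoding is an assumption for illustration only. `splitContributions` is a hypothetical helper. The point is that the parser keeps going after a terminator until the section ends.

```cpp
#include <cstdint>
#include <vector>

// Split a section into contributions. Each contribution is a run of nonzero
// entry bytes followed by a 0 terminator (simplified encoding, see above).
inline std::vector<std::vector<std::uint8_t>>
splitContributions(const std::vector<std::uint8_t> &Section) {
  std::vector<std::vector<std::uint8_t>> Lists;
  std::vector<std::uint8_t> Current;
  for (std::uint8_t Byte : Section) {
    if (Byte == 0) {
      // End of one contribution; continue with the next one instead of
      // stopping at the first terminator.
      Lists.push_back(Current);
      Current.clear();
      continue;
    }
    Current.push_back(Byte);
  }
  // Entries after the last terminator form an unterminated final list.
  if (!Current.empty())
    Lists.push_back(Current);
  return Lists;
}
```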
| |
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.
(I think this started showing up recently because we added an
optimization that converts pow to powi.)
Differential Revision: https://reviews.llvm.org/D69013
| |
Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return optional machine operand pair of destination and source
registers.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D69622
| |
This triggered asserts in the Chromium build, see https://crbug.com/1022729 for
details and reproducer.
> Without this change, when a nested tag type of any kind (enum, class,
> struct, union) is used as a variable type, it is emitted without
> emitting the parent type. In CodeView, parent types point to their inner
> types, and inner types do not point back to their parents. We already
> walk over all of the parent scopes to build the fully qualified name.
> This change simply requests their type indices as we go along to ensure
> they are all emitted.
>
> Fixes PR43905
>
> Reviewers: akhuang, amccarth
>
> Differential Revision: https://reviews.llvm.org/D69924
| |
Summary:
The greedy register allocator occasionally decides to insert a large number of
unnecessary copies, see below for an example. The -consider-local-interval-cost
option (which X86 already enables by default) fixes this. We enable this option
for AArch64 only after receiving feedback that this change is not beneficial for
PowerPC.
We evaluated the impact of this change on compile time, code size and
performance benchmarks.
This option has a small impact on compile time, measured on CTMark. A 0.1%
geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5%
on individual benchmarks.
The effect on both code size and performance on AArch64 for the LLVM test suite
is nil on the geomean with individual outliers (ignoring short exec_times)
between:
            best    worst
size..text  -3.3%   +0.0%
exec_time   -5.8%   +2.3%
On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
geomean. Neither intrate nor fprate show any change in performance.
This patch makes the following changes.
- For the AArch64 target, enableAdvancedRASplitCost() now returns true.
- Ensures that -consider-local-interval-cost=false can disable the new
behaviour if necessary.
This matrix multiply example:
$ cat test.c
long A[8][8];
long B[8][8];
long C[8][8];
void run_test() {
  for (int k = 0; k < 8; k++) {
    for (int i = 0; i < 8; i++) {
      for (int j = 0; j < 8; j++) {
        C[i][j] += A[i][k] * B[k][j];
      }
    }
  }
}
results in the following generated code on AArch64:
$ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
[...]
// %for.cond1.preheader
// =>This Inner Loop Header: Depth=1
add x14, x11, x9
str q0, [sp, #16] // 16-byte Folded Spill
ldr q0, [x14]
mov v2.16b, v15.16b
mov v15.16b, v14.16b
mov v14.16b, v13.16b
mov v13.16b, v12.16b
mov v12.16b, v11.16b
mov v11.16b, v10.16b
mov v10.16b, v9.16b
mov v9.16b, v8.16b
mov v8.16b, v31.16b
mov v31.16b, v30.16b
mov v30.16b, v29.16b
mov v29.16b, v28.16b
mov v28.16b, v27.16b
mov v27.16b, v26.16b
mov v26.16b, v25.16b
mov v25.16b, v24.16b
mov v24.16b, v23.16b
mov v23.16b, v22.16b
mov v22.16b, v21.16b
mov v21.16b, v20.16b
mov v20.16b, v19.16b
mov v19.16b, v18.16b
mov v18.16b, v17.16b
mov v17.16b, v16.16b
mov v16.16b, v7.16b
mov v7.16b, v6.16b
mov v6.16b, v5.16b
mov v5.16b, v4.16b
mov v4.16b, v3.16b
mov v3.16b, v1.16b
mov x12, v0.d[1]
fmov x15, d0
ldp q1, q0, [x14, #16]
ldur x1, [x10, #-256]
ldur x2, [x10, #-192]
add x9, x9, #64 // =64
mov x13, v1.d[1]
fmov x16, d1
ldr q1, [x14, #48]
mul x3, x15, x1
mov x14, v0.d[1]
fmov x17, d0
mov x18, v1.d[1]
fmov x0, d1
mov v1.16b, v3.16b
mov v3.16b, v4.16b
mov v4.16b, v5.16b
mov v5.16b, v6.16b
mov v6.16b, v7.16b
mov v7.16b, v16.16b
mov v16.16b, v17.16b
mov v17.16b, v18.16b
mov v18.16b, v19.16b
mov v19.16b, v20.16b
mov v20.16b, v21.16b
mov v21.16b, v22.16b
mov v22.16b, v23.16b
mov v23.16b, v24.16b
mov v24.16b, v25.16b
mov v25.16b, v26.16b
mov v26.16b, v27.16b
mov v27.16b, v28.16b
mov v28.16b, v29.16b
mov v29.16b, v30.16b
mov v30.16b, v31.16b
mov v31.16b, v8.16b
mov v8.16b, v9.16b
mov v9.16b, v10.16b
mov v10.16b, v11.16b
mov v11.16b, v12.16b
mov v12.16b, v13.16b
mov v13.16b, v14.16b
mov v14.16b, v15.16b
mov v15.16b, v2.16b
ldr q2, [sp] // 16-byte Folded Reload
fmov d0, x3
mul x3, x12, x1
[...]
With -consider-local-interval-cost the same section of code results in the
following:
$ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
[...]
.LBB0_1: // %for.cond1.preheader
// =>This Inner Loop Header: Depth=1
add x14, x11, x9
ldp q0, q1, [x14]
ldur x1, [x10, #-256]
ldur x2, [x10, #-192]
add x9, x9, #64 // =64
mov x12, v0.d[1]
fmov x15, d0
mov x13, v1.d[1]
fmov x16, d1
ldp q0, q1, [x14, #32]
mul x3, x15, x1
cmp x9, #512 // =512
mov x14, v0.d[1]
fmov x17, d0
fmov d0, x3
mul x3, x12, x1
[...]
Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
Reviewed By: dmgreen
Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69437
| |
This reverts commit b7b170c to give the author more time to address failing tests on the expensive checks buildbots.
| |
Without this change, when a nested tag type of any kind (enum, class,
struct, union) is used as a variable type, it is emitted without
emitting the parent type. In CodeView, parent types point to their inner
types, and inner types do not point back to their parents. We already
walk over all of the parent scopes to build the fully qualified name.
This change simply requests their type indices as we go along to ensure
they are all emitted.
Fixes PR43905
Reviewers: akhuang, amccarth
Differential Revision: https://reviews.llvm.org/D69924
| |
|