| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
This fixes a crash when debug instructions are in between 2 stores.
|
|
|
|
|
|
|
|
| |
This reverts commit 3f5bf35f868d1e33cd02a5825d33ed4675be8cb1 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In some cases, we can rename a store operand, in order to enable pairing
of stores. For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.
First, we check if we can rename the given register:
1. The first store register must be killed at the store, which means we
do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
stored register and check all uses in between are renamable. Along
they way, we collect the minimal register classes of the uses for
overlapping (sub/super)registers.
Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be
1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.
We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.
This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:
Metric: aarch64-ldst-opt.NumPairCreated
Program base patch diff
test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0%
test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9%
test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1%
test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5%
test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0%
test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3%
test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0%
test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0%
test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0%
test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0%
test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0%
test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5%
test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9%
test-suite...marks/Olden/health/health.test 16.00 18.00 12.5%
test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2%
test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9%
test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1%
test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7%
test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1%
test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0%
test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9%
test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7%
test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4%
test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2%
test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9%
test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9%
test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6%
test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2%
test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1%
test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1%
test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0%
test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0%
test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8%
test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8%
test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8%
test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5%
test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3%
test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2%
test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8%
test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8%
test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7%
test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7%
Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70450
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These changes allow us to support sign-extending gather loads with the
exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*).
Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential revision: https://reviews.llvm.org/D70812
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
outlined functions"
This reverts commit cec2d5c17457722113580251c8a045fa9aca9b1b.
Reverting because this is still creating outlined functions with return
address signing instructions with mismatches SP values. For example:
int *volatile v;
void foo(int x) {
int a[x];
v = &a[0];
v = &a[0];
v = &a[0];
v = &a[0];
v = &a[0];
v = &a[0];
}
void bar(int x) {
int a[x];
v = 0;
v = &a[0];
v = &a[0];
v = &a[0];
v = &a[0];
v = &a[0];
}
This generates these two outlined functions, both of which modify SP
between the paciasp and retaa instructions:
$ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf
...
OUTLINED_FUNCTION_0: // @OUTLINED_FUNCTION_0
.cfi_sections .debug_frame
.cfi_startproc
// %bb.0:
paciasp
.cfi_negate_ra_state
mov w8, w0
lsl x8, x8, #2
add x8, x8, #15 // =15
mov x9, sp
and x8, x8, #0x7fffffff0
sub x8, x9, x8
mov x29, sp
mov sp, x8
adrp x9, v
retaa
...
OUTLINED_FUNCTION_1: // @OUTLINED_FUNCTION_1
.cfi_startproc
// %bb.0:
paciasp
.cfi_negate_ra_state
str x8, [x9, :lo12:v]
str x8, [x9, :lo12:v]
str x8, [x9, :lo12:v]
str x8, [x9, :lo12:v]
str x8, [x9, :lo12:v]
mov sp, x29
retaa
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71000
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.
Reviewers: sdesmalen, efriedma, fhahn, aemerson
Reviewed By: efriedma
Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70496
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Recognize wide compares where the wide operand is a splat of a scalar
value in the appropriate range and convert to the immediate variant of
the instruction.
Patch by Graham Hunter
Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71009
|
|
|
|
|
|
|
|
|
|
| |
The generated sequence with whilelo is unintuitive, but it's the best
I could come up with given the limited number of SVE instructions that
interact with scalar registers. The other sequence I was considering
was something like dup+cmpne, but an extra scalar instruction seems
better than an extra vector instruction.
Differential Revision: https://reviews.llvm.org/D71160
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: ormris, hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D70431
|
|
|
|
|
| |
This reverts commit 3cd93a4efcdeabeb20cb7bec9fbddcb540d337a1.
I'll recommit with a well-formatted arcanist commit message.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.
This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
|
| |
|
|
|
|
| |
This prevents unused variable warnings from breaking the build.
|
|
|
|
|
|
| |
Only implemented for the type combinations already supported for G_SHL.
Differential Revision: https://reviews.llvm.org/D71153
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When trying to calculate the offsets for the jump table entries
we fail to take into account the block alignment, which could be
greater than 4 bytes. This led to cases where the jump table
offset was too big to fit in a byte.
Reviewers: t.p.northover, sdesmalen, ostannard
Reviewed By: ostannard
Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits
Committed on behalf of David Sherwood (david-arm)
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70533
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following intrinsics:
* whilege, whilegt, whilehi, whilehs
Reviewers: sdesmalen, rovka, dancgr, efriedma, rengolin, huntergr
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70909
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds intrinsics for the following:
* cmphs, cmphi
* cmpge, cmpgt
* cmpeq, cmpne
* cmplt, cmple
* cmplo, cmpls
Includes a minor change to `TLI.getMemValueType` that fixes a crash due to the
scalable flag being dropped.
Reviewers: sdesmalen, efriedma, rengolin, rovka, dancgr, huntergr
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70889
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When MUL is the first operand to SUB, we can't use MLS because the accumulator
should be negated. Emit a NEG of the accumulator and an MLA instead, similar to
what we do for FMUL / FSUB fusing.
Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf, mstorsjo, asbirlea
Reviewed By: asbirlea
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71067
|
|
|
|
|
|
|
|
| |
Added pattern matching/intrinsics for the following SVE instructions:
-- saddv, uaddv
-- smaxv, sminv, umaxv, uminv
-- orv, eorv, andv
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds intrinsics for the following:
* cntb
* cnth
* cntw
* cntd
* cntp
Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70967
|
|
|
|
|
|
|
| |
Add instrinics and patters for the following logical predicate instructions:
-- and, ands, bic, bics, eor, eors
-- sel
-- orr, orrs, orn, orns, nor, nors, nand, nads
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Reland after fixing an ASan failure by stopping outlining early if the
constraints for return address signing removed too many outlining candidates.
During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.
This patch introduces the following changes:
1. If not all functions that potentially participate in function outlining agree
on their return address signing scope and their return address signing key,
outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
on their support for v8.3A features, outlining is disabled for these
functions.
3. If an outlining candidate would outline instructions that modify sp in a way
that invalidates return address signing, outlining is disabled for that
particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
support for v8.3 features, the outlined function behaves as if it had the
same scope and key attributes and as if it would provide the same v8.3A
support as the original functions.
Reviewers: ostannard, paquette
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70635
|
|
|
|
|
|
|
|
|
| |
outlined functions"
This reverts commit 02760b750b2ffcc0e2f5d78ecb137c80930c42c3.
The original commit is not asan clean.
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/37147/steps/check-llvm%20asan/logs/stdio
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Reland after fixing a bug that allowed outlining of SP modifying instructions
that invalidated return address signing.
During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.
This patch introduces the following changes:
1. If not all functions that potentially participate in function outlining agree
on their return address signing scope and their return address signing key,
outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
on their support for v8.3A features, outlining is disabled for these
functions.
3. If an outlining candidate would outline instructions that modify sp in a way
that invalidates return address signing, outlining is disabled for that
particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
support for v8.3 features, the outlined function behaves as if it had the
same scope and key attributes and as if it would provide the same v8.3A
support as the original functions.
Reviewers: ostannard, paquette
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70635
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds intrinsics for the following:
* rbit
* revb
* revh
* revw
Patterns are also defined to map the 'llvm.bswap.*' intrinsic to the SVE
revb instruction.
Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70960
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, getIntImmCost returns TCC_Free for almost all intrinsics.
For most AArch64 specific intrinsics however, it looks like integer
constants cannot be folded into most of them (at least the ones I checked).
Unless we know that we can fold integer operands with the intrinsic, we
handle more cases correctly by returning the cost to materialize the
immediate than return TCC_Free.
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D70669
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The ISel pattern for SIMD MLA is a bit too eager: it replaces the ADD with an
MLA even when the MUL cannot be eliminated, e.g. when it has another use. An
MLA is usually has a higher latency than an ADD (and there are fewer pipes
available that can execute it), so trading an MLA for an ADD is not great.
ISel is not taking the number of uses of the MUL result into account, nor any
other factors such as the length of the critical path or other resource pressure.
The MachineCombiner is able to make these judgments so this patch ports the ISel
pattern for MUL/ADD fusing to the MachineCombiner.
Similarly for MUL/SUB -> MLS, as well as the indexed variants.
The change has no impact on SPEC CPU© intrate nor fprate.
Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70673
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds intrinsics for SVE gather loads from memory addresses generated by a vector base plus immediate index:
* @llvm.aarch64.sve.ld1.gather.imm
This intrinsics maps 1-1 to the corresponding SVE instruction (example for half-words):
* ld1h { z0.d }, p0/z, [z0.d, #16]
Committed on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70806
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds intrinsics for SVE gather loads for which the offsets are 32-bits wide and are:
* unscaled
* @llvm.aarch64.sve.ld1.gather.sxtw
* @llvm.aarch64.sve.ld1.gather.uxtw
* scaled (offsets become indices)
* @llvm.arch64.sve.ld1.gather.sxtw.index
* @llvm.arch64.sve.ld1.gather.uxtw.index
The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits.
These intrinsics map 1-1 to the corresponding SVE instructions (examples for half-words):
* unscaled
* ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
* ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
* scaled
* ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
* ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]
Committed on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70782
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following intrinsics:
- faddp
- fmaxp, fminp, fmaxnmp & fminnmp
- fmlalb, fmlalt, fmlslb & fmlslt
- flogb
Reviewers: huntergr, sdesmalen, dancgr, efriedma
Reviewed By: sdesmalen
Subscribers: efriedma, tschuett, kristof.beyls, hiraditya, cameron.mcinally, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70253
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the following intrinsics for gather loads with 64-bit offsets:
* @llvm.aarch64.sve.ld1.gather (unscaled offset)
* @llvm.aarch64.sve.ld1.gather.index (scaled offset)
These intrinsics map 1-1 to the following AArch64 instructions respectively (examples for half-words):
* ld1h { z0.d }, p0/z, [x0, z0.d]
* ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]
Committing on behalf of Andrzej Warzynski (andwar)
Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70542
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds the following intrinsics:
- asr & asrd
- insr
- lsl & lsr
This patch also adds a new AArch64ISD node (INSR) to represent the int_aarch64_sve_insr intrinsic.
Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70437
|
|
|
|
|
|
|
|
|
|
|
| |
analyzePhysReg does not really fit into the iterator and moving it
makes it easier to change the base iterator.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D70559
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add support for vcadd_* family of intrinsics. This set of intrinsics is
available in Armv8.3-A.
The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.
Reviewers: t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70862
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
adjustment when optimizing for size"
This caused asserts (and perhaps also miscompiles) while building for Windows
on AArch64. See the discussion on D68530 for details and reproducer.
Reverting until this can be investigated and fixed.
> For arm64, D18619 introduced the ability to combine bumping the stack pointer
> upfront in case it needs to be bumped for both the callee-save area as well as
> the local stack area.
>
> That diff already remarks that "This change can cause an increase in
> instructions", but argues that even when that happens, it should be still be a
> performance benefit because the number of micro-ops is reduced.
>
> We have observed that this code-size increase can be significant in practice.
> This diff disables combining stack bumping for methods that are marked as
> optimize-for-size.
>
> Example of a prologue with the behavior before this diff (combining stack bumping when possible):
> sub sp, sp, #0x40
> stp d9, d8, [sp, #0x10]
> stp x20, x19, [sp, #0x20]
> stp x29, x30, [sp, #0x30]
> add x29, sp, #0x30
> [... compute x8 somehow ...]
> stp x0, x8, [sp]
>
> And after this diff, if the method is marked as optimize-for-size:
> stp d9, d8, [sp, #-0x30]!
> stp x20, x19, [sp, #0x10]
> stp x29, x30, [sp, #0x20]
> add x29, sp, #0x20
> [... compute x8 somehow ...]
> stp x0, x8, [sp, #-0x10]!
>
> Note that without combining the stack bump there are two auto-decrements,
> nicely folded into the stp instructions, whereas otherwise there is a single
> sub sp, ... instruction, but not folded.
>
> Patch by Nikolai Tillmann!
>
> Differential Revision: https://reviews.llvm.org/D68530
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In rG643ac6c0420b, the syntax `ldraa x1, [x0]!` was added as an alias
for `ldraa x1, [x0, #0]!`. That syntax is less obvious in meaning, and
also will not be accepted by assemblers that haven't been updated yet.
So it would be better not to emit it as the preferred disassembly for
that instruction.
This change lowers the EmitPriority of the new alias so that the more
explicit syntax `[x0, #0]!` is preferred by the disassembler. The new
syntax is still accepted by the assembler.
Reviewers: ab, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70813
|
|
|
|
| |
Very simple change, just adding the extra syntax variant.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.
To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.
This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
legality of masked operations as well as normal ones. This array is
fairly small, so doubling the size still won't make it very large.
Offset masked loads can then be controlled with
setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
the same way.
- The ARM backend is then adjusted to make use of these indexed masked
loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.
Differential Revision: https://reviews.llvm.org/D70176
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds intrinsics for the following:
- fcvt
- fcvtzs & fcvtzu
- scvtf & ucvtf
- fcvtlt, fcvtnt
- fcvtx & fcvtxnt
Reviewers: huntergr, sdesmalen, dancgr, mgudim, efriedma
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70180
|
| |
|
|
|
|
| |
Add the scheduling and cost models for Exynos M5.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The insertion of most CFI instructions during AArch64 frame lowering can
be disabled (e.g. using the function attribute `nounwind`).
This patch enables conditional insertion for one more CFI instruction.
Reviewers: t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70129
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
(Split of off D67120)
SelectionDAG::shouldOptForSize changes for profile guided size optimization.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70095
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
|
|
|
|
|
|
| |
as it's causing test failures in llvm-mca.
This reverts commit 9bdfee2a3bd13d405ce1592930182f23849d2897.
|
|
|
|
| |
Add the scheduling and cost models for Exynos M5.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D69434
|