| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the situation, where we generate the following code:
crxor 8, 8, 8
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 8, 8
cror (COPY of CRbit) depends on the result of the crxor instruction.
CR8 is known to be zero as crxor is equivalent to CRUNSET. We can simply use
crxor 1, 1, 1 instead to zero out CR1, which does not have any dependency on
any previous instruction.
This patch will optimize it to:
< Some instructions>
.LBB0_1:
< Some instructions>
cror 1, 1, 1
Patch By: Victor Huang (NeHuang)
Differential Revision: https://reviews.llvm.org/D62044
llvm-svn: 361632
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the SVE2 character match instructions MATCH and NMATCH.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62206
llvm-svn: 361627
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 bitwise shift right narrow:
* SQSHRUNB, SQSHRUNT, SQRSHRUNB, SQRSHRUNT, SHRNB, SHRNT, RSHRNB, RSHRNT,
SQSHRNB, SQSHRNT, SQRSHRNB, SQRSHRNT, UQSHRNB, UQSHRNT, UQRSHRNB,
UQRSHRNT
SVE2 integer add/subtract narrow high part:
* ADDHNB, ADDHNT, RADDHNB, RADDHNT, SUBHNB, SUBHNT, RSUBHNB, RSUBHNT
SVE2 saturating extract narrow:
* SQXTNB, SQXTNT, UQXTNB, UQXTNT, SQXTUNB, SQXTUNT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62205
llvm-svn: 361624
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 bitwise shift and insert:
* SRI, SLI
SVE2 bitwise shift right and accumulate:
* SSRA, USRA, SRSRA, URSRA
SVE2 complex integer add:
* CADD, SQCADD
SVE2 integer absolute difference and accumulate:
* SABA, UABA
SVE2 integer absolute difference and accumulate long:
* SABALB, SABALT, UABALB, UABALT
SVE2 integer add/subtract long with carry:
* ADCLB, ADCLT, SBCLB, SBCLT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62204
llvm-svn: 361622
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds support for the polynomial multiplication instructions
PMULLB/PMULLT. The 64-bit source and 128-bit destination element
variants are enabled with crypto extensions (+sve2-aes), similar to the
NEON PMULL2 instruction. All other variants are enabled with +sve2.
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62145
llvm-svn: 361619
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
SVE2 integer add/subtract long:
* SADDLB, SADDLT, UADDLB, UADDLT, SSUBLB, SSUBLT, USUBLB, USUBLT,
SABDLB, SABDLT, UABDLB, UABDLT
SVE2 integer add/subtract wide:
* SADDWB, SADDWT, UADDWB, UADDWT, SSUBWB, SSUBWT, USUBWB, USUBWT
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62142
llvm-svn: 361615
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds support for the SVE2 saturating/rounding bitwise shift
left (predicated) group of instructions:
* SRSHL, URSHL, SRSHLR, URSHLR, SQSHL, UQSHL, SQRSHL, UQRSHL,
SQSHLR, UQSHLR, SQRSHLR, UQRSHLR
Immediate forms of the SQSHL and UQSHL instructions are also added to
the existing SVE bitwise shift by immediate (predicated) group, as well
as three new instructions SRSHR/URSHR/SQSHLU. The new instructions in
this group are encoded similarly and are implemented using the same
TableGen class with a minimal change (1 bit in encoding).
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62140
llvm-svn: 361612
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
* SQADD, UQADD, SUQADD, USQADD
* SQSUB, UQSUB, SQSUBR, UQSUBR
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62130
llvm-svn: 361611
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Bit 20 in sve2_int_arith_pred TableGen class was overlapping. The
encodings are not affected as bit 20 is defined by the opc bits
and this was overwriting the earlier error of setting bit 20 to 0.
Raised by Momchil: https://reviews.llvm.org/D62130
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D62292
llvm-svn: 361609
|
| |
|
|
|
|
|
|
| |
swifterror marks an argument as a register pretending to be a pointer, so we
need a guaranteed mem2reg-like analysis of its uses. Fortunately most of the
infrastructure can be reused from the DAG world.
llvm-svn: 361608
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The D45316 introduced the `shouldTransformMulToShiftsAddsSubs` function
to check that breaking down constant multiplications into a series
of shifts, adds, and subs is efficient. Unfortunately, this function
does not check maximum number of steps on all paths of the algorithm.
This patch fixes this bug.
Fix for PR41929.
Differential Revision: https://reviews.llvm.org/D62166
llvm-svn: 361606
|
| |
|
|
|
|
|
|
|
|
|
| |
This pass wasn't printing any messages at all, which I find really inconvenient
while debugging/tracing things. It now dumps the before and after of expanded
instructions. It doesn't do this yet for all instructions, but this is a good
start I guess.
Differential Revision: https://reviews.llvm.org/D62297
llvm-svn: 361604
|
| |
|
|
|
|
|
|
|
|
| |
When we are scheduling the load and addi, if all other heuristic didn't take effect,
we will try to schedule the addi before the load, to hide the latency, and avoid the
true dependency added by RA. And this only take effects for Power9.
Differential Revision: https://reviews.llvm.org/D61930
llvm-svn: 361600
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
On Windows, X8 may be used to pass in the address of an aggregate that
is returned indirectly. Therefore, it should be forwarded to variadic
musttail calls and preserved in thunks.
Fixes PR41997
Reviewers: mgrang, efriedma
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62344
llvm-svn: 361585
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Constant stores of f32 values can create such NvCast nodes.
Reviewers: t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62285
llvm-svn: 361584
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: These were previously causing ISel failures.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62354
llvm-svn: 361577
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were assuming a much larger possible per-wave visible stack
allocation than is possible:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/faa3ae51388517353afcdaf9c16621f879ef0a59/src/core/runtime/amd_gpu_agent.cpp#L70
Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.
Remove the corresponding subtarget feature and option that made
this configurable.
llvm-svn: 361541
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
debug info"
Fixes https://bugs.llvm.org/show_bug.cgi?id=40969
The functions findPotentiallyBlockedCopies and buildCopy are currently not
accounting for the presence of debug instructions. In the former this results
in the optimization not being trigerred, and in the latter results in
inconsistent codegen.
This patch enables the optimization to be performed in a debug build and
ensures the codegen is consistent with non-debug builds.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D61680
llvm-svn: 361527
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61037
llvm-svn: 361526
|
| |
|
|
| |
llvm-svn: 361519
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These features will both be implemented soon, so I thought I would
save time by adding the boilerplate for both of them at the same time.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D62047
llvm-svn: 361516
|
| |
|
|
|
|
|
|
|
| |
This patch adds the pseudo instructions la.tls.ie and la.tls.gd, used in
the initial-exec and global-dynamic TLS models respectively when
addressing a global. The pseudo instructions are expanded in the
assembly parser.
llvm-svn: 361499
|
| |
|
|
|
|
|
|
|
|
|
| |
The previous patch added a member set to store instructions that we
could allow to wrap. But this wasn't cleared between searches meaning
that they could get promoted, incorrectly, during the promotion of a
separate valid chain.
Differential Revision: https://reviews.llvm.org/D62254
llvm-svn: 361462
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In this patch, `ISD::RETURNADDR` is lowered on the emscripten target
to the new Emscripten runtime function `emscripten_return_address`, which
implements the functionality.
Patch by Guanzhong Chen
Reviewers: tlively, aheejin
Reviewed By: tlively
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62210
llvm-svn: 361454
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In general dynamic/local dynamic TLS models, with -fno-plt,
* x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of `calll ___tls_get_addr@PLT`
Note, on x86, if we can get rid of %ebx as the PIC register,
it may be better to use a register not preserved across function calls.
* x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of `callq __tls_get_addr@PLT`
Reorganize the code by separating 32-bit and 64-bit.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D62106
llvm-svn: 361453
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove a bunch of isel patterns that become unnecessary.
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.
This patch instaed just disables the immediate 0 matchin the VEXTRACT
patterns. This each of the component pieces of the larger patterns will
match by themselves.
This found a bug of sorts were we didn't use 128-bit store for 512->128
extract on KNL. Its unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.
llvm-svn: 361431
|
| |
|
|
|
|
|
|
| |
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/17792/steps/test-check-all/logs/stdio
Failing Tests (1):
LLVM :: CodeGen/AMDGPU/regbank-reassign.mir
llvm-svn: 361430
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions. For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.
We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.
Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.
I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.
llvm-svn: 361425
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fix is for the problem from https://bugs.llvm.org/show_bug.cgi?id=38714.
Specifically, Simple Register Coalescing creates following conversion :
undef %0.sub_32:gpr64 = ORRWrs $wzr, %3:gpr32common, 0, debug-location !24;
It copies 32-bit value from gpr32 into gpr64. But Live DEBUG_VALUE analysis
is not able to create debug location record for that instruction. So the problem
is in that debug info for argc variable is incorrect. The fix is
to write custom isCopyInstrImpl() which would recognize the ORRWrs instr.
llvm-svn: 361417
|
| |
|
|
|
|
| |
Don't check for unsupported targets for every instruction.
llvm-svn: 361406
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.
For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.
llvm-svn: 361405
|
| |
|
|
|
|
|
|
|
|
| |
See bug 41361: https://bugs.llvm.org/show_bug.cgi?id=41361
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D61012
llvm-svn: 361386
|
| |
|
|
|
|
| |
Fixes scan-build warning.
llvm-svn: 361375
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the tiny code model is requested for a target machine that does not
support this, we get an error message (which is nice) but also this diagnostic
and request to submit a bug report:
fatal error: error in backend: Target does not support the tiny CodeModel
[Inferior 2 (process 31509) exited with code 0106]
clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
(gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
Target: arm-arm-none-eabi
Thread model: posix
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
clang-9: note: diagnostic msg:
But this is not a bug, this is a feature. :-) Not only is this not a bug, this
is also pretty confusing. This patch causes just to print the fatal error and
not the diagnostic:
fatal error: error in backend: Target does not support the tiny CodeModel
Differential Revision: https://reviews.llvm.org/D62236
llvm-svn: 361370
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For big-endian powerpc64, the default ABI is ELFv1. OpenPower ABI ELFv2 is supported when -mabi=elfv2 is specified. FreeBSD support for PowerPC64 ELFv2 ABI with LLVM is in progress[1]. This patch adds an alternative way to specify ELFv2 ABI on target triple [2].
The following results are expected:
ELFv1 when using:
-target powerpc64-unknown-freebsd12.0
-target powerpc64-unknown-freebsd12.0 -mabi=elfv1
-target powerpc64-unknown-freebsd12.0-elfv1
ELFv2 when using:
-target powerpc64-unknown-freebsd12.0 -mabi=elfv2
-target powerpc64-unknown-freebsd12.0-elfv2
[1] https://wiki.freebsd.org/powerpc/llvm-elfv2
[2] https://clang.llvm.org/docs/CrossCompilation.html
Patch by Alfredo Dal'Ava Júnior!
Differential Revision: https://reviews.llvm.org/D61950
llvm-svn: 361355
|
| |
|
|
|
|
|
|
|
| |
The Armv8.2-A crypto extensions all defaulted to true, but should default to
false, like all the other extensions.
Differential Revision: https://reviews.llvm.org/D62180
llvm-svn: 361354
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
combineVectorSizedSetCCEquality() transform more conservative by
checking that the bitcast to the vector type will be cheap/free
for both operands. I'm considering it cheap if it's a constant,
a load or already a vector. I've dropped the explicit check for
f128 because it should fall out naturally (in the cases where
it'd be detrimental).
Differential Revision: https://reviews.llvm.org/D62220
llvm-svn: 361352
|
| |
|
|
| |
llvm-svn: 361347
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D62173
llvm-svn: 361346
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
CET-IBT enabled
Return-twice functions will indirectly jump after the caller's position.
So when CET-IBT is enable, we should make sure these is endbr*
instructions follow these Return-twice function caller. Like GCC does.
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D61881
llvm-svn: 361342
|
| |
|
|
| |
llvm-svn: 361333
|
| |
|
|
|
|
|
| |
There should probably be nonconvergent versions, but my guess is it
doesn't matter in practice.
llvm-svn: 361331
|
| |
|
|
| |
llvm-svn: 361330
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
r360889 added new llround builtin functions. This patch adds their
signatures for the WebAssembly backend.
It also adds wasm32 support to utils/update_llc_test_checks.py, since
that's the script other targets are using for their testcases for this
feature.
Differential Revision: https://reviews.llvm.org/D62207
llvm-svn: 361327
|
| |
|
|
|
|
| |
We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again.
llvm-svn: 361288
|
| |
|
|
|
|
| |
Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692
llvm-svn: 361270
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On PowerPC64 ELFv2 ABI, functions may have 2 entry points: global and local.
The local entry point location of a function is stored in the st_other field of the symbol, as an offset relative to the global entry point.
In order to make symbol assignments (e.g. .equ/.set) work properly with this, PPCTargetELFStreamer already copies the local entry bits from the source symbol to the destination one, on emitAssignment(). The problem is that this copy is performed only at the assignment location, where the source symbol may not yet have processed the .localentry directive, that sets the local entry. This may cause the destination symbol to end up with wrong local entry information. Other symbol info is not affected by this because, in this case, the destination symbol value is actually a symbol reference.
This change keeps track of these assignments, and update all needed st_other fields when finish() is called.
Patch by Leandro Lupori!
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D56586
llvm-svn: 361237
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some checks in isShuffleMaskLegal expect an even number of elements,
e.g. isTRN_v_undef_Mask or isUZP_v_undef_Mask, otherwise they access
invalid elements and crash. This patch adds checks to the impacted
functions.
Fixes PR41951
Reviewers: t.p.northover, dmgreen, samparker
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D60690
llvm-svn: 361235
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch adds support for the following instructions:
* URECPE, URSQRTE, SQABS, SQNEG
The specification can be found here:
https://developer.arm.com/docs/ddi0602/latest
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62129
llvm-svn: 361230
|