| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In 2e24219d3cbf, a number of ARM pcrel fixups were resolved at assembly
time, to solve PR44929. This only covered little-endian ARM however, so
add similar fixups for big-endian ARM. Also extend the test case to
cover big-endian ARM.
Reviewers: hans, psmith, MaskRay
Reviewed By: psmith, MaskRay
Subscribers: kristof.beyls, hiraditya, danielkiss, emaste, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79774
(cherry picked from commit fc373522b044e0b150561204958f0d603fb4caba)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When compiling for a arm5te cpu from clang, the +dsp attribute is set.
This meant we could try and generate qadd8 instructions where we would
end up having no pattern. I've changed the condition here to be hasV6Ops
&& hasDSP, which is what other parts of ARMISelLowering seem to use for
similar instructions.
Fixed PR45677.
Differential Revision: https://reviews.llvm.org/D78877
(cherry picked from commit 8807139026b64ac40163bb255dad38a1d8054f08)
|
|
|
|
|
|
|
|
|
|
|
|
| |
MC currently does not emit these relocation types, and lld does not
handle them. Add FKF_Constant as a work-around of some ARM code after
D72197. Eventually we probably should implement these relocation types.
By Fangrui Song!
Differential revision: https://reviews.llvm.org/D72892
(cherry picked from commit 2e24219d3cbfcb8c824c58872f97de0a2e94a7c8)
|
|
|
|
|
|
|
| |
The previous patch (cff90f07cb5cc3c3bc58277926103af31caef308) didn't
cover ARM.
(cherry picked from commit decd021facba804b57e8d80b6159c987d3261ab8)
|
|
|
|
|
|
|
|
|
|
|
|
| |
mutateStrictFPToFP can delete the node and replace it with another with the same
value which can later cause problems, and returning the result of
mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the
returned value has the same number of results as the original. Instead handle
things by doing the mutation manually.
Differential Revision: https://reviews.llvm.org/D74726
(cherry picked from commit 594a89f7270da74c89f2321432bc6a7135773fa5)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND
and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the
current implementation will cause in infinite loop for STRICT_FP_EXTEND due to
emitting a merge_values of the original node which after replacement becomes a
merge_values of itself.
Fix this by not doing anything for f32 to f64 extend when we have FP64, though
for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that
doesn't happen automatically for opcodes with custom lowering.
Differential Revision: https://reviews.llvm.org/D74559
(cherry picked from commit 0ec57972967dfb43fc022c2e3788be041d1db730)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.
The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.
Differential Revision: https://reviews.llvm.org/D73194
(cherry picked from commit b37d59353f699e99f139a9227a6a69964ef4b132)
|
|
|
|
| |
(cherry picked from commit ea9850b6c71d975935de15bd4128508b260165c5)
|
|
|
|
| |
This reverts commit 60e0120c913dd1d4bfe33769e1f000a076249a42.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but is probably better to just rely on the
scalar operations, which is what is done here.
Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
(cherry picked from commit 362d00e0510ee75750499e2993a782428e377215)
|
|
|
|
|
|
|
|
|
|
|
|
| |
Followup to D73135. If the target doesn't have hard float (default
for ARM), then we assert when trying to soften the result of vector
reduction intrinsics. This patch marks these for expansion as well.
(A bit odd to use vectors on a target without hard float ... but
that's where you end up if you expose target-independent vector types.)
Differential Revision: https://reviews.llvm.org/D73854
(cherry picked from commit 1cc4f8d17247cd9be88addd75d060f9321b6f8b0)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fadd/fmul reductions without reassoc are lowered to
VECREDUCE_STRICT_FADD/FMUL nodes, which don't have legalization
support. Until that is in place, expand these intrinsics on
ARM and AArch64. Other targets always expand the vector reduction
intrinsics.
Additionally expand fmax/fmin reductions without nonan flag on
AArch64, as the backend asserts that the flag is present when
lowering VECREDUCE_FMIN/FMAX.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44600.
Differential Revision: https://reviews.llvm.org/D73135
(cherry picked from commit 70d345e687caba4ac1f95655c6924dfa91e0083f)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
-fpatchable-function-entry=N,M where M>0
Similar to the function attribute `prefix` (prefix data),
"patchable-function-prefix" inserts data (M NOPs) before the function
entry label.
-fpatchable-function-entry=2,1 (1 NOP before entry, 1 NOP after entry)
will look like:
```
.type foo,@function
.Ltmp0: # @foo
nop
foo:
.Lfunc_begin0:
# optional `bti c` (AArch64 Branch Target Identification) or
# `endbr64` (Intel Indirect Branch Tracking)
nop
.section __patchable_function_entries,"awo",@progbits,get,unique,0
.p2align 3
.quad .Ltmp0
```
-fpatchable-function-entry=N,0 + -mbranch-protection=bti/-fcf-protection=branch has two reasonable
placements (https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01185.html):
```
(a) (b)
func: func:
.Ltmp0: bti c
bti c .Ltmp0:
nop nop
```
(a) needs no additional code. If the consensus is to go for (b), we will
need more code in AArch64BranchTargets.cpp / X86IndirectBranchTracking.cpp .
Differential Revision: https://reviews.llvm.org/D73070
(cherry picked from commit 22467e259507f5ead2a87d989251b4c951a587e4)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
|
|
|
|
|
|
|
|
|
|
| |
Fix a missing and broken test: 2 VPT blocks predicated on the same VCMP
instruction that can be folded. The problem was that for each VPT block, we
record the predicate statements with a list, but the same instruction was added
twice. Thus, we were running in an assert trying to remove the same instruction
twice. To avoid this the instructions are now recorded with a set.
Differential Revision: https://reviews.llvm.org/D72699
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This cleans up a lot of ugly `foreach` bodges that I've been using to
work around the lack of those two language features. Now they both
exist, I can make then all into something more legible!
In particular, in the common pattern in `ARMInstrMVE.td` where a
multiclass defines an `Instruction` instance plus one or more `Pat` that
select it, I've used a `defvar` to wrap `!cast<Instruction>(NAME)` so
that the patterns themselves become a little more legible.
Replacing a `foreach` with a `defvar` removes a level of block
structure, so several pieces of code have their indentation changed by
this patch. Best viewed with whitespace ignored.
NFC: the output of `llvm-tblgen -print-records` on the two affected
Tablegen sources is exactly identical before and after this change, so
there should be no effect at all on any of the other generated files.
Reviewers: MarkMurrayARM, miyuki
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72690
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a whitelist of instructions that we allow when tail
predicating, since these are trivial ones that we've deemed need no
special handling. Now change ARMLowOverheadLoops to allow the
non-trivial instructions if they're contained within a valid VPT
block. Since a valid block is one that is predicated upon the VCTP so
we know that these non-trivial instructions will still behave as
expected once the implicit predication is used instead.
This also fixes a previous test failure.
Differential Revision: https://reviews.llvm.org/D72509
|
|
|
|
|
|
|
|
|
|
| |
Use the already provided helper function to get the operand type so
that we can detect whether the vpr is being used as a predicate or
not. Also use existing helpers to get the predicate indices when we
converting the vpt blocks. This enables us to support both types of
vpr predicate operand.
Differential Revision: https://reviews.llvm.org/D72504
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).
Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma, andreadb
Reviewed By: efriedma
Subscribers: gbedwell, john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70680
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to the current way that we collect predicated instructions, we
can't easily handle vpsel in tail predicated loops. There are a
couple of issues:
1) It will use the VPR as a predicate operand, but doesn't have to be
instead a VPT block, which means we can assert while building up
the VPT block because we don't find another VPST to being a new
one.
2) VPSEL still requires a VPR operand even after tail predicating,
which means we can't remove it unless there is another
instruction, such as vcmp, that can provide the VPR def.
The first issue should be a relatively simple fix in the logic of the
LowOverheadLoops pass, whereas the second will require us to
represent the 'implicit' tail predication with an explicit value.
Differential Revision: https://reviews.llvm.org/D72629
|
|
|
|
|
|
|
|
| |
Enables the masked gather pass to create a masked
gather loading from a base and vector of offsets.
This also enables v8i16 and v16i8 gather loads.
Differential Revision: https://reviews.llvm.org/D72330
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RunttimeLibcalls.def and all associated usages
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.
This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.
Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn
Reviewed By: efriedma
Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72536
|
|
|
|
|
|
|
| |
Add the MVE min and max instructions to our tail predication
whitelist.
Differential Revision: https://reviews.llvm.org/D72502
|
| |
|
|
|
|
|
|
|
|
| |
Summary: This reverts commit 8c12769f3046029e2a9b4e48e1645b1a77d28650.
Reviewers:
Subscribers:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch fixes pr23772 [ARM] r226200 can emit illegal thumb2 instruction: "sub sp, r12, #80".
The violation was that SUB and ADD (reg, immediate) instructions can only write to SP if the source register is also SP. So the above instructions was unpredictable.
To enforce that the instruction t2(ADD|SUB)ri does not write to SP we now enforce the destination register to be rGPR (That exclude PC and SP).
Different than the ARM specification, that defines one instruction that can read from SP, and one that can't, here we inserted one that can't write to SP, and other that can only write to SP as to reuse most of the hard-coded size optimizations.
When performing this change, it uncovered that emitting Thumb2 Reg plus Immediate could not emit all variants of ADD SP, SP #imm instructions before so it was refactored to be able to. (see test/CodeGen/Thumb2/mve-stacksplot.mir where we use a subw sp, sp, Imm12 variant )
It also uncovered a disassembly issue of adr.w instructions, that were only written as SUBW instructions (see llvm/test/MC/Disassembler/ARM/thumb2.txt).
Reviewers: eli.friedman, dmgreen, carwil, olista01, efriedma
Reviewed By: efriedma
Subscribers: john.brawn, efriedma, ostannard, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70680
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.
Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.
Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.
Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.
Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.
One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
|
|
|
|
|
|
| |
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
|
|
|
|
| |
Add a comment describing the dependencies of the pass.
|
|
|
|
|
|
|
|
| |
We don't unroll vector loops for MVE targets, but we miss the case
when loops only contain intrinsic calls. So just move the logic a
bit to catch this case.
Differential Revision: https://reviews.llvm.org/D72440
|
|
|
|
|
|
|
| |
This reverts commit e93e0d413f3afa1df5c5f88df546bebcd1183155.
There's some ordering problems on some on the buildbots which needs
investigating.
|
|
|
|
|
|
|
|
| |
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).
Differential Revision: https://reviews.llvm.org/D72131
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This batch of intrinsics fills in all the shift instructions that take
a variable shift distance in a register, instead of an immediate. Some
of these instructions take a single shift distance in a scalar
register and apply it to all lanes; others take a vector of per-lane
distances.
These instructions are all basically one family, varying in whether
they saturate out-of-range values, and whether they round when bits
are shifted off the bottom. I've implemented them at the IR level by a
much smaller family of IR intrinsics, which take flag parameters to
indicate saturating and/or rounding (along with the usual one to
specify signed/unsigned integers).
An oddity is that all of them are //left// shift instructions – but if
you pass a negative shift count, they'll shift right. So the vector
shift distances are always vectors of //signed// integers, regardless
of whether you're considering the other input vector to be of signed
or unsigned. Also, even the simplest `vshlq` instruction in this
family (neither saturating nor rounding) has to be implemented as an
IR intrinsic, because the ordinary LLVM IR `shl` operation would
consider an out-of-range shift count to be undefined behavior.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72329
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This batch of intrinsics covers two sets of immediate shift
instructions, which have in common that they only overwrite part of
their output register and so they need an extra input giving its
previous value.
The VSLI and VSRI instructions shift each lane of the input vector
left or right just as if they were normal immediate VSHL/VSHR, but
then they only overwrite the output bits that correspond to actual
shifted bits of the input. So VSLI will leave the low n bits of each
output lane unchanged, and VSRI the same with the top n bits.
The V[Q][R]SHR[U]N family are all narrowing shifts: they take an input
vector of 2n-bit integers, shift each lane right by a constant, and
then narrowing the shifted result to only n bits. So they only
overwrite half of the n-bit lanes in the output register, and the B/T
suffix indicates whether it's the bottom or top half of each 2n-bit
lane.
I've implemented the whole of the latter family using a single IR
intrinsic `vshrn`, which takes a lot of i32 parameters indicating
which instruction it expands to (by specifying signedness of the input
and output types, whether it saturates and/or rounds, etc).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72328
|
|
|
|
|
|
|
|
| |
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.
Differential Revision: https://reviews.llvm.org/D71743
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a recommit of D71330, but with a few things fixed and changed:
1) ReachingDefAnalysis: this was not running with optnone as it was checking
skipFunction(), which other analysis passes don't do. I guess this is a
copy-paste from a codegen pass.
2) VPTBlockPass: here I've added skipFunction(), because like most/all
optimisations, we don't want to run this with optnone.
This fixes the issues with the initial/previous commit: the VPTBlockPass was
running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was
crashing querying ReachingDefAnalysis.
I've added test case mve-vpt-block-optnone.mir to check that we don't run
VPTBlock with optnone.
Differential Revision: https://reviews.llvm.org/D71470
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.
These improvements cover architectures implementing ARMv5TE or Thumb-2.
Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers
Reviewed By: efriedma, nickdesaulniers
Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70072
|
|
|
|
|
|
|
|
| |
Follow-up of D72172.
Reviewed By: jhenderson, rnk
Differential Revision: https://reviews.llvm.org/D72180
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.
It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.
Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.
The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.
In any case, downstream projects which don't know `Address` can pass 0 as
the argument.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72172
|
|
|
|
|
|
|
|
|
| |
The segmented stack lowering code appears to be using ARM opcodes under
Thumb2. The MRC opcode will be the same for Thumb and ARM, but t2LDR
seems wrong. Either way, using the correct thumb vs arm opcodes is more
correct.
Differential Revision: https://reviews.llvm.org/D72074
|
|
|
|
|
|
|
|
|
| |
We were previously unconditionally using the ARM::TRAP opcode, even
under Thumb. My understanding is that these are essentially the same
thing (they both result in a trap under Thumb), but the ARM::TRAP opcode
is marked as requiring IsARM, so it is more correct to use ARM::tTRAP.
Differential Revision: https://reviews.llvm.org/D72075
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Running an end-to-end test last week I noticed that a lot of the ACLE
intrinsics that operate differently on vectors of signed and unsigned
integers were ending up generating the signed version of the
instruction unconditionally. This is because the IR intrinsics had no
way to distinguish signed from unsigned: the LLVM type system just
calls them both `v8i16` (or whatever), so you need either separate
intrinsics for signed and unsigned, or a flag parameter that tells
ISel which one to choose.
This patch fixes all the problems of that kind that I've noticed, by
adding an i32 flag parameter to many of the IR intrinsics which is set
to 1 for unsigned (matching the existing practice in cases where we
got it right), and conditioning all the isel patterns on that flag. So
the fundamental change is in `IntrinsicsARM.td`, changing the
low-level IR intrinsics API; there are knock-on changes in
`arm_mve.td` (adjusting code gen for the ACLE intrinsics to use the
modified API) and in `ARMInstrMVE.td` (adjusting isel to expect the
new unsigned flags). The rest of this patch is boringly updating tests.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Due to a copy-paste error in the isel patterns, the predicated version
of this intrinsic was expanding to the `VMAXNMT.F32` instruction
instead of `VMAXNMT.F16`. Similarly for vminnm.
Reviewers: dmgreen, miyuki, MarkMurrayARM
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72269
|
|
|
|
|
|
|
|
| |
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72143
Patch by Kazuaki Ishizaki.
|
|
|
|
|
|
|
|
|
|
| |
I've added a few more debug messages to MVETailPredication because I wanted to
trace better which instructions are added/removed. And while I was at it, I
factored out one function which I thought was clearer, and have added some
comments to describe better the flow between MVETailPredication and
ARMLowOverheadLoops.
Differential Revision: https://reviews.llvm.org/D71549
|
|
|
|
|
|
|
|
| |
ARMELFObjectWriter::addTargetSectionFlags
This simplifies the generic interface and also makes SHF_ARM_PURECODE
more robust (fixes a TODO). Inspecting MCDataFragment contents covers
more cases than MCObjectStreamer::EmitBytes.
|
|
|
|
|
|
|
|
|
|
|
| |
This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing
the target independent code to handle more folds in more situations (for
example if the fast math flags are present, but the global
AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx
into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately
if needed.
Differential Revision: https://reviews.llvm.org/D72139
|