| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
| |
This teaches the cost model that the sext or zext of a load is going to be
free.
Differential Revision: https://reviews.llvm.org/D66006
llvm-svn: 368593
|
| |
|
|
|
|
|
|
|
|
|
| |
A VDUP will perform a vector broadcast in a single instruction. Update the cost
model for MVE accordingly.
Code originally by David Sherwood.
Differential Revision: https://reviews.llvm.org/D63448
llvm-svn: 368589
|
| |
|
|
|
|
|
|
|
| |
This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon()
checks, now that we have MVE, and updates all the tests accordingly.
Differential Revision: https://reviews.llvm.org/D63447
llvm-svn: 368587
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to the nature of the beat system in the MVE architecture, along with tail
predication and low-overhead loops, unrolling has less benefit compared to
normal loops. You can not, for example, hide the latency of a load with other
instructions as you can for scalar code. Preventing unrolling also makes the
code easier to read and reason about.
So if a loop contains vector code, don't enable the runtime unrolling. At least
for the time being.
Differential Revision: https://reviews.llvm.org/D65803
llvm-svn: 368530
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
With enough codegen complete, we can now correctly report the number and size
of vector registers for MVE, allowing auto vectorisation. This also allows FP
auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE
FP arithmetic and parity between scalar and vector FP behaviour.
Patch by David Sherwood.
Differential Revision: https://reviews.llvm.org/D63728
llvm-svn: 368529
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
%1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
%2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
%3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 23
All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.
To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
%1 = G_CONSTANT 16
%2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
%2 = G_SEXT_INREG %0, 16
or:
%1 = G_CONSTANT 16
%2:_(s32) = G_SHL %0, %1
%3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.
Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61289
llvm-svn: 368487
|
| |
|
|
|
|
|
|
|
| |
I've now needed to add an extra parameter to this call twice recently. Not only
is the signature getting extremely unwieldy, but just updating all of the
callsites and implementations is a pain. Putting the parameters in a struct
sidesteps both issues.
llvm-svn: 368408
|
| |
|
|
|
|
|
|
|
|
|
| |
As loads are combined and widened, we replaced their sext users
operands whereas we should have been replacing the uses of the sext.
I've added a load of tests, with only a few of them originally
causing assertion failures, the rest improve pattern coverage.
Differential Revision: https://reviews.llvm.org/D65740
llvm-svn: 368404
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7bit range, multiplied by the size of the
element.
Differential Revision: https://reviews.llvm.org/D63840
llvm-svn: 368305
|
| |
|
|
|
|
|
|
|
|
|
| |
This adds some missing patterns for big endian loads/stores, allowing unaligned
loads/stores to also be selected with an extra VREV, which produces better code
than aligning through a stack. Also moves VLDR_P0 to not be LE only, and
adjusts some of the tests to show all that working.
Differential Revision: https://reviews.llvm.org/D65583
llvm-svn: 368304
|
| |
|
|
| |
llvm-svn: 368264
|
| |
|
|
|
|
|
|
|
|
|
|
| |
VLDRH needs to have an alignment of at least 2, including the
widening/narrowing versions. This tightens up the ISel patterns for it and
alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded
through the stack. It also fixed some incorrect shift amounts, which seemed to
be passing a multiple not a shift.
Differential Revision: https://reviews.llvm.org/D65580
llvm-svn: 368256
|
| |
|
|
| |
llvm-svn: 368180
|
| |
|
|
| |
llvm-svn: 368146
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we check whether LR is stored/loaded to/from inbetween the
loop decrement and loop end pseudo instructions. There's two problems
here:
- It relies on all load/store instructions being labelled as such in
tablegen.
- Actually any use of loop decrement is troublesome because the value
doesn't exist!
So we need to check for any read/write of LR that occurs between the
two instructions and revert if we find anything.
Differential Revision: https://reviews.llvm.org/D65792
llvm-svn: 368130
|
| |
|
|
| |
llvm-svn: 367974
|
| |
|
|
|
|
|
|
|
| |
isIncomingArgumentHandler()
Previous name and comment incorrectly implied it was just for formal arg handlers,
which is not true.
llvm-svn: 367945
|
| |
|
|
|
|
|
| |
Don't assume format loads for f16. Also fixes support for targets
without i16.
llvm-svn: 367879
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jfb, jakehehrlich
Reviewed By: jfb
Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65514
llvm-svn: 367828
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an explicit construction of the ArrayRef, gcc 5 and earlier don't
seem to select the ArrayRef constructor which takes a C array when the
construction is implicit.
Original commit message:
- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
with a null RegScavenger. Simply not updating the register scavenger
is fine because IPRA only cares about the SavedRegs vector, the acutal
code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
can be clobbered inside a call, even if the compiler can see both
sides, by linker-generated code.
Differential revision: https://reviews.llvm.org/D64908
llvm-svn: 367819
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds big endian MVE patterns for bitcasts. They are defined in llvm as
being the same as a store of the existing type and the load into the new. This
means that they have to become a VREV between the two types, working in the
same way that NEON works in big-endian. This also adds some example tests for
bigendian, showing where code is and isn't different.
The main difference, especially from a testing perspective is that vectors are
passed as v2f64, and so are VREV into and out of call arguments, and the
parameters are passed in a v2f64 format. Same happens for inline assembly where
the register class is used, so it is VREV to a v16i8.
So some of this is probably not correct yet, but it is (mostly) self-consistent
and seems to be consistent with how llvm treats vectors. The rest we can
hopefully fix later. More details about big endian neon can be found in
https://llvm.org/docs/BigEndianNEON.html.
Differential Revision: https://reviews.llvm.org/D65581
llvm-svn: 367780
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix for https://bugs.llvm.org/show_bug.cgi?id=42760. A tBR_JTr
instruction is duplicated by tail duplication, which results in
the same jumptable with the same label being emitted twice.
Fix this by marking tBR_JTr as not duplicable. The corresponding
ARM/Thumb instructions are already marked as not duplicable.
Additionally also mark tTBB_JT and tTBH_JT to be consistent with
Thumb2, even though this shouldn't be strictly necessary.
Differential Revision: https://reviews.llvm.org/D65606
llvm-svn: 367753
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.
Reviewers: joerg, mgorny, efriedma, rsmith
Reviewed By: joerg
Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60942
llvm-svn: 367750
|
| |
|
|
|
|
|
|
| |
This reverts r367669 (git commit f6b00c279a5587a25876752a6ecd8da0bed959dc)
This was breaking a build bot http://lab.llvm.org:8011/builders/netbsd-amd64/builds/21233
llvm-svn: 367731
|
| |
|
|
| |
llvm-svn: 367683
|
| |
|
|
|
|
|
|
|
|
|
| |
This optimisation isn't generally profitable for ARM, because we can
save/restore many registers in the prologue and epilogue using the PUSH
and POP instructions, but mostly use individual LDR/STR instructions for
other spills.
Differential revision: https://reviews.llvm.org/D64910
llvm-svn: 367670
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Avoid a crash when IPRA calls ARMFrameLowering::determineCalleeSaves
with a null RegScavenger. Simply not updating the register scavenger
is fine because IPRA only cares about the SavedRegs vector, the acutal
code of the function has already been generated at this point.
- Add a new hook to TargetRegisterInfo to get the set of registers which
can be clobbered inside a call, even if the compiler can see both
sides, by linker-generated code.
Differential revision: https://reviews.llvm.org/D64908
llvm-svn: 367669
|
| |
|
|
|
|
|
| |
Remove forward declaration, fold a couple of typedefs and change one
to be more useful.
llvm-svn: 367665
|
| |
|
|
|
|
| |
We only care about the first element in the list.
llvm-svn: 367660
|
| |
|
|
|
|
| |
llvm::Register as started by r367614. NFC
llvm-svn: 367633
|
| |
|
|
|
|
|
|
|
|
|
| |
The VREV64 instruction is apparently unpredictable if Qd == Qm, due to the
cross-beat nature of the instruction. This adds an earlyclobber to Qd, which
seems to be the same way we deal with this on other instructions like the
write-back on loads and stores.
Differential Revision: https://reviews.llvm.org/D65502
llvm-svn: 367544
|
| |
|
|
|
|
|
| |
Add a couple of getters for Reduction and do some renaming of
variables around CreateSMLAD for clarity.
llvm-svn: 367522
|
| |
|
|
|
|
|
|
|
| |
This is extremely specific, but saves three instructions when it's
legal. I don't think the code can be usefully generalized.
Differential Revision: https://reviews.llvm.org/D65351
llvm-svn: 367492
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.
It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.
Differential Revision: https://reviews.llvm.org/D65175
llvm-svn: 367491
|
| |
|
|
|
|
|
| |
Preserve the nullptr default for KnownCallees that appears in
the base class.
llvm-svn: 367477
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This will make it possible to improve IPRA by taking into account
register usage in indirect calls.
NFC yet; this is just laying the groundwork to start building
up patches to take advantage of the information for improved register
allocation.
Reviewers: aditya_nandakumar, volkan, qcolombet, arsenm, rovka, aemerson, paquette
Subscribers: sdardis, wdng, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65488
llvm-svn: 367476
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
According to the Armv8.1-M manual CSEL, CSINC, CSINV and CSNEG are
"constrained unpredictable" when SP is used as the source register Rn.
The assembler should diagnose this case.
Reviewers: momchil.velikov, dmgreen, ostannard, simon_tatham, t.p.northover
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65505
llvm-svn: 367433
|
| |
|
|
| |
llvm-svn: 367408
|
| |
|
|
| |
llvm-svn: 367405
|
| |
|
|
|
|
|
|
| |
Use a switch instead of many isa<> while checking for supported
values. Also be explicit about which cast instructions are supported;
This allows the removal of SIToFP from GenerateSignBits.
llvm-svn: 367402
|
| |
|
|
|
|
|
|
| |
Run across a whole function, visiting each basic block one at a time.
Differential Revision: https://reviews.llvm.org/D65324
llvm-svn: 367389
|
| |
|
|
|
|
|
|
|
| |
The code is now in a good enough state to pass the bunch of tests that
I have run (after fixing the bugs), so let's enable it by default.
Differential Revision: https://reviews.llvm.org/D65277
llvm-svn: 367297
|
| |
|
|
|
|
|
|
|
| |
Revert the hardware loop upon finding a LoopEnd that doesn't target
the loop header, instead of asserting a failure.
Differential Revision: https://reviews.llvm.org/D65268
llvm-svn: 367296
|
| |
|
|
|
|
|
|
|
|
|
|
| |
- Remove some unused typedefs.
- Rename BinOpChain struct to MulCandidate.
- Remove the size method of MulCandidate.
- Store only the first input of the ValueList provided to
MulCandidate, as it's the only value we care about. This means we
don't have to perform any ugly (and unnecessary) iterations of the
list later on.
llvm-svn: 367208
|
| |
|
|
|
|
|
| |
We explicitly search for a parallel mac and we only care about its
inputs, checking for symmetry doesn't add anything here.
llvm-svn: 367205
|
| |
|
|
|
|
|
|
| |
We no longer have to check what loads are used, all this
is performed at the start of the transform, so it's not
doing anything now.
llvm-svn: 367204
|
| |
|
|
|
|
|
|
|
|
| |
This adds the patterns required to transform xor P0, -1 to a VPNOT. The
instruction operands have to change a little for this, adding an in and an out
VCCR reg and using a custom DecodeMVEVPNOT for the decode.
Differential Revision: https://reviews.llvm.org/D65133
llvm-svn: 367192
|
| |
|
|
|
|
|
|
|
|
| |
These are some better patterns for converting between predicates and floating
points. Much like the extends, we select "1"/"-1" or "0" depending on the
predicate value. Or we perform a compare against 0 to convert to a predicate.
Differential Revision: https://reviews.llvm.org/D65103
llvm-svn: 367191
|
| |
|
|
|
|
|
| |
Combine OpChain and BinOpChain structs as OpChain is a base class to
BinOpChain that is never used.
llvm-svn: 367114
|
| |
|
|
|
|
| |
Remove unused logic.
llvm-svn: 367099
|