| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
| |
We only try to promote types with are smaller than 16-bits, but we
also need to check that the type is not less than 8-bits.
Differential Revision: https://reviews.llvm.org/D50769
llvm-svn: 339770
|
| |
|
|
|
|
|
|
|
|
|
| |
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.
Differential Revision: https://reviews.llvm.org/D50067
llvm-svn: 339755
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add pointers to the list of allowed types, but don't try to promote
them. Also fixed a bug with the promotion of undef values, so a new
value is now created instead of mutating in place. We also now only
promote if there's an instruction in the use-def chains other than
the icmp, sinks and sources.
Differential Revision: https://reviews.llvm.org/D50054
llvm-svn: 339754
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`MachineMemOperand` pointers attached to `MachineSDNodes` and instead
have the `SelectionDAG` fully manage the memory for this array.
Prior to this change, the memory management was deeply confusing here --
The way the MI was built relied on the `SelectionDAG` allocating memory
for these arrays of pointers using the `MachineFunction`'s allocator so
that the raw pointer to the array could be blindly copied into an
eventual `MachineInstr`. This creates a hard coupling between how
`MachineInstr`s allocate their array of `MachineMemOperand` pointers and
how the `MachineSDNode` does.
This change is motivated in large part by a change I am making to how
`MachineFunction` allocates these pointers, but it seems like a layering
improvement as well.
This would run the risk of increasing allocations overall, but I've
implemented an optimization that should avoid that by storing a single
`MachineMemOperand` pointer directly instead of allocating anything.
This is expected to be a net win because the vast majority of uses of
these only need a single pointer.
As a side-effect, this makes the API for updating a `MachineSDNode` and
a `MachineInstr` reasonably different which seems nice to avoid
unexpected coupling of these two layers. We can map between them, but we
shouldn't be *surprised* at where that occurs. =]
Differential Revision: https://reviews.llvm.org/D50680
llvm-svn: 339740
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Intentionally excluding nodes from the DAGCombine worklist is likely to
lead to weird optimizations and infinite loops, so it's generally a bad
idea.
To avoid the infinite loops, fix DAGCombine to use the
isDesirableToCommuteWithShift target hook before performing the
transforms in question, and implement the target hook in the ARM backend
disable the transforms in question.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38530 . (I don't have a
reduced testcase for that bug. But we should have sufficient test
coverage for PerformSHLSimplify given that we're not playing weird
tricks with the worklist. I can try to bugpoint it if necessary,
though.)
Differential Revision: https://reviews.llvm.org/D50667
llvm-svn: 339734
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D50511
llvm-svn: 339645
|
| |
|
|
| |
llvm-svn: 339546
|
| |
|
|
| |
llvm-svn: 339479
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.
Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.
The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)
According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.
Differential Revision: https://reviews.llvm.org/D50030
llvm-svn: 339472
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable for now and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we converting to, it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes as it unnecessarily complicated having the checks
are different stages.
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.
Differential Revision: https://reviews.llvm.org/D50518
llvm-svn: 339432
|
| |
|
|
|
|
|
|
|
| |
Enable `FeatureZCZeroing`, `FeatureHasSlowFPVMLx`, `FeatureExpandMLx`,
`FeatureProfUnpredicate`, `FeatureSlowVDUP32`, `FeatureSlowVGETLNi32`,
`FeatureSplatVFPToNeon`, `FeatureHasRetAddrStack`, `FeatureSlowFPBrcc` for
all Exynos processors.
llvm-svn: 339356
|
| |
|
|
|
|
|
| |
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check. Otherwise, NFC.
llvm-svn: 339354
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D50454
llvm-svn: 339340
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop". However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.
The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)
Differential Revision: https://reviews.llvm.org/D49459
llvm-svn: 339283
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D50427
llvm-svn: 339241
|
| |
|
|
|
|
|
|
| |
This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants.
Differential Revision: https://reviews.llvm.org/D50329
llvm-svn: 339238
|
| |
|
|
|
|
|
|
| |
This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants.
Differential Revision: https://reviews.llvm.org/D50326
llvm-svn: 339232
|
| |
|
|
|
|
|
|
| |
This adds codegen support for the different vcvt_f16 variants.
Differential Revision: https://reviews.llvm.org/D50393
llvm-svn: 339227
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D50238
llvm-svn: 339221
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D50236
llvm-svn: 339148
|
| |
|
|
|
|
|
|
|
| |
ld64 supplies its own Thumb bit for Thumb functions, and intentionally zeroes
out that part of any addend in an object file. But it only does that for
symbols marked N_EXT -- i.e. external symbols. So LLVM should avoid setting
that extra bit in other cases.
llvm-svn: 339007
|
| |
|
|
|
|
|
|
| |
This is addressing PR38404.
Differential Revision: https://reviews.llvm.org/D50186
llvm-svn: 338835
|
| |
|
|
|
|
| |
This is addressing PR38404.
llvm-svn: 338830
|
| |
|
|
|
|
|
|
|
|
| |
Value
This is logical continuation of https://reviews.llvm.org/D46018 (r332449)
Differential Revision: https://reviews.llvm.org/D49660
llvm-svn: 338685
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Disable ARMCodeGenPrepare by default again. It is causing verifier
failues in V8 that look like:
Duplicate integer as switch case
switch i32 %trunc, label %if.end13 [
i32 0, label %cleanup36
i32 0, label %if.then8
], !dbg !4981
i32 0
fatal error: error in backend: Broken function found, compilation aborted!
I will continue reducing the test case and send it along.
llvm-svn: 338452
|
| |
|
|
|
|
|
|
| |
This matches GAS, that allows unsuffixed .inst for thumb.
Differential Revision: https://reviews.llvm.org/D49937
llvm-svn: 338357
|
| |
|
|
|
|
|
|
|
|
| |
Contrary to ELF, we don't add any markers that distinguish data generated
with .short/.long from normal instructions, so the .inst directive only
adds compatibility with assembly that uses it.
Differential Revision: https://reviews.llvm.org/D49936
llvm-svn: 338356
|
| |
|
|
|
|
|
| |
Re-enabling ARMCodeGenPrepare by default after failing to reproduce
the bootstrap issues that I was concerned it was causing.
llvm-svn: 338354
|
| |
|
|
|
|
| |
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}
llvm-svn: 338293
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Fix read of uninitialized RC variable in ARM's PrintAsmOperand when
hasRegClassConstraint returns false. This was causing
inline-asm-operand-implicit-cast test to fail in r338206.
Reviewers: t.p.northover, weimingz, javed.absar, chill
Reviewed By: chill
Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits
Differential Revision: https://reviews.llvm.org/D49984
llvm-svn: 338268
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling
homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up
fully on stack, the function tries to pack all resulting items of the
aggregate as tightly as possible according to AAPCS.
Once the first item was laid out, the alignment used for consecutive
items was the size of one item. This logic went wrong for 128-bit
vectors because their alignment is normally only 64 bits, and so could
result in inserting unexpected padding between the first and second
element.
The patch fixes the problem by updating the alignment with the item size
only if this results in reducing it.
Differential Revision: https://reviews.llvm.org/D49720
llvm-svn: 338233
|
| |
|
|
|
|
|
|
| |
This seems like a pretty glaring omission, and AMDGPU
wants to treat kernels differently from other calling
conventions.
llvm-svn: 338194
|
| |
|
|
|
|
|
|
|
|
| |
This feature enables the fusion of such operations on Cortex A57 and Cortex
A72, as recommended in their Software Optimisation Guides, sections 4.14 and
4.11, respectively.
Differential revision: https://reviews.llvm.org/D49563
llvm-svn: 338147
|
| |
|
|
| |
llvm-svn: 337952
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name only
was used for msvc targets) into the arch independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.
With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues with both GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:
fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4
Differential Revision: https://reviews.llvm.org/D49644
llvm-svn: 337950
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Saves materializing the immediate for the "ands".
Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.
Now implemented as a DAGCombine.
Differential Revision: https://reviews.llvm.org/D49585
llvm-svn: 337945
|
| |
|
|
|
|
|
|
| |
ARM Stage 2 builders have been suspiciously broken since the pass was
committed. Disabling to hopefully fix the bots and give me time to
debug.
llvm-svn: 337821
|
| |
|
|
| |
llvm-svn: 337714
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
OpChain has subclasses, so add a virtual destructor.
This fixes an issue when deleting subclasses of OpChain (see MatchSMLAD() specifically) in r337701.
Reviewers: javed.absar
Subscribers: llvm-commits, SjoerdMeijer, samparker
Differential Revision: https://reviews.llvm.org/D49681
llvm-svn: 337713
|
| |
|
|
|
|
| |
Fix double-free.
llvm-svn: 337711
|
| |
|
|
|
|
|
| |
Attempt to fix the leak introduced in r337687 and make sanitizer
buildbots green again.
llvm-svn: 337709
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparing to allow ARMParallelDSP pass to parallelise more than
smlads, I've restructed some elements:
- The ParallelMAC struct has been renamed to BinOpChain.
- The BinOpChain struct holds two value lists: LHS and RHS, as well
as inheriting from the OpChain base class.
- The OpChain struct holds all the values of the represented chain
and has had the memory locations functionality inserted into it.
- ParallelMACList becomes OpChainList and it now holds pointers
instead of objects.
Differential Revision: https://reviews.llvm.org/D49020
llvm-svn: 337701
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.
Differential Revision: https://reviews.llvm.org/D48832
llvm-svn: 337687
|
| |
|
|
|
|
|
|
|
| |
Enable the optimization of operations on DPR and SPR via a feature instead
of checking the target.
Differential revision: https://reviews.llvm.org/D49463
llvm-svn: 337575
|
| |
|
|
|
|
|
|
|
| |
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.
Recommitted without breaking ELF this time.
llvm-svn: 337450
|
| |
|
|
|
|
| |
This reverts commit r337385 until it can be targeted at MachO only.
llvm-svn: 337424
|
| |
|
|
|
|
|
| |
Since the triple's default is hard float, the libcalls will already use VFP
registers.
llvm-svn: 337386
|
| |
|
|
|
|
|
| |
We were emitting incorrect calls to libm functions that LLVM had decided it
knew about because the default is soft-float.
llvm-svn: 337385
|
| |
|
|
|
|
|
|
| |
ARMSubtarget had a copy/pasted block to determine whether the target was
hard-float, but it just delegated to triple features anyway so it's better at
the TargetMachine level.
llvm-svn: 337384
|
| |
|
|
|
|
|
|
|
|
| |
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.
Differential revision: https://reviews.llvm.org/D49125
llvm-svn: 336795
|