| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit starts with a "git mv ARM64 AArch64" and continues out
from there, renaming the C++ classes, intrinsics, and other
target-local objects for consistency.
"ARM64" test directories are also moved, and tests that began their
life in ARM64 use an arm64 triple; those from AArch64 use an aarch64
triple. The two should be equivalent, though.
This finishes the AArch64 merge, and everyone should feel free to
continue committing as normal now.
llvm-svn: 209577
|
|
|
|
|
|
| |
with swapped input vectors.
llvm-svn: 209495
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Povray and dealII currently assert with "Overran sorted position" in
AssignTopologicalOrder. The problem is that performPostLD1Combine can
introduce cycles.
Consider:
(insert_vector_elt (INSERT_SUBREG undef,
(load (add %vreg0, Constant<8>), undef), <= A
TargetConstant<2>),
(load %vreg0, undef), <= B
Constant<1>)
This is turned into a LD1LANEpost node. However the address in A is not a
valid user of the post-incremented address of B in LD1LANEpost.
llvm-svn: 209242
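
The cycle described above can be reproduced on a toy dependency graph. This is a minimal sketch, not LLVM code: the node names and the `topological_order` helper are hypothetical, and it assumes the combine rewires the (add %vreg0, 8) address of load A to the post-incremented address produced by the new LD1LANEpost node.

```python
from collections import deque


def topological_order(users):
    """Kahn's algorithm. `users` maps each node to the nodes that consume it.

    Returns a list of nodes in topological order, or None if the graph
    contains a cycle (some nodes can never be scheduled).
    """
    indegree = {node: 0 for node in users}
    for outs in users.values():
        for user in outs:
            indegree[user] += 1

    ready = deque(node for node, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for user in users[node]:
            indegree[user] -= 1
            if indegree[user] == 0:
                ready.append(user)
    return order if len(order) == len(users) else None


# Before the combine: load A reads from (add %vreg0, 8), load B from %vreg0.
before = {
    "vreg0": ["add", "load_B"],
    "add": ["load_A"],
    "load_A": ["insert"],
    "load_B": ["insert"],
    "insert": [],
}

# After the combine (assumed rewiring): the post-incremented address of
# ld1lane_post feeds load A, while load A's value feeds ld1lane_post.
after = {
    "vreg0": ["ld1lane_post"],
    "load_A": ["ld1lane_post"],
    "ld1lane_post": ["load_A"],
}
```

In this model `topological_order(before)` succeeds, while `topological_order(after)` returns None because load_A and ld1lane_post each wait on the other.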
|
|
|
|
|
|
|
|
|
|
| |
bswap not.
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
llvm-svn: 209123
|
|
|
|
|
|
|
|
|
|
| |
This is mostly a mechanical change changing all the call sites to the newer
chained-function construction pattern. This removes the horrible 15-parameter
constructor for the CallLoweringInfo in favour of setting properties of the call
via chained functions. No functional change beyond the removal of the old
constructors is intended.
llvm-svn: 209082
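
A chained-function construction pattern of the kind described above might look roughly like this. It is only an illustrative sketch: `CallInfo` and its setter names are hypothetical and are not the CallLoweringInfo API.

```python
class CallInfo:
    """Hypothetical call description built by chained setters."""

    def __init__(self):
        # Every property has a default, so callers set only what they need.
        self.callee = None
        self.args = None  # nullable: stays None until set_args is called
        self.is_tail_call = False
        self.is_var_arg = False

    def set_callee(self, callee):
        self.callee = callee
        return self  # returning self is what allows chaining

    def set_args(self, args):
        # Keep the caller's list object instead of copying it, so arguments
        # appended later are still visible (late binding).
        self.args = args
        return self

    def set_tail_call(self, value=True):
        self.is_tail_call = value
        return self

    def set_var_arg(self, value=True):
        self.is_var_arg = value
        return self


args = ["dst", "src"]
info = CallInfo().set_callee("memcpy").set_args(args).set_tail_call()
args.append("len")  # visible through info.args because no copy was made
```

Compared with a single long positional constructor, each call site names the properties it changes and leaves the rest at their defaults.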
|
|
|
|
|
|
|
|
|
| |
This is a preliminary step to help ease the construction of CallLoweringInfo.
Changing the construction to a chained function pattern requires that the
parameter be nullable. Rather than copying the vector, save a pointer
instead of a reference, which permits late binding of the arguments.
llvm-svn: 209080
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r208934.
The patch depends on aliases to GEPs with non zero offsets. That is not
supported and fairly broken.
The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.
llvm-svn: 208978
|
|
|
|
| |
llvm-svn: 208955
|
|
|
|
|
|
|
|
|
|
|
| |
This commit implements two command line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.
For ARM64, some back-end behaviors need to be tuned to get this optimization
further enabled.
llvm-svn: 208934
|
|
|
|
|
|
| |
argument stack from callee.
llvm-svn: 208837
|
|
|
|
|
|
| |
inappropriate since it lost its Mask parameter in r154011.
llvm-svn: 208811
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, patterns like (add x, (setcc cc ...)) will be folded into
(csel x, x+1, not cc). However, if there is a ZEXT after SETCC, they
won't be folded. This patch recognizes the ZEXT and allows the
generation of CSINC.
This patch fixes bug 19680.
llvm-svn: 208660
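
The equivalence behind this fold can be checked on plain integers. A minimal sketch, assuming 64-bit wraparound arithmetic; the function names are illustrative only.

```python
MASK64 = (1 << 64) - 1  # model i64 arithmetic with wraparound


def add_zext_setcc(x, cond):
    """(add x, (zext (setcc ...))): the i1 result is widened to 0 or 1."""
    return (x + int(bool(cond))) & MASK64


def conditional_increment(x, cond):
    """Select between x + 1 and x: x + 1 when cond holds, x otherwise."""
    return (x + 1) & MASK64 if cond else x & MASK64
```

Both functions return the same value for every input, which is why the zero-extended compare result can be absorbed into a single conditional-increment instruction.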
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We must validate the value type in TLI::getRegisterByName, because if we
don't and the wrong type was used with the IR intrinsic, then we'll assert
(because we won't be able to find a valid register class with which to
construct the requested copy operation). For PPC64, additionally, the type
information is necessary to decide between the 64-bit register and the 32-bit
subregister.
No functionality change.
llvm-svn: 208508
|
|
|
|
|
|
|
|
|
|
| |
We were swapping the true & false results while testing for FMAX/FMIN,
but not putting them back to the original state if the later checks
failed.
Should fix PR19700.
llvm-svn: 208469
|
|
|
|
|
|
| |
ARM64 backend.
llvm-svn: 208284
|
|
|
|
|
|
|
|
|
| |
When performing a scalar comparison that feeds into a vector select,
it's actually better to do the comparison on the vector side: the
scalar route would be "CMP -> CSEL -> DUP", the vector is "CM -> DUP"
since the vector comparisons are all mask based.
llvm-svn: 208210
|
|
|
|
| |
llvm-svn: 208199
|
|
|
|
|
|
|
|
|
|
| |
This completes the port of r204814 (cpirker "AArch64_BE function argument
passing for ARM ABI") from AArch64 to ARM64, and fixes a bunch of issues
found during later development along the way. The biggest of these was
that the alignment fixup logic wasn't replicated into all the places it
should have been.
llvm-svn: 208192
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements the infrastructure to use named register constructs in
programs that need access to specific registers (bare metal, kernels, etc).
So far, only the stack pointer is supported as a technology preview, but as it
is, the intrinsic can already support all non-allocatable registers from any
architecture.
llvm-svn: 208104
|
|
|
|
|
|
| |
This is the LLVM part of the modification.
llvm-svn: 208074
|
|
|
|
|
|
|
|
| |
While post-indexed LD1/ST1 instructions do exist for vector loads,
this patch makes use of the more flexible addressing-modes in LDR/STR
instructions.
llvm-svn: 207838
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For a pattern like ((x >> C1) & Mask) << C2, the DAG combiner may convert it
into (x >> (C1-C2)) & (Mask << C2), which makes pattern matching of ubfx
more difficult.
For example:
Given
%shr = lshr i64 %x, 4
%and = and i64 %shr, 15
%arrayidx = getelementptr inbounds [8 x [64 x i64]]* @arr, i64 0, i64 2, i64 %and
%0 = load i64* %arrayidx
With current shift folding, it takes 3 instrs to compute base address:
lsr x8, x0, #1
and x8, x8, #0x78
add x8, x9, x8
If using ubfx, it only needs 2 instrs:
ubfx x8, x0, #4, #4
add x8, x9, x8, lsl #3
This fixes bug 19589
llvm-svn: 207702
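
The two forms above compute the same value, which can be checked with a small model. This is a sketch on unsigned 64-bit integers, assuming C1 >= C2; the function names are illustrative, and `ubfx` here only models the bitfield extract, not the instruction encoding.

```python
MASK64 = (1 << 64) - 1  # treat values as unsigned i64


def shift_mask_shift(x, c1, mask, c2):
    """Original form: ((x >> C1) & Mask) << C2."""
    return ((((x & MASK64) >> c1) & mask) << c2) & MASK64


def folded(x, c1, mask, c2):
    """Form after shift folding: (x >> (C1-C2)) & (Mask << C2)."""
    assert c1 >= c2, "this sketch only covers C1 >= C2"
    return ((x & MASK64) >> (c1 - c2)) & ((mask << c2) & MASK64)


def ubfx(x, lsb, width):
    """Unsigned bitfield extract: `width` bits of x starting at bit `lsb`."""
    return ((x & MASK64) >> lsb) & ((1 << width) - 1)


def index_with_ubfx(x):
    """The example above: extract bits 4..7, then scale by 8 (lsl #3)."""
    return ubfx(x, 4, 4) << 3
```

With C1 = 4, Mask = 15 and C2 = 3, the folded form uses a shift of 1 and a mask of 0x78, matching the `lsr`/`and` pair in the three-instruction sequence, while `index_with_ubfx` corresponds to the `ubfx` plus shifted `add`.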
|
|
|
|
|
|
|
|
|
|
|
|
| |
AArch64 does not have a CPSR register in the same way that AArch32 does. Most
of its compiler-relevant roles have been taken over by the more specific NZCV
register (representing just the flags set by normal instructions).
Its system control functions still remain, but are now under the
pseudo-register referred to as "PSTATE". They're accessed via various MRS & MSR
instructions described in the reference manual.
llvm-svn: 207645
|
|
|
|
|
|
|
|
|
| |
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.
llvm-svn: 207644
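
The dual naming can be pictured as a small lookup table. A minimal sketch: the 4-bit encodings are the architectural ones for these conditions, but the table and helper names are hypothetical and not taken from the backend.

```python
# Each of these two condition encodings has two accepted mnemonics.
# The first entry in each tuple is the one chosen for printing.
ALIASES = {
    0b0010: ("hs", "cs"),  # unsigned higher-or-same / carry set
    0b0011: ("lo", "cc"),  # unsigned lower / carry clear
}


def preferred_name(encoding):
    """Name to print for a condition encoding that has two spellings."""
    return ALIASES[encoding][0]


def parse_name(name):
    """Accept either spelling on input and return the shared encoding."""
    for encoding, names in ALIASES.items():
        if name.lower() in names:
            return encoding
    raise ValueError(f"not a dual-named condition: {name}")
```

Either spelling parses to the same encoding, so only the printed form changes.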
|
|
|
|
|
|
|
| |
v2f32 and v4f32 were missed out of these conditions, so this is also
a bugfix.
llvm-svn: 207628
|
|
|
|
|
|
| |
introduced most of these recently.
llvm-svn: 207616
|
|
|
|
|
|
| |
is introduced by r207485.
llvm-svn: 207500
|
|
|
|
|
|
| |
E.g. Mask like <-1, -1, 1, ...> will generate incorrect EXT index.
llvm-svn: 207485
|
|
|
|
| |
llvm-svn: 207374
|
|
|
|
| |
llvm-svn: 207327
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.
I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.
llvm-svn: 207315
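
The subject line of this entry is not preserved here. Assuming it concerns replacing vector division by a constant with a multiply-high sequence, the idea can be sketched as follows for unsigned 32-bit lanes and a divisor of 3. The helper names are illustrative; the constant 0xAAAAAAAB is (2^33 + 1) / 3.

```python
M32 = (1 << 32) - 1  # lanes are modeled as unsigned 32-bit integers


def mulhu32(a, b):
    """High 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((a & M32) * (b & M32)) >> 32


def udiv3(x):
    """x // 3 for unsigned 32-bit x, using a multiply-high and a shift."""
    return mulhu32(x, 0xAAAAAAAB) >> 1


def vector_udiv3(lanes):
    """Apply the same multiply-high sequence to every lane of a vector."""
    return [udiv3(x) for x in lanes]
```

Because every lane uses the same multiply and shift, the operation stays in vector form when the target provides a vector multiply-high, instead of being split into one scalar division per lane.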
|
|
|
|
| |
llvm-svn: 207313
|
|
|
|
|
|
|
|
|
|
| |
no-fp test.
This patch supplements the implementation of the FP predicate by enabling the aarch64
backend no-fp tests on the arm64 target for verification. Doing so exposed one bug,
which is fixed by this patch.
llvm-svn: 207215
|
|
|
|
| |
llvm-svn: 207197
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur. It also comes with conservative IR
verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
llvm-svn: 207143
|
|
|
|
|
|
|
|
| |
AArch64 has feature predicates for NEON, FP and CRYPTO instructions.
This allows the compiler to generate code without using FP, NEON
or CRYPTO instructions.
llvm-svn: 206949
|
|
|
|
| |
llvm-svn: 206888
|
|
|
|
|
|
|
| |
definition below all of the header #include lines, lib/Target/...
edition.
llvm-svn: 206842
|
|
|
|
|
|
| |
instruction
llvm-svn: 206774
|
|
|
|
| |
llvm-svn: 206749
|
|
|
|
|
|
|
|
| |
Original commit message:
Implement builtins for safe division: safe.sdiv.iN, safe.udiv.iN,
safe.srem.iN, safe.urem.iN (iN = i8, i16, i32, or i64).
llvm-svn: 206735
|
|
|
|
|
|
| |
safe.urem.iN (iN = i8, i16, i32, or i64).
llvm-svn: 206732
|
|
|
|
| |
llvm-svn: 206591
|
|
|
|
|
|
|
| |
We couldn't cope if the first mask element was UNDEF before, which
isn't ideal.
llvm-svn: 206588
|
|
|
|
|
|
|
|
|
|
| |
Code mostly copied from AArch64, just tidied up a trifle and plumbed
into the ARM64 way of doing things.
This also enables the AArch64 tests which inspired the previous
untested commits.
llvm-svn: 206574
|
|
|
|
|
|
|
| |
Tests will be coming very shortly when all the optimisations needed to
support AArch64's neon-copy.ll file are committed.
llvm-svn: 206572
|
|
|
|
|
|
|
|
|
|
|
|
| |
ARM64 was scalarizing some vector comparisons which don't quite map to
AArch64's compare and mask instructions. AArch64's approach of sacrificing a
little efficiency to emulate them with the limited set available was better, so
I ported it across.
More "inspired by" than copy/paste since the backend's internal expectations
were a bit different, but the tests were invaluable.
llvm-svn: 206570
|
|
|
|
|
|
|
|
|
|
|
| |
I enhanced it a little in the process. The decision shouldn't really be based
on whether a BUILD_VECTOR is a splat: any set of constants will do the job
provided they're related in the correct way.
Also, the BUILD_VECTOR could be any operand of the incoming AND nodes, so it's
best to check for all 4 possibilities rather than assuming it'll be the RHS.
llvm-svn: 206569
|
|
|
|
|
|
|
|
| |
It's not actually used to handle C or C++ ABI rules on ARM64, but could well be
emitted by other language front-ends, so it's as well to have a sensible
implementation.
llvm-svn: 206568
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves the performance of vector creation in cases where
several of the lanes in the vector are a constant floating point value. It
also includes new patterns to fold together some of the instructions when the
value is 0.0f. Test cases included.
rdar://16349427
llvm-svn: 206496
|