| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 258810
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
EltsFromConsecutiveLoads
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).
It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.
After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.
Differential Revision: http://reviews.llvm.org/D16217
llvm-svn: 258798
|
| |
|
|
|
|
| |
Their opcodes are used as part of the VEX prefix in 64-bit mode. Clearly the disassembler implicitly decoded them as AVX instructions in 64-bit mode, but I think the AsmParser would have encoded them.
llvm-svn: 258793
|
| |
|
|
| |
llvm-svn: 258790
|
| |
|
|
|
|
|
|
|
|
| |
Make comments and indentation more consistent.
Rearrange a few things to be in a more consistent order,
such as separating subtarget features that describe
an actual device property from those used as options.
llvm-svn: 258789
|
| |
|
|
|
|
|
|
| |
Old intrinsics were forcing these, but they have now all
been removed. This fixes large i8 vector operations generally
being broken.
llvm-svn: 258788
|
| |
|
|
|
|
|
|
|
|
|
| |
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.
I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern, but I don't think it really matters.
llvm-svn: 258787
|
| |
|
|
|
|
|
| |
More cleanup to try to get all intrinsics using the correct
amdgcn prefix that are as close to the instruction as possible.
llvm-svn: 258786
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some of the special intrinsics that now correspond to an instruction
also have special setting of some registers, e.g. llvm.SI.sendmsg sets
m0 as well as using s_sendmsg. Using these explicit register intrinsics
may be a better option.
Reading the exec mask and others may be useful for debugging. For this,
I'm not sure this is entirely correct because we would want this to
be convergent, although it's possible this is already treated
sufficiently conservatively.
llvm-svn: 258785
|
| |
|
|
|
|
| |
Also move into backend intrinsics to discourage use of the old name.
llvm-svn: 258783
|
| |
|
|
|
|
|
|
| |
These calls return their first argument, but because LLVM uses an intrinsic
with a void return type, they can't use the returned attribute. Generalize
the store results pass to optimize these calls too.
llvm-svn: 258781
|
| |
|
|
| |
llvm-svn: 258780
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16534
llvm-svn: 258779
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Step one towards using a simple binary search to lookup intrinsic IDs
instead of our crazy table generated switch+memcmp+startswith code that
makes Function.cpp take about a minute to compile. See PR24785 and
PR11951 for why we should do this.
The X86 backend contains tables that need to be sorted on intrinsic ID,
so reorder those.
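As a rough illustration of the lookup this change works towards, here is a minimal sketch of a binary search over a sorted name table. The table contents, the function name lookupIntrinsicID and the "index + 1" ID scheme are assumptions made for the example; they are not the generated tables or the code in Function.cpp, and overloaded intrinsic name suffixes are ignored.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

// Hypothetical name table. It must stay sorted by strcmp order, because the
// position in the table doubles as the ID (index + 1; 0 means "not found").
static const char *const IntrinsicNames[] = {
    "llvm.bswap", "llvm.ctlz", "llvm.ctpop", "llvm.memcpy", "llvm.sqrt",
};

// Returns the 1-based ID for Name, or 0 if Name is not in the table.
unsigned lookupIntrinsicID(const char *Name) {
  auto Begin = std::begin(IntrinsicNames);
  auto End = std::end(IntrinsicNames);
  // lower_bound performs the binary search: O(log n) string compares.
  auto It = std::lower_bound(Begin, End, Name,
                             [](const char *LHS, const char *RHS) {
                               return std::strcmp(LHS, RHS) < 0;
                             });
  if (It == End || std::strcmp(*It, Name) != 0)
    return 0;
  return static_cast<unsigned>(It - Begin) + 1;
}
```

Any other table indexed by ID has to follow the same order, which is presumably why the X86 tables needed reordering.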
llvm-svn: 258757
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However this special lowering is
currently broken if the inner cmov has multiple users so this patch
stops using it in this case.
If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a 2nd PHI for the inner cmov: I believe this
would incur additional copies/register pressure so the special lowering
does not improve upon the normal one anymore in this case.
This fixes http://llvm.org/PR26256 (= rdar://24329747)
llvm-svn: 258729
|
| |
|
|
|
|
|
|
| |
Its main use is to allow memory folding of the 1st operand.
Differential Revision: http://reviews.llvm.org/D16521
llvm-svn: 258726
|
| |
|
|
|
|
|
| |
Instructions can be DCE'd after the RegStackify pass. If the instruction which
would be the pop for what would be a push is removed, don't use a push.
llvm-svn: 258694
|
| |
|
|
| |
llvm-svn: 258692
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16520
llvm-svn: 258688
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16519
llvm-svn: 258686
|
| |
|
|
|
|
|
|
| |
This patch was originally committed as r257885, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258683
|
| |
|
|
|
|
|
|
| |
This patch was originally committed as r257884, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258682
|
| |
|
|
|
|
|
|
| |
This patch was originally committed as r257883, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258681
|
| |
|
|
|
|
|
|
|
|
|
| |
52bit integer
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators
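The per-lane arithmetic implied by the two instruction names can be sketched in scalar code. This is only a model of one 64-bit lane under the assumption that each source is truncated to its low 52 bits before multiplying; the helper names are hypothetical, masking and broadcast are ignored, and unsigned __int128 is a GCC/Clang extension used here to hold the 104-bit product.

```cpp
#include <cstdint>

// Low 52 bits of a 64-bit lane.
static const uint64_t Mask52 = (uint64_t(1) << 52) - 1;

// Full 104-bit product of the low 52 bits of B and C.
static unsigned __int128 product52(uint64_t B, uint64_t C) {
  return (unsigned __int128)(B & Mask52) * (C & Mask52);
}

// Model of one VPMADD52LUQ lane: add the low 52 bits of the product.
uint64_t madd52lo(uint64_t Acc, uint64_t B, uint64_t C) {
  return Acc + (uint64_t)(product52(B, C) & Mask52);
}

// Model of one VPMADD52HUQ lane: add the high 52 bits of the product.
uint64_t madd52hi(uint64_t Acc, uint64_t B, uint64_t C) {
  return Acc + (uint64_t)(product52(B, C) >> 52);
}
```

Splitting the product this way lets a pair of the instructions accumulate a full 52x52-bit multiply across two qword accumulators.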
Differential Revision: http://reviews.llvm.org/D16407
llvm-svn: 258680
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was originally committed as r255762, but reverted as it broke windows
bots. Re-committing the exact same patch, as the underlying cause was fixed by
r258677.
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.
These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.
Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.
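The packing and extraction described above can be modelled on plain 32-bit integers. This sketch assumes VMOVX moves the top half of the source into the bottom half of the destination and clears the rest, and that VINS writes the bottom half of the source into the top half of the destination while keeping the destination's bottom half; the C++ function names are illustrative only.

```cpp
#include <cstdint>

// An S register is modelled as a uint32_t holding two 16-bit halves.

// VMOVX model: extract the top half into the bottom half, zeroing the top.
uint32_t vmovx(uint32_t Sm) { return Sm >> 16; }

// VINS model: insert the bottom half of Sm into the top half of Sd,
// preserving the bottom half of Sd.
uint32_t vins(uint32_t Sd, uint32_t Sm) {
  return (Sd & 0xFFFFu) | (Sm << 16);
}
```

With these two operations, vins packs a second half-precision value above an existing one, and vmovx recovers it later.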
New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.
Differential Revision: http://reviews.llvm.org/D15038
llvm-svn: 258678
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=22796.
The previous implementation of ClassInfo::operator< allowed cycles of classes
such that x < y < z < x, meaning that a list of them cannot be correctly
sorted, and the sort order could differ with different standard libraries.
The original implementation sorted classes by ValueName if they were otherwise
equal. This isn't strictly necessary, but some backends seem to accidentally
rely on it. If I reverse this comparison I get 8 test failures spread across
the AArch64, Mips and X86 backends, so I have left it in until those backends
can be fixed.
There was one case in the X86 backend where the observable behaviour of the
assembler is changed by this patch. This was because some of the memory asm
operands were not marked as children of X86MemAsmOperand.
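To make the x < y < z < x problem concrete, here is a small sketch contrasting a comparison that admits a cycle with one that is a valid strict weak ordering. ClassRec, its fields and both comparators are hypothetical stand-ins, not the real ClassInfo::operator<.

```cpp
#include <string>
#include <tuple>

// Hypothetical stand-in for a class record; not the TableGen ClassInfo.
struct ClassRec {
  int Kind;
  std::string ValueName;
};

// Broken: "A < B" whenever B's Kind is the successor of A's modulo 3,
// so Kind 0 < Kind 1 < Kind 2 < Kind 0. Sorting with this is undefined.
bool cyclicLess(const ClassRec &A, const ClassRec &B) {
  return (A.Kind + 1) % 3 == B.Kind % 3;
}

// Valid: lexicographic comparison on (Kind, ValueName) is transitive,
// with ValueName acting only as a tie-breaker.
bool orderedLess(const ClassRec &A, const ClassRec &B) {
  return std::tie(A.Kind, A.ValueName) < std::tie(B.Kind, B.ValueName);
}

// True if the comparator claims X < Y < Z < X.
template <typename Less>
bool formsCycle(Less L, const ClassRec &X, const ClassRec &Y,
                const ClassRec &Z) {
  return L(X, Y) && L(Y, Z) && L(Z, X);
}
```

A check like formsCycle over small sets of elements is one cheap way to catch such a comparator in a test, since std::sort gives no diagnostic when the ordering is invalid.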
Differential Revision: http://reviews.llvm.org/D16141
llvm-svn: 258677
|
| |
|
|
| |
llvm-svn: 258676
|
| |
|
|
|
|
|
|
| |
Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).
Differential Revision: http://reviews.llvm.org/D16528
llvm-svn: 258675
|
| |
|
|
|
|
|
|
|
|
|
| |
X86AsmParser.cpp is missing full comparison predicate names for the CMPPD and CMPPS instructions.
X86AsmParser.cpp defines only the short names of the comparison predicates, which are listed in the following PDF:
https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
Page 5-61 table 5-3
Differential Revision: http://reviews.llvm.org/D16518
llvm-svn: 258671
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Changes in X86.td:
I set features of Intel processors in incremental form: IVB = SNB + X, HSW = IVB + X, ...
I added the Skylake client processor and defined its features.
FeatureADX was missing on KNL.
Added some new features to the appropriate processors: SMAP, IFMA, PREFETCHWT1, VMFUNC and others.
Differential Revision: http://reviews.llvm.org/D16357
llvm-svn: 258659
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D16137
llvm-svn: 258657
|
| |
|
|
|
|
| |
Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types.
llvm-svn: 258645
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously, we would just output "foo = bar" in the assembly, and then
ptxas would choke. Now we die before emitting any invalid code.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16490
llvm-svn: 258638
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Before:
.func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv(
)
{
After:
.func (.param .b32 func_retval0) _ZL21__nvvm_reflect_anchorv()
{
Reviewers: bkramer
Subscribers: llvm-commits, tra, jhen, echristo, jholewinski
Differential Revision: http://reviews.llvm.org/D16512
llvm-svn: 258637
|
| |
|
|
| |
llvm-svn: 258626
|
| |
|
|
| |
llvm-svn: 258624
|
| |
|
|
|
|
| |
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.
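The condition can be sketched from the INSERTPS immediate alone, assuming the usual encoding: bits [3:0] are the zero mask, bits [5:4] select the destination lane, and bits [7:6] select the source element. The struct and function names below are hypothetical and this is not the DAG combine itself, only the "is this input still referenced?" test.

```cpp
#include <cstdint>

// Which of the two INSERTPS inputs still contribute to the result.
struct InsertPSUses {
  bool FirstInput;  // vector supplying the three pass-through lanes
  bool SecondInput; // vector supplying the inserted element
};

InsertPSUses insertPSUses(uint8_t Imm) {
  unsigned ZMask = Imm & 0xFu;        // lanes forced to zero
  unsigned DstLane = (Imm >> 4) & 3u; // lane receiving the inserted element
  unsigned DstBit = 1u << DstLane;
  unsigned OtherLanes = 0xFu & ~DstBit;

  InsertPSUses Uses;
  // The inserted element is dead if its destination lane is zeroed.
  Uses.SecondInput = (ZMask & DstBit) == 0;
  // The first input is dead if every pass-through lane is zeroed.
  Uses.FirstInput = (ZMask & OtherLanes) != OtherLanes;
  return Uses;
}
```

Whenever one of the flags is false, the corresponding operand can be replaced with UNDEF without changing the result.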
llvm-svn: 258622
|
| |
|
|
|
|
|
|
|
| |
Seems like some compilers still give unused variable warnings for
bool var = ...;
(void)var;
so I have to inline the variable.
llvm-svn: 258619
|
| |
|
|
| |
llvm-svn: 258618
|
| |
|
|
| |
llvm-svn: 258615
|
| |
|
|
|
|
| |
Replace tests with lrp with basic IR expansion
llvm-svn: 258612
|
| |
|
|
| |
llvm-svn: 258608
|
| |
|
|
|
|
| |
This has side effects.
llvm-svn: 258607
|
| |
|
|
|
|
|
| |
This is a leftover from AMDIL that doesn't do anything
and doesn't belong here.
llvm-svn: 258606
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.
Also rename some variables to better reflect their usage.
llvm-svn: 258605
|
| |
|
|
|
|
|
|
|
| |
isConjunctionDisjunctionTree()
This function will exhibit exponential runtime (2**n) so we should
rather use a lower limit.
llvm-svn: 258604
|
| |
|
|
| |
llvm-svn: 258603
|
| |
|
|
|
|
|
|
|
| |
Previously it failed to add NumArgRegs to the offset and so clobbered an
already-used register. Now just start the numbering after the arg regs
and don't duplicate the add. Test coverage for this coming shortly with
the implementation of byval.
llvm-svn: 258597
|
| |
|
|
| |
llvm-svn: 258567
|
| |
|
|
| |
llvm-svn: 258558
|