| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.
Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60618
llvm-svn: 359433
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The Procedure Call Standard for the Arm Architecture
states that float16x4_t and float16x8_t behave just
as uint16x4_t and uint16x8_t for argument passing.
This patch adds the fp16 vectors to the
ARMCallingConv.td file.
Reviewers: miyuki, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60720
llvm-svn: 359431
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(v)cvt(u)qq2ps. Add 'x' and'y' suffix aliases to masked version of the same in att syntax.
The 128/256 bit version of these instructions require an 'x' or 'y' suffix to
disambiguate the memory form in att syntax.
We were allowing the same suffix in intel syntax, but it appears gas does not
do that.
gas does allow the 'x' and 'y' suffix on register and broadcast forms even
though its not needed. We were allowing it on unmasked register form, but not on
masked versions or on masked or unmasked broadcast form.
While there fix some test coverage holes so they can be extended with the 'x'
and 'y' suffix tests.
llvm-svn: 359418
|
| |
|
|
|
|
| |
out-of-range extraction indices.
llvm-svn: 359406
|
| |
|
|
|
|
| |
Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results.
llvm-svn: 359400
|
| |
|
|
|
|
|
|
|
|
| |
reduction (PR38840)
An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.
Differential Revision: https://reviews.llvm.org/D61230
llvm-svn: 359396
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MOVD/MOVQ and COPY_TO_REGCLASS instead
Summary:
The register form of these instructions are CodeGenOnly instructions that cover
GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for
the opposite bitcast. Due to the patterns using bitcasts these instructions get
marked as "bitcast" machine instructions as well. The peephole pass is able to
look through these as well as other copies to try to avoid register bank copies.
Because FR32/FR64/VR128 are all coalescable to each other we can end up in a
situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to
GR32->GR64 which the copyPhysReg code can't handle.
To prevent this, this patch removes one set of the 'bitcast' instructions. So
now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that
converts from GR32/GR64->VR128 has no special significance to the peephole pass
and won't be looked through.
I guess the other option would be to add support to copyPhysReg to just promote
the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined
anyway. But removing the CodeGenOnly instruction in favor of one that won't be
optimized seemed safer.
I deleted the peephole test because it couldn't be made to work with the bitcast
instructions removed.
The load version of the instructions were unnecessary as the pattern that selects
them contains a bitcasted load which should never happen.
Fixes PR41619.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61223
llvm-svn: 359392
|
| |
|
|
|
|
|
|
| |
Minor generalization of the existing <32 x i1> pre-AVX2 split code.
........
Causing irregular buildbot failures.
llvm-svn: 359391
|
| |
|
|
|
|
| |
Minor generalization of the existing <32 x i1> pre-AVX2 split code.
llvm-svn: 359389
|
| |
|
|
|
|
| |
As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.
llvm-svn: 359386
|
| |
|
|
|
|
|
|
| |
Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector.
We can extend this in the future to handle more opcodes and non-zero selections.
llvm-svn: 359378
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store.
Reviewers: spatel, RKSimon, reames, jfb, efriedma
Reviewed By: RKSimon
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60546
llvm-svn: 359368
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.
We discovered some internal test failures, so reverting for now.
Differential Revision: https://reviews.llvm.org/D61213
llvm-svn: 359363
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61208
llvm-svn: 359358
|
| |
|
|
|
|
|
|
|
|
|
|
| |
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.
Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.
llvm-svn: 359351
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.
Refactors a few subclasses to support the target independent %a, %c, and
%n.
The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.
It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.
Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449
Reviewers: echristo, void
Reviewed By: void
Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60887
llvm-svn: 359337
|
| |
|
|
|
|
| |
one use
llvm-svn: 359332
|
| |
|
|
|
|
|
|
|
| |
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.
Update select-bswap.mir and arm64-rev.ll to reflect the changes.
llvm-svn: 359331
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61202
llvm-svn: 359328
|
| |
|
|
|
|
| |
getAddressOperands. NFCI
llvm-svn: 359318
|
| |
|
|
|
|
| |
Probably doesn't really matter, but was inconsistent with the rest of the code.
llvm-svn: 359317
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61156
llvm-svn: 359316
|
| |
|
|
|
|
|
|
|
|
|
| |
The PPC vector cost model values for insert/extract element reflect older
processors that lacked vector insert/extract and move-to/move-from VSR
instructions. Update getVectorInstrCost to give appropriate values for when
the newer instructions are present.
Differential Revision: https://reviews.llvm.org/D60160
llvm-svn: 359313
|
| |
|
|
| |
llvm-svn: 359299
|
| |
|
|
|
|
|
|
|
|
| |
code from LowerVectorAllZeroTest
Create a matchBitOpReduction helper that checks for the pattern with any opcode.
First step towards reusing this code to recognize other scalar reduction patterns.
llvm-svn: 359296
|
| |
|
|
|
|
|
|
|
|
| |
targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).
Differential Revision: https://reviews.llvm.org/D61068
llvm-svn: 359293
|
| |
|
|
|
|
|
|
| |
A small step towards combining shuffles across vector sizes - this recognizes when a shuffle's operands are all extracted from the same larger source and tries to combine to an unary shuffle of that source instead. Fixes one of the test cases from PR34380.
Differential Revision: https://reviews.llvm.org/D60512
llvm-svn: 359292
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code was using the alignment of a pointer to the value, not the
alignment of the constant itself.
Maybe we got away with it so far because the pointer alignment is
fairly high, but we did end up under-aligning <16 x i8> vectors,
which was caught in the Chromium build after lld stopped over-aligning
the .rodata.cst16 section in r356428. (See crbug.com/953815)
Differential revision: https://reviews.llvm.org/D61124
llvm-svn: 359287
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.
Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)
Differential Revision: https://reviews.llvm.org/D60279
llvm-svn: 359248
|
| |
|
|
|
|
|
|
|
|
|
|
| |
All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so all irregularities have to be implemented in one place only.
The test generation script wmma.py has been refactored in a similar way.
Differential Revision: https://reviews.llvm.org/D60015
llvm-svn: 359247
|
| |
|
|
|
|
|
|
|
|
|
| |
PTX 6.3 requires using ".aligned" in the MMA instruction names.
In order to generate correct name, now we pass current
PTX version to each instruction as an extra constant operand
and InstPrinter adjusts its output accordingly.
Differential Revision: https://reviews.llvm.org/D59393
llvm-svn: 359246
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generalized constructions of 'fragments' of MMA operations to provide
common primitives for construction of the ops. This will make it easier
to add new variants of the instructions that operate on integer types.
Use nested foreach loops which makes it possible to better control
naming of the intrinsics.
This patch does not affect LLVM's output, so there are no test changes.
Differential Revision: https://reviews.llvm.org/D59389
llvm-svn: 359245
|
| |
|
|
|
|
|
| |
Revert DecoderNamespace in one place for now. It will need more
changes to properly work.
llvm-svn: 359239
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I added a diagnostic along the lines of `-Wpessimizing-move` to detect `return x = y` suppressing copy elision, but I don't know if the diagnostic is really worth it. Anyway, here are the places where my diagnostic reported that copy elision would have been possible if not for the assignment.
P1155R1 in the post-San-Diego WG21 (C++ committee) mailing discusses whether WG21 should fix this pitfall by just changing the core language to permit copy elision in cases like these.
(Kona update: The bulk of P1155 is proceeding to CWG review, but specifically *not* the parts that explored the notion of permitting copy-elision in these specific cases.)
Reviewed By: dblaikie
Author: Arthur O'Dwyer
Differential Revision: https://reviews.llvm.org/D54885
llvm-svn: 359236
|
| |
|
|
|
|
|
|
| |
This case was missing before, so we couldn't legalize it.
Add it to AArch64LegalizerInfo.cpp and update select-extract-vector-elt.mir.
llvm-svn: 359231
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61099
llvm-svn: 359225
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61094
llvm-svn: 359224
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a legalization rule for G_ZEXT, G_ANYEXT, and G_SEXT which allows
extends whenever the types will fit in registers (or the source is an s1).
Update tests. Add GISel checks throughout all of arm64-vabs.ll,
where we now select a good portion of the code. Add GISel checks to
arm64-subvector-extend.ll, which has a good number of vector extends in it.
Differential Revision: https://reviews.llvm.org/D60889
llvm-svn: 359222
|
| |
|
|
|
|
|
|
|
| |
Add legalizer support for G_FNEARBYINT. It's the same as G_FCEIL etc.
Since the importer allows us to automatically select this after legalization,
also add tests for selection etc. Also update arm64-vfloatintrinsics.ll.
llvm-svn: 359204
|
| |
|
|
|
|
|
|
|
|
| |
integers
Truncate the movmsk scalar integer result to the equivalent scalar integer width as before but then bitcast to the requested type.
We still have the issue identified in PR41594 but D61114 should handle this.
llvm-svn: 359176
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
On Mips32r2 bitcast can be expanded to two sw instructions and an ldc1
when using bitcast i64 to double or an sdc1 and two lw instructions when
using bitcast double to i64. By introducing custom lowering that uses
mtc1/mthc1 we can avoid excessive instructions.
Patch by Mirko Brkusanin.
Differential Revision: https://reviews.llvm.org/D61069
llvm-svn: 359171
|
| |
|
|
|
|
|
|
| |
The IndexReg will always be non-null at this point. Earlier in the function, if
IndexReg was null we set it to CurDAG->getRegister(0, VT) which made it
non-null.
llvm-svn: 359170
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Test commit.
Reviewers: msearles, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, arsenm, jvesely, nhaehnle, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61093
llvm-svn: 359154
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fixes for buildbot error (undefined assembler label).
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.
Reviewers: rnk
Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D61083
llvm-svn: 359149
|
| |
|
|
|
|
|
|
|
|
| |
Using initial-exec TLS variables is a reasonable performance
optimisation for system libraries. Use the correct PIC mechanism to get
hold of the GOT to avoid text relocations.
Differential Revision: https://reviews.llvm.org/D61026
llvm-svn: 359146
|
| |
|
|
| |
llvm-svn: 359144
|
| |
|
|
| |
llvm-svn: 359143
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61080
llvm-svn: 359139
|
| |
|
|
|
|
|
|
|
|
| |
that should use movzx.
This can save a 32-bit immediate move.
We would shrink the load and fold it if it was non-volatile, but that's trickier to check for.
llvm-svn: 359129
|
| |
|
|
|
|
|
|
|
|
|
|
| |
matching
ReplaceAllUsesWith doesn't remove the node that was replaced. So its left around in the graph messing up use counts on other nodes.
One thing to note, is that this isn't valid if the node being deleted is the root node of an LEA match that gets rejected. In that case the node needs to stay alive because the isel table walking code would still have a reference to it that its going to try to match next. I don't think that's the case here though because the nodes being deleted here should be "and", "srl", and "zero_extend" none of which can be the root node of an LEA match.
Differential Revision: https://reviews.llvm.org/D61048
llvm-svn: 359121
|