| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
| |
This is needed to make the wasm waterfall green again
after we land the update to WASI:
https://github.com/WebAssembly/waterfall/pull/492
Differential Revision: https://reviews.llvm.org/D61351
llvm-svn: 359634
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61330
llvm-svn: 359621
|
| |
|
|
|
|
|
|
| |
This adds any extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality
Minor improvement for PR39709
llvm-svn: 359608
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add support for f16 libcalls in WebAssembly. This entails adding signatures
for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to
truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt
already supports this).
Differential Revision: https://reviews.llvm.org/D61287
Reviewer: dschuff
llvm-svn: 359600
|
| |
|
|
|
|
|
|
| |
It's been like this since it was added in a refactor of this code.
Fixes PR41659
llvm-svn: 359597
|
| |
|
|
|
|
|
|
|
|
|
|
| |
remove dead nodes from the graph
The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue.
This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.
Differential Revision: https://reviews.llvm.org/D61164
llvm-svn: 359582
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from other LEA optimizations.
This removes some of the class variables. Merge basic block processing into
runOnMachineFunction to keep the flags local.
Pass MachineBasicBlock around instead of an iterator. We can get the iterator in
the few places that need it. Allows a range-based outer for loop.
Separate the Atom optimization from the rest of the optimizations. This allows
fixupIncDec to create INC/DEC and still allow Atom to turn it back into LEA
when profitable by its heuristics.
I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index
register is equal to the destination register. This is profitable regardless of
the various slow flags. But again we would want Atom to be able to undo that.
Differential Revision: https://reviews.llvm.org/D60993
llvm-svn: 359581
|
| |
|
|
|
|
|
|
|
| |
This implements TargetTransformInfo method getMemcpyCost, which estimates the
number of instructions to which a memcpy instruction expands to.
Differential Revision: https://reviews.llvm.org/D59787
llvm-svn: 359547
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SCALAR_TO_VECTOR(Elt) for all SSE flavors
Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd.
INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32.
So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4.
https://bugs.llvm.org/show_bug.cgi?id=41512
Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike.
Committed on behalf of @Serge_Preis (Serge Preis)
Differential Revision: https://reviews.llvm.org/D60852
llvm-svn: 359545
|
| |
|
|
|
|
|
| |
The legalizer was already widening the shift amount. Add tests for that
behaviour, and also support widening the shifted value.
llvm-svn: 359542
|
| |
|
|
|
|
|
| |
Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers.
A unique_ptr makes the ownership clearer.
llvm-svn: 359541
|
| |
|
|
|
|
|
|
|
| |
Bail out on function arguments/returns with types aggregating an
unsupported type. This fixes cases where we would happily and
incorrectly lower functions taking e.g. [1 x i64] parameters, when we
don't even support plain i64 yet.
llvm-svn: 359540
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.
This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.
Differential Revision: https://reviews.llvm.org/D59785
llvm-svn: 359537
|
| |
|
|
|
|
| |
This is a follow-up to https://reviews.llvm.org/D59521.
llvm-svn: 359509
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The WebAssembly backend needs to know the signatures of all runtime
libcall functions. This adds the signature for __stack_chk_fail which was
previously missing.
Also, make the error message for a missing libcall include the name of
the function.
Differential Revision: https://reviews.llvm.org/D59521
Reviewed By: sbc100
llvm-svn: 359505
|
| |
|
|
|
|
|
|
|
|
| |
Change the PPCISelLowering.cpp function that decides to avoid update form in
favor of partial vector loads to know about newer load types and to not be
confused by the chain operand.
Differential Revision: https://reviews.llvm.org/D60102
llvm-svn: 359504
|
| |
|
|
|
|
|
|
|
|
| |
This was falling back and gives us a reason to create a selectIntrinsic function
which we would need eventually anyway. Update arm64-crypto.ll to show that we
correctly select it.
Also factor out the code for finding an intrinsic ID.
llvm-svn: 359501
|
| |
|
|
|
|
|
|
|
|
|
| |
This is necessary since SVN r330706, as tail merging can include
CFI instructions since then.
This fixes PR40322 and PR40012.
Differential Revision: https://reviews.llvm.org/D61252
llvm-svn: 359496
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add target shuffle decoding to isHorizontalBinOp as well as ISD::VECTOR_SHUFFLE support.
This does mean we can go through bitcasts so we need to bitcast the extracted args to ensure they are the correct type
Fixes PR39936 and should help with PR39920/PR39921
Differential Revision: https://reviews.llvm.org/D61245
llvm-svn: 359491
|
| |
|
|
|
|
|
|
|
|
| |
Use size_t assignment to prevent a bad explicit type conversion warning.
Given the typical size of shuffle masks this was never going to happen, but this at least stops the warning.
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359479
|
| |
|
|
|
|
| |
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359473
|
| |
|
|
|
|
| |
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359472
|
| |
|
|
|
|
| |
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359469
|
| |
|
|
|
|
| |
Reported in https://www.viva64.com/en/b/0629/
llvm-svn: 359467
|
| |
|
|
|
|
|
|
|
| |
This patch adds aliases for element sizes .B/.H/.S to the
AND/ORR/EOR/BIC bitwise logical instructions. The assembler now accepts
these instructions with all element sizes up to 64-bit (.D). The
preferred disassembly is .D.
llvm-svn: 359457
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds some basic operations for fp16
vectors, such as bitcast from fp16 to i16,
required to perform extract_subvector (also added
here) and extract_element.
Reviewers: SjoerdMeijer, DavidSpickett, t.p.northover, ostannard
Reviewed By: ostannard
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60618
llvm-svn: 359433
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The Procedure Call Standard for the Arm Architecture
states that float16x4_t and float16x8_t behave just
as uint16x4_t and uint16x8_t for argument passing.
This patch adds the fp16 vectors to the
ARMCallingConv.td file.
Reviewers: miyuki, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60720
llvm-svn: 359431
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(v)cvt(u)qq2ps. Add 'x' and'y' suffix aliases to masked version of the same in att syntax.
The 128/256 bit version of these instructions require an 'x' or 'y' suffix to
disambiguate the memory form in att syntax.
We were allowing the same suffix in intel syntax, but it appears gas does not
do that.
gas does allow the 'x' and 'y' suffix on register and broadcast forms even
though its not needed. We were allowing it on unmasked register form, but not on
masked versions or on masked or unmasked broadcast form.
While there fix some test coverage holes so they can be extended with the 'x'
and 'y' suffix tests.
llvm-svn: 359418
|
| |
|
|
|
|
| |
out-of-range extraction indices.
llvm-svn: 359406
|
| |
|
|
|
|
| |
Some of the combines might be further improved if we lower more shuffles with X86ISD::VPERMV3 directly, instead of waiting to combine the results.
llvm-svn: 359400
|
| |
|
|
|
|
|
|
|
|
| |
reduction (PR38840)
An xor reduction of a bool vector can be optimized to a parity check of the MOVMSK/BITCAST'd integer - if the population count is odd return 1, else return 0.
Differential Revision: https://reviews.llvm.org/D61230
llvm-svn: 359396
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MOVD/MOVQ and COPY_TO_REGCLASS instead
Summary:
The register form of these instructions are CodeGenOnly instructions that cover
GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for
the opposite bitcast. Due to the patterns using bitcasts these instructions get
marked as "bitcast" machine instructions as well. The peephole pass is able to
look through these as well as other copies to try to avoid register bank copies.
Because FR32/FR64/VR128 are all coalescable to each other we can end up in a
situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to
GR32->GR64 which the copyPhysReg code can't handle.
To prevent this, this patch removes one set of the 'bitcast' instructions. So
now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that
converts from GR32/GR64->VR128 has no special significance to the peephole pass
and won't be looked through.
I guess the other option would be to add support to copyPhysReg to just promote
the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined
anyway. But removing the CodeGenOnly instruction in favor of one that won't be
optimized seemed safer.
I deleted the peephole test because it couldn't be made to work with the bitcast
instructions removed.
The load version of the instructions were unnecessary as the pattern that selects
them contains a bitcasted load which should never happen.
Fixes PR41619.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61223
llvm-svn: 359392
|
| |
|
|
|
|
|
|
| |
Minor generalization of the existing <32 x i1> pre-AVX2 split code.
........
Causing irregular buildbot failures.
llvm-svn: 359391
|
| |
|
|
|
|
| |
Minor generalization of the existing <32 x i1> pre-AVX2 split code.
llvm-svn: 359389
|
| |
|
|
|
|
| |
As predicate masks are legal on AVX512 targets, we avoid MOVMSK in these cases, but we can just bitcast the bool vector to the integer equivalent directly - avoiding expansion of the reduction to a shuffle pattern.
llvm-svn: 359386
|
| |
|
|
|
|
|
|
| |
Fixes PR40332 in the limited case where we're selecting between a target shuffle and a zero vector.
We can extend this in the future to handle more opcodes and non-zero selections.
llvm-svn: 359378
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: If we have SSE2 we can use a MOVQ to store 64-bits and avoid falling back to a cmpxchg8b loop. If its a seq_cst store we need to insert an mfence after the store.
Reviewers: spatel, RKSimon, reames, jfb, efriedma
Reviewed By: RKSimon
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60546
llvm-svn: 359368
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.
We discovered some internal test failures, so reverting for now.
Differential Revision: https://reviews.llvm.org/D61213
llvm-svn: 359363
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61208
llvm-svn: 359358
|
| |
|
|
|
|
|
|
|
|
|
|
| |
getConstantVRegValWithLookThrough does the same thing as the
getConstantValueForReg function, and has more visibility across GISel. Plus, it
supports looking through G_TRUNC, G_SEXT, and G_ZEXT. So, we get better code
reuse and more functionality for free by using it.
Add some test cases to select-extract-vector-elt.mir to show that we can now
look through those instructions.
llvm-svn: 359351
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when
printing the address of a MachineOperand::MO_GlobalAddress. Move that
handling into a new overriden method in each base class. A virtual
method was added to the base class for handling the generic case.
Refactors a few subclasses to support the target independent %a, %c, and
%n.
The patch also contains small cleanups for AVRAsmPrinter and
SystemZAsmPrinter.
It seems that NVPTXTargetLowering is possibly missing some logic to
transform GlobalAddressSDNodes for
TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended
inline assembly asm constraints.
Fixes:
- https://bugs.llvm.org/show_bug.cgi?id=41402
- https://github.com/ClangBuiltLinux/linux/issues/449
Reviewers: echristo, void
Reviewed By: void
Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60887
llvm-svn: 359337
|
| |
|
|
|
|
| |
one use
llvm-svn: 359332
|
| |
|
|
|
|
|
|
|
| |
There are instructions for these, so mark them as legal. Select the correct
instruction in AArch64InstructionSelector.cpp.
Update select-bswap.mir and arm64-rev.ll to reflect the changes.
llvm-svn: 359331
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61202
llvm-svn: 359328
|
| |
|
|
|
|
| |
getAddressOperands. NFCI
llvm-svn: 359318
|
| |
|
|
|
|
| |
Probably doesn't really matter, but was inconsistent with the rest of the code.
llvm-svn: 359317
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61156
llvm-svn: 359316
|
| |
|
|
|
|
|
|
|
|
|
| |
The PPC vector cost model values for insert/extract element reflect older
processors that lacked vector insert/extract and move-to/move-from VSR
instructions. Update getVectorInstrCost to give appropriate values for when
the newer instructions are present.
Differential Revision: https://reviews.llvm.org/D60160
llvm-svn: 359313
|
| |
|
|
| |
llvm-svn: 359299
|
| |
|
|
|
|
|
|
|
|
| |
code from LowerVectorAllZeroTest
Create a matchBitOpReduction helper that checks for the pattern with any opcode.
First step towards reusing this code to recognize other scalar reduction patterns.
llvm-svn: 359296
|