TargetLoweringBase::isBinOp checks isCommutativeBinOp as a fallback, so don't duplicate.
llvm-svn: 364072
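
For reference, a minimal sketch of the relationship relied on here (simplified; the opcode list is illustrative, not the exact LLVM source):

    // Simplified sketch: isBinOp already falls back to isCommutativeBinOp,
    // so callers don't need to check both predicates themselves.
    bool TargetLoweringBase::isBinOp(unsigned Opcode) const {
      switch (Opcode) {
      // Non-commutative binary ops (illustrative subset).
      case ISD::SUB:
      case ISD::SDIV:
      case ISD::UDIV:
      case ISD::SHL:
        return true;
      default:
        return isCommutativeBinOp(Opcode); // ADD, MUL, AND, OR, XOR, ...
      }
    }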

The sat add/sub tests still have unnecessary extract_subvector((vandnps ymm, ymm), 0) uses that should be split to (vandnps (extract_subvector(ymm, 0), extract_subvector(ymm, 0))), but it's getting better.
llvm-svn: 364038

llvm-svn: 364030

llvm-svn: 364026

Use the isConstOrConstSplat helper instead of inspecting the build vector manually.
llvm-svn: 364024

llvm-svn: 364022

llvm-svn: 364006

Summary:
BLSI sets the C flag if the input is not zero. So if it's followed
by a TEST of the input where only the Z flag is consumed, we can
replace it with the opposite check of the C flag.
We should be able to do the same for BLSMSK and BLSR, but the
naive test case for those is being optimized to a subo by
CodeGenPrepare.
Reviewers: spatel, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63589
llvm-svn: 363957
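
As a hypothetical illustration (not from the commit's tests): BLSI computes x & -x and sets CF exactly when its input is nonzero, so a TEST of the input that only feeds a zero-flag branch can be replaced by a branch on BLSI's C flag:

    // Hypothetical input where the combine applies; the before/after
    // assembly in the comments is the assumed effect, not commit output.
    unsigned lowestSetBitOrAll(unsigned X) {
      unsigned Lsb = X & -X; // selected as BLSI32rr; sets CF iff X != 0
      // Before: blsi eax, edi ; test edi, edi ; je <zero.case>
      // After:  blsi eax, edi ; jae <zero.case>  (CF clear => X was zero)
      return X == 0 ? ~0u : Lsb;
    }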

llvm-svn: 363922

The caller of this is looking for comparisons of the input
to these instructions with 0. But the memory instruction's
input is an address, not a value in a register.
llvm-svn: 363907

seemed obviously missing.
llvm-svn: 363906

This is an exception to the rule that we should prefer xmm ops to ymm ops.
As shown in PR42305:
https://bugs.llvm.org/show_bug.cgi?id=42305
...the store folding opportunity with vextractf128 may result in better
perf by reducing the instruction count.
Differential Revision: https://reviews.llvm.org/D63517
llvm-svn: 363853

So I don't forget - there's a load of yak shaving to do first.
llvm-svn: 363847

We already do this for ZERO_EXTEND/ZERO_EXTEND_VECTOR_INREG - this just extends the pattern matcher to recognize cases where we don't need the zeros in the extension.
llvm-svn: 363841

bool flag. NFCI.
Prep work to support ANY_EXTEND/ANY_EXTEND_VECTOR_INREG without needing another flag.
llvm-svn: 363818

Given a *_EXTEND or *_EXTEND_VECTOR_INREG opcode, convert it to *_EXTEND_VECTOR_INREG.
llvm-svn: 363812
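
A minimal sketch of the mapping described above (the helper name is hypothetical; the real code lives in the X86 backend):

    // Map a *_EXTEND or *_EXTEND_VECTOR_INREG opcode to the corresponding
    // *_EXTEND_VECTOR_INREG opcode.
    static unsigned getExtendInVecOpcode(unsigned Opcode) {
      switch (Opcode) {
      case ISD::ANY_EXTEND:
      case ISD::ANY_EXTEND_VECTOR_INREG:
        return ISD::ANY_EXTEND_VECTOR_INREG;
      case ISD::SIGN_EXTEND:
      case ISD::SIGN_EXTEND_VECTOR_INREG:
        return ISD::SIGN_EXTEND_VECTOR_INREG;
      case ISD::ZERO_EXTEND:
      case ISD::ZERO_EXTEND_VECTOR_INREG:
        return ISD::ZERO_EXTEND_VECTOR_INREG;
      default:
        llvm_unreachable("Unexpected extension opcode");
      }
    }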

extract_subvector(*_EXTEND_VECTOR_INREG) handling. NFCI.
llvm-svn: 363808

Summary:
llvm.x86.sse.stmxcsr only writes to memory.
llvm.x86.sse.ldmxcsr only reads from memory, and might generate an FPE.
Reviewers: craig.topper, RKSimon
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62896
llvm-svn: 363773

This allows targets to make more decisions about reserved registers
after isel. For example, it is now certain whether there are calls or
stack objects in the frame, which could have been introduced by
legalization.
Patch by Matthias Braun
llvm-svn: 363757

FP_ROUND defaults to Legal for all MVT types and nothing changes
the v4f32 entry away from this default. If we needed this line
we'd also need one for v8f32 with AVX512, which we don't have.
llvm-svn: 363719
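
In other words, the removed entry was equivalent to the default; a hypothetical reconstruction of the deleted line:

    // Redundant: Legal is already the default action for FP_ROUND.
    setOperationAction(ISD::FP_ROUND, MVT::v4f32, Legal);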

Part of fixing the X86 regression noted in D63281 - I've split this into X86 and generic parts - the generic commit will be coming shortly and will fix the vector-reduce-mul-widen.ll regression introduced here.
llvm-svn: 363693

First step toward addressing the vector-reduce-mul-widen.ll regression in D63281 - we should replace ANY_EXTEND/ANY_EXTEND_VECTOR_INREG in X86ISelDAGToDAG to avoid having to add duplicate patterns when treating any extensions as legal.
In future patches this will also allow us to keep any extension nodes around a lot longer in the DAG, which should mean that we can keep better track of undef elements that otherwise become zeros that we think we have to keep...
Differential Revision: https://reviews.llvm.org/D63326
llvm-svn: 363655

function. NFCI
Preliminary step for D59909
llvm-svn: 363645

instructions.
The isel patterns for these use a bitcast and load/store, but
DAG combine should have canonicalized those away.
For the purposes of the memory folding table these opcodes can be
replaced by the MOVSSrm_alt/MOVSDrm_alt and MOVSSmr/MOVSDmr opcodes.
llvm-svn: 363644

Rename the old versions that use FR32/FR64 to MOVSSrm_alt/MOVSDrm_alt.
Use the new versions in patterns that previously used a COPY_TO_REGCLASS
to VR128. These patterns expect the upper bits to be zero. The
current set up appears to work, but I'm not sure we should be
enforcing upper bits being zero through a COPY_TO_REGCLASS.
I wanted to flip the arrangement and use a COPY_TO_REGCLASS to
FR32/FR64 for the patterns that need an f32/f64 result, but that
complicated fastisel and globalisel.
I've been doing some experiments with reducing some isel patterns
and ended up in a situation where I had a
(SUBREG_TO_REG (COPY_TO_REGCLASS (VMOVSSrm), VR128)) and our
post-isel peephole was unable to avoid using an instruction for
the SUBREG_TO_REG due to the COPY_TO_REGCLASS. Having a VR128
instruction removes the COPY_TO_REGCLASS that was breaking this.
llvm-svn: 363643

VMOVSSZmrk/VMOVSDZmrk.
Removes COPY_TO_REGCLASS from some patterns.
llvm-svn: 363630

types are allowed here. NFC
Make it clear that only integer types with i32 or smaller elements should get to this part of the code.
llvm-svn: 363629

register form requires 64-bit mode, but the memory form does not.
We don't know if it's safe to unfold if we're in 32-bit mode.
This is similar to what was done to some load opcodes in r363523.
I think it's pretty unlikely we will try to unfold these anyway, so
I don't think this is testable.
llvm-svn: 363595

If an XMM non-temporal store has less than natural alignment, scalarize the vector - with SSE4A we can stay on the vector and use MOVNTSD(f64), else we must move to GPRs and use MOVNTI(i32/i64).
llvm-svn: 363592
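
A hypothetical example of the kind of input being handled, using Clang's non-temporal store builtin (the under-alignment is the interesting part):

    // A 16-byte non-temporal store through a pointer with only 4-byte
    // alignment: MOVNTPS needs natural alignment, so the store is
    // scalarized - MOVNTSD (f64) with SSE4A, else MOVNTI (i32/i64) via GPRs.
    typedef float v4f32 __attribute__((vector_size(16), aligned(4)));
    void ntStoreUnderAligned(v4f32 *P, v4f32 V) {
      __builtin_nontemporal_store(V, P);
    }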

If a YMM/ZMM non-temporal store has less than natural alignment, split the vector - either the halves will be satisfactorily aligned or they will continue to be split until they are XMMs - at which point the legalizer will scalarize them.
llvm-svn: 363582

When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the alignment of the original scalar
memory op is not supported with the nontemporal hint on the target.
This adds two new functions:
bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;
to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.
This fixes https://llvm.org/PR40759
Differential Revision: https://reviews.llvm.org/D61764
llvm-svn: 363581
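
A minimal sketch of the shape of the X86 override, assuming the signature quoted above (the real checks in X86TargetTransformInfo are more detailed):

    // Sketch only: vector NT stores (MOVNTPS and friends) need natural
    // alignment, scalar NT stores use MOVNTI; both require SSE2.
    bool X86TTIImpl::isLegalNTStore(Type *DataType, unsigned Alignment) {
      uint64_t DataSize = DL.getTypeStoreSize(DataType);
      if (DataType->isVectorTy())
        return ST->hasSSE2() && Alignment >= DataSize;
      return ST->hasSSE2() && (DataSize == 4 || DataSize == 8);
    }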

This is currently only used for ymm->xmm splitting but we shouldn't hardcode the offsets/alignment.
This is necessary for an upcoming patch to split under-aligned non-temporal vector loads.
llvm-svn: 363570

For loads, pre-SSE41 we can't perform NT loads at all, and after that we can only perform vector aligned loads, so if the alignment is less than that of an xmm we'll just end up using the regular unaligned vector loads anyway.
First step towards fixing PR42026 - the next step for stores will be to use SSE4A movntsd where possible and to avoid the stack spill on SSE2 targets.
Differential Revision: https://reviews.llvm.org/D63246
llvm-svn: 363564
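
For comparison, a hypothetical non-temporal load using the matching Clang builtin:

    // With SSE4.1+ and natural alignment this can select MOVNTDQA;
    // pre-SSE41 (or under-aligned) it simply becomes a regular load.
    typedef float v4f32 __attribute__((vector_size(16)));
    v4f32 ntLoad(const v4f32 *P) {
      return __builtin_nontemporal_load(P);
    }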

form uses the REX prefix, but the memory form does not.
It would not be safe to unfold the memory form to the register form
without checking that we are compiling for 64-bit mode.
This probably isn't a real functional issue since we are unlikely
to unfold any of these instructions since they don't have any
tied registers, aren't commutable, and don't have any inputs
other than the address.
llvm-svn: 363523

This is similar logic/motivation to the select splitting in D62969.
In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.
The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.
In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).
Differential Revision: https://reviews.llvm.org/D63364
llvm-svn: 363508

sources
Insert the shorter vector source into an undef vector of the longer vector source's type.
llvm-svn: 363507

rootsize. NFCI.
llvm-svn: 363501

shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles
Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well.
Fixes PR34380
llvm-svn: 363500

This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0)
llvm-svn: 363499

of ISelLowering to mirror non-strict nodes on x86.
I recently discovered a bug on the x86 platform: The fp80 type was not handled well by x86 for constrained floating point nodes, as their regular counterparts are replaced by extending loads and truncating stores during the preprocess phase. Normally, platforms don't have this issue, as they don't typically attempt to perform such legalizations during instruction selection preprocessing. Before this change, strict_fp nodes survived until they were mutated to normal nodes, which happened shortly after preprocessing on other platforms. This modification lowers these nodes at the same phase while properly utilizing the chain.
Submitted by: Drew Wock <drew.wock@sas.com>
Reviewed by: Craig Topper, Kevin P. Neal
Approved by: Craig Topper
Differential Revision: https://reviews.llvm.org/D63271
llvm-svn: 363417

to segment registers closer to the segment register check for when
we add further optimizations.
llvm-svn: 363355

ATTR_VEXL/ATTR_EVEXL bits. NFCI
Merging the two bits shrinks the context table from 16384 bytes to 8192 bytes.
Remove the ATTRIBUTE_BITS macro and just create an enum directly. Then fix the ATTR_max define to be 8192 to reflect the table size so we stop hardcoding it separately.
llvm-svn: 363330

Previously it copied over MachineMemOperands verbatim, which caused MOV32rm to have store flags set, and MOV32mr to have load flags set. This fixes some assertions being thrown with EXPENSIVE_CHECKS on.
Committed on behalf of @luke (Luke Lau)
Differential Revision: https://reviews.llvm.org/D62726
llvm-svn: 363268

helper wrapper. NFCI.
Pre-commit for D62726 on behalf of @luke (Luke Lau)
llvm-svn: 363257

Based on fuzz test from @craig.topper
llvm-svn: 363251

Summary:
- Remove redundant initializations from pass constructors for passes that
  were already being initialized by LLVMInitializeX86Target().
- Add an initialization function for the FPS pass.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63218
llvm-svn: 363221

(PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179

vpermilps(concat(x,y),c)
Handles PSHUFD/PSHUFLW/PSHUFHW (AVX2) + VPERMILPS (AVX1).
An extra AVX1 PSHUFD->VPERMILPS combine will be added in a future commit.
llvm-svn: 363178

The non-masked versions are already in there. I'm having some
trouble coming up with a way to test this right now. Most load
folding should happen during isel so I'm not sure how to get
the peephole pass to do it.
llvm-svn: 363125

As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
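
A sketch of what such a wrapper might look like, assuming it forwards the MachineMemOperand's properties to the existing overload (parameter lists are approximate, not the exact LLVM signature):

    // Hypothetical forwarding wrapper: pull the address space, alignment
    // and flags out of the MMO instead of making every caller do it.
    bool TargetLoweringBase::allowsMemoryAccess(LLVMContext &Context,
                                                const DataLayout &DL, EVT VT,
                                                const MachineMemOperand &MMO,
                                                bool *Fast) const {
      return allowsMemoryAccess(Context, DL, VT, MMO.getAddrSpace(),
                                MMO.getAlignment(), MMO.getFlags(), Fast);
    }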