Keep the uint64_t type returned by getConstantOperandVal to stop MSVC truncation/extension overflow warnings in the subvector index math.
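As an illustration only (not the actual patch), the shape of the change is roughly the following; the helper name, operand position, and scaling math are assumptions:

```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Keep the index in the uint64_t that getConstantOperandVal() already
// returns instead of narrowing it to unsigned, so the index arithmetic
// stays in 64 bits and MSVC has nothing to warn about.
static uint64_t scaledSubvectorIndex(const SDNode *N, uint64_t Scale) {
  uint64_t Idx = N->getConstantOperandVal(2); // operand position assumed
  return Idx * Scale;
}
```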
llvm-svn: 365328
This can help with code size on SSE targets where SHUFPD requires
a 0x66 prefix and SHUFPS doesn't.
llvm-svn: 365293
targets.
This can help avoid a copy or enable load folding.
On SSE4.1 targets we can commute it to blendi instead.
I had to make shufpd with a 0x02 immediate commutable as well
since we expect commuting to be reversible.
llvm-svn: 365292
to turn UNPCKLPDrr->MOVHPDrm when the load is under-aligned.
If the load is aligned we can turn UNPCKLPDrr into UNPCKLPDrm.
llvm-svn: 365287
patterns.
llvm-svn: 365275
NFCI.
Fixes cppcheck warning.
llvm-svn: 365271
llvm-svn: 365270
These patterns are the same as the MOVLPDmr and MOVHPDmr patterns,
but with a bitcast at the end. We can just select the PD instruction
and let execution domain fixing switch to PS.
llvm-svn: 365267
UNPCKL+load.
These narrow the load so we can only do it if the load isn't
volatile.
There are also tests in vector-shuffle-128-v4.ll that this should
support, but we don't seem to fold bitcast+load on pre-sse4.2
targets due to the slow unaligned mem 16 flag.
llvm-svn: 365266
The Size either needs to be 0, meaning we aren't folding a stack
reload, or the stack slot needs to be at least 16 bytes. I've also
added a paranoia check to ensure the RCSize is at least 16 bytes as
well. This avoids any FR32/FR64 surprises, but I think we already
filtered those earlier.
All of our test cases have Size as either 0 or 16 and RCSize == 16,
so the Size <= 16 check worked for those cases.
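For illustration, a small sketch of the size checks described above; the helper and parameter names are made up, not taken from the patch:

```cpp
// Returns true if folding a stack reload of `Size` bytes into an
// instruction whose register class is `RCSize` bytes is acceptable.
static bool sizesAllowStackFold(unsigned Size, unsigned RCSize) {
  // Paranoia check: the register class must be at least 16 bytes,
  // ruling out FR32/FR64 surprises.
  if (RCSize < 16)
    return false;
  // Size == 0 means we aren't folding a stack reload at all; otherwise
  // the stack slot must also be at least 16 bytes.
  return Size == 0 || Size >= 16;
}
```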
llvm-svn: 365234
non-volatile before folding.
These patterns use 128-bit loads, but the instructions only load
64-bits. We shouldn't narrow the load if it's volatile.
Fixes another variant of PR42079
llvm-svn: 365225
This was identical to a pattern for MOVPQI2QImr with a bitcast
as an input. But we should be able to turn MOVPQI2QImr into
MOVLPSmr in the execution domain fixup pass so we shouldn't
need this.
llvm-svn: 365224
Revision r365061 changed a skip of debug instructions for a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.
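For illustration, the distinction the revert relies on, expressed with the MachineInstr predicates; the helper name is invented:

```cpp
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// IMPLICIT_DEF is classed as a meta instruction but still defines a
// register, so a scan that only wants to ignore debug instructions must
// use the narrower predicate.
static bool safeToSkip(const MachineInstr &MI) {
  return MI.isDebugInstr();          // DBG_VALUE and friends only
  // MI.isMetaInstruction() would also skip IMPLICIT_DEF (and KILL),
  // which is what broke here.
}
```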
llvm-svn: 365202
should not have been added.
llvm-svn: 365199
Revision r365061 changed a skip of debug instructions for a skip
of meta instructions. This is not safe, as IMPLICIT_DEF is classed
as a meta instruction.
llvm-svn: 365198
Fixes OSS-Fuzz #15662
llvm-svn: 365180
forms under optsize when the immediate has additional users.
Summary:
We attempt to prevent folding immediates with multiple users under optsize. But we only do this from store nodes and X86ISD::ADD/SUB/XOR/OR/AND patterns. We don't do it for ISD::ADD/SUB/XOR/OR/AND even though we count them as users when deciding whether to fold into other nodes. This leads to situations where we block folding to a compare for example, but still fold into an AND or OR as seen in PR27202.
Unfortunately touching the isel patterns in tablegen for the ISD::ADD/SUB/XOR/OR/AND opcodes will cause the patterns to be unusable for fast isel. And we don't have a way to make a fast isel only pattern.
To work around this, this patch adds custom isel in front of the isel table that will select the non-immediate forms if the immediate has additional users. This may create some issues for ANDN and NOT matching. And there's room for improvement with unsigned 32-bit immediates on 64-bit AND.
This patch needs more thorough test cases, but I wanted to get feedback on the direction. Please send me any other test cases you've seen in the wild.
I think we probably have the same issue with the immediate matching when we fold RMW from X86ISD::ADD/SUB/XOR/OR/AND, and with our TEST immediate shrinking logic. Our cost modeling for immediates that can fit in a sign-extended 8-bit immediate on a 16/32/64-bit operation is completely wrong.
I also wonder if we should update the ConstantHoisting cost model and block folding for "opaque" constants. But of course constants can still be created by DAG combine and lowering optimizations.
Fixes PR27202
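A hedged sketch of the decision step only (the custom isel that consumes it is not shown); the helper name and the OptForSize plumbing are assumptions:

```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Under optsize, folding a constant into several users repeats the
// immediate bytes in each instruction, so only fold when this node is
// the constant's sole user; otherwise keep it in a register.
static bool shouldFoldImmediate(SDValue Imm, bool OptForSize) {
  if (!isa<ConstantSDNode>(Imm))
    return false;
  return !OptForSize || Imm->hasOneUse();
}
```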
Reviewers: spatel, RKSimon, andreadb
Reviewed By: RKSimon
Subscribers: jsji, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59909
llvm-svn: 365163
vpermilps(concat_vectors(x,y),c)
Bitcast v4i32 to v8f32 and back again - it might be worth adding isel patterns for X86PShufd v8i32 on AVX1 targets like we did for X86Blendi to avoid the bitcasts?
llvm-svn: 365125
INSERT_VECTOR_ELT in a couple places.
Most places already did this.
llvm-svn: 365109
This patch generalizes the fix in D61680 to ignore all meta instructions,
not just debug info.
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D62605
llvm-svn: 365061
We previously marked all the tests with branch funnels as
`-verify-machineinstrs=0`.
This is an attempt to fix it.
1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList =
(outs)`.

2) After that we hit an assert:

```
Assertion failed: (Op.getValueType() != MVT::Other &&
Op.getValueType() != MVT::Glue && "Chain and glue operands should
occur at end of operand list!"), function AddOperand, file
/Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp,
line 461.
```

The chain operand was added at the beginning of the operand list. Move
that to the end.

3) After that we hit another verifier issue in the pseudo expansion
where the registers used in the cmps and jmps are not added to the
livein lists. Add the `EFLAGS` to all the new MBBs that we create.
PR39436
Differential Revision: https://reviews.llvm.org/D54155
llvm-svn: 365058
llvm-svn: 365057
If we have more than 2 shuffle ops to combine, try to use combineX86ShuffleChainWithExtract to see if some are from the same super vector.
llvm-svn: 365050
iff the number of elements doesn't change.
This gets around an issue with combineX86ShuffleChain not being able to hint which domain is preferred for shuffles that can be done with either.
Fixes regression introduced in rL365041
llvm-svn: 365044
extract_subvectors to the combine depth
This better accounts for the cost/benefit of removing extract_subvectors from the shuffle and will be more useful in future patches.
The vpermq predicate regression will be fixed shortly.
llvm-svn: 365041
Fixes MSVC analyzer extension->double warning.
llvm-svn: 365027
Assert that the insertion index is in range and use uint64_t for the index to fix MSVC/cppcheck truncation warning.
llvm-svn: 365025
Assert that the shift amount is in range and create vXi8 shift masks in a way that doesn't cause MSVC/cppcheck shift result is truncated then extended warnings.
llvm-svn: 365024
Neither MSVC nor cppcheck likes the fact that the variables are initialized via references.
llvm-svn: 365018
This avoids the use of getZExtValue and uses the modulo shift amount, which is what's expected for funnel shifts anyhow.
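For example, a sketch of reducing the shift amount in APInt instead of extracting a raw 64-bit value; the helper name is hypothetical:

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Funnel shifts are defined modulo the element bit width, so reduce the
// constant first and keep working in APInt rather than calling
// getZExtValue() on a value that may not fit in 64 bits.
static APInt moduloShiftAmount(const ConstantSDNode *Amt, unsigned BitWidth) {
  return Amt->getAPIntValue().urem(BitWidth);
}
```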
llvm-svn: 365016
appropriate extload if the load isn't volatile.
Remove the corresponding isel patterns that did the same thing without checking for volatile.
This fixes another variation of PR42079
llvm-svn: 364977
(PR42486)
Don't use APInt::getZExtValue() if you can avoid it - eventually someone will call it with i128 or something that doesn't fit into 64-bits.
In this case it was completely superfluous as we'd moved the rest of the code to always use APInt.
Fixes the <1 x i128> addition bug in PR42486
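An illustrative contrast, not the actual code in the patch:

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Working directly on the APInt handles any width, including i128.
static APInt negate(const APInt &C) {
  return -C;
  // By contrast, something like
  //   APInt(C.getBitWidth(), -C.getZExtValue())
  // asserts as soon as C no longer fits in 64 bits.
}
```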
llvm-svn: 364953
instead of COPY_TO_REGCLASS + (V)MOVSSrm_alt.
Similar for (V)MOVSD. Ultimately, I'd like to see about folding
scalar_to_vector+load to vzload, which would select as (V)MOVSSrm,
so this is closer to that.
llvm-svn: 364948
lambda. NFCI.
Pulled out the CombineShuffleWithExtract lambda into a new combineX86ShuffleChainWithExtract wrapper and refactored it to handle more than 2 shuffle inputs - this will allow combineX86ShufflesRecursively to call this in a future patch.
llvm-svn: 364924
We were relying on combineX86ShufflesRecursively to handle this - this patch gets it done earlier which should make it easier for other code to use resolveTargetShuffleInputsAndMask.
llvm-svn: 364906
X86ISD::CVTTP2SI/CVTTP2UI and to reduce the number of isel patterns.
llvm-svn: 364887
only 32-bits are loaded.
v2i64 vzload defines a 64-bit memory access. It doesn't look like
we have any coverage for this either way.
Also remove some vzload usages where the instruction loads only
16-bits.
llvm-svn: 364851
These patterns all matched a v2i64 vzload which only loads 64-bits
to instructions that load a full 128-bits.
llvm-svn: 364847
These instructions only read 64-bits of memory so we shouldn't
allow a full vector width load to be pattern matched in case it
is marked volatile.
Instead allow vzload or scalar_to_vector+load.
Also add a DAG combine to turn full vector loads into vzload when
used by one of these instructions if the load isn't volatile.
This fixes another case for PR42079
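A rough sketch of the volatility gate on that combine; the caller and the actual rewrite to the vzload node are omitted, and the helper name is made up:

```cpp
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// Only consider shrinking a full-width vector load to a 64-bit vzload
// when it is a plain, non-volatile load; narrowing a volatile access
// would change the memory traffic the program asked for.
static bool loadCanBeNarrowed(SDValue Op) {
  auto *Ld = dyn_cast<LoadSDNode>(Op);
  return Ld && ISD::isNormalLoad(Ld) && !Ld->isVolatile();
}
```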
llvm-svn: 364838
The function findPotentialBlockers may consider debug info instructions as
potential blockers and may stop searching for a store-load pair prematurely.
This patch corrects this and tests the cases where the store is separated
from the load by more than InspectionLimit debug instructions.
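A sketch of the corrected counting, with placeholder names for the limit and the blocker test:

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Scan backwards for a potential blocking store, but don't let debug
// instructions consume the inspection budget.
static bool hasBlockerWithin(MachineBasicBlock::reverse_iterator I,
                             MachineBasicBlock::reverse_iterator End,
                             unsigned InspectionLimit) {
  unsigned Inspected = 0;
  for (; I != End && Inspected < InspectionLimit; ++I) {
    if (I->isDebugInstr())
      continue;                  // doesn't count toward the limit
    ++Inspected;
    if (I->mayStore())           // stand-in for the real blocker check
      return true;
  }
  return false;
}
```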
Patch by Chris Dawson.
Differential Revision: https://reviews.llvm.org/D62408
llvm-svn: 364829
We can already widenSubVector to a specific type (of the same scalar type) - this variant just specifies the target vector size.
This will be useful when CombineShuffleWithExtract relaxes the need to have the same scalar type for all shuffle operand subvector sources.
llvm-svn: 364803
CombineShuffleWithExtract no longer requires that both shuffle ops are extract_subvectors, from the same type or from the same size.
llvm-svn: 364745
We had a bunch of vector size legality checks for the source type
based on feature flags, but we didn't check the destination type at
all beyond ensuring that it was a "simple" type. But this allowed
the destination to be i128 which isn't legal.
This commit changes the code to use TLI's isTypeLegal logic in
place of all the subtarget checks. It then additionally checks
that the source and dest are vectors.
Fixes 42452
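A rough sketch of the legality gate this describes; the helper name and the surrounding combine are assumptions:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

// Defer to the target's notion of legal types instead of hand-rolled
// subtarget checks, and require vector types on both ends so an i128
// destination can no longer slip through.
static bool conversionTypesAreLegal(SelectionDAG &DAG, EVT SrcVT, EVT DstVT) {
  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
  return SrcVT.isVector() && DstVT.isVector() &&
         TLI.isTypeLegal(SrcVT) && TLI.isTypeLegal(DstVT);
}
```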
llvm-svn: 364729
CVTSI2FP/CVTUI2FP node with a vzload.
But only when the load isn't volatile.
This improves load folding during isel where we only have vzload
and scalar_to_vector+load patterns. We can't have full vector load
isel patterns for the same volatile load issue.
Also add some missing masked cvtsi2fp/cvtui2fp with vzload patterns.
llvm-svn: 364728
We already had patterns that used scalar_to_vector+load. But we can
also have a vzload.
Found while investigating combining scalar_to_vector+load to vzload.
llvm-svn: 364726
llvm-svn: 364720
selecting a maskmov+vblend during isel.
AVX masked loads only support 0 as the value for masked off elements.
So we need an extra blend to support other values. Previously we
expanded the masked load to two instructions with isel patterns.
With this patch we now insert the vselect during lowering and it
will be separately selected as a blend.
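A hedged sketch of the blend step only; building the zero-passthru masked load itself is elided and the helper name is invented:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// The AVX masked load is emitted with zero for the masked-off lanes;
// the caller's pass-through value is merged back in with a VSELECT,
// which isel later matches as a blend.
static SDValue blendInPassThru(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                               SDValue Mask, SDValue Load, SDValue PassThru) {
  return DAG.getNode(ISD::VSELECT, DL, VT, Mask, Load, PassThru);
}
```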
llvm-svn: 364718
The cmov node used to sometimes return a glue result (and that's what
'flag' meant in this context), but that was removed with D38664.
llvm-svn: 364687
llvm-svn: 364667
We were requiring that both shuffle operands were EXTRACT_SUBVECTORs, but we can relax this to only require one of them to be.
Also, we shouldn't bother attempting this if both operands are from the lowest subvector (or not EXTRACT_SUBVECTOR at all).
llvm-svn: 364644