This was broken if the original operand was killed: the kill flag would appear on both instructions and fail the verifier. Keep the kill flag, but remove the operands from the old instruction. This has the added benefit of actually reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer does, avoiding the need for this hack to keep instruction iterators from being invalidated.
llvm-svn: 359891
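A minimal sketch of the fix described above; the names (OldMI, NewMI, the operand indices) are illustrative, not the actual pass code:

    // Move the kill state onto the folded instruction and drop the register
    // operand from the old one, so the flag is not duplicated.
    MachineOperand &Old = OldMI->getOperand(OldIdx);
    if (Old.isKill()) {
      NewMI->getOperand(NewIdx).setIsKill(true); // keep the kill flag here
      OldMI->RemoveOperand(OldIdx); // also shrinks the use count for later folds
    }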
llvm-svn: 359889

Prefer ((X & Y) ? A : B) to (X & Y ? A : B).
llvm-svn: 359884

When a fold of an immediate into a sub/subrev required shrinking the instruction, the wrong VOP2 opcode was used: the VOP2 equivalent of the original instruction, not of the commuted instruction with the inverted opcode.
llvm-svn: 359883

Leftover from before we had the extract128BitVector helpers.
llvm-svn: 359871

llvm-svn: 359869

r358358.
This covers the patterns we use for widening 128/256-bit comparisons to 512-bit when AVX512VL isn't supported.
llvm-svn: 359863

Looks like just a minor oversight in the parsing code.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.
Differential Revision: https://reviews.llvm.org/D60840
llvm-svn: 359855

As far as I know, we never emit LEA16r.
llvm-svn: 359840

getRegForInlineAsmConstraint when the VT is a scalar type
The default implementation of TargetLowering::getRegForInlineAsmConstraint in the base class doesn't work for mask registers when the VT is a scalar integer type, since the only legal mask types are vXi1. So we end up getting whatever register class happens to contain the register first. Currently this appears to be VK1, but it's really dependent on the order in which tablegen outputs the register classes.
Some code in the caller ends up looking up the type for this register class, finds v1i1, and then generates a CopyFromReg from the physical k-register with the v1i1 type. It then generates an any_extend from v1i1 to the scalar VT, which isn't legal. This bad any_extend sticks around until isel, where it selects a MOVZX32rr8 with a v1i1 input (or maybe an i8 input; either way, we eventually pick up a copy from VK1 to GR8 in MachineIR, which isn't supported). This leads to a failure in physical register copying.
This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. A bitcast from VK16 to i16 is generated instead of an any_extend, and it is properly selected to a VK16-to-GR32 copy and a GR32->GR16 extract_subreg.
Fixes PR41678
Differential Revision: https://reviews.llvm.org/D61453
llvm-svn: 359837
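A reproducer in the spirit of the fix (hedged: the asm body and the function are illustrative, not the test case attached to the patch):

    // Inline asm with an AVX512 mask-register ("k") constraint on a scalar
    // integer type. Previously the i16 operand was matched to VK1 and an
    // illegal any_extend from v1i1 was generated; with this patch a VK class
    // of the right size (VK16) is chosen and a bitcast to i16 is used.
    unsigned short knot16(unsigned short x) {
      unsigned short r;
      __asm__("knotw %1, %0" : "=k"(r) : "k"(x));
      return r;
    }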
|
| |
|
|
|
|
| |
Fix the forwarding of multiplication results for Exynos M4.
llvm-svn: 359834
|
| |
|
|
|
|
|
|
| |
This if used to be an assert that got refactored into an if, but left the string literal behind.
Fixes PR41718
llvm-svn: 359833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
|
| |
|
|
|
|
|
|
|
|
|
| |
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.
Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K
Differential Revision: https://reviews.llvm.org/D61426
llvm-svn: 359786
|
| |
|
|
|
|
| |
Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921
llvm-svn: 359782
|
| |
|
|
|
|
| |
Get rid of local variable used only in assertion.
llvm-svn: 359772
|
| |
|
|
|
|
|
|
|
| |
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.
llvm-svn: 359768
|
| |
|
|
|
|
| |
Make it legal to extend from e.g. s1 to s8 or s16.
llvm-svn: 359766
|
| |
|
|
|
|
|
|
|
|
|
|
| |
combineBVOfConsecutiveLoads
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076
llvm-svn: 359764
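A minimal sketch of the early return being described (assumed shape, not the verbatim PPC source):

    // Combining consecutive loads computes offsets in bytes, which only
    // makes sense when the build vector's element type is byte-sized.
    EVT ElemVT = N->getValueType(0).getVectorElementType();
    if (!ElemVT.isByteSized())
      return SDValue();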
|
| |
|
|
|
|
|
|
| |
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.
Differential Revision:
llvm-svn: 359757
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61413
llvm-svn: 359756
|
| |
|
|
|
|
|
|
| |
Constant bus limit has increased to 2 with GFX10.
Differential Revision: https://reviews.llvm.org/D61404
llvm-svn: 359754
|
| |
|
|
|
|
|
|
|
|
| |
The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.
Patch by Pengfei Wang
Differential Revision: https://reviews.llvm.org/D61295
llvm-svn: 359753
|
| |
|
|
|
|
|
|
|
|
| |
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.
Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.
llvm-svn: 359734
|
| |
|
|
|
|
|
|
| |
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).
Differential Revision: https://reviews.llvm.org/D61263
llvm-svn: 359707
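A user-level illustration of the extended pattern (hedged: the function and the lane choice are made up for this example):

    #include <immintrin.h>

    // Adding two adjacent elements extracted from lanes other than 0 and 1
    // (here lanes 2 and 3 of the low 128 bits) may now lower to a horizontal
    // add plus an extract instead of two separate extracts and a scalar add.
    float add_lanes_2_3(__m128 v) {
      float a = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)));
      float b = _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3)));
      return a + b;
    }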
|
| |
|
|
|
|
|
|
| |
Reassign registers to reduce register bank conflicts.
Differential Revision: https://reviews.llvm.org/D61344
llvm-svn: 359704
|
| |
|
|
|
|
|
|
| |
Convert NSA into non-NSA images.
Differential Revision: https://reviews.llvm.org/D61341
llvm-svn: 359700
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61339
llvm-svn: 359698
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61332
llvm-svn: 359696
|
| |
|
|
| |
llvm-svn: 359694
|
| |
|
|
|
|
| |
Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode
llvm-svn: 359686
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359680
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 359678
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359677
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359673
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359670
|
| |
|
|
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode
This lets us hit more cases than combineExtractSubvector and allows us reuse more code.
llvm-svn: 359669
|
| |
|
|
|
|
|
|
| |
code. NFCI.
Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes.
llvm-svn: 359667
|
| |
|
|
|
|
|
|
| |
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.
Differential Revision: https://reviews.llvm.org/D61189
llvm-svn: 359666
|
| |
|
|
|
|
|
|
| |
put it in the basic block instruction loop. NFC
Now need to check it 3 different times. Just do it once at the top of the loop.
llvm-svn: 359658
|
| |
|
|
|
|
|
|
| |
element"
This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.
llvm-svn: 359648
|
| |
|
|
|
|
|
|
|
|
| |
This is needed to make the wasm waterfall green again
after we land the update to WASI:
https://github.com/WebAssembly/waterfall/pull/492
Differential Revision: https://reviews.llvm.org/D61351
llvm-svn: 359634
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61330
llvm-svn: 359621
|
| |
|
|
|
|
|
|
| |
This adds any extend support - folding to zero_extend_vector_inreg (PMOVZX) for legality
Minor improvement for PR39709
llvm-svn: 359608
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add support for f16 libcalls in WebAssembly. This entails adding signatures
for the remaining F16 libcalls, and renaming gnu_f2h_ieee/gnu_h2f_ieee to
truncsfhf2/extendhfsf2 for consistency between f32 and f64/f128 (compiler-rt
already supports this).
Differential Revision: https://reviews.llvm.org/D61287
Reviewer: dschuff
llvm-svn: 359600
|
| |
|
|
|
|
|
|
| |
It's been like this since it was added in a refactor of this code.
Fixes PR41659
llvm-svn: 359597
|
| |
|
|
|
|
|
|
|
|
|
|
| |
remove dead nodes from the graph
The reordering can leave at least a dead TokenFactor in the graph. This cause the linearize scheduler to fail with something like the assert seen in PR22614. This is only one of many ways we can break the linearize scheduler today so I can't say for sure that any of the other failures in that bug were caused by this issue.
This takes the heavy hammer approach of just running RemoveDeadNodes unconditionally at the end of the PreprocessISelDAG. If this turns out to be a compile time hit, we can try to refine it.
Differential Revision: https://reviews.llvm.org/D61164
llvm-svn: 359582
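The change as described amounts to something like the following sketch (the surrounding X86 preprocessing code is elided):

    void X86DAGToDAGISel::PreprocessISelDAG() {
      // ... existing reordering transforms that can orphan a TokenFactor ...

      // Unconditionally prune dead nodes so the linearize scheduler never
      // sees them; revisit if this shows up as a compile-time hit.
      CurDAG->RemoveDeadNodes();
    }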
from other LEA optimizations.
This removes some of the class variables. Basic block processing is merged into runOnMachineFunction to keep the flags local.
Pass MachineBasicBlock around instead of an iterator; we can get the iterator in the few places that need it. This allows a range-based outer for loop.
Separate the Atom optimization from the rest of the optimizations. This allows fixupIncDec to create INC/DEC and still allows Atom to turn it back into an LEA when its heuristics consider that profitable.
I'd like to improve fixupIncDec to turn LEAs into ADD any time the base or index register is equal to the destination register. This is profitable regardless of the various slow flags, but again we would want Atom to be able to undo it.
Differential Revision: https://reviews.llvm.org/D60993
llvm-svn: 359581

This implements the TargetTransformInfo method getMemcpyCost, which estimates the number of instructions to which a memcpy instruction expands.
Differential Revision: https://reviews.llvm.org/D59787
llvm-svn: 359547
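A sketch of what such a hook can look like (the target class and the cost numbers are assumptions, not the patch's actual heuristics):

    // Estimate how many instructions a memcpy expands to; fall back to an
    // assumed libcall cost when it won't be inlined.
    int ARMTTIImpl::getMemcpyCost(const Instruction *I) {
      const auto *MC = dyn_cast<MemCpyInst>(I);
      if (!MC)
        return 1;
      if (const auto *Len = dyn_cast<ConstantInt>(MC->getLength()))
        if (Len->getZExtValue() <= 16)
          return 1 + Len->getZExtValue() / 4; // roughly one op per word copied
      return 16; // illustrative libcall cost
    }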
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SCALAR_TO_VECTOR(Elt) for all SSE flavors
Current LLVM uses pxor+pinsrb on SSE4+ for INSERT_VECTOR_ELT(ZeroVec, 0, Elt) insead of much simpler movd.
INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is idiomatic construct which is used e.g. for _mm_cvtsi32_si128(Elt) and for lowest element initialization in _mm_set_epi32.
So such inefficient lowering leads to significant performance digradations in ceratin cases switching from SSSE3 to SSE4.
https://bugs.llvm.org/show_bug.cgi?id=41512
Here INSERT_VECTOR_ELT(ZeroVec, 0, Elt) is simply converted to SCALAR_TO_VECTOR(Elt) when applicable since latter is closer match to desired behavior and always efficiently lowered to movd and alike.
Committed on behalf of @Serge_Preis (Serge Preis)
Differential Revision: https://reviews.llvm.org/D60852
llvm-svn: 359545
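For reference, the idiomatic construct named above (standard SSE2 intrinsics; the wrapper function is just for illustration):

    #include <immintrin.h>

    // _mm_cvtsi32_si128 builds a vector whose low element is x and whose
    // upper elements are zero. With this change it lowers to a single movd
    // on SSE4+ instead of a pxor plus an element insert.
    __m128i widen_to_vector(int x) {
      return _mm_cvtsi32_si128(x);
    }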