| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61529
llvm-svn: 359935
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Refer the ABI doc at: https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values
Related clang patch: D60349
Reviewers: rnk, efriedma, TomTan, ssijaric
Reviewed By: rnk, efriedma
Subscribers: mstorsjo, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60348
llvm-svn: 359934
|
| |
|
|
|
|
|
| |
For immediates that can't be evaluated in assembler-mapped instructions, we
should return 'invalid operand' instead of assert.
llvm-svn: 359905
|
| |
|
|
|
|
|
|
|
|
|
|
| |
vfpclassps/pd and on memory forms in intel syntax
The x/y/z suffix is needed to disambiguate the memory form in at&t syntax since no xmm/ymm/zmm register is mentioned.
But we should also allow it for the register and broadcast forms where its not needed for consistency. This matches gas.
The printing code will still only use the suffix for the memory form where it is needed.
llvm-svn: 359903
|
| |
|
|
|
|
| |
Merge the if() tests for the various HADD/SUB + Subtarget tests
llvm-svn: 359901
|
| |
|
|
|
|
|
|
|
|
| |
The VOP3 form should always be the preferred selection form to be
shrunk later.
The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.
llvm-svn: 359899
|
| |
|
|
|
|
| |
Avoids test regression in a future patch
llvm-svn: 359898
|
| |
|
|
| |
llvm-svn: 359895
|
| |
|
|
|
|
|
| |
There were 2 patterns for sub, one selecting to sub and one to
subrev. Only one of these will succeed, so remove the reversed one.
llvm-svn: 359894
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.
Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.
llvm-svn: 359891
|
| |
|
|
| |
llvm-svn: 359889
|
| |
|
|
|
|
| |
Prefer ((X & Y) ? A : B) to (X & Y ? A : B)
llvm-svn: 359884
|
| |
|
|
|
|
|
|
|
| |
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.
llvm-svn: 359883
|
| |
|
|
|
|
| |
Leftover from before we had the extract128BitVector helpers.
llvm-svn: 359871
|
| |
|
|
| |
llvm-svn: 359869
|
| |
|
|
|
|
|
|
|
| |
r358358.
This covers the patterns we use for widening 128/256 comparisons to 512-bit when
AVX512VL isn't supported.
llvm-svn: 359863
|
| |
|
|
|
|
|
|
|
|
| |
Looks like just a minor oversight in the parsing code.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41504.
Differential Revision: https://reviews.llvm.org/D60840
llvm-svn: 359855
|
| |
|
|
|
|
| |
As far as I know, we never emit LEA16r
llvm-svn: 359840
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getRegForInlineAsmConstraint when the VT is a scalar type
The default impementation in the base class for TargetLowering::getRegForInlineAsmConstraint doesn't work for mask registers when the VT is a scalar type integer types since the only legal mask types are vXi1. So we end up just getting whatever the first register class that contains the register. Currently this appears to be VK1, but its really dependent on the order tablegen outputs the register classes.
Some code in the caller ends up looking up the type for this register class and find v1i1 then generates a copyfromreg from the physical k-register with the v1i1 type. Then it generates an any_extend from v1i1 to the scalar VT which isn't legal. This bad any_extend sticks around until isel where it selects a MOVZX32rr8 with a v1i1 input or maybe a i8 input. Not sure but eventually we pick up a copy from VK1 to GR8 in MachineIR which isn't supported. This leads to a failure in physical register copying.
This patch uses the scalar type to find a VK class of the right size. In the attached test case this will be VK16. This causes a bitcast from vk16 to i16 to be generated instead of an any_extend. This will be properly iseled to a VK16 to GR32 copy and a GR32->GR16 extract_subreg.
Fixes PR41678
Differential Revision: https://reviews.llvm.org/D61453
llvm-svn: 359837
|
| |
|
|
|
|
| |
Fix the forwarding of multiplication results for Exynos M4.
llvm-svn: 359834
|
| |
|
|
|
|
|
|
| |
This if used to be an assert that got refactored into an if, but left the string literal behind.
Fixes PR41718
llvm-svn: 359833
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
|
| |
|
|
|
|
|
|
|
|
|
| |
Limiting scalar hadd/hsub generation to the lowest xmm looks to be unnecessary - we will be extracting one upper xmm whatever, and we can remove a shuffle by using the hop which is inline with what shouldUseHorizontalOp expects to happen anyway.
Testing on btver2 (the main target for fast-hops) shows this is beneficial even for float ops where we have a 'shuffle' to extract the float result:
https://godbolt.org/z/0R-U-K
Differential Revision: https://reviews.llvm.org/D61426
llvm-svn: 359786
|
| |
|
|
|
|
| |
Matches what we do for lowerAddSubToHorizontalOp and will make it easier to peek through subvectors to help fix PR39921
llvm-svn: 359782
|
| |
|
|
|
|
| |
Get rid of local variable used only in assertion.
llvm-svn: 359772
|
| |
|
|
|
|
|
|
|
| |
Select G_SEXT and G_ZEXT with destination types smaller than 32 bits in
the exact same way as 32 bits. This overwrites the higher bits, but that
should be ok since all legal users of types smaller than 32 bits ignore
those bits anyway.
llvm-svn: 359768
|
| |
|
|
|
|
| |
Make it legal to extend from e.g. s1 to s8 or s16.
llvm-svn: 359766
|
| |
|
|
|
|
|
|
|
|
|
|
| |
combineBVOfConsecutiveLoads
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076
llvm-svn: 359764
|
| |
|
|
|
|
|
|
| |
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.
Differential Revision:
llvm-svn: 359757
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61413
llvm-svn: 359756
|
| |
|
|
|
|
|
|
| |
Constant bus limit has increased to 2 with GFX10.
Differential Revision: https://reviews.llvm.org/D61404
llvm-svn: 359754
|
| |
|
|
|
|
|
|
|
|
| |
The broadcasting variant for instruction vfpclassp[d,s] shouldn't use suffix q/l. So remove them from the template.
Patch by Pengfei Wang
Differential Revision: https://reviews.llvm.org/D61295
llvm-svn: 359753
|
| |
|
|
|
|
|
|
|
|
| |
This adds support for using fmov rather than a standard mov to materialize
G_FCONSTANT when it's safe to do so.
Update arm64-fast-isel-materialize.ll and select-constant.mir to show that the
selection is correct.
llvm-svn: 359734
|
| |
|
|
|
|
|
|
| |
We already perform horizontal add/sub if we extract from elements 0 and 1, this patch extends it to non-0/1 element extraction indices (as long as they are from the lowest 128-bit vector).
Differential Revision: https://reviews.llvm.org/D61263
llvm-svn: 359707
|
| |
|
|
|
|
|
|
| |
Reassign registers to reduce register bank conflicts.
Differential Revision: https://reviews.llvm.org/D61344
llvm-svn: 359704
|
| |
|
|
|
|
|
|
| |
Convert NSA into non-NSA images.
Differential Revision: https://reviews.llvm.org/D61341
llvm-svn: 359700
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61339
llvm-svn: 359698
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D61332
llvm-svn: 359696
|
| |
|
|
| |
llvm-svn: 359694
|
| |
|
|
|
|
| |
Add to SimplifyDemandedVectorEltsForTargetNode and SimplifyDemandedBitsForTargetNode
llvm-svn: 359686
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359680
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 359678
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359677
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359673
|
| |
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode vector splitting
llvm-svn: 359670
|
| |
|
|
|
|
|
|
| |
SimplifyDemandedVectorEltsForTargetNode
This lets us hit more cases than combineExtractSubvector and allows us reuse more code.
llvm-svn: 359669
|
| |
|
|
|
|
|
|
| |
code. NFCI.
Pull out the HADD/HSUB code to halve vector widths if the upper half isn't used - prep work to adding support for other opcodes.
llvm-svn: 359667
|
| |
|
|
|
|
|
|
| |
This is an alternative to D59669 which more aggressively extracts i1 elements from vXi1 bool vectors using a MOVMSK.
Differential Revision: https://reviews.llvm.org/D61189
llvm-svn: 359666
|
| |
|
|
|
|
|
|
| |
put it in the basic block instruction loop. NFC
Now need to check it 3 different times. Just do it once at the top of the loop.
llvm-svn: 359658
|
| |
|
|
|
|
|
|
| |
element"
This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313.
llvm-svn: 359648
|