| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element.
llvm-svn: 370404
|
|
|
|
|
|
|
|
|
|
|
|
| |
ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.
Matches what we already do for ZERO_EXTEND.
Reapplies rL363850 but now with legality checks added at rL364290
llvm-svn: 364303
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"
We can end up with an any_extend_vector_inreg with a 256 bit result type
and a 128 bit result type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.
The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.
llvm-svn: 364264
|
|
|
|
|
|
|
|
|
|
| |
ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.
Matches what we already do for ZERO_EXTEND.
llvm-svn: 363850
|
|
|
|
|
|
|
|
|
|
|
|
| |
ZERO_EXTEND_VECTOR_INREG (PR40685)
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.
This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.
........
Causes PR41249
llvm-svn: 357057
|
|
|
|
|
|
|
|
| |
Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure.
This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain.
llvm-svn: 356864
|
|
|
|
|
|
|
|
|
|
| |
Because we don't currently simplify icmp with undef in DAG, bugpoint loves to introduce them during reduction.
This is a small step towards re-adding non-undef values into some of the simpler tests so that they should still test correctly and emit similar/same codegen.
Prep work for PR40800 ([SelectionDAG] Add UNDEF handling to SelectionDAG::FoldSetCC).
llvm-svn: 356076
|
|
|
|
|
|
|
|
|
|
| |
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.
This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.
Differential Revision: https://reviews.llvm.org/D53643
llvm-svn: 345182
|
|
|
|
|
|
| |
We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode.
llvm-svn: 345070
|
|
|
|
|
|
|
|
| |
succeeds on either operand
Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself
llvm-svn: 343922
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
simplification
This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector.
There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify.
As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.).
Fixes PR39178
Differential Revision: https://reviews.llvm.org/D52935
llvm-svn: 343913
|
|
|
|
|
|
|
|
| |
lowering.
Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget
llvm-svn: 329510
|
|
|
|
|
|
|
|
| |
My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues.
So just do it in LowerMUL where we can catch more cases.
llvm-svn: 321496
|
|
|
|
|
|
| |
v8i32 is legal von AVX1, but it doesn't have pmuludq for it.
llvm-svn: 321490
|
|
|
|
|
|
|
|
|
|
| |
upper bits are all sign bits or zeros.
Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ.
This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While pmuldq/pmuludq is only one 4 cycle uop.
llvm-svn: 321437
|
|
|
|
|
|
|
|
| |
demonstrate where we fail to use pmuldq/pmuludq and use to pmullq instead.
It's nice that pmullq exists, but it has higher latency and probably lower throughput than pmuldq/pmuludq. We should prefer those if we can.
llvm-svn: 321436
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.
The MIR printer prints the IR name of a MBB only for block definitions.
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix
Differential Revision: https://reviews.llvm.org/D40422
llvm-svn: 319665
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D35839
llvm-svn: 309298
|
|
Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations
llvm-svn: 306302
|