summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen/X86/combine-pmuldq.ll
Commit message (Collapse)AuthorAgeFilesLines
* [X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159)Simon Pilgrim2019-08-291-0/+115
| | | | | | ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element. llvm-svn: 370404
* [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ↵Simon Pilgrim2019-06-251-10/+7
| | | | | | | | | | | | ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. Reapplies rL363850 but now with legality checks added at rL364290 llvm-svn: 364303
* Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..."Craig Topper2019-06-251-7/+10
| | | | | | | | | | | | | | | | | | | | This reverts the following patches. "[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG" "[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support" We can end up with an any_extend_vector_inreg with a 256 bit result type and a 128 bit result type. This is allowed by the ISD opcode, but the generic operation legalizer is only able to expand cases where the total vector width is the same. The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg. The SimplifyDemandedBits changes are allowing those nodes to become aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom handling and never lets them get to the generic legalizer. We need to do the same for aext_vec_inreg. llvm-svn: 364264
* [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ↵Simon Pilgrim2019-06-191-10/+7
| | | | | | | | | | ANY_EXTEND_VECTOR_INREG Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required. Matches what we already do for ZERO_EXTEND. llvm-svn: 363850
* Revert rL356864 : [X86][SSE41] Start shuffle combining from ↵Simon Pilgrim2019-03-271-8/+8
| | | | | | | | | | | | ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. ........ Causes PR41249 llvm-svn: 357057
* [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685)Simon Pilgrim2019-03-241-8/+8
| | | | | | | | Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. llvm-svn: 356864
* [X86] Avoid icmp undef in reduced testsSimon Pilgrim2019-03-131-4/+4
| | | | | | | | | | Because we don't currently simplify icmp with undef in DAG, bugpoint loves to introduce them during reduction. This is a small step towards re-adding non-undef values into some of the simpler tests so that they should still test correctly and emit similar/same codegen. Prep work for PR40800 ([SelectionDAG] Add UNDEF handling to SelectionDAG::FoldSetCC). llvm-svn: 356076
* [X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handlingSimon Pilgrim2018-10-241-26/+4
| | | | | | | | | | Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes. This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398. Differential Revision: https://reviews.llvm.org/D53643 llvm-svn: 345182
* [X86][SSE] Revert rL343922 combinePMULDQ AddToWorklist (PR39398)Simon Pilgrim2018-10-231-4/+71
| | | | | | We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode. llvm-svn: 345070
* [X86] combinePMULDQ - add op back to worklist if SimplifyDemandedBits ↵Simon Pilgrim2018-10-061-26/+4
| | | | | | | | succeeds on either operand Prevents missing other simplifications that may occur deep in the operand chain where CommitTargetLoweringOpt won't add the PMULDQ back to the worklist itself llvm-svn: 343922
* [SelectionDAG] Add SimplifyDemandedBits to SimplifyDemandedVectorElts ↵Simon Pilgrim2018-10-061-17/+1
| | | | | | | | | | | | | | | | simplification This patch enables SimplifyDemandedBits to call SimplifyDemandedVectorElts in cases where the demanded bits mask covers entire elements of a bitcasted source vector. There are a couple of cases here where simplification at a deeper level (such as through bitcasts) prevents further simplification - CommitTargetLoweringOpt only adds immediate uses/users back to the worklist when we might want to combine the original caller again to see what else it can simplify. As well as that I had to disable handling of bool vector until SimplifyDemandedVectorElts better supports some of their opcodes (SETCC, shifts etc.). Fixes PR39178 Differential Revision: https://reviews.llvm.org/D52935 llvm-svn: 343913
* [X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of ↵Craig Topper2018-04-071-20/+2
| | | | | | | | lowering. Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget llvm-svn: 329510
* [X86] Reimplement r321437 using custom lowering instead of as a DAG combine.Craig Topper2017-12-271-2/+2
| | | | | | | | My original implementation ran as a DAG combine post type legalization, but it turns out we don't run that DAG combine step if type legalization didn't change anything. Attempts to make the combine run before type legalization as well hit other issues. So just do it in LowerMUL where we can catch more cases. llvm-svn: 321496
* [X86] Fix vmul combine for AVX1 targets.Benjamin Kramer2017-12-271-0/+44
| | | | | | v8i32 is legal von AVX1, but it doesn't have pmuludq for it. llvm-svn: 321490
* [X86] Add a DAG combines to turn vXi64 muls into VPMULDQ/VPMULUDQ if the ↵Craig Topper2017-12-251-52/+16
| | | | | | | | | | upper bits are all sign bits or zeros. Normally we catch this during lowering, but vXi64 mul is considered legal when we have AVX512DQ. This DAG combine allows us to avoid PMULLQ with AVX512DQ if we can prove its unnecessary. PMULLQ is 3 uops that take 4 cycles each. While pmuldq/pmuludq is only one 4 cycle uop. llvm-svn: 321437
* [X86] Add avx512vl and avx512dq command lines to combine-pmuldq.ll to ↵Craig Topper2017-12-251-31/+101
| | | | | | | | demonstrate where we fail to use pmuldq/pmuludq and use to pmullq instead. It's nice that pmullq exists, but it has higher latency and probably lower throughput than pmuldq/pmuludq. We should prefer those if we can. llvm-svn: 321436
* [CodeGen] Unify MBB reference format in both MIR and debug outputFrancis Visoiu Mistrih2017-12-041-8/+8
| | | | | | | | | | | | | | | | As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
* [X86] SET0 to use XMM registers where possible PR26018 PR32862Dinar Temirbulatov2017-07-271-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D35839 llvm-svn: 309298
* [X86][SSE] Add combine tests for PMULDQ/PMULUDQSimon Pilgrim2017-06-261-0/+110
Found several missed optimizations while investigating replacing _mm_mul_epi32/_mm_mul_epu32 with generic implementations llvm-svn: 306302
OpenPOWER on IntegriCloud