path: root/llvm/test/CodeGen/X86/combine-srl.ll
Commit message — Author — Date — Files — Lines (-deleted/+added)
* [x86] split more v8f32/v8i32 shuffles in lowering — Sanjay Patel, 2019-02-18 (1 file, -5/+5)

    Similar to D57867 - this is a small patch with lots of test diffs. With
    half-vector-width narrowing potential, using an extract + 128-bit
    vshufps is a win because it replaces a 256-bit shuffle with a 128-bit
    shuffle.

    This seems like it should be a win even for targets with
    'fast-variable-shuffle', but we are intentionally deferring that to an
    independent change to make sure that is true.

    Differential Revision: https://reviews.llvm.org/D58181

    llvm-svn: 354279
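    A hedged sketch of the kind of shuffle this targets (the IR below is
    illustrative, not taken from the patch): every used element comes from
    the low 128-bit halves of the inputs and only the low half of the
    result is consumed, so an extract plus a single 128-bit shufps can
    replace a 256-bit shuffle.

        ; Hypothetical example: lanes 0,2 of %a and %b all live in the low
        ; xmm halves, so this can lower to extracts + one shufps xmm
        ; instead of a 256-bit shuffle.
        define <4 x float> @narrow_shuffle(<8 x float> %a, <8 x float> %b) {
          %s = shufflevector <8 x float> %a, <8 x float> %b,
                 <8 x i32> <i32 0, i32 2, i32 8, i32 10,
                            i32 undef, i32 undef, i32 undef, i32 undef>
          %lo = shufflevector <8 x float> %s, <8 x float> undef,
                 <4 x i32> <i32 0, i32 1, i32 2, i32 3>
          ret <4 x float> %lo
        }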
* [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate — Simon Pilgrim, 2018-12-19 (1 file, -8/+3)

    Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are
    simplifying vector elements, we're seeing more constant BUILD_VECTORs
    containing UNDEFs.

    This patch provides opt-in handling of UNDEF elements in
    matchUnaryPredicate, passing NULL instead of the ConstantSDNode*
    argument. I've updated SelectionDAG::simplifyShift to demonstrate its
    use.

    Differential Revision: https://reviews.llvm.org/D55819

    llvm-svn: 349616
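    A hedged sketch of what this unblocks (function name and constants are
    assumptions): with UNDEF-tolerant matching, simplifyShift can still
    fold an out-of-range vector shift even when some amount elements are
    undef.

        ; Every defined shift amount is >= the 32-bit element width, so the
        ; whole shift is undefined; the undef lane no longer blocks the fold.
        define <4 x i32> @srl_oob_undef_amt(<4 x i32> %x) {
          %r = lshr <4 x i32> %x, <i32 33, i32 undef, i32 34, i32 35>
          ret <4 x i32> %r ; simplifies to undef
        }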
* [X86][SSE] Add shift combine 'out of range' tests with UNDEFs — Simon Pilgrim, 2018-12-18 (1 file, -0/+13)

    Shows failure to simplify out of range shift amounts to UNDEF if any
    element is UNDEF.

    llvm-svn: 349483
* [X86][SSE] Add computeKnownBits/ComputeNumSignBits support for PACKSS/PACKUS instructions — Simon Pilgrim, 2018-11-20 (1 file, -19/+1)

    Pull out the getPackDemandedElts demanded-elts remapping helper from
    computeKnownBitsForTargetNode and use it in
    computeKnownBits/ComputeNumSignBits.

    llvm-svn: 347303
* [X86][SSE] Remove unnecessary bit-and in pshufb vector ctlz (PR39703) — Simon Pilgrim, 2018-11-19 (1 file, -34/+32)

    SSE PSHUFB vector ctlz lowering works at the i4 nibble level. As
    detailed in PR39703, we were masking off the lower nibble, but it is
    only actually used in the case where the upper nibble is known to be
    zero, making it safe to remove the mask and save an instruction.

    Differential Revision: https://reviews.llvm.org/D54707

    llvm-svn: 347242
* [X86] Add vector shift by immediate to SimplifyDemandedBitsForTargetNode — Craig Topper, 2018-11-04 (1 file, -11/+1)

    Summary:
    This also enables some constant folding from KnownBits propagation.

    This helps some vXi64 cases in 32-bit mode where constant vectors
    appear as vXi32 plus a bitcast; the bitcast can prevent getNode from
    constant folding sra/shl/srl.

    Reviewers: RKSimon, spatel

    Reviewed By: spatel

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D54069

    llvm-svn: 346102
* [X86] Move promotion of vector and/or/xor from legalization to DAG combine — Craig Topper, 2018-10-15 (1 file, -35/+30)

    Summary:
    I've noticed that the bitcasts we introduce for these make
    computeKnownBits and computeNumSignBits not work well in
    LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while
    LegalizeDAG legalizes top down.

    The bottom-up strategy for LegalizeVectorOps means operands are
    legalized before their uses. So we promote and/or/xor before we
    legalize the operands that use them, making
    computeKnownBits/computeNumSignBits in places like LowerTruncate
    suboptimal.

    I looked at changing LegalizeVectorOps to be top down as well, but that
    was more disruptive and caused some regressions. I also looked at just
    moving promotion of binops to LegalizeDAG, but that had a few issues,
    one around matching AND,ANDN,OR into VSELECT: I had to create ANDN as
    vXi64, but the other nodes hadn't legalized yet. I didn't look too hard
    at fixing that.

    This patch seems to produce better results overall than my other
    attempts. We now form broadcasts of constants better in some cases.
    For at least some of them the AND was being introduced in LegalizeDAG,
    promoted to vXi64, and the BUILD_VECTOR was also legalized there. I
    think we got bad ordering of that. Now the promotion is out of the
    legalizer, so we handle this better.

    In the longer term I think we really should evaluate whether we should
    be doing this promotion at all. It's really there to reduce isel
    pattern count, but I'm wondering if we'd be better served just eating
    the pattern cost or doing C++-based isel for vector and/or/xor in
    X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in
    patterns if a bitcast gets between the vselect and the and/or/xor
    node. That becomes a lot of permutations to cover.

    Reviewers: RKSimon, spatel

    Reviewed By: RKSimon

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D53107

    llvm-svn: 344487
* [X86][SSE] Reduce instruction/register usage for v4i32 vector shifts (PR37441) — Simon Pilgrim, 2018-05-16 (1 file, -15/+13)

    As suggested by Fabian on PR37441, use PSHUFLW to extend shift-amount
    types for use with PSRAD/PSRLD to reduce register pressure.

    Some of this would ideally be done by combineTargetShuffle, but it's
    tricky to do as most of the shuffles are sharing inputs.

    Differential Revision: https://reviews.llvm.org/D46959

    llvm-svn: 332524
* [x86] remove duplicate undef tests; NFC — Sanjay Patel, 2018-02-09 (1 file, -18/+0)

    These are incomplete and were made redundant with the consolidation in:
    https://reviews.llvm.org/rL324678

    llvm-svn: 324754
* [X86] Add common CHECK prefix to shift combine tests — Simon Pilgrim, 2018-02-08 (1 file, -40/+19)

    llvm-svn: 324638
* [X86] Add shift undef, %X tests — Simon Pilgrim, 2018-02-08 (1 file, -0/+27)

    llvm-svn: 324637
* X86 Tests: Update more isel tests with FastVariableShuffle feature — Zvi Rackover, 2018-01-09 (1 file, -17/+36)

    Summary:
    Added the FastVariableShuffle feature to cases that resembled
    processors for which this feature is on. For AVX2 there are processors
    with and without this feature enabled. For AVX512 only KNL enables
    this feature, so cases which only have +avx512f were left without
    FastVariableShuffle enabled.

    Reviewers: RKSimon, craig.topper

    Subscribers: llvm-commits

    Differential Revision: https://reviews.llvm.org/D41851

    llvm-svn: 322090
* [CodeGen] Unify MBB reference format in both MIR and debug output — Francis Visoiu Mistrih, 2017-12-04 (1 file, -40/+40)

    As part of the unification of the debug format and the MIR format,
    print MBB references as '%bb.5'. The MIR printer prints the IR name of
    an MBB only for block definitions.

    * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
    * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
    * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
    * grep -nr 'BB#' and fix

    Differential Revision: https://reviews.llvm.org/D40422

    llvm-svn: 319665
* [SelectionDAG] Teach computeKnownBits some improvements to ISD::SRL with a non-splat constant shift amount — Craig Topper, 2017-12-04 (1 file, -6/+6)

    If we have a non-splat constant shift amount, the minimum shift amount
    can be used to infer the number of zero upper bits of the result.
    There's probably a lot more that we can do here, but this fixes a case
    where I wanted to infer the sign bit as zero when all the shift
    amounts are non-zero.

    llvm-svn: 319639
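    A hedged sketch of the effect (function and constants are
    illustrative, not from the commit): with lane amounts <4,5,6,7> the
    minimum shift is 4, so the top 4 bits of every result element are
    known zero and a mask of just those bits folds away.

        ; computeKnownBits reports the top 4 bits of each lane as zero
        ; (minimum shift amount is 4), so this 'and' with 0xF0000000 folds
        ; to zeroinitializer.
        define <4 x i32> @knownbits_srl_min(<4 x i32> %x) {
          %s = lshr <4 x i32> %x, <i32 4, i32 5, i32 6, i32 7>
          %m = and <4 x i32> %s, <i32 -268435456, i32 -268435456, i32 -268435456, i32 -268435456>
          ret <4 x i32> %m
        }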
* [X86][SSE] Add PACKUS support to combineVectorTruncation — Simon Pilgrim, 2017-11-03 (1 file, -1/+1)

    Similar to the existing code to lower to PACKSS, we can use PACKUS if
    the input vector's leading zero bits extend all the way to the
    packed/truncated value. We have to account for pre-SSE41 targets not
    supporting PACKUSDW.

    llvm-svn: 317315
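    An illustrative sketch (widths and names are assumptions, not from the
    patch): when the source of a truncate has enough known leading zeros,
    the pack never saturates, so an unsigned saturating pack implements it
    exactly.

        ; The 'and' guarantees the upper 16 bits of every i32 lane are
        ; zero, so the truncate can lower to PACKUSDW (SSE4.1+) instead of
        ; a shuffle sequence.
        define <8 x i16> @trunc_known_zero(<8 x i32> %x) {
          %m = and <8 x i32> %x, <i32 65535, i32 65535, i32 65535, i32 65535,
                                  i32 65535, i32 65535, i32 65535, i32 65535>
          %t = trunc <8 x i32> %m to <8 x i16>
          ret <8 x i16> %t
        }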
* [X86][SSE] Add PACKUS support to LowerTruncate — Simon Pilgrim, 2017-11-01 (1 file, -3/+2)

    Similar to the existing code to lower to PACKSS, we can use PACKUS if
    the input vector's leading zero bits extend all the way to the
    packed/truncated value. We have to account for pre-SSE41 targets not
    supporting PACKUSDW.

    llvm-svn: 317128
* [DAGCombiner] Match ISD::SRL non-uniform constant vector patterns using predicates — Simon Pilgrim, 2017-07-20 (1 file, -42/+10)

    Use the predicate matchers introduced in D35492 to match more ISD::SRL
    constant folds.

    llvm-svn: 308602
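    A hedged sketch of a fold this enables (constants are assumptions):
    predicate-based matching lets (srl (srl x, c1), c2) -> (srl x, c1 + c2)
    fire on non-uniform constant vectors, not just splats.

        ; The per-lane shift amounts add: <1,2,3,4> + <4,3,2,1> = splat 5,
        ; so the two shifts combine into one.
        define <4 x i32> @srl_srl_non_uniform(<4 x i32> %x) {
          %s1 = lshr <4 x i32> %x, <i32 1, i32 2, i32 3, i32 4>
          %s2 = lshr <4 x i32> %s1, <i32 4, i32 3, i32 2, i32 1>
          ret <4 x i32> %s2 ; combines to lshr %x by a splat of 5
        }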
* [X86][AVX] Regenerate combine tests with constant broadcast comments — Simon Pilgrim, 2017-07-16 (1 file, -4/+4)

    llvm-svn: 308128
* [DAGCombiner] Add vector support to fold (shl/srl 0, x) -> 0 — Simon Pilgrim, 2017-05-10 (1 file, -20/+2)

    llvm-svn: 302641
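    A minimal sketch of the vector form of this fold (the function name is
    an assumption):

        ; Shifting a zero vector by any amount, even a variable one, is
        ; still zero.
        define <4 x i32> @srl_zero(<4 x i32> %amt) {
          %r = lshr <4 x i32> zeroinitializer, %amt
          ret <4 x i32> %r ; folds to zeroinitializer
        }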
* [DAGCombiner] Add vector support for (srl (trunc (srl x, c1)), c2) combine — Simon Pilgrim, 2017-04-25 (1 file, -5/+4)

    llvm-svn: 301305
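    A hedged sketch of the pattern at the IR level (widths and constants
    are assumptions): when the truncate keeps exactly the bits above c1,
    the two shifts merge into a single wider shift.

        ; The truncate keeps bits [32,63] of each i64 lane, so shifting the
        ; i32 result by 4 equals shifting the i64 source by 36 and then
        ; truncating: (srl (trunc (srl x, 32)), 4) -> (trunc (srl x, 36)).
        define <2 x i32> @srl_trunc_srl(<2 x i64> %x) {
          %s1 = lshr <2 x i64> %x, <i64 32, i64 32>
          %t  = trunc <2 x i64> %s1 to <2 x i32>
          %s2 = lshr <2 x i32> %t, <i32 4, i32 4>
          ret <2 x i32> %s2
        }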
* [x86] use a single shufps when it can save instructions — Sanjay Patel, 2016-12-15 (1 file, -30/+22)

    This is a tiny patch with a big pile of test changes. This partially
    fixes PR27885:
    https://llvm.org/bugs/show_bug.cgi?id=27885

    My motivating case looks like this:

      - vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
      - vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
      - vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
      + vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]

    And this happens several times in the diffs. For chips with
    domain-crossing penalties, the instruction count and size reduction
    should usually overcome any potential domain-crossing penalty due to
    using an FP op in a sequence of int ops. For chips such as recent
    Intel big cores and Atom, there is no domain-crossing penalty for
    shufps, so using shufps is a pure win.

    So the test case diffs all appear to be improvements except one test
    in vector-shuffle-combining.ll where we miss an opportunity to use a
    shift to generate zero elements and one test in combine-sra.ll where
    multiple uses prevent the expected shuffle combining.

    Differential Revision: https://reviews.llvm.org/D27692

    llvm-svn: 289837
* [DAG] enhance computeKnownBits to handle SRL/SRA with vector splat constant — Sanjay Patel, 2016-10-23 (1 file, -16/+4)

    llvm-svn: 284953
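    A small illustrative sketch (assumed, not from the commit): once
    computeKnownBits understands a splat-constant SRL, a redundant mask
    after the shift can be removed.

        ; After lshr by 16, the top 16 bits of each lane are known zero,
        ; so the 'and' with 0xFFFF is redundant and simplifies away.
        define <4 x i32> @splat_srl_known_zero(<4 x i32> %x) {
          %s = lshr <4 x i32> %x, <i32 16, i32 16, i32 16, i32 16>
          %m = and <4 x i32> %s, <i32 65535, i32 65535, i32 65535, i32 65535>
          ret <4 x i32> %m ; simplifies to %s
        }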
* [DAGCombiner] Add general constant vector support to (srl (shl x, c), c) -> (and x, cst2) — Simon Pilgrim, 2016-10-20 (1 file, -14/+2)

    We already supported a scalar constant / splatted constant vector -
    now this accepts any (non-opaque) constant scalar / vector.

    llvm-svn: 284717
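    A hedged sketch of the fold (constants are assumptions): shifting left
    then logically right by the same amount just clears the top bits, i.e.
    (x << c) >> c == x & (-1 >> c), and with this change the per-lane
    amounts no longer need to be a splat.

        ; Folds to: and %x, <i32 268435455, i32 16777215, i32 1048575, i32 65535>
        ; (an all-ones value logically shifted right by each lane's amount).
        define <4 x i32> @shl_srl_to_and(<4 x i32> %x) {
          %shl = shl <4 x i32> %x, <i32 4, i32 8, i32 12, i32 16>
          %srl = lshr <4 x i32> %shl, <i32 4, i32 8, i32 12, i32 16>
          ret <4 x i32> %srl
        }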
* [DAGCombine] Generalize distributeTruncateThroughAnd to work with any non-opaque constant or constant vector — Simon Pilgrim, 2016-10-19 (1 file, -3/+2)

    llvm-svn: 284574
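    The underlying identity, as a hedged IR-level sketch
    (distributeTruncateThroughAnd itself operates on DAG nodes; the
    example below is an assumption): a truncate distributes through an
    'and' with a constant, which may now be any non-opaque constant vector
    rather than just a splat.

        ; trunc(and(x, C)) == and(trunc(x), trunc(C)); this is equivalent
        ; to: and (trunc %x), <i16 1, i16 3, i16 7, i16 15>.
        define <4 x i16> @trunc_through_and(<4 x i32> %x) {
          %a = and <4 x i32> %x, <i32 1, i32 3, i32 7, i32 15>
          %t = trunc <4 x i32> %a to <4 x i16>
          ret <4 x i16> %t
        }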
* [X86][SSE] Added vector lshr/shl combine tests — Simon Pilgrim, 2016-10-18 (1 file, -0/+546)

    This doesn't cover all combines in DAGCombiner::visitSRL/visitSHL yet,
    but it identifies several cases where we fail to combine vector (or
    non-splatted vector) patterns.

    llvm-svn: 284518