Commit history for llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp

* [InstCombine] Fix infinite loop due to bitcast <-> phi transforms (Nikita Popov, 2020-01-14, 1 file, -3/+8)

  Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

  optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting each
  other, because optimizeBitCastFromPhi() assumes that bitcasts of loads
  will get folded. This doesn't happen here, because a dangling phi node
  prevents the one-use fold in
  https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628
  from triggering.

  This patch fixes the issue by explicitly performing the load combine as
  part of the bitcast-of-phi transform. Other attempts to force the load
  to be combined first were ultimately too unreliable.

  Differential Revision: https://reviews.llvm.org/D71164

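  For illustration, a minimal IR sketch (hypothetical, not the PR44245
  reproducer) of the shape this transform handles: a phi over loaded
  values whose result is used only through a bitcast.

  ```
  define double @phi_of_bitcast(i1 %c, i64* %p, i64* %q) {
  entry:
    br i1 %c, label %a, label %b
  a:
    %la = load i64, i64* %p
    br label %end
  b:
    %lb = load i64, i64* %q
    br label %end
  end:
    ; optimizeBitCastFromPhi wants to rewrite this as a phi of double,
    ; assuming each load will be folded to "load double" by the one-use
    ; bitcast-of-load combine. If that fold does not fire (e.g. a dangling
    ; phi keeps an extra use alive), the two transforms can ping-pong.
    %phi = phi i64 [ %la, %a ], [ %lb, %b ]
    %bc = bitcast i64 %phi to double
    ret double %bc
  }
  ```
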
* [InstCombine] Fix user iterator invalidation in bitcast of phi transform (Nikita Popov, 2020-01-14, 1 file, -1/+4)

  This fixes the issue encountered in D71164. Instead of using a
  range-based for, manually iterate over the users and advance the
  iterator beforehand, so we do not skip any users due to iterator
  invalidation.

  Differential Revision: https://reviews.llvm.org/D72657

* Revert "[InstCombine] fold zext of masked bit set/clear"Kadir Cetinkaya2020-01-081-17/+3
| | | | | | | | | | This reverts commit a041c4ec6f7aa659b235cb67e9231a05e0a33b7d. This looks like a non-trivial change and there has been no code reviews (at least there were no phabricator revisions attached to the commit description). It is also causing a regression in one of our downstream integration tests, we haven't been able to come up with a minimal reproducer yet.
* [InstCombine] fold zext of masked bit set/clear (Sanjay Patel, 2019-12-31, 1 file, -3/+17)

  This does not solve PR17101, but it is one of the underlying diffs
  noted here: https://bugs.llvm.org/show_bug.cgi?id=17101#c8

  We could ease the one-use checks for the 'clear' (no 'not' op) half of
  the transform, but I do not know if that asymmetry would make things
  better or worse.

  Proofs: https://rise4fun.com/Alive/uVB

  Name: masked bit set
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp ne i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %s = lshr i32 %x, %y
  %r = and i32 %s, 1

  Name: masked bit clear
  %sh1 = shl i32 1, %y
  %and = and i32 %sh1, %x
  %cmp = icmp eq i32 %and, 0
  %r = zext i1 %cmp to i32
  =>
  %xn = xor i32 %x, -1
  %s = lshr i32 %xn, %y
  %r = and i32 %s, 1

* Revert "[InstCombine] Fix infinite loop due to bitcast <-> phi transforms"Nikita Popov2019-12-311-6/+0
| | | | | | This reverts commit 27a0795943fee0f30b995fe5165428afc2dfd402. Seems to break test-suite.
* [InstCombine] Fix infinite loop due to bitcast <-> phi transforms (Nikita Popov, 2019-12-31, 1 file, -0/+6)

  Fix for https://bugs.llvm.org/show_bug.cgi?id=44245.

  optimizeBitCastFromPhi() and FoldPHIArgOpIntoPHI() end up fighting each
  other, because optimizeBitCastFromPhi() assumes that bitcasts of loads
  will get folded. This doesn't happen here, because a dangling phi node
  prevents the one-use fold in
  https://github.com/llvm/llvm-project/blob/master/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp#L620-L628
  from triggering.

  This patch fixes the issue by manually removing the old phis.

  Differential Revision: https://reviews.llvm.org/D71164

* [InstCombine] Don't rewrite phi-of-bitcast when the phi has other users (Connor Abbott, 2019-12-31, 1 file, -14/+43)

  Judging by the existing comments, this was the intention, but the
  transform never actually checked whether the existing phis would be
  removed. See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example
  where this causes much worse code generation on AMDGPU.

  Differential Revision: https://reviews.llvm.org/D71209

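  As an illustration (a hypothetical sketch, not the test from the bug):
  a phi with a non-bitcast user cannot be deleted after the rewrite, so
  rewriting would leave two phis live.

  ```
  define float @phi_with_extra_user(i1 %c, i32 %a, i32 %b, i32* %p) {
  entry:
    br i1 %c, label %then, label %end
  then:
    br label %end
  end:
    %phi = phi i32 [ %a, %entry ], [ %b, %then ]
    ; The store keeps %phi alive, so creating a parallel "phi float" for
    ; the bitcast would duplicate the phi instead of replacing it.
    store i32 %phi, i32* %p
    %f = bitcast i32 %phi to float
    ret float %f
  }
  ```
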
* [InstCombine] Insert instructions before adding them to worklist (Jakub Kuderski, 2019-12-18, 1 file, -2/+4)

  Summary: This patch adds instructions to the InstCombine worklist after
  they are properly inserted. This way we don't get `<badref>`s printed
  when logging added instructions. It also adds a check in
  `Worklist::Add` that ensures that all added instructions have parents.

  Simple test case that illustrates the difference when run with
  `--debug-only=instcombine`:

  ```
  define i32 @test35(i32 %a, i32 %b) {
    %1 = or i32 %a, 1135
    %2 = or i32 %1, %b
    ret i32 %2
  }
  ```

  Before this patch:

  ```
  INSTCOMBINE ITERATION #1 on test35
  IC: ADDING: 3 instrs to worklist
  IC: Visiting: %1 = or i32 %a, 1135
  IC: Visiting: %2 = or i32 %1, %b
  IC: ADD: %2 = or i32 %a, %b
  IC: Old = %3 = or i32 %1, %b
      New = <badref> = or i32 %2, 1135
  IC: ADD: <badref> = or i32 %2, 1135
  ...
  ```

  With this patch:

  ```
  INSTCOMBINE ITERATION #1 on test35
  IC: ADDING: 3 instrs to worklist
  IC: Visiting: %1 = or i32 %a, 1135
  IC: Visiting: %2 = or i32 %1, %b
  IC: ADD: %2 = or i32 %a, %b
  IC: Old = %3 = or i32 %1, %b
      New = <badref> = or i32 %2, 1135
  IC: ADD: %3 = or i32 %2, 1135
  ...
  ```

  Reviewers: fhahn, davide, spatel, foad, grosser, nikic
  Reviewed By: nikic
  Subscribers: nikic, lebedev.ri, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71093

* Reland [DataLayout] Fix occurrences where the size and range of pointers are assumed to be the same (Nicola Zaghen, 2019-12-13, 1 file, -1/+1)

  GEP index size can be specified in the DataLayout, introduced in
  D42123. However, there were still places in which getIndexSizeInBits
  was used interchangeably with getPointerSizeInBits. This notably caused
  issues with InstCombine's visitPtrToInt; but the unit test was
  incorrect, so this remained undiscovered.

  This fixes the buildbot failures.

  Differential Revision: https://reviews.llvm.org/D68328

  Patch by Joseph Faulls!

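  A sketch of the distinction (the datalayout string is chosen purely for
  illustration): with an explicit index width, offset reasoning around
  ptrtoint must use the 32-bit index size, not the 64-bit pointer size.

  ```
  ; Pointers are 64 bits wide (size:abi:pref), but GEP indices are 32 bits.
  target datalayout = "e-p:64:64:64:32"

  define i64 @p2i(i8* %p) {
    ; Transforms in visitPtrToInt that reason about offset arithmetic must
    ; query getIndexSizeInBits (32 here), not getPointerSizeInBits (64).
    %i = ptrtoint i8* %p to i64
    ret i64 %i
  }
  ```
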
* Temporarily Revert "[DataLayout] Fix occurrences where the size and range of pointers are assumed to be the same." (Nicola Zaghen, 2019-12-12, 1 file, -1/+1)

  This reverts commit 5f6208778ff92567c57d7c1e2e740c284d7e69a5.

  This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll:
  Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types
  don't agree!"' failed.

* [DataLayout] Fix occurrences where the size and range of pointers are assumed to be the same (Nicola Zaghen, 2019-12-12, 1 file, -1/+1)

  GEP index size can be specified in the DataLayout, introduced in
  D42123. However, there were still places in which getIndexSizeInBits
  was used interchangeably with getPointerSizeInBits. This notably caused
  issues with InstCombine's visitPtrToInt; but the unit test was
  incorrect, so this remained undiscovered.

  Differential Revision: https://reviews.llvm.org/D68328

  Patch by Joseph Faulls!

* Revert "[InstCombine] reduce code duplication; NFC"Sanjay Patel2019-12-061-40/+38
| | | | | This reverts commit db5739658467e20a52f20e769d3580412e13ff87. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
* Revert "[InstCombine] improve readability; NFC"Sanjay Patel2019-12-061-6/+10
| | | | | This reverts commit 7250ef3613cc6b81145b9543bafb86d7f9466cde. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
* Revert "[InstCombine] reduce indentation; NFC"Sanjay Patel2019-12-061-25/+28
| | | | | This reverts commit 8bf8ef7116bd0daec570b35480ca969b74e66c6e. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.
* [InstCombine] reduce indentation; NFC (Sanjay Patel, 2019-12-06, 1 file, -28/+25)

* [InstCombine] improve readability; NFC (Sanjay Patel, 2019-12-06, 1 file, -10/+6)

  CreateIntCast returns the input if its type matches, so there is no
  need to duplicate that check.

* [InstCombine] reduce code duplication; NFC (Sanjay Patel, 2019-12-06, 1 file, -38/+40)

* [InstCombine] improve readability; NFC (Sanjay Patel, 2019-12-06, 1 file, -33/+33)

* [InstCombine] narrow select with FP casts (Sanjay Patel, 2019-12-05, 1 file, -0/+18)

  Select doesn't change values, so truncate of extended operand cancels
  out.

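  A minimal sketch of the fold (names are illustrative):

  ```
  define float @narrow_sel(i1 %c, float %x, float %y) {
    %ex = fpext float %x to double
    %ey = fpext float %y to double
    %sel = select i1 %c, double %ex, double %ey
    %r = fptrunc double %sel to float
    ret float %r
  }
  ; Select only routes values, so the fpext/fptrunc pair cancels:
  ;   %r = select i1 %c, float %x, float %y
  ```
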
* [InstCombine] add FMF guard to builder in fptrunc transform; NFC (Sanjay Patel, 2019-12-05, 1 file, -0/+5)

  This makes no difference currently because we don't apply FMF to FP
  casts, but that may change. This could also be a place to add a fold
  for select with fptrunc, so it will make that patch easier/smaller.

* [InstCombine] Revert aafde063aaf09285c701c80cd4b543c2beb523e8 and 6749dc3446671df05235d0a218c426a314ac33cd related to bitcast handling of x86_mmx (Craig Topper, 2019-12-03, 1 file, -3/+2)

  This reverts these two commits:
  - [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double.
  - [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement

  We're seeing at least one internal test failure related to a bitcast
  that was previously before an inline assembly block containing emms
  being placed after it. This leads to the mmx state ending up not empty
  after the emms. IR has no way to make any specific guarantees about
  this. Reverting these patches to get back to previous behavior which at
  least worked for this test.

* [InstCombine] Fix big-endian miscompile of (bitcast (zext/trunc (bitcast))) (Bjorn Pettersson, 2019-12-02, 1 file, -22/+57)

  Summary: optimizeVectorResize is rewriting patterns like:

  %1 = bitcast vector %src to integer
  %2 = trunc/zext %1
  %dst = bitcast %2 to vector

  Since bitcasting between integer and vector types gives different
  integer values depending on endianness, we need to take endianness into
  account. As it happens, the old implementation only produced the
  correct result for little-endian targets.

  Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178

  Reviewers: spatel, lattner, lebedev.ri
  Reviewed By: spatel, lebedev.ri
  Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70844

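  A concrete instance of the pattern (types chosen for illustration): on
  a big-endian target the low bits of the wide integer map to different
  vector elements than on little endian, so the rewrite must account for
  that.

  ```
  define <4 x i16> @resize(<2 x i16> %src) {
    %i = bitcast <2 x i16> %src to i32
    %z = zext i32 %i to i64
    %dst = bitcast i64 %z to <4 x i16>
    ; Little endian: %src occupies elements 0-1 of %dst (the low bits).
    ; Big endian: the low bits map to the highest-indexed elements, so
    ; %src occupies elements 2-3.
    ret <4 x i16> %dst
  }
  ```
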
* [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement (Craig Topper, 2019-11-07, 1 file, -2/+3)

  x86_mmx is conceptually a vector already. Don't introduce an extra
  conversion between it and scalar i64.

  I'm using VectorType::isValidElementType, which checks for floating
  point, integer, and pointers, to hopefully make this more readable than
  just blacklisting x86_mmx.

  Differential Revision: https://reviews.llvm.org/D69964

* [InstCombine] don't assume 'inbounds' for bitcast deref or null pointer in non-default address space (Sanjay Patel, 2019-10-13, 1 file, -2/+10)

  Follow-up to D68244 to account for a corner case discussed in:
  https://bugs.llvm.org/show_bug.cgi?id=43501

  Add one more restriction: if the pointer is deref-or-null and in a
  non-default (non-zero) address space, we can't assume inbounds.

  Differential Revision: https://reviews.llvm.org/D68706

  llvm-svn: 374728

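  A sketch of the corner case (the address space number is illustrative):
  in a non-zero address space, null is not necessarily an invalid
  address, so deref-or-null is not enough to justify inbounds.

  ```
  ; %p may legitimately be null in addrspace(3), and null may even be a
  ; dereferenceable address there, so the bitcast cannot be rewritten as
  ; an inbounds GEP.
  define i32 addrspace(3)* @cast([4 x i32] addrspace(3)* dereferenceable_or_null(16) %p) {
    %c = bitcast [4 x i32] addrspace(3)* %p to i32 addrspace(3)*
    ret i32 addrspace(3)* %c
  }
  ```
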
* [InstCombine] don't assume 'inbounds' for bitcast pointer to GEP transform (PR43501) (Sanjay Patel, 2019-10-06, 1 file, -2/+9)

  https://bugs.llvm.org/show_bug.cgi?id=43501

  We can't declare a GEP 'inbounds' in general. But we may salvage that
  information if we have known dereferenceable bytes on the source
  pointer.

  Differential Revision: https://reviews.llvm.org/D68244

  llvm-svn: 373847

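  A sketch of the transform and the restriction (illustrative function):

  ```
  define i32* @bc([4 x i32]* dereferenceable(16) %p) {
    ; The bitcast can be rewritten as a GEP with zero indices:
    ;   %g = getelementptr [4 x i32], [4 x i32]* %p, i64 0, i64 0
    ; It may only be marked "getelementptr inbounds" because %p is known
    ; dereferenceable; without that fact, a plain (non-inbounds) GEP is
    ; the most we can claim.
    %g = bitcast [4 x i32]* %p to i32*
    ret i32* %g
  }
  ```
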
* [Alignment][NFC] Remove AllocaInst::setAlignment(unsigned) (Guillaume Chatelet, 2019-09-30, 1 file, -1/+1)

  Summary: This patch is part of a series to introduce an Alignment type.
  See this thread for context:
  http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
  See this patch for the introduction of the type:
  https://reviews.llvm.org/D64790

  Reviewers: courbet
  Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D68141

  llvm-svn: 373207

* [InstCombine] rename variable for readability; NFC (Sanjay Patel, 2019-09-11, 1 file, -21/+21)

  There's more that can be done here, but "OpI" doesn't convey that we
  casted to BinaryOperator.

  llvm-svn: 371682

* [InstCombine] recognize bswap disguised as shufflevector (Sanjay Patel, 2019-09-02, 1 file, -0/+16)

  bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N*8}
  --> bswap (bitcast X to i{N*8})

  In PR43146: https://bugs.llvm.org/show_bug.cgi?id=43146
  ...we have a more complicated case where SLP is making a mess of bswap.
  This patch won't do anything for that currently, but we need to improve
  bswap recognition in instcombine, SLP, and/or a standalone pass to
  avoid that problem.

  This is limited using the data-layout so we don't try to do this
  transform with actual vector types. The backend does not appear to have
  folds to convert in either direction, so we don't want to mess up
  something that is actually better lowered as a shuffle.

  On x86, we're trading something like this:

  vmovd %edi, %xmm0
  vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
  vmovd %xmm0, %eax

  For:

  movl %edi, %eax
  bswapl %eax

  Differential Revision: https://reviews.llvm.org/D66965

  llvm-svn: 370659

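  A concrete N=4 instance of the pattern (illustrative names):

  ```
  declare i32 @llvm.bswap.i32(i32)

  define i32 @bswap_shuf(<4 x i8> %x) {
    ; Reversing the bytes before the bitcast is a byte swap of the result.
    %rev = shufflevector <4 x i8> %x, <4 x i8> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
    %r = bitcast <4 x i8> %rev to i32
    ret i32 %r
  }
  ; --> %b = bitcast <4 x i8> %x to i32
  ;     %r = call i32 @llvm.bswap.i32(i32 %b)
  ```
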
* [InstCombine] reduce duplicated code; NFC (Sanjay Patel, 2019-08-29, 1 file, -10/+13)

  llvm-svn: 370399

* Revert "[InstCombine] try to narrow a truncated load"Vlad Tsyrklevich2019-07-251-39/+0
| | | | | | | | | This reverts commit bc4a63fd3c29c1a8ce22891bf34ee4dccfef578c, this is a speculative revert to fix a number of sanitizer bots (like sanitizer-x86_64-linux-bootstrap-ubsan) that have started to see stage2 compiler crashes, presumably due to a miscompile. llvm-svn: 367029
* [InstCombine] try to narrow a truncated load (Sanjay Patel, 2019-07-25, 1 file, -0/+39)

  trunc (load X) --> load (bitcast X to narrow type)

  We have this transform in DAGCombiner::ReduceLoadWidth(), but the
  truncated load pattern can interfere with other instcombine transforms,
  so I'd like to allow the fold sooner.

  Example: https://bugs.llvm.org/show_bug.cgi?id=16739
  ...in that report, we have bitcasts bracketing these ops, so those
  could get eliminated too.

  We've generally ruled out widening of loads early in IR (LoadCombine -
  http://lists.llvm.org/pipermail/llvm-dev/2016-September/105291.html),
  but that reasoning may not apply to narrowing if we can preserve
  information such as the dereferenceable range.

  Differential Revision: https://reviews.llvm.org/D64432

  llvm-svn: 367011

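  A minimal sketch of the fold on a little-endian target (big endian
  would need to address the other end of the wide value):

  ```
  define i16 @narrow_load(i32* %p) {
    %wide = load i32, i32* %p
    %t = trunc i32 %wide to i16
    ret i16 %t
  }
  ; --> %np = bitcast i32* %p to i16*
  ;     %t = load i16, i16* %np   ; loads only the low half
  ```
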
* [InstCombine] Update (fptrunc (fneg x)) -> (fneg (fptrunc x)) for unary FNeg (Cameron McInally, 2019-06-11, 1 file, -4/+12)

  Differential Revision: https://reviews.llvm.org/D62629

  llvm-svn: 363080

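  A minimal sketch of the fold using the unary fneg instruction (subject
  to the use checks in the code; names are illustrative):

  ```
  define float @fneg_trunc(double %x) {
    %neg = fneg double %x
    %r = fptrunc double %neg to float
    ret float %r
  }
  ; --> %t = fptrunc double %x to float
  ;     %r = fneg float %t
  ```
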
* [InstCombine] simplify code for bitcast of insertelement; NFC (Sanjay Patel, 2019-06-05, 1 file, -5/+4)

  llvm-svn: 362655

* [InstCombine] When turning sext into zext due to known bits, return the new ZExt instead of calling replaceInstUsesWith (Craig Topper, 2019-05-08, 1 file, -4/+2)

  The worklist loop that we're returning back to should be able to do the
  replacement itself. This is how we normally do replacements.

  My main motivation was that I observed that we weren't preserving the
  name of the result when we do this transform. The replacement code in
  the worklist loop will call takeName as part of the replacement.

  Differential Revision: https://reviews.llvm.org/D61695

  llvm-svn: 360284

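  The shape of the transform (sketch):

  ```
  define i32 @sext_to_zext(i8 %x) {
    ; Known-bits analysis sees the sign bit of %a is zero, so the sign
    ; extension is equivalent to a zero extension.
    %a = and i8 %x, 127
    %r = sext i8 %a to i32
    ret i32 %r
  }
  ; --> %r = zext i8 %a to i32
  ; Returning the new zext lets the worklist loop do the replacement and
  ; preserve the name via takeName.
  ```
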
* Extra processing for BitCast + PHI in InstCombine (Gabor Buella, 2019-02-09, 1 file, -11/+34)

  For some specific cases of bitcast A->B->A with intervening PHI nodes,
  the InstCombiner::optimizeBitCastFromPhi transformation creates extra
  PHI nodes which are actually copies of an already-created PHI, in other
  words, redundant. These extra PHI nodes can lead to extra move
  instructions generated after the DeSSA transformation.

  This happens when several conditions are met:
  - SROA kicks in and creates a new alloca;
  - there is a simple assignment L = R, which falls under 'canonicalize
    loads' done by combineLoadToOperationType (this transformation is on
    by default); exactly this transformation is the reason the bitcasts
    are generated;
  - the alloca is then used in an A->B->A + PHI chain;
  - there is loop unrolling.

  As a result, optimizeBitCastFromPhi creates as many PHI nodes for each
  new SROA alloca as the loop unrolling factor. These extra PHI nodes are
  redundant, all except one, and should not be created. Moreover, the
  idea of optimizeBitCastFromPhi is to get rid of the cast (when
  possible), but that doesn't happen under these conditions.

  The proposed fix is to do the cast replacement for the whole
  calculated/accumulated PHI closure, not only for the one cast that is
  the argument to optimizeBitCastFromPhi. This helps accomplish several
  things: 1) avoid extra PHI nodes, as all casts which may trigger the
  optimizeBitCastFromPhi transformation will be replaced, 2) replace the
  bitcasts themselves, and 3) create more opportunities to remove dead
  code, which appears after the replacement.

  A new test case shows that it's possible to get rid of all bitcasts
  completely and get quite good code reduction.

  Author: Igor Tsimbalist <igor.v.tsimbalist@intel.com>
  Reviewed By: Carrot
  Differential Revision: https://reviews.llvm.org/D57053

  llvm-svn: 353595

* [opaque pointer types] Pass value type to GetElementPtr creation. (James Y Knight, 2019-02-01, 1 file, -1/+2)

  This cleans up all GetElementPtr creation in LLVM to explicitly pass a
  value type rather than deriving it from the pointer's element-type.

  Differential Revision: https://reviews.llvm.org/D57173

  llvm-svn: 352913

* [opaque pointer types] Pass function types to CallInst creation. (James Y Knight, 2019-02-01, 1 file, -2/+2)

  This cleans up all CallInst creation in LLVM to explicitly pass a
  function type rather than deriving it from the pointer's element-type.

  Differential Revision: https://reviews.llvm.org/D57170

  llvm-svn: 352909

* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license (Chandler Carruth, 2019-01-19, 1 file, -4/+3)

  We understand that people may be surprised that we're moving the header
  entirely to discuss the new license. We checked this carefully with the
  Foundation's lawyer and we believe this is the correct approach.

  Essentially, all code in the project is now made available by the LLVM
  project under our new license, so you will see that the license headers
  include that license only. Some of our contributors have contributed
  code under our old license, and accordingly, we have retained a copy of
  our old license notice in the top-level files in each project and
  repository.

  llvm-svn: 351636

* [InstCombine] reduce raw IR narrowing rotate patterns to funnel shift (Sanjay Patel, 2019-01-04, 1 file, -16/+8)

  Similar to rL350199 - there are no known analysis/codegen holes for
  funnel shift intrinsics now, so we can canonicalize the 6+ regular
  instructions to funnel shift to improve vectorization, inlining,
  unrolling, etc.

  llvm-svn: 350419

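  For reference, the kind of raw rotate IR that canonicalizes to a funnel
  shift (a simplified sketch; the patch specifically targets the
  narrowing variants wrapped in zext/trunc, as in the rotate-narrowing
  entries below):

  ```
  declare i8 @llvm.fshl.i8(i8, i8, i8)

  define i8 @rotl8(i8 %x, i8 %shamt) {
    %mask = and i8 %shamt, 7
    %neg = sub i8 0, %shamt
    %negmask = and i8 %neg, 7
    %shl = shl i8 %x, %mask
    %shr = lshr i8 %x, %negmask
    %r = or i8 %shl, %shr
    ret i8 %r
  }
  ; --> %r = call i8 @llvm.fshl.i8(i8 %x, i8 %x, i8 %shamt)
  ```
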
* [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032) (Sanjay Patel, 2018-12-17, 1 file, -11/+4)

  The problem is shown specifically for a case with vector multiply here:
  https://bugs.llvm.org/show_bug.cgi?id=40032
  ...and this might mask the original backend bug for ARM shown in:
  https://bugs.llvm.org/show_bug.cgi?id=39967

  As the test diffs here show, we were (and probably still aren't) doing
  these kinds of transforms in a principled way. We are producing more or
  equal wide instructions than we started with in some cases, so we still
  need to restrict/correct other transforms from overstepping. If there
  are perf regressions from this change, we can either carve out
  exceptions to the general IR rules, or improve the backend to do these
  transforms when we know the transform is profitable. That's probably
  similar to a change like D55448.

  Differential Revision: https://reviews.llvm.org/D55744

  llvm-svn: 349389

* [InstCombine] fix rotate narrowing bug for non-pow-2 types (Sanjay Patel, 2018-11-15, 1 file, -2/+7)

  llvm-svn: 346968

* [InstCombine] narrow width of rotate patterns, part 3 (Sanjay Patel, 2018-11-12, 1 file, -0/+5)

  This is a longer variant of the pattern handled in rL346713; this one
  includes zexts. Eventually, we should canonicalize all rotate patterns
  to the funnel shift intrinsics, but we need a bit more infrastructure
  to make sure the vectorizers handle those intrinsics as well as the
  shift+logic ops.

  https://rise4fun.com/Alive/FMn

  Name: narrow rotateright
  %neg = sub i8 0, %shamt
  %rshamt = and i8 %shamt, 7
  %rshamtconv = zext i8 %rshamt to i32
  %lshamt = and i8 %neg, 7
  %lshamtconv = zext i8 %lshamt to i32
  %conv = zext i8 %x to i32
  %shr = lshr i32 %conv, %rshamtconv
  %shl = shl i32 %conv, %lshamtconv
  %or = or i32 %shl, %shr
  %r = trunc i32 %or to i8
  =>
  %maskedShAmt2 = and i8 %shamt, 7
  %negShAmt2 = sub i8 0, %shamt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shl2 = lshr i8 %x, %maskedShAmt2
  %shr2 = shl i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2

  llvm-svn: 346716

* [InstCombine] narrow width of rotate patterns, part 2 (PR39624) (Sanjay Patel, 2018-11-12, 1 file, -0/+8)

  The sub-pattern for the shift amount in a rotate can take on several
  different forms, and there's apparently no way to canonicalize those
  without seeing the entire rotate sequence.

  This is the form noted in:
  https://bugs.llvm.org/show_bug.cgi?id=39624

  https://rise4fun.com/Alive/qnT

  %zx = zext i8 %x to i32
  %maskedShAmt = and i32 %shAmt, 7
  %shl = shl i32 %zx, %maskedShAmt
  %negShAmt = sub i32 0, %shAmt
  %maskedNegShAmt = and i32 %negShAmt, 7
  %shr = lshr i32 %zx, %maskedNegShAmt
  %rot = or i32 %shl, %shr
  %r = trunc i32 %rot to i8
  =>
  %truncShAmt = trunc i32 %shAmt to i8
  %maskedShAmt2 = and i8 %truncShAmt, 7
  %shl2 = shl i8 %x, %maskedShAmt2
  %negShAmt2 = sub i8 0, %truncShAmt
  %maskedNegShAmt2 = and i8 %negShAmt2, 7
  %shr2 = lshr i8 %x, %maskedNegShAmt2
  %r = or i8 %shl2, %shr2

  llvm-svn: 346713

* [InstCombine] refactor code for matching shift amount of a rotate; NFC (Sanjay Patel, 2018-11-12, 1 file, -12/+17)

  As shown in existing test cases and with:
  https://bugs.llvm.org/show_bug.cgi?id=39624
  ...we're missing at least 2 more patterns for rotate narrowing.

  llvm-svn: 346711

* [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes (Cameron McInally, 2018-10-25, 1 file, -2/+3)

  Replacing BinaryOperator::isFNeg(...) to avoid regressions when we
  separate FNeg from the FSub IR instruction.

  Differential Revision: https://reviews.llvm.org/D53650

  llvm-svn: 345295

* [InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try (Sanjay Patel, 2018-10-10, 1 file, -4/+27)

  Re-trying r344082 because it unintentionally included extra diffs.

  Original commit message:

  icmp ne (and X, 1), 0 --> trunc X to N x i1

  Ideally, we'd do the same for scalars, but there will likely be
  regressions unless we add more trunc folds as we're doing here for
  vectors.

  The motivating vector case is from PR37549:
  https://bugs.llvm.org/show_bug.cgi?id=37549

  define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
    %c = fcmp ole <4 x float> %x, %y
    %s = sext <4 x i1> %c to <4 x i32>
    %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
    %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
    %cond = or <4 x i32> %s1, %s2
    %condtr = trunc <4 x i32> %cond to <4 x i1>
    %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
    ret <4 x float> %r
  }

  Here's a sampling of the vector codegen for that case using mask+icmp
  (current behavior) vs. trunc (with this patch):

  AVX before:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vandps LCPI0_0(%rip), %xmm0, %xmm0
  vxorps %xmm1, %xmm1, %xmm1
  vpcmpeqd %xmm1, %xmm0, %xmm0
  vblendvps %xmm0, %xmm3, %xmm2, %xmm0

  AVX after:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vblendvps %xmm0, %xmm2, %xmm3, %xmm0

  AVX512f before:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
  vptestnmd %zmm1, %zmm0, %k1
  vblendmps %zmm3, %zmm2, %zmm0 {%k1}

  AVX512f after:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vpslld $31, %xmm0, %xmm0
  vptestmd %zmm0, %zmm0, %k1
  vblendmps %zmm2, %zmm3, %zmm0 {%k1}

  AArch64 before:
  fcmge v0.4s, v1.4s, v0.4s
  zip1 v1.4s, v0.4s, v0.4s
  zip2 v0.4s, v0.4s, v0.4s
  orr v0.16b, v1.16b, v0.16b
  movi v1.4s, #1
  and v0.16b, v0.16b, v1.16b
  cmeq v0.4s, v0.4s, #0
  bsl v0.16b, v3.16b, v2.16b

  AArch64 after:
  fcmge v0.4s, v1.4s, v0.4s
  zip1 v1.4s, v0.4s, v0.4s
  zip2 v0.4s, v0.4s, v0.4s
  orr v0.16b, v1.16b, v0.16b
  bsl v0.16b, v2.16b, v3.16b

  PowerPC-le before:
  xvcmpgesp 34, 35, 34
  vspltisw 0, 1
  vmrglw 3, 2, 2
  vmrghw 2, 2, 2
  xxlor 0, 35, 34
  xxlxor 35, 35, 35
  xxland 34, 0, 32
  vcmpequw 2, 2, 3
  xxsel 34, 36, 37, 34

  PowerPC-le after:
  xvcmpgesp 34, 35, 34
  vmrglw 3, 2, 2
  vmrghw 2, 2, 2
  xxlor 0, 35, 34
  xxsel 34, 37, 36, 0

  Differential Revision: https://reviews.llvm.org/D52747

  llvm-svn: 344181

* revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization (Sanjay Patel, 2018-10-10, 1 file, -27/+4)

  This commit accidentally included the diffs from D53057.

  llvm-svn: 344178

* [InstCombine] reverse 'trunc X to <N x i1>' canonicalization (Sanjay Patel, 2018-10-09, 1 file, -4/+27)

  icmp ne (and X, 1), 0 --> trunc X to N x i1

  Ideally, we'd do the same for scalars, but there will likely be
  regressions unless we add more trunc folds as we're doing here for
  vectors.

  The motivating vector case is from PR37549:
  https://bugs.llvm.org/show_bug.cgi?id=37549

  define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) {
    %c = fcmp ole <4 x float> %x, %y
    %s = sext <4 x i1> %c to <4 x i32>
    %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1>
    %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3>
    %cond = or <4 x i32> %s1, %s2
    %condtr = trunc <4 x i32> %cond to <4 x i1>
    %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w
    ret <4 x float> %r
  }

  Here's a sampling of the vector codegen for that case using mask+icmp
  (current behavior) vs. trunc (with this patch):

  AVX before:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vandps LCPI0_0(%rip), %xmm0, %xmm0
  vxorps %xmm1, %xmm1, %xmm1
  vpcmpeqd %xmm1, %xmm0, %xmm0
  vblendvps %xmm0, %xmm3, %xmm2, %xmm0

  AVX after:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vblendvps %xmm0, %xmm2, %xmm3, %xmm0

  AVX512f before:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1]
  vptestnmd %zmm1, %zmm0, %k1
  vblendmps %zmm3, %zmm2, %zmm0 {%k1}

  AVX512f after:
  vcmpleps %xmm1, %xmm0, %xmm0
  vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1]
  vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3]
  vorps %xmm0, %xmm1, %xmm0
  vpslld $31, %xmm0, %xmm0
  vptestmd %zmm0, %zmm0, %k1
  vblendmps %zmm2, %zmm3, %zmm0 {%k1}

  AArch64 before:
  fcmge v0.4s, v1.4s, v0.4s
  zip1 v1.4s, v0.4s, v0.4s
  zip2 v0.4s, v0.4s, v0.4s
  orr v0.16b, v1.16b, v0.16b
  movi v1.4s, #1
  and v0.16b, v0.16b, v1.16b
  cmeq v0.4s, v0.4s, #0
  bsl v0.16b, v3.16b, v2.16b

  AArch64 after:
  fcmge v0.4s, v1.4s, v0.4s
  zip1 v1.4s, v0.4s, v0.4s
  zip2 v0.4s, v0.4s, v0.4s
  orr v0.16b, v1.16b, v0.16b
  bsl v0.16b, v2.16b, v3.16b

  PowerPC-le before:
  xvcmpgesp 34, 35, 34
  vspltisw 0, 1
  vmrglw 3, 2, 2
  vmrghw 2, 2, 2
  xxlor 0, 35, 34
  xxlxor 35, 35, 35
  xxland 34, 0, 32
  vcmpequw 2, 2, 3
  xxsel 34, 36, 37, 34

  PowerPC-le after:
  xvcmpgesp 34, 35, 34
  vmrglw 3, 2, 2
  vmrghw 2, 2, 2
  xxlor 0, 35, 34
  xxsel 34, 37, 36, 0

  Differential Revision: https://reviews.llvm.org/D52747

  llvm-svn: 344082

* Fix InstCombine address space assert (Ewan Crawford, 2018-07-31, 1 file, -0/+6)

  Works around a bug where the InstCombine pass was asserting on the IR
  added in the lit test, where we have a bitcast instruction after a GEP
  from an addrspacecast.

  The second bitcast in the test was getting combined into
  `bitcast <16 x i32>* %0 to <16 x i32> addrspace(3)*`, which looks like
  it should be an addrspacecast instruction instead. Otherwise, if
  control flow is allowed to continue as it is now, we create a GEP
  instruction `<badref> = getelementptr inbounds <16 x i32>, <16 x i32>*
  %0, i32 0`. However, because the type of this instruction doesn't match
  the address space, we hit an assert when replacing the bitcast with
  that GEP:

  ```
  void llvm::Value::doRAUW(llvm::Value*, bool): Assertion `New->getType() == getType() && "replaceAllUses of value with new value of different type!"' failed.
  ```

  Differential Revision: https://reviews.llvm.org/D50058

  llvm-svn: 338395

* [InstCombine] Preserve debug value when simplifying cast-of-select (Vedant Kumar, 2018-07-17, 1 file, -1/+3)

  InstCombine has a cast transform that matches a cast-of-select:

  Orig = cast (Src = select Cond TV FV)

  And tries to replace it with a select which has the cast folded in:

  NewSel = select Cond (cast TV) (cast FV)

  The combiner does RAUW(Orig, NewSel), so any debug values for Orig
  would survive the transform. But debug values for Src would be lost.
  This patch teaches InstCombine to replace all debug uses of Src with
  NewSel (taking care of doing any necessary DIExpression rewriting).

  Differential Revision: https://reviews.llvm.org/D49270

  llvm-svn: 337310
