path: root/llvm/test/Transforms/InstCombine
Commit message (Author, Date; Files changed, Lines -deleted/+added)
...
* [ARM,MVE] Add an InstCombine rule permitting VPNOT. (Simon Tatham, 2019-12-02; 1 file, -0/+94)

    Summary: If a user writing C code using the ACLE MVE intrinsics
    generates a predicate and then complements it, then the resulting IR
    will use the `pred_v2i` IR intrinsic to turn some `<n x i1>` vector
    into a 16-bit integer; complement that integer; and convert back.
    This will generate machine code that moves the predicate out of the
    `P0` register, complements it in an integer GPR, and moves it back
    in again.

    This InstCombine rule replaces `i2v(~v2i(x))` with a direct
    complement of the original predicate vector, which we can already
    instruction-select as the VPNOT instruction which complements P0 in
    place.

    Reviewers: ostannard, MarkMurrayARM, dmgreen
    Reviewed By: dmgreen
    Subscribers: kristof.beyls, hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D70484
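    A minimal sketch of the round trip being folded (illustrative IR,
    not taken from the commit; the integer type of the pred intrinsics
    is assumed to be i32 here, with only the low 16 bits in use):

        %int  = call i32 @llvm.arm.mve.pred.v2i.v8i1(<8 x i1> %pred)
        %not  = xor i32 %int, 65535        ; complement in an integer GPR
        %back = call <8 x i1> @llvm.arm.mve.pred.i2v.v8i1(i32 %not)
        ; --> folds to a direct complement of the predicate vector,
        ;     which instruction-selects to VPNOT:
        %back = xor <8 x i1> %pred, <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>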
* [InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() (PR44100) (Roman Lebedev, 2019-12-02; 1 file, -3/+3)

    rL341831 moved the one-use check higher up, restricting a few folds
    that produced a single instruction from two instructions to the case
    where the inner instruction would go away.

    Original commit message:
    > InstCombine: move hasOneUse check to the top of foldICmpAddConstant
    >
    > There were two combines not covered by the check before now,
    > neither of which actually differed from normal in the benefit analysis.
    >
    > The most recent seems to be because it was just added at the top of the
    > function (naturally). The older is from way back in 2008 (r46687)
    > when we just didn't put those checks in so routinely, and has been
    > diligently maintained since.

    From the commit message alone, there doesn't seem to be a deeper
    motivation or deeper problem it was trying to solve, other than
    'fixing the wrong one-use check'. As I briefly discussed on IRC with
    Tim, the original motivation can no longer be recovered; too much
    time has passed. However, I believe the original fold was doing the
    right thing: we should be performing such a transformation even if
    the inner `add` will not go away. That still unchains the comparison
    from the `add`; it will no longer need to wait for the `add` to
    compute. Doing so doesn't seem to break any particular idioms, at
    least as far as I can see.

    References https://bugs.llvm.org/show_bug.cgi?id=44100
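    The shape of the restored fold, in a hypothetical example (an
    equality predicate is used for simplicity, and @use is a placeholder
    for an extra user that keeps the add alive):

        %a = add i32 %x, 5
        call void @use(i32 %a)          ; inner add does not go away
        %c = icmp eq i32 %a, 10
        ; --> the compare no longer waits on the add:
        %c = icmp eq i32 %x, 5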
* [InstCombine] fold copysign with constant sign argument to (fneg+)fabs (Sanjay Patel, 2019-12-02; 1 file, -8/+10)

    If the sign of the sign argument is known (this could be extended to
    use ValueTracking), then we can use fneg+fabs to clear/set the sign
    bit of the magnitude argument.
    http://llvm.org/docs/LangRef.html#llvm-copysign-intrinsic

    This transform is already done in DAGCombiner, but we can do it
    sooner in IR as suggested in PR44153:
    https://bugs.llvm.org/show_bug.cgi?id=44153

    We have effectively no analysis for copysign in IR, so we are taking
    the unusual step of increasing the number of IR instructions for the
    negative constant case.

    Differential Revision: https://reviews.llvm.org/D70792
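    A sketch of both constant-sign cases (assumed examples, not from the
    commit):

        %r = call float @llvm.copysign.f32(float %x, float 1.0)
        ; sign bit of the sign argument known clear -->
        %r = call float @llvm.fabs.f32(float %x)

        %r = call float @llvm.copysign.f32(float %x, float -1.0)
        ; sign bit known set --> fneg+fabs, one extra instruction:
        %t = call float @llvm.fabs.f32(float %x)
        %r = fneg float %t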
* [InstCombine] Fix big-endian miscompile of (bitcast (zext/trunc (bitcast))) (Bjorn Pettersson, 2019-12-02; 1 file, -10/+23)

    Summary: optimizeVectorResize is rewriting patterns like:

        %1 = bitcast vector %src to integer
        %2 = trunc/zext %1
        %dst = bitcast %2 to vector

    Since bitcasting between integer and vector types gives different
    integer values depending on endianness, we need to take endianness
    into account. As it happens, the old implementation only produced
    the correct result for little-endian targets.

    Fixes: https://bugs.llvm.org/show_bug.cgi?id=44178

    Reviewers: spatel, lattner, lebedev.ri
    Reviewed By: spatel, lebedev.ri
    Subscribers: lebedev.ri, hiraditya, uabelho, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D70844
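    To see why endianness matters here, consider this assumed example:
    trunc keeps the low-order bits of the integer, and which vector
    elements those bits came from depends on the target.

        %1   = bitcast <4 x i8> %src to i32
        %2   = trunc i32 %1 to i16
        %dst = bitcast i16 %2 to <2 x i8>
        ; little endian: %dst holds elements 0 and 1 of %src
        ; big endian:    %dst holds elements 2 and 3 of %src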
* [X86][InstCombine] Move non-X86 specific instcombine test from test/CodeGen/X86/ to test/Transforms/InstCombine/ (Craig Topper, 2019-12-01; 1 file, -0/+16)
* [X86][InstCombine] Move instcombine test from test/CodeGen/X86 to test/Transforms/InstCombine/ and replace grep with FileCheck (Craig Topper, 2019-12-01; 1 file, -0/+20)
* [InstCombine] Expand usub_sat patterns to handle constants (David Green, 2019-11-30; 2 files, -23/+16)

    The constants come through as add %x, -C, not a sub as would be
    expected. They need some extra matchers to canonicalise them towards
    usub_sat.

    Differential Revision: https://reviews.llvm.org/D69514
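    For instance, with C = 10 (a sketch of one such pattern, not taken
    from the commit):

        %sub = add i32 %x, -10              ; %x - 10, arriving as add of -C
        %cmp = icmp ult i32 %x, 10
        %res = select i1 %cmp, i32 0, i32 %sub
        ; -->
        %res = call i32 @llvm.usub.sat.i32(i32 %x, i32 10)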
* [InstCombine] Adjust usub_sat fold one use checks (David Green, 2019-11-30; 1 file, -10/+9)

    This adjusts the one-use checks in the usub_sat fold code to not
    increase instruction count, but otherwise do the fold. Reviewed as a
    part of D69514.
* [InstCombine] More usub_sat tests. NFC. (David Green, 2019-11-30; 2 files, -1/+300)
* [InstCombine] Run the cast.ll test twice, now also testing little endian. NFC (Bjorn Pettersson, 2019-11-29; 1 file, -492/+501)

    Some tests in test/Transforms/InstCombine/cast.ll depend on
    endianness. Added a second run line to run the tests with both big
    and little endian. In the past we only compiled for big endian, and
    then it was hard to see if any big-endian bugfixes would impact the
    little-endian result, etc.
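    A sketch of the approach (the actual run lines and data-layout
    strings used by the commit may differ):

        ; RUN: opt < %s -instcombine -S -data-layout="E" | FileCheck %s --check-prefixes=ALL,BE
        ; RUN: opt < %s -instcombine -S -data-layout="e" | FileCheck %s --check-prefixes=ALL,LE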
* [InstCombine] add tests for copysign; NFC (Sanjay Patel, 2019-11-27; 1 file, -0/+41)
* [ConstFolding] move tests for copysign; NFC (Sanjay Patel, 2019-11-26; 1 file, -49/+0)

    InstCombine doesn't have any transforms for copysign currently.
* [InstCombine] Optimize some memccpy calls to memcpy/null (Dávid Bolvanský, 2019-11-26; 1 file, -15/+150)

    Summary:
        return memccpy(d, "helloworld", 'r', 20)
    =>
        return memcpy(d, "helloworld", 8 /* pos of 'r' in string */), d + 8

    Reviewers: efriedma, jdoerfert
    Reviewed By: jdoerfert
    Subscribers: hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D68089
* [InstCombine] remove identity shuffle simplification for mask with undefs (Sanjay Patel, 2019-11-24; 10 files, -28/+53)

    And simultaneously enhance SimplifyDemandedVectorElts() to recognize
    that pattern. That preserves some of the old optimizations in IR.

    Given a shuffle that includes undef elements in an otherwise
    identity mask like:

        define <4 x float> @shuffle(<4 x float> %arg) {
          %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3>
          ret <4 x float> %shuf
        }

    We were simplifying that to the input operand. But as discussed in
    PR43958 (https://bugs.llvm.org/show_bug.cgi?id=43958), that means
    that per-vector-element poison that would be stopped by the shuffle
    can now leak to the result.

    Also note that we still have (and there are tests for) the same
    transform with no undef elements in the mask (a fully-defined
    identity mask). I don't think there's any controversy about that
    case - it's a valid transform under any interpretation of
    shufflevector/undef/poison.

    Looking at a few of the diffs into codegen, I don't see any
    difference in final asm. So depending on your perspective, that's
    good (no real loss of optimization power) or bad (poison exists in
    the DAG, so we only partially fixed the bug).

    Differential Revision: https://reviews.llvm.org/D70246
* [InstCombine] Fix call guard difference with dbg (Davide Italiano, 2019-11-22; 1 file, -0/+1)

    Patch by Chris Ye!

    Differential Revision: https://reviews.llvm.org/D68004
* Precommit tests for forthcoming widenable.condition transforms (Philip Reames, 2019-11-20; 1 file, -0/+156)
* [ARM,MVE] Add InstCombine rules for pred_i2v / pred_v2i. (Simon Tatham, 2019-11-18; 1 file, -0/+236)

    If you're writing C code using the ACLE MVE intrinsics that passes
    the result of a vcmp as input to a predicated intrinsic, e.g.

        mve_pred16_t pred = vcmpeqq(v1, v2);
        v_out = vaddq_m(v_inactive, v3, v4, pred);

    then clang's codegen for the compare intrinsic will create calls to
    `@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
    `mve_pred16_t` integer representation, and then the next intrinsic
    will call `@llvm.arm.mve.pred.i2v` to convert it straight back
    again. This will be visible in the generated code as a `vmrs`/`vmsr`
    pair that move the predicate value pointlessly out of `p0` and back
    into it again.

    To prevent that, I've added InstCombine rules to remove round trips
    of the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught
    InstCombine about the known and demanded bits of those intrinsics.
    As a result, you now get just the generated code you wanted:

        vpt.u16 eq, q1, q2
        vaddt.u16 q0, q3, q4

    Reviewers: ostannard, MarkMurrayARM, dmgreen
    Reviewed By: dmgreen
    Subscribers: kristof.beyls, hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D70313
* [InstCombine] prevent crashing/assert on shift constant expression (PR44028) (Sanjay Patel, 2019-11-17; 1 file, -0/+17)

    The binary operator cast implies an instruction, but the matcher for
    shift does not: https://bugs.llvm.org/show_bug.cgi?id=44028
* [ConstantFold] Handle identity folds at top of ConstantFoldBinaryInst (Florian Hahn, 2019-11-17; 1 file, -14/+10)

    Currently we miss folds with undef and identity values for binary
    ops that do not fold to undef in general. We can generalize the
    identity simplifications and do them before checking for undef in
    particular.

    Alive checks:
    * OR  - https://rise4fun.com/Alive/8OsK
    * AND - https://rise4fun.com/Alive/e3tE

    This will also allow us to remove some now-redundant cases
    throughout the function, but I would like to do that as a follow-up.
    That should make tracking down potential issues easier.

    Reviewers: spatel, RKSimon, lebedev.ri
    Reviewed By: spatel
    Differential Revision: https://reviews.llvm.org/D70169
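    For example (a sketch of my reading of the change): running the
    identity folds first means these simplify to the undef operand
    itself, rather than being routed through the generic undef handling:

        or  i32 undef, 0    ; --> undef   (0 is the identity for or)
        and i32 undef, -1   ; --> undef   (-1 is the identity for and)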
* [InstCombine] Canonicalize ssub.with.overflow with clamp to ssub.sat (David Green, 2019-11-17; 1 file, -54/+9)

    Working on top of D69252, this adds canonicalisation patterns for
    ssub.with.overflow to ssub.sat.

    Differential Revision: https://reviews.llvm.org/D69753
* [InstCombine] Canonicalize sadd.with.overflow with clamp to sadd.sat (David Green, 2019-11-17; 1 file, -54/+9)

    This adds to D69245, adding extra signed patterns for folding from a
    sadd_with_overflow to a sadd_sat. These are more complex than the
    unsigned patterns, as the overflow can occur in either direction.

    For the add case, positive overflow can only occur if both of the
    values are positive (and likewise negative overflow only if both are
    negative), so there is an extra select on whether to use the
    positive or negative overflow limit.

    Differential Revision: https://reviews.llvm.org/D69252
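    The signed clamp pattern being matched looks roughly like this (a
    sketch; operand order and select structure vary across the handled
    patterns):

        %agg = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 %y)
        %val = extractvalue { i32, i1 } %agg, 0
        %ovf = extractvalue { i32, i1 } %agg, 1
        %neg = icmp slt i32 %y, 0
        %lim = select i1 %neg, i32 -2147483648, i32 2147483647
        %res = select i1 %ovf, i32 %lim, i32 %val
        ; -->
        %res = call i32 @llvm.sadd.sat.i32(i32 %x, i32 %y)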
* [InstCombine] Add extra tests for overflow_to_sat.ll. NFC (David Green, 2019-11-17; 1 file, -0/+76)
* [InstCombine] regenerate test CHECKs; NFC (Sanjay Patel, 2019-11-14; 9 files, -501/+502)

    There's a discussion about changing a shufflevector transform in
    PR43958: https://bugs.llvm.org/show_bug.cgi?id=43958

    It would protect against our current undef/poison behavior, and
    these are all tests that could be affected.
* Revert "[InstCombine] Fold PHIs with equal incoming pointers"Daniil Suchkov2019-11-141-400/+136
| | | | | This reverts commit a2f6ae9abffcba260c22bb235879f0576bf3b783. It is reverted due to clang-cmake-armv7-selfhost buildbot failure.
* [InstCombine] Fold PHIs with equal incoming pointers (Daniil Suchkov, 2019-11-14; 1 file, -136/+400)

    This is a resubmission of bbb29738b58aaf6f6518269abdcf8f64131665a9,
    which was reverted due to clang test failures. It includes the fix
    and additional IR tests for the missed case.

    Summary: In the case when all incoming values of a PHI are equal
    pointers, this transformation inserts a definition of such a pointer
    right after the definition of the base pointer and replaces both the
    PHI and all its incoming pointers with this value. The primary goal
    of this transformation is canonicalization of this pattern in order
    to enable optimizations that can't handle PHIs.

    Non-inbounds pointers aren't currently supported.

    Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
    Reviewed By: apilipenko
    Tags: #llvm
    Subscribers: hiraditya, llvm-commits
    Differential Revision: https://reviews.llvm.org/D68128
* Check result of emitStrLen before passing it to CreateGEP (Dimitry Andric, 2019-11-14; 1 file, -0/+15)

    Summary: This fixes PR43081, where the transformation of
    `strchr(p, 0) -> p + strlen(p)` can cause a segfault if
    `-fno-builtin-strlen` is used. In that case, `emitStrLen` returns
    nullptr, which CreateGEP is not designed to handle. Also add the
    minimized code from the PR as a test case.

    Reviewers: xbolva00, spatel, jdoerfert, efriedma
    Reviewed By: efriedma
    Subscribers: lebedev.ri, hiraditya, cfe-commits, llvm-commits
    Tags: #clang, #llvm
    Differential Revision: https://reviews.llvm.org/D70143
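    In IR terms, the fixed behavior is simply to leave the call alone
    when strlen is unavailable (sketch):

        ; strchr(p, 0) normally becomes p + strlen(p), but under
        ; -fno-builtin-strlen emitStrLen returns nullptr, so the
        ; transformation is now skipped instead of crashing:
        %r = call i8* @strchr(i8* %p, i32 0)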
* Add -disable-builtin option to opt (Dimitry Andric, 2019-11-13; 1 file, -0/+21)

    Summary: The option allows disabling specific target library builtin
    functions, instead of -disable-simplify-libcalls, which disables all
    of them. This is a prerequisite for D70143, which fixes PR43081.

    Reviewers: xbolva00, spatel, jdoerfert, efriedma
    Reviewed By: efriedma
    Subscribers: llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D70193
* [InstCombine] propagate fast-math-flags (FMF) to select when inverting fcmp+select (Sanjay Patel, 2019-11-13; 4 files, -72/+61)

    As noted by the FIXME comment, this is not correct based on our
    current FMF semantics. We should be propagating FMF from the final
    value in a sequence (in this case the 'select'). So the behavior
    even without this patch is wrong, but we did not allow FMF on
    'select' until recently.

    But if we do the correct thing right now in this patch, we'll
    inevitably introduce regressions because we have not wired up FMF
    propagation for 'phi' and 'select' in other passes (like
    SimplifyCFG) or other places in InstCombine. I'm not seeing a better
    incremental way to make progress.

    That said, the potential extra damage over the existing wrong
    behavior from this patch is very limited. AFAIK, the only way to
    have different FMF on IR in the same function is if we have LTO
    inlined IR from 2 modules that were compiled using different
    fast-math settings.

    As seen in the tests, we may actually see some improvements with
    this patch because adding the FMF to the 'select' allows matching to
    min/max intrinsics that were previously missed (in the common case,
    the 'fcmp' and 'select' should have identical FMF to begin with).

    Next steps in the transition:
    Make similar changes in instcombine as needed.
    Enable phi-to-select FMF propagation in SimplifyCFG.
    Remove dependencies on fcmp with FMF.
    Deprecate FMF on fcmp.

    Differential Revision: https://reviews.llvm.org/D69720
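    A sketch of the inversion in question (assumed example; with 'fast'
    implying nnan, the swapped predicate is sound):

        %cmp = fcmp fast olt float %x, %y
        %sel = select i1 %cmp, float %y, float %x
        ; inverting the compare and swapping the select arms now also
        ; carries the fcmp's FMF onto the select, which lets it match
        ; e.g. a maxnum idiom:
        %cmp = fcmp fast oge float %x, %y
        %sel = select fast i1 %cmp, float %x, float %y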
* [InstCombine] Avoid moving ops that do restrict undef across shuffles. (Florian Hahn, 2019-11-13; 1 file, -13/+32)

    I think we have to be a bit more careful when it comes to moving ops
    across shuffles, if the op does restrict undef. For example, without
    this patch, we would move 'and %v, <0, 0, -1, -1>' over a
    'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the
    first 2 lanes of the result are undef after the combine, but they
    really should be 0, unless I am missing something.

    For ops that do fold to undef on undef operands, the current
    behavior should be fine. I've added a conservative check,
    OpDoesRestrictUndef; maybe there's a better existing utility?

    Reviewers: spatel, RKSimon, lebedev.ri
    Reviewed By: spatel
    Differential Revision: https://reviews.llvm.org/D70093
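    The example from the description, written out (a sketch):

        %shuf = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 2>
        %and  = and <4 x i32> %shuf, <i32 0, i32 0, i32 -1, i32 -1>
        ; hoisting the 'and' before the shuffle would leave lanes 0 and 1
        ; undef, but 'and' with 0 restricts undef: those lanes must be 0.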
* [InstCombine] Precommit shuffle tests for D70093. (Florian Hahn, 2019-11-13; 1 file, -0/+186)
* Revert 57dd4b0 "[ValueTracking] Allow context-sensitive nullness check for non-pointers" (Hans Wennborg, 2019-11-13; 1 file, -2/+2)

    This caused miscompiles of Chromium (https://crbug.com/1023818). The
    reduced repro is small enough to fit here:

        $ cat /tmp/a.c
        unsigned char f(unsigned char *p) {
          unsigned char result = 0;
          for (int shift = 0; shift < 1; ++shift)
            result |= p[0] << (shift * 8);
          return result;
        }
        $ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
        f:                                      # @f
                .cfi_startproc
        # %bb.0:                                # %entry
                xorl    %eax, %eax
                retq

    That's nicely optimized, but I don't think it's the right result :-)

    > Same as D60846 but with a fix for the problem encountered there,
    > which was a missing context adjustment in the handling of PHI
    > nodes.
    >
    > The test that caused D60846 to be reverted was added in
    > e15ab8f277c7.
    >
    > Reviewers: nikic, nlopes, mkazantsev, spatel, dlrobertson, uabelho, hakzsam
    > Subscribers: hiraditya, bollu, llvm-commits
    > Tags: #llvm
    > Differential Revision: https://reviews.llvm.org/D69571

    This reverts commit 57dd4b03e4806bbb4760ab6150940150d884df20.
* Temporarily revert "[InstCombine] Fold PHIs with equal incoming pointers" (Daniil Suchkov, 2019-11-13; 1 file, -295/+136)

    Revert due to sanitizer-windows buildbot failure.
    This reverts commit bbb29738b58aaf6f6518269abdcf8f64131665a9.
* [InstCombine] Fold PHIs with equal incoming pointers (Daniil Suchkov, 2019-11-13; 1 file, -136/+295)

    In the case when all incoming values of a PHI are equal pointers,
    this transformation inserts a definition of such a pointer right
    after the definition of the base pointer and replaces both the PHI
    and all its incoming pointers with this value. The primary goal of
    this transformation is canonicalization of this pattern in order to
    enable optimizations that can't handle PHIs.

    Non-inbounds pointers aren't currently supported.

    Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
    Reviewed By: apilipenko
    Tags: #llvm
    Subscribers: hiraditya, llvm-commits
    Differential Revision: https://reviews.llvm.org/D68128
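    A sketch of the pattern (assumed names and offsets):

        bb1:
          %gep1 = getelementptr inbounds i32, i32* %base, i64 4
          br label %merge
        bb2:
          %gep2 = getelementptr inbounds i32, i32* %base, i64 4
          br label %merge
        merge:
          %phi = phi i32* [ %gep1, %bb1 ], [ %gep2, %bb2 ]
        ; --> a single GEP emitted right after the definition of %base
        ;     replaces %phi, %gep1 and %gep2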
* [InstCombine] Skip scalable vectors in combineLoadToOperationType (Diana Picus, 2019-11-12; 1 file, -0/+48)

    Don't try to canonicalize loads of scalable vector types to loads of
    integers.

    This removes one assertion when trying to use a TypeSize as a
    parameter to DataLayout::isLegalInteger. It does not handle the
    second part of the function (which looks at bitcasts).

    This patch also contains an NFC fix for Load Analysis, where a
    variable initialization that would cause the same assertion is moved
    closer to its use. This allows us to run the new test for
    InstCombine without having to teach LocationSize to play nicely with
    scalable vectors.

    Differential Revision: https://reviews.llvm.org/D70075
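    The kind of load that is now skipped (sketch):

        %v = load <vscale x 4 x i32>, <vscale x 4 x i32>* %p
        ; not canonicalized to an integer load: a scalable vector has no
        ; fixed bit width to pass to DataLayout::isLegalInteger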
* [NFC][InstCombine] Add tests that show a number of canonicalization opportunities (Daniil Suchkov, 2019-11-12; 1 file, -0/+616)

    Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
    Reviewed-By: apilipenko
    Tags: #llvm
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D68263
* Add InstCombine/InstructionSimplify support for Freeze Instruction (aqjune, 2019-11-12; 1 file, -0/+20)

    Summary:
    - Add llvm::SimplifyFreezeInst
    - Add InstCombiner::visitFreeze
    - Add llvm tests

    Reviewers: majnemer, sanjoy, reames, lebedev.ri, spatel
    Reviewed By: reames, lebedev.ri
    Subscribers: reames, lebedev.ri, filcab, regehr, trentxintong, llvm-commits
    Differential Revision: https://reviews.llvm.org/D29013
* Revert "[InstCombine] avoid crash from deleting an instruction that still ↵Sanjay Patel2019-11-111-35/+0
| | | | | | | | has uses (PR43723) (3rd try)" This reverts commit 3db8a3ef86e7b3331ab466a78c10a62be9e69829. This caused a different memory-sanitizer failure than earlier attempts, but it's still not right.
* [InstCombine] avoid crash from deleting an instruction that still has uses (PR43723) (3rd try) (Sanjay Patel, 2019-11-11; 1 file, -0/+35)

    Re-try because earlier attempts were reverted due to use-after-free.
    Hopefully, diagnosed correctly this time: we replace/remove the
    invariant.start first rather than the invariant.end to avoid
    angering worklist-based iteration.

    We gather a set of white-listed instructions in
    isAllocSiteRemovable() and then replace/erase them. But we don't
    know in general if the instructions in the set have uses amongst
    themselves, so order of deletion makes a difference.

    There's already a special-case for the llvm.objectsize intrinsic, so
    add another for llvm.invariant.start.

    Should fix: https://bugs.llvm.org/show_bug.cgi?id=43723

    Differential Revision: https://reviews.llvm.org/D69977
* [InstCombine] Simplify binary op when only one operand is a select (Jay Foad, 2019-11-11; 2 files, -9/+4)

    Summary: SimplifySelectsFeedingBinaryOp simplified binary ops when
    both operands were selects with the same condition. This patch
    extends it to handle these cases where only one operand is a select:

        X op (C ? P : Q) -> C ? (X op P) : (X op Q) // if X op P and X op Q both simplify
        (C ? P : Q) op Y -> C ? (P op Y) : (Q op Y) // if P op Y and Q op Y both simplify

    For example:

        X *fast (C ? 1.0 : 0.0) -> C ? X : 0.0

    Reviewers: mcberg2017, majnemer, craig.topper, qcolombet, mcrosier
    Subscribers: hiraditya, llvm-commits
    Tags: #llvm
    Differential Revision: https://reviews.llvm.org/D64713
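    The fast-math example from the summary, spelled out in IR (a
    sketch):

        %sel = select i1 %c, float 1.0, float 0.0
        %mul = fmul fast float %x, %sel
        ; both arms simplify (x*1.0 -> x; x*0.0 -> 0.0 under fast), so:
        %mul = select i1 %c, float %x, float 0.0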
* [InstCombine] Turn (extractelement <1 x i64/double> (bitcast (x86_mmx))) into a single bitcast from x86_mmx to i64/double. (Craig Topper, 2019-11-10; 1 file, -2/+1)

    The _m64 type is represented in IR as <1 x i64>. The x86-64 ABI on
    Linux passes <1 x i64> as a double. MMX intrinsics use the x86_mmx
    type in IR. These things result in a lot of bitcasts in mmx code.
    There's another instcombine that tries to turn bitcast <1 x i64> to
    double into extractelement and a bitcast. The combine here tries to
    reverse this extractelement conversion if we see an mmx type.
* [InstCombine] Add a test case for suboptimal handling of (double (bitcast (<1 x i64> (bitcast (x86_mmx))))) (Craig Topper, 2019-11-10; 1 file, -1/+14)

    The outer bitcast gets turned into an extractelement and another
    bitcast rather than combining away to a single bitcast from mmx to
    double.
* Revert "[InstCombine] avoid crash from deleting an instruction that still ↵Sanjay Patel2019-11-101-39/+0
| | | | | | | has uses (PR43723) (2nd try)" This reverts commit 56b2aee1875a1ee47ddf859a6584f8728787fb7b. Still causes a use-after-free on sanitizer bots.
* [InstCombine] avoid crash from deleting an instruction that still has uses (PR43723) (2nd try) (Sanjay Patel, 2019-11-10; 1 file, -0/+39)

    Re-try rGef02831f0a4e (reverted due to use-after-free), but bail out
    completely if we encounter an unexpected llvm.invariant.start.

    We gather a set of white-listed instructions in
    isAllocSiteRemovable() and then replace/erase them. But we don't
    know in general if the instructions in the set have uses amongst
    themselves, so order of deletion makes a difference.

    There's already a special-case for the llvm.objectsize intrinsic, so
    add another for llvm.invariant.end.

    Should fix: https://bugs.llvm.org/show_bug.cgi?id=43723

    Differential Revision: https://reviews.llvm.org/D69977
* Revert "[InstCombine] avoid crash from deleting an instruction that still ↵Sanjay Patel2019-11-101-35/+0
| | | | | | | has uses (PR43723)" This reverts commit ef02831f0a4e3b3ccaa45a5589e4cabecbf527ab. Sanitizer bots fail with this change.
* [InstCombine] avoid crash from deleting an instruction that still has uses (PR43723) (Sanjay Patel, 2019-11-10; 1 file, -0/+35)

    We gather a set of white-listed instructions in
    isAllocSiteRemovable() and then replace/erase them. But we don't
    know in general if the instructions in the set have uses amongst
    themselves, so order of deletion makes a difference.

    There's already a special-case for the llvm.objectsize intrinsic, so
    add another for llvm.invariant.end.

    Should fix: https://bugs.llvm.org/show_bug.cgi?id=43723

    Differential Revision: https://reviews.llvm.org/D69977
* [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement (Craig Topper, 2019-11-07; 1 file, -4/+2)

    x86_mmx is conceptually a vector already. Don't introduce an extra
    conversion between it and scalar i64.

    I'm using VectorType::isValidElementType, which checks for floating
    point, integer, and pointers, to hopefully make this more readable
    than just blacklisting x86_mmx.

    Differential Revision: https://reviews.llvm.org/D69964
* [InstCombine] auto-generate complete checks; NFC (Sanjay Patel, 2019-11-07; 1 file, -10/+9)
* [InstCombine] Add test cases to show bad canonicalization of bitcasts between x86_mmx and <1 x i64>. (Craig Topper, 2019-11-07; 1 file, -0/+19)

    As the test cases show, we end up with an insert/extract and a
    bitcast to/from i64. x86_mmx is for some purposes conceptually a
    vector. We shouldn't be adding scalar conversions around it. Since
    _m64 is defined as <1 x i64> and intrinsics use x86_mmx as their
    input/output, these extra scalar operations prevent the X86 backend
    from generating good code, especially on 32-bit targets where i64
    gets split.
* Wrong debug info generated at -O2 (-O0 is correct) (Vedant Kumar, 2019-11-07; 4 files, -5/+62)

    The instcombine pass was erasing trivially dead instructions without
    updating dependent llvm.dbg.value intrinsics, which meant the
    debugger did not show the programmer the current state of variables.
    As part of this fix, before deleting such an instruction, we iterate
    through all its llvm.dbg users and set each of them to undef. Users
    will now see "optimized out" when they try to print those variables.

    This fixes https://bugs.llvm.org/show_bug.cgi?id=43893

    This is my first fix to llvm.

    Patch by kamlesh kumar!

    Differential Revision: https://reviews.llvm.org/D69809
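    Illustrative IR (assumed example; !10 stands in for the variable's
    metadata):

        %mul = mul i32 %x, %y      ; trivially dead: only debug uses
        call void @llvm.dbg.value(metadata i32 %mul, metadata !10, metadata !DIExpression())
        ; before: %mul was erased, leaving the dbg.value detached
        ; after:  the dbg.value operand is set to undef first, so the
        ;         debugger reports the variable as optimized out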
* [InstCombine] canonicalize shift+logic+shift to reduce dependency chain (Sanjay Patel, 2019-11-07; 2 files, -31/+31)

    shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)

    This is an IR translation of an existing SDAG transform added in
    rL370617, so we again have 9 possible patterns with a commuted IR
    variant of each pattern:
    https://rise4fun.com/Alive/VlI
    https://rise4fun.com/Alive/n1m
    https://rise4fun.com/Alive/1Vn

    Part of the motivation is to allow easier recognition and subsequent
    canonicalization of bswap patterns as discussed in PR43146:
    https://bugs.llvm.org/show_bug.cgi?id=43146

    We had to delay this transform because it used to allow the SLP
    vectorizer to create awful reductions out of simple load-combines.
    That problem was fixed with rL375025 (we'll bring back load
    combining in IR someday...).

    The backend is also better equipped to deal with these patterns now,
    using hooks like TLI.getShiftAmountThreshold().

    The only remaining potential controversy is that the -reassociate
    pass tends to reverse this kind of pattern (to help GVN?). But since
    -reassociate doesn't do anything with these specific patterns, there
    is no conflict currently.

    Finally, there's a new pass proposal at D67383 for general
    tree-height-reduction reassociation, and it could use a cost model
    to decide how to optimally rearrange these kinds of ops for a
    target. That patch appears to be stalled.

    Differential Revision: https://reviews.llvm.org/D69842
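    One of the nine patterns, concretely (a sketch):

        %s1 = shl i32 %x, 3
        %l  = and i32 %s1, %y
        %r  = shl i32 %l, 2
        ; --> both shifts of %x merge, and %y's shift runs in parallel:
        %t1 = shl i32 %x, 5
        %t2 = shl i32 %y, 2
        %r  = and i32 %t1, %t2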