summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms/InstCombine
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Add cvt.pkrtz intrinsicMatt Arsenault2017-02-221-0/+25
| | | | | | Convert llvm.SI.packf16 test uses llvm-svn: 295797
* [InstCombine] canonicalize non-obivous forms of integer min/maxSanjay Patel2017-02-211-17/+24
| | | | | | | | | | | | | | | | This is part of trying to clean up our handling of min/max patterns in IR. By converting these to canonical form, we're more likely to recognize them because there are various places in InstCombine that don't use matchSelectPattern or m_SMax and friends. The backend fixups referenced in the now deleted TODO comment were added with: https://reviews.llvm.org/rL291392 https://reviews.llvm.org/rL289738 If there's any codegen fallout from this change, we should be able to address it in DAGCombiner or target-specific lowering. llvm-svn: 295758
* [InstCombine] Do not exercise nested max/min pattern on absAnna Thomas2017-02-211-1/+3
| | | | | | | | | | | | | | | | | | | Summary: This is a fix for assertion failure in `getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern. We should not be invoking the simplification rule for ABS(MIN(~ x,y))) or ABS(MAX(~x,y)) combinations. Added a test case which would cause an assertion failure without the patch. Reviewers: sanjoy, majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30051 llvm-svn: 295719
* [InstCombine] add nsw/nuw X, signbit --> or X, signbitSanjay Patel2017-02-181-2/+9
| | | | | | | | | Changing to 'or' (rather than 'xor' when no wrapping flags are set) allows icmp simplifies to happen as expected. Differential Revision: https://reviews.llvm.org/D29729 llvm-svn: 295574
* InstCombine: fix extraction when performing vector/array punningEugene Leviant2017-02-171-1/+1
| | | | | | Differential revision: https://reviews.llvm.org/D29491 llvm-svn: 295429
* InstCombine: Canonicalize fast fmuladd to fmul + faddMatt Arsenault2017-02-161-1/+14
| | | | llvm-svn: 295353
* [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus ↵Craig Topper2017-02-162-4/+9
| | | | | | intrinsics like it does 128/256-bit. llvm-svn: 295294
* [InstCombine] improve formatting; NFCSanjay Patel2017-02-151-6/+3
| | | | llvm-svn: 295237
* [InstCombine] fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2)Sanjay Patel2017-02-121-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | I found one special case of this transform for 'slt 0', so I removed that and added the general transform. Alive code to check correctness: Name: slt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp slt %a, C1 => %b = icmp slt %x, C1 - C2 Name: sgt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp sgt %a, C1 => %b = icmp sgt %x, C1 - C2 http://rise4fun.com/Alive/MH Differential Revision: https://reviews.llvm.org/D29774 llvm-svn: 294898
* [InstCombine] Move class into anonymous namespace. NFC.Benjamin Kramer2017-02-101-0/+2
| | | | | | | | This is necessary to avoid warnings from GCC. InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared with greater visibility than the type of its field 'PointerReplacer::IC' llvm-svn: 294794
* [InstCombine] Silence unused variable warning in Release builds.Benjamin Kramer2017-02-101-0/+2
| | | | llvm-svn: 294788
* Fix invalid addrspacecast due to combining alloca with global varYaxun Liu2017-02-102-7/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | For function-scope variables with large initialisation list, FE usually generates a global variable to hold the initializer, then generates memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst identifies such allocas which are accessed only by reading and replaces them with the global variable. This is done by casting the global variable to the type of the alloca and replacing all references. However, when the global variable is in a different address space which is disjoint with addr space 0 (e.g. for IR generated from OpenCL, global variable cannot be in private addr space i.e. addr space 0), casting the global variable to addr space 0 results in invalid IR for certain targets (e.g. amdgpu). To fix this issue, when the global variable is not in addr space 0, instead of casting it to addr space 0, this patch chases down the uses of alloca until reaching the load instructions, then replaces load from alloca with load from the global variable. If during the chasing bitcast and GEP are encountered, new bitcast and GEP based on the global variable are generated and used in the load instructions. Differential Revision: https://reviews.llvm.org/D27283 llvm-svn: 294786
* [InstCombine] allow (X * C2) << C1 --> X * (C2 << C1) for vectorsSanjay Patel2017-02-091-13/+12
| | | | | | | | | | This fold already existed for vectors but only when 'C1' was a splat constant (but 'C2' could be any constant). There were no tests for any vector constants, so I'm adding a test that shows non-splat constants for both operands. llvm-svn: 294650
* [InstCombine] use m_APInt to allow demanded bits analysis on splat constantsSanjay Patel2017-02-091-10/+13
| | | | llvm-svn: 294628
* [InstCombine] add local name for repeated calls; NFC Sanjay Patel2017-02-081-6/+4
| | | | llvm-svn: 294470
* [InstComobineCalls] Fix buildbot failures after r294453.Igor Laevsky2017-02-081-1/+1
| | | | | | | | Some targets don't support uint64_t options. Change type to unsigned. Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294461
* [InstCombineCalls] Unfold element atomic memcpy instructionIgor Laevsky2017-02-082-0/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294453
* [InstCombineCalls] Remove zero length atomic memcpy intrinsicsIgor Laevsky2017-02-081-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294452
* Fix the -Werror build for some sign-comparisonsDavid Blaikie2017-02-071-1/+1
| | | | llvm-svn: 294331
* [InstCombine] Make max size array combine a tunable.Davide Italiano2017-02-074-3/+13
| | | | | | | Requested by Sanjoy/Hal a while ago, and forgotten by me (r283612). llvm-svn: 294323
* Merge DebugLoc on combined stores; in this case, when combining storesPaul Robinson2017-02-061-1/+4
| | | | | | | | from the end of two blocks, merge instead of arbitrarily picking one. Differential Revision: http://reviews.llvm.org/D29504 llvm-svn: 294251
* [InstCombine] simplify dyn_cast + isa; NFCISanjay Patel2017-02-061-6/+4
| | | | llvm-svn: 294198
* [InstCombine] treat i1 as a special type in shouldChangeType()Sanjay Patel2017-02-031-4/+8
| | | | | | | | | | | | | | | | | | | | This patch is based on the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html Folding to i1 should always be desirable because that's better for value tracking and we have special folds for i1 types. I checked for other users of shouldChangeType() where this might have an effect, but we already handle the i1 case differently than other types in all of those cases. Side note: the default datalayout includes i1, so it seems we only find this gap in shouldChangeType + phi folding for the case when there is (1) an explicit datalayout without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming casted operands (as Björn mentioned). Differential Revision: https://reviews.llvm.org/D29336 llvm-svn: 294066
* [InstCombine] fix operand-complexity-based canonicalization (PR28296)Sanjay Patel2017-02-031-7/+15
| | | | | | | | | | | | | | | | | | | The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg) operators from arguments. Adding another level to the weighting scheme provides more structure and can help simplify the pattern matching in InstCombine and other places. I fixed regressions that would have shown up from this change in: rL290067 rL290127 But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests. Should fix: https://llvm.org/bugs/show_bug.cgi?id=28296 Differential Revision: https://reviews.llvm.org/D27933 llvm-svn: 294049
* [InstCombine] move folds for shift-shift pairs; NFCISanjay Patel2017-02-011-48/+34
| | | | | | | | | | | Although this is 'no-functional-change-intended', I'm adding tests for shl-shl and lshr-lshr pairs because there is no existing test coverage for those folds. It seems like we should be able to remove some code from foldShiftedShift() at this point because we're handling those patterns on the general path. llvm-svn: 293814
* [InstCombine] Allow InstCombine to merge adjacent guardsSanjoy Das2017-02-011-6/+14
| | | | | | | | | | | | | | | | | | | | Summary: If there are two adjacent guards with different conditions, we can remove one of them and include its condition into the condition of another one. This patch allows InstCombine to merge them by the following pattern: guard(a); guard(b) -> guard(a & b). Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29378 llvm-svn: 293778
* [Instcombine] Combine consecutive identical fencesDavide Italiano2017-01-312-0/+10
| | | | | | Differential Revision: https://reviews.llvm.org/D29314 llvm-svn: 293661
* Don't combine stores to a swifterror pointer operand to a different typeArnold Schwaighofer2017-01-311-1/+2
| | | | llvm-svn: 293658
* fix formatting; NFCSanjay Patel2017-01-315-17/+17
| | | | llvm-svn: 293652
* [InstCombine] Make sure that LHS and RHS have the same type inSilviu Baranga2017-01-311-0/+4
| | | | | | | | | | | | | | | | | transformToIndexedCompare If they don't have the same type, the size of the constant index would need to be adjusted (and this wouldn't be always possible). Alternatively we could try the analysis with the initial RHS value, which would guarantee that the two sides have the same type. However it is unlikely that in practice this would pass our transformation requirements. Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808). llvm-svn: 293629
* [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors ↵Sanjay Patel2017-01-301-54/+19
| | | | | | with splat constants llvm-svn: 293570
* [InstCombine] enable more lshr(shl X, C1), C2 folds for vectors with splat ↵Sanjay Patel2017-01-301-23/+17
| | | | | | constants llvm-svn: 293562
* [InstCombine] enable (X >>?exact C1) << C2 --> X >>?exact (C1-C2) for ↵Sanjay Patel2017-01-301-24/+22
| | | | | | vectors with splat constants llvm-svn: 293524
* [InstCombine] use auto with obvious type; NFCSanjay Patel2017-01-301-3/+3
| | | | llvm-svn: 293508
* [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1-C2) for vectors ↵Sanjay Patel2017-01-301-20/+16
| | | | | | with splat constants llvm-svn: 293507
* [InstCombine] fixed to propagate 'exact' on lshrSanjay Patel2017-01-301-1/+1
| | | | | | | | | | | | | | | | | The original shift is bigger, so this may qualify as 'obvious', but here's an attempt at an Alive-based proof: Name: exact Pre: (C1 u< C2) %a = shl i8 %x, C1 %b = lshr exact i8 %a, C2 => %c = lshr exact i8 %x, C2 - C1 %b = and i8 %c, ((1 << width(C1)) - 1) u>> C2 Optimization is correct! llvm-svn: 293498
* [InstCombine] enable lshr(shl X, C1), C2 folds for vectors with splat constantsSanjay Patel2017-01-301-25/+25
| | | | llvm-svn: 293489
* [InstCombine] enable (X >>?,exact C1) << C2 --> X << (C2 - C1) for vectors ↵Sanjay Patel2017-01-291-17/+17
| | | | | | with splats llvm-svn: 293435
* [InstCombine] move icmp transforms that might be recognized as min/max and ↵Sanjay Patel2017-01-271-10/+21
| | | | | | | | | | | | | | | | | | | | | | inf-loop (PR31751) This is a minimal patch to avoid the infinite loop in: https://llvm.org/bugs/show_bug.cgi?id=31751 But the general problem is bigger: we're not canonicalizing all of the min/max forms reported by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own inline definitions which may be subtly different from any of the above. The reason that the test cases in this patch need a cast op to trigger is because we don't (yet) canonicalize all min/max forms based on matchSelectPattern() in canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on matchSelectPattern() in visitSelectInst(). The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so I'm moving those behind the min/max fence in visitICmpInst() as the quick fix. llvm-svn: 293345
* [NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC.Justin Lebar2017-01-271-0/+1
| | | | llvm-svn: 293253
* [NVPTX] Fix use-after-stack-free bug in InstCombineCalls.Justin Lebar2017-01-271-1/+1
| | | | | | Introduced in r293244. llvm-svn: 293251
* [NVPTX] Upgrade NVVM intrinsics in InstCombineCalls.Justin Lebar2017-01-271-0/+250
| | | | | | | | | | | | | | | | | | | | | | | | Summary: There are many NVVM intrinsics that we can't entirely get rid of, but that nonetheless often correspond to target-generic LLVM intrinsics. For example, if flush denormals to zero (ftz) is enabled, we can convert @llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a non-ftz PTX instruction. In this case, we can, however, simplify the non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32. These transformations are particularly useful because they let us constant fold instructions that appear in libdevice, the bitcode library that ships with CUDA and essentially functions as its libm. Reviewers: tra Subscribers: hfinkel, majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D28794 llvm-svn: 293244
* Revert a couple of InstCombine/Guard checkinsSanjoy Das2017-01-261-29/+0
| | | | | | | | | | | | | | | | | | | | | | | | | This change reverts: r293061: "[InstCombine] Canonicalize guards for NOT OR condition" r293058: "[InstCombine] Canonicalize guards for AND condition" They miscompile cases like: ``` declare void @llvm.experimental.guard(i1, ...) define void @test_guard_not_or(i1 %A, i1 %B) { %C = or i1 %A, %B %D = xor i1 %C, true call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ] ret void } ``` because they do transfer the `i32 20, i32 30` parameters to newly created guard instructions. llvm-svn: 293227
* [InstCombine] fold (X >>u C) << C --> X & (-1 << C)Sanjay Patel2017-01-261-18/+17
| | | | | | | | | | | | | | | We already have this fold when the lshr has one use, but it doesn't need that restriction. We may be able to remove some code from foldShiftedShift(). Also, move the similar: (X << C) >>u C --> X & (-1 >>u C) ...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst(). That whole function seems questionable since it is called by commonShiftTransforms(), but there's really not much in common if we're checking the shift opcodes for every fold. llvm-svn: 293215
* [InstCombine] use m_APInt to allow (X << C) >>u C --> X & (-1 >>u C) with ↵Sanjay Patel2017-01-261-16/+24
| | | | | | splat vectors llvm-svn: 293208
* [X86] Add demanded elts support for the inputs to pclmul intrinsicCraig Topper2017-01-261-0/+38
| | | | | | | | This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors. Differential Revision: https://reviews.llvm.org/D28979 llvm-svn: 293151
* [InstCombine] Canonicalize guards for NOT OR conditionArtur Pilipenko2017-01-251-0/+12
| | | | | | | | | | | | This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29075 Patch by Maxim Kazantsev. llvm-svn: 293061
* [InstCombine][SSE] Add support for PACKSS/PACKUS constant foldingSimon Pilgrim2017-01-251-0/+94
| | | | | | Differential Revision: https://reviews.llvm.org/D28949 llvm-svn: 293060
* [InstCombine] Canonicalize guards for AND conditionArtur Pilipenko2017-01-251-0/+17
| | | | | | | | | | | | This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29074 Patch by Maxim Kazantsev. llvm-svn: 293058
* [InstCombine] Allow InstrCombine to remove one of adjacent guards if they ↵Artur Pilipenko2017-01-251-0/+10
| | | | | | | | | | | | | | are equivalent This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: majnemer, apilipenko Differential Revision: https://reviews.llvm.org/D29071 Patch by Maxim Kazantsev. llvm-svn: 293056
OpenPOWER on IntegriCloud