path: root/llvm/lib/Transforms/InstCombine
Commit message | Author | Age | Files | Lines
* [InstSimplify] analyze (optionally casted) icmps to eliminate obviously false logic (PR27869)
  Author: Sanjay Patel | Age: 2016-06-20 | Files: 1 | Lines: -10/+0
  By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question raised by PR27869:
  https://llvm.org/bugs/show_bug.cgi?id=27869
  ...where InstCombine turns an icmp+zext into a shift, causing us to miss the fold.
  Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
  Differential Revision: http://reviews.llvm.org/D21512
  llvm-svn: 273200
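As a rough illustration of what "obviously false logic" means here: an `and` of two compares of the same value against constants is always false when no value satisfies both compares, and zero-extending each i1 result before the `and` does not change that. The C++ below is a minimal sketch, not the InstructionSimplify.cpp code; it assumes an 8-bit unsigned operand, a hypothetical `Pred` enum, and brute-force set intersection chosen for clarity.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical subset of icmp predicates, applied to an 8-bit unsigned value.
enum class Pred { EQ, NE, ULT, UGT };

// Brute force: the set of all x in [0, 255] for which "icmp Pred x, C" is true.
std::bitset<256> satisfyingSet(Pred p, uint8_t c) {
  const unsigned cu = c;
  std::bitset<256> s;
  for (unsigned x = 0; x < 256; ++x) {
    bool holds = false;
    switch (p) {
      case Pred::EQ:  holds = (x == cu); break;
      case Pred::NE:  holds = (x != cu); break;
      case Pred::ULT: holds = (x < cu);  break;
      case Pred::UGT: holds = (x > cu);  break;
    }
    s[x] = holds;
  }
  return s;
}

// "and (icmp P1 x, C1), (icmp P2 x, C2)" can fold to false when no x satisfies
// both compares. Wrapping each i1 in a zext before the "and" gives the same answer.
bool andIsAlwaysFalse(Pred p1, uint8_t c1, Pred p2, uint8_t c2) {
  return (satisfyingSet(p1, c1) & satisfyingSet(p2, c2)).none();
}
```

For example, `(x == 0) & (x == 2)` and `(x < 3) & (x > 5)` are both reported as always false, while `(x < 5) & (x > 3)` is not (x = 4 satisfies it).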
* InstCombine: Don't strip convergent from intrinsic callsites
  Author: Matt Arsenault | Age: 2016-06-20 | Files: 1 | Lines: -1/+2
  Specific instances of intrinsic calls may want to be convergent, such as certain register reads, but the intrinsic declaration is not.
  llvm-svn: 273188
* Revert "Revert "Revert "InstCombine: Reduce trunc (shl x, K) width."""
  Author: Matt Arsenault | Age: 2016-06-17 | Files: 1 | Lines: -22/+5
  This seems to be causing an infinite loop / crash in instcombine on some bots.
  llvm-svn: 273069
* Revert "Revert "InstCombine: Reduce trunc (shl x, K) width.""
  Author: Matt Arsenault | Age: 2016-06-17 | Files: 1 | Lines: -5/+22
  Reapply r272987. The condition should be in terms of the destination type, and the flags should not be copied.
  llvm-svn: 273045
* [InstCombine] allow more than one use for vector bitcast folding with selects
  Author: Sanjay Patel | Age: 2016-06-17 | Files: 1 | Lines: -13/+35
  The motivating example for this transform is similar to D20774, where bitcasts interfere with a single cmp/select sequence, but in this case we have 2 uses of each bitcast to produce min and max ops:
    define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
      %cmp = fcmp olt <4 x float> %a, %b
      %bc1 = bitcast <4 x float> %a to <4 x i32>
      %bc2 = bitcast <4 x float> %b to <4 x i32>
      %sel1 = select <4 x i1> %cmp, <4 x i32> %bc1, <4 x i32> %bc2
      %sel2 = select <4 x i1> %cmp, <4 x i32> %bc2, <4 x i32> %bc1
      %bc3 = bitcast <4 x float>* %ptr1 to <4 x i32>*
      store <4 x i32> %sel1, <4 x i32>* %bc3
      %bc4 = bitcast <4 x float>* %ptr2 to <4 x i32>*
      store <4 x i32> %sel2, <4 x i32>* %bc4
      ret void
    }
  With this patch, we move the selects up to use the input args, which allows getting rid of all of the bitcasts:
    define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
      %cmp = fcmp olt <4 x float> %a, %b
      %sel1.v = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
      %sel2.v = select <4 x i1> %cmp, <4 x float> %b, <4 x float> %a
      store <4 x float> %sel1.v, <4 x float>* %ptr1, align 16
      store <4 x float> %sel2.v, <4 x float>* %ptr2, align 16
      ret void
    }
  The asm for x86 SSE then improves from:
    movaps %xmm0, %xmm2
    cmpltps %xmm1, %xmm2
    movaps %xmm2, %xmm3
    andnps %xmm1, %xmm3
    movaps %xmm2, %xmm4
    andnps %xmm0, %xmm4
    andps %xmm2, %xmm0
    orps %xmm3, %xmm0
    andps %xmm1, %xmm2
    orps %xmm4, %xmm2
    movaps %xmm0, (%rdi)
    movaps %xmm2, (%rsi)
  To:
    movaps %xmm0, %xmm2
    minps %xmm1, %xmm2
    maxps %xmm0, %xmm1
    movaps %xmm2, (%rdi)
    movaps %xmm1, (%rsi)
  The TODO comments show that we're limiting this transform only to vectors and only to bitcasts because we need to improve other transforms or risk creating worse codegen.
  Differential Revision: http://reviews.llvm.org/D21190
  llvm-svn: 273011
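Why the selects can be hoisted above the bitcasts: a bitcast only reinterprets bits, so selecting between two bitcast values yields the same bits as selecting between the original values and bitcasting once. A small C++ sketch of that equivalence for one `<4 x float>` min-style select, modeling the IR bitcast with `std::memcpy`; the helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using F4 = std::array<float, 4>;
using I4 = std::array<uint32_t, 4>;

// Models "bitcast <4 x float> to <4 x i32>" by copying the raw bytes.
I4 toBits(const F4 &v) {
  I4 r;
  std::memcpy(r.data(), v.data(), sizeof r);
  return r;
}

// Original shape: compare the floats, then select between the bitcast integer vectors.
I4 selectViaBits(const F4 &a, const F4 &b) {
  const I4 ba = toBits(a), bb = toBits(b);
  I4 r;
  for (int i = 0; i < 4; ++i) r[i] = (a[i] < b[i]) ? ba[i] : bb[i];  // fcmp olt + select
  return r;
}

// Folded shape: select the floats directly; no bitcasts are needed.
F4 selectFloats(const F4 &a, const F4 &b) {
  F4 r;
  for (int i = 0; i < 4; ++i) r[i] = (a[i] < b[i]) ? a[i] : b[i];
  return r;
}
```

For inputs without NaNs, `selectViaBits(a, b)` equals `toBits(selectFloats(a, b))`; the second form is the per-lane minimum that x86 can lower to `minps`.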
* Revert "InstCombine: Reduce trunc (shl x, K) width."
  Author: Matt Arsenault | Age: 2016-06-17 | Files: 1 | Lines: -24/+5
  This reverts commit r272987.
  This might be causing crashes on some bots.
  llvm-svn: 272990
* InstCombine: Reduce trunc (shl x, K) width.
  Author: Matt Arsenault | Age: 2016-06-17 | Files: 1 | Lines: -5/+24
  llvm-svn: 272987
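The identity behind reducing the width of trunc (shl x, K): the low bits of a left shift depend only on the low bits of its input, so truncating first gives the same result as long as K is a valid shift amount for the destination type (the reapply, r273045, phrases the condition in terms of the destination type). A C++ sketch for i32 to i16 with hypothetical helper names; it does not model nuw/nsw, which per the same reapply should not be copied onto the narrow shift.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: trunc (shl i32 x, K) to i16. Requires K < 32.
uint16_t truncOfShl(uint32_t x, unsigned k) {
  return static_cast<uint16_t>(x << k);
}

// Narrow form: shl (trunc x to i16), K. Only valid for K < 16, the
// destination bit width; a larger K would be poison for an i16 shl.
uint16_t shlOfTrunc(uint32_t x, unsigned k) {
  const uint16_t narrow = static_cast<uint16_t>(x);
  // 'narrow' promotes to int here; with k < 16 the shifted value stays below 2^31.
  return static_cast<uint16_t>(narrow << k);
}
```

For instance, with x = 0x00018001 and K = 1, both forms return 0x0002: the bit shifted out of the low 16 bits is discarded either way.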
* [InstCombine] Don't widen metadata on store-to-load forwarding
  Author: Eli Friedman | Age: 2016-06-16 | Files: 1 | Lines: -2/+4
  The original check for load CSE or store-to-load forwarding is wrong when the forwarded stored value happened to be a load.
  Ref https://github.com/JuliaLang/julia/issues/16894
  Differential Revision: http://reviews.llvm.org/D21271
  Patch by Yichao Yu!
  llvm-svn: 272868
* [IR] Require ArrayRef of 'uint32_t' instead of 'int' for the mask argument for one of the signatures of CreateShuffleVector.
  Author: Craig Topper | Age: 2016-06-12 | Files: 1 | Lines: -3/+3
  This better emphasises that you can't use it for the -1 as undef behavior.
  llvm-svn: 272491
* [InstCombine] move fold of select of add/sub to helper function; NFCI
  Author: Sanjay Patel | Age: 2016-06-08 | Files: 1 | Lines: -61/+75
  llvm-svn: 272199
* [InstCombine] fix outdated comment, simplify logic; NFCI
  Author: Sanjay Patel | Age: 2016-06-08 | Files: 1 | Lines: -16/+13
  llvm-svn: 272196
* [InstCombine] reduce indent; NFC
  Author: Sanjay Patel | Age: 2016-06-08 | Files: 1 | Lines: -63/+64
  llvm-svn: 272193
* [InstCombine] use copyIRFlags(); NFCI
  Author: Sanjay Patel | Age: 2016-06-08 | Files: 1 | Lines: -12/+2
  llvm-svn: 272191
* Apply most suggestions of clang-tidy's performance-unnecessary-value-param
  Author: Benjamin Kramer | Age: 2016-06-08 | Files: 2 | Lines: -4/+7
  Avoids unnecessary copies. All changes audited & pass tests with asan. No functional change intended.
  llvm-svn: 272190
* Avoid copies of std::strings and APInt/APFloats where we only read from it
  Author: Benjamin Kramer | Age: 2016-06-08 | Files: 3 | Lines: -9/+9
  As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean.
  llvm-svn: 272126
* [InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts
  Author: Simon Pilgrim | Age: 2016-06-07 | Files: 1 | Lines: -0/+125
  Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out-of-range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit).
  If the shift amount is constant we can sometimes convert these instructions to native shifts:
    1 - if all shift amounts are in range then the conversion is trivial.
    2 - out-of-range arithmetic shifts can be clamped to (bitwidth - 1) (a legal shift amount) before conversion.
    3 - logical shifts just return zero if all elements have out-of-range shift amounts.
  In addition, UNDEF shift amounts are handled, either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.
  Differential Revision: http://reviews.llvm.org/D19675
  llvm-svn: 271996
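The conversion rules above can be checked against a per-lane model of the 32-bit forms (VPSLLVD/VPSRLVD/VPSRAVD). This C++ sketch uses hypothetical helper names and one lane at a time; it shows that clamping an arithmetic shift amount to bitwidth - 1 (rule 2) reproduces the out-of-range behaviour, and that an out-of-range logical shift is just zero (rule 3).

```cpp
#include <cassert>
#include <cstdint>

// Per-lane reference semantics of the AVX2 variable shifts on 32-bit lanes:
// unlike native shifts, amounts >= 32 are well defined.
uint32_t psllvLane(uint32_t x, uint32_t amt) { return amt >= 32 ? 0u : x << amt; }
uint32_t psrlvLane(uint32_t x, uint32_t amt) { return amt >= 32 ? 0u : x >> amt; }
int32_t psravLane(int32_t x, uint32_t amt) {
  if (amt >= 32) return x < 0 ? -1 : 0;  // every bit becomes a copy of the sign bit
  return x >> amt;                       // arithmetic shift (guaranteed since C++20)
}

// Rule 2: clamp to bitwidth - 1, then an ordinary in-range ashr gives the same value.
int32_t psravViaNativeShift(int32_t x, uint32_t amt) {
  const uint32_t clamped = amt > 31 ? 31 : amt;
  return x >> clamped;
}
```

No equally simple clamp exists for the logical forms, which is presumably why rule 3 only applies when every lane is out of range and the whole result becomes a zero constant.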
* [InstCombine][SSE] Add MOVMSK constant folding (PR27982)
  Author: Simon Pilgrim | Age: 2016-06-07 | Files: 1 | Lines: -0/+51
  This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions. The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs.
  Differential Revision: http://reviews.llvm.org/D20998
  llvm-svn: 271990
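For a constant input, MOVMSK folds to an integer whose bit i is the sign bit of vector element i, with all higher bits zero. A minimal C++ sketch for a 4-element, 32-bit-lane input (the MOVMSKPS shape), with elements given as raw bit patterns; the function name is hypothetical and undef handling is not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Fold MOVMSKPS on a constant <4 x i32>/<4 x float> (as raw bits):
// result bit i = sign bit (bit 31) of element i; bits 4..31 are zero.
int32_t foldMovmsk4(const std::array<uint32_t, 4> &elts) {
  int32_t mask = 0;
  for (int i = 0; i < 4; ++i)
    mask |= static_cast<int32_t>(elts[i] >> 31) << i;
  return mask;
}
```

For example, the elements `{0x80000000, 0, 0xFFFFFFFF, 1}` fold to 5 (bits 0 and 2 set).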
* [InstCombine] scalarizePHI should not assume the code it sees has been CSE'd
  Author: Michael Kuperstein | Age: 2016-06-06 | Files: 1 | Lines: -12/+26
  scalarizePHI only looked for phis that have exactly two uses: the "latch" use and an extract. Unfortunately, we cannot assume all equivalent extracts are CSE'd, since InstCombine itself may create an extract which is a duplicate of an existing one. This extends it to handle several distinct extracts from the same index.
  This should fix at least some of the performance regressions from PR27988.
  Differential Revision: http://reviews.llvm.org/D20983
  llvm-svn: 271961
* [InstCombine] limit icmp transform to ConstantInt (PR28011)
  Author: Sanjay Patel | Age: 2016-06-06 | Files: 1 | Lines: -3/+5
  In r271810 (http://reviews.llvm.org/rL271810), I loosened the check above this to work for any Constant rather than ConstantInt. AFAICT, that part makes sense if we can determine that the shrunken/extended constant remained equal. But it doesn't make sense for this later transform, where we assume that the constant DID change.
  This could assert for a ConstantExpr: https://llvm.org/bugs/show_bug.cgi?id=28011
  And it could be wrong for a vector, as shown in the added regression test.
  llvm-svn: 271908
* Add safety check to InstCombiner::commonIRemTransforms
  Author: Sanjoy Das | Age: 2016-06-05 | Files: 1 | Lines: -2/+11
  Since FoldOpIntoPhi speculates the binary operation to potentially each of the predecessors of the PHI node (pulling it out of arbitrary control dependence in the process), we can FoldOpIntoPhi only if we know the operation doesn't have UB.
  This also brings up an interesting profitability question: the way it is written today, commonIRemTransforms will hoist out work from dynamically dead code into code that will execute at runtime. Perhaps that isn't the best canonicalization?
  Fixes PR27968.
  llvm-svn: 271857
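What "doesn't have UB" means for a remainder: in LLVM IR, srem/urem by zero is undefined behavior, and srem of the minimum signed value by -1 overflows, which is also UB. The C++ below is a sketch of those conditions for a known constant divisor, not the check this commit added (its code is not shown in this log); the parameter `dividendMayBeSignedMin` is a hypothetical stand-in for whatever the caller knows about the dividend.

```cpp
#include <cassert>
#include <cstdint>

// True if "srem/urem Dividend, Divisor" with a known constant divisor cannot
// trigger UB, so it could be evaluated speculatively (e.g. once per PHI predecessor).
bool isRemSafeToSpeculate(bool isSigned, int64_t divisor, bool dividendMayBeSignedMin) {
  if (divisor == 0)
    return false;  // remainder by zero is UB for both srem and urem
  if (isSigned && divisor == -1 && dividendMayBeSignedMin)
    return false;  // srem INT_MIN, -1 overflows, which is also UB
  return true;
}
```

Note that `urem x, -1` (an all-ones unsigned divisor) is always safe; only the signed form has the overflow case.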
* [InstCombine] allow vector icmp bool transforms
  Author: Sanjay Patel | Age: 2016-06-05 | Files: 1 | Lines: -1/+1
  llvm-svn: 271843
* fix documentation comments and other clean-ups; NFC
  Author: Sanjay Patel | Age: 2016-06-05 | Files: 1 | Lines: -74/+67
  llvm-svn: 271839
* [InstCombine] less 'CI' confusion; NFC
  Author: Sanjay Patel | Age: 2016-06-05 | Files: 1 | Lines: -26/+26
  Change the name of the ICmpInst to 'ICmp' and the Constant (was a ConstantInt) to 'C', so that it's hopefully clearer that 'CI' refers to CastInst in this context.
  While we're scrubbing, fix the documentation comment and use 'auto' with 'dyn_cast'.
  llvm-svn: 271817
* [InstCombine] allow vector constants for cast+icmp fold
  Author: Sanjay Patel | Age: 2016-06-04 | Files: 1 | Lines: -1/+1
  This is step 1 of unknown towards fixing PR28001: https://llvm.org/bugs/show_bug.cgi?id=28001
  llvm-svn: 271810
* clean-up; NFC
  Author: Sanjay Patel | Age: 2016-06-04 | Files: 1 | Lines: -4/+3
  llvm-svn: 271807
* fix formatting, punctuation; NFC
  Author: Sanjay Patel | Age: 2016-06-04 | Files: 1 | Lines: -5/+3
  llvm-svn: 271804
* [InstCombine][MMX] Extend SimplifyDemandedUseBits MOVMSK support to MMX
  Author: Simon Pilgrim | Age: 2016-06-04 | Files: 1 | Lines: -3/+9
  Add the MMX implementation to the SimplifyDemandedUseBits SSE/AVX MOVMSK support added in D19614.
  Requires a minor tweak, as llvm.x86.mmx.pmovmskb takes an x86_mmx argument, so we have to be explicit about the implied v8i8 vector type.
  llvm-svn: 271789
* [InstCombine] look through bitcasts to find selects
  Author: Sanjay Patel | Age: 2016-06-03 | Files: 1 | Lines: -18/+49
  There was concern that creating bitcasts for the simpler potential select pattern:
    define <2 x i64> @vecBitcastOp1(<4 x i1> %cmp, <2 x i64> %a) {
      %a2 = add <2 x i64> %a, %a
      %sext = sext <4 x i1> %cmp to <4 x i32>
      %bc = bitcast <4 x i32> %sext to <2 x i64>
      %and = and <2 x i64> %a2, %bc
      ret <2 x i64> %and
    }
  might lead to worse code for some targets, so this patch is matching the larger patterns seen in the test cases.
  The motivating example for this patch is this IR produced via SSE intrinsics in C:
    define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
      %t0 = bitcast <2 x i64> %a to <4 x i32>
      %t1 = bitcast <2 x i64> %b to <4 x i32>
      %cmp = icmp sgt <4 x i32> %t0, %t1
      %sext = sext <4 x i1> %cmp to <4 x i32>
      %t2 = bitcast <4 x i32> %sext to <2 x i64>
      %and = and <2 x i64> %t2, %a
      %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
      %neg2 = bitcast <4 x i32> %neg to <2 x i64>
      %and2 = and <2 x i64> %neg2, %b
      %or = or <2 x i64> %and, %and2
      ret <2 x i64> %or
    }
  For an AVX target, this is currently:
    vpcmpgtd %xmm1, %xmm0, %xmm2
    vpand %xmm0, %xmm2, %xmm0
    vpandn %xmm1, %xmm2, %xmm1
    vpor %xmm1, %xmm0, %xmm0
    retq
  With this patch, it becomes:
    vpmaxsd %xmm1, %xmm0, %xmm0
  Differential Revision: http://reviews.llvm.org/D20774
  llvm-svn: 271676
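Why the @gibson pattern becomes a single `vpmaxsd`: a sign-extended compare result is all-ones or all-zeros per lane, so `(a & m) | (b & ~m)` is exactly `select(m, a, b)`, and with `m = (a > b)` that select is a signed maximum. A per-lane C++ sketch of the equivalence; the names are hypothetical and the model works on `<4 x i32>` lanes directly, ignoring the `<2 x i64>` bitcasts, which do not change any bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<int32_t, 4>;

// The IR pattern: m = sext(icmp sgt a, b); result = (a & m) | (b & ~m).
V4 blend(const V4 &a, const V4 &b) {
  V4 r;
  for (int i = 0; i < 4; ++i) {
    const int32_t m = a[i] > b[i] ? -1 : 0;  // sext of an i1: all ones or all zeros
    r[i] = (a[i] & m) | (b[i] & ~m);
  }
  return r;
}

// The equivalent select(icmp sgt a, b), a, b: a per-lane signed max.
V4 selectMax(const V4 &a, const V4 &b) {
  V4 r;
  for (int i = 0; i < 4; ++i) r[i] = a[i] > b[i] ? a[i] : b[i];
  return r;
}
```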
* transform obscured FP sign bit ops into a fabs/fneg using TLI hook
  Author: Sanjay Patel | Age: 2016-06-02 | Files: 1 | Lines: -18/+0
  This is effectively a revert of:
  http://reviews.llvm.org/rL249702 - [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
  and:
  http://reviews.llvm.org/rL249701 - [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
  and a reimplementation as a DAG combine for targets that have IEEE754-compliant fabs/fneg instructions.
  This is intended to resolve the objections raised on the dev list:
  http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html
  and:
  https://llvm.org/bugs/show_bug.cgi?id=24886#c4
  In the interest of patch minimalism, I've only partly enabled AArch64. PowerPC, MIPS, x86 and others can enable later.
  Differential Revision: http://reviews.llvm.org/D19391
  llvm-svn: 271573
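The equivalence the DAG combine relies on is bit-level: in IEEE-754 binary32, clearing bit 31 is fabs and flipping it is fneg. That only maps onto real instructions for targets whose fabs/fneg are IEEE754-compliant, which is what the TLI hook gates. The C++ below checks just the arithmetic on the host and assumes the host `float` is IEEE-754 binary32; helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a float as its bit pattern and back (the C++ analogue of an IR bitcast).
uint32_t floatBits(float f) { uint32_t b; std::memcpy(&b, &f, sizeof b); return b; }
float bitsFloat(uint32_t b) { float f; std::memcpy(&f, &b, sizeof f); return f; }

// "and i32 %bits, 0x7FFFFFFF" clears only the sign bit: the same result as fabs.
float clearSignBit(float f) { return bitsFloat(floatBits(f) & 0x7FFFFFFFu); }

// "xor i32 %bits, 0x80000000" flips only the sign bit: the same result as fneg.
float flipSignBit(float f) { return bitsFloat(floatBits(f) ^ 0x80000000u); }
```

Unlike an arithmetic `0.0 - x`, the bitwise forms also handle signed zeros as fabs/fneg do: `clearSignBit(-0.0f)` is +0.0 and `flipSignBit(0.0f)` is -0.0.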
* [InstCombine] remove guard for generating a vector select
  Author: Sanjay Patel | Age: 2016-06-02 | Files: 1 | Lines: -15/+11
  This is effectively NFC because we already do this transform after r175380:
  http://reviews.llvm.org/rL175380
  and also via foldBoolSextMaskToSelect().
  This change should just make it a bit more efficient to match the pattern.
  The original guard was added in r95058:
  http://reviews.llvm.org/rL95058
  A sampling of codegen for current in-tree targets shows no problems. This makes sense given that we're already producing the vector selects via the other transforms.
  llvm-svn: 271554
* X86: permit using SjLj EH on x86 targets as an option
  Author: Saleem Abdulrasool | Age: 2016-05-31 | Files: 1 | Lines: -0/+2
  This adds support to the backend for actually using SjLj EH as an exception model. This is *NOT* the default model, and requires explicitly opting into it from the frontend. GCC supports this model, and for MinGW it can still be enabled via the `--using-sjlj-exceptions` option.
  Addresses PR27749!
  llvm-svn: 271244
* [X86] Remove SSE/AVX unaligned store intrinsics as clang no longer uses them.
  Author: Craig Topper | Age: 2016-05-30 | Files: 1 | Lines: -26/+0
  Auto upgrade to native unaligned store instructions.
  llvm-svn: 271236
* [X86][SSE] (Reapplied) Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
  Author: Simon Pilgrim | Age: 2016-05-28 | Files: 1 | Lines: -44/+0
  This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
  Reapplied now that the companion patch (D20684), which removes/auto-upgrades the clang intrinsics, has been committed.
  Differential Revision: http://reviews.llvm.org/D20686
  llvm-svn: 271131
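The auto-upgrade works because PMOVSX/PMOVZX are expressible as "take the low lanes, then sext/zext each one". A scalar C++ model of the byte-to-dword forms (PMOVSXBD/PMOVZXBD) with hypothetical function names; the other element-width variants follow the same pattern with different lane counts.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// PMOVSXBD: sign-extend the low 4 bytes of a <16 x i8> into <4 x i32>.
// Generic IR shape: shufflevector selecting lanes 0..3, then sext to i32.
std::array<int32_t, 4> pmovsxbd(const std::array<int8_t, 16> &v) {
  std::array<int32_t, 4> r;
  for (int i = 0; i < 4; ++i) r[i] = static_cast<int32_t>(v[i]);  // sext i8 -> i32
  return r;
}

// PMOVZXBD: the same lanes, zero-extended instead.
std::array<uint32_t, 4> pmovzxbd(const std::array<int8_t, 16> &v) {
  std::array<uint32_t, 4> r;
  for (int i = 0; i < 4; ++i) r[i] = static_cast<uint8_t>(v[i]);  // zext i8 -> i32
  return r;
}
```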
* [InstCombine] move and/sext fold to helper function; NFCI
  Author: Sanjay Patel | Age: 2016-05-27 | Files: 1 | Lines: -27/+28
  We need to enhance the pattern matching on these to look through bitcasts.
  llvm-svn: 271051
* Revert: r270973 - [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
  Author: Simon Pilgrim | Age: 2016-05-27 | Files: 1 | Lines: -0/+44
  llvm-svn: 270976
* [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm)
  Author: Simon Pilgrim | Age: 2016-05-27 | Files: 1 | Lines: -44/+0
  This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
  A companion patch (D20684) removes/auto-upgrades the clang intrinsics.
  Differential Revision: http://reviews.llvm.org/D20686
  llvm-svn: 270973
* [InstCombine] Catch more bswap cases missed due to zext and truncs.
  Author: Chad Rosier | Age: 2016-05-26 | Files: 1 | Lines: -14/+28
  Fixes PR27824.
  Differential Revision: http://reviews.llvm.org/D20591
  llvm-svn: 270853
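An assumed example of the kind of pattern the title describes (PR27824's actual test case is not reproduced here): a 16-bit byte swap written with a zext to 32 bits, shifts in the wide type, and a trunc back to 16 bits. The C++ sketch compares it against a byte-by-byte reference; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// bswap of an i16 spelled through a wider type: zext, shift/or, trunc.
uint16_t bswap16ViaWideShifts(uint16_t x) {
  const uint32_t wide = x;                             // zext i16 -> i32
  const uint32_t swapped = (wide >> 8) | (wide << 8);  // bits 16..23 hold junk
  return static_cast<uint16_t>(swapped);               // trunc i32 -> i16 drops it
}

// Reference byte swap built from the two individual bytes.
uint16_t bswap16Ref(uint16_t x) {
  const uint16_t lo = static_cast<uint16_t>(x & 0xFFu);
  const uint16_t hi = static_cast<uint16_t>(x >> 8);
  return static_cast<uint16_t>((lo << 8) | hi);
}
```

A matcher has to see that the bits the wide shift moves above bit 15 are discarded by the trunc; only then is the whole expression a plain i16 bswap.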
* [X86] Add the AVX storeu intrinsics to InstCombine and LoopStrengthReduce in the same places that the SSE/SSE2 storeu intrinsics appear.
  Author: Craig Topper | Age: 2016-05-26 | Files: 1 | Lines: -0/+13
  I don't really know how to test this. Just seemed like we should be consistent.
  llvm-svn: 270819
* Clarify that we match BSwap in InstCombine and BitReverse in CGP. NFC.
  Author: Chad Rosier | Age: 2016-05-25 | Files: 2 | Lines: -6/+6
  Also, rename recognizeBitReverseOrBSwapIdiom to recognizeBSwapOrBitReverseIdiom, so the ordering of the MatchBSwaps and MatchBitReversals arguments is consistent with the function name.
  llvm-svn: 270715
* [InstCombine] Fix assertion when bitcast is converted to gep
  Author: Gerolf Hoflehner | Age: 2016-05-23 | Files: 1 | Lines: -0/+7
  When an aggregate contains an opaque type, its size cannot be determined. This triggers an "Invalid GetElementPtrInst indices for type" assert in function checkGEPType. The fix suppresses the conversion in this case.
  http://reviews.llvm.org/D20319
  llvm-svn: 270479
* reduce indent; NFC
  Author: Sanjay Patel | Age: 2016-05-22 | Files: 1 | Lines: -19/+19
  llvm-svn: 270372
* [InstCombine] Avoid combining the bitcast of a var that is used as both address and result of load instructions
  Author: Guozhi Wei | Age: 2016-05-19 | Files: 1 | Lines: -0/+7
  This patch fixes https://llvm.org/bugs/show_bug.cgi?id=27703.
  If there is a sequence of one or more load instructions in which each loaded value is used as the address of a later load, a bitcast is necessary to change the value type, so don't optimize it.
  llvm-svn: 270135
* [InstCombine] add another test for wrong icmp constant (PR27792)
  Author: Sanjay Patel | Age: 2016-05-17 | Files: 1 | Lines: -1/+1
  It doesn't matter if the comparison is unsigned; the inc/dec is always signed.
  llvm-svn: 269831
* [InstCombine] fix constant to be signed for signed comparisons
  Author: Sanjay Patel | Age: 2016-05-17 | Files: 1 | Lines: -1/+1
  This bug was introduced in r269728 and is the likely cause of many stage 2 ubsan bot failures.
  I'll add a test in a follow-up commit assuming this fixes things properly.
  llvm-svn: 269797
* [InstCombine] Don't crash when trying to take an element of a ConstantExpr.
  Author: Benjamin Kramer | Age: 2016-05-17 | Files: 1 | Lines: -0/+3
  Fixes PR27786.
  llvm-svn: 269757
* try to avoid unused variable warning in release build; NFCI
  Author: Sanjay Patel | Age: 2016-05-17 | Files: 1 | Lines: -1/+2
  llvm-svn: 269729
* [InstCombine] check vector elements before trying to transform LE/GE vector icmp (PR27756)
  Author: Sanjay Patel | Age: 2016-05-17 | Files: 1 | Lines: -78/+42
  Fix a bug introduced with rL269426: [InstCombine] canonicalize* LE/GE vector integer comparisons to LT/GT (PR26701, PR26819)
  We were assuming that a ConstantDataVector / ConstantVector / ConstantAggregateZero operand of an ICMP was composed of ConstantInt elements, but it might have ConstantExpr or UndefValue elements. Handle those appropriately.
  Also, refactor this function to join the scalar and vector paths and eliminate the switches.
  Differential Revision: http://reviews.llvm.org/D20289
  llvm-svn: 269728
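The LE/GE to LT/GT canonicalization adjusts each constant by one, which is only valid when the adjustment does not overflow; for a vector, every element must additionally be a plain integer constant, which is exactly the assumption this commit stops making. A scalar C++ sketch for signed i8 with hypothetical function names (the in-tree code works on APInt values and per-element vector constants).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// "icmp sle x, C" -> "icmp slt x, C+1". Returns nullopt when C+1 would overflow
// (C == INT8_MAX); that compare is always true and cannot be rewritten this way.
std::optional<int8_t> sleToSltConstant(int8_t c) {
  if (c == std::numeric_limits<int8_t>::max()) return std::nullopt;
  return static_cast<int8_t>(c + 1);
}

// "icmp sge x, C" -> "icmp sgt x, C-1", with the mirror-image check at INT8_MIN.
std::optional<int8_t> sgeToSgtConstant(int8_t c) {
  if (c == std::numeric_limits<int8_t>::min()) return std::nullopt;
  return static_cast<int8_t>(c - 1);
}
```

Because the domain is only 256 values, the rewrite can be verified exhaustively: for every non-boundary C, `x <= C` agrees with `x < C+1` for all x, and likewise for the GE form.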
* use 'match' for less indenting; NFCI
  Author: Sanjay Patel | Age: 2016-05-13 | Files: 1 | Lines: -21/+20
  llvm-svn: 269494
* Rename getLargestLegalIntTypeSize to getLargestLegalIntTypeSizeInBits(). NFC.
  Author: Jun Bum Lim | Age: 2016-05-13 | Files: 1 | Lines: -1/+1
  Summary: Rename DataLayout::getLargestLegalIntTypeSize to DataLayout::getLargestLegalIntTypeSizeInBits() to prevent similar mistakes fixed in r269433.
  Reviewers: joker.eph, mcrosier
  Subscribers: mcrosier, llvm-commits
  Differential Revision: http://reviews.llvm.org/D20248
  llvm-svn: 269456
* [InstCombine] handle zero constant vectors for LE/GE comparisons too
  Author: Sanjay Patel | Age: 2016-05-13 | Files: 1 | Lines: -2/+3
  Enhancement to: http://reviews.llvm.org/rL269426
  With discussion in: http://reviews.llvm.org/D17859
  This should complete the fixes for PR26701 and PR26819:
  https://llvm.org/bugs/show_bug.cgi?id=26701
  https://llvm.org/bugs/show_bug.cgi?id=26819
  llvm-svn: 269439