path: root/llvm/lib/Transforms/InstCombine
Commit message | Author | Date | Files changed | Lines (-/+)
* [InstCombine] Call getCmpPredicateForMinMax only with a valid SPF (Sanjoy Das, 2015-12-05, 1 file, -1/+5)
    Summary: There are `SelectPatternFlavor`s that don't represent min or max
    idioms, and we should not be passing those to `getCmpPredicateForMinMax`.
    Fixes PR25745.
    Reviewers: majnemer
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D15249
    llvm-svn: 254869
* Move EH-specific helper functions to a more appropriate place (David Majnemer, 2015-12-02, 1 file, -1/+1)
    No functionality change is intended.
    llvm-svn: 254562
* Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2. (David Majnemer, 2015-12-02, 1 file, -4/+4)
    Differential Revision: http://reviews.llvm.org/D14223
    Patch by Amaury SECHET!
    llvm-svn: 254518
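    As an illustrative sketch (hypothetical values, not taken from the commit),
    the preferred form of the fold now rewrites:
      %c1 = icmp eq i32 %a, 4
      %c2 = icmp eq i32 %a, 6      ; C1 ^ C2 = 4 ^ 6 = 2, a power of 2
      %or = or i1 %c1, %c2
    into a single compare against C1 with the differing bit masked off:
      %m  = and i32 %a, -3         ; -3 == ~(4 ^ 6)
      %or = icmp eq i32 %m, 4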
* [AttributeSet] Overload AttributeSet::addAttribute to reduce compile time. (Akira Hatanaka, 2015-12-02, 1 file, -7/+14)
    The new overloaded function is used when an attribute is added to a large
    number of slots of an AttributeSet (for example, to function parameters).
    This is much faster than calling AttributeSet::addAttribute once per slot,
    because AttributeSet::getImpl (which calls FoldingSet::FindNodeOrInsertPos)
    is called only once per function instead of once per slot. With this commit,
    clang compiles a file in just 13 seconds that used to take over 22 minutes.
    rdar://problem/23581000
    Differential Revision: http://reviews.llvm.org/D15085
    llvm-svn: 254491
* fix typos in comments; NFC (Sanjay Patel, 2015-11-29, 1 file, -6/+8)
    llvm-svn: 254266
* [OperandBundles] Extract duplicated code into a helper function, NFC (Sanjoy Das, 2015-11-25, 1 file, -5/+1)
    llvm-svn: 254047
* [InstCombine] Don't drop operand bundles (Sanjoy Das, 2015-11-25, 1 file, -3/+10)
    Reviewers: majnemer
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D14857
    llvm-svn: 254046
* [InstCombine] fix propagation of fast-math-flags (Sanjay Patel, 2015-11-24, 1 file, -10/+5)
    Noticed while working on D4583: http://reviews.llvm.org/D4583
    llvm-svn: 253997
* use ternary ops; NFC (Sanjay Patel, 2015-11-21, 1 file, -8/+2)
    llvm-svn: 253787
* remove unnecessary temp variables; NFC (Sanjay Patel, 2015-11-21, 1 file, -4/+2)
    llvm-svn: 253786
* fix typo; NFC (Sanjay Patel, 2015-11-21, 1 file, -1/+1)
    llvm-svn: 253785
* Revert "Change memcpy/memset/memmove to have dest and source alignments."Pete Cooper2015-11-192-17/+14
| | | | | | | | | | This reverts commit r253511. This likely broke the bots in http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202 http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787 llvm-svn: 253543
* Change memcpy/memset/memmove to have dest and source alignments. (Pete Cooper, 2015-11-18, 2 files, -14/+17)
    Note: this was reviewed (and more details are available) at
    http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

    These intrinsics currently have an explicit alignment argument, which is
    required to be a constant integer. It represents the alignment of the source
    and dest, and so must be the minimum of those.

    This change allows source and dest to each have their own alignments by using
    the alignment attribute on their arguments. The alignment argument itself is
    removed.

    There are a few places in the code that need to be checked by an expert as to
    whether using only src/dest alignment is safe. Those places currently take
    the minimum of the src/dest alignments, which matches the previous behaviour.

    For example, code which used to read:
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
    will now read:
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

    For out-of-tree owners, I was able to strip alignment from calls using sed by
    replacing:
      (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
    with:
      $1i1 false)
    and similarly for memmove and memcpy. I then added alignment back to the test
    cases which needed it.

    A similar commit will be made to clang, which actually has many differences
    in alignment, as IRBuilder can now generate different source/dest alignments
    on calls.

    In IRBuilder itself, a new argument was added. Instead of calling:
      CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
    you now call:
      CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

    There is a temporary class (IntegerAlignment) which takes the source alignment
    and rejects implicit conversion from bool. This prevents the isVolatile
    default parameter from silently binding to the source alignment argument.

    Changes can now be made to codegen as well. I didn't change anything here, but
    this change should enable better memcpy code sequences.

    Reviewed by Hal Finkel.
    llvm-svn: 253511
* [InstCombine] refactor optimizeIntToFloatBitCast(); NFCI (Sanjay Patel, 2015-11-18, 1 file, -38/+29)
    The logic for handling the pattern without a shift is identical to the logic
    for handling the pattern with a shift if you set the shift amount to zero for
    the former.
    This should make it easier to see that we probably don't even need
    optimizeIntToFloatBitCast(). If we call something like foldVecTruncToExtElt()
    from visitTrunc(), we'll solve PR25543:
    https://llvm.org/bugs/show_bug.cgi?id=25543
    llvm-svn: 253403
* [EH] Keep filter clauses for types that have been caught. (Andrew Kaylor, 2015-11-17, 1 file, -4/+18)
    The instruction combiner previously removed types from filter clauses in
    Landing Pad instructions if the type had previously been seen in a catch
    clause. This is incorrect and prevents unexpected exception handlers from
    rethrowing the caught type.
    Differential Revision: http://reviews.llvm.org/D14669
    llvm-svn: 253370
* fix typos; NFC (Sanjay Patel, 2015-11-17, 1 file, -2/+2)
    llvm-svn: 253359
* use local variables; NFCI (Sanjay Patel, 2015-11-17, 1 file, -7/+7)
    llvm-svn: 253356
* function names start with a lower case letter; NFC (Sanjay Patel, 2015-11-17, 1 file, -20/+20)
    llvm-svn: 253348
* use range-based for loop; NFCI (Sanjay Patel, 2015-11-16, 1 file, -2/+2)
    llvm-svn: 253256
* Fixed GEP visitor in the InstCombine pass. (Elena Demikhovsky, 2015-11-15, 1 file, -5/+10)
    The current implementation of the GEP visitor in InstCombine fails with an
    assertion on a vector GEP with a mix of scalar and vector types, like this:
      getelementptr double, double* %a, <8 x i32> %i
    (It fails to create a "sext" from <8 x i32> to <8 x i64>.)
    I fixed it and added some tests.
    Differential Revision: http://reviews.llvm.org/D14485
    llvm-svn: 253162
* [InstCombine] Add trivial folding (bitreverse (bitreverse x)) -> x (James Molloy, 2015-11-12, 1 file, -0/+10)
    There are plenty more instcombines we could probably do with bitreverse, but
    this seems like a very obvious and trivial starting point and was brought up
    by Hal in his review.
    llvm-svn: 252879
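    A minimal sketch of the new fold (hypothetical value names; it assumes only
    the llvm.bitreverse intrinsic declared below):
      declare i32 @llvm.bitreverse.i32(i32)
      %r  = call i32 @llvm.bitreverse.i32(i32 %x)
      %rr = call i32 @llvm.bitreverse.i32(i32 %r)   ; InstCombine now folds %rr to %x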
* [InstCombine] Teach FoldPHIArgZextsIntoPHI about EHPads (David Majnemer, 2015-11-07, 1 file, -0/+6)
    FoldPHIArgZextsIntoPHI cannot insert an instruction after the PHI if there is
    an EHPad in the BB. Doing so would result in an instruction inserted after a
    terminator.
    llvm-svn: 252377
* [InstCombine] Don't insert an instruction after a terminator (David Majnemer, 2015-11-06, 1 file, -0/+6)
    We tried to insert a cast of a phi in a block whose terminator is an EHPad.
    This is invalid. Do not attempt the transform in these circumstances.
    llvm-svn: 252370
* [InstCombine] Don't RAUW tokens with undef (David Majnemer, 2015-11-06, 1 file, -2/+3)
    Let SimplifyCFG remove unreachable BBs which define token instructions.
    llvm-svn: 252343
* Fix some Clang-tidy modernize warnings, other minor fixes. (Eugene Zelenko, 2015-11-04, 1 file, -14/+12)
    Fixed warnings are: modernize-use-override, modernize-use-nullptr and
    modernize-redundant-void-arg.
    Differential revision: http://reviews.llvm.org/D14312
    llvm-svn: 252087
* InstCombine: fix sinking of convergent calls (Fiona Glaser, 2015-11-03, 1 file, -0/+6)
    llvm-svn: 251991
* don't repeat function names in comments; NFC (Sanjay Patel, 2015-11-02, 1 file, -2/+2)
    llvm-svn: 251846
* Preserve load alignment and dereferenceable metadata during some transformations (Artur Pilipenko, 2015-11-02, 2 files, -6/+20)
    Reviewed By: hfinkel
    Differential Revision: http://reviews.llvm.org/D13953
    llvm-svn: 251809
* [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs (Silviu Baranga, 2015-10-26, 1 file, -1/+6)
    Summary: InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into
    GEP(GEP(PHI(...))) when possible. However, this may leave the old PHI node
    around. Even if we do end up folding the GEPs, having an extra PHI node might
    not be beneficial. This change makes the transformation more conservative: we
    now only do this if the PHI has only one use, and can therefore be removed
    after the transformation.
    Reviewers: jmolloy, majnemer
    Subscribers: mcrosier, mssimpso, llvm-commits
    Differential Revision: http://reviews.llvm.org/D13887
    llvm-svn: 251281
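    A rough sketch of the pattern being restricted (hypothetical IR, not taken
    from the patch): the fold hoists the trailing constant-offset GEP above a PHI
    of GEPs,
      %p = phi double* [ %gep.a, %bb.a ], [ %gep.b, %bb.b ]
      %q = getelementptr inbounds double, double* %p, i64 4
    and after this change it is only attempted when %p has no other uses, so the
    original PHI can be erased once the GEPs are folded.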
* Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify (Hal Finkel, 2015-10-23, 1 file, -0/+21)
    First, the motivation: LLVM currently does not realize that:
      ((2072 >> (L == 0)) >> 7) & 1 == 0
    where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8,
    the lowest-order bit is always zero.

    There are obviously several ways to go about fixing this, but the generic
    solution pursued in this patch is to teach computeKnownBits something about
    shifts by a non-constant amount. Previously, we would give up completely on
    these. Instead, in cases where we know something about the low-order bits of
    the shift-amount operand, we can combine (and together) the associated
    restrictions for all shift amounts consistent with that knowledge. As a
    further generalization, I refactored all of the logic for all three kinds of
    shifts to have this capability. This works well in the above case, for
    example, because the dynamic shift amount can only be 0 or 1, and thus we can
    say a lot about the known bits of the result.

    This brings us to the second part of this change: even when we know all of
    the bits of a value via computeKnownBits, nothing used to constant-fold the
    result. This introduces the necessary code into InstCombine and InstSimplify.
    I've added it into both because:
      1. InstCombine won't automatically pick up the associated logic in
         InstSimplify (InstCombine uses InstSimplify, but not via the API that
         passes in the original instruction).
      2. Putting the logic in InstCombine allows the resulting simplifications to
         become part of the iterative worklist.
      3. Putting the logic in InstSimplify allows the resulting simplifications
         to be used everywhere else that calls SimplifyInstruction (inlining,
         unrolling, and many others).

    This also requires a small change to our definition of an ephemeral value so
    that we don't break the test case from r246696 (where the icmp feeding the
    @llvm.assume is also feeding a br). Under the old definition, the icmp would
    not be considered ephemeral (because it is used by the br), but this causes
    the assume to remove itself (in addition to simplifying the branch structure),
    and it seems more useful to prevent that from happening.

    llvm-svn: 251146
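    The motivating case as a small IR sketch (hypothetical value names): the
    shift amount is the zext of an i1, so it is known to be 0 or 1, and either
    way bit 0 of the result is zero:
      %c   = icmp eq i32 %L, 0
      %amt = zext i1 %c to i32         ; known to be 0 or 1
      %s1  = lshr i32 2072, %amt       ; 2072 = 0b100000011000
      %s2  = lshr i32 %s1, 7
      %bit = and i32 %s2, 1            ; computeKnownBits can now prove this is 0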
* Use ArrayRef instead of pointer and size. NFC (Craig Topper, 2015-10-22, 1 file, -2/+1)
    llvm-svn: 251029
* [InstCombine] Optimize icmp of inc/dec at RHS (Michael Liao, 2015-10-19, 1 file, -0/+20)
    Allow LLVM to optimize a sequence like the following:
      %inc = add nsw i32 %i, 1
      %cmp = icmp slt i32 %n, %inc
    into:
      %cmp = icmp sle i32 %n, %i
    This case was not handled previously due to the complexity of the computation
    of %n; LLVM could not swap the operands of the icmp accordingly.
    llvm-svn: 250746
* [InstCombine] SSE4A constant folding and conversion to shuffles. (Simon Pilgrim, 2015-10-17, 1 file, -53/+270)
    This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I)
    intrinsics:
      1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and
          'length' operands are constant.
      2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length
          are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI
          if it's useful).
      3 - Constant folding support.
      4 - Add zeroinitializer handling.
    Differential Revision: http://reviews.llvm.org/D13348
    llvm-svn: 250609
* InstCombine: Remove ilist iterator implicit conversions, NFC (Duncan P. N. Exon Smith, 2015-10-13, 8 files, -37/+38)
    Stop relying on implicit conversions of ilist iterators in LLVMInstCombine.
    No functionality change intended.
    llvm-svn: 250183
* [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization (Simon Pilgrim, 2015-10-13, 1 file, -45/+4)
    As discussed in D13348, the INSERTQI range combining code is wrong in that it
    confuses the insertion bit index with an extraction bit index. The remaining
    legal combines are very unlikely (especially once we've converted to shuffles
    in D13348), so I'm removing the optimization.
    llvm-svn: 250160
* [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR (Simon Pilgrim, 2015-10-11, 1 file, -0/+53)
    We now have lowering support for XOP PCOM/PCOMU instructions.
    llvm-svn: 249977
* [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886) (Sanjay Patel, 2015-10-08, 1 file, -4/+19)
    This is a partial fix for PR24886:
    https://llvm.org/bugs/show_bug.cgi?id=24886
    Without this IR transform, the backend (x86 at least) was producing
    inefficient code.
    This patch is making 2 assumptions:
      1. The canonical form of a fabs() operation is, in fact, the LLVM fabs()
         intrinsic.
      2. The high bit of an FP value is always the sign bit; as noted in the bug
         report, this isn't specified by the LangRef.
    Differential Revision: http://reviews.llvm.org/D13076
    llvm-svn: 249702
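    A hedged sketch of the kind of pattern this targets (hypothetical IR; the
    exact matched form is defined by the patch itself): clearing the sign bit
    through an integer bitcast
      %i = bitcast double %x to i64
      %m = and i64 %i, 9223372036854775807   ; 0x7FFFFFFFFFFFFFFF clears only the sign bit
      %f = bitcast i64 %m to double
    can be replaced with a call to the intrinsic:
      %f = call double @llvm.fabs.f64(double %x)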
* InstCombine: Fold comparisons between unguessable allocas and other pointers (Hans Wennborg, 2015-10-07, 2 files, -0/+89)
    This will allow us to optimize code such as:

      int f(int *p) {
        int x;
        return p == &x;
      }

    as well as:

      int *allocate(void);
      int f() {
        int x;
        int *p = allocate();
        return p == &x;
      }

    The folding can only be done under certain circumstances. Even though p and
    &x cannot alias, the comparison must still return true if the pointer
    representations are equal. If a user successfully generates a p that's a
    correct guess for &x, the comparison should return true even though p is an
    invalid pointer.

    This patch argues that if the address of the alloca isn't observable outside
    the function, the function can act as if the address is impossible to guess
    from the outside. The tricky part is keeping the act consistent: if we fold
    p == &x to false in one place, we must make sure to fold any other comparisons
    based on those pointers similarly. To ensure that, we only fold when &x is
    involved in exactly one comparison instruction.

    Differential Revision: http://reviews.llvm.org/D13358
    llvm-svn: 249490
* Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups. (Hans Wennborg, 2015-10-06, 1 file, -13/+1)
    Patch by Eugene Zelenko!
    Differential Revision: http://reviews.llvm.org/D13321
    llvm-svn: 249482
* [WinEH] Recognize CoreCLR personality function (Joseph Tremoulet, 2015-10-06, 1 file, -0/+1)
    Summary:
      - Add CoreCLR to if/else ladders and switches as appropriate.
      - Rename isMSVCEHPersonality to isFuncletEHPersonality to better reflect
        what it captures.
    Reviewers: majnemer, andrew.w.kaylor, rnk
    Subscribers: pgavlin, AndyAyers, llvm-commits
    Differential Revision: http://reviews.llvm.org/D13449
    llvm-svn: 249455
* [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922) (Andrea Di Biagio, 2015-10-06, 1 file, -1/+7)
    If the mask of a select instruction is a ConstantVector, method
    SimplifyDemandedVectorElts iterates over the mask elements to identify which
    values are selected from the select inputs.
    Before this patch, method SimplifyDemandedVectorElts always used method
    Constant::isNullValue() to check if a value in the mask was zero.
    Unfortunately that method always returns false when called on a ConstantExpr.
    This patch fixes the problem in SimplifyDemandedVectorElts by adding an
    explicit check for ConstantExpr values. Now, if a value in the mask is a
    ConstantExpr, we avoid calling isNullValue() on it.
    Fixes PR24922.
    Differential Revision: http://reviews.llvm.org/D13219
    llvm-svn: 249390
* invariant.group handling in GVN (Piotr Padlewski, 2015-10-02, 1 file, -7/+4)
    The most important part required to make clang devirtualization work ( ͡°͜ʖ ͡°).
    The code is able to find non-local dependencies, but unfortunately, because
    the caller can only handle local dependencies, I had to add some restrictions
    to look for dependencies only in the same BB.
    http://reviews.llvm.org/D12992
    llvm-svn: 249196
* [InstCombine] Remove trivially empty lifetime start/end ranges. (Arnaud A. de Grandmaison, 2015-10-01, 1 file, -0/+23)
    Summary: Some passes may open up opportunities for optimizations, leaving
    empty lifetime start/end ranges. For example, with the following code:

      void foo(char *, char *);
      void bar(int Size, bool flag) {
        for (int i = 0; i < Size; ++i) {
          char text[1];
          char buff[1];
          if (flag)
            foo(text, buff); // BBFoo
        }
      }

    the loop unswitch pass will create 2 versions of the loop, one with
    flag==true, and the other one with flag==false, but always leaving the BBFoo
    basic block, with lifetime ranges covering the scope of the for loop.
    SimplifyCFG will then remove BBFoo in the case where flag==false, but will
    leave the lifetime markers.

    This patch teaches InstCombine to remove trivially empty lifetime marker
    ranges, that is, ranges ending right after they were started (ignoring debug
    info or other lifetime markers in the range).

    This fixes PR24598: excessive compile time after r234581.

    Reviewers: reames, chandlerc
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D13305
    llvm-svn: 249018
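    For illustration (hypothetical IR, not from the patch; the lifetime intrinsic
    signatures are the i8*-based forms used at the time), a trivially empty range
    is a start immediately followed by its matching end, and both markers can be
    dropped:
      %buf = alloca [1 x i8]
      %p   = getelementptr inbounds [1 x i8], [1 x i8]* %buf, i64 0, i64 0
      call void @llvm.lifetime.start(i64 1, i8* %p)
      call void @llvm.lifetime.end(i64 1, i8* %p)   ; nothing in between: both calls are removable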
* [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant. (Andrea Di Biagio, 2015-09-30, 1 file, -0/+41)
    This patch teaches InstCombiner how to convert an SSSE3/AVX2 byte shuffle to
    a builtin shuffle if the mask is constant.
    Converting byte shuffle intrinsic calls to builtin shuffles can help find
    more opportunities for combining shuffles later on in selection dag.
    We may end up with byte shuffles with constant masks as the result of
    inlining.
    Differential Revision: http://reviews.llvm.org/D13252
    llvm-svn: 248913
* [InstCombine] Improve Vector Demanded Bits Through Bitcasts (Simon Pilgrim, 2015-09-29, 1 file, -35/+34)
    Currently SimplifyDemandedVectorElts can only peek through bitcasts if the
    vectors have the same number of elements.
    This patch fixes and enables some existing (disabled) code to support
    bitcasting to vectors with more/fewer elements. It currently only accepts
    cases where the vectors alias cleanly (i.e. the number of elements of one
    vector is an exact multiple of the other's).
    This was added to improve the demanded vector elements support for SSE vector
    shifts, which require the __m128i (<2 x i64>) argument type to be bitcast to
    the vector type for the builtin shift. I've added extra tests for various
    additional bitcasts.
    Differential Revision: http://reviews.llvm.org/D12935
    llvm-svn: 248784
* [InstCombine] fold zexts and constants into a phi (PR24766) (Sanjay Patel, 2015-09-27, 2 files, -0/+70)
    This is one step towards solving PR24766:
    https://llvm.org/bugs/show_bug.cgi?id=24766
    We were not producing the same IR for these two C functions because the store
    to the temp bool causes extra zexts:

      #include <stdbool.h>

      bool switchy(char x1, char x2, char condition) {
        bool conditionMet = false;
        switch (condition) {
          case 0: conditionMet = (x1 == x2); break;
          case 1: conditionMet = (x1 <= x2); break;
        }
        return conditionMet;
      }

      bool switchy2(char x1, char x2, char condition) {
        switch (condition) {
          case 0: return (x1 == x2);
          case 1: return (x1 <= x2);
        }
        return false;
      }

    As noted in the code comments, this test case manages to avoid the more
    general existing phi optimizations where there are only 2 phi inputs or where
    there are no constant phi args mixed in with the cast ops. It seems like a
    corner case, but if we don't catch it, then I don't think we can get
    SimplifyCFG to further optimize towards the canonical form for this function
    shown in the bug report.
    Differential Revision: http://reviews.llvm.org/D12866
    llvm-svn: 248689
* [InstCombine] match De Morgan's Law hidden by zext ops (PR22723) (Sanjay Patel, 2015-09-25, 1 file, -5/+25)
    This is a fix for PR22723:
    https://llvm.org/bugs/show_bug.cgi?id=22723

    My first attempt at this was to change what I thought was the root problem:
      xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32
    ...but we create the opposite pattern in InstCombiner::visitZExt(), so
    infinite loop!

    My next idea was to fix the matchIfNot() implementation in PatternMatch, but
    that would mean potentially returning a different size for the match than
    what was input. I think this would require all users of m_Not to check the
    size of the returned match, so I abandoned that idea.

    I settled on just fixing the exact case presented in the PR. This patch does
    allow the 2 functions in PR22723 to compile identically (x86):

      bool test(bool x, bool y) { return !x | !y; }
      bool test(bool x, bool y) { return !x || !y; }
      ...
      andb %sil, %dil
      xorb $1, %dil
      movb %dil, %al
      retq

    Differential Revision: http://reviews.llvm.org/D12705
    llvm-svn: 248634
* [InstCombine] Recognize another bswap idiom. (Charlie Turner, 2015-09-24, 1 file, -5/+9)
    Summary: The byte-swap recognizer can now notice that this

      uint32_t bswap(uint32_t x) {
        x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
        x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
        return x;
      }

    is a bswap. Fixes PR23863.

    Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin
    Subscribers: majnemer, rengolin, llvm-commits
    Differential Revision: http://reviews.llvm.org/D12637
    llvm-svn: 248482
* [InstCombine] Preserve metadata when merging loads that are phi arguments. (Akira Hatanaka, 2015-09-23, 1 file, -6/+19)
    Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following
    metadata: MD_tbaa, MD_alias_scope, MD_noalias, MD_invariant_load, MD_nonnull,
    MD_range.
    rdar://problem/17617709
    Differential Revision: http://reviews.llvm.org/D12710
    llvm-svn: 248419
* [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR (Simon Pilgrim, 2015-09-23, 1 file, -6/+0)
    This patch removes the x86.sse41.pmovsx* intrinsics, provides a suitable
    upgrade path and updates relevant tests to sign extend a subvector instead.
    LLVM counterpart to D12835.
    Differential Revision: http://reviews.llvm.org/D13002
    llvm-svn: 248368