path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
...
* PGOMemOPSizeOpt - silence static analyzer dyn_cast<MemIntrinsic> null ↵Simon Pilgrim2019-09-261-2/+2
| | | | | | | | dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemIntrinsic> directly and if not assert will fire for us. llvm-svn: 372959
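The difference between the two casts that this commit relies on can be illustrated outside LLVM. The sketch below is a stand-in, not LLVM's implementation: `try_cast` and `checked_cast` are hypothetical helpers built on `dynamic_cast`, and the small class hierarchy is invented for illustration. They mimic how `dyn_cast<>` returns null on a type mismatch, while `cast<>` asserts that the type matches.

```cpp
#include <cassert>

// Hypothetical mini hierarchy standing in for llvm::Instruction subclasses.
struct Inst { virtual ~Inst() = default; };
struct MemInst : Inst { int length = 8; };
struct AddInst : Inst {};

// Mimics dyn_cast<>: returns nullptr when the dynamic type does not match,
// so every caller has to test the result before dereferencing it.
template <typename To, typename From>
To *try_cast(From *v) { return dynamic_cast<To *>(v); }

// Mimics cast<>: the caller promises the type matches; a mismatch trips the
// assertion instead of silently producing a null pointer.
template <typename To, typename From>
To *checked_cast(From *v) {
  To *result = dynamic_cast<To *>(v);
  assert(result && "checked_cast<> argument of incompatible type!");
  return result;
}

// Reads the length through a pointer the caller knows to be a MemInst.
int lengthOf(Inst *i) { return checked_cast<MemInst>(i)->length; }
```

When the type is already known at the call site, the asserting form documents that invariant and leaves no unchecked null path for a static analyzer to flag.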
* [InstCombine] foldUnsignedUnderflowCheck(): one last pattern with 'sub' ↵Roman Lebedev2019-09-251-0/+10
| | | | | | | | (PR43251) https://rise4fun.com/Alive/0j9 llvm-svn: 372930
* [LICM] Don't verify domtree/loopinfo unless EXPENSIVE_CHECKS is enabled.Eli Friedman2019-09-251-1/+1
| | | | | | | | | For large functions, verifying the whole function after each loop takes non-linear time. Differential Revision: https://reviews.llvm.org/D67571 llvm-svn: 372924
* [InstCombine] Fold (A - B) u>=/u< A --> B u>/u<= A iff B != 0Roman Lebedev2019-09-253-20/+14
| | | | | | | | | | | | | https://rise4fun.com/Alive/KtL This also shows that the fold added in D67412 / r372257 was too specific, and the new fold allows those test cases to be handled more generically, so I delete the now-dead code. This is yet again motivated by D67122 "[UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour" llvm-svn: 372912
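The fold in the subject line can be sanity-checked by brute force. The following is a minimal sketch, assuming 8-bit operands so the search is exhaustive; the function names are made up for illustration and nothing here is taken from the InstCombine source.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every pair of 8-bit values with B != 0,
//   (A - B) u>= A  ==  B u>  A   and   (A - B) u< A  ==  B u<= A.
bool subCompareFoldHolds() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 1; b < 256; ++b) {  // precondition: B != 0
      uint8_t A = static_cast<uint8_t>(a);
      uint8_t B = static_cast<uint8_t>(b);
      uint8_t diff = static_cast<uint8_t>(A - B);  // wraps modulo 256
      if ((diff >= A) != (B > A)) return false;
      if ((diff < A) != (B <= A)) return false;
    }
  }
  return true;
}

// Shows why B != 0 is required: with B == 0, (A - 0) u>= A is always true
// while 0 u> A is always false, so the two sides disagree.
bool subCompareFoldHoldsForZeroB(uint8_t A) {
  uint8_t diff = static_cast<uint8_t>(A - 0);
  return (diff >= A) == (uint8_t{0} > A);
}
```

`subCompareFoldHolds()` covers all 256 x 255 admissible pairs, and `subCompareFoldHoldsForZeroB` returns false for any input, which is the counterexample the `B != 0` side condition rules out.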
* [InstCombine] Limit FMul constant folding for fma simplifications.Florian Hahn2019-09-251-3/+15
| | | | | | | | | | | | | | | | | As @reames pointed out post-commit, rL371518 adds additional rounding in some cases, when doing constant folding of the multiplication. This breaks a guarantee llvm.fma makes and must be avoided. This patch reapplies rL371518, but splits off the simplifications not requiring rounding from SimplifyFMulInst as SimplifyFMAFMul. Reviewers: spatel, lebedev.ri, reames, scanon Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67434 llvm-svn: 372899
* [PatternMatch] Make m_Br more flexible, add matchers for BB values.Florian Hahn2019-09-252-10/+7
| | | | | | | | | | | | | | | | | | | | | Currently m_Br only takes references to BasicBlock*, which limits its flexibility. For example, you have to declare a variable, even if you ignore the result or you have to have additional checks to make sure the matched BB matches an expected one. This patch adds m_BasicBlock and m_SpecificBB matchers, which can be used like the existing matchers for constants or values. I also had a look at the existing uses and updated a few. IMO it makes the code a bit more explicit. Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D68013 llvm-svn: 372885
* [GCRelocate] Add a peephole to canonicalize base pointer relocationPhilip Reames2019-09-241-1/+12
| | | | | | If we generate the gc.relocate, and then later prove two arguments to the statepoint are equivalent, we should canonicalize the gc.relocate to the form we would have produced if this had been known before rewriting. llvm-svn: 372771
* [InstCombine] (a+b) < a && (a+b) != 0 -> (0-b) < a iff a/b != 0 (PR43259)Roman Lebedev2019-09-241-4/+23
  Summary:
  This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds.

  For
  ```
  #include <cassert>
  char* test(char& base, signed long offset) {
    __builtin_assume(offset < 0);
    return &base + offset;
  }
  ```
  We produce https://godbolt.org/z/r40U47 and again those two icmp's can be merged:
  ```
  Name: 0
  Pre: C != 0
    %adjusted = add i8 %base, C
    %not_null = icmp ne i8 %adjusted, 0
    %no_underflow = icmp ult i8 %adjusted, %base
    %r = and i1 %not_null, %no_underflow
  =>
    %neg_offset = sub i8 0, C
    %r = icmp ugt i8 %base, %neg_offset
  ```
  https://rise4fun.com/Alive/ALap
  https://rise4fun.com/Alive/slnN

  There are 3 other variants of this pattern, i believe they all will go into InstSimplify.

  https://bugs.llvm.org/show_bug.cgi?id=43259

  Reviewers: spatel, xbolva00, nikic
  Reviewed By: spatel
  Subscribers: efriedma, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67849
  llvm-svn: 372768
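The Alive snippet in this message (the `Pre: C != 0` variant) can be reproduced as an exhaustive 8-bit check. This is only a sketch: the function names are hypothetical, and it covers just the case where the added value is known to be non-zero, not the other variant mentioned in the subject line.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (a + b) u< a  &&  (a + b) != 0, with 8-bit wrapping addition.
bool originalStrictCheck(uint8_t a, uint8_t b) {
  uint8_t sum = static_cast<uint8_t>(a + b);
  return sum < a && sum != 0;
}

// Folded form: (0 - b) u< a.
bool foldedStrictCheck(uint8_t a, uint8_t b) {
  uint8_t negB = static_cast<uint8_t>(0 - b);
  return negB < a;
}

// Compares both forms for every a and every non-zero b.
bool strictFoldHoldsWhenOffsetNonZero() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 1; b < 256; ++b)
      if (originalStrictCheck(static_cast<uint8_t>(a), static_cast<uint8_t>(b)) !=
          foldedStrictCheck(static_cast<uint8_t>(a), static_cast<uint8_t>(b)))
        return false;
  return true;
}
```

Dropping the precondition breaks the equivalence: for `a = 1, b = 0` the original form is false (1 is not less than 1) while the folded form is true (0 is less than 1).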
* [InstCombine] (a+b) <= a && (a+b) != 0 -> (0-b) < a (PR43259)Roman Lebedev2019-09-241-2/+21
  Summary:
  This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds.

  This pattern isn't exactly what we get there (strict vs. non-strict predicate), but this pattern does not require known-bits analysis, so it is best to handle it first.

  ```
  Name: 0
    %adjusted = add i8 %base, %offset
    %not_null = icmp ne i8 %adjusted, 0
    %no_underflow = icmp ule i8 %adjusted, %base
    %r = and i1 %not_null, %no_underflow
  =>
    %neg_offset = sub i8 0, %offset
    %r = icmp ugt i8 %base, %neg_offset
  ```
  https://rise4fun.com/Alive/knp

  There are 3 other variants of this pattern, they all will go into InstSimplify:
  https://rise4fun.com/Alive/bIDZ

  https://bugs.llvm.org/show_bug.cgi?id=43259

  Reviewers: spatel, xbolva00, nikic
  Reviewed By: spatel
  Subscribers: hiraditya, majnemer, vsk, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67846
  llvm-svn: 372767
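Unlike the strict-predicate variant, this fold carries no precondition, which a brute-force pass over all 8-bit inputs can confirm. The sketch below mirrors the Alive snippet in this message; the function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (base + offset) != 0  &&  (base + offset) u<= base.
bool originalNonStrictCheck(uint8_t base, uint8_t offset) {
  uint8_t adjusted = static_cast<uint8_t>(base + offset);  // wraps modulo 256
  bool notNull = adjusted != 0;
  bool noUnderflow = adjusted <= base;
  return notNull && noUnderflow;
}

// Folded form: base u> (0 - offset).
bool foldedNonStrictCheck(uint8_t base, uint8_t offset) {
  uint8_t negOffset = static_cast<uint8_t>(0 - offset);
  return base > negOffset;
}

// Compares both forms over all 65536 (base, offset) pairs, zero included.
bool nonStrictFoldHolds() {
  for (unsigned b = 0; b < 256; ++b)
    for (unsigned o = 0; o < 256; ++o)
      if (originalNonStrictCheck(static_cast<uint8_t>(b), static_cast<uint8_t>(o)) !=
          foldedNonStrictCheck(static_cast<uint8_t>(b), static_cast<uint8_t>(o)))
        return false;
  return true;
}
```

The zero-offset case is what makes the non-strict predicate safe: `base + 0 u<= base` is true, so the original reduces to `base != 0`, which is exactly `base u> 0`.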
* LoopVectorize - silence static analyzer dyn_cast<CmpInst> null dereference ↵Simon Pilgrim2019-09-241-1/+1
| | | | | | | | warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly and if not assert will fire for us. llvm-svn: 372732
* [SimplifyCFG] FoldTwoEntryPHINode - silence static analyzer null dereference ↵Simon Pilgrim2019-09-241-0/+1
| | | | | | | | warning. NFCI. Assert that we've found the DomBlock. llvm-svn: 372728
* SimplifyCFG - silence static analyzer dyn_cast<LandingPadInst> null ↵Simon Pilgrim2019-09-241-1/+1
| | | | | | | | dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<LandingPadInst> directly and if not assert will fire for us. llvm-svn: 372727
* SimplifyCFG - silence static analyzer dyn_cast<Instruction> null dereference ↵Simon Pilgrim2019-09-241-2/+1
| | | | | | | | warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<Instruction> directly and if not assert will fire for us. llvm-svn: 372726
* [Debuginfo] dbg.value points to undef value after Induction Variable ↵Alexey Lapshin2019-09-241-9/+8
  Simplification.

  Induction Variable Simplification pass does not update dbg.value intrinsic.

  Before:
    %add = add nuw nsw i32 %ArgIndex.06, 1
    call void @llvm.dbg.value(metadata i32 %add, metadata !17, metadata !DIExpression())

  After:
    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
    call void @llvm.dbg.value(metadata i64 undef, metadata !17, metadata !DIExpression())

  There should be:
    %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
    call void @llvm.dbg.value(metadata i64 %indvars.iv.next, metadata !17, metadata !DIExpression())

  Differential Revision: https://reviews.llvm.org/D67770
  llvm-svn: 372703
* [LV] Forced vectorization with runtime checks and OptForSizeSjoerd Meijer2019-09-241-2/+13
| | | | | | | | | | | | | | | When vectorisation is forced with a pragma, we optimise for min size, and we need to emit runtime memory checks, then allow this code growth and don't run into an assert as we currently do. This is the result of D65197 and D66803, and was a use-case not really considered before. If this now happens, we emit an optimisation remark warning about the code-size expansion, which can be avoided by not forcing vectorisation or possibly by source-code modifications. Differential Revision: https://reviews.llvm.org/D67764 llvm-svn: 372694
* [InstCombine] Fold a shifty implementation of clamp-to-allones.Huihui Zhang2019-09-241-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Fold or(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? -1 : X https://rise4fun.com/Alive/d8Ab clamp255 is a common operator in image processing, can be implemented in a shifty way "(255 - X) >> 31 | X & 255". Fold shift into select enables more optimization, e.g., vmin generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67800 llvm-svn: 372678
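The equivalence between the shifty form and the select can be checked exhaustively on 8-bit values. This is a sketch under stated assumptions: it uses i8 instead of wider types, skips pairs where `Y - X` would overflow (the `nsw` requirement), and spells the arithmetic shift by bitwidth-1 as a sign test so the code does not depend on how a compiler shifts negative values. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Shifty form: or(ashr(sub nsw(Y, X), 7), X) on i8.
int8_t shiftyClampToAllOnes(int8_t x, int8_t y) {
  int diff = int(y) - int(x);                 // caller guarantees it fits in i8
  int8_t signMask = diff < 0 ? int8_t(-1) : int8_t(0);  // == ashr(diff, 7)
  return static_cast<int8_t>(signMask | x);
}

// Select form: X s> Y ? -1 : X.
int8_t selectClampToAllOnes(int8_t x, int8_t y) {
  return x > y ? int8_t(-1) : x;
}

// Compares both forms for every pair whose difference does not overflow i8.
bool clampToAllOnesFoldHolds() {
  for (int x = -128; x <= 127; ++x) {
    for (int y = -128; y <= 127; ++y) {
      int diff = y - x;
      if (diff < -128 || diff > 127) continue;  // nsw: skip overflowing pairs
      if (shiftyClampToAllOnes(static_cast<int8_t>(x), static_cast<int8_t>(y)) !=
          selectClampToAllOnes(static_cast<int8_t>(x), static_cast<int8_t>(y)))
        return false;
    }
  }
  return true;
}
```

The `nsw` flag matters because the sign of a wrapped difference no longer says which operand is larger.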
* [InstCombine] Fold a shifty implementation of clamp-to-zero.Huihui Zhang2019-09-241-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Fold and(ashr(subNSW(Y, X), ScalarSizeInBits(Y)-1), X) into X s> Y ? X : 0 https://rise4fun.com/Alive/lFH Fold shift into select enables more optimization, e.g., vmax generation for ARM target. Reviewers: lebedev.ri, efriedma, spatel, kparzysz, bcahoon Reviewed By: lebedev.ri Subscribers: xbolva00, andreadb, craig.topper, RKSimon, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67799 llvm-svn: 372676
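The clamp-to-zero fold differs from its all-ones sibling only in using `and` instead of `or`, and an exhaustive 8-bit check looks much the same. As a sketch with hypothetical names: pairs whose difference overflows i8 are skipped to honour `nsw`, and the arithmetic shift by bitwidth-1 is written as a sign test.

```cpp
#include <cassert>
#include <cstdint>

// Shifty form: and(ashr(sub nsw(Y, X), 7), X) on i8.
int8_t shiftyClampToZero(int8_t x, int8_t y) {
  int diff = int(y) - int(x);                 // caller guarantees it fits in i8
  int8_t signMask = diff < 0 ? int8_t(-1) : int8_t(0);  // == ashr(diff, 7)
  return static_cast<int8_t>(signMask & x);
}

// Select form: X s> Y ? X : 0.
int8_t selectClampToZero(int8_t x, int8_t y) {
  return x > y ? x : int8_t(0);
}

// Compares both forms for every pair whose difference does not overflow i8.
bool clampToZeroFoldHolds() {
  for (int x = -128; x <= 127; ++x) {
    for (int y = -128; y <= 127; ++y) {
      int diff = y - x;
      if (diff < -128 || diff > 127) continue;  // nsw: skip overflowing pairs
      if (shiftyClampToZero(static_cast<int8_t>(x), static_cast<int8_t>(y)) !=
          selectClampToZero(static_cast<int8_t>(x), static_cast<int8_t>(y)))
        return false;
    }
  }
  return true;
}
```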
* HotColdSplitting: invalidate the AssumptionCache on splitSaleem Abdulrasool2019-09-231-0/+5
| | | | | | | | | | | When a cold path is outlined, the value tracking in the assumption cache may be invalidated due to the code motion. We would previously trip an assertion in subsequent passes (but required the passes to happen in a single run as the assumption cache is shared across the passes). Invalidating the cache ensures that we get the correct information when needed with the legacy pass manager as well. llvm-svn: 372667
* [SampleFDO] Treat names in profile as not cold only when profile symbol listWei Mi2019-09-231-20/+25
| | | | | | | | | | | | | | is available In rL372232, we treated names showing up in the profile as not cold when profile-sample-accurate is enabled. This caused a 70k size regression in Chrome/Android. The patch adds a guard and only enables the change when the profile symbol list is available, i.e., it keeps the old behavior when the profile symbol list is not available. Differential Revision: https://reviews.llvm.org/D67931 llvm-svn: 372665
* [InstCombine] foldOrOfICmps(): Acquire SimplifyQuery with set CxtIRoman Lebedev2019-09-231-2/+4
| | | | | | Extracted from https://reviews.llvm.org/D67849#inline-610377 llvm-svn: 372654
* [InstCombine] foldAndOfICmps(): Acquire SimplifyQuery with set CxtIRoman Lebedev2019-09-231-2/+4
| | | | | | Extracted from https://reviews.llvm.org/D67849#inline-610377 llvm-svn: 372653
* [InstCombine] Annotate strndup calls with dereferenceable_or_nullDavid Bolvansky2019-09-231-9/+18
| | | | | | "Implementations are free to malloc() a buffer containing either (size + 1) bytes or (strnlen(s, size) + 1) bytes. Applications should not assume that strndup() will allocate (size + 1) bytes when strlen(s) is smaller than size." llvm-svn: 372647
* [IR] Add getExtendedType() to IntegerType and Type (dispatching to ↵Roman Lebedev2019-09-231-10/+2
| | | | | | IntegerType or VectorType) llvm-svn: 372638
* [InstCombine] dropRedundantMaskingOfLeftShiftInput(): improve commentRoman Lebedev2019-09-231-4/+4
| | | | llvm-svn: 372637
* [SLC] Convert some strndup calls to strdup callsDavid Bolvansky2019-09-233-3/+24
| | | | | | | | | | | | | | | | | | | | | Summary: Motivation: - If we can fold it to strdup, we should (strndup does more things than strdup). - Annotation mechanism. (Works for strdup well). strdup and strndup are part of C 20 (currently posix fns), so we should optimize them. Reviewers: efriedma, jdoerfert Reviewed By: jdoerfert Subscribers: lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67679 llvm-svn: 372636
* [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. c/d/e with mask ↵Roman Lebedev2019-09-231-3/+28
  (PR42563)

  Summary:
  If we have a pattern `(x & (-1 >> maskNbits)) << shiftNbits`, we already know (have a fold) that will drop the `& (-1 >> maskNbits)` mask iff `(shiftNbits-maskNbits) s>= 0` (i.e. `shiftNbits u>= maskNbits`).

  So even if `(shiftNbits-maskNbits) s< 0`, we can still fold, we will just need to apply a **constant** mask afterwards:
  ```
  Name: c, normal+mask
    %t0 = lshr i32 -1, C1
    %t1 = and i32 %t0, %x
    %r = shl i32 %t1, C2
  =>
    %n0 = shl i32 %x, C2
    %n1 = i32 ((-(C2-C1))+32)
    %n2 = zext i32 %n1 to i64
    %n3 = lshr i64 -1, %n2
    %n4 = trunc i64 %n3 to i32
    %r = and i32 %n0, %n4
  ```
  https://rise4fun.com/Alive/gslRa

  Naturally, old `%masked` will have to be one-use. This is not valid for pattern f - where "masking" is done via `ashr`.

  https://bugs.llvm.org/show_bug.cgi?id=42563

  Reviewers: spatel, nikic, xbolva00
  Reviewed By: spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67725
  llvm-svn: 372630
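The Alive snippet for pattern c translates almost line by line into fixed-width integer arithmetic, which allows a quick check over every shift-amount pair. This is a sketch only: the function names and the sample values of `x` are arbitrary choices, and shift amounts are assumed to lie in [0, 31].

```cpp
#include <cassert>
#include <cstdint>

// Original: (x & (-1 u>> C1)) << C2 on i32.
uint32_t maskThenShiftC(uint32_t x, unsigned c1, unsigned c2) {
  uint32_t mask = 0xFFFFFFFFu >> c1;
  return (x & mask) << c2;
}

// Folded: (x << C2) & trunc(i64 -1 u>> ((-(C2 - C1)) + 32)).
uint32_t shiftThenMaskC(uint32_t x, unsigned c1, unsigned c2) {
  unsigned n1 = 32u + c1 - c2;             // in [1, 63] for c1, c2 in [0, 31]
  uint64_t wide = ~uint64_t{0} >> n1;      // computed in 64 bits, as in %n3
  return (x << c2) & static_cast<uint32_t>(wide);
}

// Compares both forms for all shift amounts and a few sample inputs.
bool patternCFoldHolds() {
  const uint32_t samples[] = {0u, 1u, 0xFFFFFFFFu, 0x80000001u,
                              0xDEADBEEFu, 0x12345678u};
  for (unsigned c1 = 0; c1 < 32; ++c1)
    for (unsigned c2 = 0; c2 < 32; ++c2)
      for (uint32_t x : samples)
        if (maskThenShiftC(x, c1, c2) != shiftThenMaskC(x, c1, c2))
          return false;
  return true;
}
```

The widening to 64 bits keeps the shift amount `n1` valid even when it reaches 32 or more; when `C2 >= C1` the truncated mask is all-ones, which matches the older fold that drops the mask entirely.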
* [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask ↵Roman Lebedev2019-09-231-3/+30
  (PR42563)

  Summary:
  And this is **finally** the interesting part of that fold!

  If we have a pattern `(x & (~(-1 << maskNbits))) << shiftNbits`, we already know (have a fold) that will drop the `& (~(-1 << maskNbits))` mask iff `(maskNbits+shiftNbits) u>= bitwidth(x)`. But that is actually ignorant, there's more general fold here:

  In this pattern, `(maskNbits+shiftNbits)` actually correlates with the number of low bits that will remain in the final value. So even if `(maskNbits+shiftNbits) u< bitwidth(x)`, we can still fold, we will just need to apply a **constant** mask afterwards:
  ```
  Name: a, normal+mask
    %onebit = shl i32 -1, C1
    %mask = xor i32 %onebit, -1
    %masked = and i32 %mask, %x
    %r = shl i32 %masked, C2
  =>
    %n0 = shl i32 %x, C2
    %n1 = add i32 C1, C2
    %n2 = zext i32 %n1 to i64
    %n3 = shl i64 -1, %n2
    %n4 = xor i64 %n3, -1
    %n5 = trunc i64 %n4 to i32
    %r = and i32 %n0, %n5
  ```
  https://rise4fun.com/Alive/F5R

  Naturally, old `%masked` will have to be one-use. Similar fold exists for patterns c,d,e, will post patch later.

  https://bugs.llvm.org/show_bug.cgi?id=42563

  Reviewers: spatel, nikic, xbolva00
  Reviewed By: spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67677
  llvm-svn: 372629
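Pattern a can be checked the same way, by transcribing both sides of the Alive snippet into fixed-width arithmetic. The sketch assumes shift amounts in [0, 31]; the function names and sample inputs are arbitrary and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Original: (x & ~(-1 << C1)) << C2 on i32.
uint32_t maskThenShiftA(uint32_t x, unsigned c1, unsigned c2) {
  uint32_t mask = ~(0xFFFFFFFFu << c1);    // low C1 bits set
  return (x & mask) << c2;
}

// Folded: (x << C2) & trunc(~(i64 -1 << (C1 + C2))).
uint32_t shiftThenMaskA(uint32_t x, unsigned c1, unsigned c2) {
  unsigned n1 = c1 + c2;                   // in [0, 62] for c1, c2 in [0, 31]
  uint64_t wide = ~(~uint64_t{0} << n1);   // low C1 + C2 bits set
  return (x << c2) & static_cast<uint32_t>(wide);
}

// Compares both forms for all shift amounts and a few sample inputs.
bool patternAFoldHolds() {
  const uint32_t samples[] = {0u, 1u, 0xFFFFFFFFu, 0x80000001u,
                              0xDEADBEEFu, 0x12345678u};
  for (unsigned c1 = 0; c1 < 32; ++c1)
    for (unsigned c2 = 0; c2 < 32; ++c2)
      for (uint32_t x : samples)
        if (maskThenShiftA(x, c1, c2) != shiftThenMaskA(x, c1, c2))
          return false;
  return true;
}
```

As the message says, `C1 + C2` is the number of low bits that survive in the result: the mask keeps `C1` bits, and the shift moves them above `C2` zero bits.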
* [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && ↵Alexey Bataev2019-09-231-66/+75
| | | | | | | | | | | | | | | | | | | | | "SCEVAddRecExpr operand is not loop-invariant!") Summary: Initially the SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. This may break ScalarEvolution and may cause a crash. Reworked the SLP vectorizer so that it does not replace vectorized instructions by UndefValue anymore. Instead, vectorized instructions are marked for deletion inside the BoUpSLP class and deleted upon class destruction. Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29641 llvm-svn: 372626
* [InstCombine] foldUnsignedUnderflowCheck(): s/Subtracted/ZeroCmpOp/Roman Lebedev2019-09-231-7/+7
| | | | llvm-svn: 372625
* [FunctionAttrs] Enable nonnull arg propagationDavid Bolvansky2019-09-231-4/+1
| | | | | | Enable flag introduced in rL294998. Security concerns are no longer valid, since function signatures for the mentioned libc functions have no nonnull attribute (Clang does not generate them? I see no nonnull attr in LLVM IR for these functions) and since rL372091 we carefully annotate the callsites where we know that size is static, non zero. So let's enable this flag again. llvm-svn: 372573
* [LSR] Silence static analyzer null dereference warnings with assertions. NFCI.Simon Pilgrim2019-09-221-0/+2
| | | | | | Add assertions to make it clear that GenerateIVChain / NarrowSearchSpaceByPickingWinnerRegs should succeed in finding non-null values llvm-svn: 372518
* ConstantHoisting - Silence static analyzer dyn_cast<PointerType> null ↵Simon Pilgrim2019-09-221-1/+1
| | | | | | dereference warning. NFCI. llvm-svn: 372517
* [InstCombine] allow icmp+binop folds before min/max bailout (PR43310)Sanjay Patel2019-09-222-3/+4
| | | | | | | | | This has the potential to uncover missed analysis/folds as shown in the min/max code comment/test, but fewer restrictions on icmp folds should be better in general to solve cases like: https://bugs.llvm.org/show_bug.cgi?id=43310 llvm-svn: 372510
* [VPlan] Silence static analyzer dyn_cast null dereference warning. NFCI.Simon Pilgrim2019-09-221-1/+1
| | | | llvm-svn: 372502
* SROA: Check Total Bits of vector typeSuyog Sarda2019-09-211-0/+8
| | | | | | | | | | When promoting an alloca instruction of vector type, check the total size in bits of its slices too. If they don't match, don't promote the alloca instruction. Bug: https://bugs.llvm.org/show_bug.cgi?id=42585 llvm-svn: 372480
* Test mail. NFC.Suyog Sarda2019-09-211-1/+1
| | | | | | Testing commit access. NFC. llvm-svn: 372479
* [Attributor] Implement "norecurse" function attribute deductionHideto Ueno2019-09-211-6/+35
  Summary:
  This patch introduces `norecurse` function attribute deduction.

  `norecurse` will be deduced if the following conditions hold:
  * The size of the SCC to which the function belongs equals 1.
  * The function doesn't have self-recursion.
  * We have `norecurse` for all call sites.

  To avoid a large change, SCC is calculated using scc_iterator in InfoCache initialization for now.

  Reviewers: jdoerfert, sstefan1
  Reviewed By: jdoerfert
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67751
  llvm-svn: 372475
* [AddressSanitizer] Don't dereference dyn_cast<ConstantInt> results. NFCI.Simon Pilgrim2019-09-201-2/+2
| | | | | | The static analyzer is warning about potential null dereference, but we can use cast<ConstantInt> directly and if not assert will fire for us. llvm-svn: 372429
* [ObjC][ARC] Skip debug instructions when computing the insert point ofAkira Hatanaka2019-09-191-0/+4
| | | | | | | | | | | objc_release calls This fixes a bug where the presence of debug instructions would cause ARC optimizer to change the order of retain and release calls. rdar://problem/55319419 llvm-svn: 372352
* Don't use invalidated iterators in FlattenCFGPassJakub Kuderski2019-09-191-7/+17
| | | | | | | | | | | | | | | | | | | | Summary: FlattenCFG may erase unnecessary blocks, which also invalidates iterators to those erased blocks. Before this patch, `iterativelyFlattenCFG` could try to increment a BB iterator after that BB has been removed and crash. This patch makes FlattenCFGPass use `WeakVH` to skip over erased blocks. Reviewers: dblaikie, tstellar, davide, sanjoy, asbirlea, grosser Reviewed By: asbirlea Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67672 llvm-svn: 372347
* [InstCombine] Simplify @llvm.usub.with.overflow+non-zero check (PR43251)Roman Lebedev2019-09-191-0/+21
  Summary:
  This is again motivated by D67122 sanitizer check enhancement. That patch seemingly worsens `-fsanitize=pointer-overflow` overhead from 25% to 50%, which strongly implies missing folds.

  In this particular case, given
  ```
  char* test(char& base, unsigned long offset) {
    return &base - offset;
  }
  ```
  it will end up producing something like https://godbolt.org/z/luGEju which after optimizations reduces down to roughly
  ```
  declare void @use64(i64)
  define i1 @test(i8* dereferenceable(1) %base, i64 %offset) {
    %base_int = ptrtoint i8* %base to i64
    %adjusted = sub i64 %base_int, %offset
    call void @use64(i64 %adjusted)
    %not_null = icmp ne i64 %adjusted, 0
    %no_underflow = icmp ule i64 %adjusted, %base_int
    %no_underflow_and_not_null = and i1 %not_null, %no_underflow
    ret i1 %no_underflow_and_not_null
  }
  ```
  Without D67122 there was no `%not_null`, and in this particular case we can "get rid of it", by merging two checks: Here we are checking: `Base u>= Offset && (Base u- Offset) != 0`, but that is simply `Base u> Offset`

  Alive proofs: https://rise4fun.com/Alive/QOs

  The `@llvm.usub.with.overflow` pattern itself is not handled here because this is the main pattern, that we currently consider canonical.

  https://bugs.llvm.org/show_bug.cgi?id=43251

  Reviewers: spatel, nikic, xbolva00, majnemer
  Reviewed By: xbolva00, majnemer
  Subscribers: vsk, majnemer, xbolva00, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67356
  llvm-svn: 372341
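The merged check described in this message can be confirmed with a small exhaustive test. The sketch below mirrors the IR in the message but narrows the operands from i64 to 8 bits so every input pair can be tried; that width and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// The two checks as they appear in the IR: %not_null and %no_underflow.
bool notNullAndNoUnderflow(uint8_t base, uint8_t offset) {
  uint8_t adjusted = static_cast<uint8_t>(base - offset);  // wraps modulo 256
  bool notNull = adjusted != 0;
  bool noUnderflow = adjusted <= base;
  return notNull && noUnderflow;
}

// The merged check: Base u> Offset.
bool mergedCheck(uint8_t base, uint8_t offset) { return base > offset; }

// Compares both forms over all 65536 (base, offset) pairs.
bool underflowCheckMergeHolds() {
  for (unsigned b = 0; b < 256; ++b)
    for (unsigned o = 0; o < 256; ++o)
      if (notNullAndNoUnderflow(static_cast<uint8_t>(b), static_cast<uint8_t>(o)) !=
          mergedCheck(static_cast<uint8_t>(b), static_cast<uint8_t>(o)))
        return false;
  return true;
}
```

Here `adjusted u<= base` holds exactly when the subtraction does not wrap, i.e. when `base u>= offset`, and excluding a zero result removes the `base == offset` case, leaving `base u> offset`.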
* [Float2Int] avoid crashing on unreachable code (PR38502)Sanjay Patel2019-09-191-18/+29
| | | | | | | | | | | | | In the example from: https://bugs.llvm.org/show_bug.cgi?id=38502 ...we hit infinite looping/crashing because we have non-standard IR - an instruction operand is used before defined. This and other unusual constructs are allowed in unreachable blocks, so avoid the problem by using DominatorTree to step around landmines. Differential Revision: https://reviews.llvm.org/D67766 llvm-svn: 372339
* [Unroll] Add an option to control complete unrollingSerguei Katkov2019-09-192-9/+19
| | | | | | | | | | | | Add an ability to specify the max full unroll count for LoopUnrollPass pass in pass options. Reviewers: fhahn, fedor.sergeev Reviewed By: fedor.sergeev Subscribers: hiraditya, zzheng, dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D67701 llvm-svn: 372305
* [SimplifyCFG] mergeConditionalStoreToAddress(): try to pacify MSANRoman Lebedev2019-09-181-1/+1
| | | | | | | | MSAN bot complains that there is use-of-uninitialized-value of this FreeStores later in IsWorthwhile(). Perhaps FreeStores needs to be stored in a vector? llvm-svn: 372262
* [InstCombine] foldUnsignedUnderflowCheck(): handle last few cases (PR43251)Roman Lebedev2019-09-181-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: I don't have a direct motivational case for this, but it would be good to have this for completeness/symmetry. This pattern is basically the motivational pattern from https://bugs.llvm.org/show_bug.cgi?id=43251 but with different predicate that requires that the offset is non-zero. The completeness bit comes from the fact that a similar pattern (offset != zero) will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259, so it'd seem to be good to not overlook very similar patterns.. Proofs: https://rise4fun.com/Alive/21b Also, there is something odd with `isKnownNonZero()`, if the non-zero knowledge was specified as an assumption, it didn't pick it up (PR43267) With this, i see no other missing folds for https://bugs.llvm.org/show_bug.cgi?id=43251 Reviewers: spatel, nikic, xbolva00 Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67412 llvm-svn: 372257
* [SimplifyCFG] mergeConditionalStoreToAddress(): consider cost, not ↵Roman Lebedev2019-09-181-42/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | instruction count Summary: As can be seen in the changed test, while `div` is really costly, we were speculating it. This does not seem correct. Also, the old code would run for every single instruction in the BB, instead of eagerly bailing out as soon as there are too many instructions. This function still has the problem that `PHINodeFoldingThreshold` is per-basic-block, while it should be for all the basic blocks. Reviewers: efriedma, craig.topper, dmgreen, jmolloy Reviewed By: jmolloy Subscribers: xbolva00, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67315 llvm-svn: 372255
* [InstCombine] dropRedundantMaskingOfLeftShiftInput(): some cleanup before ↵Roman Lebedev2019-09-181-5/+8
| | | | | | upcoming patch llvm-svn: 372245
* [SampleFDO] Minimize performance impact when profile-sample-accurateWei Mi2019-09-181-20/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | is enabled. We can save memory and reduce binary size significantly by enabling ProfileSampleAccurate. However, when ProfileSampleAccurate is true, a function without samples will be regarded as cold, and this could potentially cause a performance regression. To minimize the potential negative performance impact, we want to be a little conservative here: if a function shows up in the profile, whether as an outline instance, an inline instance or a call target, treat the function as not being cold. This handles cases where most callsites of a function are inlined in the sampled binary (so the outline copy doesn't get any samples) but not inlined in the current build (because of source code drift, imprecise debug information, or the callsites are all cold individually but not cold cumulatively...), so that the outline function showing up as cold in the sampled binary will actually not be cold after the current build. After the change, such a function will be treated as not cold even when profile-sample-accurate is enabled. At the same time we lower the hot criteria of the callsiteIsHot check when profile-sample-accurate is enabled. callsiteIsHot is used to determine whether a callsite is hot and qualifies for early inlining. When profile-sample-accurate is enabled, functions without a profile will be regarded as cold and much less inlining will happen in the CGSCC inlining pass, so we can worry less about size increase and be aggressive in allowing more early inlining for warm callsites, which is helpful for performance overall. Differential Revision: https://reviews.llvm.org/D67561 llvm-svn: 372232
* [SimplifyLibCalls] fix crash with empty function name (PR43347)Sanjay Patel2019-09-181-15/+12
| | | | | | | | ...and improve some variable names while here. https://bugs.llvm.org/show_bug.cgi?id=43347 llvm-svn: 372227
* [PGO] Change hardcoded thresholds for cold/inlinehint to use summaryTeresa Johnson2019-09-171-18/+21
| | | | | | | | | | | | | | | | | | | | | | | Summary: The PGO counter reading will add cold and inlinehint (hot) attributes to functions that are very cold or hot. This was using hardcoded thresholds, instead of the profile summary cutoffs which are used in other hot/cold detection and are more dynamic and adaptable. Switch to using the summary-based cold/hot detection. The hardcoded limits were causing some code that had a medium level of hotness (per the summary) to be incorrectly marked with a cold attribute, blocking inlining. Reviewers: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67673 llvm-svn: 372189