summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
...
* Recommit "r306541 - Add zero-length check to memcpy/memset load store loop ↵Teresa Johnson2017-07-011-5/+14
| | | | | | | | | | expansion"" With fix for use-after-free errors. We can't add the new branch and remove the old one until we are done with the Builder constructed for the block. llvm-svn: 306937
* Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by ↵Teresa Johnson2017-07-011-1/+1
| | | | | | | | | default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306936
* re-commit r306336: Enable vectorizer-maximize-bandwidth by default.Teresa Johnson2017-07-011-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306935
* revert r306336 for breaking ppc test.Teresa Johnson2017-07-011-1/+1
| | | | llvm-svn: 306934
* Enable vectorizer-maximize-bandwidth by default.Teresa Johnson2017-07-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306933
* [SLPVectorizer] Add isOdd() helper function, NFCI.Dinar Temirbulatov2017-06-301-2/+7
| | | | llvm-svn: 306887
* [InstCombine] Replace an unnecessary use of a matcher with just an isa and a ↵Craig Topper2017-06-301-3/+2
| | | | | | | | cast. NFC We aren't looking through any levels of IR here so I don't think we need the power of a matcher or the temporary variable it requires. llvm-svn: 306885
* [LV] Sink casts to unravel first order recurrenceAyal Zaks2017-06-302-4/+33
| | | | | | | | | | | Check if a single cast is preventing handling a first-order-recurrence Phi, because the scheduling constraints it imposes on the first-order-recurrence shuffle are infeasible; but they can be made feasible by moving the cast downwards. Record such casts and move them when vectorizing the loop. Differential Revision: https://reviews.llvm.org/D33058 llvm-svn: 306884
* [SimplifyCFG] Update the name of switch generated lookup table.Sumanth Gundapaneni2017-06-301-4/+6
| | | | | | | | | | This patch appends the name of the function to the switch generated lookup table. This will ease the visual debugging in identifying the function the table is generated from. Differential Revision: https://reviews.llvm.org/D34817 llvm-svn: 306867
* [InstCombine] Add m_BitReverse pattern match helper. NFCI.Simon Pilgrim2017-06-301-1/+1
| | | | llvm-svn: 306860
* [RuntimeUnrolling] Add logic for loops with multiple exit blocksAnna Thomas2017-06-301-23/+101
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Runtime unrolling is done for loops with a single exit block and a single exiting block (and this exiting block should be the latch block). This patch adds logic to support unrolling in the presence of multiple exit blocks (which also means multiple exiting blocks). Currently this is under an off-by-default option and is supported when epilog code is generated. Support in presence of prolog code will be in a future patch (we just need to add more tests, and update comments). This patch is essentially an implementation patch. I have not added any heuristic (in terms of branches added or code size) to decide when this should be enabled. Reviewers: mkuper, sanjoy, reames, evstupac Reviewed by: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33001 llvm-svn: 306846
* Revert of r306525: "Canonicalize clamp of float types to minmax"Nikolai Bozhenov2017-06-301-10/+3
| | | | llvm-svn: 306815
* [LV] Optimize for size when vectorizing loops with tiny trip countAyal Zaks2017-06-301-29/+30
| | | | | | | | | | | | | | | | | It may be detrimental to vectorize loops with very small trip count, as various costs of the vectorized loop body as well as enclosing overheads including runtime tests and scalar iterations may outweigh the gains of vectorizing. The current cost model measures the cost of the vectorized loop body only, expecting it will amortize other costs, and loops with known or expected very small trip counts are not vectorized at all. This patch allows loops with very small trip counts to be vectorized, but under OptForSize constraints, which ensure the cost of the loop body is dominant, having no runtime guards nor scalar iterations. Patch inspired by D32451. Differential Revision: https://reviews.llvm.org/D34373 llvm-svn: 306803
* [InstCombine] In foldXorToXor, move the commutable matcher from the LHS ↵Craig Topper2017-06-301-8/+8
| | | | | | | | | | match to the RHS match. No meaningful change intended. There are two conditions ORed here with similar checks and each contain two matches that must be true for the if to succeed. With the commutable match on the first half of the OR then both ifs basically have the same first part and only the second part distinguishs. With this change we move the commutable match to second half and make the first half unique. This caused some tests to change because we now produce a commuted result, but this shouldn't matter in practice. llvm-svn: 306800
* Remove the BBVectorize pass.Chandler Carruth2017-06-304-3332/+7
| | | | | | | | | | | | | It served us well, helped kick-start much of the vectorization efforts in LLVM, etc. Its time has come and past. Back in 2014: http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html Time to actually let go and move forward. =] I've updated the release notes both about the removal and the deprecation of the corresponding C API. llvm-svn: 306797
* Revert "r306541 - Add zero-length check to memcpy/memset load store loop ↵Daniel Jasper2017-06-301-12/+5
| | | | | | | | | expansion" Segfaults in non-optimized builds. I'll get a stack trace and a reproducer to Teresa. llvm-svn: 306793
* Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by ↵Daniel Jasper2017-06-301-1/+1
| | | | | | | | | default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306792
* [SCEV] Use depth limit instead of local cache for SExt and ZExtMax Kazantsev2017-06-301-5/+5
| | | | | | | | | | | | | | | | In rL300494 there was an attempt to deal with excessive compile time on invocations of getSign/ZeroExtExpr using local caching. This approach only helps if we request the same SCEV multiple times throughout recursion. But in the bug PR33431 we see a case where we request different values all the time, so caching does not help and the size of the cache grows enormously. In this patch we remove the local cache for this methods and add the recursion depth limit instead, as we do for arithmetics. This gives us a guarantee that the invocation sequence is limited and reasonably short. Differential Revision: https://reviews.llvm.org/D34273 llvm-svn: 306785
* Reduce indenting and clean up comparisons around sign bit.Eric Christopher2017-06-301-6/+7
| | | | llvm-svn: 306781
* Reduce the complexity of the signbit/branch test functions.Eric Christopher2017-06-301-3/+3
| | | | llvm-svn: 306779
* Hook the sample PGO machinery in the new PMDehao Chen2017-06-291-1/+2
| | | | | | | | | | | | | | Summary: This patch hooks up SampleProfileLoaderPass with the new PM. Reviewers: chandlerc, davidxl, davide, tejohnson Reviewed By: chandlerc, tejohnson Subscribers: tejohnson, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D34720 llvm-svn: 306763
* [SLPVectorizer] Moving Entry->NeedToGather check out of inner loop, Dinar Temirbulatov2017-06-291-4/+4
| | | | | | since it is invariant there. NFCI. llvm-svn: 306749
* Remove `inline` keyword from inline `classof` methodsSam Clegg2017-06-291-22/+22
| | | | | | | | | | | | | | | | | | | | | | The style guide states that the explicit `inline` should not be used with inline methods. classof is very common inline method with a fair amount on inconsistency: $ git grep classof ./include | grep inline | wc -l 230 $ git grep classof ./include | grep -v inline | wc -l 257 I chose to target this method rather the larger change since this method is easily cargo-culted (I did it at least once). I considered doing the larger change and removing all occurrences but that would be a much larger change. Differential Revision: https://reviews.llvm.org/D33906 llvm-svn: 306731
* Remove useless header. NFCXin Tong2017-06-291-1/+0
| | | | llvm-svn: 306712
* [ConstantHoisting] Avoid hoisting constants in GEPs that index into a struct ↵Leo Li2017-06-291-35/+60
| | | | | | | | | | | | | | | | | | | | | type. Summary: Indices for GEPs that index into a struct type should always be constants. This added more checks in `collectConstantCandidates:` which make sure constants for GEP pointer type are not hoisted. This fixed Bug https://bugs.llvm.org/show_bug.cgi?id=33538 Reviewers: ributzka, rnk Reviewed By: ributzka Subscribers: efriedma, llvm-commits, srhines, javed.absar, pirama Differential Revision: https://reviews.llvm.org/D34576 llvm-svn: 306704
* PredicateInfo: Use OrderedInstructions instead of our homemadeDaniel Berlin2017-06-291-51/+27
| | | | | | version. llvm-svn: 306703
* NewGVN: Remove useless test in addPhiOfOps.Daniel Berlin2017-06-291-2/+1
| | | | llvm-svn: 306702
* Remove unneeded else from OrderedInstructions::dominates.Daniel Berlin2017-06-291-2/+1
| | | | llvm-svn: 306701
* [SLPVectorizer] Introducing getTreeEntry() helper function [NFC]Dinar Temirbulatov2017-06-291-34/+33
| | | | | | Differential Revision: https://reviews.llvm.org/D34756 llvm-svn: 306655
* [InstCombine] In visitXor, use m_Not on the instruction itself instead of ↵Craig Topper2017-06-291-3/+2
| | | | | | looking for all ones in Op1. This is consistent with 3 other not checks before this one. NFCI llvm-svn: 306617
* [InstCombine] Retain TBAA when narrowing memory accessesKeno Fischer2017-06-282-3/+29
| | | | | | | | | | | | Summary: As discussed on the mailing list it is legal to propagate TBAA to loads/stores from/to smaller regions of a larger load tagged with TBAA. Do so for (load->extractvalue)=>(gep->load) and similar foldings. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D31954 llvm-svn: 306615
* [LV] Fix PR33613 - retain order of insertelement per partAyal Zaks2017-06-281-6/+7
| | | | | | | | | | | | r306381 caused PR33613, by reversing the order in which insertelements were generated per unroll part. This patch fixes PR33613 by retraining this order, placing each set of insertelements per part immediately after the last scalar being packed for this part. Includes a test case derived from PR33613. Reference: https://bugs.llvm.org/show_bug.cgi?id=33613 Differential Revision: https://reviews.llvm.org/D34760 llvm-svn: 306575
* [LoopUnroll] Fix bug in computeUnrollCount causing it to not honor MaxCountGeoff Berry2017-06-281-0/+2
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: mcrosier, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D34532 llvm-svn: 306564
* [InstCombine] use local variable to reduce code; NFCISanjay Patel2017-06-281-18/+14
| | | | llvm-svn: 306560
* [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI.Geoff Berry2017-06-281-14/+14
| | | | | | | | | | Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554
* Add zero-length check to memcpy/memset load store loop expansionTeresa Johnson2017-06-281-5/+12
| | | | | | | | | | | | | | | | Summary: I was testing using this expansion logic in other cases besides NVPTX, and found some runtime failures due to the lack of a check for a zero length memcpy/memset before the loop. There is already such a check in the memmove expansion code though. Reviewers: hfinkel Subscribers: jholewinski, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D34707 llvm-svn: 306541
* [InstCombine] Canonicalize clamp of float types to minmax in fast mode.Nikolai Bozhenov2017-06-281-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This commit allows matchSelectPattern to recognize clamp of float arguments in the presence of FMF the same way as already done for integers. This case is a little different though. With integers, given the min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX "automatically". That is not the case for float, because for them only full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care about NaNs. On the other hand, some backends (e.g. X86) have only FMIN/FMAX nodes that do not care about NaNS and the former NAN/NUM nodes are illegal thus selection is not happening. So I decided to do such kind of transformation in IR (InstCombiner) instead of complicating the logic in the backend. Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper Reviewed By: efriedma Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D33186 llvm-svn: 306525
* [IRCE][NFC] Better get SCEV for 1 in calculateSubRangesMax Kazantsev2017-06-281-3/+3
| | | | | | | | | A slightly more efficient way to get constant, we avoid resolving in getSCEV and excessive invocations, and we don't create a ConstantInt if 'true' branch is taken. Differential Revision: https://reviews.llvm.org/D34672 llvm-svn: 306503
* Inlining: Don't re-map simplified cloned instructions.Kyle Butt2017-06-281-4/+5
| | | | | | | | | | | | | When simplifying an instruction that has been re-mapped, it should never simplify to an instruction in the original function. In the edge case where we are inlining a function into itself, the existing code led to incorrect behavior. Replace the incorrect code with an assert verifying that we never expect simplification to produce an instruction in the old function, unless the functions are the same. Differential Revision: https://reviews.llvm.org/D33850 llvm-svn: 306495
* Bitcode: Write the irsymtab to disk.Peter Collingbourne2017-06-271-0/+2
| | | | | | Differential Revision: https://reviews.llvm.org/D33973 llvm-svn: 306487
* [EarlyCSE][MemorySSA] Enable MemorySSA in function-simplification pass of ↵Geoff Berry2017-06-271-2/+2
| | | | | | EarlyCSE. llvm-svn: 306477
* re-commit r306336: Enable vectorizer-maximize-bandwidth by default.Dehao Chen2017-06-271-1/+1
| | | | | | Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306473
* [InstCombine] Propagate nsw flag when turning mul by pow2 into shift when ↵Craig Topper2017-06-271-2/+2
| | | | | | | | | | | | the constant is a vector splat or the scalar bit width is larger than 64-bits The check to see if we can propagate the nsw flag used m_ConstantInt(uint64_t*&) which doesn't work with splat vectors and has a restriction that the bitwidth of the ConstantInt must be 64-bits are less. This patch changes it to use m_APInt to remove both these issues Differential Revision: https://reviews.llvm.org/D34699 llvm-svn: 306457
* [CodeExtractor] Prevent extraction of block involving blockaddressSerge Guelton2017-06-271-0/+27
| | | | | | | | | BlockAddress are only valid within their function context, which does not interact well with CodeExtractor. Detect this case and prevent it. Differential Revision: https://reviews.llvm.org/D33839 llvm-svn: 306448
* [SROA] Fix APInt size when alloca address space is not 0Yaxun Liu2017-06-271-2/+3
| | | | | | | | SROA assumes alloca address space is 0, which causes assertion. This patch fixes that. Differential Revision: https://reviews.llvm.org/D34104 llvm-svn: 306440
* [InstCombine] canonicalize icmp predicate feeding selectSanjay Patel2017-06-271-0/+17
| | | | | | | | | | | | | | | | | | | | | This canonicalization was suggested in D33172 as a way to make InstCombine behavior more uniform. We have this transform for icmp+br, so unless there's some reason that icmp+select should be treated differently, we should do the same thing here. The benefit comes from increasing the chances of creating identical instructions. This is shown in the tests in logical-select.ll (PR32791). InstCombine doesn't fold those directly, but EarlyCSE can simplify the identical cmps, and then InstCombine can fold the selects together. The possible regression for the tests in select.ll raises questions about poison/undef: http://lists.llvm.org/pipermail/llvm-dev/2017-May/113261.html ...but that transform is just as likely to be triggered by this canonicalization as it is to be missed, so we're just pointing out a commutation deficiency in the pattern matching: https://reviews.llvm.org/rL228409 Differential Revision: https://reviews.llvm.org/D34242 llvm-svn: 306435
* Enable ICP for AutoFDO.Dehao Chen2017-06-271-2/+3
| | | | | | | | | | | | | | Summary: AutoFDO should have ICP enabled. Reviewers: davidxl Reviewed By: davidxl Subscribers: sanjoy, mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D34662 llvm-svn: 306429
* [LoopUnrollRuntime] Use SCEV exit count for calculating trip count. NFCIAnna Thomas2017-06-271-1/+5
| | | | | | | Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block (which is proven to be the only exiting block in the loop to be unrolled). llvm-svn: 306410
* Recommitting 306331.Ayal Zaks2017-06-271-287/+300
| | | | | | | Undoing revert 306338 after fixed bug: add metadata to the load instead of the reverse shuffle added to it, retaining the original ValueMap implementation. llvm-svn: 306381
* [SROA] Fix PR32902 by more carefully propagating !nonnull metadata.Chandler Carruth2017-06-271-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is based heavily on the work done ni D34285. I mostly wanted to do test cleanup for the author to save them some time, but I had a really hard time understanding why it was so hard to write better test cases for these issues. The problem is that because SROA does a second rewrite of the loads and because we *don't* propagate !nonnull for non-pointer loads, we first introduced invalid !nonnull metadata and then stripped it back off just in time to avoid most ways of this PR manifesting. Moving to the more careful utility only fixes this by changing the predicate to look at the new load's type rather than the target type. However, that *does* fix the bug, and the utility is much nicer including adding range metadata to model the nonnull property after a conversion to an integer. However, we have bigger problems because we don't actually propagate *range* metadata, and the utility to do this extracted from instcombine isn't really in good shape to do this currently. It *only* handles the case of copying range metadata from an integer load to a pointer load. It doesn't even handle the trivial cases of propagating from one integer load to another when they are the same width! This utility will need to be beefed up prior to using in this location to get the metadata to fully survive. And even then, we need to go and teach things to turn the range metadata into an assume the way we do with nonnull so that when we *promote* an integer we don't lose the information. All of this will require a new test case that looks kind-of like `preserve-nonnull.ll` does here but focuses on range metadata. It will also likely require more testing because it needs to correctly handle changes to the integer width, especially as SROA actively tries to change the integer width! Last but not least, I'm a little worried about hooking the range metadata up here because the instcombine logic for converting from a range metadata *to* a nonnull metadata node seems broken in the face of non-zero address spaces where null is not mapped to the integer `0`. So that probably needs to get fixed with test cases both in SROA and in instcombine to cover it. But this *does* extract the core PR fix from D34285 of preventing the !nonnull metadata from being propagated in a broken state just long enough to feed into promotion and crash value tracking. On D34285 there is some discussion of zero-extend handling because it isn't necessary. First, the new load size covers all of the non-undef (ie, possibly initialized) bits. This may even extend past the original alloca if loading those bits could produce valid data. The only way its valid for us to zero-extend an integer load in SROA is if the original code had a zero extend or those bits were undef. And we get to assume things like undef *never* satifies nonnull, so non undef bits can participate here. No need to special case the zero-extend handling, it just falls out correctly. The original credit goes to Ariel Ben-Yehuda! I'm mostly landing this to save a few rounds of trivial edits fixing style issues and test case formulation. Differental Revision: D34285 llvm-svn: 306379
OpenPOWER on IntegriCloud