summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"Matthew Simpson2017-02-131-20/+12
| | | | | | | | This reverts commit r294967. This patch caused execution time slowdowns in a few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot. I'm reverting to investigate. llvm-svn: 294973
* [LV] Extend trunc optimization to all IVs with constant integer stepsMatthew Simpson2017-02-131-12/+20
| | | | | | | | | | | | | | This patch extends the optimization of truncations whose operand is an induction variable with a constant integer step. Previously we were only applying this optimization to the primary induction variable. However, the cost model assumes the optimization is applied to the truncation of all integer induction variables (even regardless of step type). The transformation is now applied to the other induction variables, and I've updated the cost model to ensure it is better in sync with the transformation we actually perform. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 294967
* [SLP] Fix for PR31690: Allow using of extra values in horizontalAlexey Bataev2017-02-131-18/+130
| | | | | | | | | | | | | | | | | | | | | | | reductions. Currently, LLVM supports vectorization of horizontal reduction instructions with initial value set to 0. Patch supports vectorization of reduction with non-zero initial values. Also, it supports a vectorization of instructions with some extra arguments, like: ``` float f(float x[], int a, int b) { float p = a % b; p += x[0] + 3; for (int i = 1; i < 32; i++) p += x[i]; return p; } ``` Patch allows vectorization of this kind of horizontal reductions. Differential Revision: https://reviews.llvm.org/D29727 llvm-svn: 294934
* NewGVN: Reverse order of congruence class elimination to maximize trivial ↵Daniel Berlin2017-02-121-2/+2
| | | | | | deadness llvm-svn: 294926
* NewGVN: Use shouldSwapOperands in one more placeDaniel Berlin2017-02-121-1/+1
| | | | llvm-svn: 294925
* Revert accidental commit titled "testing"Daniel Berlin2017-02-121-1/+1
| | | | | | This reverts commit r294919 llvm-svn: 294923
* NewGVN: Apply the fast math flags fix in r267113 to NewGVN as well.Daniel Berlin2017-02-121-23/+26
| | | | llvm-svn: 294922
* PredicateInfo: Handle critical edgesDaniel Berlin2017-02-121-63/+107
| | | | | | | | | | | | | | | | | Summary: This adds support for placing predicateinfo such that it affects critical edges. This fixes the issues mentioned by Nuno on the mailing list. Depends on D29519 Reviewers: davide, nlopes Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29606 llvm-svn: 294921
* NewGVN: Fix missed call that should be to shouldSwapOperandsDaniel Berlin2017-02-121-1/+0
| | | | llvm-svn: 294920
* testingDaniel Berlin2017-02-121-1/+2
| | | | llvm-svn: 294919
* [InstCombine] fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2)Sanjay Patel2017-02-121-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | I found one special case of this transform for 'slt 0', so I removed that and added the general transform. Alive code to check correctness: Name: slt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp slt %a, C1 => %b = icmp slt %x, C1 - C2 Name: sgt_no_overflow Pre: WillNotOverflowSignedSub(C1, C2) %a = add nsw i8 %x, C2 %b = icmp sgt %a, C1 => %b = icmp sgt %x, C1 - C2 http://rise4fun.com/Alive/MH Differential Revision: https://reviews.llvm.org/D29774 llvm-svn: 294898
* NewGVN: Reverse sense of this test to make it clearerDaniel Berlin2017-02-111-5/+7
| | | | llvm-svn: 294851
* NewGVN: Add missing initialization of NumFuncArgs lost due to bad merge.Daniel Berlin2017-02-111-0/+1
| | | | llvm-svn: 294850
* NewGVN: Rank and order commutative operands consistently.Daniel Berlin2017-02-111-2/+40
| | | | llvm-svn: 294849
* NewGVN: Clean up how we handle the INITIAL class so that everything inDaniel Berlin2017-02-111-16/+38
| | | | | | | | | | | | | | | | | | it is dead or unreachable, as it should be. This also makes the leader of INITIAL undef, enabling us to handle irreducibility properly. Summary: This lets us verify, more than we do now, that we didn't screw up value numbering. Reviewers: davide Subscribers: Prazek, llvm-commits Differential Revision: https://reviews.llvm.org/D29842 llvm-svn: 294844
* Move symbols from the global namespace into (anonymous) namespaces. NFC.Benjamin Kramer2017-02-111-9/+2
| | | | llvm-svn: 294837
* Fix PR23384 (under "-lsr-insns-cost" option)Evgeny Stupachenko2017-02-111-4/+57
| | | | | | | | | | | | | Summary: The patch adds instructions number generated by a solution to LSR cost under "-lsr-insns-cost" option. Reviewers: qcolombet, hfinkel Differential Revision: http://reviews.llvm.org/D28307 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 294821
* [LSR] Recommit: Allow formula containing Reg for SCEVAddRecExpr related with ↵Wei Mi2017-02-111-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | outerloop. The recommit includes some changes of testcases. No functional change to the patch. In RateRegister of existing LSR, if a formula contains a Reg which is a SCEVAddRecExpr, and this SCEVAddRecExpr's loop is an outerloop, the formula will be marked as Loser and dropped. Suppose we have an IR that %for.body is outerloop and %for.body2 is innerloop. LSR only handle inner loop now so only %for.body2 will be handled. Using the logic above, formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) will be dropped no matter what because reg({1,+, %size}<%for.body>) is a SCEVAddRecExpr type reg related with outerloop. Only formula like reg(%array) + 1*reg({{1,+, %size}<%for.body>,+,1}<nuw><nsw><%for.body2>) will be kept because the SCEVAddRecExpr related with outerloop is folded into the initial value of the SCEVAddRecExpr related with current loop. But in some cases, we do need to share the basic induction variable reg{0 ,+, 1}<%for.body2> among LSR Uses to reduce the final total number of induction variables used by LSR, so we don't want to drop the formula like reg(%array) + reg({1,+, %size}<%for.body>) + 1*reg({0,+,1}<%for.body2>) unconditionally. From the existing comment, it tries to avoid considering multiple level loops at the same time. However, existing LSR only handles innermost loop, so for any SCEVAddRecExpr with a loop other than current loop, it is an invariant and will be simple to handle, and the formula doesn't have to be dropped. Differential Revision: https://reviews.llvm.org/D26429 llvm-svn: 294814
* [PM] Fix a bug in how I ported LoopDeletion to the new PM.Chandler Carruth2017-02-111-21/+14
| | | | | | | | | | | | | | | This was marking the loop for deletion after the loop was deleted. This almost works, except that when we do any kind of debug logging it starts reading the name of the loop from deleted memory or otherwise blowing up. This can fail in a bunch of ways. I recently added a test that *always* does this, and it started failing on the sanitizer bots. The fix is to mark the loop as deleted in the loop PM infrastructure before we remove the loop. We can do this by passing the updater into the routine. That also lets us simplify a bunch of other interface components here for a net win. llvm-svn: 294810
* [InstCombine] Move class into anonymous namespace. NFC.Benjamin Kramer2017-02-101-0/+2
| | | | | | | | This is necessary to avoid warnings from GCC. InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared with greater visibility than the type of its field 'PointerReplacer::IC' llvm-svn: 294794
* [InstCombine] Silence unused variable warning in Release builds.Benjamin Kramer2017-02-101-0/+2
| | | | llvm-svn: 294788
* Fix invalid addrspacecast due to combining alloca with global varYaxun Liu2017-02-102-7/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | For function-scope variables with large initialisation list, FE usually generates a global variable to hold the initializer, then generates memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst identifies such allocas which are accessed only by reading and replaces them with the global variable. This is done by casting the global variable to the type of the alloca and replacing all references. However, when the global variable is in a different address space which is disjoint with addr space 0 (e.g. for IR generated from OpenCL, global variable cannot be in private addr space i.e. addr space 0), casting the global variable to addr space 0 results in invalid IR for certain targets (e.g. amdgpu). To fix this issue, when the global variable is not in addr space 0, instead of casting it to addr space 0, this patch chases down the uses of alloca until reaching the load instructions, then replaces load from alloca with load from the global variable. If during the chasing bitcast and GEP are encountered, new bitcast and GEP based on the global variable are generated and used in the load instructions. Differential Revision: https://reviews.llvm.org/D27283 llvm-svn: 294786
* Encode duplication factor from loop vectorization and loop unrolling to ↵Dehao Chen2017-02-103-10/+23
| | | | | | | | | | | | | | | | | | | | | discriminator. Summary: This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations. The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default. Reviewers: probinson, aprantl, davidxl, hfinkel, echristo Reviewed By: hfinkel Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D26420 llvm-svn: 294782
* [LV] Remove type restriction for vector phi creationMatthew Simpson2017-02-101-2/+1
| | | | | | | | | We previously only created a vector phi node for an induction variable if its type matched the type of the canonical induction variable. Differential Revision: https://reviews.llvm.org/D29776 llvm-svn: 294755
* [LoopUnswitch] Remove BFI usage (dead code)Philip Reames2017-02-101-43/+0
| | | | | | | | Chandler mentioned at the last social that the need for BFI in the new pass manager was causing a slight hiccup for this pass. Given this code has been checked in, but off for over a year, it makes sense to just remove it for now. Note that there's nothing wrong with the general idea - it's actually a quite good one - and once we have the infrastructure in place to implement this without the full recompuation on every loop, we absolutely should. llvm-svn: 294715
* [PM] Port ArgumentPromotion to the new pass manager.Chandler Carruth2017-02-091-29/+86
| | | | | | | | | | | | | | | Now that the call graph supports efficient replacement of a function and spurious reference edges, we can port ArgumentPromotion to the new pass manager very easily. The old PM-specific bits are sunk into callbacks that the new PM simply doesn't use. Unlike the old PM, the new PM simply does argument promotion and afterward does the update to LCG reflecting the promoted function. Differential Revision: https://reviews.llvm.org/D29580 llvm-svn: 294667
* WholeProgramDevirt: Check that VCP candidate functions are defined before ↵Peter Collingbourne2017-02-091-5/+5
| | | | | | | | evaluating them. This was crashing before. llvm-svn: 294666
* [PM/LCG] Teach the LazyCallGraph how to replace a function withoutChandler Carruth2017-02-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disturbing the graph or having to update edges. This is motivated by porting argument promotion to the new pass manager. Because of how LLVM IR Function objects work, in order to change their signature a new object needs to be created. This is efficient and straight forward in the IR but previously was very hard to implement in LCG. We could easily replace the function a node in the graph represents. The challenging part is how to handle updating the edges in the graph. LCG previously used an edge to a raw function to represent a node that had not yet been scanned for calls and references. This was the core of its laziness. However, that model causes this kind of update to be very hard: 1) The keys to lookup an edge need to be `Function*`s that would all need to be updated when we update the node. 2) There will be some unknown number of edges that haven't transitioned from `Function*` edges to `Node*` edges. All of this complexity isn't necessary. Instead, we can always build a node around any function, always pointing edges at it and always using it as the key to lookup an edge. To maintain the laziness, we need to sink the *edges* of a node into a secondary object and explicitly model transitioning a node from empty to populated by scanning the function. This design seems much cleaner in a number of ways, but importantly there is now exactly *one* place where the `Function*` has to be updated! Some other cleanups that fall out of this include having something to model the *entry* edges more accurately. Rather than hand rolling parts of the node in the graph itself, we have an explicit `EdgeSequence` object that gives us exactly the functionality needed. We also have a consistent place to define the edge iterators and can use them for both the entry edges and the internal edges of the graph. The API used to model the separation between a node and its edges is intentionally very thin as most clients are expected to deal with nodes that have populated edges. We model this exactly as an optional does with an additional method to populate the edges when that is a reasonable thing for a client to do. This is based on API design suggestions from Richard Smith and David Blaikie, credit goes to them for helping pick how to model this without it being either too explicit or too implicit. The patch is somewhat noisy due to shifting around iterator types and new syntax for walking the edges of a node, but most of the functionality change is in the `Edge`, `EdgeSequence`, and `Node` types. Differential Revision: https://reviews.llvm.org/D29577 llvm-svn: 294653
* [InstCombine] allow (X * C2) << C1 --> X * (C2 << C1) for vectorsSanjay Patel2017-02-091-13/+12
| | | | | | | | | | This fold already existed for vectors but only when 'C1' was a splat constant (but 'C2' could be any constant). There were no tests for any vector constants, so I'm adding a test that shows non-splat constants for both operands. llvm-svn: 294650
* De-duplicate some code for creating an AARGetter suitable for the legacy PM.Peter Collingbourne2017-02-093-35/+4
| | | | | | | | I'm about to use this in a couple more places. Differential Revision: https://reviews.llvm.org/D29793 llvm-svn: 294648
* [LoadCombine] Fix combining of loads which span an aliasing store.Michael J. Spencer2017-02-091-1/+5
| | | | | | | | Fixes PR31517 Differential Revision: https://reviews.llvm.org/D28922 llvm-svn: 294632
* Rename LowerTypeTestsSummaryAction to PassSummaryAction. NFCI.Peter Collingbourne2017-02-092-22/+19
| | | | | | | | | I intend to use the same type with the same semantics in the WholeProgramDevirt pass. Differential Revision: https://reviews.llvm.org/D29746 llvm-svn: 294629
* [InstCombine] use m_APInt to allow demanded bits analysis on splat constantsSanjay Patel2017-02-091-10/+13
| | | | llvm-svn: 294628
* [JumpThreading] Thread through guardsSanjoy Das2017-02-092-15/+189
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch allows JumpThreading also thread through guards. Virtually, guard(cond) is equivalent to the following construction: if (cond) { do something } else {deoptimize} Yet it is not explicitly converted into IFs before lowering. This patch enables early threading through guards in simple cases. Currently it covers the following situation: if (cond1) { // code A } else { // code B } // code C guard(cond2) // code D If there is implication cond1 => cond2 or !cond1 => cond2, we can transform this construction into the following: if (cond1) { // code A // code C } else { // code B // code C guard(cond2) } // code D Thus, removing the guard from one of execution branches. Patch by Max Kazantsev! Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29620 llvm-svn: 294617
* [sancov] using comdat only when it is enabledMike Aizatsky2017-02-081-3/+7
| | | | | | Differential Revision: https://reviews.llvm.org/D29733 llvm-svn: 294529
* [sancov] specifying comdat for sancov constructorsMike Aizatsky2017-02-081-1/+3
| | | | | | Differential Revision: https://reviews.llvm.org/D29662 llvm-svn: 294517
* ThinLTOBitcodeWriter: Strip debug info from merged module.Peter Collingbourne2017-02-081-0/+2
| | | | | | | | | | This module will contain nothing but vtable definitions and (soon) available_externally function definitions, so there is no point in keeping debug info in the module. Differential Revision: https://reviews.llvm.org/D28913 llvm-svn: 294511
* [Loop Vectorizer] Cost-based decision for vectorization form of memory ↵Elena Demikhovsky2017-02-081-269/+485
| | | | | | | | | | | | | | | | | | instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503
* NVPTX: Extract mem intrinsic expansions into utilitiesMatt Arsenault2017-02-082-0/+232
| | | | llvm-svn: 294490
* [Reassociate] Remove an unused argument. NFC.Chad Rosier2017-02-081-5/+4
| | | | llvm-svn: 294489
* [InstCombine] add local name for repeated calls; NFC Sanjay Patel2017-02-081-6/+4
| | | | llvm-svn: 294470
* [InstComobineCalls] Fix buildbot failures after r294453.Igor Laevsky2017-02-081-1/+1
| | | | | | | | Some targets don't support uint64_t options. Change type to unsigned. Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294461
* [InstCombineCalls] Unfold element atomic memcpy instructionIgor Laevsky2017-02-082-0/+83
| | | | | | Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294453
* [InstCombineCalls] Remove zero length atomic memcpy intrinsicsIgor Laevsky2017-02-081-0/+6
| | | | | | Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294452
* LSR: Check atomic instruction pointer operandsMatt Arsenault2017-02-081-1/+11
| | | | llvm-svn: 294410
* Revert "CVP: Make CVP iterate in an order that maximizes reuse of LVI cache"Daniel Berlin2017-02-081-5/+6
| | | | | | This reverts commit r294398, it seems to be failing on the bots. llvm-svn: 294399
* CVP: Make CVP iterate in an order that maximizes reuse of LVI cacheDaniel Berlin2017-02-081-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: After the DFS order change for LVI, i have a few testcases that now take forever. The TL;DR - This is mainly due to the overdefined cache, but that requires predicateinfo to fix[1] In order to maximize reuse of the LVI cache for now, change the order we iterate in. This reduces my testcase from 5 minutes to 4 seconds. I have verified cases like gmic do not get slower. I am playing with whether the order should be postorder or idf. [1] In practice, overdefined anywhere should be overdefined everywhere, so this cache should be global. That also fixes this bug. The problem, however, is that LVI relies on this cache being filled in per-block because it wants different values in different blocks due to precisely the naming issue that predicateinfo fixes. With predicateinfo, making the cache global works fine on individual passes, and also resolves this issue. Reviewers: davide, sanjoy, chandlerc Subscribers: llvm-commits, djasper Differential Revision: https://reviews.llvm.org/D29679 llvm-svn: 294398
* [IRCE] Add a missing invariant checkSanjoy Das2017-02-071-5/+39
| | | | | | | | | | | | | | | | | | | | | | | Currently IRCE relies on the loops it transforms to be (semantically) of the form: for (i = START; i < END; i++) ... or for (i = START; i > END; i--) ... However, we were not verifying the presence of the START < END entry check (i.e. check before the first iteration). We were only verifying that the backedge was guarded by (i + 1) < END. Usually this would work "fine" since (especially in Java) most loops do actually have the START < END check, but of course that is not guaranteed. llvm-svn: 294375
* PredicateInfo: Some compilers are unhappy with naming Use *'s Use. Change ↵Daniel Berlin2017-02-071-13/+13
| | | | | | the name. llvm-svn: 294364
* Add PredicateInfo utility and printing passDaniel Berlin2017-02-073-0/+642
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds a utility to build extended SSA (see "ABCD: eliminating array bounds checks on demand"), and an intrinsic to support it. This is then used to get functionality equivalent to propagateEquality in GVN, in NewGVN (without having to replace instructions as we go). It would work similarly in SCCP or other passes. This has been talked about a few times, so i built a real implementation and tried to productionize it. Copies are inserted for operands used in assumes and conditional branches that are based on comparisons (see below for more) Every use affected by the predicate is renamed to the appropriate intrinsic result. E.g. %cmp = icmp eq i32 %x, 50 br i1 %cmp, label %true, label %false true: ret i32 %x false: ret i32 1 will become %cmp = icmp eq i32, %x, 50 br i1 %cmp, label %true, label %false true: ; Has predicate info ; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 } %x.0 = call @llvm.ssa_copy.i32(i32 %x) ret i32 %x.0 false: ret i23 1 (you can use -print-predicateinfo to get an annotated-with-predicateinfo dump) This enables us to easily determine what operations are affected by a given predicate, and how operations affected by a chain of predicates. Reviewers: davide, sanjoy Subscribers: mgorny, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D29519 Update for review comments Fix a bug Nuno noticed where we are giving information about and/or on edges where the info is not useful and easy to use wrong Update for review comments llvm-svn: 294351
OpenPOWER on IntegriCloud