summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* NewGVN: Rename InitialClass to TOP, which is what most people would expect ↵Daniel Berlin2017-03-101-25/+25
| | | | | | it to be called llvm-svn: 297494
* [SLP] Revert everything that has to do with memory access sorting.Michael Kuperstein2017-03-101-122/+57
| | | | | | | | | | | | This reverts r293386, r294027, r294029 and r296411. Turns out the SLP tree isn't actually a "tree" and we don't handle accessing the same packet of loads in several different orders well, causing miscompiles. Revert until we can fix this properly. llvm-svn: 297493
* WholeProgramDevirt: Fixed compilation error under MSVS2015.George Rimar2017-03-101-9/+18
| | | | | | | | | | | | | | | | | | | | | | It was introduced in: r296945 WholeProgramDevirt: Implement exporting for single-impl devirtualization. --------------------- r296939 WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load devirtualizations to the list of llvm.type.test users. --------------------- Microsoft Visual Studio Community 2015 Version 14.0.23107.0 D14REL Does not compile that code without additional brackets, showing multiple error like below: WholeProgramDevirt.cpp(1216): error C2958: the left bracket '[' found at 'c:\access_softek\llvm\lib\transforms\ipo\wholeprogramdevirt.cpp(1216)' was not matched correctly WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ']' before '}' WholeProgramDevirt.cpp(1216): error C2143: syntax error: missing ';' before '}' WholeProgramDevirt.cpp(1216): error C2059: syntax error: ']' llvm-svn: 297451
* AMDGPU: Fix insertion point when reducing load intrinsicsMatt Arsenault2017-03-101-0/+3
| | | | | | | The insertion point may be later than the next instruction, so it is necessary to set it when replacing the call. llvm-svn: 297439
* Move memory coercion functions from GVN.cpp to VNCoercion.cpp so they can be ↵Daniel Berlin2017-03-103-447/+460
| | | | | | | | | | | | | | | | | | | shared between GVN and NewGVN. Summary: These are the functions used to determine when values of loads can be extracted from stores, etc, and to perform the necessary insertions to do this. There are no changes to the functions themselves except reformatting, and one case where memdep was informed of a removed load (which was pushed into the caller). Reviewers: davide Subscribers: mgorny, llvm-commits, Prazek Differential Revision: https://reviews.llvm.org/D30478 llvm-svn: 297438
* NewGVN: Rewrite DCE during elimination so we do it as well as old GVN did.Daniel Berlin2017-03-102-53/+109
| | | | llvm-svn: 297428
* NewGVN: Rename a few things for clarityDaniel Berlin2017-03-101-40/+45
| | | | llvm-svn: 297427
* Add support for DenseMap/DenseSet count and find using const pointersDaniel Berlin2017-03-101-3/+3
| | | | | | | | | | | | | | Summary: Similar to SmallPtrSet, this makes find and count work with both const referneces and const pointers. Reviewers: dblaikie Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D30713 llvm-svn: 297424
* AMDGPU: Support for SimplifyDemandedVectorElts for load intrinsicsMatt Arsenault2017-03-091-0/+41
| | | | llvm-svn: 297408
* Minor format change. nfc.Rong Xu2017-03-091-5/+5
| | | | llvm-svn: 297400
* [PM/Inliner] Make the new PM's inliner process call edges across anChandler Carruth2017-03-091-29/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies. This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design. Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into *insanely* large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use. Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue: ``` template <int N> extern const char *str; void g(const char *); template <bool K, int N> void f(bool *B, bool *E) { if (K) g(str<N>); if (B == E) return; if (*B) f<true, N + 1>(B + 1, E); else f<false, N + 1>(B + 1, E); } template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); } template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); } extern bool *arr, *end; void test() { f<false, 0>(arr, end); } ``` When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function. What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome. The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK. We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an *awesome* job here as long as it does *ok*. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when this easy to do, seems like the right short to medium term approach. I don't really have a test case that makes sense yet... I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here *should* be unobservable without snooping on debug logging. So there isn't really much to test. The test case updates come from two incidental changes: 1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases. 2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining. llvm-svn: 297374
* [SLP] Mark values in Dot that need to be extractedAdam Nemet2017-03-091-3/+9
| | | | llvm-svn: 297361
* WholeProgramDevirt: Implement importing for uniform ret val opt.Peter Collingbourne2017-03-091-0/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D29854 llvm-svn: 297350
* WholeProgramDevirt: Implement importing for single-impl devirtualization.Peter Collingbourne2017-03-091-11/+47
| | | | | | Differential Revision: https://reviews.llvm.org/D29844 llvm-svn: 297333
* Perform symbol binding for .symver versioned symbolsTeresa Johnson2017-03-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In a .symver assembler directive like: .symver name, name2@@nodename "name2@@nodename" should get the same symbol binding as "name". While the ELF object writer is updating the symbol binding for .symver aliases before emitting the object file, not doing so when the module inline assembly is handled by the RecordStreamer is causing the wrong behavior in *LTO mode. E.g. when "name" is global, "name2@@nodename" must also be marked as global. Otherwise, the symbol is skipped when iterating over the LTO InputFile symbols (InputFile::Symbol::shouldSkip). So, for example, when performing any *LTO via the gold-plugin, the versioned symbol definition is not recorded by the plugin and passed back to the linker. If the object was in an archive, and there were no other symbols needed from that object, the object would not be included in the final link and references to the versioned symbol are undefined. The llvm-lto2 tests added will give an error about an unused symbol resolution without the fix. Reviewers: rafael, pcc Reviewed By: pcc Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D30485 llvm-svn: 297332
* Don't merge global constants with non-dbg metadata.Evgeniy Stepanov2017-03-091-0/+26
| | | | | | | | | | | !type metadata can not be dropped. An alternative to this is adding !type metadata from the replaced globals to the replacement, but that may weaken type tests and make them slower at the same time. The merged global gets !dbg metadata from replaced globals, and can end up with multiple debug locations. llvm-svn: 297327
* [MemCpyOpt] clang-format + trim the legacy pass. NFC.George Burgess IV2017-03-081-39/+23
| | | | | | | None of the declarations below `// Helper functions` seem to have definitions anymore. llvm-svn: 297309
* [SLP] Visualize SLP trees with -view-slp-treeAdam Nemet2017-03-081-62/+167
| | | | | | | | | | | | | | | | | Analyzing larger trees is extremely difficult with the current debug output so this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data structure. We can now display the SLP trees with Graphviz as in https://reviews.llvm.org/F3132765. I decorated the graph where a value needs to be gathered for one reason or another. These are the red nodes. There are other improvement I am planning to make as I work through my case here. For example, I would also like to mark nodes that need to be extracted. Differential Revision: https://reviews.llvm.org/D30731 llvm-svn: 297303
* [LV] Select legal insert point when fixing first-order recurrencesMatthew Simpson2017-03-081-7/+9
| | | | | | | | | | Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop map to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302
* [JumpThread] Use AA in SimplifyPartiallyRedundantLoad()Jun Bum Lim2017-03-081-11/+20
| | | | | | | | | | | | | | Summary: Use AA when scanning to find an available load value. Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin Reviewed By: rengolin, dberlin Subscribers: aemerson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D30352 llvm-svn: 297284
* [InstCombine] avoid crashing on shuffle shrinkage when input type is not ↵Sanjay Patel2017-03-081-1/+2
| | | | | | same as result type llvm-svn: 297280
* [LoopRotate] Propagate dbg.value intrinsicsSam Parker2017-03-081-3/+45
| | | | | | | | | | | | Recommitting patch which was previously reverted in r297159. These changes should address the casting issues. The original patch enables dbg.value intrinsics to be attached to newly inserted PHI nodes. Differential Review: https://reviews.llvm.org/D30701 llvm-svn: 297269
* [SCCP] Merge markOverdefined and markAnythingOverdefined.Davide Italiano2017-03-081-23/+17
| | | | | | There's no need to have two separate APIs. llvm-svn: 297253
* [InstCombine] shrink truncated insertelement into undef vectorSanjay Patel2017-03-071-0/+38
| | | | | | | | | | | | This is the 2nd part of solving: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html D30123 moves the trunc ahead of the shuffle, and this moves the trunc ahead of the insertelement. We're limiting this transform to undef rather than any constant to avoid backend problems. Differential Revision: https://reviews.llvm.org/D30137 llvm-svn: 297242
* Fix one-after-the-end type metadata handling in globalsplit.Evgeniy Stepanov2017-03-071-1/+10
| | | | | | | | | | Itanium ABI may have an address point one byte after the end of a vtable. When such vtable global is split, the !type metadata needs to follow the right vtable. Differential Revision: https://reviews.llvm.org/D30716 llvm-svn: 297236
* [InstCombine] shrink truncated splat shuffle (2nd try)Sanjay Patel2017-03-071-0/+20
| | | | | | | | | | | | | | | | This was committed at r297155 and reverted at r297166 because of an over-reaching clang test. That should be fixed with r297189. This is one part of solving a recent bug report: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html This keeps with our general approach: changing arbitrary shuffles is off-limts, but changing splat is ok. The transform is very similar to the existing shrinkBitwiseLogic() canonicalization. Differential Revision: https://reviews.llvm.org/D30123 llvm-svn: 297232
* [coroutines] Add handling for unwind coro.endsGor Nishanov2017-03-071-4/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The purpose of coro.end intrinsic is to allow frontends to mark the cleanup and other code that is only relevant during the initial invocation of the coroutine and should not be present in resume and destroy parts. In landing pads coro.end is replaced with an appropriate instruction to unwind to caller. The handling of coro.end differs depending on whether the target is using landingpad or WinEH exception model. For landingpad based exception model, it is expected that frontend uses the `coro.end`_ intrinsic as follows: ``` ehcleanup: %InResumePart = call i1 @llvm.coro.end(i8* null, i1 true) br i1 %InResumePart, label %eh.resume, label %cleanup.cont cleanup.cont: ; rest of the cleanup eh.resume: %exn = load i8*, i8** %exn.slot, align 8 %sel = load i32, i32* %ehselector.slot, align 4 %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn, 0 %lpad.val29 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1 resume { i8*, i32 } %lpad.val29 ``` The `CoroSpit` pass replaces `coro.end` with ``True`` in the resume functions, thus leading to immediate unwind to the caller, whereas in start function it is replaced with ``False``, thus allowing to proceed to the rest of the cleanup code that is only needed during initial invocation of the coroutine. For Windows Exception handling model, a frontend should attach a funclet bundle referring to an enclosing cleanuppad as follows: ``` ehcleanup: %tok = cleanuppad within none [] %unused = call i1 @llvm.coro.end(i8* null, i1 true) [ "funclet"(token %tok) ] cleanupret from %tok unwind label %RestOfTheCleanup ``` The `CoroSplit` pass, if the funclet bundle is present, will insert ``cleanupret from %tok unwind to caller`` before the `coro.end`_ intrinsic and will remove the rest of the block. Reviewers: majnemer Reviewed By: majnemer Subscribers: llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D25543 llvm-svn: 297223
* [JumpThread] Simplify CmpInst-as-Condition branch-folding a bit.Xin Tong2017-03-071-4/+11
| | | | | | | | | | | | | | Summary: Simplify CmpInst-as-Condition branch-folding a bit. Reviewers: sanjoy, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30429 llvm-svn: 297186
* [LV] Consider users that are memory accesses in uniforms expansion stepMatthew Simpson2017-03-071-1/+3
| | | | | | | | | | | When expanding the set of uniform instructions beyond the seed instructions (e.g., consecutive pointers), we mark a new instruction uniform if all its loop-varying users are uniform. We should also allow users that are consecutive or interleaved memory accesses. This fixes cases where we have an instruction that is used as the pointer operand of a consecutive access but also used by a non-memory instruction that later becomes uniform as part of the expansion. llvm-svn: 297179
* revert r297155 because there's a clang test that depends on InstCombine:Sanjay Patel2017-03-071-20/+0
| | | | | | tools/clang/test/CodeGen/zvector.c llvm-svn: 297166
* Revert "Strip debug info when inlining into a nodebug function."Adrian Prantl2017-03-071-30/+12
| | | | | | | | | | This reverts commit r296488. As noted by David Blaikie on llvm-commits, I overlooked the case of a debug function being inlined into a nodebug function being inlined into a debug function. llvm-svn: 297163
* Revert r297132, it caused PR32171Nico Weber2017-03-071-45/+3
| | | | llvm-svn: 297159
* [InstCombine] shrink truncated splat shuffleSanjay Patel2017-03-071-0/+20
| | | | | | | | | | | | | This is one part of solving a recent bug report: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html This keeps with our general approach: changing arbitrary shuffles is off-limts, but changing splat is ok. The transform is very similar to the existing shrinkBitwiseLogic() canonicalization. Differential Revision: https://reviews.llvm.org/D30123 llvm-svn: 297155
* [LoopRotate] Update dbg.value intrinsicsSam Parker2017-03-071-3/+45
| | | | | | | | Propagate debug info through the newly inserted PHI nodes. Differential Revision: https://reviews.llvm.org/D30190 llvm-svn: 297132
* [LoopUnrolling] Fix loop size check for peelingSanjoy Das2017-03-071-1/+3
| | | | | | | | | | | | | | | | | | Summary: We should check if loop size allows us to peel at least one iteration before we do so. Patch by Max Kazantsev! Reviewers: sanjoy, mkuper, efriedma Reviewed By: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30632 llvm-svn: 297122
* [SLP] Revert r296863 due to miscompiles.Michael Kuperstein2017-03-061-79/+72
| | | | | | Details and reproducer are on the email thread for r296863. llvm-svn: 297103
* [InstCombine] use dyn_cast instead of isa+cast; NFCISanjay Patel2017-03-061-15/+12
| | | | llvm-svn: 297092
* Disable gvn-hoist (PR32153)Hans Wennborg2017-03-061-2/+2
| | | | llvm-svn: 297075
* NewGVN: Remove DebugUnknownExprs, just mark the instructions as unusedDaniel Berlin2017-03-061-7/+3
| | | | llvm-svn: 297047
* NewGVN: Only call isInstructionTrivially dead once per instruction.Daniel Berlin2017-03-061-9/+10
| | | | llvm-svn: 297046
* Remove the sample pgo annotation heuristic that uses call count to annotate ↵Dehao Chen2017-03-061-5/+3
| | | | | | | | | | | | | | | | basic block count. Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks. Reviewers: davidxl, aprantl Reviewed By: aprantl Subscribers: llvm-commits, aprantl Differential Revision: https://reviews.llvm.org/D30658 llvm-svn: 297038
* [BasicBlockUtils] Check for nullptr before updating LoopInfo.Michael Kruse2017-03-061-3/+4
| | | | | | | | | | | | | | | | LoopInfo::getLoopFor returns nullptr if a BB is not in a loop and only then can the loop be updated to contain the newly created BBs. Add the missing nullptr check to SplitBlockAndInsertIfThen. Within LLVM, the only user of this function that also passes a LoopInfo to be updated is InnerLoopVectorizer::predicateInstructions(). As the method's name implies, the BB operataten on will always be within a loop, but out-of-tree users may also use it differently (here: Polly). All other uses of LoopInfo::getLoopFor in the file properly check its return value for nullptr. llvm-svn: 297016
* [SimplifyCFG] Use APInt::operator| instead of APInt::Or. NFCCraig Topper2017-03-051-1/+1
| | | | | | I'm looking to improve operator| to support rvalue references and may remove APInt::Or. llvm-svn: 296982
* Set option enabling LSR alternative way to resolve complex solution to false.Evgeny Stupachenko2017-03-041-1/+1
| | | | | | | Differential Revision: http://reviews.llvm.org/D29862 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 296959
* Fix build.Peter Collingbourne2017-03-041-1/+1
| | | | llvm-svn: 296949
* WholeProgramDevirt: Implement exporting for uniform ret val opt.Peter Collingbourne2017-03-041-6/+19
| | | | | | Differential Revision: https://reviews.llvm.org/D29846 llvm-svn: 296948
* WholeProgramDevirt: Implement exporting for single-impl devirtualization.Peter Collingbourne2017-03-041-6/+54
| | | | | | Differential Revision: https://reviews.llvm.org/D29811 llvm-svn: 296945
* WholeProgramDevirt: Add any unsuccessful llvm.type.checked.load ↵Peter Collingbourne2017-03-041-12/+88
| | | | | | | | | | | | | devirtualizations to the list of llvm.type.test users. Any unsuccessful llvm.type.checked.load devirtualizations will be translated into uses of llvm.type.test, so we need to add the resulting llvm.type.test intrinsics to the function summaries so that the LowerTypeTests pass will export them. Differential Revision: https://reviews.llvm.org/D29808 llvm-svn: 296939
* NewGVN: Be consistent in what order we compare operands for swapping.Daniel Berlin2017-03-041-2/+2
| | | | | | NFC. llvm-svn: 296935
* Fix a compiler warningSanjoy Das2017-03-031-1/+2
| | | | llvm-svn: 296903
OpenPOWER on IntegriCloud