path: root/llvm/lib
Commit message (Author, Date, Files, Lines)
...
* Revert "[NewGVN] replace emplace_back with push_back"Piotr Padlewski2016-12-281-7/+7
| | | | llvm-svn: 290692
* Speed up Function::isIntrinsic() by adding a bit to GlobalValue. NFC (Justin Lebar, 2016-12-28, 1 file, -3/+6)
    Summary:
    Previously isIntrinsic() called getName(). This involves a hashtable lookup, so is nontrivially expensive. And isIntrinsic() is called frequently, particularly by dyn_cast<IntrinsicInst>.
    This patch steals a bit of IntID and uses it to store whether or not getName() starts with "llvm.".
    Reviewers: bogner, arsenm, joker-eph
    Subscribers: sanjoy, llvm-commits
    Differential Revision: https://reviews.llvm.org/D22949
    llvm-svn: 290691
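
    For illustration, a minimal sketch of the caching idea, assuming a hypothetical class (this is not the actual GlobalValue layout or field name): compute the expensive "name starts with llvm." property once, when the name is set, and answer isIntrinsic() from a cached bit.

        #include <string>

        class GlobalValueModel {
          std::string Name;
          bool HasLLVMReservedName = false; // the "stolen" bit

        public:
          void setName(const std::string &N) {
            Name = N;
            // Pay the string comparison once, where the name changes...
            HasLLVMReservedName = Name.compare(0, 5, "llvm.") == 0;
          }
          // ...so the hot query is a single bit test, not a hash-table lookup.
          bool isIntrinsic() const { return HasLLVMReservedName; }
        };
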
* Add an index for Module Metadata record in the bitcode (Mehdi Amini, 2016-12-28, 1 file, -5/+92)
    This index records the position of each metadata record in the bitcode, so that the reader will be able to lazy-load each individual record on demand. We also make sure that every abbrev is emitted upfront so that the block can be skipped while reading.
    I don't plan to commit this before having the reader counterpart, but I figured this can be reviewed mostly independently.
    Recommit of r290684 (reverted in r290686 because a test was broken) after adding a threshold to avoid emitting the index when unnecessary (small amount of metadata). This optimization "hides" a limitation of the ability to backpatch in the bitstream: we can only backpatch safely when the position has been flushed. So if we emit an index for one metadata record, it is possible that (part of) the offset placeholder hasn't been flushed and the backpatch will fail.
    Differential Revision: https://reviews.llvm.org/D28083
    llvm-svn: 290690
* Revert "Add an index for Module Metadata record in the bitcode"Saleem Abdulrasool2016-12-281-80/+5
| | | | | | | This reverts commit a0ca6ae2d38339e4ede0dfa588086fc23d87e836. Revert at Mehdi's request as it is breaking bots. llvm-svn: 290686
* [NewGVN] replace emplace_back with push_back (Piotr Padlewski, 2016-12-28, 1 file, -7/+7)
    emplace_back is not faster when it is equivalent to push_back: in these cases the emplaced value has the same type as the one stored in the container. It is ugly and might even be slower (see Scott Meyers' presentation about emplacement).
    llvm-svn: 290685
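
    A small example of the point being made, for illustration only: when the argument already has the container's element type, the two calls invoke the same constructor, and emplace_back only pays off when it builds the element in place from constructor arguments.

        #include <string>
        #include <vector>

        void example(std::vector<std::string> &Names, const std::string &N) {
          Names.push_back(N);    // copy-constructs the string in place
          Names.emplace_back(N); // invokes exactly the same copy constructor

          // emplace_back wins only when it avoids a temporary, e.g.:
          Names.emplace_back(5, 'x'); // builds "xxxxx" directly, no temporary
        }
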
* Add an index for Module Metadata record in the bitcode (Mehdi Amini, 2016-12-28, 1 file, -5/+80)
    Summary:
    This index records the position of each metadata record in the bitcode, so that the reader will be able to lazy-load each individual record on demand. We also make sure that every abbrev is emitted upfront so that the block can be skipped while reading.
    I don't plan to commit this before having the reader counterpart, but I figured this can be reviewed mostly independently.
    Reviewers: pcc, tejohnson
    Subscribers: llvm-commits
    Differential Revision: https://reviews.llvm.org/D28083
    llvm-svn: 290684
* [NewGVN] Simplify loop. NFC (Piotr Padlewski, 2016-12-28, 1 file, -4/+1)
    llvm-svn: 290683
* [ThinLTO] Honor -O{0,1,2,4} passed through the libLTO interface for ThinLTO (Mehdi Amini, 2016-12-28, 1 file, -6/+8)
    This was hardcoded to O3 until now, with no way to change it other than by changing the code.
    llvm-svn: 290682
* [NewGVN] replace typedefs with usings (Piotr Padlewski, 2016-12-28, 1 file, -2/+2)
    llvm-svn: 290680
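
    A generic illustration of this modernization (not the NewGVN aliases themselves): 'using' declarations are equivalent to typedefs, read left-to-right, and also cover alias templates.

        #include <map>
        #include <string>
        #include <vector>

        typedef std::map<std::string, std::vector<int>> ValueTableOld; // before
        using ValueTable = std::map<std::string, std::vector<int>>;    // after

        // Alias templates can only be expressed with 'using':
        template <typename T> using SmallList = std::vector<T>;
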
* [NewGVN] NFC fixes (Piotr Padlewski, 2016-12-28, 1 file, -40/+36)
    llvm-svn: 290679
* [WinEH] Don't assume endFunction is called while in .text (Reid Kleckner, 2016-12-28, 2 files, -10/+11)
    Jump table emission can switch to .rdata before WinException::endFunction gets called. Just remember the appropriate text section we started in and reset back to it when we end the function. We were already switching sections back from .xdata anyway.
    Fixes the first problem in PR31488, so that now COFF switch tables can live in .rdata if we want them to.
    llvm-svn: 290678
* [NewGVN] Global sweep replacing NULL with nullptr. NFCI. (Davide Italiano, 2016-12-28, 1 file, -10/+10)
    llvm-svn: 290670
* [NewGVN] Remove redundant code. NFCI. (Davide Italiano, 2016-12-28, 1 file, -2/+0)
    llvm-svn: 290669
* [NewGVN] equals() for loads/stores is the same. Unify. (Davide Italiano, 2016-12-28, 1 file, -23/+13)
    Differential Revision: https://reviews.llvm.org/D28116
    llvm-svn: 290667
* [PM] Introduce a devirtualization iteration layer for the new PM. (Chandler Carruth, 2016-12-28, 1 file, -0/+20)
    This is an orthogonal and separated layer instead of being embedded inside the pass manager. While it adds a small amount of complexity, it is fairly minimal and the composability and control seem worth the cost. The logic for this ends up being nicely isolated and targeted. It should be easy to experiment with different iteration strategies wrapped around the CGSCC bottom-up walk using this kind of facility.
    The mechanism used to track devirtualization is the simplest one I came up with. I think it handles most of the cases the existing iteration machinery handles, but I haven't done a *very* in depth analysis. It does however match the basic intended semantics, and we can tweak or tune its exact behavior incrementally as necessary. One thing that we may want to revisit is freshly building the value handle set on each iteration. While I don't think this will be a significant cost (it is strictly fewer value handles but more churn of value handles than the old call graph), it is conceivable that we'll want a somewhat more clever tracking mechanism. My hope is to layer that on as a follow up patch with data supporting any implementation complexity it adds.
    This code also provides for a basic count heuristic: if the number of indirect calls decreases and the number of direct calls increases for a given function in the SCC, we assume devirtualization is responsible. This matches the heuristics currently used in the legacy pass manager.
    Differential Revision: https://reviews.llvm.org/D23114
    llvm-svn: 290665
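
    A minimal sketch of the counting heuristic described in the last paragraph, under the assumption that per-function call counts are gathered before and after each iteration (names are hypothetical; this is not the pass-manager code):

        struct CallCounts {
          unsigned Direct = 0;
          unsigned Indirect = 0;
        };

        // If indirect calls went down and direct calls went up for a function in
        // the SCC, assume a devirtualization happened and iterate again.
        bool likelyDevirtualized(const CallCounts &Before, const CallCounts &After) {
          return After.Indirect < Before.Indirect && After.Direct > Before.Direct;
        }
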
* [PM] Teach the CGSCC's CG update utility to more carefully invalidate analyses when we're about to break apart an SCC. (Chandler Carruth, 2016-12-28, 2 files, -20/+66)
    We can't wait until after breaking apart the SCC to invalidate things:
    1) Which SCC do we then invalidate? All of them?
    2) Even if we invalidate all of them, a newly created SCC may not have a proxy that will convey the invalidation to functions!
    Previously we only invalidated one of the SCCs and too late. This led to stale analyses remaining in the cache. And because the caching strategy actually works, they would get used and chaos would ensue.
    Doing invalidation early is somewhat pessimizing though if we *know* that the SCC structure won't change. So it turns out that the design to make the mutation API force the caller to know the *kind* of mutation in advance was indeed 100% correct and we didn't do enough of it. So this change also splits two cases of switching a call edge to a ref edge into two separate APIs so that callers can clearly test for this and take the easy path without invalidating when appropriate. This is particularly important in this case as we expect most inlines to be between functions in separate SCCs and so the common case is that we don't have to so aggressively invalidate analyses.
    The LCG API change in turn needed some basic cleanups and better testing in its unittest. No interesting functionality changed there other than more coverage of the returned sequence of SCCs.
    While this seems like an obvious improvement over the current state, I'd like to revisit the core concept of invalidating within the CG-update layer at all. I'm wondering if we would be better served forcing the callers to handle the invalidation beforehand in the cases that they can handle it. An interesting example is when we want to teach the inliner to *update and preserve* analyses. But we can cross that bridge when we get there.
    With this patch, the new pass manager can build all of the LLVM test suite at -O3 and everything passes. =D I haven't bootstrapped yet and I'm sure there are still plenty of bugs, but this gives a nice baseline so I'm going to increasingly focus on fleshing out the missing functionality, especially the bits that are just turned off right now in order to let us establish this baseline.
    llvm-svn: 290664
* This is a large patch for X86 AVX-512 of an optimization for reducing code size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. (Gadi Haber, 2016-12-28, 8 files, -4/+1390)
    There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.
    Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky
    Differential Revision: https://reviews.llvm.org/D27901
    llvm-svn: 290663
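
    A simplified model of the legality check described above (illustrative only, not the actual X86 backend code): VEX has no way to encode registers 16-31, zmm operands, or the k mask registers, so only instructions avoiding all of them can trade the 4-byte EVEX prefix for the shorter VEX one.

        bool canCompressEvexToVex(bool UsesRegAbove15, bool UsesZMM,
                                  bool UsesMaskReg) {
          return !UsesRegAbove15 && !UsesZMM && !UsesMaskReg;
        }
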
* [PM] Teach the inliner's call graph update to handle inserting new edges when they are call edges at the leaf but may (transitively) be reached via ref edges. (Chandler Carruth, 2016-12-28, 1 file, -7/+9)
    It turns out there is a simple rule: insert everything as a ref edge, which is a safe conservative default. Then we let the existing update logic handle promoting some of those to call edges.
    Note that it would be fairly cheap to make these call edges right away if that is desirable by testing whether there is some existing call path from the source to the target. It just seemed like slightly more complexity in this code path that isn't strictly necessary. If anyone feels strongly about handling this differently I'm happy to change it.
    llvm-svn: 290649
* [InstCombine] Remove a piece of a comment that said that InstCombiner contains pass infrastructure. That hasn't been true since r226618. NFC (Craig Topper, 2016-12-28, 1 file, -2/+1)
    llvm-svn: 290648
* [LCG] Teach the ref edge removal to handle a ref edge that is trivial due to a call cycle. (Chandler Carruth, 2016-12-28, 1 file, -1/+7)
    This actually crashed the ref removal before. I've added a unittest that covers this kind of interesting graph structure and mutation.
    llvm-svn: 290645
* [PM] Disable the loop vectorizer from the new PM's pipeline as it currently relies on the old PM's dependency system forming LCSSA. (Chandler Carruth, 2016-12-28, 1 file, -0/+4)
    The new PM will require a different design for this, and for now this is causing most of the issues I'm currently seeing in testing. I'd like to get to a testable baseline and then work on re-enabling things one at a time.
    llvm-svn: 290644
* [InstCombine] Canonicalize insert splat sequences into an insert + shuffle (Michael Kuperstein, 2016-12-28, 1 file, -0/+57)
    This adds a combine that canonicalizes a chain of inserts which broadcasts a value into a single insert + a splat shufflevector.
    This fixes PR31286.
    Differential Revision: https://reviews.llvm.org/D27992
    llvm-svn: 290641
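
    For reference, a rough IRBuilder-based sketch of the canonical form being targeted (a hypothetical helper, not the InstCombine code itself): one insertelement into lane 0 followed by a zero-mask shufflevector, which IRBuilder can emit directly.

        #include "llvm/IR/IRBuilder.h"

        using namespace llvm;

        Value *emitSplat(IRBuilder<> &B, Value *Scalar, unsigned NumElts) {
          // Expands to: insertelement into element 0 + shufflevector with an
          // all-zeros mask, i.e. the canonical insert + splat shuffle.
          return B.CreateVectorSplat(NumElts, Scalar);
        }
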
* [libFuzzer] add an experimental flag -experimental_len_control=1 that sets max_len to 1M and tries to increase the actual max sizes of mutations very gradually (second attempt) (Kostya Serebryany, 2016-12-27, 5 files, -2/+32)
    llvm-svn: 290637
* [libFuzzer] don't create large random mutations when given an empty seed (Kostya Serebryany, 2016-12-27, 1 file, -1/+1)
    llvm-svn: 290634
* [sanitizer-coverage] sort the switch cases (Kostya Serebryany, 2016-12-27, 1 file, -0/+5)
    llvm-svn: 290628
* [libFuzzer] fix UB and simplify the computation of the RNG seed (https://llvm.org/bugs/show_bug.cgi?id=31456) (Kostya Serebryany, 2016-12-27, 1 file, -2/+2)
    llvm-svn: 290622
* [PM] Teach MemDep to invalidate its result object when its cached analysis handles become invalid. (Chandler Carruth, 2016-12-27, 1 file, -0/+18)
    Add a test case for its invalidation logic.
    llvm-svn: 290620
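
    The shape of the new-PM hook involved, as a hedged sketch (hypothetical result type; see the real MemoryDependenceResults for the actual logic): a cached result defines invalidate() and reports itself stale when an analysis it holds handles to is invalidated.

        #include "llvm/Analysis/AliasAnalysis.h"
        #include "llvm/IR/PassManager.h"

        struct MyCachedResult {
          bool invalidate(llvm::Function &F, const llvm::PreservedAnalyses &PA,
                          llvm::FunctionAnalysisManager::Invalidator &Inv) {
            // Returning true tells the analysis manager to drop this result;
            // here we become invalid whenever the AA results we captured do.
            return Inv.invalidate<llvm::AAManager>(F, PA);
          }
        };
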
* ASMParser: use range-based for loops (NFC) (Saleem Abdulrasool, 2016-12-27, 1 file, -8/+5)
    Convert the verify method to use a few more range based for loops, converting to const iterators in the process.
    llvm-svn: 290617
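
    The kind of mechanical change this describes, shown on a generic example (not the ASMParser code itself): an explicit const_iterator loop rewritten as a range-based for over const references.

        #include <vector>

        struct Record { bool verify() const { return true; } };

        bool verifyAll(const std::vector<Record> &Records) {
          // Before:
          //   for (std::vector<Record>::const_iterator I = Records.begin(),
          //                                            E = Records.end(); I != E; ++I)
          //     if (!I->verify()) return false;
          // After:
          for (const Record &R : Records)
            if (!R.verify())
              return false;
          return true;
        }
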
* [NewGVN] Simplify a bit removing else after return. NFCI. (Davide Italiano, 2016-12-27, 1 file, -3/+3)
    llvm-svn: 290615
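
    The cleanup pattern referred to, in miniature: when the 'if' branch returns, the 'else' is redundant and dropping it reduces nesting.

        int classify(int X) {
          // Before:
          //   if (X < 0) { return -1; } else { return 1; }
          // After:
          if (X < 0)
            return -1;
          return 1;
        }
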
* [PM] Remove a pointless optimization. (Chandler Carruth, 2016-12-27, 2 files, -6/+0)
    There is no need to do this within an analysis. That method shouldn't even be reached if this predicate holds as the actual useful optimization is in the analysis manager itself.
    llvm-svn: 290614
* [MemCpyOpt] Don't sink LoadInst below possible clobber. (Bryant Wong, 2016-12-27, 1 file, -5/+11)
    Differential Revision: https://reviews.llvm.org/D26811
    llvm-svn: 290611
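
    The hazard being guarded against, shown on a hypothetical example (not the MemCpyOpt code): a load may not be sunk below a store unless the two locations provably do not alias, otherwise the moved load can observe the stored value.

        int example(int *p, int *q) {
          int v = *p; // LoadInst
          *q = 0;     // possible clobber: p and q might alias
          // Sinking the load of *p below this store is only legal if alias
          // analysis proves p and q never overlap.
          return v;
        }
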
* [ThinLTO] Fix "||" vs "|" mixup.Teresa Johnson2016-12-271-1/+1
| | | | | | | | | | | | | | The effect of the bug was that we would incorrectly create summaries for global and weak values defined in module asm (since we were essentially testing for bit 1 which is SF_Undefined, and the RecordStreamer ignores local undefined references). This would have resulted in conservatively disabling importing of anything referencing globals and weaks defined in module asm. Added these cases to the test which now fails without this bug fix. Fixes PR31459. llvm-svn: 290610
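
    Why the distinction matters when testing flag bits, with illustrative values (this is not the actual SymbolFlags enum): '|' combines bit masks, while '||' collapses each operand to a boolean, so an accidental '||' ends up testing whatever flag happens to have the value 1.

        #include <cstdint>

        enum : uint32_t { SF_Undefined = 1u << 0, SF_Global = 1u << 1, SF_Weak = 1u << 2 };

        bool isGlobalOrWeak(uint32_t Flags) {
          // Correct: test against the union of the two masks.
          return (Flags & (SF_Global | SF_Weak)) != 0;
          // Buggy: (SF_Global || SF_Weak) evaluates to true, i.e. 1, which here
          // is SF_Undefined, so the wrong bit would be tested:
          //   return (Flags & (SF_Global || SF_Weak)) != 0;
        }
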
* [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols. (Chad Rosier, 2016-12-27, 1 file, -3/+5)
    Differential Revision: https://reviews.llvm.org/D27953
    llvm-svn: 290609
* [AMDGPU][llvm-mc] Predefined symbols to access register counts (.kernel.{v|s}gpr_count) (Artem Tamazov, 2016-12-27, 1 file, -7/+56)
    The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t etc.
    Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max( value, ( register index + 1 ) ). Thus, at the end of scope the value represents a count of used registers.
    Kernel scopes begin at the .amdgpu_hsa_kernel directive and end at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also a dummy scope that extends from the beginning of the source file to the first .amdgpu_hsa_kernel.
    Test added.
    Differential Revision: https://reviews.llvm.org/D27859
    llvm-svn: 290608
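
    A minimal sketch of the bookkeeping formula above (not the actual AMDGPU assembler code): each register reference raises the per-kernel count symbol to cover that register's index.

        #include <algorithm>

        void noteRegisterUse(unsigned &GprCount, unsigned RegIndex) {
          // value = max(value, register index + 1)
          GprCount = std::max(GprCount, RegIndex + 1);
        }
        // E.g. a kernel that touches v0, v1 and v7 ends its scope with
        // .kernel.vgpr_count == 8.
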
* [MemDep] Operand visited twice bugfix (Piotr Padlewski, 2016-12-27, 1 file, -0/+1)
    Because the operand was not marked as seen, it was visited twice. This doesn't change the behavior of the optimization, it just saves a redundant visit, so no test changes.
    llvm-svn: 290607
* RuntimeDyldELF: refactor AArch64 relocations. NFC. (Eugene Leviant, 2016-12-27, 1 file, -106/+44)
    llvm-svn: 290606
* [PM] Teach BasicAA how to invalidate its result object. (Chandler Carruth, 2016-12-27, 1 file, -0/+18)
    This requires custom handling because BasicAA caches handles to other analyses and so it needs to trigger indirect invalidation.
    This fixes one of the common crashes when using the new PM in real pipelines. I've also tweaked a regression test to check that we are at least handling the most immediate case.
    I'm going to work at re-structuring this test some to both scale better (rather than all being in one file) and check more invalidation paths in a follow-up commit, but I wanted to get the basic bug fix in place.
    llvm-svn: 290603
* Attempt to fix build bot after r290597 (Eugene Leviant, 2016-12-27, 1 file, -0/+3)
    llvm-svn: 290602
* [PM] Disable more of the loop passes -- LCSSA and LoopSimplify are also not really wired into the loop pass manager in a way that will let us productively use these passes yet. (Chandler Carruth, 2016-12-27, 1 file, -3/+9)
    This lets the new PM get farther in basic testing, which is useful for establishing a good baseline of "doesn't explode". There are still plenty of crashers in basic testing though; this just gets rid of some noise that is well understood and does not represent a specific or narrow bug.
    llvm-svn: 290601
* [AMDGPU] Assembler: support SDWA and DPP for VOP2b instructions (Sam Kolton, 2016-12-27, 3 files, -6/+37)
    Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD
    Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
    Differential Revision: https://reviews.llvm.org/D28051
    llvm-svn: 290599
* RuntimeDyldELF: add R_AARCH64_ADD_ABS_LO12_NC reloc (Eugene Leviant, 2016-12-27, 1 file, -0/+9)
    Differential revision: https://reviews.llvm.org/D28115
    llvm-svn: 290598
* Allow setting multiple debug types (Eugene Leviant, 2016-12-27, 1 file, -2/+6)
    Differential revision: https://reviews.llvm.org/D28109
    llvm-svn: 290597
* Change a std::vector to SmallVector in NewGVN (Daniel Berlin, 2016-12-27, 1 file, -1/+1)
    llvm-svn: 290596
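
    The change in miniature, for illustration: SmallVector keeps its first N elements in inline storage, so small worklists avoid the heap allocation std::vector always pays once it grows.

        #include "llvm/ADT/SmallVector.h"

        void example() {
          llvm::SmallVector<int, 8> Worklist; // first 8 elements live inline
          Worklist.push_back(42);             // no heap allocation until the 9th element
        }
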
* [PM] Teach the AAManager and AAResults layer (the worst offender for inter-analysis dependencies) to use the new invalidation infrastructure. (Chandler Carruth, 2016-12-27, 1 file, -1/+21)
    This teaches it to invalidate itself when any of the peer function AA results that it uses become invalid. We do this by just tracking the originating IDs. I've kept it in a somewhat clunky API since some users of AAResults are outside the new PM right now. We can clean this API up if/when those users go away.
    Secondly, it uses the registration on the outer analysis manager proxy to trigger deferred invalidation when a module analysis result becomes invalid.
    I've included test cases that specifically try to trigger use-after-free in both of these cases and they would crash or hang pretty horribly for me even without ASan. Now they work nicely.
    The `InvalidateAnalysis` utility pass required some tweaking to be useful in this context and it still is pretty garbage. I'd like to switch it back to the previous implementation and teach the explicit invalidate method on the AnalysisManager to take care of correctly triggering indirect invalidation, but I wanted to go ahead and send this out so folks could see how all of this stuff works together in practice. And, you know, that it does actually work. =]
    Differential Revision: https://reviews.llvm.org/D27205
    llvm-svn: 290595
* [PM] Introduce the facilities for registering cross-IR-unit dependencies that require deferred invalidation. (Chandler Carruth, 2016-12-27, 3 files, -11/+92)
    This handles the other real-world invalidation scenario that we have cases of: a function analysis which caches references to a module analysis. We currently do this in the AA aggregation layer and might well do this in other places as well.
    Since this is relatively rare, the technique is somewhat more cumbersome. Analyses need to register themselves when accessing the outer analysis manager's proxy. This proxy is already necessarily present to allow access to the outer IR unit's analyses. By registering here we can track and trigger invalidation when that outer analysis goes away.
    To make this work we need to enhance the PreservedAnalyses infrastructure to support a (slightly) more explicit model for "sets" of analyses, and allow abandoning a single specific analysis even when a set covering that analysis is preserved. That allows us to describe the scenario of preserving all Function analyses *except* for the one where deferred invalidation has triggered. We also need to teach the invalidator API to support direct ID calls instead of always going through a template to dispatch so that we can just record the ID mapping.
    I've introduced testing of all of this both for simple module<->function cases as well as for more complex cases involving a CGSCC layer.
    Much like the previous patch I've not tried to fully update the loop pass management layer because that layer is due to be heavily reworked to use similar techniques to the CGSCC to handle updates. As that happens, we'll have a better testing basis for adding support like this.
    Many thanks to both Justin and Sean for the extensive reviews on this to help bring the API design and documentation into a better state.
    Differential Revision: https://reviews.llvm.org/D27198
    llvm-svn: 290594
* [AVX-512] Add all forms of VPALIGNR, VALIGND, and VALIGNQ to the load folding tables. (Craig Topper, 2016-12-27, 1 file, -2/+27)
    llvm-svn: 290591
* [PM] Add one of the features left out of the initial inliner patch: skipping indirectly recursive inline chains. (Chandler Carruth, 2016-12-27, 1 file, -7/+23)
    To do this, we implicitly build an inline stack for each callsite and check prior to inlining that doing so would not form a cycle. This uses the exact same technique and even shares some code with the legacy PM inliner.
    This solution remains deeply unsatisfying to me because it means we cannot actually iterate the inliner externally. Doing so would not be able to easily detect and avoid such cycles. Some day I would very much like to have a solution that works without this internal state to detect cycles, but this is not that day.
    llvm-svn: 290590
* [Analysis] Ignore `nobuiltin` on `allocsize` function calls. (George Burgess IV, 2016-12-27, 1 file, -10/+15)
    We currently ignore the `allocsize` attribute on function calls with the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We shouldn't care about `nobuiltin` here: `allocsize` is explicitly added by the user, not inferred based on a function's symbol.
    llvm-svn: 290588
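
    What allocsize looks like at the source level, using a hypothetical allocator for illustration (Clang's alloc_size attribute is what lowers to the LLVM allocsize attribute): because the user wrote the attribute explicitly, a nobuiltin call site should not stop object-size lowering from using it.

        #include <stddef.h>

        __attribute__((alloc_size(1))) void *my_alloc(size_t n);

        size_t known_size(void) {
          void *p = my_alloc(32);
          // With optimization, this can fold to 32 based on alloc_size alone,
          // which is the information the change above preserves.
          return __builtin_object_size(p, 0);
        }
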
* [Analysis] Refactor as promised in r290397. (George Burgess IV, 2016-12-27, 1 file, -16/+22)
    This also makes us no longer check for `allocsize` on intrinsic calls. This shouldn't matter, since intrinsics should provide the information we get from `allocsize` on their own.
    llvm-svn: 290585
* [AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them to unmasked intrinsics plus a select. (Craig Topper, 2016-12-27, 2 files, -12/+26)
    llvm-svn: 290583