summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directiveArtem Tamazov2016-12-291-12/+17
| | | | | | | | | | | Among other stuff, this allows to use predefined .option.machine_version_major /minor/stepping symbols in the directive. Relevant test expanded at once (also file renamed for clarity). Differential Revision: https://reviews.llvm.org/D28140 llvm-svn: 290710
* Introduce element-wise atomic memcpy intrinsicIgor Laevsky2016-12-293-0/+94
| | | | | | | | | | This change adds a new intrinsic which is intended to provide memcpy functionality with additional atomicity guarantees. Please refer to the review thread or language reference for further details. Differential Revision: https://reviews.llvm.org/D27133 llvm-svn: 290708
* [InstCombine] Use getVectorNumElements instead of explicitly casting to ↵Craig Topper2016-12-291-8/+7
| | | | | | VectorType and calling getNumElements. NFC llvm-svn: 290707
* [InstCombine] Fix typo in comment. NFCCraig Topper2016-12-291-1/+1
| | | | llvm-svn: 290706
* [InstCombine] Use a 32-bits instead of 64-bits for storing the number of ↵Craig Topper2016-12-291-2/+2
| | | | | | elements in VectorType for a ShuffleVector. While there getVectorNumElements to avoid an explicit cast. NFC llvm-svn: 290705
* [InstCombine][X86] If the lowest element of a scalar intrinsic isn't used ↵Craig Topper2016-12-291-6/+18
| | | | | | | | make sure we add it to the worklist so we can DCE it sooner. We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since its now dead. This can allow DCE to find it sooner and remove it. Similar was done for InsertElement when the inserted element isn't demanded. llvm-svn: 290704
* [libFuzzer] make __sanitizer_cov_trace_switch more predictableKostya Serebryany2016-12-292-24/+19
| | | | llvm-svn: 290703
* NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.Daniel Berlin2016-12-291-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The optimal iteration order for this problem is RPO order. We want to process as many preds of a backedge as we can before we process the backedge. At the same time, as we add predicate handling, we want to be able to touch instructions that are dominated by a given block by ranges (because a change in value numbering a predicate possibly affects all users we dominate that are using that predicate). If we don't do it this way, we can't do value inference over backedges (the paper covers this in depth). The newgvn branch currently overshoots the last part, and guarantees that it will touch *at least* the right set of instructions, but it does touch more. This is because the bitvector instruction ranges are currently generated in RPO order (so we take the max and the min of the ranges of dominated blocks, which means there are some in the middle we didn't have to touch that we did). We can do better by sorting the dominator tree, and then just using dominator tree order. As a preliminary, the dominator tree has some RPO guarantees, but not enough. It guarantees that for a given node, your idom must come before you in the RPO ordering. It guarantees no relative RPO ordering for siblings. We add siblings in whatever order they appear in the module. So that is what we fix. We sort the children array of the domtree into RPO order, and then use the dominator tree for ordering, instead of RPO, since the dominator tree is now a valid RPO ordering. Note: This would help any other pass that iterates a forward problem in dominator tree order. Most of them are single pass. It will still maximize whatever result they compute. We could also build the dominator tree in this order, but our incremental updates would still put it out of sort order, and recomputing the sort order is almost as hard as general incremental updates of the domtree. Also note that the sorting does not affect any tests, etc. Nothing depends on domtree order, including the verifier, the equals functions for domtree nodes, etc. How much could this matter, you ask? Here are the current numbers. This is generated by running NewGVN over all files in LLVM. Note that once we propagate equalities, the differences go up by an order of magnitude or two (IE instead of 29, the max ends up in the thousands, since the worst case we add a factor of N, where N is the number of branch predicates). So while it doesn't look that stark for the default ordering, it gets *much much* worse. There are also programs in the wild where the difference is already pretty stark (2 iterations vs hundreds). RPO ordering: 759040 Number of iterations is 1 112908 Number of iterations is 2 Default dominator tree ordering: 755081 Number of iterations is 1 116234 Number of iterations is 2 603 Number of iterations is 3 27 Number of iterations is 4 2 Number of iterations is 5 1 Number of iterations is 7 Dominator tree sorted: 759040 Number of iterations is 1 112908 Number of iterations is 2 <yay!> Really bad ordering (sort domtree siblings in postorder. not quite the worst possible, but yeah): 754008 Number of iterations is 1 21 Number of iterations is 10 8 Number of iterations is 11 6 Number of iterations is 12 5 Number of iterations is 13 2 Number of iterations is 14 2 Number of iterations is 15 3 Number of iterations is 16 1 Number of iterations is 17 2 Number of iterations is 18 96642 Number of iterations is 2 1 Number of iterations is 20 2 Number of iterations is 21 1 Number of iterations is 22 1 Number of iterations is 29 17266 Number of iterations is 3 2598 Number of iterations is 4 798 Number of iterations is 5 273 Number of iterations is 6 186 Number of iterations is 7 80 Number of iterations is 8 42 Number of iterations is 9 Reviewers: chandlerc, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28129 llvm-svn: 290699
* Add a static_assert about the sizeof(GlobalValue)Reid Kleckner2016-12-291-0/+7
| | | | | | | | I added one for Value back in r262045, and I'm starting to think we should have these for any class with bitfields whose memory efficiency really matters. llvm-svn: 290698
* Update equalsStoreHelper for the fact that only one branch can be trueDaniel Berlin2016-12-291-4/+5
| | | | llvm-svn: 290697
* [COFF] Use 32-bit jump table entries in .rdata for Win64Reid Kleckner2016-12-293-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694
* Change Metadata Index emission in the bitcode to use 2x32 bits for the ↵Mehdi Amini2016-12-281-2/+3
| | | | | | | | | | placeholder The Bitstream reader and writer are limited to handle a "size_t" at most, which means that we can't backpatch and read back a 64bits value on 32 bits platform. llvm-svn: 290693
* Revert "[NewGVN] replace emplace_back with push_back"Piotr Padlewski2016-12-281-7/+7
| | | | llvm-svn: 290692
* Speed up Function::isIntrinsic() by adding a bit to GlobalValue. NFCJustin Lebar2016-12-281-3/+6
| | | | | | | | | | | | | | | | | | Summary: Previously isIntrinsic() called getName(). This involves a hashtable lookup, so is nontrivially expensive. And isIntrinsic() is called frequently, particularly by dyn_cast<IntrinsicInstr>. This patch steals a bit of IntID and uses that to store whether or not getName() starts with "llvm." Reviewers: bogner, arsenm, joker-eph Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D22949 llvm-svn: 290691
* Add an index for Module Metadata record in the bitcodeMehdi Amini2016-12-281-5/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | This index record the position for each metadata record in the bitcode, so that the reader will be able to lazy-load on demand each individual record. We also make sure that every abbrev is emitted upfront so that the block can be skipped while reading. I don't plan to commit this before having the reader counterpart, but I figured this can be reviewed mostly independently. Recommit r290684 (was reverted in r290686 because a test was broken) after adding a threshold to avoid emitting the index when unnecessary (little amount of metadata). This optimization "hides" a limitation of the ability to backpatch in the bitstream: we can only backpatch safely when the position has been flushed. So if we emit an index for one metadata, it is possible that (part of) the offset placeholder hasn't been flushed and the backpatch will fail. Differential Revision: https://reviews.llvm.org/D28083 llvm-svn: 290690
* Revert "Add an index for Module Metadata record in the bitcode"Saleem Abdulrasool2016-12-281-80/+5
| | | | | | | This reverts commit a0ca6ae2d38339e4ede0dfa588086fc23d87e836. Revert at Mehdi's request as it is breaking bots. llvm-svn: 290686
* [NewGVN] replace emplace_back with push_backPiotr Padlewski2016-12-281-7/+7
| | | | | | | | emplace_back is not faster if it is equivalent to push_back. In this cases emplaced value had the same type that the one stored in container. It is ugly and it might be even slower (see Scott Meyers presentation about emplacement). llvm-svn: 290685
* Add an index for Module Metadata record in the bitcodeMehdi Amini2016-12-281-5/+80
| | | | | | | | | | | | | | | | | | | | | | Summary: This index record the position for each metadata record in the bitcode, so that the reader will be able to lazy-load on demand each individual record. We also make sure that every abbrev is emitted upfront so that the block can be skipped while reading. I don't plan to commit this before having the reader counterpart, but I figured this can be reviewed mostly independently. Reviewers: pcc, tejohnson Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28083 llvm-svn: 290684
* [NewGVN] Simplyfy loop NFCPiotr Padlewski2016-12-281-4/+1
| | | | llvm-svn: 290683
* [ThinLTO] Honor -O{0,1,2,4} passed through the libLTO interface for ThinLTOMehdi Amini2016-12-281-6/+8
| | | | | | | This was hardcoded to be O3 till now, without any way to change it without changing the code. llvm-svn: 290682
* [NewGVN] replace typedefs with usingsPiotr Padlewski2016-12-281-2/+2
| | | | llvm-svn: 290680
* [NewGVN] NFC fixesPiotr Padlewski2016-12-281-40/+36
| | | | llvm-svn: 290679
* [WinEH] Don't assume endFunction is called while in .textReid Kleckner2016-12-282-10/+11
| | | | | | | | | | | | Jump table emission can switch to .rdata before WinException::endFunction gets called. Just remember the appropriate text section we started in and reset back to it when we end the function. We were already switching sections back from .xdata anyway. Fixes the first problem in PR31488, so that now COFF switch tables can live in .rdata if we want them to. llvm-svn: 290678
* [NewGVN] Global sweep replacing NULL with nullptr. NFCI.Davide Italiano2016-12-281-10/+10
| | | | llvm-svn: 290670
* [NewGVN] Remove redundant code. NFCI.Davide Italiano2016-12-281-2/+0
| | | | llvm-svn: 290669
* [NewGVN] equals() for loads/stores is the same. Unify.Davide Italiano2016-12-281-23/+13
| | | | | | Differential Revision: https://reviews.llvm.org/D28116 llvm-svn: 290667
* [PM] Introduce a devirtualization iteration layer for the new PM.Chandler Carruth2016-12-281-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an orthogonal and separated layer instead of being embedded inside the pass manager. While it adds a small amount of complexity, it is fairly minimal and the composability and control seems worth the cost. The logic for this ends up being nicely isolated and targeted. It should be easy to experiment with different iteration strategies wrapped around the CGSCC bottom-up walk using this kind of facility. The mechanism used to track devirtualization is the simplest one I came up with. I think it handles most of the cases the existing iteration machinery handles, but I haven't done a *very* in depth analysis. It does however match the basic intended semantics, and we can tweak or tune its exact behavior incrementally as necessary. One thing that we may want to revisit is freshly building the value handle set on each iteration. While I don't think this will be a significant cost (it is strictly fewer value handles but more churn of value handes than the old call graph), it is conceivable that we'll want a somewhat more clever tracking mechanism. My hope is to layer that on as a follow up patch with data supporting any implementation complexity it adds. This code also provides for a basic count heuristic: if the number of indirect calls decreases and the number of direct calls increases for a given function in the SCC, we assume devirtualization is responsible. This matches the heuristics currently used in the legacy pass manager. Differential Revision: https://reviews.llvm.org/D23114 llvm-svn: 290665
* [PM] Teach the CGSCC's CG update utility to more carefully invalidateChandler Carruth2016-12-282-20/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | analyses when we're about to break apart an SCC. We can't wait until after breaking apart the SCC to invalidate things: 1) Which SCC do we then invalidate? All of them? 2) Even if we invalidate all of them, a newly created SCC may not have a proxy that will convey the invalidation to functions! Previously we only invalidated one of the SCCs and too late. This led to stale analyses remaining in the cache. And because the caching strategy actually works, they would get used and chaos would ensue. Doing invalidation early is somewhat pessimizing though if we *know* that the SCC structure won't change. So it turns out that the design to make the mutation API force the caller to know the *kind* of mutation in advance was indeed 100% correct and we didn't do enough of it. So this change also splits two cases of switching a call edge to a ref edge into two separate APIs so that callers can clearly test for this and take the easy path without invalidating when appropriate. This is particularly important in this case as we expect most inlines to be between functions in separate SCCs and so the common case is that we don't have to so aggressively invalidate analyses. The LCG API change in turn needed some basic cleanups and better testing in its unittest. No interesting functionality changed there other than more coverage of the returned sequence of SCCs. While this seems like an obvious improvement over the current state, I'd like to revisit the core concept of invalidating within the CG-update layer at all. I'm wondering if we would be better served forcing the callers to handle the invalidation beforehand in the cases that they can handle it. An interesting example is when we want to teach the inliner to *update and preserve* analyses. But we can cross that bridge when we get there. With this patch, the new pass manager an build all of the LLVM test suite at -O3 and everything passes. =D I haven't bootstrapped yet and I'm sure there are still plenty of bugs, but this gives a nice baseline so I'm going to increasingly focus on fleshing out the missing functionality, especially the bits that are just turned off right now in order to let us establish this baseline. llvm-svn: 290664
* This is a large patch for X86 AVX-512 of an optimization for reducing code ↵Gadi Haber2016-12-288-4/+1390
| | | | | | | | | | | | size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible. There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers. The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled. Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky Differential Revision: https://reviews.llvm.org/D27901 llvm-svn: 290663
* [PM] Teach the inliner's call graph update to handle inserting new edgesChandler Carruth2016-12-281-7/+9
| | | | | | | | | | | | | | | | | when they are call edges at the leaf but may (transitively) be reached via ref edges. It turns out there is a simple rule: insert everything as a ref edge which is a safe conservative default. Then we let the existing update logic handle promoting some of those to call edges. Note that it would be fairly cheap to make these call edges right away if that is desirable by testing whether there is some existing call path from the source to the target. It just seemed like slightly more complexity in this code path that isn't strictly necessary. If anyone feels strongly about handling this differently I'm happy to change it. llvm-svn: 290649
* [InstCombine] Remove a piece of a comment that said that InstCombiner ↵Craig Topper2016-12-281-2/+1
| | | | | | contains pass infrastructure. That hasn't been true since r226618. NFC llvm-svn: 290648
* [LCG] Teach the ref edge removal to handle a ref edge that is trivialChandler Carruth2016-12-281-1/+7
| | | | | | | | | | | due to a call cycle. This actually crashed the ref removal before. I've added a unittest that covers this kind of interesting graph structure and mutation. llvm-svn: 290645
* [PM] Disable the loop vectorizer from the new PM's pipeline as itChandler Carruth2016-12-281-0/+4
| | | | | | | | | | | currenty relies on the old PM's dependency system forming LCSSA. The new PM will require a different design for this, and for now this is causing most of the issues I'm currently seeing in testing. I'd like to get to a testable baseline and then work on re-enabling things one at a time. llvm-svn: 290644
* [InstCombine] Canonicalize insert splat sequences into an insert + shuffleMichael Kuperstein2016-12-281-0/+57
| | | | | | | | | | | This adds a combine that canonicalizes a chain of inserts which broadcasts a value into a single insert + a splat shufflevector. This fixes PR31286. Differential Revision: https://reviews.llvm.org/D27992 llvm-svn: 290641
* [libFuzzer] add an experimental flag -experimental_len_control=1 that sets ↵Kostya Serebryany2016-12-275-2/+32
| | | | | | max_len to 1M and tries to increases the actual max sizes of mutations very gradually (second attempt) llvm-svn: 290637
* [libFuzzer] don't create large random mutations when given an empty seedKostya Serebryany2016-12-271-1/+1
| | | | llvm-svn: 290634
* [sanitizer-coverage] sort the switch casesKostya Serebryany2016-12-271-0/+5
| | | | llvm-svn: 290628
* [libFuzzer] fix UB and simplify the computation of the RNG seed ↵Kostya Serebryany2016-12-271-2/+2
| | | | | | (https://llvm.org/bugs/show_bug.cgi?id=31456) llvm-svn: 290622
* [PM] Teach MemDep to invalidate its result object when its cachedChandler Carruth2016-12-271-0/+18
| | | | | | | | analysis handles become invalid. Add a test case for its invalidation logic. llvm-svn: 290620
* ASMParser: use range-based for loops (NFC)Saleem Abdulrasool2016-12-271-8/+5
| | | | | | | Convert the verify method to use a few more range based for loops, converting to const iterators in the process. llvm-svn: 290617
* [NewGVN] Simplify a bit removing else after return. NFCI.Davide Italiano2016-12-271-3/+3
| | | | llvm-svn: 290615
* [PM] Remove a pointless optimization.Chandler Carruth2016-12-272-6/+0
| | | | | | | | There is no need to do this within an analysis. That method shouldn't even be reached if this predicate holds as the actual useful optimization is in the analysis manager itself. llvm-svn: 290614
* [MemCpyOpt] Don't sink LoadInst below possible clobber.Bryant Wong2016-12-271-5/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D26811 llvm-svn: 290611
* [ThinLTO] Fix "||" vs "|" mixup.Teresa Johnson2016-12-271-1/+1
| | | | | | | | | | | | | | The effect of the bug was that we would incorrectly create summaries for global and weak values defined in module asm (since we were essentially testing for bit 1 which is SF_Undefined, and the RecordStreamer ignores local undefined references). This would have resulted in conservatively disabling importing of anything referencing globals and weaks defined in module asm. Added these cases to the test which now fails without this bug fix. Fixes PR31459. llvm-svn: 290610
* [AArch64][AsmParser] Add support for parsing shift/extend operands with symbols.Chad Rosier2016-12-271-3/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D27953 llvm-svn: 290609
* [AMDGPU][llvm-mc] Predefined symbols to access register counts ↵Artem Tamazov2016-12-271-7/+56
| | | | | | | | | | | | | | | | | | | | | | | (.kernel.{v|s}gpr_count) The feature allows for conditional assembly, filling the entries of .amd_kernel_code_t etc. Symbols are defined with value 0 at the beginning of each kernel scope. After each register usage, the respective symbol is set to: value = max( value, ( register index + 1 ) ) Thus, at the end of scope the value represents a count of used registers. Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also dummy scope that lies from the beginning of source file til the first .amdgpu_hsa_kernel. Test added. Differential Revision: https://reviews.llvm.org/D27859 llvm-svn: 290608
* [MemDep] Operand visited twice bugfixPiotr Padlewski2016-12-271-0/+1
| | | | | | | | Because operand was not marked as seen it was visited twice. It doesn't change behavior of optimization, it just saves redudant visit, so no test changes. llvm-svn: 290607
* RuntimeDyldELF: refactor AArch64 relocations. NFC.Eugene Leviant2016-12-271-106/+44
| | | | llvm-svn: 290606
* [PM] Teach BasicAA how to invalidate its result object.Chandler Carruth2016-12-271-0/+18
| | | | | | | | | | | | | | | This requires custom handling because BasicAA caches handles to other analyses and so it needs to trigger indirect invalidation. This fixes one of the common crashes when using the new PM in real pipelines. I've also tweaked a regression test to check that we are at least handling the most immediate case. I'm going to work at re-structuring this test some to both scale better (rather than all being in one file) and check more invalidation paths in a follow-up commit, but I wanted to get the basic bug fix in place. llvm-svn: 290603
* Attempt to fix build bot after r290597Eugene Leviant2016-12-271-0/+3
| | | | llvm-svn: 290602
OpenPOWER on IntegriCloud