path: root/llvm
Commit message Author Age Files Lines
* [InstSimplify] Use APInt::isMask instead of manually implementing it. NFCCraig Topper2017-05-261-2/+2
| | | | llvm-svn: 303968
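For readers unfamiliar with the check, here is a minimal standalone sketch of what a "low-bit mask" test looks like on a plain 64-bit integer. The name `isLowBitMask` is hypothetical and this is not the APInt implementation; it only illustrates the kind of bit trick that a manual version would otherwise spell out by hand.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `value` is a non-empty run of ones starting at bit 0,
// for example 0b0001, 0b0111, or all ones. Zero is not treated as a mask.
// (Hypothetical helper for illustration only.)
constexpr bool isLowBitMask(std::uint64_t value) {
  // Adding 1 to a low-bit mask carries through every set bit, so the sum
  // shares no bits with the original value. All ones wraps around to 0.
  return value != 0 && ((value + 1) & value) == 0;
}
```

Under these assumptions, `isLowBitMask(0xFF)` holds while `isLowBitMask(0xF0)` does not, because the ones in `0xF0` do not start at the least significant bit.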
* [InstSimplify] Use m_ConstantInt matchers to shorten some code. NFCCraig Topper2017-05-261-7/+5
| | | | llvm-svn: 303967
* [IR] Add an iterator and range accessor for the PHI nodes of a basicChandler Carruth2017-05-264-7/+130
| | | | | | | | | | | | | | | | | block. This allows writing much more natural and readable range based for loops directly over the PHI nodes. It also takes advantage of the same tricks for terminating the sequence as the hand coded versions. I've replaced one example of this mostly to showcase the difference and I've added a unit test to make sure the facilities really work the way they're intended. I want to use this inside of SimpleLoopUnswitch but it seems generally nice. Differential Revision: https://reviews.llvm.org/D33533 llvm-svn: 303964
* Revert "LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI"Matthias Braun2017-05-262-93/+27
| | | | | | | | | | Tentatively revert this to see if it fixes the buildbot stage2 breakages. This reverts commit r303938. This reverts commit r303954. llvm-svn: 303960
* Revert "LivePhysRegs: Skip reserved regs in computeLiveIns; NFCI"Matthias Braun2017-05-265-13/+7
| | | | | | | | | | Tentatively revert, suspecting that it caused breakage in stage2 buildbots. This reverts commit r303949. This reverts commit r303937. llvm-svn: 303955
* Test for r303938Matthias Braun2017-05-261-0/+52
| | | | llvm-svn: 303954
* [PM] Enable the new simple loop unswitch pass in the new pass managerChandler Carruth2017-05-262-4/+2
| | | | | | | | | (where it is the only realistic option). This passes the LLVM test suite for me, but I'm clearly still hammering on this. llvm-svn: 303952
* Tidy up RelocVisitor.h.Rui Ueyama2017-05-261-325/+188
| | | | | | | | | | | | | | Summary: RelocVisitor had too many overly small functions. This patch groups them by architecture rather than by relocation type. Reviewers: grimar, dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33580 llvm-svn: 303950
* LivePhysRegs: Follow-up to r303937Matthias Braun2017-05-261-1/+1
| | | | | | | We may have situations in which a superregister is reserved and not added to liveins, so we have to add the subregisters. llvm-svn: 303949
* [llvm-pdbdump] Don't crash when displaying padding.Zachary Turner2017-05-261-1/+2
| | | | | | | | | | | | | | | | | | We have a lot of complicated logic to determine where padding is in a record, and the debug info doesn't always provide enough information to figure it out with laser precision. In this case we were putting the padding in the wrong place causing an out of bounds access on a BitVector. Right now we decide that any trailing padding of a child type will be truncated during record layout, but this is only true insofar as the class still is sized properly to end on an alignment boundary, which the algorithm doesn't yet know about. For now, just don't crash, even though we display padding twice in this case. llvm-svn: 303946
* [Examples] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-05-267-44/+43
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 303944
* Return a lit.Test.Result object from TestRunner's executeShTest()Dimitry Andric2017-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: For various clang analyzer tests, which were unsupported, I got lit exceptions, similar to the following: Exception during script execution: Traceback (most recent call last): File "utils/lit/lit/run.py", line 190, in execute_test result = test.config.test_format.execute(test, lit_config) File "tools/clang/test/Analysis/analyzer_test.py", line 11, in execute if result.code == lit.Test.FAIL: AttributeError: 'tuple' object has no attribute 'code' This is because executeShTest() in utils/lit/lit/TestRunner.py is supposed to return a lit.Test.Result object, but in case of unsupported tests, it returns a plain tuple. Fix this by returning a properly initialized lit.Test.Result object instead. Reviewers: rnk, rafael, modocache Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33579 llvm-svn: 303943
* Remove unused member.Zachary Turner2017-05-251-2/+0
| | | | llvm-svn: 303942
* [PPC] Add text for assert.Tim Shen2017-05-251-1/+1
| | | | llvm-svn: 303940
* LTO: Do summary-based prevailing symbol resolution at --lto-O0.Peter Collingbourne2017-05-252-13/+23
| | | | | | | | | Prevailing symbol resolution is necessary for correctness. Without this we can end up dropping a referenced linkonce symbol from the link. Differential Revision: https://reviews.llvm.org/D33570 llvm-svn: 303939
* LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEIMatthias Braun2017-05-251-27/+41
| | | | | | | | | | | | | - addLiveOutsNoPristines() needs to add callee saved registers that are actually saved and restored somewhere to the set (they are not pristine). - Cleanup/rewrite the code for addLiveOuts()/addLiveOutsNoPristines(). This fixes the problem from D32156. Differential Revision: https://reviews.llvm.org/D32464 llvm-svn: 303938
* LivePhysRegs: Skip reserved regs in computeLiveIns; NFCIMatthias Braun2017-05-255-6/+12
| | | | | | | We do not track liveness of reserved registers so adding them to the liveins list in computeLiveIns() was completely unnecessary. llvm-svn: 303937
* [CV Type Merging] Find nested type indices faster.Zachary Turner2017-05-258-355/+954
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merging two type streams is one of the most time consuming parts of generating a PDB, and as such it needs to be as fast as possible. The visitor abstractions used for interoperating nicely with many different types of inputs and outputs have been used widely and help greatly for testability and implementing tools, but the abstractions build up and get in the way of performance. This patch removes all of the visitation stuff from the type stream merger, essentially re-inventing the leaf / member switch and loop, but at a very low level. This allows us many other optimizations, such as not actually deserializing *any* records (even member records which don't describe their own length), as the operation of "figure out how long this record is" is somewhat faster than "figure out how long this record *and* get all its fields out". Furthermore, whereas before we had to deserialize, re-write type indices, then re-serialize, now we don't have to do any of those 3 steps. We just find out where the type indices are and pull them directly out of the byte stream and re-write them. This is worth a 50-60% performance increase. On top of all other optimizations that have been applied this week, I now get the following numbers when linking lld.exe and lld.pdb MSVC: 25.67s Before This Patch: 18.59s After This Patch: 8.92s So this is a huge performance win. Differential Revision: https://reviews.llvm.org/D33564 llvm-svn: 303935
* DebugInfo: Simplify scopes+subprogram handling since the subprogram<>cu link ↵David Blaikie2017-05-251-18/+8
| | | | | | | | | | | | inversion Previously this code was defensive to the situation in which the debug info scopes would lead to a different subprogram from the subprogram in the CU's subprogram list (this could've happened with linkonce functions, etc as per the comment being removed). Since the CU<>SP link reversal this is no longer possible. llvm-svn: 303933
* [PPC] Fix atomics lowering in DAG lowering.Tim Shen2017-05-252-1/+26
| | | | | | | | | | | I forgot to forward the chain, causing some missing instruction dependencies. The test crashes the compiler without this patch. Inspired by the test case, D33519 also tries to remove the extra sync. Differential Revision: https://reviews.llvm.org/D33573 llvm-svn: 303931
* Fix test to handle running on platforms which don't enable pubnames at allDavid Blaikie2017-05-251-6/+4
| | | | | | | Check that there are no entries in the pub sections, but that they may either be not present or present-but-empty. llvm-svn: 303927
* [InstCombine] Add an InstCombine specific wrapper around ↵Craig Topper2017-05-254-14/+14
| | | | | | | | isKnownToBeAPowerOfTwo to shorten code. NFC We have wrappers for several other ValueTracking methods that take care of passing all of the analysis and assumption cache parameters. This extends it to isKnownToBeAPowerOfTwo. llvm-svn: 303924
* [GVN] Add phi-translate support in scalarpre.Wei Mi2017-05-255-26/+271
| | | | | | | | | | | | | | | | | | | | | | | | | | | Right now scalarpre doesn't have phi-translate support, so it will miss some simple PRE opportunities. In the following testcase, the current scalarpre cannot recognize that the last "a * b" is fully redundant, because a and b used by the last "a * b" expr are both defined by phis. long a[100], b[100], g1, g2, g3; __attribute__((pure)) long goo(); void foo(long a, long b, long c, long d) { g1 = a * b; if (__builtin_expect(g2 > 3, 0)) { a = c; b = d; g2 = a * b; } g3 = a * b; // fully redundant. } The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available. Differential Revision: https://reviews.llvm.org/D32252 llvm-svn: 303923
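To make the redundancy concrete, the following is a source-level sketch of the effect one would expect from PRE with phi translation on the test case above. It is an illustration only: the pass works on LLVM IR, the `Globals` struct and both function names are hypothetical, and `__builtin_expect` is dropped for portability.

```cpp
#include <cassert>

// Hypothetical stand-in for the globals g1, g2, g3 in the test case.
struct Globals {
  long g1 = 0, g2 = 0, g3 = 0;
};

// Original shape: the final a * b is recomputed although its value is
// already available on both incoming paths.
void fooOriginal(Globals &g, long a, long b, long c, long d) {
  g.g1 = a * b;
  if (g.g2 > 3) {
    a = c;
    b = d;
    g.g2 = a * b;
  }
  g.g3 = a * b; // fully redundant
}

// Shape after the redundancy is removed: translating a -> c and b -> d
// through the merge point shows the product is c * d on the taken path.
void fooAfterPre(Globals &g, long a, long b, long c, long d) {
  long product = a * b;
  g.g1 = product;
  if (g.g2 > 3) {
    product = c * d;
    g.g2 = product;
  }
  g.g3 = product; // reuses the value from whichever path was taken
}
```

Both functions store the same values for any input; the second simply performs one multiplication fewer on every path.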
* Add constrained intrinsics for some libm-equivalent operationsAndrew Kaylor2017-05-2516-72/+1134
| | | | | | Differential revision: https://reviews.llvm.org/D32319 llvm-svn: 303922
* CodeGen: Rename DEBUG_TYPE to match passnamesMatthias Braun2017-05-2574-162/+146
| | | | | | | | Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible. llvm-svn: 303921
* [lld] Fix a bug where we continually re-follow type servers.Zachary Turner2017-05-252-8/+8
| | | | | | | | | | | | | | | | | | | Originally this was intended to be set up so that when linking a PDB which refers to a type server, it would only visit the PDB once, and on subsequent visitations it would just skip it since all the records had already been added. Due to some C++ scoping issues, this was not occurring and it was revisiting the type server every time, which caused every record to end up being thrown away on all subsequent visitations. This doesn't affect the performance of linking clang-cl generated object files because we don't use type servers, but when linking object files and libraries generated with /Zi via MSVC, this means only 1 object file has to be linked instead of N object files, so the speedup is quite large. llvm-svn: 303920
* [CodeView Type Merging] Don't keep re-allocating temp serializer.Zachary Turner2017-05-255-21/+31
| | | | | | | | | | | | | | | | Previously, every time we wanted to serialize a field list record, we would create a new copy of FieldListRecordBuilder, which would in turn create a temporary instance of TypeSerializer, which itself had a std::vector<> that was about 128K in size. So this 128K allocation was happening every time. We can re-use the same instance over and over, we just have to clear its internal hash table and seen records list between each run. This saves us from the constant re-allocations. This is worth an ~18.5% speed increase (3.75s -> 3.05s) in my tests. Differential Revision: https://reviews.llvm.org/D33506 llvm-svn: 303919
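The reuse pattern described above can be sketched with a plain `std::vector`. `ScratchSerializer` is a hypothetical stand-in that models only the large scratch buffer, not the real TypeSerializer with its hash table and seen-records list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical serializer that owns one large scratch buffer.
class ScratchSerializer {
public:
  explicit ScratchSerializer(std::size_t reserveBytes) {
    buffer_.reserve(reserveBytes); // pay for the big allocation once
  }

  // Forget the previous record but keep the allocation for the next one.
  void reset() { buffer_.clear(); }

  void writeByte(std::uint8_t byte) { buffer_.push_back(byte); }

  std::size_t size() const { return buffer_.size(); }
  std::size_t capacity() const { return buffer_.capacity(); }

private:
  std::vector<std::uint8_t> buffer_;
};
```

Constructing one instance up front and calling `reset()` between records avoids a fresh allocation per record, since clearing a vector leaves its capacity in place.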
* Make BinaryStreamReader::readCString a bit faster.Zachary Turner2017-05-251-13/+14
| | | | | | | | | | | | | | | | | | | | | Previously it would do a character by character search for a null terminator, to account for the fact that an arbitrary stream need not store its data contiguously so you couldn't just do a memchr. However, the stream API has a function which will return the longest contiguous chunk without doing a copy, and by using this function we can do a memchr on the individual chunks. For certain types of streams like data from object files etc, this is guaranteed to find the null terminator with only a single memchr, but even with discontiguous streams such as MappedBlockStream, it's rare that any given string will cross a block boundary, so even those will almost always be satisfied with a single memchr. This optimization is worth a 10-12% reduction in link time (4.2 seconds -> 3.75 seconds) Differential Revision: https://reviews.llvm.org/D33503 llvm-svn: 303918
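A minimal sketch of the chunk-wise search, assuming the stream can be modelled as a list of contiguous chunks. The `Chunks` alias and this `readCString` signature are hypothetical and differ from the BinaryStreamReader API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical model of a discontiguous stream: each element is one
// contiguous run of bytes.
using Chunks = std::vector<std::string>;

// Reads a null-terminated string starting at the first chunk. Returns false
// (leaving `out` untouched) if no terminator is found.
bool readCString(const Chunks &chunks, std::string &out) {
  std::string result;
  for (const std::string &chunk : chunks) {
    // One memchr per contiguous chunk instead of a byte-by-byte loop.
    const void *nul = std::memchr(chunk.data(), '\0', chunk.size());
    if (nul != nullptr) {
      std::size_t length = static_cast<const char *>(nul) - chunk.data();
      result.append(chunk.data(), length);
      out = result;
      return true;
    }
    result.append(chunk); // the string continues into the next chunk
  }
  return false;
}
```

When the whole string lives in one chunk, which the message says is the common case, the loop body runs once and performs a single `memchr`.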
* [pdb] pad source file name buffer at the end instead of the beginningBob Haarman2017-05-255-9/+53
| | | | | | | | | | | | | | | | | | | | | | | | Summary: DbiStreamBuilder calculated the offset of the source file names inside the file info substream as the size of the file info substream minus the size of the file names. Since the file info substream is padded to a multiple of 4 bytes, this caused the first file name to be aligned on a 4-byte boundary. By contrast, DbiModuleList would read the file names immediately after the file name offset table, without skipping to the next 4-byte boundary. This change makes it so that the file names are written to the location where DbiModuleList expects them, and puts any necessary padding for the file info substream after the file names instead of before it. Reviewers: amccarth, rnk, zturner Reviewed By: amccarth, zturner Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33475 llvm-svn: 303917
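The mismatch between writer and reader comes down to where the padding goes. The sketch below works through the arithmetic under a simplified layout in which everything before the names is lumped into a single `tablesSize`; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Round `value` up to the next multiple of `align` (align must be nonzero).
constexpr std::uint32_t alignTo(std::uint32_t value, std::uint32_t align) {
  return (value + align - 1) / align * align;
}

// Old writer behaviour as described: the names offset is the padded
// substream size minus the size of the names, so padding precedes the names.
constexpr std::uint32_t namesOffsetPaddedBefore(std::uint32_t tablesSize,
                                                std::uint32_t namesSize) {
  return alignTo(tablesSize + namesSize, 4) - namesSize;
}

// Reader expectation and new writer behaviour: the names directly follow
// the tables, and any padding comes after the names.
constexpr std::uint32_t namesOffsetPaddedAfter(std::uint32_t tablesSize) {
  return tablesSize;
}
```

With 20 bytes of tables and 10 bytes of names, the old calculation yields offset 22 while the reader looks at offset 20. The two agree only when the names already occupy a multiple of 4 bytes.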
* Fix a bug in MappedBlockStream.Zachary Turner2017-05-253-52/+47
| | | | | | | | | | It was using the number of blocks of the entire PDB file as the number of blocks of each stream that was created. This was only an issue in the readLongestContiguousChunk function, which was never called prior. This bug surfaced when I updated an algorithm to use this function and the algorithm broke. llvm-svn: 303916
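As a small illustration of the distinction, a per-stream block count can be derived from the length of that stream rather than from the size of the whole file. `blocksForStream` is a hypothetical helper, not the MappedBlockStream code.

```cpp
#include <cassert>
#include <cstdint>

// Number of fixed-size blocks needed to hold `streamLength` bytes.
// `blockSize` must be nonzero.
constexpr std::uint64_t blocksForStream(std::uint64_t streamLength,
                                        std::uint64_t blockSize) {
  return (streamLength + blockSize - 1) / blockSize; // ceiling division
}
```

A 4097-byte stream with 4096-byte blocks needs two blocks, regardless of how many blocks the containing file has.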
* [WebAssembly] MC: Include unnamed data when writing wasm filesSam Clegg2017-05-252-18/+69
| | | | | | | | | | | | Also, include global entries for all data symbols, not just external ones, since these are referenced by the relocation records. Add a test case that includes unnamed data. Differential Revision: https://reviews.llvm.org/D33079 llvm-svn: 303915
* [CodeView Type Merging] Avoid record deserialization when possible.Zachary Turner2017-05-2510-148/+288
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A profile shows the majority of time doing type merging is spent deserializing records from sequences of bytes into friendly C++ structures that we can easily access members of in order to find the type indices to re-write. Records are prefixed with their length, however, and most records have type indices that appear at fixed offsets in the record. For these records, we can save some cycles by just looking at the right place in the byte sequence and re-writing the value, then skipping the record in the type stream. This saves us from the costly deserialization of examining every field, including potentially null terminated strings which are the slowest, even though it was unnecessary to begin with. In addition, we apply another optimization. Previously, after deserializing a record and re-writing its type indices, we would unconditionally re-serialize it in order to compute the hash of the re-written record. This would result in an alloc and memcpy for every record. If no type indices were re-written, however, this was an unnecessary allocation. In this patch re-writing is made two phase. The first phase discovers the indices that need to be rewritten and their new values. This information is passed through to the de-duplication code, which only copies and re-writes type indices in the serialized byte sequence if at least one type index is different. Some records have type indices which only appear after variable length strings, or which have lists of type indices, or various other situations that can make it tricky to make this optimization. While I'm not giving up on optimizing these cases as well, for now we can get the easy cases out of the way and lay the groundwork for more complicated cases later. This patch yields another 50% speedup on top of the already large speedups submitted over the past 2 days. 
In two tests I have run, I went from 9 seconds to 3 seconds, and from 16 seconds to 8 seconds. Differential Revision: https://reviews.llvm.org/D33480 llvm-svn: 303914
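The two-phase idea can be sketched as follows. This is a simplified model, assuming 32-bit indices at known byte offsets and host byte order; `Fixup`, `discoverFixups`, and `applyFixups` are hypothetical names, not the functions in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
// (byte offset within the record, new index value)
using Fixup = std::pair<std::size_t, std::uint32_t>;

// Reads a 32-bit index at `offset`; memcpy sidesteps alignment concerns.
std::uint32_t readIndex(const Bytes &record, std::size_t offset) {
  std::uint32_t value = 0;
  std::memcpy(&value, record.data() + offset, sizeof value);
  return value;
}

// Phase 1: find the indices whose value actually changes. Nothing is copied.
std::vector<Fixup> discoverFixups(
    const Bytes &record, const std::vector<std::size_t> &indexOffsets,
    const std::map<std::uint32_t, std::uint32_t> &remap) {
  std::vector<Fixup> fixups;
  for (std::size_t offset : indexOffsets) {
    std::uint32_t oldIndex = readIndex(record, offset);
    auto it = remap.find(oldIndex);
    if (it != remap.end() && it->second != oldIndex)
      fixups.emplace_back(offset, it->second);
  }
  return fixups;
}

// Phase 2: copy and patch only when at least one index differs. Returns
// false, leaving `storage` untouched, when the original bytes can be reused.
bool applyFixups(const Bytes &record, const std::vector<Fixup> &fixups,
                 Bytes &storage) {
  if (fixups.empty())
    return false;
  storage = record;
  for (const Fixup &fixup : fixups)
    std::memcpy(storage.data() + fixup.first, &fixup.second,
                sizeof fixup.second);
  return true;
}
```

The point of the split is that a record whose indices are unchanged never triggers an allocation or copy; the caller can hash and store the original byte sequence directly.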
* Update the documentation and CMake file for Visual Studio generators.Aaron Ballman2017-05-252-0/+11
| | | | | | By default, CMake uses a 32-bit toolchain, even when on a 64-bit platform targeting a 64-bit build. However, due to the size of the binaries involved, this can cause linker instabilities (such as the linker running out of memory). Guide people to the correct solution to get CMake to use the native toolchain. llvm-svn: 303912
* PPC: Correct Size for GETtlsADDRKyle Butt2017-05-251-1/+3
| | | | | | | | | PPC::GETtlsADDR is lowered to a branch and a nop, by the assembly printer. Its size was incorrectly marked as 4, correct it to 8. The incorrect size can cause incorrect branch relaxation in PPCBranchSelector under the right conditions. llvm-svn: 303904
* Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.Nico Weber2017-05-253-28/+1
| | | | llvm-svn: 303902
* [AArch64]: add 'a' inline asm operand modifier.Manoj Gupta2017-05-252-1/+11
| | | | | | | | | | | | | | | | Summary: This is used in the Linux kernel, and effectively just means "print an address". This brings back r193593. Reviewed by: Renato Golin Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls Subscribers: aemerson, javed.absar, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D33558 llvm-svn: 303901
* Fix SelectionDAGBuilder::getDbgValue to not expect DW_OP_deref on FI varsAdrian Prantl2017-05-253-78/+93
| | | | | | | | | | | | This fixes an oversight in r300522, which changed alloca dbg.values to no longer emit a DW_OP_deref. The array.ll testcase was regenerated from source. Fixes PR33166: https://bugs.llvm.org/show_bug.cgi?id=33166 llvm-svn: 303897
* Delete an obsolete paragraph in LangRef.Adrian Prantl2017-05-251-6/+0
| | | | llvm-svn: 303896
* DebugInfo: Produce debug_{gnu_}pub{names,types} entries when explicitly ↵David Blaikie2017-05-258-24/+113
| | | | | | | | | | | | | | | | | requested, even in -gmlt or when empty Turns out gold doesn't use the DW_AT_GNU_pubnames to decide whether to parse the rest of the DIEs when building gdb-index. This causes gold to trip over LLVM's output when there are DW_FORM_ref_addr present. Gold does use the presence of a debug_gnu_pub{names,types} entry for the CU to skip parsing the debug_info portion, so make sure that's included even when empty (technically, when empty there couldn't be any ref_addr anyway - it only came up when gmlt didn't produce any (even non-empty) pubnames - but given what that reveals about gold's implementation, this seems like a good thing to do for consistency). llvm-svn: 303894
* [llvm-pdbdump] [yaml2pdb] always include object file name in module infoBob Haarman2017-05-252-1/+15
| | | | | | | | | | | | | | | | | | | | | | | Summary: Previously, the yaml2pdb subcommand of llvm-pdbdump only included object file names in module info if a module info stream was present. This change makes it so that we include the object file name even if there is no module info stream for the module. As a result, running llvm-pdbdump pdb2yaml -dbi-module-info original.pdb > original.yaml && llvm-pdbdump yaml2pdb -pdb=new.pdb original.yaml && llvm-pdbdump pdb2yaml -dbi-module-info new.pdb > new.yaml now produces identical original.yaml and new.yaml files. Reviewers: amccarth, zturner Reviewed By: zturner Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D33463 llvm-svn: 303891
* NewGVN: Fix PR 33119, PR 33129, due to regressed undef handlingDaniel Berlin2017-05-252-23/+44
| | | | | | Fix PR33120 and others by eliminating self-cycles a different way. llvm-svn: 303875
* [InstCombine] Teach isAllocSiteRemovable to look through addrspacecastsArtur Pilipenko2017-05-252-3/+8
| | | | | | | | Reviewed By: reames Differential Revision: https://reviews.llvm.org/D28565 llvm-svn: 303870
* [InstCombine] make icmp-mul fold more efficientSanjay Patel2017-05-252-6/+8
| | | | | | | | | | | There's probably a lot more like this (see also comments in D33338 about responsibility), but I suspect we don't usually get a visible manifestation. Given the recent interest in improving InstCombine efficiency, another potential micro-opt that could be repeated several times in this function: morph the existing icmp pred/operands instead of creating a new instruction. llvm-svn: 303860
* [AMDGPU] add intrinsic for s_getpcTim Corringham2017-05-253-1/+28
| | | | | | | | | | | | | | Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D32862 llvm-svn: 303859
* [X86] Adding vpopcntd and vpopcntq instructionsOren Ben Simhon2017-05-2517-30/+997
| | | | | | | | | AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858
* [GVNSink] Pacify MSVCJames Molloy2017-05-251-1/+1
| | | | | | Don't convert an unsigned to a pointer for a sentinel, use a size_t instead. llvm-svn: 303855
* [GVNSink] Don't define operator<< in NDEBUGJames Molloy2017-05-251-0/+2
| | | | | | | Without debug macros enabled, the raw_ostream operator<< overload is unused. llvm-svn: 303852
* [GVNSink] GVNSink passJames Molloy2017-05-2514-48/+1825
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for committing but I've uploaded it to gather some initial thoughts. This pass attempts to sink instructions into successors, reducing static instruction count and enabling if-conversion. We use a variant of global value numbering to decide what can be sunk. Consider: [ %a1 = add i32 %b, 1 ] [ %c1 = add i32 %d, 1 ] [ %a2 = xor i32 %a1, 1 ] [ %c2 = xor i32 %c1, 1 ] \ / [ %e = phi i32 %a2, %c2 ] [ add i32 %e, 4 ] GVN would number %a1 and %c1 differently because they compute different results - the VN of an instruction is a function of its opcode and the transitive closure of its operands. This is the key property for hoisting and CSE. What we want when sinking, however, is a numbering that is a function of the *uses* of an instruction, which allows us to answer the question "if I replace %a1 with %c1, will it contribute in an equivalent way to all successive instructions?". The (new) PostValueTable class in GVN provides this mapping. This pass has already shown some really impressive improvements on internal benchmarks, especially for code size, so I have high hopes it can replace all the sinking logic in SimplifyCFG. Differential revision: https://reviews.llvm.org/D24805 llvm-svn: 303850
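The IR example above corresponds roughly to the following source-level shapes. This is only an illustration of the intended effect, written with unsigned arithmetic so that addition wraps as it does for `i32`; the function names are hypothetical and the pass itself operates on IR.

```cpp
#include <cassert>
#include <cstdint>

// Before sinking: each predecessor computes its own add and xor.
std::uint32_t beforeSink(bool takeLeft, std::uint32_t b, std::uint32_t d) {
  std::uint32_t e;
  if (takeLeft) {
    std::uint32_t a1 = b + 1;
    e = a1 ^ 1;
  } else {
    std::uint32_t c1 = d + 1;
    e = c1 ^ 1;
  }
  return e + 4;
}

// After sinking: only the differing operand is selected per path (the phi),
// and the add and xor appear once in the common successor.
std::uint32_t afterSink(bool takeLeft, std::uint32_t b, std::uint32_t d) {
  std::uint32_t operand = takeLeft ? b : d;
  std::uint32_t sum = operand + 1;
  std::uint32_t e = sum ^ 1;
  return e + 4;
}
```

The two versions return the same value for every input, but the second contains one add and one xor instead of two of each, which is the static instruction count reduction the message refers to.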
* [PM] Teach the PGO instrumentation passes to run GlobalDCE beforeChandler Carruth2017-05-252-5/+19
| | | | | | | | | | | | | | | | | | | | | instrumenting code. This is important in the new pass manager. The old pass manager's inliner has a small DCE routine embedded within it. The new pass manager relies on the actual GlobalDCE pass for this. Without this patch, instrumentation profiling with the new PM results in massive code bloat in the object files because the instrumentation itself ends up preventing DCE from working to remove the code. We should probably change the instrumentation (and/or DCE) so that we can eliminate dead code even if instrumented, but we shouldn't even spend the time generating instrumentation for that code so this still seems like a good patch. Differential Revision: https://reviews.llvm.org/D33535 llvm-svn: 303845
* [PM/Unswitch] Fix a bug in the domtree update logic for the new unswitchChandler Carruth2017-05-252-14/+123
| | | | | | | | | | | | | | | | | | | | | | | | | | | pass. The original logic only considered direct successors of the hoisted domtree nodes, but that isn't really enough. If there are other basic blocks that are completely within the subtree, their successors could just as easily be impacted by the hoisting. The more I think about it, the more I think the correct update here is to hoist every block on the dominance frontier which has an idom in the chain we hoist across. However, this is subtle enough that I'd definitely appreciate some more eyes on it. Sadly, if this is the correct algorithm, it requires computing a (highly localized) dominance frontier. I've done this in the simplest (IE, least code) way I could come up with, but that may be too naive. Suggestions welcome here, dominance update algorithms are not an area I've studied much, so I don't have strong opinions. In good news, with this patch, turning on simple unswitch passes the LLVM test suite for me with asserts enabled. Differential Revision: https://reviews.llvm.org/D32740 llvm-svn: 303843