path: root/llvm/lib
Commit log. Each entry gives the commit title; then the author, date, files changed and lines removed/added; then the commit message.
* Re-enable "[SCEV] Do not fold dominated SCEVUnknown into AddRecExpr start"
  Max Kazantsev, 2017-05-26, 1 file, -2/+59

  The patch rL303730 was reverted because the test lsr-expand-quadratic.ll failed on many non-X86 configurations with this patch. The reason is that the patch makes a correctness fix that changes the optimizer's behavior for this test. Without the change, LSR was making an overconfident simplification based on a wrong SCEV, and apparently it did not need the IV analysis to do this. With the change, it chose a different, less confident way to simplify, and this way required the IV analysis.

  Now, following the right execution path, LSR tries to make a transformation relying on the IV Users analysis. This analysis is target-dependent due to this code:

      // LSR is not APInt clean, do not touch integers bigger than 64-bits.
      // Also avoid creating IVs of non-native types. For example, we don't want a
      // 64-bit IV in 32-bit code just because the loop has one 64-bit cast.
      uint64_t Width = SE->getTypeSizeInBits(I->getType());
      if (Width > 64 || !DL.isLegalInteger(Width))
        return false;

  To make a proper transformation in this test case, the type i32 needs to be legal for the specified data layout. When the test runs on some non-X86 configurations (e.g. pure ARM64), opt gets confused by the specified target and does not use it, rejecting the specified data layout as well. Instead, it uses a default layout that does not treat i32 as a legal type (currently, the layout used when none is specified has no legal types at all). As a result, the transformation we expect does not happen for this test.

  This re-enabling patch has no source code changes compared to the original patch rL303730. The only difference is that the failing test is moved to the X86 directory and now requires running on x86 only, to comply with the specified target triple and data layout.

  Differential Revision: https://reviews.llvm.org/D33543

  llvm-svn: 303971
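To make the target dependence concrete, here is a minimal sketch, assuming a simplified reading of the data layout string in which the `n` component lists the native integer widths (e.g. `n8:16:32:64` in an x86-64 style layout). `legalIntWidths` and `wouldAcceptIVWidth` are hypothetical helpers written for illustration, not LLVM's `DataLayout` API; the second one mirrors the `Width > 64 || !DL.isLegalInteger(Width)` check quoted above.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: collect the native integer widths listed in the "n"
// component of a data layout string, e.g. "...-n8:16:32:64-..." -> {8,16,32,64}.
// Simplified for illustration; this is not LLVM's DataLayout parser.
std::set<unsigned> legalIntWidths(const std::string &Layout) {
  std::set<unsigned> Widths;
  std::stringstream Specs(Layout);
  std::string Spec;
  while (std::getline(Specs, Spec, '-')) {
    // Only "n<digits>..." specs; this skips e.g. "ni:1" (non-integral pointers).
    if (Spec.size() < 2 || Spec[0] != 'n' ||
        !std::isdigit(static_cast<unsigned char>(Spec[1])))
      continue;
    std::stringstream Nums(Spec.substr(1));
    std::string Num;
    while (std::getline(Nums, Num, ':'))
      Widths.insert(static_cast<unsigned>(std::stoul(Num)));
  }
  return Widths;
}

// Mirrors the quoted LSR check: reject widths above 64 bits and widths the
// layout does not list as native integers.
bool wouldAcceptIVWidth(uint64_t Width, const std::set<unsigned> &Legal) {
  return Width <= 64 && Legal.count(static_cast<unsigned>(Width)) != 0;
}
```

With an empty layout string no width is legal, so an i32 IV is rejected, which matches the fallback situation the commit message describes.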
* LivePhysRegs: Skip reserved regs in computeLiveIns; NFCI
  Matthias Braun, 2017-05-26, 4 files, -6/+12

  Re-commit r303937 + r303949, as they were not the cause of the build failures.

  We do not track liveness of reserved registers, so adding them to the liveins list in computeLiveIns() was completely unnecessary.

  llvm-svn: 303970
* Revert rL303923 since it broke the sanitizer bootstrap build bot.
  Wei Mi, 2017-05-26, 1 file, -136/+21

  llvm-svn: 303969
* [InstSimplify] Use APInt::isMask instead of manually implementing it. NFC
  Craig Topper, 2017-05-26, 1 file, -2/+2

  llvm-svn: 303968
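For reference, the property `APInt::isMask()` tests is that the value is a non-empty run of ones starting at bit 0 (0b1, 0b11, 0b111, ...). A minimal sketch of the usual bit trick on a plain 64-bit integer follows; `isLowBitMask` is a hypothetical name, not the code this commit replaced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if V is 0b1, 0b11, 0b111, ... (a non-empty run of
// ones starting at bit 0), the same property APInt::isMask() tests.
bool isLowBitMask(uint64_t V) {
  // Adding 1 to such a run carries out of every set bit, so V and V + 1 share
  // no set bits. For V == UINT64_MAX, V + 1 wraps to 0 and the test still holds.
  return V != 0 && (V & (V + 1)) == 0;
}
```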
* [InstSimplify] Use m_ConstantInt matchers to shorten some code. NFC
  Craig Topper, 2017-05-26, 1 file, -7/+5

  llvm-svn: 303967
* [IR] Add an iterator and range accessor for the PHI nodes of a basic block.
  Chandler Carruth, 2017-05-26, 1 file, -7/+9

  This allows writing much more natural and readable range-based for loops directly over the PHI nodes. It also takes advantage of the same tricks for terminating the sequence as the hand-coded versions.

  I've replaced one example of this, mostly to showcase the difference, and I've added a unit test to make sure the facilities really work the way they're intended.

  I want to use this inside of SimpleLoopUnswitch, but it seems generally nice.

  Differential Revision: https://reviews.llvm.org/D33533

  llvm-svn: 303964
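The idea behind a PHI range can be shown without LLVM: PHI nodes are grouped at the top of a block, so the range simply ends at the first non-PHI instruction. The sketch below uses hypothetical toy `Inst` and `Block` types and `std::find_if`; it is not the `BasicBlock` implementation, which the message says reuses the termination trick of the hand-coded loops.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical, not llvm::Instruction).
struct Inst {
  bool IsPhi;
  std::string Name;
};

// Toy stand-in for a basic block; as in LLVM IR, PHIs must come first.
struct Block {
  std::vector<Inst> Insts;

  // A begin/end pair is all a range-based for loop needs.
  struct PhiRange {
    std::vector<Inst>::const_iterator Begin, End;
    std::vector<Inst>::const_iterator begin() const { return Begin; }
    std::vector<Inst>::const_iterator end() const { return End; }
  };

  // The range stops at the first non-PHI instruction.
  PhiRange phis() const {
    auto FirstNonPhi = std::find_if(Insts.begin(), Insts.end(),
                                    [](const Inst &I) { return !I.IsPhi; });
    return {Insts.begin(), FirstNonPhi};
  }
};
```

Usage then reads like the commit intends: `for (const Inst &I : BB.phis()) { ... }`.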
* Revert "LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI"
  Matthias Braun, 2017-05-26, 1 file, -41/+27

  Tentatively revert this to see if it fixes the buildbot stage2 breakages.

  This reverts commit r303938.
  This reverts commit r303954.

  llvm-svn: 303960
* Revert "LivePhysRegs: Skip reserved regs in computeLiveIns; NFCI"
  Matthias Braun, 2017-05-26, 4 files, -12/+6

  Tentatively revert, suspecting that it caused breakage in stage2 buildbots.

  This reverts commit r303949.
  This reverts commit r303937.

  llvm-svn: 303955
* [PM] Enable the new simple loop unswitch pass in the new pass manager (where it is the only realistic option).
  Chandler Carruth, 2017-05-26, 1 file, -4/+1

  This passes the LLVM test suite for me, but I'm clearly still hammering on this.

  llvm-svn: 303952
* LivePhysRegs: Follow-up to r303937
  Matthias Braun, 2017-05-26, 1 file, -1/+1

  We may have situations in which a superregister is reserved and not added to liveins, so we have to add the subregisters.

  llvm-svn: 303949
* Remove unused member.
  Zachary Turner, 2017-05-25, 1 file, -2/+0

  llvm-svn: 303942
* [PPC] Add text for assert.
  Tim Shen, 2017-05-25, 1 file, -1/+1

  llvm-svn: 303940
* LTO: Do summary-based prevailing symbol resolution at --lto-O0.
  Peter Collingbourne, 2017-05-25, 1 file, -13/+12

  Prevailing symbol resolution is necessary for correctness. Without this we can end up dropping a referenced linkonce symbol from the link.

  Differential Revision: https://reviews.llvm.org/D33570

  llvm-svn: 303939
* LivePhysRegs: Fix addLiveOutsNoPristines() for return blocks past PEI
  Matthias Braun, 2017-05-25, 1 file, -27/+41

  - addLiveOutsNoPristines() needs to add callee-saved registers that are actually saved and restored somewhere to the set (they are not pristine).
  - Clean up/rewrite the code for addLiveOuts()/addLiveOutsNoPristines().

  This fixes the problem from D32156.

  Differential Revision: https://reviews.llvm.org/D32464

  llvm-svn: 303938
* LivePhysRegs: Skip reserved regs in computeLiveIns; NFCI
  Matthias Braun, 2017-05-25, 4 files, -5/+11

  We do not track liveness of reserved registers, so adding them to the liveins list in computeLiveIns() was completely unnecessary.

  llvm-svn: 303937
* [CV Type Merging] Find nested type indices faster.
  Zachary Turner, 2017-05-25, 3 files, -347/+413

  Merging two type streams is one of the most time-consuming parts of generating a PDB, and as such it needs to be as fast as possible. The visitor abstractions used for interoperating nicely with many different types of inputs and outputs have been used widely and help greatly for testability and implementing tools, but the abstractions build up and get in the way of performance.

  This patch removes all of the visitation machinery from the type stream merger, essentially re-inventing the leaf/member switch and loop, but at a very low level. This allows many other optimizations, such as not actually deserializing *any* records (even member records, which don't describe their own length), as the operation of "figure out how long this record is" is somewhat faster than "figure out how long this record is *and* get all its fields out". Furthermore, whereas before we had to deserialize, re-write type indices, then re-serialize, now we don't have to do any of those 3 steps. We just find out where the type indices are, pull them directly out of the byte stream, and re-write them.

  This is worth a 50-60% performance increase. On top of all other optimizations that have been applied this week, I now get the following numbers when linking lld.exe and lld.pdb:

      MSVC:               25.67s
      Before this patch:  18.59s
      After this patch:    8.92s

  So this is a huge performance win.

  Differential Revision: https://reviews.llvm.org/D33564

  llvm-svn: 303935
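Skipping a record without decoding its fields only needs its length prefix. A minimal sketch, assuming the CodeView layout in which each record starts with a 16-bit little-endian length (counting the bytes after the length field itself) followed by a 16-bit record kind; `recordKinds` is a hypothetical helper, not the merger's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a 16-bit little-endian value.
uint16_t readLE16(const uint8_t *P) { return uint16_t(P[0] | (P[1] << 8)); }

// Walk length-prefixed records without decoding their fields: read the
// length, note the kind, and jump straight to the next record.
std::vector<uint16_t> recordKinds(const std::vector<uint8_t> &Buf) {
  std::vector<uint16_t> Kinds;
  std::size_t Pos = 0;
  while (Pos + 4 <= Buf.size()) {
    uint16_t Len = readLE16(&Buf[Pos]); // bytes after the length field
    if (Len < 2 || Pos + 2 + Len > Buf.size())
      break; // malformed or truncated record
    Kinds.push_back(readLE16(&Buf[Pos + 2]));
    Pos += 2 + Len; // skip the whole record body
  }
  return Kinds;
}
```

The cost per record is two small reads regardless of how many fields or strings the record contains.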
* DebugInfo: Simplify scopes+subprogram handling since the subprogram<>cu link inversion
  David Blaikie, 2017-05-25, 1 file, -18/+8

  Previously this code was defensive against the situation in which the debug info scopes would lead to a different subprogram from the subprogram in the CU's subprogram list (this could've happened with linkonce functions, etc., as per the comment being removed). Since the CU<>SP link reversal this is no longer possible.

  llvm-svn: 303933
* [PPC] Fix atomics lowering in DAG lowering.
  Tim Shen, 2017-05-25, 1 file, -1/+3

  I forgot to forward the chain, causing some missing instruction dependencies. The test crashes the compiler without this patch.

  Inspired by the test case, D33519 also tries to remove the extra sync.

  Differential Revision: https://reviews.llvm.org/D33573

  llvm-svn: 303931
* [InstCombine] Add an InstCombine-specific wrapper around isKnownToBeAPowerOfTwo to shorten code. NFC
  Craig Topper, 2017-05-25, 4 files, -14/+14

  We have wrappers for several other ValueTracking methods that take care of passing all of the analysis and assumption cache parameters. This extends it to isKnownToBeAPowerOfTwo.

  llvm-svn: 303924
* [GVN] Add phi-translate support in scalarpre.
  Wei Mi, 2017-05-25, 1 file, -21/+136

  Right now scalarpre doesn't have phi-translate support, so it will miss some simple PRE opportunities. In the following test case, the current scalarpre cannot recognize that the last "a * b" is fully redundant, because the a and b used by the last "a * b" expression are both defined by phis.

      long a[100], b[100], g1, g2, g3;
      __attribute__((pure)) long goo();
      void foo(long a, long b, long c, long d) {
        g1 = a * b;
        if (__builtin_expect(g2 > 3, 0)) {
          a = c;
          b = d;
          g2 = a * b;
        }
        g3 = a * b;      // fully redundant.
      }

  The patch adds phi-translate support in scalarpre. This is only a temporary solution until the new PRE based on NewGVN is available.

  Differential Revision: https://reviews.llvm.org/D32252

  llvm-svn: 303923
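A toy model of what phi translation buys scalar PRE, using the `foo` example above: the last `a * b` uses PHIs of `a` and `b`, and translating those operands into each predecessor yields `a * b` on one edge and `c * d` on the other, both already computed. The types and names below (`Expr`, `PhiMap`, `fullyRedundant`, and the `a.phi`/`entry`/`then` labels) are hypothetical; real GVN works on value numbers rather than names and also handles partial redundancy, which this sketch ignores.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>

// Toy expression: an opcode and two operand names (hypothetical, not GVN's
// Expression class).
struct Expr {
  char Op;
  std::string LHS, RHS;
  bool operator<(const Expr &O) const {
    return std::tie(Op, LHS, RHS) < std::tie(O.Op, O.LHS, O.RHS);
  }
};

// PHIs of the merge block: phi name -> (predecessor name -> incoming value).
using PhiMap = std::map<std::string, std::map<std::string, std::string>>;
// Expressions already computed at the end of each predecessor.
using AvailMap = std::map<std::string, std::set<Expr>>;

// An operand defined by a PHI is replaced by the PHI's incoming value on the
// edge from Pred; any other operand is left unchanged.
std::string translateOperand(const std::string &V, const std::string &Pred,
                             const PhiMap &Phis) {
  auto It = Phis.find(V);
  return It == Phis.end() ? V : It->second.at(Pred);
}

Expr phiTranslate(const Expr &E, const std::string &Pred, const PhiMap &Phis) {
  return {E.Op, translateOperand(E.LHS, Pred, Phis),
          translateOperand(E.RHS, Pred, Phis)};
}

// E is fully redundant if its phi-translated form is available in every
// predecessor listed in AvailOut.
bool fullyRedundant(const Expr &E, const PhiMap &Phis, const AvailMap &AvailOut) {
  for (const auto &Entry : AvailOut)
    if (Entry.second.count(phiTranslate(E, Entry.first, Phis)) == 0)
      return false;
  return true;
}
```

Without the translation step, `a.phi * b.phi` would be looked up verbatim and found nowhere, which is the missed opportunity the commit describes.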
* Add constrained intrinsics for some libm-equivalent operations
  Andrew Kaylor, 2017-05-25, 7 files, -64/+233

  Differential revision: https://reviews.llvm.org/D32319

  llvm-svn: 303922
* CodeGen: Rename DEBUG_TYPE to match passnames
  Matthias Braun, 2017-05-25, 51 files, -128/+112

  Rename the DEBUG_TYPE to match the names of corresponding passes where it makes sense. Also establish the pattern of simply referencing DEBUG_TYPE instead of repeating the passname where possible.

  llvm-svn: 303921
* [lld] Fix a bug where we continually re-follow type servers.
  Zachary Turner, 2017-05-25, 1 file, -5/+6

  Originally this was intended to be set up so that when linking a PDB which refers to a type server, it would only visit the PDB once, and on subsequent visitations it would just skip it, since all the records had already been added. Due to some C++ scoping issues, this was not occurring, and it was revisiting the type server every time, which caused every record to end up being thrown away on all subsequent visitations.

  This doesn't affect the performance of linking clang-cl generated object files, because we don't use type servers, but when linking object files and libraries generated with /Zi via MSVC, this means only 1 object file has to be linked instead of N object files, so the speedup is quite large.

  llvm-svn: 303920
* [CodeView Type Merging] Don't keep re-allocating temp serializer.
  Zachary Turner, 2017-05-25, 3 files, -11/+21

  Previously, every time we wanted to serialize a field list record, we would create a new copy of FieldListRecordBuilder, which would in turn create a temporary instance of TypeSerializer, which itself had a std::vector<> that was about 128K in size. So this 128K allocation was happening every time.

  We can re-use the same instance over and over; we just have to clear its internal hash table and seen records list between each run. This saves us from the constant re-allocations.

  This is worth an ~18.5% speed increase (3.75s -> 3.05s) in my tests.

  Differential Revision: https://reviews.llvm.org/D33506

  llvm-svn: 303919
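The allocate-once, clear-between-uses pattern can be sketched in isolation. `ScratchSerializer` below is a hypothetical class standing in for the reused serializer, not LLVM's `TypeSerializer`; the 128K reservation in the usage matches the buffer size mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical scratch serializer: a large byte buffer plus a table of record
// hashes already seen, both kept alive across records.
class ScratchSerializer {
public:
  explicit ScratchSerializer(std::size_t ReserveBytes) { Buffer.reserve(ReserveBytes); }

  void append(uint8_t Byte) { Buffer.push_back(Byte); }
  void markSeen(uint64_t Hash) { Seen.insert(Hash); }
  std::size_t size() const { return Buffer.size(); }
  std::size_t capacity() const { return Buffer.capacity(); }
  std::size_t seenCount() const { return Seen.size(); }

  // Empty the contents between records. std::vector::clear() leaves the
  // capacity unchanged, so the big allocation is made once, not per record.
  void reset() {
    Buffer.clear();
    Seen.clear();
  }

private:
  std::vector<uint8_t> Buffer;
  std::unordered_set<uint64_t> Seen;
};
```

The caller owns one instance for the whole merge and calls `reset()` before each field list record instead of constructing a fresh builder.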
* Make BinaryStreamReader::readCString a bit faster.
  Zachary Turner, 2017-05-25, 1 file, -13/+14

  Previously it would do a character-by-character search for a null terminator, to account for the fact that an arbitrary stream need not store its data contiguously, so you couldn't just do a memchr. However, the stream API has a function which will return the longest contiguous chunk without doing a copy, and by using this function we can do a memchr on the individual chunks. For certain types of streams, like data from object files, this is guaranteed to find the null terminator with only a single memchr. Even with discontiguous streams such as MappedBlockStream, it's rare that any given string will cross a block boundary, so even those will almost always be satisfied with a single memchr.

  This optimization is worth a 10-12% reduction in link time (4.2 seconds -> 3.75 seconds).

  Differential Revision: https://reviews.llvm.org/D33503

  llvm-svn: 303918
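A minimal sketch of the chunk-wise scan, assuming a toy stream modelled as a list of contiguous chunks; `Chunks` and this `readCString` signature are hypothetical and differ from `BinaryStreamReader`. A string that crosses a chunk boundary simply costs one extra `memchr`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical discontiguous stream: the bytes are split across chunks, the
// way a block-mapped file might be. Not LLVM's BinaryStream API.
using Chunks = std::vector<std::string>;

// Read a NUL-terminated string starting at byte Offset of chunk Chunk. Each
// contiguous piece is scanned with one memchr instead of byte by byte.
// Returns false if the stream ends before a terminator is found.
bool readCString(const Chunks &Stream, std::size_t Chunk, std::size_t Offset,
                 std::string &Out) {
  Out.clear();
  for (; Chunk < Stream.size(); ++Chunk, Offset = 0) {
    const char *Begin = Stream[Chunk].data() + Offset;
    std::size_t Len = Stream[Chunk].size() - Offset;
    const char *Nul = static_cast<const char *>(std::memchr(Begin, '\0', Len));
    if (Nul) {
      Out.append(Begin, static_cast<std::size_t>(Nul - Begin));
      return true;
    }
    Out.append(Begin, Len); // the string continues in the next chunk
  }
  return false;
}
```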
* [pdb] pad source file name buffer at the end instead of the beginning
  Bob Haarman, 2017-05-25, 1 file, -9/+16

  Summary: DbiStreamBuilder calculated the offset of the source file names inside the file info substream as the size of the file info substream minus the size of the file names. Since the file info substream is padded to a multiple of 4 bytes, this caused the first file name to be aligned on a 4-byte boundary. By contrast, DbiModuleList would read the file names immediately after the file name offset table, without skipping to the next 4-byte boundary. This change makes it so that the file names are written to the location where DbiModuleList expects them, and puts any necessary padding for the file info substream after the file names instead of before it.

  Reviewers: amccarth, rnk, zturner
  Reviewed By: amccarth, zturner
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D33475

  llvm-svn: 303917
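The offset arithmetic behind this fix, sketched with made-up sizes: whenever the unpadded size is not a multiple of 4, computing the names' offset as the padded size minus the names' size puts the padding bytes in front of the names, so a reader that expects the names right after the offset table is off by the padding amount. `padBeforeNames`, `padAfterNames` and the `HeaderSize` parameter are hypothetical simplifications, not `DbiStreamBuilder` code.

```cpp
#include <cassert>
#include <cstdint>

// Round N up to the next multiple of 4 (the substream's required alignment).
uint32_t alignTo4(uint32_t N) { return (N + 3u) & ~3u; }

struct Layout {
  uint32_t NamesOffset; // where the file name buffer starts in the substream
  uint32_t TotalSize;   // padded size of the substream
};

// HeaderSize: bytes before the names (e.g. the file name offset table).

// Old scheme: names placed at TotalSize - NamesSize, so padding lands before them.
Layout padBeforeNames(uint32_t HeaderSize, uint32_t NamesSize) {
  uint32_t Total = alignTo4(HeaderSize + NamesSize);
  return {Total - NamesSize, Total};
}

// New scheme: names start right after the header; padding trails them.
Layout padAfterNames(uint32_t HeaderSize, uint32_t NamesSize) {
  return {HeaderSize, alignTo4(HeaderSize + NamesSize)};
}
```

Both schemes produce the same padded total; only the position of the padding differs, and the two agree whenever no padding is needed.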
* Fix a bug in MappedBlockStream.
  Zachary Turner, 2017-05-25, 1 file, -17/+15

  It was using the number of blocks of the entire PDB file as the number of blocks of each stream that was created. This was only an issue in the readLongestContiguousChunk function, which was never called before. This bug surfaced when I updated an algorithm to use this function and the algorithm broke.

  llvm-svn: 303916
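Why the block count matters for a contiguous-chunk query can be shown with a toy block map. This is a minimal sketch under stated assumptions: `longestContiguousBytes` is a hypothetical helper, the stream's block map is a plain vector of file block numbers, and the last block is treated as full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one stream's block map: logical block I of the stream
// lives in file block Map[I]. Not LLVM's MappedBlockStream.
//
// Returns how many bytes, starting at logical block First, are physically
// contiguous in the file. The walk is bounded by this stream's own block
// count (Map.size()), not by the number of blocks in the whole file.
uint64_t longestContiguousBytes(const std::vector<uint32_t> &Map,
                                uint32_t First, uint32_t BlockSize) {
  if (First >= Map.size())
    return 0;
  std::size_t Last = First;
  while (Last + 1 < Map.size() && Map[Last + 1] == Map[Last] + 1)
    ++Last;
  return uint64_t(Last - First + 1) * BlockSize;
}
```

Using the file's block count as the bound would let the loop index past the end of `Map`, which is the kind of failure that only shows up once a caller actually asks for contiguous chunks.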
* [WebAssembly] MC: Include unnamed data when writing wasm files
  Sam Clegg, 2017-05-25, 1 file, -18/+16

  Also, include global entries for all data symbols, not just external ones, since these are referenced by the relocation records.

  Add a test case that includes unnamed data.

  Differential Revision: https://reviews.llvm.org/D33079

  llvm-svn: 303915
* [CodeView Type Merging] Avoid record deserialization when possible.
  Zachary Turner, 2017-05-25, 4 files, -145/+250

  A profile shows the majority of time doing type merging is spent deserializing records from sequences of bytes into friendly C++ structures that we can easily access members of in order to find the type indices to re-write.

  Records are prefixed with their length, however, and most records have type indices that appear at fixed offsets in the record. For these records, we can save some cycles by just looking at the right place in the byte sequence and re-writing the value, then skipping the record in the type stream. This saves us from the costly deserialization of examining every field, including potentially null-terminated strings, which are the slowest, even though it was unnecessary to begin with.

  In addition, we apply another optimization. Previously, after deserializing a record and re-writing its type indices, we would unconditionally re-serialize it in order to compute the hash of the re-written record. This would result in an alloc and memcpy for every record. If no type indices were re-written, however, this was an unnecessary allocation. In this patch, re-writing is made two-phase. The first phase discovers the indices that need to be rewritten and their new values. This information is passed through to the de-duplication code, which only copies and re-writes type indices in the serialized byte sequence if at least one type index is different.

  Some records have type indices which only appear after variable-length strings, or which have lists of type indices, or various other situations that can make it tricky to make this optimization. While I'm not giving up on optimizing these cases as well, for now we can get the easy cases out of the way and lay the groundwork for more complicated cases later.

  This patch yields another 50% speedup on top of the already large speedups submitted over the past 2 days. In two tests I have run, I went from 9 seconds to 3 seconds, and from 16 seconds to 8 seconds.

  Differential Revision: https://reviews.llvm.org/D33480

  llvm-svn: 303914
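The two-phase idea can be sketched on a raw byte record whose type indices sit at known offsets. `Remap`, `discover`, `rewrite`, and the index mapping used in the usage are hypothetical; the real CodeView format is little-endian, while this sketch uses host byte order via `memcpy` for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One type index that must change: its byte offset in the record and new value.
struct Remap {
  std::size_t Offset;
  uint32_t NewIndex;
};

// Phase 1: read the 32-bit type index at each known offset and record only
// those whose mapped value differs.
std::vector<Remap> discover(const std::vector<uint8_t> &Rec,
                            const std::vector<std::size_t> &Offsets,
                            uint32_t (*Map)(uint32_t)) {
  std::vector<Remap> Changes;
  for (std::size_t Off : Offsets) {
    uint32_t Old;
    std::memcpy(&Old, Rec.data() + Off, sizeof Old);
    uint32_t New = Map(Old);
    if (New != Old)
      Changes.push_back({Off, New});
  }
  return Changes;
}

// Phase 2: copy and patch only if something changed; otherwise hand back the
// original bytes, avoiding the copy entirely.
const std::vector<uint8_t> &rewrite(const std::vector<uint8_t> &Rec,
                                    const std::vector<Remap> &Changes,
                                    std::vector<uint8_t> &Scratch) {
  if (Changes.empty())
    return Rec;
  Scratch = Rec;
  for (const Remap &R : Changes)
    std::memcpy(Scratch.data() + R.Offset, &R.NewIndex, sizeof R.NewIndex);
  return Scratch;
}
```

The de-duplication step can then hash whatever `rewrite` returns; for the common unchanged record that is the original byte sequence itself.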
* PPC: Correct Size for GETtlsADDR
  Kyle Butt, 2017-05-25, 1 file, -1/+3

  PPC::GETtlsADDR is lowered to a branch and a nop by the assembly printer. Its size was incorrectly marked as 4; correct it to 8. The incorrect size can cause incorrect branch relaxation in PPCBranchSelector under the right conditions.

  llvm-svn: 303904
* Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.
  Nico Weber, 2017-05-25, 1 file, -3/+1

  llvm-svn: 303902
* [AArch64]: add 'a' inline asm operand modifier.
  Manoj Gupta, 2017-05-25, 1 file, -1/+4

  Summary: This is used in the Linux kernel, and effectively just means "print an address". This brings back r193593.

  Reviewed by: Renato Golin
  Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls
  Subscribers: aemerson, javed.absar, llvm-commits, eraman
  Differential Revision: https://reviews.llvm.org/D33558

  llvm-svn: 303901
* Fix SelectionDAGBuilder::getDbgValue to not expect DW_OP_deref on FI vars
  Adrian Prantl, 2017-05-25, 1 file, -14/+5

  This fixes an oversight in r300522, which changed alloca dbg.values to no longer emit a DW_OP_deref.

  The array.ll test case was regenerated from source.

  Fixes PR33166: https://bugs.llvm.org/show_bug.cgi?id=33166

  llvm-svn: 303897
* DebugInfo: Produce debug_{gnu_}pub{names,types} entries when explicitly requested, even in -gmlt or when empty
  David Blaikie, 2017-05-25, 4 files, -20/+27

  It turns out gold doesn't use DW_AT_GNU_pubnames to decide whether to parse the rest of the DIEs when building gdb-index. This causes gold to trip over LLVM's output when there are DW_FORM_ref_addr present.

  Gold does use the presence of a debug_gnu_pub{names,types} entry for the CU to skip parsing the debug_info portion, so make sure that's included even when empty. (Technically, when empty there couldn't be any ref_addr anyway; it only came up when gmlt didn't produce any, even non-empty, pubnames. But given what that reveals about gold's implementation, this seems like a good thing to do for consistency.)

  llvm-svn: 303894
* NewGVN: Fix PR 33119, PR 33129, due to regressed undef handling
  Daniel Berlin, 2017-05-25, 1 file, -22/+42

  Fix PR33120 and others by eliminating self-cycles a different way.

  llvm-svn: 303875
* [InstCombine] Teach isAllocSiteRemovable to look through addrspacecasts
  Artur Pilipenko, 2017-05-25, 1 file, -1/+3

  Reviewed By: reames
  Differential Revision: https://reviews.llvm.org/D28565

  llvm-svn: 303870
* [InstCombine] make icmp-mul fold more efficient
  Sanjay Patel, 2017-05-25, 1 file, -5/+7

  There's probably a lot more like this (see also comments in D33338 about responsibility), but I suspect we don't usually get a visible manifestation.

  Given the recent interest in improving InstCombine efficiency, another potential micro-opt that could be repeated several times in this function: morph the existing icmp pred/operands instead of creating a new instruction.

  llvm-svn: 303860
* [AMDGPU] add intrinsic for s_getpc
  Tim Corringham, 2017-05-25, 1 file, -1/+3

  Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
  Differential Revision: https://reviews.llvm.org/D32862

  llvm-svn: 303859
* [X86] Adding vpopcntd and vpopcntq instructions
  Oren Ben Simhon, 2017-05-25, 8 files, -0/+60

  AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic-based instructions (vpopcntd and vpopcntq).

  Differential Revision: https://reviews.llvm.org/D33169

  llvm-svn: 303858
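For readers unfamiliar with the new instructions: vpopcntd and vpopcntq count the set bits in each 32-bit or 64-bit element of a vector independently. A scalar reference model of that per-lane behavior might look like this; `bitCount` and `laneWisePopcount` are hypothetical helpers, and the masked and memory-operand forms are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Count set bits; each iteration clears the lowest set bit (Kernighan's method).
template <typename T> T bitCount(T V) {
  T N = 0;
  for (; V != 0; V &= V - 1)
    ++N;
  return N;
}

// Scalar reference model: every lane is replaced by its own bit count.
// A 512-bit register holds 16 x 32-bit lanes (vpopcntd) or 8 x 64-bit lanes
// (vpopcntq).
template <typename T, std::size_t Lanes>
std::array<T, Lanes> laneWisePopcount(const std::array<T, Lanes> &V) {
  std::array<T, Lanes> R{};
  for (std::size_t I = 0; I < Lanes; ++I)
    R[I] = bitCount(V[I]);
  return R;
}
```

A model like this is handy for computing expected values when checking the lowering of the corresponding intrinsics.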
* [GVNSink] Pacify MSVC
  James Molloy, 2017-05-25, 1 file, -1/+1

  Don't convert an unsigned to a pointer for a sentinel; use a size_t instead.

  llvm-svn: 303855
* [GVNSink] Don't define operator<< in NDEBUG
  James Molloy, 2017-05-25, 1 file, -0/+2

  Without debug macros enabled, the raw_ostream operator<< overload is unused.

  llvm-svn: 303852
* [GVNSink] GVNSink pass
  James Molloy, 2017-05-25, 6 files, -47/+926

  This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for committing, but I've uploaded it to gather some initial thoughts.

  This pass attempts to sink instructions into successors, reducing static instruction count and enabling if-conversion.

  We use a variant of global value numbering to decide what can be sunk. Consider:

      [ %a1 = add i32 %b, 1  ]   [ %c1 = add i32 %d, 1  ]
      [ %a2 = xor i32 %a1, 1 ]   [ %c2 = xor i32 %c1, 1 ]
                        \           /
                  [ %e = phi i32 %a2, %c2 ]
                  [ add i32 %e, 4         ]

  GVN would number %a1 and %c1 differently because they compute different results; the VN of an instruction is a function of its opcode and the transitive closure of its operands. This is the key property for hoisting and CSE.

  What we want when sinking, however, is a numbering that is a function of the *uses* of an instruction, which allows us to answer the question "if I replace %a1 with %c1, will it contribute in an equivalent way to all successive instructions?". The (new) PostValueTable class in GVN provides this mapping.

  This pass has already shown some really impressive improvements, especially for code size, on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.

  Differential revision: https://reviews.llvm.org/D24805

  llvm-svn: 303850
* [PM] Teach the PGO instrumentation passes to run GlobalDCE before instrumenting code.
  Chandler Carruth, 2017-05-25, 1 file, -0/+5

  This is important in the new pass manager. The old pass manager's inliner has a small DCE routine embedded within it. The new pass manager relies on the actual GlobalDCE pass for this.

  Without this patch, instrumentation profiling with the new PM results in massive code bloat in the object files, because the instrumentation itself ends up preventing DCE from working to remove the code.

  We should probably change the instrumentation (and/or DCE) so that we can eliminate dead code even if instrumented, but we shouldn't even spend the time generating instrumentation for that code, so this still seems like a good patch.

  Differential Revision: https://reviews.llvm.org/D33535

  llvm-svn: 303845
* [PM/Unswitch] Fix a bug in the domtree update logic for the new unswitch pass.
  Chandler Carruth, 2017-05-25, 1 file, -14/+62

  The original logic only considered direct successors of the hoisted domtree nodes, but that isn't really enough. If there are other basic blocks that are completely within the subtree, their successors could just as easily be impacted by the hoisting.

  The more I think about it, the more I think the correct update here is to hoist every block on the dominance frontier which has an idom in the chain we hoist across. However, this is subtle enough that I'd definitely appreciate some more eyes on it.

  Sadly, if this is the correct algorithm, it requires computing a (highly localized) dominance frontier. I've done this in the simplest (i.e., least code) way I could come up with, but that may be too naive. Suggestions are welcome here; dominance update algorithms are not an area I've studied much, so I don't have strong opinions.

  In good news, with this patch, turning on simple unswitch passes the LLVM test suite for me with asserts enabled.

  Differential Revision: https://reviews.llvm.org/D32740

  llvm-svn: 303843
* [LegacyPM] Make the 'addLoop' method accept a loop to add rather than having it internally allocate the loop.
  Chandler Carruth, 2017-05-25, 3 files, -17/+20

  This is a much more flexible API and necessary in the new loop unswitch to reasonably support both new and old PMs in common code. It also just seems like a cleaner separation of concerns.

  NFC, this should just be a pure refactoring.

  Differential Revision: https://reviews.llvm.org/D33528

  llvm-svn: 303834
* [libFuzzer] Don't replace custom signal handlers.
  Vitaly Buka, 2017-05-25, 2 files, -2/+17

  Summary: This allows keeping handlers installed by sanitizers. In other cases, third-party code can replace handlers after libFuzzer initialization anyway.

  Reviewers: kcc
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D33522

  llvm-svn: 303828
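A minimal POSIX sketch of the "don't clobber an existing handler" check, assuming that any disposition other than `SIG_DFL` (including `SIG_IGN` and `SA_SIGINFO` handlers) counts as custom; `installHandlerIfDefault` is a hypothetical name, and the actual libFuzzer logic may draw that line differently.

```cpp
#include <cassert>
#include <signal.h>

// Install Handler for Signum only if the current disposition is still the
// default (SIG_DFL), so handlers set up earlier, e.g. by a sanitizer runtime,
// are left alone. Returns true if our handler was installed.
bool installHandlerIfDefault(int Signum, void (*Handler)(int)) {
  struct sigaction Old;
  if (sigaction(Signum, nullptr, &Old) != 0)
    return false;
  // With SA_SIGINFO set, the sa_sigaction member is in use, not sa_handler.
  if ((Old.sa_flags & SA_SIGINFO) || Old.sa_handler != SIG_DFL)
    return false;
  struct sigaction New = {};
  New.sa_handler = Handler;
  sigemptyset(&New.sa_mask);
  return sigaction(Signum, &New, nullptr) == 0;
}
```

Querying with a null new-action pointer reads the current disposition without changing it, which is what makes the check non-destructive.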
* Fix coverage check for full post-dominator basic blocks.
  George Karpenkov, 2017-05-25, 1 file, -1/+4

  Coverage instrumentation which does not instrument full post-dominators and full dominators may skip valid paths, as the reasoning for skipping blocks may become circular. This patch fixes that by only skipping full post-dominators with multiple predecessors, as such predecessors by definition cannot be full dominators.

  llvm-svn: 303827
* [coroutines] CoroFrame.cpp conform to coding convention (s/repeat/Repeat) (NFC)
  Gor Nishanov, 2017-05-25, 1 file, -3/+2

  llvm-svn: 303826
* [coroutines] Relocate instructions that may be spilled after coro.begin
  Gor Nishanov, 2017-05-25, 1 file, -0/+75

  Summary: The frontend generates store instructions after allocas, for example:

  ```
  define i8* @f(i64 %this) "coroutine.presplit"="1" personality i32 0 {
  entry:
    %this.addr = alloca i64
    store i64 %this, i64* %this.addr
    ..
    %hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
  ```

  Such instructions may require spilling into coro.frame, but the coro.frame address is only available after coro.begin, so they need to be moved after coro.begin. The only instructions that should not be moved are the arguments of coro.begin and all of their operands.

  Reviewers: GorNishanov, majnemer
  Reviewed By: GorNishanov
  Subscribers: llvm-commits, EricWF
  Differential Revision: https://reviews.llvm.org/D33527

  llvm-svn: 303825
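The relocation rule in the last paragraph can be modelled on a toy straight-line block: compute everything coro.begin depends on, keep that in place, and move the rest of the pre-coro.begin instructions after it. `Inst` and `relocateAfterCoroBegin` are hypothetical; the names in the usage mirror the IR example above, with `%hdl` standing for the coro.begin call. This follows the rule as stated in the message and is not the actual CoroFrame.cpp implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy instruction: its name and the names of its operands (hypothetical model,
// not LLVM IR). Names not defined in the block are arguments or constants.
struct Inst {
  std::string Name;
  std::vector<std::string> Ops;
};

// coro.begin's operands and, transitively, their operands stay in front of
// it; every other instruction that preceded coro.begin moves to just after
// it, keeping the original relative order.
std::vector<std::string> relocateAfterCoroBegin(const std::vector<Inst> &Block,
                                                const std::string &CoroBegin) {
  std::map<std::string, const Inst *> ByName;
  for (const Inst &I : Block)
    ByName[I.Name] = &I;

  // Worklist over operands, starting from coro.begin itself.
  std::set<std::string> Needed;
  std::vector<std::string> Work{CoroBegin};
  while (!Work.empty()) {
    std::string N = Work.back();
    Work.pop_back();
    auto It = ByName.find(N);
    if (It == ByName.end())
      continue; // argument or constant: nothing to visit
    for (const std::string &Op : It->second->Ops)
      if (Needed.insert(Op).second)
        Work.push_back(Op);
  }

  std::vector<std::string> Kept, Moved, Rest;
  bool SeenBegin = false;
  for (const Inst &I : Block) {
    if (I.Name == CoroBegin)
      SeenBegin = true;
    else if (SeenBegin)
      Rest.push_back(I.Name);
    else if (Needed.count(I.Name))
      Kept.push_back(I.Name);
    else
      Moved.push_back(I.Name);
  }

  std::vector<std::string> Result = Kept;
  Result.push_back(CoroBegin);
  Result.insert(Result.end(), Moved.begin(), Moved.end());
  Result.insert(Result.end(), Rest.begin(), Rest.end());
  return Result;
}
```

On the example, the alloca and the store of `%this` end up after `%hdl`, while `%id` and `%alloc` stay in front of it.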
* [PowerPC] Fix a performance bug for PPC::XXSLDWI.
  Tony Jiang, 2017-05-24, 4 files, -4/+96

  There are some VectorShuffle nodes in the SDAG which can be selected to the XXSLDWI instruction; this patch recognizes them and does the selection to improve PPC performance.

  llvm-svn: 303822
OpenPOWER on IntegriCloud