path: root/llvm/lib
Commit message / Author / Age / Files / Lines
* [Hexagon] Fixes to the bitsplit generationKrzysztof Parzyszek2017-03-091-11/+46
| | | | | | | | - Fix the insertion point, which occasionally could have been incorrect. - Avoid creating multiple bitsplits with the same operands, if an old one could be reused. llvm-svn: 297414
* GlobalISel: inform FrameLowering when we emit a function call.Tim Northover2017-03-092-0/+2
| | | | | | | Amongst other things (I expect) this is necessary to ensure decent backtraces when an "unreachable" is involved. llvm-svn: 297413
* [InstSimplify] allow folds for bool vector div/remSanjay Patel2017-03-091-3/+3
| | | | llvm-svn: 297411
* GlobalISel: put debug info for static allocas in the MachineFunction.Tim Northover2017-03-092-8/+9
| | | | | | | | | | | | | | The good reason to do this is that static allocas are pretty simple to handle (especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline should give some kind of performance benefit. The bad reason is that the debug pipeline is an unholy mess of implicit contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a load or not involves the services of at least 3 soothsayers and the sacrifice of at least one chicken. And it still gets it wrong if the variable is at SP directly. llvm-svn: 297410
* [ConstantFold] vector div/rem with any zero element in divisor is undefSanjay Patel2017-03-091-4/+9
| | | | | | | | Follow-up for: https://reviews.llvm.org/D30665 https://reviews.llvm.org/rL297390 llvm-svn: 297409
* AMDGPU: Support for SimplifyDemandedVectorElts for load intrinsicsMatt Arsenault2017-03-091-0/+41
| | | | llvm-svn: 297408
* [DAGCombiner] Do various combines on usubo.Amaury Sechet2017-03-091-0/+34
| | | | | | | | | | | | Summary: This essentially does the same transform as for SUBC. Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30437 llvm-svn: 297404
* [Hexagon] Refactor the DAG preprocessing code, NFCKrzysztof Parzyszek2017-03-091-32/+88
| | | | | | Extract individual transformations into their own functions. llvm-svn: 297401
* Minor format change. nfc.Rong Xu2017-03-091-5/+5
| | | | llvm-svn: 297400
* [Hexagon] Add -mhvx option to the Hexagon backendKrzysztof Parzyszek2017-03-091-2/+8
| | | | llvm-svn: 297393
* [Hexagon] Propagate zext of i1 into arithmetic code in selection DAGKrzysztof Parzyszek2017-03-091-0/+96
| | | | | | | (op ... (zext i1 c) ...) -> (select c (op ... 1 ...), (op ... 0 ...)) llvm-svn: 297391
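A minimal sketch of why the rewrite above preserves the computed value, using addition as the op. The names `addZext` and `addSelect` are hypothetical and only illustrate the two forms; they are not Hexagon backend functions.

```cpp
#include <cassert>
#include <cstdint>

// Original form: (add x, (zext i1 c)).
uint32_t addZext(uint32_t X, bool C) {
  return X + static_cast<uint32_t>(C);
}

// Rewritten form: (select c, (add x, 1), (add x, 0)).
// Both arms now have a constant operand, which later folding can simplify.
uint32_t addSelect(uint32_t X, bool C) {
  return C ? X + 1u : X + 0u;
}
```

For example, `addZext(41, true)` and `addSelect(41, true)` both yield 42, and the two forms also agree when the addition wraps around.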
* [InstSimplify] vector div/rem with any zero element in divisor is undefSanjay Patel2017-03-091-0/+11
| | | | | | | | | | | This was suggested as a DAG simplification in the review for rL297026 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435253.html ...but let's start with IR since we have actual docs for IR (LangRef). Differential Revision: https://reviews.llvm.org/D30665 llvm-svn: 297390
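The rule in this commit can be sketched outside LLVM as follows. This is an illustration under stated assumptions, not the InstSimplify code: `foldVectorUDiv` is a hypothetical helper, constant vectors are modelled as `std::vector<uint32_t>`, and `std::nullopt` stands in for an `undef` result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: folds an element-wise unsigned division of two
// constant vectors. std::nullopt models an `undef` result.
std::optional<std::vector<uint32_t>>
foldVectorUDiv(const std::vector<uint32_t> &Num,
               const std::vector<uint32_t> &Den) {
  assert(Num.size() == Den.size() && "operands must have the same length");
  // A single zero lane in the divisor makes the whole result undef.
  for (uint32_t D : Den)
    if (D == 0)
      return std::nullopt;
  std::vector<uint32_t> Result;
  Result.reserve(Num.size());
  for (std::size_t I = 0; I < Num.size(); ++I)
    Result.push_back(Num[I] / Den[I]);
  return Result;
}
```

With this model, dividing `{8, 6}` by `{2, 0}` produces no value at all, while dividing by `{2, 3}` folds to `{4, 2}`.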
* [DAG] recognize div/rem by 0 as undef before trying constant foldingSanjay Patel2017-03-092-11/+17
| | | | | | | | | | | | | | | | | | | | As discussed in the review thread for rL297026, this is actually 2 changes that would independently fix all of the test cases in the patch: 1. Return undef in FoldConstantArithmetic for div/rem by 0. 2. Move basic undef simplifications for div/rem (simplifyDivRem()) before foldBinopIntoSelect() as a matter of efficiency. I will handle the case of vectors with any zero element as a follow-up. That change is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic(). I'm deleting the test for PR30693 because it does not test for the actual bug any more (dangers of using bugpoint). Differential Revision: https://reviews.llvm.org/D30741 llvm-svn: 297384
* [X86][SSE] Speed up constant pool shuffle mask decoding with direct copy ↵Simon Pilgrim2017-03-091-7/+27
| | | | | | | | (PR32037). If the constants are already the correct size, we can copy them directly into the shuffle mask. llvm-svn: 297381
* [mips] Revert fixes for PR32020.Simon Dardis2017-03-097-400/+162
| | | | | | | | | | | | | | | The fix introduces segfaults and clobbers the value to be stored when the atomic sequence loops. Revert "[Target/MIPS] Kill dead code, no functional change intended." This reverts commit r296153. Revert "Recommit "[mips] Fix atomic compare and swap at O0."" This reverts commit r296134. llvm-svn: 297380
* Fixed typos in comments. NFCI.Simon Pilgrim2017-03-091-6/+6
| | | | llvm-svn: 297379
* fix build on CygwinNuno Lopes2017-03-091-0/+3
| | | | llvm-svn: 297378
* [ARM] remove FIXMEs and add vcmp MC testSjoerd Meijer2017-03-091-5/+0
| | | | | | | | | Minor cleanup in ARMInstrVFP.td: removed some FIXMEs and added a MC test for vcmp that was actually missing. Differential Revision: https://reviews.llvm.org/D30745 llvm-svn: 297376
* [PM/Inliner] Make the new PM's inliner process call edges across anChandler Carruth2017-03-091-29/+62
entire SCC before iterating on newly-introduced call edges resulting from any inlined function bodies.

This more closely matches the behavior of the old PM's inliner. While it wasn't really clear to me initially, this behavior is actually essential to the inliner behaving reasonably in its current design.

Because the inliner is fundamentally a bottom-up inliner and all of its cost modeling is designed around that it often runs into trouble within an SCC where we don't have any meaningful bottom-up ordering to use. In addition to potentially cyclic, infinite inlining that we block with the inline history mechanism, it can also take seemingly simple call graph patterns within an SCC and turn them into *insanely* large functions by accidentally working top-down across the SCC without any of the threshold limitations that traditional top-down inliners use.

Consider this diabolical monster.cpp file that Richard Smith came up with to help demonstrate this issue:

```
template <int N> extern const char *str;

void g(const char *);

template <bool K, int N> void f(bool *B, bool *E) {
  if (K)
    g(str<N>);
  if (B == E)
    return;
  if (*B)
    f<true, N + 1>(B + 1, E);
  else
    f<false, N + 1>(B + 1, E);
}
template <> void f<false, MAX>(bool *B, bool *E) { return f<false, 0>(B, E); }
template <> void f<true, MAX>(bool *B, bool *E) { return f<true, 0>(B, E); }

extern bool *arr, *end;
void test() { f<false, 0>(arr, end); }
```

When compiled with '-DMAX=N' for various values of N, this will create an SCC with a reasonably large number of functions. Previously, the inliner would try to exhaust the inlining candidates in a single function before moving on. This, unfortunately, turns it into a top-down inliner within the SCC. Because our thresholds were never built for that, we will incrementally decide that it is always worth inlining and proceed to flatten the entire SCC into that one function.

What's worse, we'll then proceed to the next function, and do the exact same thing except we'll skip the first function, and so on. And at each step, we'll also make some of the constant factors larger, which is awesome.

The fix in this patch is the obvious one which makes the new PM's inliner use the same technique used by the old PM: consider all the call edges across the entire SCC before beginning to process call edges introduced by inlining. The result of this is essentially to distribute the inlining across the SCC so that every function incrementally grows toward the inline thresholds rather than allowing the inliner to grow one of the functions vastly beyond the threshold. The code for this is a bit awkward, but it works out OK.

We could consider in the future doing something more powerful here such as prioritized order (via lowest cost and/or profile info) and/or a code-growth budget per SCC. However, both of those would require really substantial work both to design the system in a way that wouldn't break really useful abstraction decomposition properties of the current inliner and to be tuned across a reasonably diverse set of code and workloads. It also seems really risky in many ways. I have only found a single real-world file that triggers the bad behavior here and it is generated code that has a pretty pathological pattern. I'm not worried about the inliner not doing an *awesome* job here as long as it does *ok*. On the other hand, the cases that will be tricky to get right in a prioritized scheme with a budget will be more common and idiomatic for at least some frontends (C++ and Rust at least). So while these approaches are still really interesting, I'm not in a huge rush to go after them. Staying even closer to the existing PM's behavior, especially when this is easy to do, seems like the right short to medium term approach.

I don't really have a test case that makes sense yet... I'll try to find a variant of the IR produced by the monster template metaprogram that is both small enough to be sane and large enough to clearly show when we get this wrong in the future. But I'm not confident this exists. And the behavior change here *should* be unobservable without snooping on debug logging. So there isn't really much to test.

The test case updates come from two incidental changes:
1) We now visit functions in an SCC in the opposite order. I don't think there really is a "right" order here, so I just update the test cases.
2) We no longer compute some analyses when an SCC has no call instructions that we consider for inlining.

llvm-svn: 297374
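The ordering described in this commit message can be illustrated with a toy worklist. This is only a sketch under stated assumptions: `CallGraph`, `CallEdge`, and `visitOrder` are hypothetical names, the call graph is a plain map of names, "inlining" merely exposes the callee's own calls in the caller, and a set of already-seen edges is a crude stand-in for the inline history. No cost thresholds are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy call graph: function name -> callees, in call-site order.
using CallGraph = std::map<std::string, std::vector<std::string>>;
using CallEdge = std::pair<std::string, std::string>; // (caller, callee)

// Returns the order in which call edges are visited. Edges from every
// function in the SCC are queued first; edges exposed by "inlining" are
// appended to the back, so they are visited only after all original edges.
std::vector<CallEdge> visitOrder(const CallGraph &G,
                                 const std::vector<std::string> &SCC) {
  std::vector<CallEdge> Worklist;
  for (const std::string &F : SCC) {
    auto It = G.find(F);
    if (It == G.end())
      continue;
    for (const std::string &Callee : It->second)
      Worklist.emplace_back(F, Callee);
  }
  // Crude stand-in for the inline history: each (caller, callee) pair is
  // processed at most once, so cyclic call graphs terminate.
  std::set<CallEdge> Seen(Worklist.begin(), Worklist.end());
  for (std::size_t I = 0; I < Worklist.size(); ++I) {
    const CallEdge Edge = Worklist[I]; // Copy: Worklist may reallocate below.
    auto It = G.find(Edge.second);
    if (It == G.end())
      continue;
    // Inlining the callee into the caller exposes the callee's calls there.
    for (const std::string &NewCallee : It->second) {
      CallEdge New(Edge.first, NewCallee);
      if (Seen.insert(New).second)
        Worklist.push_back(New);
    }
  }
  return Worklist;
}
```

For a three-function cycle a -> b -> c -> a, the first three visited edges are the original ones, one per function, and only then do the edges introduced by inlining appear. Exhausting one function before moving on would instead keep growing that single function first.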
* [mips] Fix return loweringSimon Dardis2017-03-091-3/+12
| | | | | | | | | | | Fix a machine verifier issue where an instruction was using an invalid register. The return pseudo is expanded and has the return address register added to it. The return register may have been spuriously marked as killed earlier. This partially resolves PR/27458. Thanks to Quentin Colombet for reporting the issue! llvm-svn: 297372
* [SSP] In opt remarks, stream Function directlyAdam Nemet2017-03-091-9/+14
| | | | | | | With this, it shows up as an attribute in YAML and non-printable characters are properly removed by GlobalValue::getRealLinkageName. llvm-svn: 297362
* [SLP] Mark values in Dot that need to be extractedAdam Nemet2017-03-091-3/+9
| | | | llvm-svn: 297361
* DAG: Check no signed zeros instead of unsafe math attributeMatt Arsenault2017-03-091-2/+2
| | | | llvm-svn: 297354
* WholeProgramDevirt: Implement importing for uniform ret val opt.Peter Collingbourne2017-03-091-0/+18
| | | | | | Differential Revision: https://reviews.llvm.org/D29854 llvm-svn: 297350
* WholeProgramDevirt: Implement importing for single-impl devirtualization.Peter Collingbourne2017-03-091-11/+47
| | | | | | Differential Revision: https://reviews.llvm.org/D29844 llvm-svn: 297333
* Perform symbol binding for .symver versioned symbolsTeresa Johnson2017-03-098-11/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: In a .symver assembler directive like: .symver name, name2@@nodename "name2@@nodename" should get the same symbol binding as "name". While the ELF object writer is updating the symbol binding for .symver aliases before emitting the object file, not doing so when the module inline assembly is handled by the RecordStreamer is causing the wrong behavior in *LTO mode. E.g. when "name" is global, "name2@@nodename" must also be marked as global. Otherwise, the symbol is skipped when iterating over the LTO InputFile symbols (InputFile::Symbol::shouldSkip). So, for example, when performing any *LTO via the gold-plugin, the versioned symbol definition is not recorded by the plugin and passed back to the linker. If the object was in an archive, and there were no other symbols needed from that object, the object would not be included in the final link and references to the versioned symbol are undefined. The llvm-lto2 tests added will give an error about an unused symbol resolution without the fix. Reviewers: rafael, pcc Reviewed By: pcc Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D30485 llvm-svn: 297332
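The binding rule from this summary, sketched on a toy symbol table. The types and the function `applySymverBindings` are hypothetical and do not reflect the RecordStreamer interface; each pair models one `.symver name, alias` directive.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class Binding { Local, Weak, Global };

// Hypothetical model: Symbols maps a symbol name to its binding, and each
// entry of Symvers is (name, versioned alias) from a .symver directive.
void applySymverBindings(
    std::map<std::string, Binding> &Symbols,
    const std::vector<std::pair<std::string, std::string>> &Symvers) {
  for (const auto &[Name, Alias] : Symvers) {
    auto It = Symbols.find(Name);
    if (It == Symbols.end())
      continue; // Unknown target: leave the alias untouched.
    const Binding TargetBinding = It->second;
    // The versioned alias gets the same binding as the symbol it names.
    Symbols[Alias] = TargetBinding;
  }
}
```

If `name` is global and `name2@@nodename` starts out local, the alias ends up global after the call, which is the property the commit needs so the symbol is not skipped.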
* AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not ↵Changpeng Fang2017-03-091-0/+4
| | | | | | | | | | | | vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328
* Don't merge global constants with non-dbg metadata.Evgeniy Stepanov2017-03-091-0/+26
| | | | | | | | | | | !type metadata can not be dropped. An alternative to this is adding !type metadata from the replaced globals to the replacement, but that may weaken type tests and make them slower at the same time. The merged global gets !dbg metadata from replaced globals, and can end up with multiple debug locations. llvm-svn: 297327
* [DebugInfo] Emit address space with DW_AT_address_class attribute for ↵Konstantin Zhuravlyov2017-03-089-32/+85
| | | | | | | | pointer and reference types Differential Revision: https://reviews.llvm.org/D29670 llvm-svn: 297320
* [Outliner] Fix memory leak in suffix tree.Jessica Paquette2017-03-081-9/+9
| | | | | | | | This commit changes the BumpPtrAllocator for suffix tree nodes to a SpecificBumpPtrAllocator. Before, node construction was leaking memory because of the DenseMap in SuffixTreeNodes. Changing this to a SpecificBumpPtrAllocator allows this memory to properly be released. llvm-svn: 297319
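The leak described here comes from objects whose destructors never run. The following toy arena is a sketch of the general idea only and is not LLVM's `SpecificBumpPtrAllocator`; `TypedArena` and `Node` are hypothetical names. Because the arena knows the element type, it can call each destructor when it goes away, which releases memory owned by members such as a child map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Toy typed arena: objects are placement-constructed into slabs, and their
// destructors run when the arena itself is destroyed.
template <typename T> class TypedArena {
  static constexpr std::size_t SlabObjects = 64;
  struct Slab {
    alignas(T) unsigned char Bytes[SlabObjects * sizeof(T)];
  };
  std::vector<std::unique_ptr<Slab>> Slabs;
  std::vector<T *> Objects; // Every object created so far.

public:
  TypedArena() = default;
  TypedArena(const TypedArena &) = delete;
  TypedArena &operator=(const TypedArena &) = delete;

  template <typename... Args> T *create(Args &&...A) {
    const std::size_t Index = Objects.size();
    if (Index == Slabs.size() * SlabObjects)
      Slabs.push_back(std::make_unique<Slab>());
    void *Mem =
        Slabs[Index / SlabObjects]->Bytes + (Index % SlabObjects) * sizeof(T);
    T *Obj = ::new (Mem) T(std::forward<Args>(A)...);
    Objects.push_back(Obj);
    return Obj;
  }

  // An untyped bump allocator would only free the slabs here; running the
  // destructors is what releases memory owned by the objects themselves.
  ~TypedArena() {
    for (T *Obj : Objects)
      Obj->~T();
  }
};

// Example node owning heap memory through its child map.
struct Node {
  static inline int Live = 0;
  std::map<char, Node *> Children;
  Node() { ++Live; }
  ~Node() { --Live; }
};
```

Creating a few `Node` objects in a scoped `TypedArena<Node>` and checking `Node::Live` afterwards shows that every destructor ran.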
* [ConstantFold] Fix defect in constant folding computation for GEPJaved Absar2017-03-081-1/+2
| | | | | | | | | | | | | | | When the array indexes are all determined by GVN to be constants, a call is made to constant-folding to optimize/simplify the address computation. The constant-folding, however, makes a mistake in that it sometimes reads back stale Idxs instead of the NewIdxs that it re-computed in a previous iteration. This leads to incorrect addresses coming out of constant-folding to GEP. A test case is included. The error is only triggered when the indexes follow particular patterns in which the interplay between stale and new index updates matters. Reviewers: Daniel Berlin Differential Revision: https://reviews.llvm.org/D30642 llvm-svn: 297317
* [Support] Add llvm::sys::fs::remove_directories.Zachary Turner2017-03-083-5/+75
| | | | | | | | | | | | | | | | | | | | | We already have a function create_directories() which can create an entire tree, and remove() which can remove an empty directory, but we do not have remove_directories() which can remove an entire tree. This patch adds such a function. Because removing a directory tree can have dangerous consequences when the tree contains a directory symlink, the patch here updates the existing directory_iterator construct to optionally not follow symlinks (previously it would always follow symlinks). The delete algorithm uses this flag so that for symlinks, only the links are removed, and not the targets. On Windows this is implemented with SHFileOperation, which also does not recurse into symbolic links or junctions. Differential Revision: https://reviews.llvm.org/D30676 llvm-svn: 297314
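The symlink-safe deletion described above can be approximated with the C++17 standard library. This sketch does not use `llvm::sys::fs`; the function name `removeTreeNoFollow` is hypothetical, and it relies on `std::filesystem::symlink_status`, which reports on a link itself rather than its target.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Removes Root and everything below it. Symlinks are removed as links;
// the directories they point to are never entered.
std::error_code removeTreeNoFollow(const fs::path &Root) {
  std::error_code EC;
  // symlink_status() does not follow links, so a symlink to a directory is
  // reported as a symlink, not as a directory.
  const fs::file_status Status = fs::symlink_status(Root, EC);
  if (EC)
    return EC;
  if (fs::is_directory(Status)) {
    // Collect the children first so nothing is deleted during iteration.
    std::vector<fs::path> Children;
    for (fs::directory_iterator It(Root, EC), End; !EC && It != End;
         It.increment(EC))
      Children.push_back(It->path());
    if (EC)
      return EC;
    for (const fs::path &Child : Children)
      if (std::error_code ChildEC = removeTreeNoFollow(Child))
        return ChildEC;
  }
  fs::remove(Root, EC); // Removes a file, an empty directory, or a link.
  return EC;
}
```

If a tree contains a directory symlink that points outside the tree, calling `removeTreeNoFollow` on the tree deletes the link while the target directory and its contents stay in place.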
* [MemCpyOpt] clang-format + trim the legacy pass. NFC.George Burgess IV2017-03-081-39/+23
| | | | | | | None of the declarations below `// Helper functions` seem to have definitions anymore. llvm-svn: 297309
* GlobalISel: correctly handle trivial fcmp predicates.Tim Northover2017-03-081-1/+4
| | | | | | | It makes sense to only do them once in IRTranslator rather than making everyone deal with them. llvm-svn: 297304
* [SLP] Visualize SLP trees with -view-slp-treeAdam Nemet2017-03-081-62/+167
| | | | | | | | | | | | | | | Analyzing larger trees is extremely difficult with the current debug output so this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data structure. We can now display the SLP trees with Graphviz as in https://reviews.llvm.org/F3132765. I decorated the graph where a value needs to be gathered for one reason or another. These are the red nodes. There are other improvements I am planning to make as I work through my case here. For example, I would also like to mark nodes that need to be extracted. Differential Revision: https://reviews.llvm.org/D30731 llvm-svn: 297303
* [LV] Select legal insert point when fixing first-order recurrencesMatthew Simpson2017-03-081-7/+9
| | | | | | | | Because IRBuilder performs constant-folding, it's not guaranteed that an instruction in the original loop maps to an instruction in the vector loop. It could map to a constant vector instead. The handling of first-order recurrences was incorrectly making this assumption when setting the IRBuilder's insert point. llvm-svn: 297302
* [GlobalISel] Add default action for G_FNEGVolkan Keles2017-03-082-0/+36
| | | | | | | | | | | | | | Summary: rL297171 introduced G_FNEG for floating-point negation instruction and IRTranslator started to translate `FSUB -0.0, X` to `FNEG X`. This patch adds a default action for G_FNEG to avoid breaking existing targets. Reviewers: qcolombet, ab, kristof.beyls, t.p.northover, aditya_nandakumar, dsanders Reviewed By: qcolombet Subscribers: dberris, rovka, llvm-commits Differential Revision: https://reviews.llvm.org/D30721 llvm-svn: 297301
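A small check of why the pattern uses -0.0 rather than +0.0 as the left operand, assuming IEEE 754 arithmetic in the default rounding mode and no fast-math flags. The function names are hypothetical; NaN sign and payload behaviour is outside this sketch.

```cpp
#include <cassert>
#include <cmath>

// FSUB -0.0, X
double fsubNegZero(double X) { return -0.0 - X; }

// FSUB +0.0, X
double fsubPosZero(double X) { return 0.0 - X; }

// FNEG X
double fneg(double X) { return -X; }
```

For X = +0.0, `fsubNegZero` and `fneg` both give -0.0, while `fsubPosZero` gives +0.0, so only the -0.0 form agrees with negation on the sign of zero.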
* Resubmit FileSystem changes.Zachary Turner2017-03-081-0/+12
| | | | | | | | This was originally reverted due to some test failures in ModuleCache and TestCompDirSymlink. These issues have all been resolved and the code now passes all tests. Differential Revision: https://reviews.llvm.org/D30698 llvm-svn: 297300
* [Hexagon] Use correct offset when extracting from the high wordKrzysztof Parzyszek2017-03-081-0/+1
| | | | | | | | When extracting a bitfield from the high register in a register pair, the final offset should be relative to the high register (for 32-bit extracts). llvm-svn: 297288
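One way to picture the offset adjustment, as a sketch under assumptions rather than the Hexagon code: a 64-bit register pair is modelled as two 32-bit halves, and the hypothetical `mapPairOffset` translates a bit offset within the pair into the half that holds the field plus an offset relative to that half.

```cpp
#include <cassert>

struct SubRegExtract {
  bool HighWord;   // True if the field lives in the high 32-bit register.
  unsigned Offset; // Bit offset relative to the selected register.
};

// Hypothetical helper for fields of at most 32 bits that lie entirely in
// one half of a 64-bit pair.
SubRegExtract mapPairOffset(unsigned PairOffset, unsigned Width) {
  assert(Width > 0 && Width <= 32 && PairOffset + Width <= 64);
  const bool High = PairOffset >= 32;
  assert((High || PairOffset + Width <= 32) &&
         "field must not straddle both halves");
  // For the high half, the offset has to be rebased by 32 bits.
  return {High, High ? PairOffset - 32 : PairOffset};
}
```

An 8-bit field at bit 40 of the pair maps to the high register at offset 8. Forgetting the rebase would leave offset 40, which is out of range for a 32-bit register.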
* [Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty()Daniel Cederman2017-03-081-6/+5
| | | | | | | | | | | | | | | | | | Summary: By using reg_nodbg_empty() to determine if a function can be treated as a leaf function or not, we miss the case when the register pair L0_L1 is used but not L0 by itself. This has the effect that use_all_i32_regs(), a test in reserved-regs.ll which tries to use all registers, gets treated as a leaf function. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: davide, RKSimon, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D27089 llvm-svn: 297285
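The difference between the two checks can be modelled with a toy register table. This is a sketch with hypothetical names (`usedExact`, `usedWithAliases`, `AliasMap`) and string register names. It assumes, as the summary implies, that the alias-aware query also considers registers that overlap the one asked about.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using RegSet = std::set<std::string>;
// Hypothetical alias table: register -> registers that overlap it.
using AliasMap = std::map<std::string, std::vector<std::string>>;

// Looks only at the exact register, so a use of the pair L0_L1 is missed
// when asking about L0.
bool usedExact(const RegSet &Used, const std::string &Reg) {
  return Used.count(Reg) != 0;
}

// Also reports a register as used when any overlapping register is used.
bool usedWithAliases(const RegSet &Used, const AliasMap &Aliases,
                     const std::string &Reg) {
  if (Used.count(Reg) != 0)
    return true;
  auto It = Aliases.find(Reg);
  if (It == Aliases.end())
    return false;
  for (const std::string &Alias : It->second)
    if (Used.count(Alias) != 0)
      return true;
  return false;
}
```

With only `L0_L1` in the used set, `usedExact(Used, "L0")` is false while `usedWithAliases(Used, Aliases, "L0")` is true, which mirrors the leaf-function misclassification described above.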
* [JumpThread] Use AA in SimplifyPartiallyRedundantLoad()Jun Bum Lim2017-03-081-11/+20
| | | | | | | | | | | | | | Summary: Use AA when scanning to find an available load value. Reviewers: rengolin, mcrosier, hfinkel, trentxintong, dberlin Reviewed By: rengolin, dberlin Subscribers: aemerson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D30352 llvm-svn: 297284
* [InstCombine] avoid crashing on shuffle shrinkage when input type is not ↵Sanjay Patel2017-03-081-1/+2
| | | | | | same as result type llvm-svn: 297280
* [LoopRotate] Propagate dbg.value intrinsicsSam Parker2017-03-081-3/+45
| | | | | | | | | | | | Recommitting patch which was previously reverted in r297159. These changes should address the casting issues. The original patch enables dbg.value intrinsics to be attached to newly inserted PHI nodes. Differential Review: https://reviews.llvm.org/D30701 llvm-svn: 297269
* [X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 ↵Simon Pilgrim2017-03-081-8/+7
| | | | | | | | elements wide By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage. llvm-svn: 297267
* [SLP] Fixed non-deterministic behavior in Loop Vectorizer.Amjad Aboud2017-03-081-9/+11
| | | | | | Differential Revision: https://reviews.llvm.org/D30638 llvm-svn: 297257
* Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to ↵Tim Shen2017-03-081-5/+43
| | | | | | | | | | | reduce stack frame size"" After inspection, it's UB in our code base. Someone cast a var-arg function pointer to a non-var-arg one. :/ Re-commit r296771 to continue testing on the patch. Sorry for the trouble! llvm-svn: 297256
* Handle UnreachableInst in isGuaranteedToTransferExecutionToSuccessorSebastian Pop2017-03-081-0/+2
| | | | | | | | | | | A block with an UnreachableInst does not transfer execution to a successor. The problem was exposed by GVN-hoist. This patch fixes bug 32153. Patch by Aditya Kumar. Differential Revision: https://reviews.llvm.org/D30667 llvm-svn: 297254
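A reduced model of the added case, with hypothetical types. The real analysis handles more instruction kinds than this sketch, which only captures the statement that an unreachable instruction never hands control to a successor.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, heavily reduced instruction model.
enum class Opcode { Add, Load, Unreachable };

// Only the unreachable case from this commit is modelled; other reasons an
// instruction might not transfer execution are deliberately left out.
bool transfersExecutionToSuccessor(Opcode Op) {
  return Op != Opcode::Unreachable;
}

// A block transfers execution only if every instruction in it does.
bool blockTransfersExecution(const std::vector<Opcode> &Block) {
  return std::all_of(Block.begin(), Block.end(),
                     transfersExecutionToSuccessor);
}
```

A block ending in `Opcode::Unreachable` is reported as not transferring execution, so a pass that hoists code must not assume that anything after the block runs.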
* [SCCP] Merge markOverdefined and markAnythingOverdefined.Davide Italiano2017-03-081-23/+17
| | | | | | There's no need to have two separate APIs. llvm-svn: 297253
* [NVPTX] Remove unnecessary isImageReadOnly(), isImageWriteOnly(), & ↵Justin Lebar2017-03-081-3/+1
| | | | | | | | | | isImageReadWrite calls These calls repeat the check already done by the isImage() function in NVPTXUtilities.cpp. Patch by Briana Grace! Differential Revision: https://reviews.llvm.org/D30706 llvm-svn: 297252
* AMDGPU: Don't wait at end of block with a trivial successorMatt Arsenault2017-03-081-2/+14
| | | | | | | | | | If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251
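The condition in this message, one successor whose only predecessor is the current block, can be written as a predicate on a toy CFG. The `Block` struct and `hasTrivialSuccessor` are hypothetical and index blocks by position in a vector; this is not the AMDGPU pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy CFG node: successor and predecessor lists hold block indices.
struct Block {
  std::vector<std::size_t> Succs;
  std::vector<std::size_t> Preds;
};

// True if block B has exactly one successor and that successor has exactly
// one predecessor, so a wait at the end of B could be delayed into it.
bool hasTrivialSuccessor(const std::vector<Block> &CFG, std::size_t B) {
  const Block &Current = CFG.at(B);
  if (Current.Succs.size() != 1)
    return false;
  const Block &Succ = CFG.at(Current.Succs.front());
  return Succ.Preds.size() == 1;
}
```

In a CFG where block 0 falls through to block 1 and nothing else reaches block 1, the predicate holds for block 0. It fails for any block whose single successor is a join point.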