path: root/llvm/lib/Transforms
Commit message | Author | Date | Files | Lines (-deleted/+added)
* [ValueTracking] Extract isKnownPositive [NFCI] (Philip Reames, 2016-03-09; 1 file, -2/+2)
  Extract out a generic interface from a recently landed patch and document a TODO in case compile time becomes a problem.
  llvm-svn: 263062
* [InstCombine] (icmp sgt smin(PosA, B) 0) -> (icmp sgt B 0) (Philip Reames, 2016-03-09; 1 file, -0/+13)
  When checking whether an smin is positive, we can move the comparison to one of its inputs if the other is known positive: if the known-positive input is the min, the other input can't be negative either, and if the other input is the min, then comparing it is comparing the min itself. Either way, (icmp sgt B 0) gives the same answer. A sketch follows below.
  Differential Revision: http://reviews.llvm.org/D17873
  llvm-svn: 263059
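  A minimal IR sketch of the fold (hypothetical value names; smin written in its usual icmp+select form), assuming %a has been proven positive:
    %c   = icmp slt i32 %a, %b
    %min = select i1 %c, i32 %a, i32 %b      ; smin(%a, %b)
    %r   = icmp sgt i32 %min, 0
  becomes:
    %r = icmp sgt i32 %b, 0                  ; valid only because %a > 0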
* [LLE] Add missing check for unit stride (Adam Nemet, 2016-03-09; 1 file, -5/+13)
  I somehow missed this. The case in GCC (global_alloc) was similar to the new testcase except it had an array of structs rather than a two-dimensional array.
  Fixes PR26885.
  llvm-svn: 263058
* InstCombine: Restrict computeKnownBits() on all Values to OptLevel > 2 (Matthias Braun, 2016-03-09; 3 files, -29/+48)
  As part of r251146, InstCombine was extended to call computeKnownBits on every value in the function to determine whether it happens to be constant. This increases typical compile time by 1-3% (5% in irgen+opt time) in my measurements, yet this case did not trigger once in the whole llvm test-suite.
  This patch introduces the notion of ExpensiveCombines, which are only enabled for OptLevel > 2. I removed the check in InstructionSimplify, as that is called from various places where the OptLevel is not known, but given the rarity of the situation I think a check in InstCombine is enough.
  Differential Revision: http://reviews.llvm.org/D16835
  llvm-svn: 263047
* Reland r262337 "calculate builtin_object_size if arg is a removable pointer" (Petar Jovanovic, 2016-03-09; 1 file, -8/+25)
  Original commit message: calculate builtin_object_size if argument is a removable pointer.
  This patch fixes calculating the correct value for the builtin_object_size function when the pointer is used only in a builtin_object_size function call and never after that. Patch by Strahinja Petrovic.
  Differential Revision: http://reviews.llvm.org/D17337
  This relands the original change with a small modification (first do a null check and then do the cast) to satisfy ubsan.
  llvm-svn: 263011
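  A hedged sketch of the kind of case this enables (hypothetical names; @llvm.objectsize is the IR-level form of builtin_object_size): the pointer %gep has no use other than the objectsize call, yet the call can still fold to a constant rather than the unknown-size sentinel:
    %gep = getelementptr inbounds [10 x i8], [10 x i8]* %buf, i64 0, i64 4
    %sz  = call i64 @llvm.objectsize.i64.p0i8(i8* %gep, i1 false)
    ; with the patch, %sz folds to 6 (bytes remaining in a 10-byte %buf)
    ; even though %gep is otherwise removable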
* [LoopDataPrefetch] Add stats and debug output (Adam Nemet, 2016-03-09; 1 file, -0/+9)
  llvm-svn: 262998
* Return StringRef instead of a naked char*; NFC (Sanjoy Das, 2016-03-09; 1 file, -2/+2)
  llvm-svn: 262989
* [IRCE] Reflow comments; NFC (Sanjoy Das, 2016-03-09; 1 file, -4/+2)
  llvm-svn: 262988
* FunctionIndex is not optional for renameModuleForThinLTO(), make it a reference (NFC) (Mehdi Amini, 2016-03-09; 2 files, -3/+3)
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 262976
* fix variable name; NFC (Sanjay Patel, 2016-03-08; 1 file, -3/+3)
  llvm-svn: 262953
* use range-based loop; NFCI (Sanjay Patel, 2016-03-08; 1 file, -3/+2)
  llvm-svn: 262952
* rangify, fix function names; NFCI (Sanjay Patel, 2016-03-08; 1 file, -27/+22)
  llvm-svn: 262940
* don't repeat function names in documentation comments; NFC (Sanjay Patel, 2016-03-08; 1 file, -4/+4)
  llvm-svn: 262937
* Revert "[InstCombine] Combine A->B->A BitCast"Junmo Park2016-03-082-104/+0
| | | | | | This reverts commit r262670 due to compile failure. llvm-svn: 262916
* Fix evaluation order. Spotted by Alexander Riccio! (Peter Collingbourne, 2016-03-08; 1 file, -1/+1)
  llvm-svn: 262907
* Revert revisions 262636, 262643, 262679, and 262682. (Easwaran Raman, 2016-03-08; 4 files, -144/+30)
  llvm-svn: 262883
* [tsan] Add support for pointer typed atomic stores, loads, and cmpxchg (Anna Zaks, 2016-03-07; 1 file, -8/+31)
  TSan instrumentation functions for atomic stores, loads, and cmpxchg work on integer value types. This patch adds casts before calling the TSan instrumentation functions in cases where the value is a pointer.
  Differential Revision: http://reviews.llvm.org/D17833
  llvm-svn: 262876
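  A conceptual IR sketch of the added casts (hypothetical names, not the literal instrumentation sequence): a pointer-typed atomic store such as
    store atomic i8* %v, i8** %p seq_cst, align 8
  is handled by first casting the value and the address to integer types:
    %vi = ptrtoint i8* %v to i64
    %pi = bitcast i8** %p to i64*
    ; the existing i64 TSan hook is then invoked on %pi/%vi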
* [LoopDataPrefetch] If prefetch distance is not set, skip pass (Adam Nemet, 2016-03-07; 1 file, -2/+5)
  This lets select sub-targets enable this pass. The patch implements the idea from the recent llvm-dev thread: http://thread.gmane.org/gmane.comp.compilers.llvm.devel/94925
  The goal is to enable the LoopDataPrefetch pass for the Cyclone sub-target only within AArch64. Positive and negative tests will be included in an upcoming patch that enables selective prefetching of large-strided accesses on Cyclone.
  llvm-svn: 262844
* Revert "Enable LoopLoadElimination by default"Adam Nemet2016-03-071-2/+2
| | | | | | | | | | This reverts commit r262250. It causes SPEC2006/gcc to generate wrong result (166.s) in AArch64 when running with *ref* data set. The error happens with "-Ofast -flto -fuse-ld=gold" or "-O3 -fno-strict-aliasing". llvm-svn: 262839
* [DFSan] Remove an overly aggressive assert reported in PR26068. (Chandler Carruth, 2016-03-07; 1 file, -4/+0)
  This code has been successfully used to bootstrap libc++ in a no-asserts mode for a very long time, so the code that follows cannot be completely incorrect. I've added a test that shows the current behavior for this kind of code with DFSan. If it is desirable for DFSan to do something special when processing an invoke of a variadic function, it can be added, but we shouldn't keep an assert that we've been ignoring in release builds anyway.
  llvm-svn: 262829
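  For reference, a minimal hypothetical example of the construct in question, an invoke of a variadic function (names are illustrative, not from the DFSan test):
    declare void @vararg_fn(i32, ...)
    declare i32 @__gxx_personality_v0(...)
    define void @caller() personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
    entry:
      invoke void (i32, ...) @vararg_fn(i32 1, double 2.0)
              to label %cont unwind label %lpad
    cont:
      ret void
    lpad:
      %lp = landingpad { i8*, i32 } cleanup
      resume { i8*, i32 } %lp
    }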
* [PGO] Add a command-line option to control the number of VP annotation metadata entries. (Rong Xu, 2016-03-04; 1 file, -2/+10)
  llvm-svn: 262750
* Fix a use-after-free bug introduced in r262636 (Easwaran Raman, 2016-03-04; 2 files, -6/+11)
  llvm-svn: 262679
* [InstCombine] Combine A->B->A BitCast (Guozhi Wei, 2016-03-03; 2 files, -0/+104)
  This patch enhances InstCombine to handle the following case:
    A -> B bitcast
    PHI
    B -> A bitcast
  llvm-svn: 262670
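  A hedged sketch of the pattern (hypothetical function, with A = float and B = i32): both PHI inputs arrive through A->B bitcasts and the only use is a B->A bitcast, so the whole chain can become a PHI over float:
    define float @f(i1 %c, float %a, float %b) {
    entry:
      %ab = bitcast float %a to i32              ; A -> B
      br i1 %c, label %then, label %merge
    then:
      %bb = bitcast float %b to i32              ; A -> B
      br label %merge
    merge:
      %p = phi i32 [ %ab, %entry ], [ %bb, %then ]
      %r = bitcast i32 %p to float               ; B -> A
      ret float %r
    }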
* [InstCombine] transform bitcasted bitwise logic ops with constants (PR26702) (Sanjay Patel, 2016-03-03; 1 file, -7/+28)
  Given that we're not actually reducing the instruction count in the included regression tests, I think we would call this a canonicalization step. The motivation comes from the example in PR26702: https://llvm.org/bugs/show_bug.cgi?id=26702
  If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable example of:
    define <4 x i32> @is_negative(<4 x i32> %x) {
      %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
      %not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
      %bc = bitcast <4 x i32> %not to <2 x i64>
      %notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
      %bc2 = bitcast <2 x i64> %notnot to <4 x i32>
      ret <4 x i32> %bc2
    }
  simplifies to the expected:
    define <4 x i32> @is_negative(<4 x i32> %x) {
      %lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
      ret <4 x i32> %lobit
    }
  Differential Revision: http://reviews.llvm.org/D17583
  llvm-svn: 262645
* Infrastructure for PGO enhancements in inliner (Easwaran Raman, 2016-03-03; 4 files, -29/+138)
  This patch provides the following infrastructure for PGO enhancements in the inliner:
  - Enable the use of block-level profile information in the inliner.
  - Incrementally update block frequency information during inlining.
  - Update the function entry counts of callees when they get inlined into callers.
  Differential Revision: http://reviews.llvm.org/D16381
  llvm-svn: 262636
* Use LineLocation instead of CallsiteLocation to index callsite profile. (Dehao Chen, 2016-03-03; 1 file, -14/+6)
  Summary: With discriminators, a LineLocation can uniquely identify a callsite without needing the callee name. Remove the callee function name from the key, and put it in the value (FunctionSamples).
  Reviewers: davidxl, dnovillo
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D17827
  llvm-svn: 262634
* [LoopUtils, LV] Fix PR26734 (Matthew Simpson, 2016-03-03; 1 file, -1/+1)
  The vectorization of first-order recurrences (r261346) caused PR26734. When detecting these recurrences, we need to ensure that the previous value is actually defined inside the loop. This patch includes the fix and test case.
  llvm-svn: 262624
* Explode store of arrays in instcombine (Amaury Sechet, 2016-03-02; 1 file, -1/+33)
  Summary: This is the last step toward supporting aggregate memory accesses in instcombine. It explodes stores of arrays into a series of stores to each element, allowing them to be optimized (see the sketch below).
  Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D17828
  llvm-svn: 262530
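  A minimal sketch of the rewrite (hypothetical names), for a two-element array:
    store [2 x i32] %agg, [2 x i32]* %p
  becomes per-element stores:
    %e0 = extractvalue [2 x i32] %agg, 0
    %p0 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 0
    store i32 %e0, i32* %p0
    %e1 = extractvalue [2 x i32] %agg, 1
    %p1 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 1
    store i32 %e1, i32* %p1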
* Unpack array of all sizes in InstCombine (Amaury Sechet, 2016-03-02; 1 file, -5/+38)
  Summary: This is another step toward improving fca (first-class aggregate) support. It unpacks loads of arrays of any size into a series of loads of the array's elements (see the sketch below).
  Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D15890
  llvm-svn: 262521
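  The load-side counterpart, sketched under the same assumptions:
    %v = load [2 x i32], [2 x i32]* %p
  becomes per-element loads rebuilt with insertvalue:
    %p0 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 0
    %e0 = load i32, i32* %p0
    %t  = insertvalue [2 x i32] undef, i32 %e0, 0
    %p1 = getelementptr inbounds [2 x i32], [2 x i32]* %p, i64 0, i64 1
    %e1 = load i32, i32* %p1
    %v  = insertvalue [2 x i32] %t, i32 %e1, 1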
* Really fix ASAN leak/etc issues with MemorySSA unittests (Daniel Berlin, 2016-03-02; 1 file, -3/+2)
  llvm-svn: 262519
* Revert "Fix ASAN detected errors in code and test" (it was not meant to be ↵Daniel Berlin2016-03-021-2/+3
| | | | | | | | committed yet) This reverts commit 890bbccd600ba1eb050353d06a29650ad0f2eb95. llvm-svn: 262512
* Fix ASAN detected errors in code and test (Daniel Berlin, 2016-03-02; 1 file, -3/+2)
  llvm-svn: 262511
* [AA] Hoist the logic to reformulate various AA queries in terms of other parts of the AA interface out of the base class of every single AA result object (Chandler Carruth, 2016-03-02; 3 files, -3/+3)
  Because this logic reformulates the query in terms of some other aspect of the API, it would easily cause O(n^2) query patterns in alias analysis. These could in turn be magnified further based on the number of call arguments, and then further based on the number of AA queries made for a particular call. This ended up causing problems for Rust that were actually noticeable enough to get a bug (PR26564) and probably other places as well.
  When originally re-working the AA infrastructure, the desire was to regularize the pattern of refinement without losing any generality. While I think it was successful, that is clearly proving to be too costly. And the cost is needless: we gain no actual improvement for this generality of making a direct query to TBAA actually be able to re-use some other alias analysis's refinement logic for one of the other APIs, or some such. In short, this is entirely wasted work.
  To the extent possible, delegation to other API surfaces should be done at the aggregation layer so that we can avoid re-walking the aggregation. In fact, this significantly simplifies the logic, as we no longer need to smuggle the aggregation layer into each alias analysis (or the TargetLibraryInfo into each alias analysis just so we can form argument memory locations!).
  However, we also have some delegation logic inside of BasicAA, and some of it even makes sense. When the delegation logic is baking in specific knowledge of aliasing properties of the LLVM IR, as opposed to simply reformulating the query to utilize a different alias analysis interface entry point, it makes a lot of sense to restrict that logic to a different layer such as BasicAA. So one aspect of the delegation that was in every AA base class is that when we don't have operand bundles, we re-use function AA results as a fallback for callsite alias results. This relies on the IR properties of calls and functions w.r.t. aliasing, and so seems a better fit to BasicAA. I've lifted the logic up to the point where it seems to be a natural fit. This still does a bit of redundant work (we query function attributes twice, once via the callsite and once via the function AA query), but it is *exactly* twice here, no more.
  The end result is that all of the delegation logic is hoisted out of the base class and into either the aggregation layer, when it is a pure retargeting to a different API surface, or into BasicAA, when it relies on the IR's aliasing properties. This should fix the quadratic query pattern reported in PR26564, although I don't have a stand-alone test case to reproduce it. It also seems like general goodness: now the numerous AAs that don't need target library info don't carry it around and depend on it. I think I can even rip out the general access to the aggregation layer and only expose that in BasicAA, as it is the only place where we re-query in that manner.
  However, this is a non-trivial change to the AA infrastructure, so I want to get some additional eyes on it before it lands. Sadly, it can't wait long because we should really cherry-pick this into 3.8 if we're going to go this route.
  Differential Revision: http://reviews.llvm.org/D17329
  llvm-svn: 262490
* Attempt to fix ASAN failure in a MemorySSA test. (George Burgess IV, 2016-03-02; 1 file, -4/+4)
  llvm-svn: 262452
* revert r262424 because there's a *clang test* for AArch64 that checks -O3 asm output that is broken by this change (Sanjay Patel, 2016-03-02; 1 file, -17/+5)
  llvm-svn: 262440
* [InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701) (Sanjay Patel, 2016-03-01; 1 file, -5/+17)
  As noted in the code comment, I don't think we can do the same transform that we do for *scalar* integer comparisons for *vector* integer comparisons, because it might pessimize the general case. Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT for integer vectors. But we should now recognize all the variants of this construct and produce the optimal code for the cases shown in: https://llvm.org/bugs/show_bug.cgi?id=26701
  llvm-svn: 262424
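  One plausible instance of the construct (hypothetical names): the compare+sext idiom for 'isNegative'
    %cmp  = icmp slt <4 x i32> %x, zeroinitializer
    %sext = sext <4 x i1> %cmp to <4 x i32>
  can be produced directly as a sign-bit smear:
    %sext = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>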
* Perform InstructionCombiningPass before SampleProfile pass. (Dehao Chen, 2016-03-01; 2 files, -21/+4)
  Summary: The SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.
  Reviewers: davidxl, dnovillo
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D17742
  llvm-svn: 262419
* Fix an issue where fast math flags were dropped during scalarization. (Owen Anderson, 2016-03-01; 1 file, -2/+4)
  Most portions of InstCombine properly propagate fast math flags, but apparently the vector scalarization section was overlooked.
  llvm-svn: 262376
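  A minimal sketch of what propagation means here (hypothetical names): when only one lane of a fast-math vector op is used,
    %f = fadd fast <2 x float> %a, %b
    %e = extractelement <2 x float> %f, i32 0
  the scalarized replacement must keep the flags:
    %a0 = extractelement <2 x float> %a, i32 0
    %b0 = extractelement <2 x float> %b, i32 0
    %e  = fadd fast float %a0, %b0             ; 'fast' must survive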
* Add the beginnings of an update API for preserving MemorySSA (Daniel Berlin, 2016-03-01; 1 file, -0/+107)
  Summary: This adds the beginning of an update API to preserve MemorySSA. In particular, this patch adds a way to remove memory SSA accesses when instructions are deleted. It also adds relevant unit-testing infrastructure for MemorySSA's API. (There is an actual user of this API; I will make that diff dependent on this one. In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P)
  Reviewers: hfinkel, reames, george.burgess.iv
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D17157
  llvm-svn: 262362
* Revert "calculate builtin_object_size if argument is a removable pointer"Petar Jovanovic2016-03-011-19/+6
| | | | | | | Revert r262337 as "check-llvm ubsan" step failed on sanitizer-x86_64-linux-fast buildbot. llvm-svn: 262349
* calculate builtin_object_size if argument is a removable pointer (Petar Jovanovic, 2016-03-01; 1 file, -6/+19)
  This patch fixes calculating the correct value for the builtin_object_size function when the pointer is used only in a builtin_object_size function call and never after that. Patch by Strahinja Petrovic.
  Differential Revision: http://reviews.llvm.org/D17337
  llvm-svn: 262337
* [x86, InstCombine] transform more x86 masked loads to LLVM intrinsics (Sanjay Patel, 2016-02-29; 1 file, -1/+7)
  Continuation of: http://reviews.llvm.org/rL262269
  llvm-svn: 262273
* [LLE] Fix a comment (Adam Nemet, 2016-02-29; 1 file, -3/+3)
  llvm-svn: 262270
* [x86, InstCombine] transform x86 AVX masked loads to LLVM intrinsics (Sanjay Patel, 2016-02-29; 1 file, -1/+39)
  The intended effect of this patch in conjunction with:
    http://reviews.llvm.org/rL259392
    http://reviews.llvm.org/rL260145
  is that customers using the AVX intrinsics in C will benefit from combines when the load mask is constant:
    __m128 mload_zeros(float *f) { return _mm_maskload_ps(f, _mm_set1_epi32(0)); }
    __m128 mload_fakeones(float *f) { return _mm_maskload_ps(f, _mm_set1_epi32(1)); }
    __m128 mload_ones(float *f) { return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000)); }
    __m128 mload_oneset(float *f) { return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0)); }
  ...so none of the above will actually generate a masked load for optimized code. This is the masked load counterpart to: http://reviews.llvm.org/rL262064
  llvm-svn: 262269
* [LLE] Fix SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper with Polly (Adam Nemet, 2016-02-29; 1 file, -0/+5)
  We can actually have dependences between accesses with different underlying types. Bail in this case. A test will follow shortly.
  llvm-svn: 262267
* Enable LoopLoadElimination by default (Adam Nemet, 2016-02-29; 1 file, -2/+2)
  Summary: I re-benchmarked this and the results are similar to the original results in D13259:
  On ARM64:
    SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
    SingleSource/Benchmarks/Polybench/stencils/adi -19.78%
  On x86:
    SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14%
  And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop Distribution.
  In terms of compile time, there is a ~5% increase on both SingleSource/Benchmarks/Misc/oourafft and SingleSource/Benchmarks/Linkpack/linkpack-pc. These are both very tiny loop-intensive programs where SCEV computation dominates compile time.
  The reason that time spent in SCEV increases has to do with the design of the old pass manager: if a transform pass does not preserve an analysis, we *invalidate* the analysis even if there was *no* modification made by the transform pass. This means that currently we don't take advantage of LLE and LV sharing the same analysis (LAA), and unfortunately we recompute LAA *and* SCEV for LLE.
  (There should be a way to work around this limitation in the case of SCEV and LAA, since both compute things on demand and internally cache their results. Thus we could pretend that transform passes preserve these analyses and manually invalidate them upon actual modification. On the other hand, the new pass manager is supposed to solve this, so I am not sure if that is worthwhile.)
  Reviewers: hfinkel, dberlin
  Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits
  Differential Revision: http://reviews.llvm.org/D16300
  llvm-svn: 262250
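  For context, a hedged sketch of the loop-carried store-to-load forwarding pattern that LoopLoadElimination targets (hypothetical names): the value stored to A[i+1] in one iteration is reloaded by the A[i] load of the next iteration, so the load can be replaced by a PHI carrying the stored value:
    loop:
      %i      = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %a.i    = getelementptr inbounds i32, i32* %A, i64 %i
      %x      = load i32, i32* %a.i            ; A[i]
      %b.i    = getelementptr inbounds i32, i32* %B, i64 %i
      %y0     = load i32, i32* %b.i
      %y      = add i32 %x, %y0
      %i.next = add nuw nsw i64 %i, 1
      %a.i1   = getelementptr inbounds i32, i32* %A, i64 %i.next
      store i32 %y, i32* %a.i1                 ; A[i+1], reloaded next iteration
      %done   = icmp eq i64 %i.next, %n
      br i1 %done, label %exit, label %loop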
* Minor code cleanup. NFC (Rong Xu, 2016-02-29; 1 file, -16/+15)
  llvm-svn: 262242
* Move discriminator assignment to the right place. (Dehao Chen, 2016-02-29; 1 file, -4/+7)
  Summary: Now discriminator is assigned per-function instead of per-module.
  Reviewers: davidxl, dnovillo
  Subscribers: dblaikie, llvm-commits
  Differential Revision: http://reviews.llvm.org/D17664
  llvm-svn: 262240
* [PGO] Remove redundant counter copies for avail_extern functions. (Xinliang David Li, 2016-02-27; 1 file, -3/+32)
  Differential Revision: http://reviews.llvm.org/D17654
  llvm-svn: 262157
* Revert "[sancov] do not instrument nodes that are full pre-dominators"Renato Golin2016-02-271-22/+11
| | | | | | This reverts commit r262103, as it broke all ARM and AArch64 bots. llvm-svn: 262139