path: root/llvm/test/Transforms
Commit message | Author | Age | Files | Lines
* [LoopVectorize] Use AA to partition potential dependency checks | Hal Finkel | 2014-07-20 | 9 | -9/+111
  Prior to this change, the loop vectorizer did not make use of the alias analysis infrastructure. Instead, it performed memory dependence analysis using ScalarEvolution-based linear dependence checks within equivalence classes derived from the results of ValueTracking's GetUnderlyingObjects. Unfortunately, this meant that:

    1. The loop vectorizer had logic that essentially duplicated that in BasicAA for aliasing based on identified objects.
    2. The loop vectorizer could not partition the space of dependency checks based on information only easily available from within AA (TBAA metadata is currently the prime example).

  This means, for example, regardless of whether -fno-strict-aliasing was provided, the vectorizer would only vectorize this loop with a runtime memory-overlap check:

    void foo(int *a, float *b) {
      for (int i = 0; i < 1600; ++i)
        a[i] = b[i];
    }

  This is suboptimal because the TBAA metadata already provides the information necessary to show that this check is unnecessary. Of course, the vectorizer has a limit on the number of such checks it will insert, so in practice, ignoring TBAA means not vectorizing more-complicated loops that we should.

  This change causes the vectorizer to use an AliasSetTracker to keep track of the pointers in the loop. The resulting alias sets are then used to partition the space of dependency checks, and potential runtime checks; this results in more-efficient vectorizations.

  When pointer locations are added to the AliasSetTracker, two things are done:

    1. The location size is set to UnknownSize (otherwise you'd not catch inter-iteration dependencies).
    2. For instructions in blocks that would need to be predicated, TBAA is removed (because the metadata might have a control dependency on the condition being speculated).

  For non-predicated blocks, you can leave the TBAA metadata. This is safe because you can't have an iteration dependency on the TBAA metadata (if you did, and you unrolled sufficiently, you'd end up with the same pointer value used by two accesses that TBAA says should not alias, and that would yield undefined behavior).

  llvm-svn: 213486
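The runtime memory-overlap check mentioned for foo() above can be pictured at source level roughly as follows. This is only a sketch under assumptions: `ranges_overlap` and `foo_checked` are hypothetical names, the four-element step merely stands in for a vector body, and the vectorizer's real checks are emitted in IR, not C.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the byte ranges [p, p+n) and [q, q+m) share at least one byte.
   Hypothetical helper, not an LLVM function. */
static int ranges_overlap(const void *p, size_t n, const void *q, size_t m) {
    uintptr_t ps = (uintptr_t)p, qs = (uintptr_t)q;
    return ps < qs + m && qs < ps + n;
}

/* Hypothetical picture of foo() guarded by a runtime overlap check. */
void foo_checked(int *a, float *b) {
    enum { N = 1600 };
    if (!ranges_overlap(a, N * sizeof *a, b, N * sizeof *b)) {
        /* No overlap: load four elements, then store four elements. */
        for (int i = 0; i < N; i += 4) {
            float t0 = b[i], t1 = b[i + 1], t2 = b[i + 2], t3 = b[i + 3];
            a[i] = (int)t0;
            a[i + 1] = (int)t1;
            a[i + 2] = (int)t2;
            a[i + 3] = (int)t3;
        }
    } else {
        /* Possible overlap: fall back to the original scalar loop. */
        for (int i = 0; i < N; ++i)
            a[i] = (int)b[i];
    }
}
```

With TBAA available, the int stores and float loads land in different alias sets, so the guard and the scalar fallback would not be needed for this loop.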
* [LoopVectorize] Propagate known metadata to vectorized instructions | Hal Finkel | 2014-07-19 | 1 | -0/+44
  There are some kinds of metadata that are safe to propagate from the scalar instructions to the vector instructions (fpmath and tbaa currently).

  Regarding TBAA, one might worry about propagating it on if-converted loads and stores, because the metadata might have had a control dependency on the condition, and thus actually aliased with some other non-speculated memory access when the condition was false. However, this would be caught by the runtime overlap checks.

  llvm-svn: 213452
* Make Value::isDereferenceablePointer handle offsets to pointer types with dereferenceable attributes | Hal Finkel | 2014-07-19 | 1 | -0/+82
  When we have a parameter (or call site return) with a dereferenceable attribute, it can specify the size of an array pointed to by that parameter. If we have a value for which we can accumulate a constant offset to such a parameter, then we can use that offset in a direct comparison with the size specified by the dereferenceable attribute. This enables us to handle cases like this:

    int foo(int a[static 3]) {
      return a[2]; /* this is always dereferenceable */
    }

  llvm-svn: 213447
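The offset-versus-size comparison described above can be sketched as a small predicate. This is an illustration only: `access_is_dereferenceable` is a hypothetical name, not the LLVM implementation, and the numbers in the usage note assume a 4-byte int.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of `size` bytes at constant byte `offset` from a base
   pointer stays inside the first `deref_bytes` bytes, which the attribute
   declares dereferenceable. Hypothetical helper for illustration. */
static bool access_is_dereferenceable(int64_t offset, uint64_t size,
                                      uint64_t deref_bytes) {
    if (offset < 0)
        return false; /* before the start of the known-good range */
    uint64_t off = (uint64_t)offset;
    /* Written to avoid overflow in off + size. */
    return off <= deref_bytes && size <= deref_bytes - off;
}
```

For `int a[static 3]` with 4-byte ints, the parameter covers 12 bytes, so `a[2]` (offset 8, size 4) passes the check while `a[3]` (offset 12, size 4) does not.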
* Remove unroll pragma metadata after it is used. | Mark Heffernan | 2014-07-18 | 2 | -0/+73
  llvm-svn: 213412
* MergedLoadStoreMotion pass | Gerolf Hoflehner | 2014-07-18 | 1 | -0/+84
  Merges equivalent loads on both sides of a hammock/diamond and hoists them into the header. Merges equivalent stores on both sides of a hammock/diamond and sinks them to the footer. Can enable if-conversion and better tolerate load misses and store operand latencies.

  llvm-svn: 213396
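A source-level picture of the diamond transformation may help. The two functions below are hypothetical and hand-written; the pass itself works on LLVM IR, so this only shows the shape of the code before and after.

```c
/* Before: both arms of the diamond load *p and store to *q. */
int diamond_before(int c, const int *p, int *q) {
    int v;
    if (c) {
        v = *p + 1;
        *q = v;
    } else {
        v = *p + 2;
        *q = v;
    }
    return v;
}

/* After: the common load is hoisted into the header and the common store
   is sunk into the footer, leaving only arithmetic in the arms. */
int diamond_after(int c, const int *p, int *q) {
    int t = *p;                /* hoisted load */
    int v = c ? t + 1 : t + 2; /* arms are now trivially if-convertible */
    *q = v;                    /* sunk store */
    return v;
}
```

Once the memory operations have left the arms, the remaining branch is a plain select, which is the if-conversion opportunity the commit message refers to.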
* Add a dereferenceable attribute | Hal Finkel | 2014-07-18 | 2 | -0/+120
  This attribute indicates that the parameter or return pointer is dereferenceable. Practically speaking, loads from such a pointer within the associated byte range are safe to speculatively execute. Such pointer parameters are common in source languages (C++ references, for example).

  llvm-svn: 213385
* R600: Implement TTI:getPopcntSupport | Matt Arsenault | 2014-07-18 | 2 | -0/+107
  The test is just copied from X86, and I don't know of a better way to test it.

  llvm-svn: 213351
* Move ashr optimization from InstCombineShift to InstSimplify. | Suyog Sarda | 2014-07-17 | 2 | -8/+10
  Refactor code, no functionality change, test case moved from instcombine to instsimplify.

  Differential Revision: http://reviews.llvm.org/D4102

  llvm-svn: 213231
* Improve BasicAA CS-CS queries (redux) | Hal Finkel | 2014-07-17 | 1 | -0/+74
  This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems.

  Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error.

  Original commit message:

  BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries.

  Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries.

  This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation.

  Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement).

  llvm-svn: 213219
* Partially revert r210444 due to performance regression | Jingyue Wu | 2014-07-16 | 2 | -29/+59
  Summary:
  Converting outermost zext(a) to sext(a) causes worse code when the computation of zext(a) could be reused. For example, after converting

    ... = array[zext(a)]
    ... = array[zext(a) + 1]

  to

    ... = array[sext(a)]
    ... = array[zext(a) + 1],

  the program computes sext(a), which is actually unnecessary. I added one test in split-gep-and-gvn.ll to illustrate this scenario.

  Also, with r211281 and r211084, we annotate more "nuw" tags to computation involving CUDA intrinsics such as threadIdx.x. These annotations help with splitting GEP a lot, rendering the benefit we get from this reverted optimization only marginal.

  Test Plan: make check-all

  Reviewers: eliben, meheff

  Reviewed By: meheff

  Subscribers: jholewinski, llvm-commits

  Differential Revision: http://reviews.llvm.org/D4542

  llvm-svn: 213209
* [NVPTX] Rename registers %fl -> %fd and %rl -> %rd | Justin Holewinski | 2014-07-16 | 1 | -3/+3
  This matches the internal behavior of NVIDIA tools like libnvvm.

  llvm-svn: 213168
* Emit warnings if vectorization is forced and fails. | Tyler Nowicki | 2014-07-16 | 3 | -0/+103
  This patch modifies the existing DiagnosticInfo system to create a generic base class that is inherited to produce diagnostic-based warnings. This is used by the loop vectorizer to trigger a warning when vectorization is forced and fails. Several tests have been added to verify this behavior.

  Reviewed by: Arnold Schwaighofer

  llvm-svn: 213110
* MergeFunc patch from Björn Steinbrink. | Stepan Dyatkovskiy | 2014-07-15 | 1 | -0/+91
  Phabricator ticket: D4246, Don't merge functions with different range metadata on call/invoke. Thanks!

  llvm-svn: 213060
* Teach computeKnownBits to look through addrspacecast. | Matt Arsenault | 2014-07-15 | 1 | -1/+30
  This fixes inferring alignment through an addrspacecast.

  llvm-svn: 213030
* Teach GetUnderlyingObject / BasicAA about addrspacecast | Matt Arsenault | 2014-07-15 | 2 | -3/+36
  llvm-svn: 213025
* Convert test to FileCheck. | Matt Arsenault | 2014-07-15 | 1 | -13/+35
  Check the individual test functions for more useful failure errors.

  llvm-svn: 213021
* Look through addrspacecast in IsConstantOffsetFromGlobal | Matt Arsenault | 2014-07-14 | 1 | -0/+13
  llvm-svn: 213000
* Look through addrspacecast in GetPointerBaseWithConstantOffset | Matt Arsenault | 2014-07-14 | 1 | -0/+13
  llvm-svn: 212999
* Convert test to FileCheck | Matt Arsenault | 2014-07-14 | 1 | -31/+52
  llvm-svn: 212992
* Fix a test broken in r212981 | David Majnemer | 2014-07-14 | 1 | -1/+1
  @icmp_sdiv_neg1 should have referred to %a instead of %call; it was renamed at the last second.

  llvm-svn: 212983
* InstSimplify: Correct sdiv x / -1 | David Majnemer | 2014-07-14 | 1 | -0/+11
  Determining the bounds of x / -1 would start off with us dividing it by INT_MIN. Suffice to say, this would not work very well.

  Instead, handle it upfront by checking for -1 and mapping it to the range: [INT_MIN + 1, INT_MAX]. This means that the result of our division can be any value other than INT_MIN.

  llvm-svn: 212981
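The stated range can be checked at its endpoints with ordinary C integers. This is a small sketch: `div_by_neg1` is a hypothetical helper, and it must not be called with INT_MIN because that division is undefined in C.

```c
#include <limits.h>

/* x / -1 for any x other than INT_MIN (INT_MIN / -1 overflows and is
   undefined behavior in C, so callers must exclude it). */
static int div_by_neg1(int x) {
    return x / -1;
}
```

The largest input, INT_MAX, maps to INT_MIN + 1, and the smallest permitted input, INT_MIN + 1, maps to INT_MAX, so no defined input produces INT_MIN.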
* InstSimplify: The upper bound of X / C was missing a rounding step | David Majnemer | 2014-07-14 | 1 | -0/+11
  Summary:
  When calculating the upper bound of X / -8589934592, we would perform the following calculation: Floor[INT_MAX / 8589934592]

  However, flooring the result would make us wrongly come to the conclusion that 1073741824 was not in the set of possible values. Instead, use the ceiling of the result.

  Reviewers: nicholas

  Subscribers: llvm-commits

  Differential Revision: http://reviews.llvm.org/D4502

  llvm-svn: 212976
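The arithmetic behind the rounding fix can be reproduced directly, assuming X is a 64-bit signed value (the commit does not state the width, but the constant 8589934592 = 2^33 does not fit in 32 bits). The function names below are hypothetical.

```c
#include <stdint.h>

#define DIVISOR_MAG INT64_C(8589934592) /* 2^33, magnitude of the divisor */

/* Old bound: Floor[INT_MAX / 8589934592]. */
static int64_t bound_floor(void) {
    return INT64_MAX / DIVISOR_MAG;
}

/* Fixed bound: round up when the division is inexact. */
static int64_t bound_ceil(void) {
    return INT64_MAX / DIVISOR_MAG + (INT64_MAX % DIVISOR_MAG != 0);
}

/* Largest quotient X / -8589934592 actually reaches, at X = INT64_MIN. */
static int64_t actual_max(void) {
    return INT64_MIN / -DIVISOR_MAG;
}
```

The floor gives 1073741823, yet X = INT64_MIN yields 1073741824, which only the ceiling covers.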
* Look through addrspacecast when checking isDereferenceablePointer | Matt Arsenault | 2014-07-14 | 1 | -0/+38
  llvm-svn: 212971
* Don't eliminate memcpy's when the address of the pointer may itself be relevant. Fixes PR18304. Patch by David Wiberg! | Nick Lewycky | 2014-07-14 | 6 | -7/+29
  llvm-svn: 212970
* When we sink an instruction, this can open up opportunity for the operands to be sunk - add them to the worklist | Aditya Nandakumar | 2014-07-11 | 1 | -1/+1
  llvm-svn: 212847
* Added test for commit r212802 that was missing | Marcello Maggioni | 2014-07-11 | 1 | -0/+40
  llvm-svn: 212803
* InstCombine: Fix a crash in Descale for multiply-by-zero | Duncan P. N. Exon Smith | 2014-07-10 | 1 | -0/+21
  Fix a crash in `InstCombiner::Descale()` when a multiply-by-zero gets created as an argument to a GEP partway through an iteration, causing -instcombine to optimize the GEP before the multiply.

  rdar://problem/17615671

  llvm-svn: 212742
* A test case for not asserting in isDereferenceablePointer upon unsized types | Hal Finkel | 2014-07-10 | 1 | -0/+41
  This is the test case for r212687.

  llvm-svn: 212688
* Allow isDereferenceablePointer to look through some bitcasts | Hal Finkel | 2014-07-10 | 1 | -0/+160
  isDereferenceablePointer should not give up upon encountering any bitcast. If we're casting from a pointer to a larger type to a pointer to a smaller type, we can continue by examining the bitcast's operand. This missing capability was noted in a comment in the function.

  In order for this to work, isDereferenceablePointer now takes an optional DataLayout pointer (essentially all callers already had such a pointer available). Most code uses isDereferenceablePointer through isSafeToSpeculativelyExecute (which already took an optional DataLayout pointer), and to enable the LICM test case, LICM needs to actually provide its DL pointer to isSafeToSpeculativelyExecute (which it was not doing previously).

  llvm-svn: 212686
* [X86] AVX512: Enable it in the Loop Vectorizer | Adam Nemet | 2014-07-09 | 1 | -0/+35
  This lets us experiment with 512-bit vectorization without passing force-vector-width manually.

  The code generated for a simple integer memset loop is properly vectorized. Disassembly is still broken for it though :(.

  llvm-svn: 212634
* removed duplicate testcase | Sanjay Patel | 2014-07-09 | 1 | -16/+0
  llvm-svn: 212632
* Fix for PR20059 (instcombine reorders shufflevector after instruction that may trap) | Sanjay Patel | 2014-07-09 | 1 | -0/+32
  In PR20059 ( http://llvm.org/pr20059 ), instcombine eliminates shuffles that are necessary before performing an operation that can trap (srem).

  This patch calls isSafeToSpeculativelyExecute() and bails out of the optimization in SimplifyVectorOp() if needed.

  Differential Revision: http://reviews.llvm.org/D4424

  llvm-svn: 212629
* Revert "GlobalDCE: Delete available_externally initializers if it allows removing the value the initializer is referring to." | Pete Cooper | 2014-07-08 | 1 | -34/+1
  This reverts commit 5b55a47e94e28fbb56d0cd5d72c3db9105c15b4c.

  A test case was found to crash after this was applied. I'll file a bug to track fixing this with the test case needed.

  llvm-svn: 212550
* Fix for PR17073 ( http://llvm.org/pr17073 ), simplifycfg illegally hoists an operation in a phi node that can trap. | Sanjay Patel | 2014-07-07 | 1 | -0/+73
  This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those.

  The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected.

  llvm-svn: 212490
* CodeGen: it turns out that NAND is not the same thing as BIC. At all. | Tim Northover | 2014-07-07 | 1 | -2/+2
  We've been performing the wrong operation on ARM for "atomicrmw nand" for years, since "a NAND b" is "~(a & b)" rather than ARM's very tempting "a & ~b". This bled over into the generic expansion pass.

  So I assume no-one has ever actually tried to do an atomic nand in the real world. Oh well.

  llvm-svn: 212443
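The difference between the two operations is easy to see on plain integers. The helpers below are hypothetical and non-atomic; they only spell out the two formulas quoted in the commit message.

```c
#include <stdint.h>

/* What "a NAND b" means: complement of the AND. */
static uint32_t nand32(uint32_t a, uint32_t b) {
    return ~(a & b);
}

/* What ARM's BIC (bit clear) computes: a with the bits of b cleared. */
static uint32_t bic32(uint32_t a, uint32_t b) {
    return a & ~b;
}
```

For a = 0xF0F0F0F0 and b = 0xFF00FF00, NAND gives 0x0FFF0FFF while BIC gives 0x00F000F0, so substituting one for the other changes the stored value.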
* IR: Fold away compares between GV GEPs and GVs | David Majnemer | 2014-07-04 | 3 | -9/+5
  A GEP of a non-weak global variable will not be equivalent to another non-weak global variable or a GEP of such a variable.

  Differential Revision: http://reviews.llvm.org/D4238

  llvm-svn: 212360
* GlobalDCE: Delete available_externally initializers if it allows removing the value the initializer is referring to. | Benjamin Kramer | 2014-07-04 | 1 | -1/+34
  This is useful for functions that are not actually available externally but referenced by a vtable of some kind. Clang emits functions like this for the MS ABI.

  PR20182.

  llvm-svn: 212337
* InstCombine: Strength reduce sadd.with.overflow into a regular nsw add if we can prove that it cannot overflow. | Benjamin Kramer | 2014-07-04 | 1 | -0/+13
  PR20194

  llvm-svn: 212331
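One situation where a checked signed add provably cannot overflow is when both operands were widened from a narrower type. The sketch below assumes that case for illustration; the commit message does not say which proofs the transform relies on. It uses the GCC/Clang builtin `__builtin_add_overflow`, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checked form: returns true if the 32-bit signed add overflowed. */
static bool add_checked(int8_t a, int8_t b, int32_t *sum) {
    return __builtin_add_overflow((int32_t)a, (int32_t)b, sum);
}

/* Plain form: both operands fit in 8 bits, so the 32-bit sum lies in
   [-256, 254] and the overflow flag above can never be set. */
static int32_t add_plain(int8_t a, int8_t b) {
    return (int32_t)a + (int32_t)b;
}
```

Because the flag is always false here, a compiler may drop the check and keep only the ordinary add, which is the kind of strength reduction the commit title describes.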
* InstSimplify: Fix a bug when INT_MIN is in a sdiv | David Majnemer | 2014-07-04 | 1 | -0/+11
  When INT_MIN is the numerator in a sdiv, we would not properly handle overflow when calculating the bounds of possible values; abs(INT_MIN) is not a meaningful number.

  Instead, check and handle INT_MIN by reasoning that the largest value is INT_MIN/-2 and the smallest value is INT_MIN.

  This fixes PR20199.

  llvm-svn: 212307
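The two bounds can be spot-checked with C integers. In this sketch `int_min_over` is a hypothetical helper, and divisors 0 and -1 are excluded because both divisions are undefined in C.

```c
#include <limits.h>

/* INT_MIN divided by d, for d other than 0 and -1. */
static int int_min_over(int d) {
    return INT_MIN / d;
}
```

Dividing by -2 gives the largest quotient (INT_MAX / 2 + 1), dividing by 1 gives the smallest (INT_MIN itself), and other divisors such as -3 or 2 fall strictly between the two.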
* Add new lines to debugging information. | Richard Trieu | 2014-07-03 | 1 | -0/+9
  Differential Revision: http://reviews.llvm.org/D4262

  llvm-svn: 212250
* InstCombine: Optimize x/INT_MIN to x==INT_MIN | David Majnemer | 2014-07-02 | 1 | -6/+6
  The result of x/INT_MIN is either 0 or 1; we can just use an icmp instead.

  llvm-svn: 212167
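The equivalence is straightforward to confirm in C, since every int other than INT_MIN has a smaller magnitude than INT_MIN. The two function names are hypothetical.

```c
#include <limits.h>

/* Division form: the quotient is 1 only when x is INT_MIN, else 0. */
static int div_form(int x) {
    return x / INT_MIN;
}

/* Comparison form: the same 0-or-1 result without a divide. */
static int cmp_form(int x) {
    return x == INT_MIN;
}
```

Both forms agree for every input, including the boundary values INT_MIN, INT_MIN + 1 and INT_MAX.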
* InstCombine: Add a vector variant test for PR20186 | David Majnemer | 2014-07-02 | 1 | -6/+7
  No functional change, just adding more test coverage that was meant to go in with r212164.

  llvm-svn: 212165
* InstCombine: Don't turn -(x/INT_MIN) -> x/INT_MIN | David Majnemer | 2014-07-02 | 1 | -0/+19
  It is not safe to negate the smallest signed integer; doing so yields the same number back.

  This fixes PR20186.

  llvm-svn: 212164
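A short sketch of why the fold is wrong, assuming 32-bit two's complement values. Negating INT_MIN in signed C arithmetic is undefined, so the wraparound is shown on the unsigned bit pattern; all three function names are hypothetical.

```c
#include <stdint.h>

/* Two's complement negation performed on the raw bit pattern. */
static uint32_t negate_bits(uint32_t bits) {
    return 0u - bits;
}

/* The original expression: the quotient is 0 or 1, so negating it is safe. */
static int32_t original_expr(int32_t x) {
    return -(x / INT32_MIN);
}

/* The bad fold moves the negation onto the divisor, but the "negated"
   divisor is still INT32_MIN, leaving x / INT32_MIN. */
static int32_t wrongly_folded(int32_t x) {
    return x / INT32_MIN;
}
```

For x = INT32_MIN the original expression yields -1 while the folded one yields 1, which is the miscompile the commit avoids.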
* DebugInfo: Keep track of subprograms whose arguments have been promoted. | David Blaikie | 2014-07-01 | 1 | -4/+9
  Matching behavior with DeadArgumentElimination (and leveraging some now-common infrastructure), keep track of the function from debug info metadata if arguments are promoted.

  This may produce interesting debug info - since the arguments may be missing or of different types... but at least backtraces, inlining, etc, will be correct.

  llvm-svn: 212128
* GlobalOpt: Don't swap private for internal linkage | David Majnemer | 2014-07-01 | 1 | -4/+14
  There were transforms whose *intent* was to downgrade the linkage of external objects to have internal linkage.

  However, it fired on things with private linkage as well.

  llvm-svn: 212104
* GlobalOpt: FileCheck-ize test | David Majnemer | 2014-07-01 | 1 | -1/+3
  No functionality change.

  llvm-svn: 212103
* GlobalOpt: Handle non-zero offsets for aliases | David Majnemer | 2014-07-01 | 1 | -0/+4
  An alias with an aliasee of a non-zero GEP is not trivially replaceable with its aliasee.

  llvm-svn: 212079
* Suppress inlining when the block address is taken | Gerolf Hoflehner | 2014-07-01 | 1 | -2/+3
  Inlining functions with block addresses can cause many problems and requires a rich infrastructure to support, including escape analysis. At this point the safest approach to address these problems is by blocking inlining from happening.

  Background: There have been reports on Ruby segmentation faults triggered by inlining functions with block addresses like

    // Ruby code snippet
    vm_exec_core() {
      finish_insn_seq_0 = &&INSN_LABEL_finish;
      INSN_LABEL_finish:
        ;
    }

  This kind of scenario can also happen when LLVM picks a subset of blocks for inlining, which is the case with the actual code in the Ruby environment.

  LLVM suppresses inlining for such functions when there is an indirect branch. The attached patch does so even when there is no indirect branch. Note that user code like above would not make much sense: using the global for jumping across function boundaries would be illegal.

  Why was there a segfault: In the snippet above the block with the label is recognized as dead, so it is eliminated. Instead of a block address the cloner stores a constant (sic!) into the global, resulting in the segfault (when the global is used in a goto).

  Why had it worked in the past then: By luck. In older versions vm_exec_core was also inlined but the label address used was the block label address in vm_exec_core. So the global jump ended up in the original function rather than in the caller, which accidentally happened to work.

  Test case ./tools/clang/test/CodeGen/indirect-goto.c will fail as a result of this commit.

  rdar://17245966

  llvm-svn: 212077
* Convert some byval argpromotion grep tests to FileCheck | Reid Kleckner | 2014-06-30 | 3 | -44/+58
  Surprisingly, the i32* byval parameter is not transformed by argpromotion.

  llvm-svn: 212067
* DebugInfo: Preserve debug location information when transforming a call into an invoke during inlining. | David Blaikie | 2014-06-30 | 1 | -0/+37
  This both improves basic debug info quality and fixes a larger hole whenever we inline a call/invoke without a location (debug info for the entire inlining is lost and other badness that the debug info emission code is currently working around but shouldn't have to).

  llvm-svn: 212065