path: root/llvm/test/Transforms
Commit message — Author, Date (files changed, lines -deleted/+added)
* Use AA in LoadCombine — Hal Finkel, 2014-11-03 (2 files, -0/+83)
  LoadCombine can be smarter about aborting when a writing instruction is encountered: instead of aborting upon encountering any writing instruction, use an AliasSetTracker, and only abort when encountering some write that might alias with the loads that could potentially be combined. This was originally motivated by comments made (and a test case provided) by David Majnemer in response to PR21448. It turned out that LoadCombine was not responsible for that PR, but LoadCombine should also be improved so that unrelated stores (and @llvm.assume) don't interrupt load combining. llvm-svn: 221203
* InstCombine: Remove infinite loop caused by FoldOpIntoPhi — David Majnemer, 2014-11-03 (2 files, -16/+23)
  FoldOpIntoPhi could create an infinite loop if the PHI could potentially reach a BB it was considering inserting instructions into. The instructions it would insert would eventually lead to other combines firing which would, again, lead to FoldOpIntoPhi firing. The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to insert instructions that the PHI might reach. This fixes PR21377. llvm-svn: 221187
* EarlyCSE should ignore calls to @llvm.assume — Hal Finkel, 2014-11-03 (1 file, -0/+36)
  EarlyCSE uses a simple generation scheme for handling memory-based dependencies, and calls to @llvm.assume (which are marked as writing to memory to ensure the preservation of control dependencies) disturb that scheme unnecessarily. Skipping calls to @llvm.assume is legal, and the alternative (adding AA calls in EarlyCSE) is likely undesirable (we have GVN for that). Fixes PR21448. llvm-svn: 221175
* [Reassociate] Canonicalize negative constants out of expressions. — Chad Rosier, 2014-11-03 (3 files, -70/+236)
  This gives CSE/GVN more options to eliminate duplicate expressions. This is a follow-up patch to http://reviews.llvm.org/D4904. http://reviews.llvm.org/D5363 llvm-svn: 221171
* Normally an 'optnone' function goes through fast-isel, which does not call DAGCombiner — Paul Robinson, 2014-11-03 (1 file, -0/+135)
  But we ran into a case (on Windows) where the calling convention causes argument lowering to bail out of fast-isel, and we end up in CodeGenAndEmitDAG() which does run DAGCombiner. So, we need to make DAGCombiner check for 'optnone' after all. Commit includes the test that found this, plus another one that got missed in the original optnone work. llvm-svn: 221168
* InstCombine: Combine (X | Y) - X to (~X & Y) — David Majnemer, 2014-11-03 (1 file, -0/+10)
  This implements the transformation from (X | Y) - X to (~X & Y). Differential Revision: http://reviews.llvm.org/D5791 llvm-svn: 221129
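The identity behind this combine holds because (X | Y) equals X plus the bits of Y not already in X, so the subtraction never wraps. A standalone sketch (my own illustration, not part of the commit) checks it exhaustively for 8-bit values:

```python
MASK = 0xFF  # model i8 lanes as masked Python ints


def or_minus_x(x, y):
    """(X | Y) - X, computed in 8-bit arithmetic."""
    return ((x | y) - x) & MASK


def notx_and_y(x, y):
    """~X & Y, computed in 8-bit arithmetic."""
    return (~x & y) & MASK


# Exhaustive check over all i8 pairs: the two forms agree everywhere.
assert all(or_minus_x(x, y) == notx_and_y(x, y)
           for x in range(256) for y in range(256))
print("ok")
```

An 8-bit exhaustive sweep is a common way to sanity-check bitwise InstCombine identities before trusting them at wider widths, since the identity is width-independent.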
* Use Alias Analysis to hoist 2 loads from diamond to the common predecessor basic block — Elena Demikhovsky, 2014-11-02 (1 file, -0/+64)
  Alias Analysis makes it possible to detect real barriers for load hoisting. Review in http://reviews.llvm.org/D5991 llvm-svn: 221091
* InstCombine: Don't assume that m_ZExt matches an Instruction — David Majnemer, 2014-11-01 (1 file, -0/+13)
  m_ZExt might bind against a ConstantExpr instead of an Instruction. Assuming this, using cast<Instruction>, results in InstCombine crashing. Instead, introduce ZExtOperator to bridge both Instruction and ConstantExpr ZExts. This fixes PR21445. llvm-svn: 221069
* InstCombine: Combine (X+cst) < 0 --> X < -cst — David Majnemer, 2014-11-01 (1 file, -0/+32)
  This can happen pretty often in code that looks like: int foo = bar - 1; if (foo < 0) do stuff. In this case, bar < 1 is an equivalent condition. This transform requires that the add instruction be annotated with nsw. llvm-svn: 221045
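The nsw requirement matters: if the add may wrap, the two conditions are not equivalent. A small sketch (my own, modeling i8 two's-complement wraparound) shows the counterexample the flag rules out:

```python
BITS = 8


def to_signed(v, bits=BITS):
    """Interpret a wrapped unsigned value as two's-complement signed."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >= (1 << (bits - 1)) else v


def add_wrap(x, c, bits=BITS):
    """x + c with i8 wraparound (i.e. an add WITHOUT nsw)."""
    return to_signed(x + c, bits)


# With nsw, X + 1 < 0 is equivalent to X < -1. Without nsw, for i8 X = 127:
x, c = 127, 1
assert add_wrap(x, c) < 0   # 127 + 1 wraps to -128, which is < 0 ...
assert not (x < -c)         # ... but 127 < -1 is false
print("counterexample confirmed")
```

This is why InstCombine may only fold the comparison when the frontend (or earlier passes) has proven the addition cannot signed-overflow.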
* Correctly update dom-tree after loop vectorizer. — Michael Zolotukhin, 2014-10-31 (1 file, -0/+142)
  llvm-svn: 221009
* [SCEV] Improve Scalar Evolution's use of no {un,}signed wrap flags — Bradley Smith, 2014-10-31 (1 file, -0/+32)
  In a case where we have a no {un,}signed wrap flag on the increment, if RHS - Start is constant then we can avoid inserting a max operation between the two, since we can statically determine which is greater. This allows us to unroll loops such as: void testcase3(int v) { for (int i=v; i<=v+1; ++i) f(i); } llvm-svn: 220960
* llvm/test/Transforms/SampleProfile/syntax.ll: Relax MISSING-FILE not to check locale-aware message catalog. — NAKAMURA Takumi, 2014-10-30 (1 file, -1/+1)
  llvm-svn: 220934
* Add handling for range metadata in ValueTracking isKnownNonZero — Philip Reames, 2014-10-30 (1 file, -0/+61)
  If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes. This helps to remove redundant checks and canonicalize checks for other optimization passes. This particular patch checks whether a value is known to be non-zero from the range metadata. Currently, these tests are against InstCombine. In theory, all of these should be InstSimplify since we're not inserting any new instructions. Moving the code may follow in a separate change. Reviewed by: Hal Differential Revision: http://reviews.llvm.org/D5947 llvm-svn: 220925
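The core check is simple: a loaded value is known non-zero when none of the intervals in its !range metadata contain zero. A minimal sketch of that reasoning (my own simplification, ignoring LLVM's wrapped-range representation):

```python
def known_non_zero(ranges):
    """Conservatively decide non-zero-ness from !range-style metadata.

    `ranges` is a list of half-open [lo, hi) intervals the loaded value
    may take (mirroring the shape of LLVM's !range metadata). The value
    is known non-zero only if 0 lies outside every interval.
    """
    return all(not (lo <= 0 < hi) for lo, hi in ranges)


assert known_non_zero([(1, 256)])             # [1, 256) excludes 0
assert not known_non_zero([(0, 2)])           # [0, 2) may be 0
assert not known_non_zero([(1, 4), (-2, 1)])  # second interval contains 0
print("ok")
```

Real !range metadata uses fixed-width integers and may wrap around the type boundary; this sketch only captures the non-wrapping case.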
* Add profile writing capabilities for sampling profiles. — Diego Novillo, 2014-10-30 (4 files, -1/+168)
  Summary: This patch finishes up support for handling sampling profiles in both text and binary formats. The new binary format uses uleb128 encoding to represent numeric values. This makes profile files about 25% smaller. The profile writer class can write profiles in the existing text and the new binary format. In subsequent patches, I will add the capability to read (and perhaps write) profiles in the gcov format used by GCC. Additionally, I will be adding support in llvm-profdata to manipulate sampling profiles. There was a bit of refactoring needed to separate some code that was in the reader files, but is actually common to both the reader and writer. The new test checks that reading the same profile encoded as text or raw produces the same results. Reviewers: bogner, dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6000 llvm-svn: 220915
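ULEB128 stores an integer seven bits per byte, low bits first, with the high bit of each byte flagging whether more bytes follow; small counts therefore take one byte instead of a fixed-width word, which is where the size savings come from. A self-contained round-trip sketch (my own, following the standard DWARF-style encoding, not LLVM's actual writer code):

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data):
    """Decode a ULEB128 byte sequence back to an integer."""
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result


assert encode_uleb128(624485) == b"\xe5\x8e\x26"  # classic DWARF example
assert decode_uleb128(encode_uleb128(123456789)) == 123456789
print("round trip ok")
```

Values below 128 encode in a single byte, so sample counts and line offsets, which are usually small, dominate the claimed ~25% reduction.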
* llvm/test/Transforms/LoopRotate/nosimplifylatch.ll: Fix possibly mis-repeatedly-pasted test. — NAKAMURA Takumi, 2014-10-29 (1 file, -36/+0)
  llvm-svn: 220880
* Test case for r220872: Do not simplifyLatch for loops where hoisting increments could result in extra live range interference — Yi Jiang, 2014-10-29 (1 file, -0/+70)
  llvm-svn: 220873
* Do not simplifyLatch for loops where hoisting increments could result in extra live range interference — Yi Jiang, 2014-10-29 (2 files, -4/+4)
  llvm-svn: 220872
* test: tweak inlined-allocs test — Saleem Abdulrasool, 2014-10-29 (1 file, -4/+2)
  Remove pointless checks for storage of uninteresting values. Ensure that we perform basic alias analysis to make the test more correct. Finally, apply a stylistic change to the test. llvm-svn: 220839
* Transforms: reapply SVN r219899 — Saleem Abdulrasool, 2014-10-28 (5 files, -38/+94)
  This restores the commit from SVN r219899 with an additional change to ensure that the CodeGen is correct for the case that was identified as being incorrect (originally PR7272). In the case that during inlining we need to synthesize a value on the stack (i.e. for passing a value byval), then any function involving that alloca must be stripped of its tailness, as the restriction that it does not access the parent's stack no longer holds. Unfortunately, a single alloca can cause a rippling effect throughout the inlining as the value may be aliased or may be mutated through an escaped external call. As such, we simply track if an alloca has been introduced in the frame during inlining, and strip any tail calls. llvm-svn: 220811
* InstCombine: Fix a combine assuming that icmp operands were integers — David Majnemer, 2014-10-27 (1 file, -0/+9)
  An icmp may have pointer arguments; it isn't limited to integers or vectors of integers. This fixes PR21388. llvm-svn: 220664
* [SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo — Jingyue Wu, 2014-10-25 (1 file, -0/+24)
  The dividend in "signed % unsigned" is treated as unsigned instead of signed, causing unexpected behavior such as -64 % (uint64_t)24 == 0. Added a regression test in split-gep.ll. Patched by Hao Liu. llvm-svn: 220618
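The surprising result comes from C's usual arithmetic conversions: when one operand of `%` is unsigned and at least as wide, the signed operand converts to unsigned first. A sketch (my own) modeling both C remainders in Python shows exactly the -64 % 24 case from the commit:

```python
def c_urem(a, b, bits=64):
    """Model C's `a % b` when b is uint64_t: a converts to unsigned first."""
    mask = (1 << bits) - 1
    return (a & mask) % (b & mask)


def c_srem(a, b):
    """C's signed remainder truncates toward zero (result has sign of a)."""
    q = abs(a) % abs(b)
    return -q if a < 0 else q


# The case from the commit: -64 % (uint64_t)24 is 0 in C, because -64
# converts to 2**64 - 64, which happens to be divisible by 24.
assert c_urem(-64, 24) == 0
assert c_srem(-64, 24) == -16  # what the signed remainder would give
print("mismatch demonstrated")
```

The pass's bug was precisely this kind of silent substitution of unsigned remainder semantics where signed ones were intended.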
* [SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions — Jingyue Wu, 2014-10-25 (1 file, -0/+19)
  The two operands of the new OR expression should be NextInChain and TheOther instead of the two original operands. Added a regression test in split-gep.ll. Hao Liu reported this bug, and provided the test case and an initial patch. Thanks! llvm-svn: 220615
* Handle sqrt() shrinking in SimplifyLibCalls like any other call — Sanjay Patel, 2014-10-23 (1 file, -6/+19)
  This patch removes a chunk of special case logic for folding (float)sqrt((double)x) -> sqrtf(x) in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls. No functional change intended, but I loosened the restriction on the existing sqrt testcases to allow for this optimization even without unsafe-fp-math because that's the existing behavior. I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic in case the result is used as a double. Differential Revision: http://reviews.llvm.org/D5919 llvm-svn: 220514
* test: Make this test runnable in directories with @ in their names — Justin Bogner, 2014-10-22 (1 file, -1/+1)
  Jenkins likes to use directories with names involving the '@' character, which breaks the sed expression in this test. Switch to use '|' on the assumption that it's less likely to show up in a path. llvm-svn: 220401
* Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850) — Sanjay Patel, 2014-10-22 (1 file, -102/+106)
  When a call to a double-precision libm function has fast-math semantics (via function attribute for now because there is no IR-level FMF on calls), we can avoid fpext/fptrunc operations and use the float version of the call if the input and output are both float. We already do this optimization using a command-line option; this patch just adds the ability for fast-math to use the existing functionality. I moved the cl::opt from InstructionCombining into SimplifyLibCalls because it's only ever used internally to that class. Modified the existing test cases to use the unsafe-fp-math attribute rather than repeating all tests. This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850 Differential Revision: http://reviews.llvm.org/D5893 llvm-svn: 220390
* Change error to warning when a profile cannot be found. — Diego Novillo, 2014-10-22 (1 file, -2/+2)
  When the profile for a function cannot be applied, we used to emit an error. This seems extreme. The compiler can continue; it's just that the optimization opportunities won't include profile information. llvm-svn: 220386
* Support using sample profiles with partial debug info. — Diego Novillo, 2014-10-22 (1 file, -2/+6)
  Summary: When using a profile, we used to require the use of -gmlt so that we could get access to the line locations. This is used to match line numbers in the input profile to the line numbers in the function's IR. But this is actually not necessary. The driver can provide source location tracking without the emission of debug information. In these cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the actual line location annotations are still present. This patch adds a new way of looking for the start of the current function. Instead of looking through the compile units in llvm.dbg.cu, we can walk up the scope for the first instruction in the function with a debug loc. If that describes the function, we use it. Otherwise, we keep looking until we find one. If no such instruction is found, we then give up and produce an error. Reviewers: echristo, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5887 llvm-svn: 220382
* [InstSimplify] Support constant folding to vector of pointers — Bruno Cardoso Lopes, 2014-10-22 (1 file, -0/+35)
  ConstantFolding crashes when trying to InstSimplify the following load: @a = private unnamed_addr constant %mst { i8* inttoptr (i64 -1 to i8*), i8* inttoptr (i64 -1 to i8*) }, align 8 %x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8. This patch fixes this by adding support for this type of folding: %x = load <2 x i8*>* bitcast (%mst* @a to <2 x i8*>*), align 8 ==> gets folded to: %x = <2 x i8*> <i8* inttoptr (i64 -1 to i8*), i8* inttoptr (i64 -1 to i8*)> llvm-svn: 220380
* Revert "Teach the load analysis to allow finding available values which require" (r220277) — Hans Wennborg, 2014-10-21 (1 file, -61/+9)
  This seems to have caused PR21330. llvm-svn: 220349
* Add minnum / maxnum intrinsics — Matt Arsenault, 2014-10-21 (4 files, -0/+557)
  These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax, to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which returns NaN if either operand is NaN, in order to propagate errors. These intrinsics implement the IEEE-754 semantics of returning the other operand if either is a NaN, representing missing data. llvm-svn: 220341
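The distinguishing behavior is the NaN handling: a single NaN operand is treated as missing data and the other operand wins, rather than NaN propagating. A small sketch of that semantics (my own illustration; it omits IEEE-754's signed-zero and signaling-NaN subtleties):

```python
import math


def minnum(a, b):
    """IEEE-754-style minNum: if exactly one operand is NaN, return the other."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


nan = float("nan")
assert minnum(nan, 3.0) == 3.0        # NaN treated as missing data
assert minnum(3.0, nan) == 3.0
assert minnum(1.0, 2.0) == 1.0
assert math.isnan(minnum(nan, nan))   # both missing: NaN remains
print("ok")
```

Contrast with an error-propagating min, where any NaN operand would make the result NaN; that is the ambiguity the commit says the IEEE-754 naming avoids.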
* InstCombine: Simplify FoldICmpCstShrCst — David Majnemer, 2014-10-21 (3 files, -330/+332)
  This function was complicated by the fact that it tried to perform canonicalizations that were already performed by InstSimplify. Remove this extra code and move the tests over to InstSimplify. Add asserts to make sure our preconditions hold before we make any assumptions. llvm-svn: 220314
* Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast, provided there is datalayout available — Chandler Carruth, 2014-10-21 (1 file, -9/+61)
  Eventually, the datalayout can just be required, but in practice it will always be there today. To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts. These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines. I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis. llvm-svn: 220277
* Introduce a 'nonnull' metadata on Load instructions. — Philip Reames, 2014-10-20 (1 file, -0/+23)
  The newly introduced 'nonnull' metadata is analogous to existing 'nonnull' attributes, but applies to load instructions rather than call arguments or returns. Long term, it would be nice to combine these into a single construct. The value of the load is allowed to vary between successive loads, but null is not a valid value to be loaded by any load marked nonnull. Reviewed by: Hal Finkel Differential Revision: http://reviews.llvm.org/D5220 llvm-svn: 220240
* Fix a miscompile introduced in r220178. — Chandler Carruth, 2014-10-20 (1 file, -0/+31)
  The original code had an implicit assumption that if the test for allocas or globals was reached, the two pointers were not equal. With my changes to make the pointer analysis more powerful here, I also had to guard against circumstances where the results weren't useful. That in turn violated the assumption and gave rise to a circumstance in which we could have a store with both the queried pointer and stored pointer rooted at *the same* alloca. Clearly, we cannot ignore such a store. There are other things we might do in this code to better handle the case of both pointers ending up at the same alloca or global, but it seems best to at least make the test explicit in what it intends to check. I've added tests for both the alloca and global case here. llvm-svn: 220190
* Fix a somewhat subtle pair of issues with JumpThreading I introduced in r220178 — Chandler Carruth, 2014-10-20 (1 file, -0/+31)
  First, the creation routine doesn't insert prior to the terminator of the basic block provided, but really at the end of the basic block. Instead, get the terminator and insert before that. The next issue was that we need to ensure multiple PHI node entries for a single predecessor re-use the same cast instruction rather than creating new ones. All of the logic here was without tests previously. I've reduced and added a test case from the test suite that crashed without both of these fixes. llvm-svn: 220186
* Teach the load analysis driving core instcombine logic and other bits of logic to look through pointer casts — Chandler Carruth, 2014-10-20 (1 file, -0/+109)
  This makes them trivially stronger in the face of loads and stores with intervening pointer casts. I've included a few test cases that demonstrate the kind of folding instcombine can do without pointer casts and then variations which obfuscate the logic through bitcasts. Without this patch, the variations all fail to optimize fully. This is more important now than it has been in the past as I've started moving the load canonicalization to more closely follow the value type requirements rather than the pointer type requirements, and thus this needs to be prepared for more pointer casts. When I made the same change to stores, several test cases regressed without logic along these lines, so I wanted to systematically improve matters first. llvm-svn: 220178
* Add a datalayout string to this test so that it exercises the full gamut of InstCombine rather than just the bits enabled when datalayout is optional — Chandler Carruth, 2014-10-20 (1 file, -13/+15)
  The primary fixes here are because now things are little endian. In good news, silliness like this seems like it will be going away as we've got pretty strong consensus on dropping optional datalayout entirely. llvm-svn: 220176
* Do a better and more complete job of preserving metadata when combining loads — Chandler Carruth, 2014-10-19 (2 files, -25/+86)
  This handles many more cases than just the AA metadata, some of them suggested by Hal in his review of the AA metadata handling patch. I've tried to test this behavior where tractable to do so. I'll point out that I have specifically *not* included a test for debuginfo because it was going to require 2 or 3 times as much work to craft some input which would survive the "helpful" stripping of debug info metadata that doesn't match the desired schema. This is another good example of why the current state of write-ability for our debug info metadata is unacceptable. I spent over 30 minutes trying to conjure some test case that would survive, even copying from other debug info tests, but it always failed to survive with no explanation of why or how I might fix it. =[ llvm-svn: 220165
* Move previously dead code to handle computing the known bits of an alias up to where it actually works as intended — Chandler Carruth, 2014-10-19 (1 file, -0/+40)
  The problem is that a GlobalAlias isa GlobalValue and so the prior block handled all of the cases. This allows us to constant fold based on the actual constant expression in the global alias. As an example, see the last function in the newly added test case, which explicitly aligns an unaligned pointer using constant expression math. Without this change, we fail to see that and fold an alignment test to zero. llvm-svn: 220164
* InstCombine: (sub (or A B) (xor A B)) --> (and A B) — David Majnemer, 2014-10-19 (1 file, -0/+10)
  The following implements the transformation: (sub (or A B) (xor A B)) --> (and A B). Patch by Ankur Garg! Differential Revision: http://reviews.llvm.org/D5719 llvm-svn: 220163
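This transform rests on the identity A | B = (A ^ B) + (A & B): OR counts each bit once, XOR keeps the non-shared bits, AND keeps the shared ones, so the subtraction can never wrap. An exhaustive i8 check of the identity (my own sketch, not from the commit):

```python
MASK = 0xFF  # model i8 lanes


def sub_or_xor(a, b):
    """(A | B) - (A ^ B) in 8-bit arithmetic."""
    return ((a | b) - (a ^ b)) & MASK


# Every i8 pair satisfies (A | B) - (A ^ B) == A & B.
assert all(sub_or_xor(a, b) == (a & b)
           for a in range(256) for b in range(256))
print("identity holds on all i8 pairs")
```

Since the identity is purely bitwise, the 8-bit sweep generalizes to any integer width, which is why InstCombine can apply it unconditionally.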
* InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1 — David Majnemer, 2014-10-19 (1 file, -0/+61)
  The following implements the optimization for sequences of the form: icmp eq/ne (shl Const2, A), Const1. Such sequences can be transformed to: icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2)). This handles only the equality operators for now. Other operators need to be handled. Patch by Ankur Garg! llvm-svn: 220162
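Shifting left only adds trailing zeros, so if any shift of Const2 can equal Const1, the shift amount is forced to be the difference of their trailing-zero counts; the fold then just verifies that candidate. A sketch of that reasoning (my own model; `fold_shl_eq` and its verification step are illustrative, not the commit's actual code):

```python
def tz(n):
    """Number of trailing zero bits of a positive integer."""
    return (n & -n).bit_length() - 1


def fold_shl_eq(c2, c1, bits=8):
    """Fold `(c2 << A) == c1` to a condition on A.

    Returns the unique candidate A (difference of trailing-zero counts)
    if shifting c2 by it really produces c1; returns None when the
    comparison can never be true.
    """
    a = tz(c1) - tz(c2)
    mask = (1 << bits) - 1
    if 0 <= a < bits and ((c2 << a) & mask) == c1:
        return a
    return None


assert fold_shl_eq(4, 32) == 3    # 4 << 3 == 32, so A must be 3
assert fold_shl_eq(3, 12) == 2    # 3 << 2 == 12
assert fold_shl_eq(3, 8) is None  # no shift of 3 yields 8
print("ok")
```

The verification step matters: matching trailing-zero counts alone is not enough, since the non-zero bit patterns of the two constants must also line up.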
* Fix a long-standing miscompile in the load analysis that was uncovered by my refactoring of this code — Chandler Carruth, 2014-10-19 (3 files, -4/+62)
  The method isSafeToLoadUnconditionally assumes that the load will proceed with the preferred type alignment. Given that, it has to ensure that the alloca or global is at least that aligned. It has always done this historically when a datalayout is present, but has never checked it when the datalayout is absent. When I refactored the code in r220156, I exposed this path when datalayout was present and that turned the latent bug into a patent bug. This fixes the issue by just removing the special case which allows folding things without datalayout. This isn't worth the complexity of trying to tease apart when it is or isn't safe without actually knowing the preferred alignment. llvm-svn: 220161
* Preserve AA metadata when combining (cast (load (...))) -> (load (cast (...))). — Chandler Carruth, 2014-10-18 (1 file, -0/+25)
  llvm-svn: 220141
* [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load ...)) and (load (cast ...)): canonicalize toward the former — Chandler Carruth, 2014-10-18 (6 files, -32/+15)
  Historically, we've tried to load using the type of the *pointer*, and tried to match that type as closely as possible, removing as many pointer casts as we could and trading them for bitcasts of the loaded value. This is deeply and fundamentally wrong. Repeat after me: memory does not have a type! This was a hard lesson for me to learn working on SROA. There is only one thing that should actually drive the type used for a pointer, and that is the type which we need to use to load from that pointer. Matching up pointer types to the loaded value types is very useful because it minimizes the physical size of the IR required for no-op casts. Similarly, the only thing that should drive the type used for a loaded value is *how that value is used*! Again, this minimizes casts. And in fact, the *only* thing motivating types in any part of LLVM's IR are the types used by the operations in the IR. We should match them as closely as possible. I've ended up removing some tests here as they were testing bugs or behavior that is no longer present. Mostly though, this is just cleanup to let the tests continue to function as intended. The only fallout I've found so far from this change was SROA and I have fixed it to not be impeded by the different type of load. If you find more places where this change causes optimizations not to fire, those too are likely bugs where we are assuming that the type of pointers is "significant" for optimization purposes. llvm-svn: 220138
* Remove a test that was ported from the old llvm-gcc frontend test suite. — Chandler Carruth, 2014-10-18 (1 file, -39/+0)
  This test is pretty awesome. It is claiming to test devirtualization. However, the code in question is not in fact devirtualized by LLVM. If you take the original C++ test case and run it through Clang at -O3, we fail to devirtualize it completely. It also isn't a sufficiently focused test case. The *reason* we fail to devirtualize it isn't because of any missing instcombine though. Instead, it is because we fail to emit an available externally vtable and thus the vtable is just an external and completely opaque. If I cause the vtable to be emitted, we successfully devirtualize things. Anyways, I'm just removing it because it is providing negative value at this point: it isn't representative of the output of Clang really, LLVM isn't doing the transform it claims to be testing, LLVM's failure to do the transform isn't actually an LLVM bug at all and we shouldn't be testing for it here, and finally the test is written in such a way that it will trivially pass even when the point of the test is failing. llvm-svn: 220137
* [SROA] Change how SROA does vector-based promotion of allocas to handle cases where the alloca type, the load types, and the store types used all disagree — Chandler Carruth, 2014-10-18 (1 file, -0/+136)
  Previously, the only way that vector-based promotion occurred was if the alloca type was a vector type. This was one of the *very* few remaining uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns out it was a bad idea. The alloca type can change very easily based on the mixture of types loaded and stored to that alloca. We shouldn't be relying on it as a signal for very much. Instead, the source of truth should be loads and stores. We should canonicalize the loads and stores as much as possible and then rely on them exclusively in SROA. When looking at loads and stores, we may find many different candidate vector types. This change will let SROA try all of them to find a vector type which is a viable way to promote the entire alloca to a vector register. With this change, it becomes possible to do better canonicalization and optimization of loads and stores without breaking SROA in random ways, and that should allow fixing a core source of performance loss in hot numerical loops such as those in Eigen. llvm-svn: 220116
* Revert "TRE: make TRE a bit more aggressive" — Rafael Espindola, 2014-10-17 (3 files, -38/+6)
  This reverts commit r219899. This also updates byval-tail-call.ll to make it clear what was breaking. Adding r219899 again will cause the load/store to disappear. llvm-svn: 220093
* [DSE] Remove no-data-layout-only type-based overlap checking — Hal Finkel, 2014-10-17 (3 files, -12/+20)
  DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage. Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few other tests have been updated to have a data layout. llvm-svn: 220035
* Delete -std-compile-opts. — Rafael Espindola, 2014-10-16 (3 files, -3/+3)
  These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951
* Allow call-slot optzn for destinations with a suitable dereferenceable attribute — Bjorn Steinbrink, 2014-10-16 (1 file, -0/+29)
  Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5832 llvm-svn: 219950