path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* LoopVectorize: Simplify code. No functionality change.
  (Benjamin Kramer, 2014-10-22; 1 file, -19/+7)
  llvm-svn: 220405
* Shorten auto iterators for function basic blocks.
  (Diego Novillo, 2014-10-22; 1 file, -37/+36)
  Use consistent naming for basic block instances. No functional changes.
  llvm-svn: 220404
* Use auto iteration in lib/Transforms/Scalar/SampleProfile.cpp. No functional changes.
  (Diego Novillo, 2014-10-22; 1 file, -18/+15)
  llvm-svn: 220394
* Preserving 'nonnull' metadata in SimplifyCFG
  (Philip Reames, 2014-10-22; 1 file, -1/+4)
  When we hoist two loads above an if, we can preserve the nonnull metadata. We could also do the same for sinking them, but we appear to not handle metadata at all in that case.
  Thanks to Hal for the review.
  Differential Revision: http://reviews.llvm.org/D5910
  llvm-svn: 220392
* Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850)
  (Sanjay Patel, 2014-10-22; 2 files, -10/+24)
  When a call to a double-precision libm function has fast-math semantics (via function attribute for now because there is no IR-level FMF on calls), we can avoid fpext/fptrunc operations and use the float version of the call if the input and output are both float.
  We already do this optimization using a command-line option; this patch just adds the ability for fast-math to use the existing functionality.
  I moved the cl::opt from InstructionCombining into SimplifyLibCalls because it's only ever used internally to that class.
  Modified the existing test cases to use the unsafe-fp-math attribute rather than repeating all tests.
  This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850
  Differential Revision: http://reviews.llvm.org/D5893
  llvm-svn: 220390
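A rough C++ illustration of the source-level pattern this targets, assuming sin as the libm function: a float argument is widened to double, passed to the double routine, and the result is narrowed back. Under fast-math the pair of conversions can be dropped in favor of the float routine. The function names are hypothetical, and the two results may differ in the last bits, which is why the rewrite is gated on relaxed FP semantics.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical name. Before: fpext float -> double, call the double-precision
// sin(), then fptrunc the result back to float.
float sinViaDouble(float x) {
  return static_cast<float>(std::sin(static_cast<double>(x)));
}

// Hypothetical name. After (only valid under fast-math): call the float
// overload directly, with no conversions. Results may differ in the last bits.
float sinViaFloat(float x) {
  return std::sin(x);  // overload resolution picks the float version
}
```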
* Change error to warning when a profile cannot be found.
  (Diego Novillo, 2014-10-22; 1 file, -1/+3)
  When the profile for a function cannot be applied, we used to emit an error. This seems extreme. The compiler can continue; it's just that the optimization opportunities won't include profile information.
  llvm-svn: 220386
* Support using sample profiles with partial debug info.
  (Diego Novillo, 2014-10-22; 1 file, -12/+29)
  Summary: When using a profile, we used to require the use of -gmlt so that we could get access to the line locations. This is used to match line numbers in the input profile to the line numbers in the function's IR.
  But this is actually not necessary. The driver can provide source location tracking without the emission of debug information. In these cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the actual line location annotations are still present.
  This patch adds a new way of looking for the start of the current function. Instead of looking through the compile units in llvm.dbg.cu, we can walk up the scope for the first instruction in the function with a debug loc. If that describes the function, we use it. Otherwise, we keep looking until we find one. If no such instruction is found, we then give up and produce an error.
  Reviewers: echristo, dblaikie
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5887
  llvm-svn: 220382
* [msan] Handle param-tls overflow.
  (Evgeniy Stepanov, 2014-10-22; 1 file, -14/+34)
  ParamTLS (shadow for function arguments) is of limited size. This change makes all arguments that do not fit unpoisoned, and avoids writing past the end of a TLS buffer.
  llvm-svn: 220351
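A minimal sketch of the bookkeeping described above, not MemorySanitizer's actual code. It assumes a fixed buffer of 800 bytes (a hypothetical constant here) and 8-byte-aligned slots: each argument's shadow gets the next slot offset if it fits, and an argument that would run past the end gets no slot, which stands for "treated as unpoisoned, no store emitted".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed buffer size for this sketch only.
constexpr uint64_t kAssumedParamTLSSize = 800;

// For each argument shadow size, return the byte offset of its slot, or -1 if
// storing it would write past the end of the buffer (the argument is then
// treated as unpoisoned instead).
std::vector<int64_t> assignParamTLSOffsets(const std::vector<uint64_t>& sizes) {
  std::vector<int64_t> offsets;
  uint64_t offset = 0;
  for (uint64_t size : sizes) {
    if (offset + size > kAssumedParamTLSSize)
      offsets.push_back(-1);  // does not fit: skip the shadow store
    else
      offsets.push_back(static_cast<int64_t>(offset));
    // Advance to the next 8-byte-aligned slot.
    offset += (size + 7) / 8 * 8;
  }
  return offsets;
}
```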
* Revert "Teach the load analysis to allow finding available values which require" (r220277)
  (Hans Wennborg, 2014-10-21; 2 files, -6/+5)
  This seems to have caused PR21330.
  llvm-svn: 220349
* LTO: respect command-line options that disable vectorization.
  (JF Bastien, 2014-10-21; 1 file, -2/+4)
  Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops and -run-slp-after-loop-vectorization flags.
  Reviewers: nadav, aschwaighofer, yijiang
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5884
  llvm-svn: 220345
* Add minnum / maxnum intrinsics
  (Matt Arsenault, 2014-10-21; 2 files, -0/+86)
  These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax, to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which returns NaN if either operand is NaN, to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data.
  llvm-svn: 220341
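A small C++ sketch of the NaN-as-missing-data semantics described above (the function names are hypothetical, and the unspecified ordering of -0.0 versus +0.0 is not modeled): a single NaN operand is ignored and the other operand is returned; only when both are NaN is the result NaN.

```cpp
#include <cassert>
#include <cmath>

// IEEE-754 minNum-style: a NaN operand is treated as missing data.
double minNumSketch(double a, double b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a < b ? a : b;
}

// IEEE-754 maxNum-style: same NaN handling, larger value otherwise.
double maxNumSketch(double a, double b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a > b ? a : b;
}
```

An error-propagating fmin variant would instead return NaN as soon as either input is NaN.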
* Teach combineMetadata how to merge 'nonnull' metadata.
  (Philip Reames, 2014-10-21; 1 file, -0/+4)
  combineMetadata is used when merging two instructions into one. This change teaches it how to merge 'nonnull', i.e. only preserve it on the new instruction if it's set on both sources. This isn't actually used yet since I haven't adjusted any of the call sites to pass in nonnull as a 'known metadata'.
  llvm-svn: 220325
* Preserve 'nonnull' when changing type of the load.
  (Philip Reames, 2014-10-21; 1 file, -0/+1)
  When changing the type of a load in Chandler's recent InstCombine changes, we can preserve the new 'nonnull' metadata. I considered adding an assert since 'nonnull' is only valid on pointer types, but casting a pointer to a non-pointer would involve more than a bitcast anyway. If someone extends this transform to handle more than bitcasts, the verifier will report the malformed IR, so a separate assertion isn't needed. Also, the fpmath flags would have the same problem.
  llvm-svn: 220324
* InstCombine: Simplify FoldICmpCstShrCst
  (David Majnemer, 2014-10-21; 1 file, -48/+16)
  This function was complicated by the fact that it tried to perform canonicalizations that were already performed by InstSimplify. Remove this extra code and move the tests over to InstSimplify. Add asserts to make sure our preconditions hold before we make any assumptions.
  llvm-svn: 220314
* Teach the load analysis to allow finding available values which require inttoptr or ptrtoint cast provided there is datalayout available.
  (Chandler Carruth, 2014-10-21; 2 files, -5/+6)
  Eventually, the datalayout can just be required but in practice it will always be there today.
  To go with the ability to expose available values requiring a ptrtoint or inttoptr cast, helpers are added to perform one of these three casts.
  These smarts are necessary to finish canonicalizing loads and stores to the operational type requirements without regressing fundamental combines.
  I've added some test cases. These should actually improve as the load combining and store combining improves, but they may fundamentally be highlighting some missing combines for select in addition to exercising the specific added logic to load analysis.
  llvm-svn: 220277
* Do not attribute static allocas to the call site's DebugLoc.
  (Paul Robinson, 2014-10-21; 1 file, -0/+6)
  When functions are inlined, instructions without debug information are attributed to the call site's DebugLoc. After inlining, inlined static allocas are moved to the caller's entry block, adjacent to the caller's original static alloca instructions. By retaining the call site's DebugLoc, these instructions could cause instructions that were subsequently inserted at the entry block to pick up the same DebugLoc.
  Patch by Wolfgang Pieb!
  llvm-svn: 220255
* Introduce enum values for previously defined metadata types. (NFC)
  (Philip Reames, 2014-10-21; 2 files, -7/+3)
  Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile-time type checking, and avoid some (minimal) string construction and comparison. This change adds enum values for three existing metadata types:
    + MD_nontemporal = 9, // "nontemporal"
    + MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
    + MD_nonnull = 11 // "nonnull"
  I went through and updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easy to translate. For example, there were several items in LoopInfo.cpp I chose not to update.
  llvm-svn: 220248
* IR: Replace DataLayout::RoundUpAlignment with RoundUpToAlignment
  (David Majnemer, 2014-10-20; 1 file, -4/+4)
  No functional change intended, just cleaning up some code.
  llvm-svn: 220187
* Fix a somewhat subtle pair of issues with JumpThreading I introduced in r220178.
  (Chandler Carruth, 2014-10-20; 1 file, -3/+6)
  First, the creation routine doesn't insert prior to the terminator of the basic block provided, but really at the end of the basic block. Instead, get the terminator and insert before that.
  The next issue was that we need to ensure multiple PHI node entries for a single predecessor re-use the same cast instruction rather than creating new ones.
  All of the logic here was without tests previously. I've reduced and added a test case from the test suite that crashed without both of these fixes.
  llvm-svn: 220186
* Teach the load analysis driving core instcombine logic and other bits of logic to look through pointer casts, making them trivially stronger in the face of loads and stores with intervening pointer casts.
  (Chandler Carruth, 2014-10-20; 2 files, -2/+12)
  I've included a few test cases that demonstrate the kind of folding instcombine can do without pointer casts and then variations which obfuscate the logic through bitcasts. Without this patch, the variations all fail to optimize fully.
  This is more important now than it has been in the past as I've started moving the load canonicalization to more closely follow the value type requirements rather than the pointer type requirements and thus this needs to be prepared for more pointer casts. When I made the same change to stores several test cases regressed without logic along these lines so I wanted to systematically improve matters first.
  llvm-svn: 220178
* Do a better and more complete job of preserving metadata when combining loads.
  (Chandler Carruth, 2014-10-19; 1 file, -8/+58)
  This handles many more cases than just the AA metadata, some of them suggested by Hal in his review of the AA metadata handling patch. I've tried to test this behavior where tractable to do so.
  I'll point out that I have specifically *not* included a test for debuginfo because it was going to require 2 or 3 times as much work to craft some input which would survive the "helpful" stripping of debug info metadata that doesn't match the desired schema. This is another good example of why the current state of write-ability for our debug info metadata is unacceptable. I spent over 30 minutes trying to conjure some test case that would survive, even copying from other debug info tests, but it always failed to survive with no explanation of why or how I might fix it. =[
  llvm-svn: 220165
* InstCombine: (sub (or A B) (xor A B)) --> (and A B)
  (David Majnemer, 2014-10-19; 1 file, -0/+9)
  The following implements the transformation: (sub (or A B) (xor A B)) --> (and A B).
  Patch by Ankur Garg!
  Differential Revision: http://reviews.llvm.org/D5719
  llvm-svn: 220163
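Why the fold is sound: the bits of A | B split into two disjoint parts, the bits set in exactly one operand (A ^ B) and the bits set in both (A & B), so (A | B) == (A ^ B) + (A & B) with no carries, and subtracting A ^ B leaves A & B. A brute-force C++ check over all 8-bit operand pairs, using wrapping unsigned arithmetic as in LLVM's integer sub:

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks (A | B) - (A ^ B) == (A & B) for every 8-bit A and B.
bool subOrXorIsAnd8() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      // Truncate to 8 bits to model i8 wraparound.
      uint8_t lhs = static_cast<uint8_t>((a | b) - (a ^ b));
      uint8_t rhs = static_cast<uint8_t>(a & b);
      if (lhs != rhs) return false;
    }
  }
  return true;
}
```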
* InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1
  (David Majnemer, 2014-10-19; 2 files, -1/+50)
  The following implements the optimization for sequences of the form:
    icmp eq/ne (shl Const2, A), Const1
  Such sequences can be transformed to:
    icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))
  This handles only the equality operators for now. Other operators need to be handled.
  Patch by Ankur Garg!
  llvm-svn: 220162
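A C++ sketch of the reasoning behind the rewrite, on 8-bit values and assuming both constants are nonzero. It is not the InstCombine code itself. Shifting a nonzero value left by A adds exactly A trailing zeros, so the only candidate shift is TrailingZeros(Const1) - TrailingZeros(Const2). The candidate still has to be confirmed, because the remaining bits of Const2 << A must also match Const1; if it fails, "eq" is always false and "ne" always true. Shift amounts of 8 or more are excluded, since they produce poison in LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Trailing zero count of a nonzero 8-bit value.
unsigned ctz8(uint8_t v) {
  unsigned n = 0;
  while ((v & 1u) == 0) {
    v = static_cast<uint8_t>(v >> 1);
    ++n;
  }
  return n;
}

// Returns the unique A in [0, 8) with (uint8_t)(c2 << A) == c1, or nullopt.
std::optional<unsigned> solveShlEq8(uint8_t c2, uint8_t c1) {
  assert(c1 != 0 && c2 != 0 && "sketch only handles nonzero constants");
  unsigned tz1 = ctz8(c1);
  unsigned tz2 = ctz8(c2);
  if (tz1 < tz2)
    return std::nullopt;  // a left shift cannot remove trailing zeros
  unsigned a = tz1 - tz2;  // candidate from the trailing-zero difference
  if (static_cast<uint8_t>(c2 << a) != c1)
    return std::nullopt;  // higher bits disagree: no shift works
  return a;
}

// Brute-force comparison against every shift amount for all nonzero pairs.
bool checkSolveShlEq8() {
  for (unsigned c2 = 1; c2 < 256; ++c2)
    for (unsigned c1 = 1; c1 < 256; ++c1)
      for (unsigned a = 0; a < 8; ++a) {
        bool holds = static_cast<uint8_t>(c2 << a) == c1;
        std::optional<unsigned> s =
            solveShlEq8(static_cast<uint8_t>(c2), static_cast<uint8_t>(c1));
        bool predicted = s.has_value() && *s == a;
        if (holds != predicted) return false;
      }
  return true;
}
```

For example, (shl 3, A) == 12 reduces to A == 2, while (shl 3, A) == 4 has no solution even though the trailing-zero difference is also 2.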
* Fix a long-standing miscompile in the load analysis that was uncovered by my refactoring of this code.
  (Chandler Carruth, 2014-10-19; 1 file, -1/+5)
  The method isSafeToLoadUnconditionally assumes that the load will proceed with the preferred type alignment. Given that, it has to ensure that the alloca or global is at least that aligned. It has always done this historically when a datalayout is present, but has never checked it when the datalayout is absent. When I refactored the code in r220156, I exposed this path when datalayout was present and that turned the latent bug into a patent bug.
  This fixes the issue by just removing the special case which allows folding things without datalayout. This isn't worth the complexity of trying to tease apart when it is or isn't safe without actually knowing the preferred alignment.
  llvm-svn: 220161
* Preserve AA metadata when combining (cast (load (...))) -> (load (cast (...))).
  (Chandler Carruth, 2014-10-18; 1 file, -0/+3)
  llvm-svn: 220141
* [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load ...)) and (load (cast ...)): canonicalize toward the former.
  (Chandler Carruth, 2014-10-18; 1 file, -72/+43)
  Historically, we've tried to load using the type of the *pointer*, and tried to match that type as closely as possible removing as many pointer casts as we could and trading them for bitcasts of the loaded value. This is deeply and fundamentally wrong.
  Repeat after me: memory does not have a type! This was a hard lesson for me to learn working on SROA.
  There is only one thing that should actually drive the type used for a pointer, and that is the type which we need to use to load from that pointer. Matching up pointer types to the loaded value types is very useful because it minimizes the physical size of the IR required for no-op casts. Similarly, the only thing that should drive the type used for a loaded value is *how that value is used*! Again, this minimizes casts. And in fact, the *only* thing motivating types in any part of LLVM's IR are the types used by the operations in the IR. We should match them as closely as possible.
  I've ended up removing some tests here as they were testing bugs or behavior that is no longer present. Mostly though, this is just cleanup to let the tests continue to function as intended.
  The only fallout I've found so far from this change was SROA and I have fixed it to not be impeded by the different type of load. If you find more places where this change causes optimizations not to fire, those too are likely bugs where we are assuming that the type of pointers is "significant" for optimization purposes.
  llvm-svn: 220138
* [SROA] Change how SROA does vector-based promotion of allocas to handle cases where the alloca type, the load types, and the store types used all disagree.
  (Chandler Carruth, 2014-10-18; 1 file, -44/+128)
  Previously, the only way that vector-based promotion occurred was if the alloca type was a vector type. This was one of the *very* few remaining uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns out it was a bad idea. The alloca type can change very easily based on the mixture of types loaded and stored to that alloca. We shouldn't be relying on it as a signal for very much. Instead, the source of truth should be loads and stores. We should canonicalize the loads and stores as much as possible and then rely on them exclusively in SROA.
  When looking at loads and stores, we may find many different candidate vector types. This change will let SROA try all of them to find a vector type which is a viable way to promote the entire alloca to a vector register.
  With this change, it becomes possible to do better canonicalization and optimization of loads and stores without breaking SROA in random ways, and that should allow fixing a core source of performance loss in hot numerical loops such as those in Eigen.
  llvm-svn: 220116
* [msan] Fix handling of byval arguments with large alignment.
  (Evgeniy Stepanov, 2014-10-17; 1 file, -1/+2)
  MSan param-tls slots are 8-byte aligned. This change clips alignment of memcpy into param-tls to 8.
  llvm-svn: 220101
* Revert "TRE: make TRE a bit more aggressive"
  (Rafael Espindola, 2014-10-17; 1 file, -2/+8)
  This reverts commit r219899.
  This also updates byval-tail-call.ll to make it clear what was breaking. Adding r219899 again will cause the load/store to disappear.
  llvm-svn: 220093
* [DSE] Remove no-data-layout-only type-based overlap checking
  (Hal Finkel, 2014-10-17; 1 file, -8/+1)
  DSE's overlap checking contained special logic, used only when no DataLayout was available, which inferred a complete overwrite when the pointee types were equal. This logic seems fine for regular loads/stores, but does not work for memcpy and friends. Instead of fixing this, I'm just removing it. Philosophically, transformations should not contain enhanced behavior used only when data layout is lacking (data layout should be strictly additive), and maintaining these rarely-tested code paths seems not worthwhile at this stage.
  Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case (slightly reduced from that provided by Aliaksei) replaces the original contents of test/Transforms/DeadStoreElimination/no-targetdata.ll; a few other tests have been updated to have a data layout.
  llvm-svn: 220035
* [SROA] Switch the common variable name for the 'AllocaSlices' class to 'AS'.
  (Chandler Carruth, 2014-10-16; 1 file, -40/+42)
  Using 'S' for this was a terrible idea. Arguably, 'AS' is not much better, but it at least follows the idea of using initialisms and removes active confusion about the AllocaSlices variable and a Slice variable.
  llvm-svn: 219963
* [SROA] More range-based cleanups to SROA, these brought to you by clang-modernize.
  (Chandler Carruth, 2014-10-16; 1 file, -25/+12)
  I did have to clean up the variable types and whitespace a bit because the use of auto made the code much less readable here.
  llvm-svn: 219962
* [SROA] Switch a couple of overly complex iterator accessors to just be ArrayRef accessors.
  (Chandler Carruth, 2014-10-16; 1 file, -26/+10)
  I think it even came up in review that this was over-engineered, and indeed it was. Time to un-build it.
  llvm-svn: 219958
* [SROA] Start more deeply moving SROA to use ranges rather than just iterators.
  (Chandler Carruth, 2014-10-16; 1 file, -45/+42)
  There are a ton of places where it essentially wants ranges rather than just iterators. This is just the first step that adds the core slice range typedefs and uses them in a couple of places. I still have to explicitly construct them because they've not been punched throughout the entire set of code. More range-based cleanups incoming.
  llvm-svn: 219955
* Allow call-slot optzn for destinations with a suitable dereferenceable attribute
  (Bjorn Steinbrink, 2014-10-16; 1 file, -14/+16)
  Summary: Currently, call slot optimization requires that if the destination is an argument, the argument has the sret attribute. This is to ensure that the memory access won't trap. In addition to sret, we can also allow the optimization to happen for arguments that have the new dereferenceable attribute, which gives the same guarantee.
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5832
  llvm-svn: 219950
* fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
  (Sanjay Patel, 2014-10-16; 1 file, -1/+87)
  If a square root call has an FP multiplication argument that can be reassociated, then we can hoist a repeated factor out of the square root call and into a fabs().
  In the simplest case, this:
    y = sqrt(x * x);
  becomes this:
    y = fabs(x);
  This patch relies on an earlier optimization in instcombine or reassociate to put the multiplication tree into a canonical form, so we don't have to search over every permutation of the multiplication tree.
  Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to use function-level attributes to do this optimization. This needs to be fixed for both the intrinsics and in the backend.
  Differential Revision: http://reviews.llvm.org/D5787
  llvm-svn: 219944
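The algebra behind the fold is sqrt(x^2 * y) = |x| * sqrt(y) for y >= 0. A C++ sketch of both forms (hypothetical names), checked on inputs where every intermediate result is exactly representable. In general the two forms can round differently, and x * x can overflow where fabs(x) * sqrt(y) does not (for example x = 1e200, y = 1e-200), which is why the rewrite needs reassociation permission.

```cpp
#include <cassert>
#include <cmath>

// Original form: the repeated factor x sits inside the square root.
double sqrtOfProduct(double x, double y) { return std::sqrt(x * x * y); }

// Rewritten form: the repeated factor leaves the root as fabs(x).
double hoistedForm(double x, double y) { return std::fabs(x) * std::sqrt(y); }
```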
* Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
  (Akira Hatanaka, 2014-10-16; 1 file, -0/+31)
  The code committed in r219832 asserted when it attempted to shrink a switch statement whose type was wider than 64 bits.
  llvm-svn: 219902
* TRE: make TRE a bit more aggressive
  (Saleem Abdulrasool, 2014-10-16; 1 file, -8/+2)
  Make tail recursion elimination a bit more aggressive. This allows us to get tail recursion on functions that are just branches to a different function. The fact that the function takes a byval argument does not restrict it from being optimised into just a tail call.
  llvm-svn: 219899
* Revert r219832.
  (Akira Hatanaka, 2014-10-16; 1 file, -31/+0)
  llvm-svn: 219884
* Preserve non-byval pointer alignment attributes using @llvm.assume when inlining
  (Hal Finkel, 2014-10-15; 1 file, -0/+45)
  For pointer-typed function arguments, enhanced alignment can be asserted using the 'align' attribute. When inlining, if this enhanced alignment information is not otherwise available, preserve it using @llvm.assume-based alignment assumptions.
  llvm-svn: 219876
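An alignment assumption says, in effect, that the low bits of the pointer's address are zero. A C++ sketch of that predicate (the helper name is hypothetical). My understanding is that @llvm.assume-based alignment assumptions are usually written as this mask-and-compare, (ptrtoint p) & (align - 1) == 0, but treat that as an assumption rather than a description of the inliner's exact output.

```cpp
#include <cassert>
#include <cstdint>

// True if p's address is a multiple of align; align must be a power of two.
bool isAlignedTo(const void* p, std::uintptr_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 && "align must be a power of two");
  // Mask off everything except the low log2(align) bits; they must all be zero.
  return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}
```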
* Fixing the build failure due to compiler warnings and unnecessary disambiguation.
  (Chris Bieneman, 2014-10-15; 1 file, -3/+2)
  llvm-svn: 219861
* Defining a new API for debug options that doesn't rely on static global cl::opts.
  (Chris Bieneman, 2014-10-15; 1 file, -9/+17)
  Summary: This is based on the discussions from the LLVMDev thread: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html
  Reviewers: chandlerc
  Reviewed By: chandlerc
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5389
  llvm-svn: 219854
* InstCombine: Narrow switch instructions using known bits.
  (Akira Hatanaka, 2014-10-15; 1 file, -0/+31)
  Truncate the operands of a switch instruction to a narrower type if the upper bits are known to be all ones or zeros.
  rdar://problem/17720004
  llvm-svn: 219832
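A hypothetical C++ sketch of how the narrower width can be derived. Known bits are modeled as two 64-bit masks (bits proven zero, bits proven one), so the sketch only covers conditions of at most 64 bits. If the top k bits are all known zero, or all known one, they carry no information and the condition can be truncated to 64 - k bits. A real transform would also have to check that every case value agrees with those known leading bits; this sketch only computes the candidate width.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of consecutive 1 bits starting from the most significant bit.
unsigned countLeadingOnes64(uint64_t v) {
  unsigned n = 0;
  for (int bit = 63; bit >= 0 && ((v >> bit) & 1u) != 0; --bit)
    ++n;
  return n;
}

// knownZero / knownOne: masks of bits proven to be 0 / 1 in the switch condition.
unsigned narrowedSwitchWidth(uint64_t knownZero, uint64_t knownOne) {
  unsigned leadingZeros = countLeadingOnes64(knownZero);  // top bits known 0
  unsigned leadingOnes = countLeadingOnes64(knownOne);    // top bits known 1
  return 64 - std::max(leadingZeros, leadingOnes);
}
```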
* [SLPVectorize] Basic ephemeral-value awareness
  (Hal Finkel, 2014-10-15; 1 file, -3/+30)
  The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions).
  Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation).
  llvm-svn: 219816
* No need to cache this unused variable.
  (Eric Christopher, 2014-10-14; 1 file, -3/+1)
  Patch by Ehsan Akhgari.
  llvm-svn: 219749
* [LoopVectorize] Ignore @llvm.assume for cost estimates and legality
  (Hal Finkel, 2014-10-14; 1 file, -3/+32)
  A few minor changes to prevent @llvm.assume from interfering with loop vectorization. First, treat @llvm.assume like the lifetime intrinsics, which are scalarized (but don't otherwise interfere with the legality checking). Second, ignore the cost of ephemeral instructions in the loop (these will go away anyway during CodeGen).
  Alignment assumptions and other uses of @llvm.assume can often end up inside of loops that should be vectorized (this is not uncommon for assumptions generated by __attribute__((align_value(n))), for example).
  llvm-svn: 219741
* Optimize away fabs() calls when input is squared (known positive).
  (Sanjay Patel, 2014-10-14; 1 file, -1/+30)
  Eliminate library calls and intrinsic calls to fabs when the input is a squared value.
  Note that no unsafe-math / fast-math assumptions are needed for this optimization.
  Differential Revision: http://reviews.llvm.org/D5777
  llvm-svn: 219717
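A C++ sketch (hypothetical helper name) showing why fabs(x * x) can be replaced by x * x for non-NaN inputs: under IEEE-754 the product of two operands with the same sign is positive, which also covers -0.0 * -0.0 == +0.0 and overflow to +infinity, so fabs has nothing to clear. NaN inputs are not examined here.

```cpp
#include <cassert>
#include <cmath>

// True if fabs(x * x) equals x * x and the product's sign bit is already clear.
bool fabsOfSquareIsNoop(double x) {
  double sq = x * x;
  return std::fabs(sq) == sq && !std::signbit(sq);
}
```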
* InstCombine: Don't miscompile X % ((Pow2 << A) >>u B)
  (David Majnemer, 2014-10-14; 1 file, -7/+4)
  We assumed that A must be greater than B because the right hand side of a remainder operator must be nonzero.
  However, it is possible for A to be less than B if Pow2 is a power of two greater than 1.
  Take for example:
    i32 %A = 0
    i32 %B = 31
    i32 Pow2 = 2147483648
  ((Pow2 << 0) >>u 31) is non-zero but A is less than B.
  This fixes PR21274.
  llvm-svn: 219713
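The counterexample can be replayed in C++ with 32-bit unsigned arithmetic, where >> on an unsigned operand is a logical shift like LLVM's lshr (the helper name is hypothetical). With Pow2 = 2^31, A = 0 and B = 31, the divisor is 1, which is nonzero, so the remainder is well defined even though A < B.

```cpp
#include <cassert>
#include <cstdint>

// Divisor of X % ((Pow2 << A) >>u B) on 32-bit values. Shift amounts must be
// below 32, mirroring LLVM, where larger shift amounts yield poison.
uint32_t remDivisor(uint32_t pow2, uint32_t a, uint32_t b) {
  assert(a < 32 && b < 32);
  return (pow2 << a) >> b;
}
```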
* Switch to select optimization for two-case switches
  (Marcello Maggioni, 2014-10-14; 1 file, -0/+170)
  This is the same optimization as r219233, with modifications to support PHIs with multiple incoming edges from the same block and a test to check that this condition is handled.
  llvm-svn: 219656
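The commit names the transform but not its exact IR shape. Assuming the simple case of a switch with two cases and a default that each select a constant, the result behaves like a chain of selects. The C++ sketch below uses hypothetical functions and case values, and it does not model the PHI/incoming-edge details this commit is about.

```cpp
#include <cassert>

// Two-case switch whose arms only pick constants.
int viaSwitch(int x) {
  switch (x) {
  case 10: return 1;
  case 20: return 2;
  default: return 3;
  }
}

// The equivalent chain of selects, written as conditional expressions.
int viaSelect(int x) {
  return x == 10 ? 1 : (x == 20 ? 2 : 3);
}

// Compares both forms over an inclusive range of inputs.
bool switchMatchesSelect(int lo, int hi) {
  for (int x = lo; x <= hi; ++x)
    if (viaSwitch(x) != viaSelect(x)) return false;
  return true;
}
```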
* fix formatting; NFC
  (Sanjay Patel, 2014-10-14; 1 file, -33/+25)
  llvm-svn: 219645
OpenPOWER on IntegriCloud