path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* InstCombine: Check the operand types before merging fcmp ord & fcmp ord. | Benjamin Kramer | 2013-04-12 | 1 | -0/+3
| Fixes PR15737.
| llvm-svn: 179417
* SLPVectorizer: add support for vectorization of diamond shaped trees. | Nadav Rotem | 2013-04-12 | 2 | -46/+254
| We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
| llvm-svn: 179414
* Add debug prints. | Nadav Rotem | 2013-04-12 | 1 | -1/+5
| llvm-svn: 179412
* Simplify (A & ~B) in icmp if A is a power of 2 | David Majnemer | 2013-04-12 | 1 | -0/+9
| The transform will execute like so:
|   (A & ~B) == 0 --> (A & B) != 0
|   (A & ~B) != 0 --> (A & B) == 0
| llvm-svn: 179386
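The rewrite relies on A having exactly one bit set, so that bit is either in B or in ~B. A minimal sketch that checks this exhaustively on small bit widths; the helper names are hypothetical and this is plain Python, not InstCombine code:

```python
def rewrite_agrees(a: int, b: int, bits: int) -> bool:
    """Compare both sides of the two rewrites for one (A, B) pair."""
    mask = (1 << bits) - 1
    not_b = ~b & mask  # emulate a fixed-width bitwise NOT
    eq_ok = ((a & not_b) == 0) == ((a & b) != 0)
    ne_ok = ((a & not_b) != 0) == ((a & b) == 0)
    return eq_ok and ne_ok


def holds_for_powers_of_two(bits: int = 6) -> bool:
    """Check every power-of-two A against every B of the given width."""
    for shift in range(bits):
        a = 1 << shift
        for b in range(1 << bits):
            if not rewrite_agrees(a, b, bits):
                return False
    return True
```

With A = 3 and B = 1 the two sides disagree, which is why the power-of-2 precondition matters.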
* LoopVectorizer: integer division is not a reduction operation | Arnold Schwaighofer | 2013-04-12 | 1 | -2/+0
| Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
| For example: (1 / 2) * 4 != 4 / 2.
|
| Example:
|   int a[] = { 2, 5, 2, 2 };
|   int x = 80;
|   for()
|     x /= a[i];
|
| Scalar:
|   x /= 2 // = 40
|   x /= 5 // = 8
|   x /= 2 // = 4
|   x /= 2 // = 2
|
| Vectorized:
|   <80, 1> / <2, 5> // = <40, 0>
|   <40, 0> / <2, 2> // = <20, 0>
|   20 * 0 = 0
|
| radar://13640654
| llvm-svn: 179381
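The example in the message can be replayed directly. The sketch below assumes a two-lane vector whose extra lane starts at the multiplicative identity 1 and whose lanes are multiplied together at the end, as the message's trace suggests; the function names are hypothetical:

```python
def scalar_reduce(x: int, divisors: list[int]) -> int:
    """Apply x /= d for each divisor in order (non-negative inputs only)."""
    for d in divisors:
        x //= d
    return x


def vectorized_reduce(x: int, divisors: list[int], width: int = 2) -> int:
    """Mimic a division 'reduction' split across vector lanes.

    Assumes len(divisors) is a multiple of width.
    """
    lanes = [x] + [1] * (width - 1)  # lane 0 holds x, the rest hold 1
    for i in range(0, len(divisors), width):
        chunk = divisors[i:i + width]
        lanes = [lane // d for lane, d in zip(lanes, chunk)]
    result = 1
    for lane in lanes:  # combine the lanes by multiplication
        result *= lane
    return result
```

For x = 80 and a = {2, 5, 2, 2} the scalar loop yields 2 while the lane-wise version yields 0, matching the trace in the message.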
* Optimize icmp involving addition better | David Majnemer | 2013-04-11 | 1 | -0/+49
| Allows LLVM to optimize sequences like the following:
|   %add = add nsw i32 %x, 1
|   %cmp = icmp sgt i32 %add, %y
| into:
|   %cmp = icmp sge i32 %x, %y
| as well as:
|   %add1 = add nsw i32 %x, 20
|   %add2 = add nsw i32 %y, 57
|   %cmp = icmp sge i32 %add1, %add2
| into:
|   %add = add nsw i32 %y, 37
|   %cmp = icmp sle i32 %add, %x
| llvm-svn: 179316
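Both rewrites are only valid because nsw rules out signed overflow in the additions. A minimal sketch that checks them exhaustively on 8-bit signed values, skipping inputs where an add would overflow; the helper names are hypothetical:

```python
def fits(value: int, bits: int = 8) -> bool:
    """True if value is representable as a signed integer of this width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def first_rewrite_holds(bits: int = 8) -> bool:
    """(x + 1) sgt y  <=>  x sge y, when x + 1 does not overflow."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    for x in range(lo, hi):
        if not fits(x + 1, bits):
            continue  # nsw: the overflowing case is excluded
        for y in range(lo, hi):
            if ((x + 1) > y) != (x >= y):
                return False
    return True


def second_rewrite_holds(bits: int = 8) -> bool:
    """(x + 20) sge (y + 57)  <=>  (y + 37) sle x, without overflow."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    for x in range(lo, hi):
        for y in range(lo, hi):
            if not (fits(x + 20, bits) and fits(y + 57, bits)):
                continue  # nsw: the overflowing cases are excluded
            if ((x + 20) >= (y + 57)) != ((y + 37) <= x):
                return False
    return True
```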
* Fix for wrong instcombine on vector insert/extract | Benjamin Kramer | 2013-04-11 | 1 | -0/+4
| When trying to collapse sequences of insertelement/extractelement instructions into single shuffle instructions, there is one specific case where the Instruction Combiner wrongly updates the resulting Mask of shuffle indexes.
|
| The problem is in function CollectShuffleElements.
|
| If we have a sequence of insert/extract element instructions like the one below:
|   %tmp1 = extractelement <4 x float> %LHS, i32 0
|   %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
|   %tmp3 = extractelement <4 x float> %RHS, i32 2
|   %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3
|
| Where:
|   . %RHS will have a mask of [4,5,6,7]
|   . %LHS will have a mask of [0,1,2,3]
|
| The Mask of shuffle indexes is wrongly computed to [4,1,6,7] instead of [4,0,6,7].
|
| When analyzing %tmp2 in order to compute the Mask for the resulting shuffle instruction, the algorithm forgets to update the mask index at position 1 with the index associated to the element extracted from %LHS by instruction %tmp1.
|
| Patch by Andrea DiBiagio!
| llvm-svn: 179291
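The expected mask for %tmp2 can be derived by tracking where each lane comes from. The sketch below models vectors as lists of source lane numbers, using the numbering from the message (%LHS is 0-3, %RHS is 4-7). It is an illustration in Python, not the CollectShuffleElements implementation, and it deliberately stops at %tmp2 because the message only discusses position 1:

```python
LHS = [0, 1, 2, 3]  # lane numbers assigned to %LHS in the message
RHS = [4, 5, 6, 7]  # lane numbers assigned to %RHS in the message


def extractelement(vec: list[int], idx: int) -> int:
    """Return the source lane number stored at position idx."""
    return vec[idx]


def insertelement(vec: list[int], value: int, idx: int) -> list[int]:
    """Return a copy of vec with position idx replaced by value."""
    out = list(vec)
    out[idx] = value
    return out


def mask_after_tmp2() -> list[int]:
    """Mask describing %tmp2 in terms of %LHS/%RHS lane numbers."""
    tmp1 = extractelement(LHS, 0)
    return insertelement(RHS, tmp1, 1)
```

Position 1 ends up holding lane 0 of %LHS, which is the update the message says was being forgotten.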
* [ASan] Allow disabling init-order checks for globals by source file name. | Alexey Samsonov | 2013-04-11 | 1 | -1/+2
| llvm-svn: 179280
* Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file. | Benjamin Kramer | 2013-04-11 | 1 | -2/+2
| llvm-svn: 179272
* Make the SLP store-merger less paranoid about function calls. | Nadav Rotem | 2013-04-10 | 1 | -4/+0
| We check for function calls when we check if it is safe to sink instructions.
| llvm-svn: 179207
* We require DataLayout for analyzing the size of stores. | Nadav Rotem | 2013-04-10 | 2 | -1/+6
| llvm-svn: 179206
* Change CloneFunctionInto to always clone Argument attributes individually, | Joey Gouly | 2013-04-10 | 1 | -22/+19
| rather than checking if the source and destination have the same number of arguments and copying the attributes over directly.
| llvm-svn: 179169
* Fix some comment typos. | Bob Wilson | 2013-04-09 | 1 | -2/+2
| llvm-svn: 179132
* Add support for bottom-up SLP vectorization infrastructure. | Nadav Rotem | 2013-04-09 | 5 | -0/+707
| This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
| The infrastructure has three potential users:
|
|   1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
|   2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
|   3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.
|
| This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:
|
|   void SAXPY(int *x, int *y, int a, int i) {
|     x[i]   = a * x[i]   + y[i];
|     x[i+1] = a * x[i+1] + y[i+1];
|     x[i+2] = a * x[i+2] + y[i+2];
|     x[i+3] = a * x[i+3] + y[i+3];
|   }
|
| llvm-svn: 179117
* Redo the fix Benjamin Kramer committed in r178793 about iterator invalidation in Reassociate. | Shuxin Yang | 2013-04-08 | 1 | -12/+14
| I brazenly think this change is slightly simpler than r178793 because:
|   - no "state" in functor
|   - "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]"
|
| While I can reproduce the problem in Valgrind, it is rather difficult to come up with a standalone test case. The reason is that when an iterator is invalidated, the stale elements are not yet clobbered by nonsense data, so the optimizer can still proceed successfully.
|
| Thanks to Benjamin for fixing this bug and generously providing the test case.
| llvm-svn: 179062
* Fix PR15674 (and PR15603): a SROA think-o. | Chandler Carruth | 2013-04-07 | 1 | -0/+1
| The fix for PR14972 in r177055 introduced a real think-o in the *store* side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of memory access! Lock down the code path in the store rewriting which would do this to only handle the intended circumstance.
|
| All of the existing tests continue to pass, and I've added a test from the PR.
| llvm-svn: 178974
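The asymmetry between loads and stores can be seen on a toy byte buffer. This is a hedged illustration with hypothetical helpers, assuming little-endian layout; it is not SROA code:

```python
def store(memory: bytearray, offset: int, value: int, width: int) -> None:
    """Write value as `width` little-endian bytes starting at offset."""
    memory[offset:offset + width] = value.to_bytes(width, "little")


def load(memory: bytearray, offset: int, width: int) -> int:
    """Read `width` little-endian bytes starting at offset."""
    return int.from_bytes(memory[offset:offset + width], "little")


def demo() -> tuple[bytes, bytes, bool]:
    narrow = bytearray([0x11, 0x22, 0x33, 0x44])
    store(narrow, 0, 0xAA, 1)            # intended one-byte store
    widened = bytearray([0x11, 0x22, 0x33, 0x44])
    store(widened, 0, 0x00AA, 2)         # widened store also writes byte 1
    original = bytearray([0x11, 0x22, 0x33, 0x44])
    # A widened load followed by truncation still sees the same low byte.
    load_ok = (load(original, 0, 2) & 0xFF) == load(original, 0, 1)
    return bytes(narrow), bytes(widened), load_ok
```

The widened store overwrites the neighbouring byte 0x22, whereas the widened load can simply be truncated back.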
* Removed trailing whitespace. | Michael Gottesman | 2013-04-05 | 1 | -27/+27
| llvm-svn: 178932
* An objc_retain can serve as a use for a different pointer. | Michael Gottesman | 2013-04-05 | 1 | -2/+3
| This is the counterpart to commit r160637, except it performs the action in the bottomup portion of the data flow analysis.
| llvm-svn: 178922
* Properly model precise lifetime when given an incomplete dataflow sequence. | Michael Gottesman | 2013-04-05 | 1 | -6/+20
| The normal dataflow sequence in the ARC optimizer consists of the following states:
|
|   Retain -> CanRelease -> Use -> Release
|
| The optimizer before this patch stored the uses that determine the lifetime of the retainable object pointer when it bottom up hits a retain or when top down it hits a release. This is correct for an imprecise lifetime scenario since what we are trying to do is remove retains/releases while making sure that no ``CanRelease'' (which is usually a call) deallocates the given pointer before we get to the ``Use'' (since that would cause a segfault).
|
| If we are considering the precise lifetime scenario though, this is not correct. In such a situation, we *DO* care about the previous sequence, but additionally, we wish to track the uses resulting from the following incomplete sequences:
|
|   Retain -> CanRelease -> Release (TopDown)
|   Retain <- Use <- Release (BottomUp)
|
| *NOTE* This patch looks large but most of it consists of updating test cases. Additionally this fix exposed an additional bug. I removed the test case that expressed said bug and will recommit it with the fix in a little bit.
| llvm-svn: 178921
* Tidy up a bit. No functional change. | Jim Grosbach | 2013-04-05 | 9 | -259/+261
| llvm-svn: 178915
* Disable the optimization about promoting vector-element-access with symbolic index. | Shuxin Yang | 2013-04-05 | 1 | -11/+2
| This optimization is unstable at this moment; it
|   1) blocks us on a very important application
|   2) PR15200
|   3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll (the CHECK command compares the output against a wrong result)
|
| I personally believe this optimization should not have any impact on the autovectorized code, as the auto-vectorizer is supposed to put gather/scatter in a "right" way. Although in theory downstream optimizers might reveal some gather/scatter optimization opportunities, the chance is quite slim.
|
| For hand-crafted vectorized code, in terms of redundancy elimination, load-CSE, copy-propagation and DSE can collectively achieve the same result, but in a much simpler way. Moreover, these optimizers are able to improve the code in an incremental way; in contrast, SROA is sort of an all-or-none approach. However, SROA might slightly win in stack size, as it tries to figure out a stretch of memory that tightly covers the area accessed by the dynamic index.
|
| rdar://13174884
| PR15200
| llvm-svn: 178912
* Added two debug logging messages to VisitInstructionsTopDown to match VisitInstructionsBottomUp. | Michael Gottesman | 2013-04-05 | 1 | -0/+4
| llvm-svn: 178895
* Cleaned up whitespace and made debug logging less verbose. | Michael Gottesman | 2013-04-05 | 1 | -114/+95
| llvm-svn: 178893
* LoopVectorizer: Pass OperandValueKind information to the cost model | Arnold Schwaighofer | 2013-04-04 | 1 | -2/+13
| Pass down the fact that an operand is going to be a vector of constants.
|
| This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86 back. It had degraded to scalar performance due to my previous shift cost change that made all shifts expensive on x86.
|
| radar://13576547
| llvm-svn: 178809
* Reassociate: Avoid iterator invalidation. | Benjamin Kramer | 2013-04-04 | 1 | -7/+12
| OpndPtrs stored pointers into the Opnd vector that became invalid when the vector grew. Store indices instead. Sadly I only have a large testcase that only triggers under valgrind, so I didn't include it.
| llvm-svn: 178793
* Refactored out the helper method FindPredecessorAutoreleaseWithSafePath from ObjCARCOpt::OptimizeReturns. | Michael Gottesman | 2013-04-03 | 1 | -25/+45
| Now ObjCARCOpt::OptimizeReturns is easy to read and reason about.
| llvm-svn: 178715
* Refactored out the helper function FindPredecessorRetainWithSafePath from ObjCARCOpt::OptimizeReturns. | Michael Gottesman | 2013-04-03 | 1 | -18/+32
| llvm-svn: 178714
* Small cleanups. | Michael Gottesman | 2013-04-03 | 1 | -14/+14
| Cleaned up trailing whitespace and added extra slashes in front of a function level comment so that it follows the convention of having 3 slashes.
| llvm-svn: 178712
* Refactored out a part of ObjCARCOpt::OptimizeReturns into its own method HasSafePathToPredecessorCall. | Michael Gottesman | 2013-04-03 | 1 | -22/+33
| llvm-svn: 178710
* Removed an old comment. | Michael Gottesman | 2013-04-03 | 1 | -7/+0
| llvm-svn: 178709
* Clean up arc annotations by moving the top/bottom BB annotations into conditional macros that no-op in Release mode instead of #ifdef sections of the code. | Michael Gottesman | 2013-04-03 | 1 | -58/+46
| This is to follow the example of the DEBUG macro.
| llvm-svn: 178705
* Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue. | Michael Gottesman | 2013-04-03 | 1 | -16/+1
| The semantics of ARC implies that a pointer passed into an objc_autorelease must live until some point (potentially down the stack) where an autorelease pool is popped. On the other hand, an objc_autoreleaseReturnValue just signifies that the object must live until the end of the given function at least.
|
| Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in terms of the semantics of ARC*, implying that performing the given strength reduction without any knowledge of how this relates to the autorelease pool pop that is further up the stack violates the semantics of ARC.
|
| *Even though objc_autoreleaseReturnValue is more computationally expensive when you know that no RV optimization will occur.
| llvm-svn: 178612
* Improved comment. No functionality change. | Michael Gottesman | 2013-04-03 | 1 | -1/+2
| llvm-svn: 178605
* Use a worklist to avoid a sneaky iterator invalidation. | Bill Wendling | 2013-04-02 | 1 | -3/+3
| The iterator could be invalidated when it's recursively deleting a whole bunch of constant expressions in a constant initializer.
|
| Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was run on a `.ll' file, it wouldn't crash. This is why the test first pushes the `.ll' file through `llvm-as' before feeding it to `opt'.
|
| PR15440
| llvm-svn: 178531
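The general worklist pattern can be shown with a Python analogy. This is not the LLVM fix itself: the `live` set and `operands` map are hypothetical stand-ins for a container of constants and the operands that a deletion cascades to.

```python
def naive_destroy(live: set[str]) -> bool:
    """Remove entries while iterating the same set; returns False on failure."""
    try:
        for item in live:
            live.discard(item)  # mutates the container being iterated
    except RuntimeError:
        return False
    return True


def destroy_all(live: set[str], operands: dict[str, list[str]]) -> list[str]:
    """Remove every entry via a worklist, cascading to operands."""
    worklist = list(live)  # snapshot, unaffected by later removals
    destroyed = []
    while worklist:
        item = worklist.pop()
        if item not in live:
            continue  # already removed by an earlier cascade
        live.discard(item)
        destroyed.append(item)
        worklist.extend(operands.get(item, ()))
    return destroyed
```

Python reports the mutation as an error, whereas a stale C++ iterator may appear to work until the memory is reused, which fits the message's note that the crash was hard to reproduce.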
* Correct assertion condition | Shuxin Yang | 2013-04-01 | 1 | -1/+1
| llvm-svn: 178484
* Implement XOR reassociation. It is based on following rules: | Shuxin Yang | 2013-03-30 | 1 | -1/+325
|   rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2), only useful when c1=c2
|   rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
|   rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
|   rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2
|
| It reduces an application's size (in terms of # of instructions) by 8.9%.
|
| Reviewed by Pete Cooper. Thanks a lot!
|
| rdar://13212115
| llvm-svn: 178409
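The four rules are plain bitwise identities, so they can be checked by brute force. A minimal sketch over all 4-bit values of x, c1 and c2; the function name is hypothetical and the fixed-width NOT is emulated with a mask:

```python
def xor_rules_hold(bits: int = 4) -> bool:
    """Exhaustively verify rules 1-4 for every x, c1, c2 of the given width."""
    mask = (1 << bits) - 1
    for x in range(mask + 1):
        for c1 in range(mask + 1):
            not_c1 = ~c1 & mask  # fixed-width bitwise NOT
            for c2 in range(mask + 1):
                # rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1 ^ c2)
                if ((x | c1) ^ c2) != ((x & not_c1) ^ (c1 ^ c2)):
                    return False
                # rule 2: (x & c1) ^ (x & c2) = x & (c1 ^ c2)
                if ((x & c1) ^ (x & c2)) != (x & (c1 ^ c2)):
                    return False
                # rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3, c3 = c1 ^ c2
                c3 = c1 ^ c2
                if ((x | c1) ^ (x | c2)) != ((x & c3) ^ c3):
                    return False
                # rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, c3 = ~c1 ^ c2
                c3_rule4 = not_c1 ^ c2
                if ((x | c1) ^ (x & c2)) != ((x & c3_rule4) ^ c1):
                    return False
    return True
```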
* Add clang.arc.used to ModuleHasARC so ARC always runs if said call is present in a module. | Michael Gottesman | 2013-03-29 | 1 | -1/+2
| clang.arc.used is an interesting call for ARC since ObjCARCContract needs to run to remove said intrinsic to avoid a linker error (since the call does not exist).
| llvm-svn: 178369
* Removed trailing whitespace. | Michael Gottesman | 2013-03-29 | 1 | -15/+15
| llvm-svn: 178329
* Removed dead code from ObjCARCOpts relating to tracking objc_retainBlocks through the ARC Dataflow analysis. | Michael Gottesman | 2013-03-28 | 1 | -37/+6
| By the time we get to the ARC dataflow analysis, any objc_retainBlock calls are not optimizable.
| llvm-svn: 178306
* Minor simplification. | Bill Wendling | 2013-03-28 | 1 | -8/+4
| Go ahead and use the full path for both the .gcno and .gcda files.
| llvm-svn: 178302
* Non optimizable objc_retainBlock calls are not forwarding. | Michael Gottesman | 2013-03-28 | 1 | -3/+0
| Since we handle optimizable objc_retainBlocks through strength reduction in OptimizeIndividualCalls, we know that all code after that point will only see non-optimizable objc_retainBlock calls. IsForwarding is only called by functions after that point, so it is ok to just classify objc_retainBlock as non-forwarding.
|
| <rdar://problem/13249661>.
| llvm-svn: 178285
* [ObjCARC] Strength reduce objc_retainBlock -> objc_retain if the objc_retainBlock is optimizable. | Michael Gottesman | 2013-03-28 | 1 | -10/+45
| If an objc_retainBlock has the copy_on_escape metadata attached to it AND if the block pointer argument only escapes down the stack, we are allowed to strength reduce the objc_retainBlock to an objc_retain and thus optimize it.
|
| Currently there is logic in the ARC data flow analysis to handle this case, which is complicated and involves making distinctions between objc_retainBlock and objc_retain in certain places and considering them the same in others.
|
| This patch simplifies said code by:
|
|   1. Performing the strength reduction in the initial ARC peephole analysis (ObjCARCOpts::OptimizeIndividualCalls).
|   2. Changing the ARC dataflow analysis (which runs after the peephole analysis) to consider all objc_retainBlock calls to not be optimizable (since if the call was optimizable, we would have strength reduced it already).
|
| This patch leaves in the infrastructure in the ARC dataflow analysis to handle this case, which due to 2 will just be dead code. I am doing this on purpose to separate the removal of the old code from the testing of the new code.
|
| <rdar://problem/13249661>.
| llvm-svn: 178284
* [tsan] make sure memset/memcpy/memmove are not inlined in tsan mode | Kostya Serebryany | 2013-03-28 | 1 | -0/+52
| llvm-svn: 178230
* Check if Type is a vector before calling function Type::getVectorNumElements. | Akira Hatanaka | 2013-03-28 | 1 | -3/+4
| llvm-svn: 178208
* Use the full path when outputting the `.gcda' file. | Bill Wendling | 2013-03-26 | 1 | -5/+14
| If we compile a single source program, the `.gcda' file will be generated where the program was executed. This isn't desirable, because that location may be unpredictable (the program could call `chdir' for instance).
|
| Instead, we will output the `.gcda' file in the same place we output the `.gcno' file, i.e., the directory where the executable was generated. This matches GCC's behavior.
|
| <rdar://problem/13061072> & PR11809
| llvm-svn: 178084
* Make InstCombineCasts.cpp:OptimizeIntToFloatBitCast endian safe. | Ulrich Weigand | 2013-03-26 | 1 | -1/+9
| The OptimizeIntToFloatBitCast converts shift-truncate sequences into extractelement operations. The computation of the element index to be used in the resulting operation is currently only correct for little-endian targets.
|
| This commit fixes the element index computation to be correct for big-endian targets as well. If the target byte order is unknown, the optimization cannot be performed at all.
| llvm-svn: 178031
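Why the element index depends on byte order can be sketched by laying a vector out in memory and shifting the resulting wide integer. The helpers below are hypothetical and assume element widths that are multiples of 8 bits; mirroring the index for big-endian is an assumption about what "endian safe" means here, checked against Python's own byte-order handling rather than taken from the patch:

```python
def element_index(shift_bits: int, elem_bits: int, num_elems: int,
                  big_endian: bool):
    """Map a right-shift amount to a vector element index, or None."""
    if shift_bits % elem_bits != 0:
        return None  # the shift does not land on an element boundary
    idx = shift_bits // elem_bits
    if idx >= num_elems:
        return None
    return num_elems - 1 - idx if big_endian else idx


def extract_via_shift(elems: list[int], shift_bits: int, elem_bits: int,
                      byteorder: str) -> int:
    """Bitcast the vector to one wide integer, then shift and truncate."""
    raw = b"".join(e.to_bytes(elem_bits // 8, byteorder) for e in elems)
    wide = int.from_bytes(raw, byteorder)
    return (wide >> shift_bits) & ((1 << elem_bits) - 1)
```

For four 16-bit elements, a shift of 16 selects element 1 on a little-endian layout and element 2 on a big-endian one.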
* [ASan] Change the ABI of __asan_before_dynamic_init function: now it takes pointer to private string with module name. | Alexey Samsonov | 2013-03-26 | 1 | -17/+13
| This string serves as a unique module ID in ASan runtime. LLVM part
| llvm-svn: 178013
* [ObjCARC Annotations] Added support for displaying the state of pointers at the bottom/top of BBs of the ARC dataflow analysis for both bottomup and topdown analyses. | Michael Gottesman | 2013-03-26 | 2 | -4/+147
| This will allow for verification and analysis of the merge function of the data flow analyses in the ARC optimizer.
|
| The actual implementation of this feature is by introducing calls to the functions llvm.arc.annotation.{bottomup,topdown}.{bbstart,bbend} which are only declared. Each such call takes in a pointer to a global with the same name as the pointer whose provenance is being tracked and a pointer whose name is one of our Sequence states and points to a string that contains the same name.
|
| To ensure that the optimizer does not consider these annotations in any way, I made it so that the annotations are considered to be of IC_None type.
|
| A test case is included for this commit and the previous ObjCARCAnnotation commit.
| llvm-svn: 177952
* [ObjCARC Annotations] Implemented ARC annotation metadata to expose the ARC data flow analysis state in the IR via metadata. | Michael Gottesman | 2013-03-26 | 1 | -5/+186
| Previously the inner workings of the data flow analysis in ObjCARCOpts were hard to get out of the optimizer for analysis of bugs or testing. All of the current ARC unit tests are based off of testing the effect of the data flow analysis (i.e. what statements are removed or moved, etc.). This creates a weakness in the current unit testing regimen since we are not actually testing what effects various instructions have on the modeled pointer state. Additionally, in order to analyze a bug in the optimizer, one would need to track by hand what the optimizer was actually doing, either through use of DEBUG statements or through the usage of a debugger, both yielding large losses in developer productivity.
|
| This patch deals with these two issues by providing ARC annotation metadata that annotates instructions with the state changes that they cause in various pointers as well as provides metadata to annotate provenance sources.
|
| Specifically, we introduce the following metadata types:
|
|   1. llvm.arc.annotation.bottomup.
|   2. llvm.arc.annotation.topdown.
|   3. llvm.arc.annotation.provenancesource.
|
| llvm.arc.annotation.{bottomup,topdown}: These annotations describe a state change in a pointer when we are visiting instructions bottomup/topdown respectively. The output format for both is the same:
|
|   !1 = metadata !{metadata !"(test,%x)", metadata !"S_Release", metadata !"S_Use"}
|
| The first element is a string tuple with the following format:
|
|   (function,variable name)
|
| The second two elements of the metadata show the previous state of the pointer (in this case S_Release) and the new state of the pointer (S_Use). We write the metadata in such a manner to ensure that it is easy for outside tools to parse. This is important since I am currently working on a tool for taking this information and pretty printing it beside the IR and that can be used for LIT style testing via the generation of an index.
|
| llvm.arc.annotation.provenancesource: This metadata is used to annotate instructions which act as provenance sources, i.e. ones that introduce a new (from the optimizer's perspective) non-argument pointer to track. This enables cross-referencing between provenance sources and the state changes that occur to them.
|
| This is still a work in progress. Additionally I plan on committing later today additions to the annotations that annotate at the top/bottom of basic blocks the state of the various pointers being tracked.
|
| *NOTE* The metadata support is conditionally compiled into libObjCARCOpts only when we are producing a debug build of llvm/clang, and even so it is disabled by default. To enable the annotation metadata, pass in -enable-objc-arc-annotations to opt.
| llvm-svn: 177951
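As a rough idea of how an outside tool might read one of these annotation lines, here is a minimal sketch. The parser is hypothetical (it is not the tool the author mentions) and only handles the exact three-string shape shown in the message:

```python
import re

# Matches each quoted metadata string, e.g. metadata !"S_Use"
_STRING = re.compile(r'metadata !"([^"]*)"')
# Matches the leading tuple, e.g. (test,%x)
_TUPLE = re.compile(r"\(([^,]*),(.*)\)")


def parse_annotation(line: str) -> tuple[str, str, str, str]:
    """Return (function, variable, old_state, new_state) for one line."""
    fields = _STRING.findall(line)
    if len(fields) != 3:
        raise ValueError("expected three metadata strings")
    match = _TUPLE.fullmatch(fields[0])
    if match is None:
        raise ValueError("expected a (function,variable) tuple")
    return match.group(1), match.group(2), fields[1], fields[2]
```

Applied to the sample line from the message, this yields the function `test`, the variable `%x`, and the transition from `S_Release` to `S_Use`.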
* Fix a bug in fast-math fadd/fsub simplification. | Shuxin Yang | 2013-03-25 | 1 | -10/+43
| The problem is that the code mistakenly took for granted that the following constructor is able to create an APFloat from a *SIGNED* integer:
|
|   APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value)
|
| rdar://13486998
| llvm-svn: 177906
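The hazard can be illustrated outside of LLVM: if a negative integer is reinterpreted as an unsigned 64-bit quantity before conversion, the resulting float is a huge positive number. The sketch below is a Python analogy with hypothetical helpers. Building the value from the magnitude and then applying the sign is one possible remedy, not necessarily what the patch does:

```python
def as_unsigned_part(value: int, bits: int = 64) -> int:
    """Reinterpret a signed integer as an unsigned word of the given width."""
    return value & ((1 << bits) - 1)


def float_from_unsigned_only(value: int) -> float:
    """Mimic a conversion path that only understands unsigned input."""
    return float(as_unsigned_part(value))


def float_sign_aware(value: int) -> float:
    """Convert the magnitude, then restore the sign separately."""
    magnitude = float(as_unsigned_part(abs(value)))
    return -magnitude if value < 0 else magnitude
```

For example, -2 becomes roughly 1.8e19 on the unsigned-only path but -2.0 on the sign-aware path.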