path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
...
* LoopVectorizer: Use matcher from PatternMatch.h for the min/max patterns | Arnold Schwaighofer | 2013-04-19 | 1 | -104/+102
  Also make some static functions class functions to avoid having to mention the class namespace for enums all the time.
  No functionality change intended.
  llvm-svn: 179886
* Keep coding standard. Don't use "else if" after "return". | Jakub Staszak | 2013-04-19 | 1 | -3/+4
  llvm-svn: 179826
* Implement a better fix for PR15185. | Bill Wendling | 2013-04-18 | 1 | -6/+11
  If the return type is a pointer and the call returns an integer, then do the inttoptr conversion, and vice versa.
  llvm-svn: 179817
* Fix a -Wdocumentation warning | Dmitri Gribenko | 2013-04-18 | 1 | -1/+1
  llvm-svn: 179789
* In the function InstCombiner::visitExtractElementInst() removed the limitation that extract is promoted over a cast only if the cast has only one use. | Anat Shemer | 2013-04-18 | 1 | -4/+4
  llvm-svn: 179786
* Added a function scalarizePHI() that scalarizes a vector phi instruction if it has only 2 uses: one to promote the vector phi in a loop and the other use is an extract operation of one element at a constant location. | Anat Shemer | 2013-04-18 | 2 | -0/+78
  llvm-svn: 179783
* Fix a comment, PR15777. | Chris Lattner | 2013-04-18 | 1 | -2/+2
  llvm-svn: 179775
* LoopVectorizer: Recognize min/max reductions | Arnold Schwaighofer | 2013-04-18 | 1 | -34/+209
  A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y) sequence in LLVM. If we see such a sequence we can treat it just as any other commutative binary instruction and reduce it.

  This appears to help bzip2 by about 1.5% on an imac12,2.

  radar://12960601
  llvm-svn: 179773
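The reduction described in r179773 can be illustrated outside LLVM. The following is a minimal sketch, not the vectorizer's code: `scalarMin` writes the reduction in the select(cmp) shape from the commit message, and `laneMin` is a hypothetical 4-lane version that keeps one partial minimum per lane and combines the lanes after the loop. The function names and the factor `VF = 4` are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar min reduction in the select(cmp(lt, X, Y), X, Y) shape.
int scalarMin(const std::vector<int> &a) {
  assert(!a.empty() && "reduction needs at least one element");
  int m = a[0];
  for (std::size_t i = 1; i < a.size(); ++i)
    m = (a[i] < m) ? a[i] : m; // select(icmp slt a[i], m), a[i], m
  return m;
}

// Hypothetical 4-lane version: each lane keeps its own partial minimum and
// the lanes are combined once after the loop.
int laneMin(const std::vector<int> &a) {
  assert(!a.empty() && "reduction needs at least one element");
  constexpr std::size_t VF = 4; // illustrative vectorization factor
  std::array<int, VF> lanes;
  lanes.fill(a[0]); // min is idempotent, so a[0] is a safe start value
  std::size_t i = 0;
  for (; i + VF <= a.size(); i += VF)
    for (std::size_t l = 0; l < VF; ++l)
      lanes[l] = (a[i + l] < lanes[l]) ? a[i + l] : lanes[l];
  int m = *std::min_element(lanes.begin(), lanes.end()); // horizontal reduce
  for (; i < a.size(); ++i) // scalar tail for leftover elements
    m = (a[i] < m) ? a[i] : m;
  return m;
}
```

Because min is associative and commutative, the order in which partial results are combined does not matter, which is why both functions return the same value.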
* LoopVectorize: Use a set to avoid longer cycles in the reduction chain too. | Benjamin Kramer | 2013-04-18 | 1 | -8/+6
  Fixes PR15748.
  llvm-svn: 179757
* Revert "Combine bit test + conditional or into simple math" | David Majnemer | 2013-04-18 | 1 | -61/+0
  It is causing stage2 builds to fail; let's get them running again.
  llvm-svn: 179750
* Combine bit test + conditional or into simple math | David Majnemer | 2013-04-18 | 1 | -0/+61
  Simplify: (select (icmp eq (and X, C1), 0), Y, (or Y, C2))
  Into:     (or (shl (and X, C1), C3), Y)
  Where:    C3 = Log(C2) - Log(C1)
  If:       C1 and C2 are both powers of two
  llvm-svn: 179748
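The rewrite in r179748 can be checked by brute force on small inputs. This is a sketch under stated assumptions, not the InstCombine code: the helper names are invented, and a negative C3 (C1 larger than C2) is handled here by shifting right, which is my reading of the formula rather than something the commit message states.

```cpp
#include <cassert>
#include <cstdint>

static bool isPowerOf2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Floor of log2 for a non-zero value.
static int log2u(uint32_t v) {
  int n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// select (icmp eq (and X, C1), 0), Y, (or Y, C2)
uint32_t original(uint32_t x, uint32_t y, uint32_t c1, uint32_t c2) {
  return ((x & c1) == 0) ? y : (y | c2);
}

// or (shl (and X, C1), C3), Y   with C3 = Log(C2) - Log(C1)
uint32_t combined(uint32_t x, uint32_t y, uint32_t c1, uint32_t c2) {
  assert(isPowerOf2(c1) && isPowerOf2(c2) && "both constants must be powers of two");
  int c3 = log2u(c2) - log2u(c1);
  uint32_t bit = x & c1; // either 0 or c1
  // Assumption: a negative C3 means the bit moves right instead of left.
  uint32_t moved = (c3 >= 0) ? (bit << c3) : (bit >> -c3);
  return moved | y;
}

// Compare both forms for all 8-bit x and y.
bool agreeOn(uint32_t c1, uint32_t c2) {
  for (uint32_t x = 0; x < 256; ++x)
    for (uint32_t y = 0; y < 256; ++y)
      if (original(x, y, c1, c2) != combined(x, y, c1, c2))
        return false;
  return true;
}
```

The tested bit is moved into the position of C2, so the result is either 0 or C2, which is exactly what the select chose between.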
* [objc-arc] Do not mismatch up retains inside a for loop with releases outside said for loop in the presence of differing provenance caused by escaping blocks. | Michael Gottesman | 2013-04-18 | 1 | -96/+131
  This occurs due to an alloca representing a separate ownership from the original pointer. Thus consider the following pseudo-IR:

    objc_retain(%a)
    for (...) {
      objc_retain(%a)
      %block <- %a
      F(%block)
      objc_release(%block)
    }
    objc_release(%a)

  From the perspective of the optimizer, the %block is a separate provenance from the original %a. Thus the optimizer pairs up the inner retain for %a and the outer release from %a, resulting in segfaults.

  This is fixed by noting that the signature of a mismatch of retain/releases inside the for loop is a Use/CanRelease top down with a None bottom up (since bottom up the Retain-CanRelease-Use-Release sequence is completed by the inner objc_retain, but top down due to the differing provenance from the objc_release said sequence is not completed). In said case in CheckForCFGHazards, we now clear the state of %a implying that no pairing will occur.

  Additionally a test case is included.

  rdar://12969722
  llvm-svn: 179747
* Removed trailing whitespace. | Michael Gottesman | 2013-04-18 | 1 | -3/+3
  llvm-svn: 179746
* [objc-arc] Added annotation option to only emit annotations for a specific ssa identifier. | Michael Gottesman | 2013-04-17 | 1 | -0/+24
  llvm-svn: 179729
* Fixed typo. | Michael Gottesman | 2013-04-17 | 1 | -4/+4
  llvm-svn: 179721
* [objc-arc] Added descriptions for EnableARCAnnotations, EnableCheckForCFGHazards, EnableARCOptimizations. | Michael Gottesman | 2013-04-17 | 2 | -3/+7
  llvm-svn: 179718
* [objc-arc] Added an option to arc-annotations for turning off CheckForCFGHazard. | Michael Gottesman | 2013-04-17 | 1 | -0/+6
  llvm-svn: 179717
* Do not optimise fprintf() calls if its return value is used. | Peter Collingbourne | 2013-04-17 | 1 | -9/+12
  Differential Revision: http://llvm-reviews.chandlerc.com/D620
  llvm-svn: 179661
* simplifycfg: Fix integer overflow converting switch into icmp. | Hans Wennborg | 2013-04-16 | 1 | -1/+6
  If a switch instruction has a case for every possible value of its type, with the same successor, SimplifyCFG would replace it with an icmp ult, but the computation of the bound overflows in that case, which inverts the test.

  Patch by Jed Davis!
  llvm-svn: 179587
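The overflow described in r179587 is easy to reproduce with an 8-bit type. The sketch below is an illustration only: `naiveRangeTest` models a range check of the form (x - lo) <u (hi - lo + 1), and `guardedRangeTest` adds one possible guard. The function names are invented, and the guard is an assumption about how such a case could be handled, not a description of the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Range check for a switch whose cases cover lo..hi with one successor:
// (x - lo) <u (hi - lo + 1). For lo = 0, hi = 255 the bound wraps to 0,
// so the test is false for every x, the opposite of what is intended.
bool naiveRangeTest(uint8_t x, uint8_t lo, uint8_t hi) {
  uint8_t count = static_cast<uint8_t>(hi - lo + 1); // wraps for the full range
  return static_cast<uint8_t>(x - lo) < count;
}

// Hypothetical guard: a wrapped bound of 0 means every value is covered.
bool guardedRangeTest(uint8_t x, uint8_t lo, uint8_t hi) {
  assert(lo <= hi && "case range must be non-empty");
  uint8_t count = static_cast<uint8_t>(hi - lo + 1);
  if (count == 0)
    return true; // full range: the switch always takes the shared successor
  return static_cast<uint8_t>(x - lo) < count;
}
```

For partial ranges such as 10..20 both functions agree; they differ only when the cases span the whole type.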
* We are not able to bitcast a pointer to an integral value. | Bill Wendling | 2013-04-15 | 1 | -5/+5
  Two return types are not equivalent if one is a pointer and the other is an integral. This is because we cannot bitcast a pointer to an integral value.

  PR15185
  llvm-svn: 179569
* SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops. | Nadav Rotem | 2013-04-15 | 4 | -163/+256
  llvm-svn: 179562
* Fix a typo in comment. | Jim Grosbach | 2013-04-15 | 1 | -1/+1
  llvm-svn: 179542
* Add an option -vectorize-slp-aggressive for running the BB vectorizer. Make -fslp-vectorize run the slp-vectorizer. | Nadav Rotem | 2013-04-15 | 1 | -1/+12
  llvm-svn: 179508
* Rename the slp-vectorizer clang/llvm flags. No functionality change. | Nadav Rotem | 2013-04-15 | 1 | -3/+3
  llvm-svn: 179505
* SLPVectorizer: Add support for vectorizing trees that start at compare instructions. | Nadav Rotem | 2013-04-15 | 1 | -21/+40
  llvm-svn: 179504
* Reorders two transforms that collide with each other | David Majnemer | 2013-04-14 | 1 | -8/+8
  One performs: (X == 13 | X == 14) -> X-13 <u 2
  The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1

  The problem is that there are certain values of C1 and C2 that trigger both transforms but the first one blocks out the second; this generates suboptimal code.

  Reordering the transforms should be better in every case and allows us to do interesting stuff like turn:

    %shr = lshr i32 %X, 4
    %and = and i32 %shr, 15
    %add = add i32 %and, -14
    %tobool = icmp ne i32 %add, 0

  into:

    %and = and i32 %X, 240
    %tobool = icmp ne i32 %and, 224

  llvm-svn: 179493
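The equivalences quoted in r179493 can be verified numerically. The sketch below is illustrative and uses invented function names. It assumes the mask form is only valid when C1 and C2 differ in exactly one bit and C1 has that bit clear, so it is exercised with 12 and 13 rather than 13 and 14.

```cpp
#include <cassert>
#include <cstdint>

// First transform: (X == 13 | X == 14) -> X-13 <u 2
bool eitherEq(uint32_t x) { return x == 13 || x == 14; }
bool rangeForm(uint32_t x) { return x - 13u < 2u; } // unsigned wrap-around compare

// Second transform: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1
// Assumption: C1 ^ C2 is a single bit and that bit is clear in C1.
bool maskForm(uint32_t a, uint32_t c1, uint32_t c2) {
  uint32_t diff = c1 ^ c2;
  assert(diff != 0 && (diff & (diff - 1)) == 0 && (c1 & diff) == 0);
  return (a & ~diff) == c1;
}

// The IR sequence before the reordering, written with 32-bit wrap-around.
bool beforeIR(uint32_t x) {
  uint32_t shr = x >> 4;
  uint32_t masked = shr & 15u;
  uint32_t add = masked + static_cast<uint32_t>(-14);
  return add != 0;
}

// The shorter sequence produced after the reordering.
bool afterIR(uint32_t x) { return (x & 240u) != 224u; }

// Check all three equivalences on a small input range.
bool transformsAgree() {
  for (uint32_t x = 0; x < 4096; ++x) {
    if (eitherEq(x) != rangeForm(x))
      return false;
    if ((x == 12 || x == 13) != maskForm(x, 12, 13))
      return false;
    if (beforeIR(x) != afterIR(x))
      return false;
  }
  return true;
}
```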
* Miscellaneous cleanups for VecUtils.h | Benjamin Kramer | 2013-04-14 | 1 | -9/+6
  llvm-svn: 179483
* SLP: Document the scalarization cost method. | Nadav Rotem | 2013-04-14 | 1 | -3/+10
  llvm-svn: 179479
* SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree. | Nadav Rotem | 2013-04-14 | 3 | -7/+25
  llvm-svn: 179475
* SLPVectorizer: add initial support for reduction variable vectorization. | Nadav Rotem | 2013-04-14 | 3 | -7/+95
  llvm-svn: 179470
* GlobalDCE: Fix an oversight in my last commit that could lead to crashes. | Benjamin Kramer | 2013-04-13 | 1 | -2/+2
  There is a Constant with non-constant operands: blockaddress.
  llvm-svn: 179460
* Fix a scalability issue with complex ConstantExprs. | Benjamin Kramer | 2013-04-13 | 1 | -4/+9
  This is basically the same fix in three different places. We use a set to avoid walking the whole tree of a big ConstantExpr multiple times. For example:

    (select cmp, (add big_expr 1), (add big_expr 2))

  We don't want to visit big_expr twice here; it may consist of thousands of nodes.

  The testcase exercises this by creating an insanely large ConstantExpr out of a loop. It's questionable if the optimizer should ever create those, but this can be triggered with real C code.

  Fixes PR15714.
  llvm-svn: 179458
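The visited-set idea from r179458 can be shown on a toy expression graph. This is a sketch with a made-up `Expr` type, not LLVM's ConstantExpr API: `visitNaive` walks every path, while `visitOnce` records nodes in a set so that a shared subexpression is expanded a single time.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a constant expression node.
struct Expr {
  std::vector<const Expr *> operands;
};

// Walks the expression as a tree: shared operands are revisited on every path.
std::size_t visitNaive(const Expr *e) {
  std::size_t visits = 1;
  for (const Expr *op : e->operands)
    visits += visitNaive(op);
  return visits;
}

// Walks the expression as a DAG: the set ensures each node is expanded once.
std::size_t visitOnce(const Expr *root) {
  assert(root && "need a root expression");
  std::unordered_set<const Expr *> seen;
  std::vector<const Expr *> worklist{root};
  std::size_t visits = 0;
  while (!worklist.empty()) {
    const Expr *e = worklist.back();
    worklist.pop_back();
    if (!seen.insert(e).second)
      continue; // already handled via another user
    ++visits;
    for (const Expr *op : e->operands)
      worklist.push_back(op);
  }
  return visits;
}

// Builds a chain in which every node uses the previous node twice, similar
// to (select cmp, (add big_expr 1), (add big_expr 2)) sharing big_expr.
std::vector<Expr> buildSharedChain(std::size_t depth) {
  std::vector<Expr> nodes(depth + 1); // sized up front, so pointers stay valid
  for (std::size_t i = 1; i <= depth; ++i)
    nodes[i].operands = {&nodes[i - 1], &nodes[i - 1]};
  return nodes;
}
```

With a chain of depth 10 the naive walk performs 2047 visits, while the set-based walk touches each of the 11 nodes once. The naive count doubles with every additional level.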
* InstCombine: Check the operand types before merging fcmp ord & fcmp ord. | Benjamin Kramer | 2013-04-12 | 1 | -0/+3
  Fixes PR15737.
  llvm-svn: 179417
* SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from. | Nadav Rotem | 2013-04-12 | 2 | -46/+254
  llvm-svn: 179414
* Add debug prints. | Nadav Rotem | 2013-04-12 | 1 | -1/+5
  llvm-svn: 179412
* Simplify (A & ~B) in icmp if A is a power of 2 | David Majnemer | 2013-04-12 | 1 | -0/+9
  The transform will execute like so:
    (A & ~B) == 0 --> (A & B) != 0
    (A & ~B) != 0 --> (A & B) == 0
  llvm-svn: 179386
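The power-of-two precondition in r179386 is what makes the rewrite sound: when A has a single bit set, (A & ~B) is zero exactly when B contains that bit. A small brute-force sketch (function names are invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

bool lhsForm(uint32_t a, uint32_t b) { return (a & ~b) == 0; }
bool rhsForm(uint32_t a, uint32_t b) { return (a & b) != 0; }

// Checks the equivalence for every single-bit A in the low 16 bits and
// every 16-bit B.
bool holdsForPowersOf2() {
  for (int k = 0; k < 16; ++k) {
    uint32_t a = 1u << k;
    assert((a & (a - 1)) == 0 && "a must be a power of two");
    for (uint32_t b = 0; b < 65536; ++b)
      if (lhsForm(a, b) != rhsForm(a, b))
        return false;
  }
  return true;
}
```

Without the precondition the two forms diverge. For A = 3 and B = 1, (A & ~B) is 2, so the left form is false, while (A & B) is 1, so the right form is true.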
* LoopVectorizer: integer division is not a reduction operation | Arnold Schwaighofer | 2013-04-12 | 1 | -2/+0
  Don't classify idiv/udiv as a reduction operation. Integer division is lossy. For example: (1 / 2) * 4 != 4/2.

  Example:
    int a[] = { 2, 5, 2, 2}
    int x = 80;
    for()
      x /= a[i];

  Scalar:
    x /= 2 // = 40
    x /= 5 // = 8
    x /= 2 // = 4
    x /= 2 // = 2

  Vectorized:
    <80, 1> / <2,5> //= <40,0>
    <40, 0> / <2,2> //= <20,0>
    20*0 = 0

  radar://13640654
  llvm-svn: 179381
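The example from r179381 can be run directly. The sketch below reproduces the numbers in the commit message: `scalarDivide` is the original loop and `twoLaneDivide` is a hypothetical 2-lane split in which the second lane starts at the identity value 1 and the lanes are multiplied together at the end. Both function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The original scalar loop: x /= a[i] for every element.
int scalarDivide(int x, const std::vector<int> &a) {
  for (int d : a)
    x /= d;
  return x;
}

// Treating the division as a reduction over two lanes: lane 0 starts at x,
// lane 1 starts at 1, and the partial results are multiplied afterwards.
int twoLaneDivide(int x, const std::vector<int> &a) {
  assert(a.size() % 2 == 0 && "sketch handles an even element count only");
  int lane0 = x;
  int lane1 = 1;
  for (std::size_t i = 0; i < a.size(); i += 2) {
    lane0 /= a[i];
    lane1 /= a[i + 1]; // 1 / 5 truncates to 0 and the information is lost
  }
  return lane0 * lane1;
}
```

With a = {2, 5, 2, 2} and x = 80 the scalar loop yields 2, while the two-lane version yields 20 * 0 = 0, matching the commit message.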
* Optimize icmp involving addition better | David Majnemer | 2013-04-11 | 1 | -0/+49
  Allows LLVM to optimize sequences like the following:

    %add = add nsw i32 %x, 1
    %cmp = icmp sgt i32 %add, %y

  into:

    %cmp = icmp sge i32 %x, %y

  as well as:

    %add1 = add nsw i32 %x, 20
    %add2 = add nsw i32 %y, 57
    %cmp = icmp sge i32 %add1, %add2

  into:

    %add = add nsw i32 %y, 37
    %cmp = icmp sle i32 %add, %x

  llvm-svn: 179316
* Fix for wrong instcombine on vector insert/extract | Benjamin Kramer | 2013-04-11 | 1 | -0/+4
  When trying to collapse sequences of insertelement/extractelement instructions into single shuffle instructions, there is one specific case where the Instruction Combiner wrongly updates the resulting Mask of shuffle indexes.

  The problem is in function CollectShuffleElements. If we have a sequence of insert/extract element instructions like the one below:

    %tmp1 = extractelement <4 x float> %LHS, i32 0
    %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
    %tmp3 = extractelement <4 x float> %RHS, i32 2
    %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3

  Where:
    . %RHS will have a mask of [4,5,6,7]
    . %LHS will have a mask of [0,1,2,3]

  The Mask of shuffle indexes is wrongly computed to [4,1,6,7] instead of [4,0,6,7].

  When analyzing %tmp2 in order to compute the Mask for the resulting shuffle instruction, the algorithm forgets to update the mask index at position 1 with the index associated to the element extracted from %LHS by instruction %tmp1.

  Patch by Andrea DiBiagio!
  llvm-svn: 179291
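The mask update for %tmp2 in r179291 can be modelled with plain arrays. This sketch covers only the %tmp1/%tmp2 step from the commit message and uses invented helper names. `shuffle` follows the convention quoted there: indexes 0 to 3 select from %LHS and 4 to 7 select from %RHS.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mask4 = std::array<int, 4>;

// Shuffle of two 4-element vectors: mask values 0..3 pick from lhs,
// values 4..7 pick from rhs.
Vec4 shuffle(const Vec4 &lhs, const Vec4 &rhs, const Mask4 &mask) {
  Vec4 out{};
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] >= 0 && mask[i] < 8 && "mask index out of range");
    out[i] = (mask[i] < 4) ? lhs[mask[i]] : rhs[mask[i] - 4];
  }
  return out;
}

// Direct evaluation of:
//   %tmp1 = extractelement %LHS, 0
//   %tmp2 = insertelement %RHS, %tmp1, 1
Vec4 insertExtract(const Vec4 &lhs, const Vec4 &rhs) {
  float tmp1 = lhs[0];
  Vec4 tmp2 = rhs;
  tmp2[1] = tmp1;
  return tmp2;
}
```

Position 1 of the result holds element 0 of %LHS, so the mask entry must be 0. Leaving it at 1 would select element 1 of %LHS instead, which is the wrong value.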
* [ASan] Allow disabling init-order checks for globals by source file name. | Alexey Samsonov | 2013-04-11 | 1 | -1/+2
  llvm-svn: 179280
* Rename the C function to create a SLPVectorizerPass to something sane and expose it in the header file. | Benjamin Kramer | 2013-04-11 | 1 | -2/+2
  llvm-svn: 179272
* Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions. | Nadav Rotem | 2013-04-10 | 1 | -4/+0
  llvm-svn: 179207
* We require DataLayout for analyzing the size of stores. | Nadav Rotem | 2013-04-10 | 2 | -1/+6
  llvm-svn: 179206
* Change CloneFunctionInto to always clone Argument attributes individually, | Joey Gouly | 2013-04-10 | 1 | -22/+19
  rather than checking if the source and destination have the same number of arguments and copying the attributes over directly.
  llvm-svn: 179169
* Fix some comment typos. | Bob Wilson | 2013-04-09 | 1 | -2/+2
  llvm-svn: 179132
* Add support for bottom-up SLP vectorization infrastructure. | Nadav Rotem | 2013-04-09 | 5 | -0/+707
  This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations. The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).
  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.
  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

  This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

    void SAXPY(int *x, int *y, int a, int i) {
      x[i] = a * x[i] + y[i];
      x[i+1] = a * x[i+1] + y[i+1];
      x[i+2] = a * x[i+2] + y[i+2];
      x[i+3] = a * x[i+3] + y[i+3];
    }

  llvm-svn: 179117
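What an SLP vectorizer does with the SAXPY function from r179117 can be sketched by hand. `SAXPY` below is the function from the commit message. `saxpyPacked` is a hypothetical rewrite that groups the four consecutive statements into load, multiply-add, and store steps on 4-element packs. It assumes that `x` and `y` do not overlap, and it models the packs with `std::array` rather than real vector instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar form, as given in the commit message.
void SAXPY(int *x, int *y, int a, int i) {
  x[i] = a * x[i] + y[i];
  x[i + 1] = a * x[i + 1] + y[i + 1];
  x[i + 2] = a * x[i + 2] + y[i + 2];
  x[i + 3] = a * x[i + 3] + y[i + 3];
}

// Hand-packed form: the four isomorphic statements become one wide load
// per array, one wide multiply-add, and one wide store.
// Assumption: x and y do not alias.
void saxpyPacked(int *x, const int *y, int a, int i) {
  assert(i >= 0 && "index must be non-negative");
  constexpr std::size_t W = 4; // pack width, one slot per original statement
  std::array<int, W> vx, vy;
  for (std::size_t l = 0; l < W; ++l) { // wide loads of consecutive elements
    vx[l] = x[i + l];
    vy[l] = y[i + l];
  }
  for (std::size_t l = 0; l < W; ++l) // a * x + y on every slot
    vx[l] = a * vx[l] + vy[l];
  for (std::size_t l = 0; l < W; ++l) // wide store back to x
    x[i + l] = vx[l];
}
```

The packing is possible because the four statements perform the same operations on adjacent memory locations, which is the pattern the bottom-up analysis starts from.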
* Redo the fix Benjamin Kramer committed in r178793 about iterator invalidation in Reassociate. | Shuxin Yang | 2013-04-08 | 1 | -12/+14
  I brazenly think this change is slightly simpler than r178793 because:
    - no "state" in functor
    - "OpndPtrs[i]" looks simpler than "&Opnds[OpndIndices[i]]"

  While I can reproduce the problem in Valgrind, it is rather difficult to come up with a standalone test case. The reason is that when an iterator is invalidated, the stale invalidated elements are not yet clobbered by nonsense data, so the optimizer can still proceed successfully.

  Thanks to Benjamin for fixing this bug and generously providing the test case.
  llvm-svn: 179062
* Fix PR15674 (and PR15603): a SROA think-o. | Chandler Carruth | 2013-04-07 | 1 | -0/+1
  The fix for PR14972 in r177055 introduced a real think-o in the *store* side, likely because I was much more focused on the load side. While we can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily widen a value to be stored, as that changes the width of memory access! Lock down the code path in the store rewriting which would do this to only handle the intended circumstance.

  All of the existing tests continue to pass, and I've added a test from the PR.
  llvm-svn: 178974
* Removed trailing whitespace. | Michael Gottesman | 2013-04-05 | 1 | -27/+27
  llvm-svn: 178932
* An objc_retain can serve as a use for a different pointer. | Michael Gottesman | 2013-04-05 | 1 | -2/+3
  This is the counterpart to commit r160637, except it performs the action in the bottom-up portion of the data flow analysis.
  llvm-svn: 178922