path: root/llvm/lib/Transforms/Scalar
Commit message | Author | Age | Files | Lines
* Fix (hopefully the last) issue where LSR is nondeterministic. When pulling
  out CSEs of base expressions, it could build a result whose order was
  nondeterministic.
  llvm-svn: 23698 | Chris Lattner | 2005-10-11 | 1 file | -8/+14
* Fix another problem where LSR was being nondeterministic. Also remove
  elements from the end of a vector instead of the beginning.
  llvm-svn: 23697 | Chris Lattner | 2005-10-11 | 1 file | -10/+16
* Fix another lsr-is-nondeterministic case
  llvm-svn: 23695 | Chris Lattner | 2005-10-11 | 1 file | -6/+10
* Make MaskedValueIsZero a bit more aggressive
  llvm-svn: 23677 | Chris Lattner | 2005-10-09 | 1 file | -3/+9
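The commit does not show how MaskedValueIsZero works internally. The sketch below illustrates the general idea behind such a query, under stated assumptions: it runs over a hypothetical miniature IR (`Expr`, `mk`, `knownZero` are invented names, not LLVM APIs), uses 32-bit values, and only handles shift amounts from 0 to 31. It computes which bits are provably zero and then checks them against the mask.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical miniature IR with just enough node kinds to show the idea.
// Assumes 32-bit values and constant shift amounts in the range [0, 31].
struct Expr {
  enum Kind { Const, Unknown, And, Or, Shl, LShr };
  Kind K = Unknown;
  uint32_t Value = 0;  // constant value, or the constant shift amount
  std::shared_ptr<Expr> LHS, RHS;
};
using ExprP = std::shared_ptr<Expr>;

ExprP mk(Expr::Kind K, uint32_t V = 0, ExprP L = nullptr, ExprP R = nullptr) {
  auto E = std::make_shared<Expr>();
  E->K = K;
  E->Value = V;
  E->LHS = L;
  E->RHS = R;
  return E;
}

// Returns the set of bits that are provably zero in E.
uint32_t knownZero(const ExprP &E) {
  switch (E->K) {
    case Expr::Const:
      return ~E->Value;
    case Expr::Unknown:
      return 0;
    case Expr::And:  // a bit is zero if it is zero in either operand
      return knownZero(E->LHS) | knownZero(E->RHS);
    case Expr::Or:   // a bit is zero only if it is zero in both operands
      return knownZero(E->LHS) & knownZero(E->RHS);
    case Expr::Shl:  // bits shifted in at the bottom are zero
      return (knownZero(E->LHS) << E->Value) | ((1u << E->Value) - 1);
    case Expr::LShr: // bits shifted in at the top are zero
      return (knownZero(E->LHS) >> E->Value) | ~(~0u >> E->Value);
  }
  return 0;
}

// True if every bit set in Mask is known to be zero in E.
bool maskedValueIsZero(const ExprP &E, uint32_t Mask) {
  return (knownZero(E) & Mask) == Mask;
}
```

For example, `x << 6` has its low six bits known zero, so `maskedValueIsZero(mk(Expr::Shl, 6, x), 63)` holds for any unknown `x`. A real implementation handles many more opcodes and caps its recursion depth.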
* Fix funky Xcode indentation
  llvm-svn: 23674 | Chris Lattner | 2005-10-09 | 1 file | -50/+50
* Hrm, you didn't see this.
  llvm-svn: 23673 | Chris Lattner | 2005-10-09 | 1 file | -3/+0
* Fix a source of nondeterminism in the backend: the order of processing
  IV strides depended on the pointer order of the strides in memory.
  Nondeterminism is bad.
  llvm-svn: 23672 | Chris Lattner | 2005-10-09 | 1 file | -6/+25
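Iterating a container ordered or hashed by pointer value makes the processing order depend on where the allocator happened to place objects, which changes from run to run. One common remedy, sketched here as an assumption rather than as this commit's actual change, is to record keys in first-seen order in a vector and use a hash map only for lookup. `InsertionOrderedSet` is a hypothetical name; the idea is similar in spirit to LLVM's `SetVector`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper: remembers keys in first-insertion order so that
// iteration never depends on pointer values or hash-table layout.
template <typename Key>
class InsertionOrderedSet {
 public:
  // Returns true if K was newly added.
  bool insert(const Key &K) {
    auto Result = Index.emplace(K, Order.size());
    if (!Result.second) return false;  // already present; keep first position
    Order.push_back(K);
    return true;
  }
  // Keys in the order they were first seen.
  const std::vector<Key> &ordered() const { return Order; }
  std::size_t size() const { return Order.size(); }

 private:
  std::unordered_map<Key, std::size_t> Index;  // lookup only, never iterated
  std::vector<Key> Order;                      // deterministic iteration order
};
```

Strides inserted as pointers come back out in the order the pass discovered them, regardless of their addresses.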
* Remove useless variable.
  llvm-svn: 23656 | Jeff Cohen | 2005-10-07 | 1 file | -1/+1
* Make IVUseShouldUsePostIncValue more aggressive when the use is a PHI. In
  particular, it should realize that PHIs use their values in the pred block,
  not the PHI block itself. This change turns our em3d loop from this:
    _test:
            cmpwi cr0, r4, 0
            bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
    LBB_test_1: ; entry.loopexit_crit_edge
            li r2, 0
            b LBB_test_6 ; loopexit
    LBB_test_2: ; entry.no_exit_crit_edge
            li r6, 0
    LBB_test_3: ; no_exit
            or r2, r6, r6
            lwz r6, 0(r3)
            cmpw cr0, r6, r5
            beq cr0, LBB_test_6 ; loopexit
    LBB_test_4: ; endif
            addi r3, r3, 4
            addi r6, r2, 1
            cmpw cr0, r6, r4
            blt cr0, LBB_test_3 ; no_exit
    LBB_test_5: ; endif.loopexit.loopexit_crit_edge
            addi r3, r2, 1
            blr
    LBB_test_6: ; loopexit
            or r3, r2, r2
            blr
  into:
    _test:
            cmpwi cr0, r4, 0
            bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
    LBB_test_1: ; entry.loopexit_crit_edge
            li r2, 0
            b LBB_test_5 ; loopexit
    LBB_test_2: ; entry.no_exit_crit_edge
            li r6, 0
    LBB_test_3: ; no_exit
            lwz r2, 0(r3)
            cmpw cr0, r2, r5
            or r2, r6, r6
            beq cr0, LBB_test_5 ; loopexit
    LBB_test_4: ; endif
            addi r3, r3, 4
            addi r6, r6, 1
            cmpw cr0, r6, r4
            or r2, r6, r6
            blt cr0, LBB_test_3 ; no_exit
    LBB_test_5: ; loopexit
            or r3, r2, r2
            blr
  Unfortunately, this is actually worse code, because the register coalescer
  is getting confused somehow. If it were doing its job right, it could turn
  the code into this:
    _test:
            cmpwi cr0, r4, 0
            bgt cr0, LBB_test_2 ; entry.no_exit_crit_edge
    LBB_test_1: ; entry.loopexit_crit_edge
            li r6, 0
            b LBB_test_5 ; loopexit
    LBB_test_2: ; entry.no_exit_crit_edge
            li r6, 0
    LBB_test_3: ; no_exit
            lwz r2, 0(r3)
            cmpw cr0, r2, r5
            beq cr0, LBB_test_5 ; loopexit
    LBB_test_4: ; endif
            addi r3, r3, 4
            addi r6, r6, 1
            cmpw cr0, r6, r4
            blt cr0, LBB_test_3 ; no_exit
    LBB_test_5: ; loopexit
            or r3, r6, r6
            blr
  ... which I'll work on next. :)
  llvm-svn: 23604 | Chris Lattner | 2005-10-03 | 1 file | -6/+38
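The rule here, combined with the older 2005-09-12 commit further down (post-increment values may only be used where the latch block dominates the use), can be sketched as follows. This is an illustration under assumptions, not LoopStrengthReduce's code: blocks are plain integers, dominance comes from a precomputed immediate-dominator array, and `IVUse`, `effectiveUseBlock`, and `shouldUsePostInc` are hypothetical names. The key step is that a PHI's operand is read at the end of the incoming predecessor, so that predecessor is where the dominance check applies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CFG encoding: blocks are numbered, and Idom[B] is the
// immediate dominator of block B (-1 for the entry block).
bool dominates(const std::vector<int> &Idom, int A, int B) {
  for (int X = B; X != -1; X = Idom[X])
    if (X == A) return true;
  return false;
}

// A use of the induction variable. For a PHI user, IncomingBlock is the
// predecessor the value flows in from.
struct IVUse {
  int UserBlock;
  bool UserIsPhi;
  int IncomingBlock;
};

// A PHI reads its operand at the end of the incoming predecessor, so that
// predecessor (not the PHI's own block) is where the use really happens.
int effectiveUseBlock(const IVUse &U) {
  return U.UserIsPhi ? U.IncomingBlock : U.UserBlock;
}

// The post-incremented value is only valid where the latch dominates the use.
bool shouldUsePostInc(const std::vector<int> &Idom, int Latch, const IVUse &U) {
  return dominates(Idom, Latch, effectiveUseBlock(U));
}
```

With blocks 0 = entry, 1 = header, 2 = latch, and 3 = an exit reachable from both entry and latch, a PHI in block 3 fed from block 2 may use the post-increment value, while an ordinary instruction in block 3 may not.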
* Refactor some code into a function
  llvm-svn: 23603 | Chris Lattner | 2005-10-03 | 1 file | -7/+23
* This break is bogus and I have no idea why it was there. Basically it
  prevents memoizing code when IVs are used by PHI nodes outside of loops. In a
  simple example, we were getting this code before (note that r6 and r7 are
  isomorphic IVs):
            li r6, 0
            or r7, r6, r6
    LBB_test_3: ; no_exit
            lwz r2, 0(r3)
            cmpw cr0, r2, r5
            or r2, r7, r7
            beq cr0, LBB_test_5 ; loopexit
    LBB_test_4: ; endif
            addi r2, r7, 1
            addi r7, r7, 1
            addi r3, r3, 4
            addi r6, r6, 1
            cmpw cr0, r6, r4
            blt cr0, LBB_test_3 ; no_exit
  Now we get:
            li r6, 0
    LBB_test_3: ; no_exit
            or r2, r6, r6
            lwz r6, 0(r3)
            cmpw cr0, r6, r5
            beq cr0, LBB_test_6 ; loopexit
    LBB_test_4: ; endif
            addi r3, r3, 4
            addi r6, r2, 1
            cmpw cr0, r6, r4
            blt cr0, LBB_test_3 ; no_exit
  This was noticed in em3d.
  llvm-svn: 23602 | Chris Lattner | 2005-10-03 | 1 file | -1/+0
* When checking if we should move a split edge block outside of a loop,
  check the presplit pred, not the post-split pred. This was causing us to
  make the wrong decision in some cases, leaving the critical edge block in
  the loop.
  llvm-svn: 23601 | Chris Lattner | 2005-10-03 | 1 file | -7/+6
* Fix VC++ warnings.
  llvm-svn: 23579 | Jeff Cohen | 2005-10-01 | 1 file | -1/+0
* Insert stores after PHI nodes in the normal dest. This fixes
  LowerInvoke/2005-08-03-InvokeWithPHI.ll
  llvm-svn: 23525 | Chris Lattner | 2005-09-29 | 1 file | -2/+5
* Add a note about a way to improve this code further, that I won't be
  getting to right now.
  llvm-svn: 23485 | Chris Lattner | 2005-09-27 | 1 file | -0/+8
* Avoid spilling stack slots... to stack slots.
  llvm-svn: 23478 | Chris Lattner | 2005-09-27 | 1 file | -0/+6
* Completely rewrite 'correct' EH support. This changes how setjmp insertion
  is performed so that it happens at most once per function that contains an
  invoke, instead of once per invoke in the function. This patch has the
  following perks:
    1. It fixes PR631, which complains about slowness.
    2. It fixes PR240, which complains about non-volatile vars being live
       across setjmp/longjmps.
    3. It improves (but does not fix) the jmpbuf alignment issue on Itanium
       by not forcing the jmpbufs to always be 8 bytes off the alignment of
       the structure.
    4. It speeds up 253.perlbmk from 338s to 13.70s (a 25x improvement!),
       making us now about 4% faster than GCC.
  Further improvements are also possible.
  llvm-svn: 23477 | Chris Lattner | 2005-09-27 | 1 file | -140/+301
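The shape of "one setjmp per function" can be sketched in plain C++. This is a hypothetical illustration, not LowerInvoke's IR transformation: it assumes a single global handler pointer (`CurrentHandler`), a stand-in callee (`mayUnwind`), and a call-site index variable as one plausible way to route an unwind to the right destination. The index is written between `setjmp` and a possible `longjmp`, so it must be `volatile`; that is the class of bug PR240 describes.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical runtime state: the handler to longjmp to when a callee unwinds.
static std::jmp_buf *CurrentHandler = nullptr;

// Stand-in for a callee that may unwind.
void mayUnwind(bool Unwind) {
  if (Unwind) std::longjmp(*CurrentHandler, 1);
}

// Two "invokes", one setjmp. Returns 0 on normal completion, otherwise the
// index of the call site that unwound.
int runTwoInvokes(bool UnwindFirst, bool UnwindSecond) {
  std::jmp_buf Buf;
  std::jmp_buf *const Saved = CurrentHandler;
  // Modified between setjmp and a possible longjmp, so it must be volatile
  // to have a reliable value afterwards.
  volatile int CallSite = 0;
  if (setjmp(Buf) != 0) {
    CurrentHandler = Saved;  // pop this function's handler
    return CallSite;         // single dispatch point for every unwind edge
  }
  CurrentHandler = &Buf;     // push this function's handler once
  CallSite = 1;
  mayUnwind(UnwindFirst);    // invoke #1
  CallSite = 2;
  mayUnwind(UnwindSecond);   // invoke #2
  CurrentHandler = Saved;
  return 0;
}
```

Only the cheap store to `CallSite` is repeated per invoke; the expensive `setjmp` runs once on entry.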
* Make the pass name simpler
  llvm-svn: 23476 | Chris Lattner | 2005-09-27 | 1 file | -1/+1
* Eliminate GetGEPGlobalInitializer in favor of the more powerful
  ConstantFoldLoadThroughGEPConstantExpr function in the utils lib.
  llvm-svn: 23446 | Chris Lattner | 2005-09-26 | 1 file | -27/+1
* Factor the GetGEPGlobalInitializer out of this pass and into Transforms/Utils
  as ConstantFoldLoadThroughGEPConstantExpr.
  llvm-svn: 23445 | Chris Lattner | 2005-09-26 | 1 file | -44/+2
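What a fold like ConstantFoldLoadThroughGEPConstantExpr accomplishes can be sketched with a hypothetical constant representation: when a load's address is a GEP into a global with a known constant initializer, walk the indices into the initializer and use the element found there. The real function operates on LLVM `Constant` objects and handles more cases. `Constant`, `scalar`, `aggregate`, and `foldLoadThroughIndices` are invented names, and the sketch assumes the GEP's leading zero index has already been checked and stripped.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical constant initializer: a scalar leaf, or an aggregate
// (struct or array) whose elements are themselves constants.
struct Constant {
  long Scalar = 0;
  std::vector<Constant> Elements;  // empty for a scalar leaf
};

Constant scalar(long V) {
  Constant C;
  C.Scalar = V;
  return C;
}

Constant aggregate(std::vector<Constant> Elts) {
  Constant C;
  C.Elements = std::move(Elts);
  return C;
}

// Folds "load (gep @G, 0, Indices...)" when @G has a constant initializer.
// Returns false (no fold) for out-of-range indices or a non-scalar result.
bool foldLoadThroughIndices(const Constant &Init,
                            const std::vector<std::size_t> &Indices,
                            long &Result) {
  const Constant *C = &Init;
  for (std::size_t Idx : Indices) {
    if (Idx >= C->Elements.size()) return false;  // out of range: don't fold
    C = &C->Elements[Idx];
  }
  if (!C->Elements.empty()) return false;  // landed on an aggregate
  Result = C->Scalar;
  return true;
}
```

For a global initialized to `{{1, 2}, {3, 4}}`, indices `{1, 0}` fold the load to the constant 3.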
* Move MaskedValueIsZero up.
  Match a bunch of idioms for sign extensions, implementing
  InstCombine/signext.ll
  llvm-svn: 23428 | Chris Lattner | 2005-09-24 | 1 file | -77/+146
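The contents of InstCombine/signext.ll are not reproduced here, so exactly which idioms this commit matches is not stated. As an illustration only, the sketch below shows two widely used hand-written sign-extension idioms for an 8-bit value and checks that each equals a true sign extension. It assumes two's complement and an arithmetic right shift for signed values (both guaranteed since C++20).

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics: sign-extend the low 8 bits of X to 32 bits.
int32_t sextI8Reference(uint32_t X) {
  return static_cast<int8_t>(X & 0xFFu);
}

// Idiom 1: shift the byte to the top, then arithmetic-shift it back down.
int32_t sextViaShifts(uint32_t X) {
  return static_cast<int32_t>(X << 24) >> 24;
}

// Idiom 2: flip the sign bit, then subtract it back out.
int32_t sextViaXorSub(uint32_t X) {
  return static_cast<int32_t>(((X & 0xFFu) ^ 0x80u) - 0x80u);
}
```

A combiner that recognizes such patterns can replace either multi-instruction form with a single sign-extension operation.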
* Refactor this code a bit and make it more general. This now compiles:
    struct S {
      unsigned int i : 6, j : 11, k : 15;
    } b;
    void plus2 (unsigned int x) {
      b.j += x;
    }
  To:
    _plus2:
            lis r2, ha16(L_b$non_lazy_ptr)
            lwz r2, lo16(L_b$non_lazy_ptr)(r2)
            lwz r4, 0(r2)
            slwi r3, r3, 6
            add r3, r4, r3
            rlwimi r3, r4, 0, 26, 14
            stw r3, 0(r2)
            blr
  instead of:
    _plus2:
            lis r2, ha16(L_b$non_lazy_ptr)
            lwz r2, lo16(L_b$non_lazy_ptr)(r2)
            lwz r4, 0(r2)
            rlwinm r5, r4, 26, 21, 31
            add r3, r5, r3
            rlwimi r4, r3, 6, 15, 25
            stw r4, 0(r2)
            blr
  by eliminating an 'and'. I'm pretty sure this is as small as we can go :)
  llvm-svn: 23386 | Chris Lattner | 2005-09-18 | 1 file | -24/+53
* Compile
    struct S {
      unsigned int i : 6, j : 11, k : 15;
    } b;
    void plus2 (unsigned int x) {
      b.j += x;
    }
  to:
    plus2:
            mov %EAX, DWORD PTR [b]
            mov %ECX, %EAX
            and %ECX, 131008
            mov %EDX, DWORD PTR [%ESP + 4]
            shl %EDX, 6
            add %EDX, %ECX
            and %EDX, 131008
            and %EAX, -131009
            or %EDX, %EAX
            mov DWORD PTR [b], %EDX
            ret
  instead of:
    plus2:
            mov %EAX, DWORD PTR [b]
            mov %ECX, %EAX
            shr %ECX, 6
            and %ECX, 2047
            add %ECX, DWORD PTR [%ESP + 4]
            shl %ECX, 6
            and %ECX, 131008
            and %EAX, -131009
            or %ECX, %EAX
            mov DWORD PTR [b], %ECX
            ret
  llvm-svn: 23385 | Chris Lattner | 2005-09-18 | 1 file | -31/+70
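Why the new `plus2` code is correct can be checked directly. The sketch assumes the bit-field layout implied by the shifts and masks in the listings (i in bits 0-5, j in bits 6-16, k in bits 17-31; 131008 is `0x1FFC0` and -131009 is its complement). Bit-field layout is ABI-dependent, so this is an assumption. Instead of extracting j, adding, and shifting back, the new code adds `x << 6` to j's bits in place; because the addend's low six bits are zero, nothing carries into bit 6 from below, and masking drops any carry out of bit 16.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: j occupies bits 6-16. 0x1FFC0 == 131008, ~0x1FFC0 == -131009.
constexpr uint32_t FieldJ = 0x1FFC0u;

// Source semantics of b.j += x: extract, add, truncate to 11 bits, reinsert.
uint32_t plus2Reference(uint32_t Word, uint32_t X) {
  uint32_t J = (Word >> 6) & 0x7FFu;
  uint32_t NewJ = (J + X) & 0x7FFu;
  return (Word & ~FieldJ) | (NewJ << 6);
}

// Shape of the new code: add the shifted addend to j's bits in place,
// mask off any carry out of the field, and merge with the untouched bits.
uint32_t plus2InPlace(uint32_t Word, uint32_t X) {
  uint32_t Sum = (Word & FieldJ) + (X << 6);
  return (Sum & FieldJ) | (Word & ~FieldJ);
}
```

The two functions agree for every input, which is what allows the extract-and-shift sequence (`shr`, `and 2047`) to be dropped.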
* Generalize this transform, using MaskedValueIsZero, allowing us to compile:
    struct S {
      unsigned int i : 6, j : 11, k : 15;
    } b;
    void plus3 (unsigned int x) {
      b.k += x;
    }
  To:
    plus3:
            mov %EAX, DWORD PTR [%ESP + 4]
            shl %EAX, 17
            add DWORD PTR [b], %EAX
            ret
  instead of:
    plus3:
            mov %EAX, DWORD PTR [%ESP + 4]
            shl %EAX, 17
            mov %ECX, DWORD PTR [b]
            add %EAX, %ECX
            and %EAX, -131072
            and %ECX, 131071
            or %ECX, %EAX
            mov DWORD PTR [b], %ECX
            ret
  llvm-svn: 23384 | Chris Lattner | 2005-09-18 | 1 file | -14/+21
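The single `add` in the new `plus3` code works because k occupies the top 15 bits (assumed from the shift by 17 in the listings). Adding `x << 17` to the whole word cannot change bits 0-16, since the addend's low 17 bits are zero; this is presumably the fact the transform establishes with MaskedValueIsZero. Any carry out of bit 31 is discarded, which is exactly 15-bit wraparound for k. The sketch checks this equivalence.

```cpp
#include <cassert>
#include <cstdint>

// Source semantics of b.k += x, with k assumed to be bits 17-31:
// extract, add, truncate to 15 bits, reinsert.
uint32_t plus3Reference(uint32_t Word, uint32_t X) {
  uint32_t K = Word >> 17;
  uint32_t NewK = (K + X) & 0x7FFFu;
  return (Word & 0x1FFFFu) | (NewK << 17);
}

// Simplified form: (X << 17) has zero low bits, so bits 0-16 are untouched,
// and the carry out of bit 31 falls off, matching 15-bit wraparound.
uint32_t plus3Direct(uint32_t Word, uint32_t X) {
  return Word + (X << 17);
}
```

Overflow of the field, such as `plus3Direct(0xFFFFFFFF, 1)`, gives `0x0001FFFF` under both forms.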
* fix typo
  llvm-svn: 23383 | Chris Lattner | 2005-09-18 | 1 file | -1/+1
* Remove unintentionally committed code
  llvm-svn: 23382 | Chris Lattner | 2005-09-18 | 1 file | -3/+0
* Implement shift.ll:test25. This compiles:
    struct S {
      unsigned int i : 6, j : 11, k : 15;
    } b;
    void plus3 (unsigned int x) {
      b.k += x;
    }
  to:
    _plus3:
            lis r2, ha16(L_b$non_lazy_ptr)
            lwz r2, lo16(L_b$non_lazy_ptr)(r2)
            lwz r3, 0(r2)
            rlwinm r4, r3, 0, 0, 14
            add r4, r4, r3
            rlwimi r4, r3, 0, 15, 31
            stw r4, 0(r2)
            blr
  instead of:
    _plus3:
            lis r2, ha16(L_b$non_lazy_ptr)
            lwz r2, lo16(L_b$non_lazy_ptr)(r2)
            lwz r4, 0(r2)
            srwi r5, r4, 17
            add r3, r5, r3
            slwi r3, r3, 17
            rlwimi r3, r4, 0, 15, 31
            stw r3, 0(r2)
            blr
  llvm-svn: 23381 | Chris Lattner | 2005-09-18 | 1 file | -3/+53
* Implement add.ll:test29. Codegening:
    struct S {
      unsigned int i : 6, j : 11, k : 15;
    } b;
    void plus1 (unsigned int x) {
      b.i += x;
    }
  as:
    _plus1:
            lis r2, ha16(L_b$non_lazy_ptr)
            lwz r2, lo16(L_b$non_lazy_ptr)(r2)
            lwz r4, 0(r2)
            add r3, r4, r3
            rlwimi r3, r4, 0, 0, 25
            stw r3, 0(r2)
            blr
  instead of:
    _plus1:
            lis r2, ha16(L_b$non_lazy_ptr)
            lwz r2, lo16(L_b$non_lazy_ptr)(r2)
            lwz r4, 0(r2)
            rlwinm r5, r4, 0, 26, 31
            add r3, r5, r3
            rlwimi r3, r4, 0, 0, 25
            stw r3, 0(r2)
            blr
  llvm-svn: 23379 | Chris Lattner | 2005-09-18 | 1 file | -0/+66
* remove debug output
  llvm-svn: 23377 | Chris Lattner | 2005-09-18 | 1 file | -1/+0
* Implement or.ll:test21. This teaches instcombine to be able to turn this:
    struct {
      unsigned int bit0:1;
      unsigned int ubyte:31;
    } sdata;
    void foo() {
      sdata.ubyte++;
    }
  into this:
    foo:
            add DWORD PTR [sdata], 2
            ret
  instead of this:
    foo:
            mov %EAX, DWORD PTR [sdata]
            mov %ECX, %EAX
            add %ECX, 2
            and %ECX, -2
            and %EAX, 1
            or %EAX, %ECX
            mov DWORD PTR [sdata], %EAX
            ret
  llvm-svn: 23376 | Chris Lattner | 2005-09-18 | 1 file | -3/+25
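The one-instruction `foo` is valid because, under the layout implied by the masks in the listing (bit0 in bit 0, ubyte in bits 1-31, an ABI-dependent assumption), incrementing ubyte is the same as adding 2 to the whole word: adding 2 never changes bit 0, and a carry out of bit 31 is discarded, matching 31-bit wraparound. The sketch compares the field-wise semantics with the single add.

```cpp
#include <cassert>
#include <cstdint>

// Source semantics of sdata.ubyte++ with ubyte assumed to be bits 1-31:
// extract, increment, truncate to 31 bits, reinsert next to bit0.
uint32_t incrementUbyteReference(uint32_t Word) {
  uint32_t Ubyte = Word >> 1;
  uint32_t NewUbyte = (Ubyte + 1) & 0x7FFFFFFFu;
  return (Word & 1u) | (NewUbyte << 1);
}

// The one-instruction form: adding 2 leaves bit 0 alone and wraps the
// upper 31 bits exactly like a 31-bit increment.
uint32_t incrementUbyteDirect(uint32_t Word) { return Word + 2u; }
```

The `and %ECX, -2` / `and %EAX, 1` / `or` merge in the old code is therefore redundant.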
* Fix the regression last night compiling povray
  llvm-svn: 23348 | Chris Lattner | 2005-09-14 | 1 file | -2/+3
* Add a simple xform to simplify array accesses with casts in the way.
  This is useful for 178.galgel where resolution of dope vectors (by the
  optimizer) causes the scales to become apparent.
  llvm-svn: 23328 | Chris Lattner | 2005-09-13 | 1 file | -2/+62
* Fix an issue where LSR would miss rewriting a use of an IV expression by a
  PHI node that is not the original PHI. This fixes up a dot-product loop in
  galgel, speeding it up from 18.47s to 16.13s.
  llvm-svn: 23327 | Chris Lattner | 2005-09-13 | 1 file | -4/+8
* Add a helper function, allowing us to simplify some code a bit, changing
  indentation, no functionality change
  llvm-svn: 23325 | Chris Lattner | 2005-09-13 | 1 file | -39/+47
* Implement a simple xform to turn code like this:
    if () { store A -> P; } else { store B -> P; }
  into a PHI node with one store, in the most trivial case. This implements
  load.ll:test10.
  llvm-svn: 23324 | Chris Lattner | 2005-09-12 | 1 file | -0/+66
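At the source level, the transformation above corresponds to the following before/after pair. The C++ is only an illustration (the real transform rewrites LLVM IR, and the function names are hypothetical): the two arms stop storing and merely select a value, which in IR is the PHI node, and one store in the join block writes it.

```cpp
#include <cassert>

// Before: each arm of the branch stores to *P.
void storeInBothArms(bool Cond, int A, int B, int *P) {
  if (Cond) {
    *P = A;
  } else {
    *P = B;
  }
}

// After: the arms only choose the value (the PHI node in IR), and a single
// store in the join block writes it: store (phi [A, then], [B, else]) -> P.
void singleStoreAfterJoin(bool Cond, int A, int B, int *P) {
  int V;
  if (Cond)
    V = A;
  else
    V = B;
  *P = V;
}
```

Both versions leave the same value in `*P` on either path; the second has one store instead of two, and its arms may become empty enough for later passes to fold the branch.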
* Another load-peephole optimization: do GCSE when two loads are next to
  each other. This implements InstCombine/load.ll:test9
  llvm-svn: 23322 | Chris Lattner | 2005-09-12 | 1 file | -2/+5
* Implement a trivial form of store->load forwarding where the store and the
  load are exactly consecutive. This is picked up by other passes, but this
  triggers thousands of times in Fortran programs that use static locals (and
  is thus a compile-time speedup).
  llvm-svn: 23320 | Chris Lattner | 2005-09-12 | 1 file | -0/+9
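The two peepholes above (load after load, and load right after store, to the same address) can be sketched over a hypothetical straight-line instruction list. `Inst` and `forwardAdjacent` are invented for illustration and are not InstCombine's data structures. Restricting the match to directly adjacent instructions is what makes it trivially safe: nothing between the two can have written memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical straight-line IR: a store writes Value to Ptr; a load reads
// Ptr into the result named Value.
struct Inst {
  enum Op { Store, Load } Kind;
  std::string Ptr;
  std::string Value;
};

// For each load whose immediately preceding instruction touched the same
// pointer, returns the value the load can be replaced with ("" = no change).
std::vector<std::string> forwardAdjacent(const std::vector<Inst> &Code) {
  std::vector<std::string> Replacement(Code.size());
  for (std::size_t I = 1; I < Code.size(); ++I) {
    const Inst &Prev = Code[I - 1];
    const Inst &Cur = Code[I];
    if (Cur.Kind != Inst::Load || Prev.Ptr != Cur.Ptr) continue;
    // store V -> P; load P    ==> the load yields V (store->load forwarding)
    // load P -> R1; load P    ==> the second load yields R1 (load GCSE)
    Replacement[I] = Prev.Value;
  }
  return Replacement;
}
```

For `store v -> P; load P -> r1; load P -> r2`, `r1` becomes `v` and `r2` becomes `r1` (and therefore `v`).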
* Fix a regression from last night, which caused this pass to create invalid
  code for IV uses outside of loops that are not dominated by the latch block.
  We should only convert these uses to use the post-inc value if they ARE
  dominated by the latch block. Also use a new LoopInfo method to simplify
  some code. This fixes
  Transforms/LoopStrengthReduce/2005-09-12-UsesOutOutsideOfLoop.ll
  llvm-svn: 23318 | Chris Lattner | 2005-09-12 | 1 file | -8/+6
* _test:
            li r2, 0
    LBB_test_1: ; no_exit.2
            li r5, 0
            stw r5, 0(r3)
            addi r2, r2, 1
            addi r3, r3, 4
            cmpwi cr0, r2, 701
            blt cr0, LBB_test_1 ; no_exit.2
    LBB_test_2: ; loopexit.2.loopexit
            addi r2, r2, 1
            stw r2, 0(r4)
            blr
  [zion ~/llvm]$ cat > ~/xx
  Uses of IVs outside of the loop should use the post-incremented version of
  the IV, not the pre-incremented version. This helps many loops (e.g. in
  sixtrack) which used to generate code like this (this is the code from the
  dont-hoist-simple-loop-constants.ll testcase):
    _test:
            li r2, 0             **** IV starts at 0
    LBB_test_1: ; no_exit.2
            or r5, r2, r2        **** Copy for loop exit
            li r2, 0
            stw r2, 0(r3)
            addi r3, r3, 4
            addi r2, r5, 1
            addi r6, r5, 2       **** IV+2
            cmpwi cr0, r6, 701
            blt cr0, LBB_test_1 ; no_exit.2
    LBB_test_2: ; loopexit.2.loopexit
            addi r2, r5, 2       **** IV+2
            stw r2, 0(r4)
            blr
  And now we generate code like this:
    _test:
            li r2, 1             *** IV starts at 1
    LBB_test_1: ; no_exit.2
            li r5, 0
            stw r5, 0(r3)
            addi r2, r2, 1
            addi r3, r3, 4
            cmpwi cr0, r2, 701   *** IV.postinc + 0
            blt cr0, LBB_test_1
    LBB_test_2: ; loopexit.2.loopexit
            stw r2, 0(r4)        *** IV.postinc + 0
            blr
  llvm-svn: 23313 | Chris Lattner | 2005-09-12 | 1 file | -5/+19
* Implement Transforms/LoopStrengthReduce/dont-hoist-simple-loop-constants.ll.
  We used to emit this code for it:
    _test:
            li r2, 1             ;; Value tying up a register for the whole loop
            li r5, 0
    LBB_test_1: ; no_exit.2
            or r6, r5, r5
            li r5, 0
            stw r5, 0(r3)
            addi r5, r6, 1
            addi r3, r3, 4
            add r7, r2, r5       ;; should be addi r7, r5, 1
            cmpwi cr0, r7, 701
            blt cr0, LBB_test_1 ; no_exit.2
    LBB_test_2: ; loopexit.2.loopexit
            addi r2, r6, 2
            stw r2, 0(r4)
            blr
  now we emit this:
    _test:
            li r2, 0
    LBB_test_1: ; no_exit.2
            or r5, r2, r2
            li r2, 0
            stw r2, 0(r3)
            addi r3, r3, 4
            addi r2, r5, 1
            addi r6, r5, 2       ;; whoa, fold those adds!
            cmpwi cr0, r6, 701
            blt cr0, LBB_test_1 ; no_exit.2
    LBB_test_2: ; loopexit.2.loopexit
            addi r2, r5, 2
            stw r2, 0(r4)
            blr
  More improvement coming.
  llvm-svn: 23306 | Chris Lattner | 2005-09-10 | 1 file | -1/+1
* Fix a problem that Dan Berlin noticed, where reassociation would not succeed
  in building maximal expressions before simplifying them. In particular, in
  cases like this:
    X-(A+B+X)
  the code would consider A+B+X to be a maximal expression (not understanding
  that the single use '-' would be turned into a + later), simplify it (a
  noop), then later get simplified again. Each of these simplify steps is
  where the cost of reassociation comes from, so this patch should speed up
  the already fast pass a bit. Thanks to Dan for noticing this!
  llvm-svn: 23214 | Chris Lattner | 2005-09-02 | 1 file | -0/+6
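Why treating `X-(A+B+X)` as one maximal expression pays off can be shown with a small sketch that flattens a tree of `+` and `-` into one coefficient per leaf, pushing the sign of a subtraction into its right operand (the same distribution as `-(A+B+C) -> -A + -B + -C` in the next commit). With everything in one list, the two `X` terms cancel in a single simplification step. `Node`, `leaf`, `bin`, `linearize`, and `simplify` are hypothetical names; the Reassociate pass itself ranks operands and rewrites LLVM instructions instead of building a map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical expression tree over named leaves with '+' and '-' only.
struct Node {
  char Op;           // '+', '-', or 0 for a leaf
  std::string Name;  // leaf name
  std::shared_ptr<Node> L, R;
};
using NodeP = std::shared_ptr<Node>;

NodeP leaf(const std::string &Name) {
  return std::make_shared<Node>(Node{0, Name, nullptr, nullptr});
}
NodeP bin(char Op, NodeP L, NodeP R) {
  return std::make_shared<Node>(Node{Op, "", L, R});
}

// Flattens the whole tree into one coefficient per leaf. The sign of '-'
// is distributed into its right operand rather than stopping there.
void linearize(const NodeP &N, int Sign, std::map<std::string, int> &Coeffs) {
  if (N->Op == 0) {
    Coeffs[N->Name] += Sign;
    return;
  }
  linearize(N->L, Sign, Coeffs);
  linearize(N->R, N->Op == '-' ? -Sign : Sign, Coeffs);
}

// Returns the surviving terms after cancellation.
std::map<std::string, int> simplify(const NodeP &Root) {
  std::map<std::string, int> Coeffs;
  linearize(Root, +1, Coeffs);
  for (auto It = Coeffs.begin(); It != Coeffs.end();) {
    if (It->second == 0)
      It = Coeffs.erase(It);  // the term cancelled out
    else
      ++It;
  }
  return Coeffs;
}
```

For `X-((A+B)+X)` the result is `{A: -1, B: -1}`, i.e. `-A - B`, reached in one pass over the maximal expression.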
* Avoid creating garbage instructions, just move the old add instruction
  to where we need it when converting -(A+B+C) -> -A + -B + -C.
  llvm-svn: 23213 | Chris Lattner | 2005-09-02 | 1 file | -9/+11
* add some assertions and fix problems where reassociate could access the
  Ops vector out of range
  llvm-svn: 23211 | Chris Lattner | 2005-09-02 | 1 file | -2/+11
* Fix Regression/Transforms/Reassociate/2005-08-24-Crash.ll
  llvm-svn: 23019 | Chris Lattner | 2005-08-24 | 1 file | -1/+7
* Fix Transforms/LoopStrengthReduce/2005-08-17-OutOfLoopVariant.ll, a crash
  on 177.mesa
  llvm-svn: 22843 | Chris Lattner | 2005-08-17 | 1 file | -1/+4
* Use a new helper to split critical edges, making the code simpler.
  Do not claim to not change the CFG. We do change the CFG to split critical
  edges. This isn't causing us a problem now, but could likely do so in the
  future.
  llvm-svn: 22824 | Chris Lattner | 2005-08-17 | 1 file | -18/+21
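For reference, an edge is critical when its source block has more than one successor and its destination block has more than one predecessor. Code that must run only along such an edge cannot be placed in either endpoint, so a new block is split onto the edge, which is why this pass does change the CFG. The sketch below finds critical edges in a CFG given as hypothetical successor lists; it is not LLVM's splitting helper.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// CFG as successor lists: Succs[B] lists the blocks that B branches to.
using CFG = std::vector<std::vector<int>>;

// Returns every edge From->To where From has several successors and To has
// several predecessors.
std::vector<std::pair<int, int>> criticalEdges(const CFG &Succs) {
  std::vector<int> NumPreds(Succs.size(), 0);
  for (const auto &S : Succs)
    for (int To : S) ++NumPreds[To];

  std::vector<std::pair<int, int>> Result;
  for (int From = 0; From < static_cast<int>(Succs.size()); ++From)
    if (Succs[From].size() > 1)
      for (int To : Succs[From])
        if (NumPreds[To] > 1) Result.push_back({From, To});
  return Result;
}
```

In the diamond-less CFG `0 -> {1, 2}, 1 -> {2}`, only `0 -> 2` is critical: block 0 branches two ways and block 2 is entered from two places.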
* Fix a bad case in gzip where we put lots of things in registers across the
  loop, because an IV-dependent value was used outside of the loop and didn't
  have immediate-folding capability
  llvm-svn: 22798 | Chris Lattner | 2005-08-16 | 1 file | -9/+17
* Ooops, don't forget to clear this. The real inner loop is now:
    .LBB_foo_3: ; no_exit.1
            lfd f2, 0(r9)
            lfd f3, 8(r9)
            fmul f4, f1, f2
            fmadd f4, f0, f3, f4
            stfd f4, 8(r9)
            fmul f3, f1, f3
            fmsub f2, f0, f2, f3
            stfd f2, 0(r9)
            addi r9, r9, 16
            addi r8, r8, 1
            cmpw cr0, r8, r4
            ble .LBB_foo_3 ; no_exit.1
  llvm-svn: 22782 | Chris Lattner | 2005-08-13 | 1 file | -0/+1
* Recursively scan SCEV expressions for common subexpressions. This allows us
  to handle nested loops much better, for example, by being able to tell that
  these two expressions:
    {( 8 + ( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1>
    {(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1>
  have the following common part that can be shared:
    {(( 16 * ( 1 + %Tmp11 + %Tmp12)) + %c_),+,( 16 * %Tmp12)}<loopentry.1>
  This allows us to codegen an important inner loop in 168.wupwise as:
    .LBB_foo_4: ; no_exit.1
            lfd f2, 16(r9)
            fmul f3, f0, f2
            fmul f2, f1, f2
            fadd f4, f3, f2
            stfd f4, 8(r9)
            fsub f2, f3, f2
            stfd f2, 16(r9)
            addi r8, r8, 1
            addi r9, r9, 16
            cmpw cr0, r8, r4
            ble .LBB_foo_4 ; no_exit.1
  instead of:
    .LBB_foo_3: ; no_exit.1
            lfdx f2, r6, r9
            add r10, r6, r9
            lfd f3, 8(r10)
            fmul f4, f1, f2
            fmadd f4, f0, f3, f4
            stfd f4, 8(r10)
            fmul f3, f1, f3
            fmsub f2, f0, f2, f3
            stfdx f2, r6, r9
            addi r9, r9, 16
            addi r8, r8, 1
            cmpw cr0, r8, r4
            ble .LBB_foo_3 ; no_exit.1
  llvm-svn: 22781 | Chris Lattner | 2005-08-13 | 1 file | -28/+61
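Once the nested start values of the two recurrences (which share the step `16 * %Tmp12`) have been flattened into lists of summands, the shareable part is their intersection. This sketch starts after that flattening step, which is the part the commit's recursive scan provides; summands are written as strings purely for illustration, and `Terms` and `commonTerms` are hypothetical names. Whatever is left per use (here the constant 8) can be expressed relative to the shared induction variable, for example as an addressing-mode immediate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened start value of an add-recurrence: its summands.
using Terms = std::multiset<std::string>;

// Returns the summands present in every start value; each use is then
// (shared part) + (its own leftover summands).
Terms commonTerms(const std::vector<Terms> &Bases) {
  if (Bases.empty()) return Terms();
  Terms Common = Bases[0];
  for (std::size_t I = 1; I < Bases.size(); ++I) {
    Terms Next;
    std::set_intersection(Common.begin(), Common.end(), Bases[I].begin(),
                          Bases[I].end(), std::inserter(Next, Next.end()));
    Common = std::move(Next);
  }
  return Common;
}
```

Applied to the two start values in the commit message, the shared part is `16*(1+%Tmp11+%Tmp12) + %c_`, and the first use keeps only the leftover `8`.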
* remove dead code. The exit block list is computed on demand, thus does not
  need to be updated. This code is a relic from when it did.
  llvm-svn: 22775 | Chris Lattner | 2005-08-13 | 1 file | -15/+0
OpenPOWER on IntegriCloud