path: root/llvm
Commit message | Author | Age | Files | Lines
...
* earlycse can do trivial within-a-block dead store | Chris Lattner | 2011-01-03 | 2 | -6/+48
  elimination as well. This deletes 60 stores in 176.gcc that largely come from bitfield code.
  llvm-svn: 122736
* Use a RecyclingAllocator to allocate values for MachineCSE's ScopedHashTable for | Cameron Zwarich | 2011-01-03 | 1 | -3/+7
  a 28% speedup of MachineCSE time on 403.gcc.
  llvm-svn: 122735
* Permit CallGraphSCCPasses readonly access to the direct callers of the functions | Nick Lewycky | 2011-01-03 | 1 | -2/+2
  in their SCC as they already have with the direct callees.
  llvm-svn: 122734
* switch the load table to use a recycling bump pointer allocator, | Chris Lattner | 2011-01-03 | 1 | -1/+4
  speeding earlycse up by 6%.
  llvm-svn: 122733
* now that loads are in their own table, we can implement | Chris Lattner | 2011-01-03 | 2 | -1/+21
  store->load forwarding. This allows EarlyCSE to zap 600 more loads from 176.gcc.
  llvm-svn: 122732
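A rough, single-block sketch of what store->load forwarding means, under loud assumptions: the `Inst` struct and `countForwardableLoads` are hypothetical names for a toy IR, not LLVM's API, and the sketch forgets every available value on each store (a conservative choice made here, since any store might alias), then records the stored value as available for the stored-to pointer.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy instruction (hypothetical, not an LLVM class).
// For a Load, `value` names the loaded result; for a Store, the stored value.
struct Inst {
  enum Kind { Load, Store } kind;
  std::string ptr;
  std::string value;
};

// Counts the loads in one block that could reuse an already-known value.
unsigned countForwardableLoads(const std::vector<Inst> &block) {
  // Table indexed by pointer: the value currently known to live there.
  std::unordered_map<std::string, std::string> available;
  unsigned removed = 0;
  for (const Inst &I : block) {
    if (I.kind == Inst::Load) {
      if (available.count(I.ptr))
        ++removed;                     // reuse the known value
      else
        available[I.ptr] = I.value;    // later loads can reuse this one
    } else {
      available.clear();               // assumption: any store may alias
      available[I.ptr] = I.value;      // store->load forwarding
    }
  }
  return removed;
}
```

For the sequence store p, load p, load p, store q, load p, the first two loads are forwarded and the last one is not, because the intervening store to q invalidates the table.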
* split loads and calls into separate tables. Loads are now just indexed | Chris Lattner | 2011-01-03 | 1 | -42/+74
  by their pointer instead of using MemoryValue to wrap it.
  llvm-svn: 122731
* add a testcase for readonly call CSE | Chris Lattner | 2011-01-03 | 1 | -0/+12
  llvm-svn: 122730
* various cleanups, no functionality change. | Chris Lattner | 2011-01-03 | 1 | -24/+19
  llvm-svn: 122729
* Add spliceFunction to the CallGraph interface. This allows users to efficiently | Nick Lewycky | 2011-01-03 | 2 | -2/+25
  update a CallGraph when performing the common operation of splicing the body to a new function and updating all callers (such as via RAUW). No users yet, though this is intended for DeadArgumentElimination as part of PR8887.
  llvm-svn: 122728
* Teach EarlyCSE to do trivial CSE of loads and read-only calls. | Chris Lattner | 2011-01-03 | 2 | -22/+197
  On 176.gcc, this catches 13090 loads and calls, and increases the number of simple instructions CSE'd from 29658 to 36208.
  llvm-svn: 122727
* add a handy typedef. | Chris Lattner | 2011-01-03 | 1 | -0/+5
  llvm-svn: 122726
* rename InstValue to SimpleValue, add some comments. | Chris Lattner | 2011-01-03 | 1 | -26/+41
  llvm-svn: 122725
* CMake: Add missing source file. | Michael J. Spencer | 2011-01-03 | 1 | -0/+1
  llvm-svn: 122724
* Allocate nodes for the scoped hash table from a recycling bump pointer | Chris Lattner | 2011-01-03 | 1 | -5/+9
  allocator. This speeds up early cse by about 20%.
  llvm-svn: 122723
* really get this working with a custom allocator. | Chris Lattner | 2011-01-03 | 1 | -22/+22
  llvm-svn: 122722
* Enhance ScopedHashTable to allow it to take an allocator argument. | Chris Lattner | 2011-01-03 | 3 | -18/+43
  llvm-svn: 122721
* reduce redundancy in the hashing code and other misc cleanups. | Chris Lattner | 2011-01-03 | 2 | -20/+24
  llvm-svn: 122720
* Add a new loop-instsimplify pass, with the intention of replacing the instance | Cameron Zwarich | 2011-01-03 | 5 | -0/+121
  of instcombine that is currently in the middle of the loop pass pipeline. This commit only checks in the pass; it will hopefully be enabled by default later.
  llvm-svn: 122719
* fix some pastos | Chris Lattner | 2011-01-02 | 1 | -4/+4
  llvm-svn: 122718
* add DEBUG and -stats output to earlycse. | Chris Lattner | 2011-01-02 | 3 | -8/+52
  Teach it to CSE the rest of the non-side-effecting instructions.
  llvm-svn: 122716
* Enhance earlycse to do CSE of casts, instsimplify and die. | Chris Lattner | 2011-01-02 | 3 | -4/+165
  Add a testcase.
  llvm-svn: 122715
* split dom frontier handling stuff out to its own DominanceFrontier header, | Chris Lattner | 2011-01-02 | 26 | -245/+259
  so that Dominators.h is *just* domtree. Also prune #includes a bit.
  llvm-svn: 122714
* sketch out a new early cse pass. No functionality yet. | Chris Lattner | 2011-01-02 | 5 | -0/+72
  llvm-svn: 122713
* fix a miscompilation of tramp3d-v4: when forming a memcpy, we have to make | Chris Lattner | 2011-01-02 | 2 | -12/+56
  sure that the loop we're promoting into a memcpy doesn't mutate the input of the memcpy. Before we were just checking that the dest of the memcpy wasn't mod/ref'd by the loop.
  llvm-svn: 122712
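A small illustration of why the source of the would-be memcpy matters, not the tramp3d-v4 code itself (which the message does not show). In this hypothetical loop each iteration writes the element the next iteration reads, so the loop smears the first element across the array, while a block copy of the original contents gives a different result. `runLoop` and `runBlockCopy` are made-up names, and `memmove` is used for the block copy only so that the overlapping copy is well defined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// The loop stores into memory that a later iteration loads from.
std::array<int, 4> runLoop() {
  std::array<int, 4> a{1, 2, 3, 4};
  for (std::size_t i = 0; i + 1 < a.size(); ++i)
    a[i + 1] = a[i];
  return a;
}

// Copying the same source range as one block keeps the original values.
std::array<int, 4> runBlockCopy() {
  std::array<int, 4> a{1, 2, 3, 4};
  std::memmove(a.data() + 1, a.data(), 3 * sizeof(int));
  return a;
}
```

The loop yields {1, 1, 1, 1} and the block copy yields {1, 1, 2, 3}, so rewriting this loop as a single copy would change its behaviour.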
* If a loop iterates exactly once (has backedge count = 0) then don't | Chris Lattner | 2011-01-02 | 2 | -0/+24
  mess with it. We'd rather peel/unroll it than convert all of its stores into memsets.
  llvm-svn: 122711
* Try to reuse the value when lowering memset. | Benjamin Kramer | 2011-01-02 | 3 | -47/+30
  This allows us to compile:

    void test(char *s, int a) {
      __builtin_memset(s, a, 15);
    }

  into 1 mul + 3 stores instead of 3 muls + 3 stores.
  llvm-svn: 122710
* Lower the i8 extension in memset to a multiply instead of a potentially long | Benjamin Kramer | 2011-01-02 | 2 | -15/+28
  series of shifts and ors.

  We could implement a DAGCombine to turn x * 0x0101 back into logic operations
  on targets that don't support the multiply or where it is slow (p4) if someone
  cares enough.

  Example code:

    void test(char *s, int a) {
      __builtin_memset(s, a, 4);
    }

  before:

    _test:                                  ## @test
        movzbl  8(%esp), %eax
        movl    %eax, %ecx
        shll    $8, %ecx
        orl     %eax, %ecx
        movl    %ecx, %eax
        shll    $16, %eax
        orl     %ecx, %eax
        movl    4(%esp), %ecx
        movl    %eax, 4(%ecx)
        movl    %eax, (%ecx)
        ret

  after:

    _test:                                  ## @test
        movzbl  8(%esp), %eax
        imull   $16843009, %eax, %eax       ## imm = 0x1010101
        movl    4(%esp), %ecx
        movl    %eax, 4(%ecx)
        movl    %eax, (%ecx)
        ret

  llvm-svn: 122707
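The equivalence behind this change can be checked in a few lines of C++. This is only a sketch of the arithmetic, with hypothetical function names; it is not the DAG lowering code. Replicating a byte into all four lanes of a 32-bit word by shifts and ors gives the same result as multiplying by 0x01010101, the immediate seen in the "after" assembly.

```cpp
#include <cassert>
#include <cstdint>

// Shift-and-or form, mirroring the "before" assembly.
uint32_t splatShiftOr(uint8_t b) {
  uint32_t v = b;
  v |= v << 8;   // 0x0000BBBB
  v |= v << 16;  // 0xBBBBBBBB
  return v;
}

// Multiply form, mirroring the "after" assembly (imull $0x1010101).
uint32_t splatMultiply(uint8_t b) {
  return static_cast<uint32_t>(b) * 0x01010101u;
}
```

The product never overflows 32 bits, since 0xFF * 0x01010101 is exactly 0xFFFFFFFF.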
* A workaround for a bug in cmake 2.8.3 diagnosed on PR 8885. | Oscar Fuentes | 2011-01-02 | 1 | -0/+5
  llvm-svn: 122706
* Also remove functions that use complex constant expressions in terms of | Nick Lewycky | 2011-01-02 | 1 | -5/+18
  another function.
  llvm-svn: 122705
* enhance loop idiom recognition to scan *all* unconditionally executed | Chris Lattner | 2011-01-02 | 2 | -8/+62
  blocks in a loop, instead of just the header block. This makes it more aggressive, able to handle Duncan's Ada examples.
  llvm-svn: 122704
* make inSubLoop much more efficient. | Chris Lattner | 2011-01-02 | 1 | -4/+1
  llvm-svn: 122703
* rip out isExitBlockDominatedByBlockInLoop, calling DomTree::dominates instead. | Chris Lattner | 2011-01-02 | 1 | -37/+4
  isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was *just* a tree and didn't have DFS numbers. Checking DFS numbers is faster and easier than "limiting the search of the tree".
  llvm-svn: 122702
* add a list of opportunities for future improvement. | Chris Lattner | 2011-01-02 | 1 | -1/+22
  llvm-svn: 122701
* update a bunch of entries. | Chris Lattner | 2011-01-02 | 2 | -137/+56
  llvm-svn: 122700
* Fix PR8702 by not having LoopSimplify claim to preserve LCSSA form. As | Duncan Sands | 2011-01-02 | 2 | -15/+55
  described in the PR, the pass could break LCSSA form when inserting preheaders. It probably would be easy enough to fix this, but since currently we always go into LCSSA form after running this pass, doing so is not urgent.
  llvm-svn: 122695
* Remove an unused member function. | Cameron Zwarich | 2011-01-02 | 1 | -3/+0
  llvm-svn: 122693
* Propagate to parent scope changes made to CMAKE_CXX_FLAGS. | Oscar Fuentes | 2011-01-02 | 1 | -0/+1
  llvm-svn: 122692
* Fix a typo in a variable name. | Cameron Zwarich | 2011-01-02 | 1 | -3/+3
  llvm-svn: 122691
* Move a load into the only branch where it is used and eliminate a temporary. | Cameron Zwarich | 2011-01-02 | 1 | -3/+1
  llvm-svn: 122690
* Add the explanatory comment from r122680's commit message to the code itself. | Cameron Zwarich | 2011-01-02 | 1 | -0/+10
  llvm-svn: 122689
* Tidy up indentation. | Cameron Zwarich | 2011-01-02 | 1 | -5/+5
  llvm-svn: 122688
* Fix a typo, which should also fix the failure on llvm-x86_64-linux-checks. | Cameron Zwarich | 2011-01-02 | 1 | -1/+1
  llvm-svn: 122687
* Allow loop-idiom to run on multiple BB loops, but still only scan the loop | Chris Lattner | 2011-01-02 | 3 | -13/+29
  header for now for memset/memcpy opportunities. It turns out that loop-rotate
  is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for
  loops" into 2 basic block loops that loop-idiom was ignoring.

  With this fix, we form many *many* more memcpy and memsets than before,
  including on the "history" loops in the viterbi benchmark, which look like
  this:

    for (j=0; j<MAX_history; ++j) {
      history_new[i][j+1] = history[2*i][j];
    }

  Transforming these loops into memcpy's speeds up the viterbi benchmark from
  11.98s to 3.55s on my machine. Woo.
  llvm-svn: 122685
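To make the transformation concrete, here is a hedged sketch of the "history" loop next to the single memcpy it is equivalent to. The array dimensions, the element type, and every function name are assumptions for illustration; the message gives only the loop body.

```cpp
#include <cassert>
#include <cstring>

constexpr int MAX_history = 8;  // assumed size, not from the benchmark
constexpr int NumStates = 4;    // assumed number of rows
using Elem = unsigned char;     // assumed element type

Elem history[2 * NumStates][MAX_history];
Elem viaLoop[NumStates][MAX_history + 1];
Elem viaMemcpy[NumStates][MAX_history + 1];

// The loop as written in the commit message.
void fillWithLoop(int i) {
  for (int j = 0; j < MAX_history; ++j)
    viaLoop[i][j + 1] = history[2 * i][j];
}

// The same copy expressed as one memcpy of MAX_history elements.
void fillWithMemcpy(int i) {
  std::memcpy(&viaMemcpy[i][1], &history[2 * i][0],
              MAX_history * sizeof(Elem));
}

// Fills the source with arbitrary data and compares both results.
bool loopAndMemcpyAgree() {
  for (int r = 0; r < 2 * NumStates; ++r)
    for (int j = 0; j < MAX_history; ++j)
      history[r][j] = static_cast<Elem>(r * 31 + j);
  for (int i = 0; i < NumStates; ++i) {
    fillWithLoop(i);
    fillWithMemcpy(i);
  }
  return std::memcmp(viaLoop, viaMemcpy, sizeof(viaLoop)) == 0;
}
```

The rewrite is valid here because source and destination are distinct arrays, so no iteration reads what an earlier iteration wrote.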
* Remove the #ifdef'd code for balancing the eval-link data structure. It doesn't | Cameron Zwarich | 2011-01-02 | 1 | -65/+3
  compile, and everyone's tests have shown it to be slower in practice, even for quite large graphs. I also hope to do an optimization that is only correct with the simpler data structure, which would break this even further.
  llvm-svn: 122684
* remove debugging code. | Chris Lattner | 2011-01-02 | 1 | -4/+0
  llvm-svn: 122683
* add some -stats output. | Chris Lattner | 2011-01-02 | 1 | -1/+10
  llvm-svn: 122682
* improve loop rotation to use CodeMetrics to analyze the | Chris Lattner | 2011-01-02 | 2 | -17/+8
  size of a loop header instead of its own code size estimator. This allows it to handle bitcasts etc more precisely.
  llvm-svn: 122681
* Speed up dominator computation some more by optimizing bucket processing. When | Cameron Zwarich | 2011-01-02 | 2 | -14/+24
  naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket
  for each vertex. However, this is unnecessary, because each vertex is only
  placed into a single bucket (that of its semidominator), and each vertex's
  bucket is processed before it is added to any bucket itself.

  Instead of using a bucket per vertex, we use a single array Buckets that has
  two purposes. Before the vertex V with DFS number i is processed, Buckets[i]
  stores the index of the first element in V's bucket. After V's bucket is
  processed, Buckets[i] stores the index of the next element in the bucket to
  which V now belongs, if any.

  Reading from the buckets can also be optimized. Instead of processing the
  bucket of V's parent at the end of processing V, we process the bucket of V
  itself at the beginning of processing V. This means that the case of the root
  vertex can be simplified somewhat. It also means that we don't need to look up
  the DFS number of the semidominator of every node in the bucket we are
  processing, since we know it is the current index being processed.

  This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with
  larger speedups of around 12% on the larger benchmarks like GCC.
  llvm-svn: 122680
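The dual-purpose Buckets array can be sketched in isolation. This is not the dominator computation: semidominator numbers are taken as input rather than computed, and `bucketsViaSingleArray` is a hypothetical name. Vertices are DFS numbers 1..N, processed from N down to 1, and the self-link used to mark an empty bucket is a choice made for this sketch.

```cpp
#include <cassert>
#include <vector>

// semi[i] is the DFS number of vertex i's semidominator, with semi[i] < i for
// i in [2, N]; indices 0 and 1 of semi are unused. Returns, for each vertex,
// the members of its bucket in the order the single array yields them.
std::vector<std::vector<unsigned>>
bucketsViaSingleArray(const std::vector<unsigned> &semi) {
  assert(!semi.empty());
  const unsigned N = static_cast<unsigned>(semi.size()) - 1;
  std::vector<unsigned> Buckets(N + 1);
  for (unsigned i = 0; i <= N; ++i)
    Buckets[i] = i;  // a self-link marks an empty bucket

  std::vector<std::vector<unsigned>> result(N + 1);
  for (unsigned i = N; i >= 1; --i) {
    // Buckets[i] is still the head of i's own bucket: walk the chain until it
    // leads back to i.
    for (unsigned j = Buckets[i]; j != i; j = Buckets[j])
      result[i].push_back(j);
    if (i == 1)
      break;  // the root belongs to no bucket
    // i's bucket is done, so Buckets[i] is reused as the "next" link inside
    // the bucket of i's semidominator.
    assert(semi[i] < i);
    Buckets[i] = Buckets[semi[i]];
    Buckets[semi[i]] = i;
  }
  return result;
}
```

With semi = {0, 0, 1, 1, 2, 2}, vertex 2's bucket is read as {4, 5} and vertex 1's as {2, 3}, using N + 1 integers in place of N separate lists.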
* teach loop idiom recognition to form memcpy's from simple loops. | Chris Lattner | 2011-01-02 | 2 | -22/+130
  llvm-svn: 122678
* Remove functions from the FnSet when one of their callees is being merged. This | Nick Lewycky | 2011-01-02 | 1 | -82/+66
  maintains the guarantee that the DenseSet expects two elements it contains to not go from unequal to equal under its nose. As a side-effect, this also lets us switch from iterating to a fixed-point to actually maintaining a work queue of functions to look at again, and we don't add thunks to our work queue so we don't need to detect and ignore them.
  llvm-svn: 122677