path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* fix PR9017, a bug where we'd assert when promoting in unreachable code.
  llvm-svn: 124100
  Author: Chris Lattner | Age: 2011-01-24 | Files: 1 | Lines: -0/+3
* fix PR9015, a crash linking recursive metadata.
  llvm-svn: 124099
  Author: Chris Lattner | Age: 2011-01-24 | Files: 1 | Lines: -6/+11
* enhance SRoA to promote allocas that are used by PHI nodes. This often occurs because instcombine sinks loads and inserts phis. This kicks in on such apps as 175.vpr, eon, 403.gcc, xalancbmk and a bunch of times in spec2006 in some app that uses std::deque. This resolves the last of rdar://7339113.
  llvm-svn: 124090
  Author: Chris Lattner | Age: 2011-01-24 | Files: 1 | Lines: -26/+157
* Enhance SRoA to promote allocas that are used by selects in some common cases. This triggers a surprising number of times in SPEC2K6 because min/max idioms end up doing this. For example, code from the STL ends up looking like this to SRoA:

      %202 = load i64* %__old_size, align 8, !tbaa !3
      %203 = load i64* %__old_size, align 8, !tbaa !3
      %204 = load i64* %__n, align 8, !tbaa !3
      %205 = icmp ult i64 %203, %204
      %storemerge.i = select i1 %205, i64* %__n, i64* %__old_size
      %206 = load i64* %storemerge.i, align 8, !tbaa !3

  We can now promote both the __n and the __old_size allocas. This addresses another chunk of rdar://7339113, poor codegen on stringswitch.
  llvm-svn: 124088
  Author: Chris Lattner | Age: 2011-01-23 | Files: 1 | Lines: -1/+132
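The select rewrite described in the entry above can be illustrated with a toy model. This is only a sketch and not SRoA's code: memory is modeled as a Python dict, and `load_via_select` and `load_then_select` are hypothetical names. It shows that loading through a selected pointer gives the same value as loading both pointers and selecting between the loaded values, assuming both loads are safe to perform unconditionally.

```python
# Toy model: "memory" maps alloca names to their current values.

def load_via_select(memory, cond, ptr_true, ptr_false):
    """Before: select between two pointers, then load the chosen one."""
    selected_ptr = ptr_true if cond else ptr_false
    return memory[selected_ptr]


def load_then_select(memory, cond, ptr_true, ptr_false):
    """After: load both pointers, then select between the loaded values."""
    value_true = memory[ptr_true]
    value_false = memory[ptr_false]
    return value_true if cond else value_false


memory = {"__n": 7, "__old_size": 12}
for cond in (True, False):
    before = load_via_select(memory, cond, "__n", "__old_size")
    after = load_then_select(memory, cond, "__n", "__old_size")
    assert before == after
```

Once no pointer to either alloca flows into a select, both allocas only see plain loads and stores, which is the form that promotion to registers needs.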
* Null initialize a few variables flagged by clang's -Wuninitialized-experimental warning. While these don't look like real bugs, clang's -Wuninitialized-experimental analysis is stricter than GCC's, and these fixes have the benefit of being general nice cleanups.
  llvm-svn: 124073
  Author: Ted Kremenek | Age: 2011-01-23 | Files: 1 | Lines: -1/+1
* Enhance SRoA to be more aggressive about scalarization of aggregate allocas that have PHI or select uses of their element pointers. This can often happen when instcombine sinks two loads into a successor, inserting a phi or select. With this patch, we can scalarize the alloca, but the pinned elements are not yet promoted. This is still a win for large aggregates where only one element is used. This fixes rdar://8904039 and part of rdar://7339113 (poor codegen on stringswitch).
  llvm-svn: 124070
  Author: Chris Lattner | Age: 2011-01-23 | Files: 1 | Lines: -12/+114
* Convert two std::vectors to SmallVectors for a 3.4% speedup running -scalarrepl on test-suite + SPEC2000 & SPEC2006.
  llvm-svn: 124068
  Author: Cameron Zwarich | Age: 2011-01-23 | Files: 1 | Lines: -2/+2
* have AllocaInfo store the alloca being inspected, simplifying callers. No functionality change.
  llvm-svn: 124067
  Author: Chris Lattner | Age: 2011-01-23 | Files: 1 | Lines: -22/+24
* Rearrange some code a bit. Change MarkUnsafe to handle the "Transformation preventing inst" printing, so that -scalarrepl -debug will always print the rejected instruction. No functionality change.
  llvm-svn: 124066
  Author: Chris Lattner | Age: 2011-01-23 | Files: 1 | Lines: -27/+29
* remove an old hack that avoided creating MMX datatypes. The X86 backend has been fixed.
  llvm-svn: 124064
  Author: Chris Lattner | Age: 2011-01-23 | Files: 1 | Lines: -22/+1
* Actually check memcpy lengths, instead of just commenting about how they should be checked.
  llvm-svn: 123999
  Author: Dan Gohman | Age: 2011-01-21 | Files: 1 | Lines: -2/+4
* Just because we have determined that an (fcmp | fcmp) is true for A < B, A == B, and A > B, does not mean we can fold it to true. We still need to check for the case where A and B are unordered.
  llvm-svn: 123993
  Author: Owen Anderson | Age: 2011-01-21 | Files: 1 | Lines: -1/+3
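The unordered case in the entry above comes from NaN operands, for which less-than, equal and greater-than are all false. The following is a rough sketch of that reasoning and not the instcombine code: each fcmp predicate is modeled as the set of outcomes for which it is true, and the names `classify`, `or_is_always_true`, `OLE`, `OGT` and `UGT` are chosen here for illustration.

```python
import math

ALL_OUTCOMES = {"lt", "eq", "gt", "unordered"}


def classify(a, b):
    """Return which of the four mutually exclusive comparison outcomes holds."""
    if a < b:
        return "lt"
    if a == b:
        return "eq"
    if a > b:
        return "gt"
    return "unordered"  # at least one operand is NaN


# Outcomes for which each predicate evaluates to true.
OLE = {"lt", "eq"}          # ordered and less than or equal
OGT = {"gt"}                # ordered and greater than
UGT = {"gt", "unordered"}   # unordered or greater than


def or_is_always_true(pred1, pred2):
    """An OR of two fcmps folds to true only if all four outcomes are covered."""
    return (pred1 | pred2) == ALL_OUTCOMES


assert classify(math.nan, 1.0) == "unordered"
assert not or_is_always_true(OLE, OGT)  # misses the unordered case
assert or_is_always_true(OLE, UGT)
```

Covering only the three ordered outcomes leaves NaN inputs unaccounted for, which is why the fold has to check the fourth case as well.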
* SCCP doesn't actually preserve the CFG. It will delete and insert terminator instructions.
  llvm-svn: 123973
  Author: Nick Lewycky | Age: 2011-01-21 | Files: 1 | Lines: -4/+0
* fix PR9013, an infinite loop in instcombine.
  llvm-svn: 123968
  Author: Chris Lattner | Age: 2011-01-21 | Files: 1 | Lines: -2/+10
* update obsolete comment.
  llvm-svn: 123965
  Author: Chris Lattner | Age: 2011-01-21 | Files: 1 | Lines: -4/+3
* Don't try to pull vector bitcasts that change the number of elements through a select. A vector select is pairwise on each element so we'd need a new condition with the right number of elements to select on. Fixes PR8994.
  llvm-svn: 123963
  Author: Nick Lewycky | Age: 2011-01-21 | Files: 1 | Lines: -3/+17
* At -O123 the early-cse pass is run before instcombine has run. According to my auto-simplifier the transform most missed by early-cse is (zext X) != 0 -> X != 0. This patch adds this transform and some related logic to InstructionSimplify and removes some of the logic from instcombine (unfortunately not all because there are several situations in which instcombine can improve things by making new instructions, whereas instsimplify is not allowed to do this). At -O2 this often results in more than 15% more simplifications by early-cse, and results in hundreds of lines of bitcode being eliminated from the testsuite. I did see some small negative effects in the testsuite, for example a few additional instructions in three programs. One program, 483.xalancbmk, got an additional 35 instructions, which seems to be due to a function getting an additional instruction and then being inlined all over the place.
  llvm-svn: 123911
  Author: Duncan Sands | Age: 2011-01-20 | Files: 1 | Lines: -32/+11
* Add unnamed_addr when we can show that address of a global is not used.
  llvm-svn: 123834
  Author: Rafael Espindola | Age: 2011-01-19 | Files: 1 | Lines: -13/+42
* fix rdar://8878965, a regression I introduced with the recent llvm.objectsize changes.
  llvm-svn: 123771
  Author: Chris Lattner | Age: 2011-01-18 | Files: 1 | Lines: -1/+3
* Convert a std::map to a DenseMap for another 1.7% speedup on -scalarrepl.
  llvm-svn: 123732
  Author: Cameron Zwarich | Age: 2011-01-18 | Files: 1 | Lines: -3/+3
* Make a std::vector a SmallVector<*, 32> like the other vectors in the same function. This seems to be about a 1.5% speedup of -scalarrepl on test-suite with SPEC2000 and SPEC2006.
  llvm-svn: 123731
  Author: Cameron Zwarich | Age: 2011-01-18 | Files: 1 | Lines: -1/+1
* Reduce indentation and remove commented out code.
  llvm-svn: 123729
  Author: Rafael Espindola | Age: 2011-01-18 | Files: 1 | Lines: -122/+101
* Remove code for updating dominance frontiers and some outdated references to dominance and post-dominance frontiers.
  llvm-svn: 123725
  Author: Cameron Zwarich | Age: 2011-01-18 | Files: 7 | Lines: -105/+21
* Remove outdated references to dominance frontiers.
  llvm-svn: 123724
  Author: Cameron Zwarich | Age: 2011-01-18 | Files: 4 | Lines: -29/+27
* Remove dead code, that I apparently wrote a while back. We seem to be doing well enough without whatever this was trying to do. When/if someone has the time to do some empirical evaluations, it might be worth it to figure out what this code was trying to do and see if it's worth resurrecting/fixing.
  llvm-svn: 123684
  Author: Owen Anderson | Age: 2011-01-17 | Files: 1 | Lines: -15/+0
* Roll r123609 back in with two changes that fix test failures with expensive checks enabled:
    1) Use '<' to compare integers in a comparison function rather than '<='.
    2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to initialize the priority queue.
  The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit less, at just under 16% rather than 17%.
  llvm-svn: 123662
  Author: Cameron Zwarich | Age: 2011-01-17 | Files: 3 | Lines: -61/+122
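Change 1) above is consistent with the usual requirement that an ordering comparator be strict: comparing an element with itself must return false. The sketch below only illustrates that property in Python. The commit does not show the actual comparison function, so `less_strict`, `less_or_equal` and `is_irreflexive` are hypothetical names.

```python
def less_strict(a, b):
    """A strict comparison: never true for equal elements."""
    return a < b


def less_or_equal(a, b):
    """A non-strict comparison: true for equal elements."""
    return a <= b


def is_irreflexive(compare, samples):
    """A strict ordering must report compare(x, x) as false for every x."""
    return not any(compare(x, x) for x in samples)


levels = [0, 1, 1, 2, 3]
assert is_irreflexive(less_strict, levels)
assert not is_irreflexive(less_or_equal, levels)
```

Checked builds of the C++ standard library can assert on this property, which may be why the failures only showed up with expensive checks enabled.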
* Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot.
  llvm-svn: 123618
  Author: Cameron Zwarich | Age: 2011-01-17 | Files: 3 | Lines: -121/+60
* Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to eliminating a potentially quadratic data structure, this also gives a 17% speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My initial experiment gave a greater speedup around 25%, but I moved the dominator tree level computation from dominator tree construction to PromoteMemToReg. Since this approach to computing IDFs has a much lower overhead than the old code using precomputed DFs, it is worth looking at using this new code for the second scalarrepl pass as well.
  llvm-svn: 123609
  Author: Cameron Zwarich | Age: 2011-01-17 | Files: 3 | Lines: -60/+121
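One way to compute iterated dominance frontiers (IDFs) from dominator tree levels and a priority queue, without precomputed dominance frontiers, is sketched below. This is a minimal illustration under stated assumptions, not the PromoteMemToReg code: the CFG and immediate-dominator maps are supplied by the caller, liveness pruning is omitted, and every name (`iterated_dominance_frontier`, `successors`, `idom`, `def_blocks`) is hypothetical.

```python
import heapq


def iterated_dominance_frontier(successors, idom, def_blocks):
    """Return the blocks that need a PHI for a value defined in def_blocks.

    successors: block -> list of CFG successors
    idom:       block -> immediate dominator (the entry block maps to None)
    """
    def_blocks = set(def_blocks)  # uniqued set of defining blocks

    # Build dominator tree children and levels (depth in the dominator tree).
    children = {block: [] for block in idom}
    for block, parent in idom.items():
        if parent is not None:
            children[parent].append(block)
    level = {}

    def compute_level(block):
        if block not in level:
            parent = idom[block]
            level[block] = 0 if parent is None else compute_level(parent) + 1
        return level[block]

    for block in idom:
        compute_level(block)

    # Max-heap on level: the deepest blocks are processed first.
    queue = [(-level[block], block) for block in def_blocks]
    heapq.heapify(queue)
    in_idf, walked = set(), set()

    while queue:
        _, root = heapq.heappop(queue)
        root_level = level[root]
        walked.add(root)
        worklist = [root]
        while worklist:  # walk the dominator subtree under root
            node = worklist.pop()
            for succ in successors.get(node, []):
                if idom[succ] == node:
                    continue  # CFG edge that is also a dominator tree edge
                if level[succ] > root_level:
                    continue  # too deep to be in root's frontier
                if succ in in_idf:
                    continue
                in_idf.add(succ)
                if succ not in def_blocks:
                    # A PHI is itself a definition, so iterate from it too.
                    heapq.heappush(queue, (-level[succ], succ))
            for child in children[node]:
                if child not in walked:
                    walked.add(child)
                    worklist.append(child)
    return in_idf


# Diamond CFG: definitions in both arms need a PHI at the merge block.
diamond_succ = {"entry": ["a", "b"], "a": ["m"], "b": ["m"], "m": []}
diamond_idom = {"entry": None, "a": "entry", "b": "entry", "m": "entry"}
assert iterated_dominance_frontier(diamond_succ, diamond_idom, ["a", "b"]) == {"m"}
```

The quadratic frontier sets are never materialized here: each dominator subtree is walked at most once, and only the levels and the tree itself are stored.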
* Teach DAE to look for functions whose arguments are unused, and change all callers to pass in an undefvalue instead.
  llvm-svn: 123596
  Author: Anders Carlsson | Age: 2011-01-16 | Files: 1 | Lines: -1/+61
* tidy up a comment, as suggested by duncan
  llvm-svn: 123590
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -2/+2
* Don't merge two constants if we care about the address of both. This fixes the original testcase in PR8927. It also causes a clang binary built with a patched clang to increase in size by 0.21%. We can probably get some of the size back by writing a pass that detects that a global never has its pointer compared and adds unnamed_addr to it (maybe extend global opt). It is also possible that there are some other cases clang could add unnamed_addr to. I will investigate extending globalopt next.
  llvm-svn: 123584
  Author: Rafael Espindola | Age: 2011-01-16 | Files: 1 | Lines: -22/+38
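The merging rule in the entry above can be written down as a small predicate. This is a sketch of the stated rule only and not the constmerge implementation: `GlobalConstant` and `can_merge` are hypothetical names, and the `unnamed_addr` field stands for "the address of this global is not significant".

```python
from dataclasses import dataclass


@dataclass
class GlobalConstant:
    name: str
    initializer: bytes
    unnamed_addr: bool  # True when nothing depends on this global's address


def can_merge(a, b):
    """Merge equal constants unless the addresses of both are significant."""
    same_contents = a.initializer == b.initializer
    address_matters_for_both = not a.unnamed_addr and not b.unnamed_addr
    return same_contents and not address_matters_for_both


x = GlobalConstant("x", b"hello", unnamed_addr=False)
y = GlobalConstant("y", b"hello", unnamed_addr=False)
z = GlobalConstant("z", b"hello", unnamed_addr=True)
assert not can_merge(x, y)  # a program could observe that &x != &y
assert can_merge(x, z)      # z's address is never observed
```

This also shows where the 0.21% size increase comes from: every pair like `x` and `y` that used to be folded together now stays separate until something marks one of them unnamed_addr.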
* fix PR8932, a case where arg promotion could infinitely promote.
  llvm-svn: 123574
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -24/+51
* simplify a little
  llvm-svn: 123573
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -7/+3
* if an alloca is only ever accessed as a unit, and is accessed with load/store instructions, then don't try to decimate it into its individual pieces. This will just make a mess of the IR and is pointless if none of the elements are individually accessed. This was generating really terrible code for std::bitset (PR8980) because it happens to be lowered by clang as an {[8 x i8]} structure instead of {i64}.

  The testcase now is optimized to:

      define i64 @test2(i64 %X) {
        br label %L2

      L2:                                               ; preds = %0
        ret i64 %X
      }

  before we generated:

      define i64 @test2(i64 %X) {
        %sroa.store.elt = lshr i64 %X, 56
        %1 = trunc i64 %sroa.store.elt to i8
        %sroa.store.elt8 = lshr i64 %X, 48
        %2 = trunc i64 %sroa.store.elt8 to i8
        %sroa.store.elt9 = lshr i64 %X, 40
        %3 = trunc i64 %sroa.store.elt9 to i8
        %sroa.store.elt10 = lshr i64 %X, 32
        %4 = trunc i64 %sroa.store.elt10 to i8
        %sroa.store.elt11 = lshr i64 %X, 24
        %5 = trunc i64 %sroa.store.elt11 to i8
        %sroa.store.elt12 = lshr i64 %X, 16
        %6 = trunc i64 %sroa.store.elt12 to i8
        %sroa.store.elt13 = lshr i64 %X, 8
        %7 = trunc i64 %sroa.store.elt13 to i8
        %8 = trunc i64 %X to i8
        br label %L2

      L2:                                               ; preds = %0
        %9 = zext i8 %1 to i64
        %10 = shl i64 %9, 56
        %11 = zext i8 %2 to i64
        %12 = shl i64 %11, 48
        %13 = or i64 %12, %10
        %14 = zext i8 %3 to i64
        %15 = shl i64 %14, 40
        %16 = or i64 %15, %13
        %17 = zext i8 %4 to i64
        %18 = shl i64 %17, 32
        %19 = or i64 %18, %16
        %20 = zext i8 %5 to i64
        %21 = shl i64 %20, 24
        %22 = or i64 %21, %19
        %23 = zext i8 %6 to i64
        %24 = shl i64 %23, 16
        %25 = or i64 %24, %22
        %26 = zext i8 %7 to i64
        %27 = shl i64 %26, 8
        %28 = or i64 %27, %25
        %29 = zext i8 %8 to i64
        %30 = or i64 %29, %28
        ret i64 %30
      }

  In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough PHIs are in play that instcombine backs off. It's better to not generate this stuff in the first place.
  llvm-svn: 123571
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -3/+33
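The "before" IR in the entry above splits %X into eight bytes with lshr/trunc and then rebuilds the same i64 with zext/shl/or, so the whole sequence is an identity. A small sketch of that round trip follows. The helper names `split_into_bytes` and `reassemble` are made up for this illustration.

```python
MASK64 = (1 << 64) - 1


def split_into_bytes(x):
    """Mirror the lshr/trunc chain: element i holds bits [8*i, 8*i + 8) of x."""
    return [(x >> (8 * i)) & 0xFF for i in range(8)]


def reassemble(byte_values):
    """Mirror the zext/shl/or chain: widen each byte, shift it back, OR together."""
    result = 0
    for i, byte in enumerate(byte_values):
        result |= byte << (8 * i)
    return result & MASK64


for x in (0, 1, 0x0123456789ABCDEF, MASK64):
    assert reassemble(split_into_bytes(x)) == x
```

Since no individual byte is ever read on its own, nothing is gained from the split, which matches the commit's point that the alloca is best left as a single unit.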
* Use an irbuilder to get some trivial constant folding when doing a store of a constant.
  llvm-svn: 123570
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -21/+17
* remove a dead check, this was needed before we had an explicit veto on uses of phis.
  llvm-svn: 123569
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -5/+0
* enhance FoldOpIntoPhi in instcombine to try harder when a phi has multiple uses. In some cases, all the uses are the same operation, so instcombine can go ahead and promote the phi. In the testcase this pushes an add out of the loop.
  llvm-svn: 123568
  Author: Chris Lattner | Age: 2011-01-16 | Files: 2 | Lines: -3/+20
* remove the AllowAggressive argument to FoldOpIntoPhi. It is forced to false in the first line of the function because it isn't a good idea, even for compares.
  llvm-svn: 123566
  Author: Chris Lattner | Age: 2011-01-16 | Files: 3 | Lines: -14/+6
* more cleanups: use the IR builder.
  llvm-svn: 123565
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -38/+39
* tidy up code.
  llvm-svn: 123564
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -16/+20
* Improve the safety of my globalopt enhancement by ensuring that the bitcast of the stored value to the new store type is always valid. Also, add a testcase.
  llvm-svn: 123563
  Author: Owen Anderson | Age: 2011-01-16 | Files: 1 | Lines: -12/+22
* simplify this code, it is still broken but will follow up on llvm-commits.
  llvm-svn: 123558
  Author: Chris Lattner | Age: 2011-01-16 | Files: 1 | Lines: -15/+5
* remove the partial specialization pass. It is unmaintained and has bugs.
  llvm-svn: 123554
  Author: Chris Lattner | Age: 2011-01-16 | Files: 3 | Lines: -230/+0
* Add missing whitespace.
  llvm-svn: 123543
  Author: Nick Lewycky | Age: 2011-01-15 | Files: 1 | Lines: -2/+2
* Make constmerge a two-pass algorithm so that it won't miss merging opportunities. Fixes PR8978.
  llvm-svn: 123541
  Author: Nick Lewycky | Age: 2011-01-15 | Files: 1 | Lines: -4/+34
* Try to unbreak selfhost.
  llvm-svn: 123537
  Author: Benjamin Kramer | Age: 2011-01-15 | Files: 1 | Lines: -0/+1
* Add a cache that protects mergefunc's internals from more surprises in DenseSet. Also, replace tabs with spaces. Yes, it's 2011.
  llvm-svn: 123535
  Author: Nick Lewycky | Age: 2011-01-15 | Files: 1 | Lines: -5/+27
* temporarily revert r123526. While working on a follow-on patch I realized that ConstantFoldTerminator doesn't preserve dominfo.
  llvm-svn: 123527
  Author: Chris Lattner | Age: 2011-01-15 | Files: 1 | Lines: -3/+0
* fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code. The basic issue is that isel (very reasonably!) expects conditional branches to be folded, so CGP leaving around a bunch of dead computation feeding conditional branches isn't such a good idea. Just fold branches on constants into unconditional branches.
  llvm-svn: 123526
  Author: Chris Lattner | Age: 2011-01-15 | Files: 1 | Lines: -0/+3
* simplify code, no functionality change.
  llvm-svn: 123525
  Author: Chris Lattner | Age: 2011-01-15 | Files: 1 | Lines: -30/+37