summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Transforms
Commit message (Collapse)AuthorAgeFilesLines
* reduce indentationChris Lattner2011-01-151-29/+29
| | | | llvm-svn: 123514
* Generalize LoadAndStorePromoter a bit and switch LICMChris Lattner2011-01-153-190/+111
| | | | | | to use it. llvm-svn: 123501
* Fix a false-positive warning.Owen Anderson2011-01-141-1/+3
| | | | llvm-svn: 123480
* Enhance GlobalOpt to be able evaluate initializers that involve stores throughOwen Anderson2011-01-141-2/+49
| | | | | | bitcasts, at least in simple cases. This fixes clang's CodeGenCXX/virtual-base-dtor.cpp llvm-svn: 123477
* switch SRoA to use LoadAndStorePromoter instead of its own copy of the code.Chris Lattner2011-01-141-136/+26
| | | | llvm-svn: 123457
* Add a new LoadAndStorePromoter class, which implements the generalChris Lattner2011-01-141-0/+154
| | | | | | | "promote a bunch of load and stores" logic, allowing the code to be shared and reused. llvm-svn: 123456
* split SROA into two passes: one that uses DomFrontiers (-scalarrepl) Chris Lattner2011-01-142-27/+57
| | | | | | and one that uses SSAUpdater (-scalarrepl-ssa) llvm-svn: 123436
* Implement full support for promoting allocas to registers using SSAUpdaterChris Lattner2011-01-141-5/+162
| | | | | | | | | | | instead of DomTree/DomFrontier. This may be interesting for reducing compile time. This is currently disabled, but seems to work just fine. When this is enabled, we eliminate two runs of dominator frontier, one in the "early per-function" optimizations and one in the "interlaced with inliner" function passes. llvm-svn: 123434
* indentationChris Lattner2011-01-141-1/+1
| | | | llvm-svn: 123426
* Move some shift transforms out of instcombine and into InstructionSimplify.Duncan Sands2011-01-141-26/+10
| | | | | | | | | | | | While there, I noticed that the transform "undef >>a X -> undef" was wrong. For example if X is 2 then the top two bits must be equal, so the result can not be anything. I fixed this in the constant folder as well. Also, I made the transform for "X << undef" stronger: it now folds to undef always, even though X might be zero. This is in accordance with the LangRef, but I must admit that it is fairly aggressive. Also, I added "i32 X << 32 -> undef" following the LangRef and the constant folder, likewise fairly aggressive. llvm-svn: 123417
* Fix whitespace.Bob Wilson2011-01-131-120/+120
| | | | llvm-svn: 123396
* Check for empty structs, and for consistency, zero-element arrays.Bob Wilson2011-01-131-2/+2
| | | | llvm-svn: 123383
* Extend SROA to handle arrays accessed as homogeneous structs and vice versa.Bob Wilson2011-01-131-14/+57
| | | | | | | | | | | | | | | | | This is a minor extension of SROA to handle a special case that is important for some ARM NEON operations. Some of the NEON intrinsics return multiple values, which are handled as struct types containing multiple elements of the same vector type. The corresponding return types declared in the arm_neon.h header have equivalent arrays. We need SROA to recognize that it can split up those arrays and structs into separate vectors, even though they are not always accessed with the same type. SROA already handles loads and stores of an entire alloca by using insertvalue/extractvalue to access the individual pieces, and that code works the same regardless of whether the type is a struct or an array. So, all that needs to be done is to check for compatible arrays and homogeneous structs. llvm-svn: 123381
* Make SROA more aggressive with allocas containing padding.Bob Wilson2011-01-131-34/+27
| | | | | | | SROA only split up structs and arrays one level at a time, so padding can only cause trouble if it is located in between the struct or array elements. llvm-svn: 123380
* Use SmallVector instead of SmallPtrSet and avoid non-deterministic behavior.Devang Patel2011-01-121-3/+3
| | | | llvm-svn: 123318
* revert 123144, reenabling the rest of memset formation.Chris Lattner2011-01-121-3/+0
| | | | llvm-svn: 123302
* revert r123146 which disabled code that wasn't the root causeChris Lattner2011-01-121-2/+0
| | | | | | of the bootstrap miscompare issue. llvm-svn: 123299
* revert r123149, reenabling an improvement to memcpyopt that wasn'tChris Lattner2011-01-121-4/+2
| | | | | | the source of the bootstrap problem. llvm-svn: 123298
* Remove the PR8954 workaround.Jakob Stoklund Olesen2011-01-111-4/+0
| | | | llvm-svn: 123288
* Fix a non-deterministic loop in llvm::MergeBlockIntoPredecessor.Jakob Stoklund Olesen2011-01-111-2/+2
| | | | | | | | | DT->changeImmediateDominator() trivially ignores identity updates, so there is really no need for the uniqueing provided by SmallPtrSet. I expect this to fix PR8954. llvm-svn: 123286
* Dial back the speculative fix for PR8954 a bit, so that we only recompute ↵Cameron Zwarich2011-01-111-1/+3
| | | | | | | | dominators once at the beginning of GVN instead of once per iteration. llvm-svn: 123278
* Attempt to fix the bootstrap buildbot. Rafael says this works for him on ↵Cameron Zwarich2011-01-111-0/+1
| | | | | | x86-64 Linux. llvm-svn: 123270
* Remove dead variable, const-ref-ize an APInt.Owen Anderson2011-01-111-4/+1
| | | | llvm-svn: 123248
* this pass claims to preserve scev, make sure to tell it about deletions.Chris Lattner2011-01-111-0/+1
| | | | llvm-svn: 123247
* Factor the actual simplification out of SimplifyIndirectBrOnSelect and into ↵Frits van Bommel2011-01-111-26/+37
| | | | | | | | a new helper function so it can be reused in e.g. an upcoming SimplifySwitchOnSelect. No functional change. llvm-svn: 123234
* update memdep when an instruction is deleted. This code isn'tChris Lattner2011-01-111-2/+5
| | | | | | | actually reached in the testcase in PR8954, but it's safe and good practice. llvm-svn: 123224
* when MergeBlockIntoPredecessor merges two blocks, update MemDep if itChris Lattner2011-01-111-0/+4
| | | | | | is floating around in the ether. llvm-svn: 123223
* Fix FoldSingleEntryPHINodes to update memdep and AA when it deletesChris Lattner2011-01-112-5/+21
| | | | | | | | | | phi nodes. It is called from MergeBlockIntoPredecessor which is called from GVN, which claims to preserve these. I'm skeptical that this is the actual problem behind PR8954, but this is a stab in the right direction. llvm-svn: 123222
* random cleanupsChris Lattner2011-01-112-3/+2
| | | | llvm-svn: 123221
* remove a bogus assertion: the latch block of a loop is not Chris Lattner2011-01-111-6/+5
| | | | | | | neccesarily an uncond branch to the header. This fixes PR8955 (the assertion tripping). llvm-svn: 123219
* Fix a random missed optimization by making InstCombine more aggressive when ↵Owen Anderson2011-01-111-2/+40
| | | | | | | | determining which bits are demanded by a comparison against a constant. llvm-svn: 123203
* Teach instcombine about the rest of the SSE and SSE2 conversionChandler Carruth2011-01-101-4/+11
| | | | | | intrinsics element dependencies. Reviewed by Nick. llvm-svn: 123161
* another random stab in the dark trying to fix llvm-gcc-i386-linux-selfhostChris Lattner2011-01-101-2/+4
| | | | llvm-svn: 123149
* another (more) aggressive attempt to bring llvm-gcc-i386-linux-selfhostChris Lattner2011-01-101-0/+2
| | | | | | back to life. llvm-svn: 123146
* temporarily disable memset formation from memsets in an effort to restore ↵Chris Lattner2011-01-091-0/+3
| | | | | | buildbot stability. llvm-svn: 123144
* fix a few old bugs (found by inspection) where we would zap instructionsChris Lattner2011-01-091-1/+4
| | | | | | | | without informing memdep. This could cause nondeterminstic weirdness based on where instructions happen to get allocated, and will hopefully breath some life into some broken testers. llvm-svn: 123124
* Instcombine: Fix pattern where the sext did not dominate the icmp using itTobias Grosser2011-01-091-2/+7
| | | | llvm-svn: 123121
* LoopInstSimplify preserves LoopSimplify.Cameron Zwarich2011-01-091-0/+1
| | | | llvm-svn: 123117
* reduce indentation. Print <nuw> and <nsw> when dumping SCEV AddRec'sChris Lattner2011-01-091-3/+2
| | | | | | that have the bit set. llvm-svn: 123104
* fix a latent bug in memcpyoptimizer that my recent patches exposed: it wasn't Chris Lattner2011-01-081-2/+4
| | | | | | | updating memdep when fusing stores together. This fixes the crash optimizing the bullet benchmark. llvm-svn: 123091
* tryMergingIntoMemset can only handle constant length memsets.Chris Lattner2011-01-081-5/+6
| | | | llvm-svn: 123090
* Merge memsets followed by neighboring memsets and other stores intoChris Lattner2011-01-081-3/+18
| | | | | | | | | | | | | | | | larger memsets. Among other things, this fixes rdar://8760394 and allows us to handle "Example 2" from http://blog.regehr.org/archives/320, compiling it into a single 4096-byte memset: _mad_synth_mute: ## @mad_synth_mute ## BB#0: ## %entry pushq %rax movl $4096, %esi ## imm = 0x1000 callq ___bzero popq %rax ret llvm-svn: 123089
* fix an issue in IsPointerOffset that prevented us from recognizing thatChris Lattner2011-01-081-3/+19
| | | | | | P and P+1 are relative to the same base pointer. llvm-svn: 123087
* enhance memcpyopt to merge a store and a subsequentChris Lattner2011-01-081-53/+83
| | | | | | memset into a single larger memset. llvm-svn: 123086
* constify TargetData references.Chris Lattner2011-01-081-86/+96
| | | | | | | Split memset formation logic out into its own "tryMergingIntoMemset" helper function. llvm-svn: 123081
* When loop rotation happens, it is *very* common for the duplicated condbrChris Lattner2011-01-081-21/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to be foldable into an uncond branch. When this happens, we can make a much simpler CFG for the loop, which is important for nested loop cases where we want the outer loop to be aggressively optimized. Handle this case more aggressively. For example, previously on phi-duplicate.ll we would get this: define void @test(i32 %N, double* %G) nounwind ssp { entry: %cmp1 = icmp slt i64 1, 1000 br i1 %cmp1, label %bb.nph, label %for.end bb.nph: ; preds = %entry br label %for.body for.body: ; preds = %bb.nph, %for.cond %j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ] %arrayidx = getelementptr inbounds double* %G, i64 %j.02 %tmp3 = load double* %arrayidx %sub = sub i64 %j.02, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.02 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.02, 1 br label %for.cond for.cond: ; preds = %for.body %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge for.cond.for.end_crit_edge: ; preds = %for.cond br label %for.end for.end: ; preds = %for.cond.for.end_crit_edge, %entry ret void } Now we get the much nicer: define void @test(i32 %N, double* %G) nounwind ssp { entry: br label %for.body for.body: ; preds = %entry, %for.body %j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ] %arrayidx = getelementptr inbounds double* %G, i64 %j.01 %tmp3 = load double* %arrayidx %sub = sub i64 %j.01, 1 %arrayidx6 = getelementptr inbounds double* %G, i64 %sub %tmp7 = load double* %arrayidx6 %add = fadd double %tmp3, %tmp7 %arrayidx10 = getelementptr inbounds double* %G, i64 %j.01 store double %add, double* %arrayidx10 %inc = add nsw i64 %j.01, 1 %cmp = icmp slt i64 %inc, 1000 br i1 %cmp, label %for.body, label %for.end for.end: ; preds = %for.body ret void } With all of these recent changes, we are now able to compile: void foo(char *X) { for (int i = 0; i != 100; ++i) for (int j = 0; j != 100; ++j) X[j+i*100] = 0; } into a single memset of 10000 bytes. This series of changes should also be helpful for other nested loop scenarios as well. llvm-svn: 123079
* split ssa updating code out to its own helper function. Don't botherChris Lattner2011-01-081-74/+78
| | | | | | | moving the OrigHeader block anymore: we just merge it away anyway so its code layout doesn't matter. llvm-svn: 123077
* Implement a TODO: Enhance loopinfo to merge away the unconditional branchChris Lattner2011-01-081-11/+7
| | | | | | | | | | that it was leaving in loops after rotation (between the original latch block and the original header. With this change, it is possible for rotated loops to have just a single basic block, which is useful. llvm-svn: 123075
* various code cleanups, enhance MergeBlockIntoPredecessor to preserveChris Lattner2011-01-081-13/+10
| | | | | | loop info. llvm-svn: 123074
* inline preserveCanonicalLoopForm now that it is simple.Chris Lattner2011-01-081-39/+17
| | | | llvm-svn: 123073
OpenPOWER on IntegriCloud