determining which bits are demanded by
a comparison against a constant.
llvm-svn: 123203
intrinsics element dependencies. Reviewed by Nick.
llvm-svn: 123161
llvm-svn: 123149
back to life.
llvm-svn: 123146
buildbot stability.
llvm-svn: 123144
without informing memdep. This could cause nondeterministic weirdness
based on where instructions happen to get allocated, and will hopefully
breathe some life into some broken testers.
llvm-svn: 123124
llvm-svn: 123121
llvm-svn: 123117
that have the bit set.
llvm-svn: 123104
updating memdep when fusing stores together. This fixes the crash optimizing
the bullet benchmark.
llvm-svn: 123091
llvm-svn: 123090
larger memsets. Among other things, this fixes rdar://8760394 and
allows us to handle "Example 2" from http://blog.regehr.org/archives/320,
compiling it into a single 4096-byte memset:
_mad_synth_mute: ## @mad_synth_mute
## BB#0: ## %entry
pushq %rax
movl $4096, %esi ## imm = 0x1000
callq ___bzero
popq %rax
ret
llvm-svn: 123089
P and P+1 are relative to the same base pointer.
llvm-svn: 123087
memset into a single larger memset.
llvm-svn: 123086
Split memset formation logic out into its own
"tryMergingIntoMemset" helper function.
llvm-svn: 123081
to be foldable into an uncond branch. When this happens, we can make a
much simpler CFG for the loop, which is important for nested loop cases
where we want the outer loop to be aggressively optimized.
Handle this case more aggressively. For example, previously on
phi-duplicate.ll we would get this:
define void @test(i32 %N, double* %G) nounwind ssp {
entry:
%cmp1 = icmp slt i64 1, 1000
br i1 %cmp1, label %bb.nph, label %for.end
bb.nph: ; preds = %entry
br label %for.body
for.body: ; preds = %bb.nph, %for.cond
%j.02 = phi i64 [ 1, %bb.nph ], [ %inc, %for.cond ]
%arrayidx = getelementptr inbounds double* %G, i64 %j.02
%tmp3 = load double* %arrayidx
%sub = sub i64 %j.02, 1
%arrayidx6 = getelementptr inbounds double* %G, i64 %sub
%tmp7 = load double* %arrayidx6
%add = fadd double %tmp3, %tmp7
%arrayidx10 = getelementptr inbounds double* %G, i64 %j.02
store double %add, double* %arrayidx10
%inc = add nsw i64 %j.02, 1
br label %for.cond
for.cond: ; preds = %for.body
%cmp = icmp slt i64 %inc, 1000
br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
for.cond.for.end_crit_edge: ; preds = %for.cond
br label %for.end
for.end: ; preds = %for.cond.for.end_crit_edge, %entry
ret void
}
Now we get the much nicer:
define void @test(i32 %N, double* %G) nounwind ssp {
entry:
br label %for.body
for.body: ; preds = %entry, %for.body
%j.01 = phi i64 [ 1, %entry ], [ %inc, %for.body ]
%arrayidx = getelementptr inbounds double* %G, i64 %j.01
%tmp3 = load double* %arrayidx
%sub = sub i64 %j.01, 1
%arrayidx6 = getelementptr inbounds double* %G, i64 %sub
%tmp7 = load double* %arrayidx6
%add = fadd double %tmp3, %tmp7
%arrayidx10 = getelementptr inbounds double* %G, i64 %j.01
store double %add, double* %arrayidx10
%inc = add nsw i64 %j.01, 1
%cmp = icmp slt i64 %inc, 1000
br i1 %cmp, label %for.body, label %for.end
for.end: ; preds = %for.body
ret void
}
With all of these recent changes, we are now able to compile:
void foo(char *X) {
for (int i = 0; i != 100; ++i)
for (int j = 0; j != 100; ++j)
X[j+i*100] = 0;
}
into a single memset of 10000 bytes. This series of changes
should also be helpful for other nested loop scenarios as well.
llvm-svn: 123079
moving the OrigHeader block anymore: we just merge it away anyway so
its code layout doesn't matter.
llvm-svn: 123077
that it was leaving in loops after rotation (between the original latch
block and the original header).
With this change, it is possible for rotated loops to have just a single
basic block, which is useful.
llvm-svn: 123075
loop info.
llvm-svn: 123074
llvm-svn: 123073
1. Rip out LoopRotate's domfrontier updating code. It isn't
needed now that LICM doesn't use DF and it is super complex
and gross.
2. Make DomTree updating code a lot simpler and faster. The
old loop over all the blocks was just to find a block??
3. Change the code that inserts the new preheader to just use
SplitCriticalEdge instead of doing an overcomplex
reimplementation of it.
No behavior change, except for the name of the inserted preheader.
llvm-svn: 123072
llvm-svn: 123071
and latch blocks. Reorder entry conditions to make the pass faster
and more logical.
llvm-svn: 123069
llvm-svn: 123068
that are just passed to one function.
llvm-svn: 123067
to violate LCSSA form
llvm-svn: 123066
llvm-svn: 123065
they already do). This removes two dominator recomputations prior to isel,
which is a 1% improvement in total llc time for 403.gcc.
The only potentially suspect thing is making GCStrategy recompute dominators if
it used a custom lowering strategy.
llvm-svn: 123064
top of subloop headers, as the phi uses logically occur outside of the subloop.
llvm-svn: 123062
llvm-svn: 123061
them into the loop preheader, eliminating silly instructions like
"icmp i32 0, 100" in fixed tripcount loops. This also better exposes the
bigger problem with loop rotate that I'd like to fix: once this has been
folded, the duplicated conditional branch *often* turns into an uncond branch.
Not aggressively handling this is pessimizing later loop optimizations
somethin' fierce by making "dominates all exit blocks" checks fail.
llvm-svn: 123060
1. Take a flags argument instead of a bool. This makes
it more clear to the reader what it is used for.
2. Add a flag that says that "remapping a value not in the
map is ok".
3. Reimplement MapValue to share a bunch of code and be a lot
more efficient. For lookup failures, don't drop null values
into the map.
4. Using the new flag a bunch of code can vaporize in LinkModules
and LoopUnswitch, kill it.
No functionality change.
llvm-svn: 123058
map from ValueMapper.h (giving us access to its utilities)
and add a fastpath in the loop rotation code, avoiding expensive
SSA updater manipulation for values with nothing to update.
llvm-svn: 123057
X = sext x; x >s c ? X : C+1 --> X = sext x; X <s C+1 ? C+1 : X
X = sext x; x <s c ? X : C-1 --> X = sext x; X >s C-1 ? C-1 : X
X = zext x; x >u c ? X : C+1 --> X = zext x; X <u C+1 ? C+1 : X
X = zext x; x <u c ? X : C-1 --> X = zext x; X >u C-1 ? C-1 : X
X = sext x; x >u c ? X : C+1 --> X = sext x; X <u C+1 ? C+1 : X
X = sext x; x <u c ? X : C-1 --> X = sext x; X >u C-1 ? C-1 : X
Instead of calculating this with mixed types, promote all to the
larger type. This enables scalar evolution to analyze this
expression. PR8866
llvm-svn: 123034
llvm-svn: 123033
additional notes.
llvm-svn: 123030
llvm-svn: 123025
length are equal.
This happens when we take the (non-constant) length from a malloc.
llvm-svn: 122961
with the size passed to malloc.
llvm-svn: 122959
llvm-svn: 122958
OptimizeInst() so that they can be used on a worklist instruction.
llvm-svn: 122945
llvm-svn: 122944
into a separate function, so that it can be called from a loop using a worklist
rather than a loop traversing a whole basic block.
llvm-svn: 122943
Simplify RALinScan::DowngradeRegister with TRI::getOverlaps while we are there.
llvm-svn: 122940
worklist, the key will need to become std::pair<BasicBlock*, Value*>.
llvm-svn: 122932
llvm-svn: 122891
regressing code quality.
llvm-svn: 122887
llvm-svn: 122876
step is to only process instructions in subloops if they have been modified by
an earlier simplification.
llvm-svn: 122869
skipping them, but it should probably use a worklist and only revisit those
instructions in subloops that have actually changed. It should probably also
use a worklist after the first iteration like instsimplify now does. Regardless,
it's only 0.3% of opt -O2 time on 403.gcc if it replaces the instcombine placed
in the middle of the loop passes.
llvm-svn: 122868