| Commit message | Author | Age | Files | Lines |

llvm-svn: 112543

results from ComputeValueKnownInPredecessors
(indicating undef), and re-use existing constant folding APIs.
llvm-svn: 112539

instead of PromoteMemToReg. This allows it to stop using DF and DT,
eliminating a computation of DT and DF from clang -O3. Clang is now
down to 2 runs of DomFrontier.
llvm-svn: 112457
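For context on the mechanics: a minimal sketch of how SSAUpdater is typically driven, with a helper name and parameters of my own invention (this is not the commit's code). Unlike PromoteMemToReg, SSAUpdater needs no DominatorTree or DominanceFrontier; it inserts PHI nodes on demand from the values that are live out of each block.

// Illustrative sketch only; the helper and its arguments are not from the commit.
#include <utility>
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Transforms/Utils/SSAUpdater.h"
using namespace llvm;

// Rewrite every load of a promoted location with the value that reaches it.
static void rewriteLoadsWithSSAUpdater(
    Type *Ty, ArrayRef<std::pair<BasicBlock *, Value *>> StoredValues,
    ArrayRef<LoadInst *> Loads) {
  SSAUpdater SSA;
  SSA.Initialize(Ty, "promoted");
  // Record the value that is live out of each block that stores to the location.
  for (const auto &BV : StoredValues)
    SSA.AddAvailableValue(BV.first, BV.second);
  // Replace each load; SSAUpdater materializes whatever PHI nodes it needs.
  for (LoadInst *LI : Loads) {
    Value *V = SSA.GetValueInMiddleOfBlock(LI->getParent());
    LI->replaceAllUsesWith(V);
    LI->eraseFromParent();
  }
}

The only inputs are per-block available values and the loads to rewrite, which is why a pass using it can drop its DominanceFrontier/DominatorTree requirements.
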
assertingvh so we get a violent explosion if the pointer dangles.
2) Fix AliasSetTracker::deleteValue to remove call sites with
by-pointer comparisons instead of by-alias queries. Using
findAliasSetForCallSite can cause alias sets to get merged
when they shouldn't, and can also miss alias sets when the
call is readonly.
#2 fixes PR6889, which only repros with a .c file :(
llvm-svn: 112452

out of loops, just delete them.
llvm-svn: 112451

symtab manipulation, so it's faster (in addition to being
more elegant)
llvm-svn: 112450

llvm-svn: 112449

of AST to remove the hoisted instruction from the AST, since it
is no longer in the loop.
llvm-svn: 112448

LICM correctly. When sinking an instruction, it should not add
entries for the sunk instruction to the AST, it should remove
the entry for the sunk instruction. The blocks being sunk to
are not in the loop, so their instructions shouldn't be in the
AST (yet)!
llvm-svn: 112447
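The invariant behind this entry and the previous one can be stated as a tiny sketch using the deleteValue interface mentioned above (illustrative, not the commit's code): the AliasSetTracker describes memory accesses inside the current loop, so an instruction that has been hoisted or sunk out of the loop must be dropped from it, not re-added.

#include "llvm/Analysis/AliasSetTracker.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Called after an instruction has been moved out of the loop, either hoisted
// to the preheader or sunk to an exit block. (Sketch; names are illustrative.)
static void updateASTAfterCodeMotion(AliasSetTracker &CurAST, Instruction &I) {
  // The instruction no longer executes inside the loop, so remove its entry.
  CurAST.deleteValue(&I);
  // The bug being fixed was the opposite bookkeeping: adding AST entries for
  // the moved copies, which claimed out-of-loop instructions as part of the
  // loop's alias sets.
}
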
keeping them around until the pass is destroyed, keep them
around only when useful (not for outer loops), and destroy
them right after we use them. This should reduce memory use
and fixes potential bugs where a loop is deleted and another
loop gets allocated to the same address.
llvm-svn: 112446
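A self-contained sketch of the lifetime hazard described here, using stand-in types rather than the real LLVM classes:

#include <map>
#include <memory>

// Stand-ins so the sketch compiles on its own; not the real LLVM types.
struct Loop {};
struct AliasSetTracker {};

// Problematic pattern: one tracker per loop, cached for the lifetime of the
// pass. If a Loop is freed and a new Loop later lands at the same address,
// a lookup in this map silently returns the stale tracker.
std::map<Loop *, std::unique_ptr<AliasSetTracker>> LoopToAliasSetMap;

// Pattern described above: build the tracker only for the loop being visited
// and let it die as soon as it has been used.
void visitLoop(Loop *L) {
  std::unique_ptr<AliasSetTracker> CurAST(new AliasSetTracker());
  // ... populate CurAST from L's blocks, then hoist/sink/promote ...
  (void)L;
}  // CurAST is destroyed here; nothing keyed by a possibly-dead Loop* survives.
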
claims that it preserves domfrontier if it doesn't really.
llvm-svn: 112445

for the unroller to pretend it supports updating it. It still
has a horrible hack for DomTree.
llvm-svn: 112444

other filtering techniques, as those may allow it to filter
out more obviously unprofitable candidates.
llvm-svn: 112441

LSRInstance data structures up to date. This fixes some
pessimizations caused by stale data which will be exposed
in an upcoming change.
llvm-svn: 112440

NarrowSearchSpaceUsingHeuristics into separate functions.
llvm-svn: 112439

llvm-svn: 112438

llvm-svn: 112437

the high-level logic.
llvm-svn: 112436

llvm-svn: 112435

llvm-svn: 112434

preserves domfrontier. It does preserve AA though.
llvm-svn: 112419

require DomFrontier. Dropping this doesn't actually save any runs
of the pass though.
llvm-svn: 112418

Among other things, this uses SSAUpdater instead of
PromoteMemToReg.
llvm-svn: 112417

llvm-svn: 112412

This leads to much simpler code.
llvm-svn: 112410

llvm-svn: 112409

llvm-svn: 112408

getUniqueExitBlocks instead of getExitBlocks and a manual
set to eliminate dupes.
llvm-svn: 112405
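For reference, a small before/after sketch of that kind of cleanup against the Loop API (my illustration, not the commit's diff):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/LoopInfo.h"
using namespace llvm;

// Before: collect every exit edge's target, then weed out repeats by hand.
static void exitBlocksByHand(const Loop *L, SmallVectorImpl<BasicBlock *> &Out) {
  SmallVector<BasicBlock *, 8> Exits;
  L->getExitBlocks(Exits);
  SmallPtrSet<BasicBlock *, 8> Seen;
  for (BasicBlock *BB : Exits)
    if (Seen.insert(BB).second)
      Out.push_back(BB);
}

// After: the Loop API already knows how to return each exit block exactly once.
static void exitBlocksUnique(const Loop *L, SmallVectorImpl<BasicBlock *> &Out) {
  L->getUniqueExitBlocks(Out);
}
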
llvm-svn: 112404

being actively maintained, improved, or extended.
llvm-svn: 112356

I'm aware of, aren't maintained, and LVI will be replacing their value.
nlewycky approved this on irc.
llvm-svn: 112355

llvm-svn: 112351

llvm-svn: 112350

like this:
struct S { float A, B, C, D; };
struct S g;
struct S bar() {
  struct S A = g;
  ++A.B;
  A.A = 42;
  return A;
}
we now generate:
_bar: ## @bar
## BB#0: ## %entry
movq _g@GOTPCREL(%rip), %rax
movss 12(%rax), %xmm0
pshufd $16, %xmm0, %xmm0
movss 4(%rax), %xmm2
movss 8(%rax), %xmm1
pshufd $16, %xmm1, %xmm1
unpcklps %xmm0, %xmm1
addss LCPI1_0(%rip), %xmm2
pshufd $16, %xmm2, %xmm2
movss LCPI1_1(%rip), %xmm0
pshufd $16, %xmm0, %xmm0
unpcklps %xmm2, %xmm0
ret
instead of:
_bar: ## @bar
## BB#0: ## %entry
movq _g@GOTPCREL(%rip), %rax
movss 12(%rax), %xmm0
pshufd $16, %xmm0, %xmm0
movss 4(%rax), %xmm2
movss 8(%rax), %xmm1
pshufd $16, %xmm1, %xmm1
unpcklps %xmm0, %xmm1
addss LCPI1_0(%rip), %xmm2
movd %xmm2, %eax
shlq $32, %rax
addq $1109917696, %rax ## imm = 0x42280000
movd %rax, %xmm0
ret
llvm-svn: 112345

element insertion from the pieces that feed into the vector.
This handles a pattern that occurs frequently due to code
generated for the x86-64 ABI. We now compile something like
this:
struct S { float A, B, C, D; };
struct S g;
struct S bar() {
  struct S A = g;
  ++A.A;
  ++A.C;
  return A;
}
into all nice vector operations:
_bar: ## @bar
## BB#0: ## %entry
movq _g@GOTPCREL(%rip), %rax
movss LCPI1_0(%rip), %xmm1
movss (%rax), %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm0
movss 4(%rax), %xmm2
movss 12(%rax), %xmm3
pshufd $16, %xmm2, %xmm2
unpcklps %xmm2, %xmm0
addss 8(%rax), %xmm1
pshufd $16, %xmm1, %xmm1
pshufd $16, %xmm3, %xmm2
unpcklps %xmm2, %xmm1
ret
instead of icky integer operations:
_bar: ## @bar
movq _g@GOTPCREL(%rip), %rax
movss LCPI1_0(%rip), %xmm1
movss (%rax), %xmm0
addss %xmm1, %xmm0
movd %xmm0, %ecx
movl 4(%rax), %edx
movl 12(%rax), %esi
shlq $32, %rdx
addq %rcx, %rdx
movd %rdx, %xmm0
addss 8(%rax), %xmm1
movd %xmm1, %eax
shlq $32, %rsi
addq %rax, %rsi
movd %rsi, %xmm1
ret
This resolves rdar://8360454
llvm-svn: 112343

llvm-svn: 112332

to simplify PHIs and selects.
This pass addresses the missed optimizations from PR2581 and PR4420.
llvm-svn: 112325

A = shl x, 42
...
B = lshr ..., 38
which can be transformed into:
A = shl x, 4
...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows elimination of the whole i128 chain in the real example.
llvm-svn: 112314
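The side condition is easy to check concretely. Below is a self-contained C++ check of the 42/38 case (my illustration, not from the commit), ignoring the intermediate operations elided by "...": keeping x within its low 22 bits is exactly what "the would-be-shifted-in bits are already zero" amounts to here.

#include <cassert>
#include <cstdint>

// The two-shift form from the message and the single-shift replacement.
static uint64_t twoShifts(uint64_t x) { return (x << 42) >> 38; }
static uint64_t oneShift(uint64_t x)  { return x << 4; }

int main() {
  // Valid only while the bits the logical right shift would have cleared are
  // already zero, enforced here by keeping x inside its low 22 bits.
  for (uint64_t x = 0; x < (1u << 22); x += 12345)
    assert(twoShifts(x) == oneShift(x));
  return 0;
}
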
framework, which is good at ripping through bitfield
operations. This generalizes a bunch of the existing
xforms that instcombine does, such as
(x << c) >> c -> and
to handle intermediate logical nodes. This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
llvm-svn: 112304
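The base identity behind that family of xforms is easy to sanity-check; here is a self-contained C++ check (my illustration; the commit's framework additionally looks through logical operations sitting between the two shifts):

#include <cassert>
#include <cstdint>

int main() {
  const unsigned c = 10;
  for (uint32_t x = 0; x < 2000000; x += 9973) {
    // (x << c) >> c keeps only the low (32 - c) bits, i.e. it is just an AND
    // with a constant mask; no shifting is actually needed.
    uint32_t shifted = (x << c) >> c;
    uint32_t masked  = x & (0xFFFFFFFFu >> c);
    assert(shifted == masked);
  }
  return 0;
}
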
more general simplify demanded bits logic.
llvm-svn: 112291

llvm-svn: 112286

computation can be truncated if it is fed by a sext/zext that doesn't
have to be exactly equal to the truncation result type.
llvm-svn: 112285

by the SRoA "promote to large integer" code, eliminating
some type conversions like this:
%94 = zext i16 %93 to i32 ; <i32> [#uses=2]
%96 = lshr i32 %94, 8 ; <i32> [#uses=1]
%101 = trunc i32 %96 to i8 ; <i8> [#uses=1]
This also unblocks other xforms from happening; now clang is able to compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
pshufd $1, %xmm0, %xmm2
addss %xmm0, %xmm2
movdqa %xmm1, %xmm3
addss %xmm2, %xmm3
pshufd $1, %xmm1, %xmm0
addss %xmm3, %xmm0
ret
on x86-64, instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
This seems pretty close to optimal to me, at least without
using horizontal adds. This also triggers in lots of other
code, including SPEC.
llvm-svn: 112278

condition previously. Update tests for this change.
This fixes PR5652.
llvm-svn: 112270

by SRoA. This is part of rdar://7892780, but needs another xform to
expose this.
llvm-svn: 112232

is a vector to be a vector element extraction. This allows clang to
compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
movd %eax, %xmm0
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movd %xmm1, %rax
movd %eax, %xmm1
addss %xmm2, %xmm1
shrq $32, %rax
movd %eax, %xmm0
addss %xmm1, %xmm0
ret
... eliminating half of the horribleness.
llvm-svn: 112227

compiled with clang++.
llvm-svn: 112198

fix: add a flag to MapValue and friends which indicates whether
any module-level mappings are being made. In the common case of
inlining, no module-level mappings are needed, so MapValue doesn't
need to examine non-function-local metadata, which can be very
expensive in the case of a large module with really deep metadata
(e.g. a large C++ program compiled with -g).
This flag is a little awkward; perhaps eventually it can be moved
into the ClonedCodeInfo class.
llvm-svn: 112190

except ...", it is causing *massive* performance regressions when building Clang
with itself (-O3 -g).
llvm-svn: 112158

individual ...", which depends on r111922, which I am reverting.
llvm-svn: 112157