contains a call, lower the unrolling threshold to the optimize-for-size
threshold. Basically, for loops containing calls, unrolling can still be
profitable as long as the loop is REALLY small.
llvm-svn: 113439
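A minimal C sketch of the kind of loop this is about (the function and the callee name are illustrative, not taken from the commit): the body contains a call, so the lowered, optimize-for-size threshold applies, but the loop is small enough that unrolling can still pay off.

extern void sink(int);              /* hypothetical callee */

void tiny_loop(void) {
    /* Tiny trip count and a one-call body: still a candidate for
       unrolling under the reduced (optimize-for-size) threshold. */
    for (int i = 0; i < 4; ++i)
        sink(i);
}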
single test. Patch by Dirk Steinke!
llvm-svn: 113423
turning (fptrunc (sqrt (fpext x))) -> (sqrtf x) is great, but we have
to delete the original sqrt as well. Not doing so causes us to do
two sqrts when building with -fmath-errno (the default on Linux).
llvm-svn: 113260
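A C-level sketch of the pattern (the function name is made up for illustration): the fpext/sqrt/fptrunc chain corresponds to calling the double-precision sqrt on a widened float, and shrinking it to sqrtf without also erasing the original call is what left two square roots behind under -fmath-errno.

#include <math.h>

float narrow_sqrt(float x) {
    /* fpext to double, double-precision sqrt, fptrunc back to float;
       the optimizer may rewrite this as a single sqrtf(x) call, and the
       fix above ensures the original sqrt call is deleted as well. */
    return (float)sqrt((double)x);
}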
llvm-svn: 113257
llvm-svn: 113146
in the duplicated block instead of duplicating them.
Duplicating them into the end of the loop and the preheader
meant that we got a phi node in the header of the loop,
which prevented LICM from hoisting them. GVN would
usually come around later and merge the duplicated
instructions, so we'd get reasonable output... except that
anything dependent on the should-have-been-hoisted value
can't be hoisted. In PR5319 (which this fixes), a memory value
didn't get promoted.
llvm-svn: 113134
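A hedged C sketch of the kind of value at stake (not the PR5319 test case; names are illustrative): the load below is loop-invariant, so LICM wants to hoist it into the preheader, but if rotation duplicates it into both the preheader and the loop tail, the copies meet in a header phi and neither the load nor anything depending on it can be hoisted.

int sum_invariant(const int *base, int offset, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += base[offset];   /* loop-invariant load that LICM wants to hoist */
    return total;
}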
llvm-svn: 113109
into an inner loop, as the new loop iteration may differ substantially.
This fixes PR8078.
llvm-svn: 113057
location is being re-stored to the memory location. We would get
a dangling pointer from the SSAUpdater data structure and miss a
use. This fixes PR8068.
llvm-svn: 113042
llvm-svn: 113025
global when it is provable that they're equivalent. This fixes PR4855.
llvm-svn: 112994
llvm-svn: 112987
llvm-svn: 112971
llvm-svn: 112892
llvm-svn: 112889
llvm-svn: 112878
llvm-svn: 112851
LVI lattice, undef and the full set ConstantRange should not
be treated as equivalent.
llvm-svn: 112843
and there seems to be no reason not to.
llvm-svn: 112812
llvm-svn: 112763
infinite loops or exits will eventually exit. This fixes PR5373.
llvm-svn: 112745
llvm-svn: 112724
on llvmdev: SRoA is introducing MMX datatypes like <1 x i64>,
which then cause random problems because the X86 backend is
producing MMX code without inserting proper emms calls.
In the short term, force off MMX datatypes. In the long term,
the X86 backend should not select generic vector types to MMX
registers. This is being worked on, but won't be done in time
for 2.8. rdar://8380055
llvm-svn: 112696
llvm-svn: 112695
instead of hoisting them, just fold them away. This occurs in the
testcase for PR8041, for example.
llvm-svn: 112669
llvm-svn: 112635
llvm-svn: 112621
llvm-svn: 112617
llvm-svn: 112615
llvm-svn: 112613
I have not been able to find a way to test each in isolation, for a few reasons:
1) The ability to look through non-i1 BinaryOperators requires the ability to look through
non-constant ICmps in order for it to ever trigger.
2) The ability to do LVI-powered PHI value determination only matters in cases that ProcessBranchOnPHI
can't handle. Since it already handles all the cases without other instructions in the def-use chain
between the PHI and the branch, it requires the ability to look through ICmps and/or BinaryOperators
as well.
llvm-svn: 112611
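A hedged C illustration of the kind of chain involved (made up for this note, not one of the regression tests): the branch condition is an icmp of a binary operator of a PHI, so threading the branch per predecessor requires looking through both the add and the compare.

int thread_me(int a, int flag) {
    int x = flag ? 2 : 7;   /* becomes a PHI of two constants */
    if (x + 1 > 5)          /* binary operator feeding an icmp */
        return a;           /* only reachable from the x == 7 predecessor */
    return -a;              /* only reachable from the x == 2 predecessor */
}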
llvm-svn: 112592
llvm-svn: 112591
constant-fold undef, and be more careful with its return value.
This actually exposed an infinite recursion bug in ComputeValueKnownInPredecessors which theoretically already existed (in JumpThreading's
handling of and/or of i1's), but never manifested before. This patch adds a tracking set to prevent this case.
llvm-svn: 112589
discovered a miscompilation in it, and it's not easily
fixable at the optimizer level. I'll investigate reimplementing it in DAGCombine.
llvm-svn: 112575
llvm-svn: 112554
llvm-svn: 112469
out of loops, just delete them.
llvm-svn: 112451
I'm aware of, aren't maintained, and LVI will be replacing their value.
nlewycky approved this on IRC.
llvm-svn: 112355
like this:
struct S { float A, B, C, D; };
struct S g;
struct S bar() {
  struct S A = g;
  ++A.B;
  A.A = 42;
  return A;
}
we now generate:
_bar: ## @bar
## BB#0: ## %entry
movq _g@GOTPCREL(%rip), %rax
movss 12(%rax), %xmm0
pshufd $16, %xmm0, %xmm0
movss 4(%rax), %xmm2
movss 8(%rax), %xmm1
pshufd $16, %xmm1, %xmm1
unpcklps %xmm0, %xmm1
addss LCPI1_0(%rip), %xmm2
pshufd $16, %xmm2, %xmm2
movss LCPI1_1(%rip), %xmm0
pshufd $16, %xmm0, %xmm0
unpcklps %xmm2, %xmm0
ret
instead of:
_bar: ## @bar
## BB#0: ## %entry
movq _g@GOTPCREL(%rip), %rax
movss 12(%rax), %xmm0
pshufd $16, %xmm0, %xmm0
movss 4(%rax), %xmm2
movss 8(%rax), %xmm1
pshufd $16, %xmm1, %xmm1
unpcklps %xmm0, %xmm1
addss LCPI1_0(%rip), %xmm2
movd %xmm2, %eax
shlq $32, %rax
addq $1109917696, %rax ## imm = 0x42280000
movd %rax, %xmm0
ret
llvm-svn: 112345
element insertion from the pieces that feed into the vector.
This handles a pattern that occurs frequently due to code
generated for the x86-64 ABI. We now compile something like
this:
struct S { float A, B, C, D; };
struct S g;
struct S bar() {
  struct S A = g;
  ++A.A;
  ++A.C;
  return A;
}
into all nice vector operations:
_bar: ## @bar
## BB#0: ## %entry
movq _g@GOTPCREL(%rip), %rax
movss LCPI1_0(%rip), %xmm1
movss (%rax), %xmm0
addss %xmm1, %xmm0
pshufd $16, %xmm0, %xmm0
movss 4(%rax), %xmm2
movss 12(%rax), %xmm3
pshufd $16, %xmm2, %xmm2
unpcklps %xmm2, %xmm0
addss 8(%rax), %xmm1
pshufd $16, %xmm1, %xmm1
pshufd $16, %xmm3, %xmm2
unpcklps %xmm2, %xmm1
ret
instead of icky integer operations:
_bar: ## @bar
movq _g@GOTPCREL(%rip), %rax
movss LCPI1_0(%rip), %xmm1
movss (%rax), %xmm0
addss %xmm1, %xmm0
movd %xmm0, %ecx
movl 4(%rax), %edx
movl 12(%rax), %esi
shlq $32, %rdx
addq %rcx, %rdx
movd %rdx, %xmm0
addss 8(%rax), %xmm1
movd %xmm1, %eax
shlq $32, %rsi
addq %rax, %rsi
movd %rsi, %xmm1
ret
This resolves rdar://8360454
llvm-svn: 112343
to simplify PHIs and selects.
This pass addresses the missed optimizations from PR2581 and PR4420.
llvm-svn: 112325
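A hedged C example of the flavor of missed optimization such a pass targets (illustrative only, not taken from PR2581 or PR4420): inside the guarded branch the comparison's outcome is already known, so the conditional expression (a select or PHI at the IR level) folds to one of its operands.

int clamp_nonnegative(int x) {
    if (x > 0)
        /* The inner condition is implied by the guard, so the
           conditional expression simplifies to just x. */
        return x > 0 ? x : 0;
    return 0;
}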
llvm-svn: 112321
A = shl x, 42
...
B = lshr ..., 38
which can be transformed into:
A = shl x, 4
...
iff we can prove that the would-be-shifted-in bits
are already zero. This eliminates two shifts in the testcase
and allows elimination of the whole i128 chain in the real example.
llvm-svn: 112314
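A C analogue of the shift pair, using 64-bit values rather than the i128 chain from the real example (the function name is illustrative): because the operand is a zero-extended 16-bit value, the would-be-shifted-in bits are already zero, so the pair collapses to a single shift by 4.

#include <stdint.h>

uint64_t shift_pair(uint16_t x) {
    uint64_t a = (uint64_t)x << 42;  /* A = shl x, 42 */
    return a >> 38;                  /* B = lshr A, 38; foldable to x << 4 here */
}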
framework, which is good at ripping through bitfield
operations. This generalizes a bunch of the existing
xforms that instcombine does, such as
(x << c) >> c -> and
to handle intermediate logical nodes. This is useful for
ripping up the "promote to large integer" code produced by
SRoA.
llvm-svn: 112304
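A small C example of the generalization (assumed shape, not a test from the commit): the classic (x << c) >> c masking pattern still reduces to an and even with a logical operator sitting between the two shifts.

#include <stdint.h>

uint32_t low_byte_or(uint32_t x, uint32_t y) {
    uint32_t t = (x << 24) | (y << 24);  /* intermediate logical node */
    return t >> 24;                      /* equivalent to (x | y) & 0xff */
}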
llvm-svn: 112289
llvm-svn: 112288
computation can be truncated if it is fed by a sext/zext that doesn't
have to be exactly equal to the truncation result type.
llvm-svn: 112285
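A hedged C illustration of the relaxed requirement (names made up for this note): the value enters the computation as a zero-extended 8-bit quantity and leaves through a truncation to 16 bits, so the extension's source type does not equal the truncation's result type, yet the whole computation can still be narrowed.

#include <stdint.h>

uint16_t narrow(uint8_t x) {
    uint32_t wide = x;             /* zext i8 -> i32 */
    uint32_t t = (wide << 3) + 5;  /* work done in the wide type */
    return (uint16_t)t;            /* trunc i32 -> i16: the chain can run at i16 */
}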
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:
%94 = zext i16 %93 to i32 ; <i32> [#uses=2]
%96 = lshr i32 %94, 8 ; <i32> [#uses=1]
%101 = trunc i32 %96 to i8 ; <i8> [#uses=1]
This also unblocks other xforms from happening, now clang is able to compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B + A.C + A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
pshufd $1, %xmm0, %xmm2
addss %xmm0, %xmm2
movdqa %xmm1, %xmm3
addss %xmm2, %xmm3
pshufd $1, %xmm1, %xmm0
addss %xmm3, %xmm0
ret
on x86-64, instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
This seems pretty close to optimal to me, at least without
using horizontal adds. This also triggers in lots of other
code, including SPEC.
llvm-svn: 112278
condition previously. Update tests for this change.
This fixes PR5652.
llvm-svn: 112270