more general simplify demanded bits logic.
llvm-svn: 112291
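(Editorial note, not part of the commit: a hypothetical example of the kind of
shift pattern that generic demanded-bits simplification can handle; the value
names are invented.)
  %t = shl i32 %x, 24
  %r = lshr i32 %t, 24       ; only the low 8 bits of %x are demanded
  ; demanded-bits reasoning simplifies %r to:
  %r = and i32 %x, 255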
llvm-svn: 112286
computation can be truncated if it is fed by a sext/zext that doesn't
have to be exactly equal to the truncation result type.
llvm-svn: 112285
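(Editorial note, not part of the commit: an invented sketch of the narrowing
this enables; note the zext source type (i16) differs from the trunc result
type (i8).)
  %a = zext i16 %x to i32
  %b = lshr i32 %a, 8
  %c = trunc i32 %b to i8
  ; the intermediate computation can be done at i16 instead:
  %s = lshr i16 %x, 8
  %c = trunc i16 %s to i8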
by the SRoA "promote to large integer" code, eliminating
some type conversions like this:
%94 = zext i16 %93 to i32 ; <i32> [#uses=2]
%96 = lshr i32 %94, 8 ; <i32> [#uses=1]
%101 = trunc i32 %96 to i8 ; <i8> [#uses=1]
This also unblocks other xforms from happening; now clang is able to compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
pshufd $1, %xmm0, %xmm2
addss %xmm0, %xmm2
movdqa %xmm1, %xmm3
addss %xmm2, %xmm3
pshufd $1, %xmm1, %xmm0
addss %xmm3, %xmm0
ret
on x86-64, instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
This seems pretty close to optimal to me, at least without
using horizontal adds. This also triggers in lots of other
code, including SPEC.
llvm-svn: 112278
condition previously. Update tests for this change.
This fixes PR5652.
llvm-svn: 112270
by SRoA. This is part of rdar://7892780, but needs another xform to
expose this.
llvm-svn: 112232
is a vector to be a vector element extraction. This allows clang to
compile:
struct S { float A, B, C, D; };
float foo(struct S A) { return A.A + A.B+A.C+A.D; }
into:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movapd %xmm1, %xmm3
addss %xmm2, %xmm3
movd %xmm1, %rax
shrq $32, %rax
movd %eax, %xmm0
addss %xmm3, %xmm0
ret
instead of:
_foo: ## @foo
## BB#0: ## %entry
movd %xmm0, %rax
movd %eax, %xmm0
shrq $32, %rax
movd %eax, %xmm2
addss %xmm0, %xmm2
movd %xmm1, %rax
movd %eax, %xmm1
addss %xmm2, %xmm1
shrq $32, %rax
movd %eax, %xmm0
addss %xmm1, %xmm0
ret
... eliminating half of the horribleness.
llvm-svn: 112227
compiled with clang++.
llvm-svn: 112198
fix: add a flag to MapValue and friends which indicates whether
any module-level mappings are being made. In the common case of
inlining, no module-level mappings are needed, so MapValue doesn't
need to examine non-function-local metadata, which can be very
expensive in the case of a large module with really deep metadata
(e.g. a large C++ program compiled with -g).
This flag is a little awkward; perhaps eventually it can be moved
into the ClonedCodeInfo class.
llvm-svn: 112190
except ...", it is causing *massive* performance regressions when building Clang
with itself (-O3 -g).
llvm-svn: 112158
individual ...", which depends on r111922, which I am reverting.
llvm-svn: 112157
llvm-svn: 112130
and was over-complicated, and replacing it with a simple implementation.
llvm-svn: 112120
llvm-svn: 112104
instructions, not when remapping modules.
llvm-svn: 112091
directly folded into a constant by FE.
llvm-svn: 112072
which does the same thing. This eliminates redundant code and
handles MDNodes better. MDNode linking doesn't fully
work yet, though.
llvm-svn: 111941
llvm-svn: 111923
that it avoids a lot of unnecessary cloning by avoiding remapping
MDNode cycles when none of the nodes in the cycle actually need to
be remapped. Also it uses the new temporary MDNode mechanism.
llvm-svn: 111922
llvm-svn: 111834
llvm-svn: 111816
passes over to the new registration API.
llvm-svn: 111815
llvm-svn: 111665
llvm-svn: 111571
only modifies the low bytes of a value,
we can narrow the store to overwrite only the affected bytes.
llvm-svn: 111568
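(Editorial note, not part of the commit: an invented before/after sketch of
the store narrowing on a little-endian target, in the typed-pointer IR syntax
of the time.)
  %old = load i32* %p
  %hi = and i32 %old, -256          ; keep the upper 24 bits
  %new = or i32 %hi, 42             ; replace only the low byte
  store i32 %new, i32* %p
  ; only the low byte is modified, so the store can be narrowed to:
  %p8 = bitcast i32* %p to i8*
  store i8 42, i8* %p8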
llvm-svn: 111551
llvm-svn: 111543
of the two.
llvm-svn: 111495
issues.
llvm-svn: 111382
from the LHS should disable reconsidering that pred on the
RHS. However, knowing something about the pred on the RHS
shouldn't prevent subsequent additions on the RHS from
happening.
llvm-svn: 111349
llvm-svn: 111348
llvm-svn: 111344
llvm-svn: 111342
vector heavy code. I'll re-enable when we've tracked down the problem.
llvm-svn: 111318
loop, making the resulting loop significantly less ugly. Also, zap
its trivial PHI nodes, since it's easy.
llvm-svn: 111255
what it does manually.
llvm-svn: 111248
PHI elimination is already doing all (most?) of the splitting needed. But machine-licm and machine-sink seem to miss some important optimizations when splitting is disabled.
llvm-svn: 111224
uninteresting, just put all the operands on one list and make
GenerateReassociations make the decision about what's interesting.
This is simpler, and it avoids an extra ScalarEvolution::getAddExpr call.
llvm-svn: 111133
This isn't necessary, because ScalarEvolution sorts them anyway,
but it's tidier this way.
llvm-svn: 111132
actually use ScalarEvolution.
llvm-svn: 111124
indirectbr destination lists.
llvm-svn: 111122
llvm-svn: 111061
- Eliminate redundant successors.
- Convert an indirectbr with one successor into a direct branch.
Also, generalize SimplifyCFG so that it can be run on a function entry block.
It knows quite a few simplifications which are applicable to the entry
block, and it only needs a few checks to avoid trouble with the entry block.
llvm-svn: 111060
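(Editorial note, not part of the commit: the two indirectbr simplifications
shown in IR form, with invented labels.)
  indirectbr i8* %dest, [label %bb, label %bb, label %other]
  ; duplicate successors can be dropped:
  indirectbr i8* %dest, [label %bb, label %other]
  indirectbr i8* %addr, [label %only]
  ; a lone successor becomes an unconditional branch:
  br label %only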
ScalarEvolution::getAddExpr, which can be pretty expensive, when nothing
has changed, which is pretty common.
llvm-svn: 111042
it previously failed.
llvm-svn: 110987
before it rewrites the code, we need to use that in the post-rewrite pass.
llvm-svn: 110962
in an external testsuite.
llvm-svn: 110905
to recognize
patterns generated by clang for a matrix transpose in generic vectors. This is done
in two parts:
1) Propagating vector extracts of hi/lo half into their users
2) Recognizing an insertion of even elements followed by the odd elements as an unpack.
Testcase to come, but this shrinks the # of shuffle instructions generated on x86 from ~40 to the minimal 8.
llvm-svn: 110734
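(Editorial note, not part of the commit: an invented example of the "unpack"
pattern. Filling the even lanes of a result from one vector and the odd lanes
from another is the interleaving shuffle below, which maps to a single
unpcklps on x86.)
  %r = shufflevector <4 x float> %a, <4 x float> %b,
                     <4 x i32> <i32 0, i32 4, i32 1, i32 5>
  ; result is <a0, b0, a1, b1>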
llvm-svn: 110601
it doesn't regress again.
llvm-svn: 110597