| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 40998
|
| |
|
|
| |
llvm-svn: 40997
|
| |
|
|
| |
llvm-svn: 40979
|
| |
|
|
| |
llvm-svn: 40978
|
| |
|
|
|
|
| |
not split condition constraints.
llvm-svn: 40977
|
| |
|
|
| |
alloca, increase the alignment of the load, turning it into an aligned load.
This allows us to compile:

  #include <xmmintrin.h>
  __m128i foo(__m128i x){
    static const unsigned int c_0[4] = { 0, 0, 0, 0 };
    __m128i v_Zero = _mm_loadu_si128((__m128i*)c_0);
    x = _mm_unpacklo_epi8(x, v_Zero);
    return x;
  }

into:

  _foo:
          punpcklbw _c_0.5944, %xmm0
          ret
          .data
          .lcomm _c_0.5944,16,4 # c_0.5944

instead of:

  _foo:
          movdqu _c_0.5944, %xmm1
          punpcklbw %xmm1, %xmm0
          ret
          .data
          .lcomm _c_0.5944,16,2 # c_0.5944

llvm-svn: 40971
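The commit above raises the alignment of the loaded object so that an aligned load becomes legal (assuming the last .lcomm operand is a power-of-two exponent, 16-byte instead of 4-byte alignment). A rough source-level analogue, sketched under the assumption that the programmer requests the alignment explicitly; the names kZero, zeroVector and isAligned are hypothetical and do not appear in the commit:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for c_0 from the commit message, with the 16-byte
// alignment requested explicitly rather than inferred by the optimizer.
alignas(16) static const unsigned int kZero[4] = {0, 0, 0, 0};

// Returns true when ptr is aligned to `alignment` bytes.
// `alignment` must be a power of two.
bool isAligned(const void *ptr, std::uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Accessor so callers can inspect the address of the constant.
const unsigned int *zeroVector() { return kZero; }
```

With the object known to be 16-byte aligned, an aligned load from it is valid, which is the precondition the optimization establishes on its own.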
|
| |
|
|
| |
llvm-svn: 40961
|
| |
|
|
| |
llvm-svn: 40960
|
| |
|
|
| |
llvm-svn: 40952
|
| |
|
|
| |
llvm-svn: 40947
|
| |
|
|
| |
llvm-svn: 40946
|
| |
|
|
| |
llvm-svn: 40944
|
| |
|
|
| |
llvm-svn: 40941
|
| |
|
|
| |
llvm-svn: 40936
|
| |
|
|
|
|
| |
and one hack to avoid hitting a bad case when the alias analysis is imprecise.
llvm-svn: 40935
|
| |
|
|
|
|
| |
it for potentially undeading pointers.
llvm-svn: 40933
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 40932
|
| |
|
|
| |
llvm-svn: 40922
|
| |
|
|
| |
llvm-svn: 40919
|
| |
|
|
| |
llvm-svn: 40915
|
| |
|
|
| |
llvm-svn: 40912
|
| |
|
|
| |
llvm-svn: 40909
|
| |
|
|
| |
llvm-svn: 40903
|
| |
|
|
| |
llvm-svn: 40898
|
| |
|
|
| |
llvm-svn: 40897
|
| |
|
|
| |
llvm-svn: 40883
|
| |
|
|
| |
llvm-svn: 40870
|
| |
|
|
| |
llvm-svn: 40861
|
| |
|
|
| |
llvm-svn: 40859
|
| |
|
|
|
|
| |
actual argument name of the documented function.
llvm-svn: 40851
|
| |
|
|
|
|
|
|
| |
This shrinks it down to something small. On the testcase
from PR1432, this speeds up instcombine from 0.7959s to 0.5000s (59%).
llvm-svn: 40840
|
| |
|
|
| |
In the old way, we computed and inserted phi nodes for the whole IDF of
the definitions of the alloca, then computed which ones were dead and
removed them.
In the new method, we first compute the region where the value is live,
and use that information to only insert phi nodes that are live. This
eliminates the need to compute liveness later, and stops the algorithm
from inserting a bunch of phis which it then later removes.
This speeds up the testcase in PR1432 from 2.00s to 0.15s (14x) in a
release build and 6.84s->0.50s (14x) in a debug build.
llvm-svn: 40825
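The liveness-pruned placement described in the commit above can be sketched as follows. This is a minimal illustration, not the mem2reg code: blocks are plain integers, the dominance frontiers are assumed to be precomputed and passed in, and the names computeLiveIn and placeLivePhis are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// One entry per basic block; blocks are identified by index
// (an assumption of this sketch).
using BlockLists = std::vector<std::vector<int>>;

// Computes the blocks where the value is live on entry.
// `useBlocks` lists blocks that read the value before any store in the
// same block; `defBlocks` lists blocks that store to it.
std::set<int> computeLiveIn(const BlockLists &preds,
                            const std::set<int> &defBlocks,
                            const std::vector<int> &useBlocks) {
  std::set<int> liveIn;
  std::vector<int> worklist(useBlocks);
  while (!worklist.empty()) {
    int block = worklist.back();
    worklist.pop_back();
    assert(block >= 0 && block < static_cast<int>(preds.size()));
    if (!liveIn.insert(block).second)
      continue; // already known to be live-in
    for (int pred : preds[block]) {
      // A defining predecessor supplies the value itself, so liveness
      // does not extend to its entry through this edge.
      if (defBlocks.count(pred))
        continue;
      worklist.push_back(pred);
    }
  }
  return liveIn;
}

// Walks the iterated dominance frontier of the definitions, but only
// places a phi in blocks where the value is live on entry.
std::set<int> placeLivePhis(const BlockLists &domFrontier,
                            const std::set<int> &defBlocks,
                            const std::set<int> &liveIn) {
  std::set<int> phiBlocks;
  std::vector<int> worklist(defBlocks.begin(), defBlocks.end());
  while (!worklist.empty()) {
    int block = worklist.back();
    worklist.pop_back();
    for (int frontier : domFrontier[block]) {
      if (!liveIn.count(frontier))
        continue; // a phi here would be dead, so never create it
      if (phiBlocks.insert(frontier).second)
        worklist.push_back(frontier); // a phi is itself a new definition
    }
  }
  return phiBlocks;
}
```

On a diamond CFG with stores in both arms, this places a phi at the join block only when something after the join reads the value; with no such read, no phi is created and nothing has to be removed afterwards.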
|
| |
|
|
| |
llvm-svn: 40824
|
| |
|
|
|
|
| |
measurable speedup.
llvm-svn: 40823
|
| |
|
|
|
|
|
| |
to the worklist, and handling the last one with a 'tail call'. This speeds
up PR1432 from 2.0578s to 2.0012s (2.8%)
llvm-svn: 40822
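The worklist trick mentioned in the commit above (push every successor except one, then continue directly with the remaining one) can be sketched as below. This is a generic CFG walk written for illustration; the function name visitBlocks and the integer block representation are assumptions, and the real pass does per-block renaming work where this sketch only records the visit order.

```cpp
#include <cstddef>
#include <vector>

// Visits every block reachable from `entry` exactly once and returns the
// visit order. succs[b] lists the successors of block b.
std::vector<int> visitBlocks(const std::vector<std::vector<int>> &succs,
                             int entry) {
  std::vector<int> order;
  std::vector<bool> visited(succs.size(), false);
  std::vector<int> worklist{entry};

  while (!worklist.empty()) {
    int block = worklist.back();
    worklist.pop_back();

    // Inner loop: the 'tail call'. Instead of pushing the last successor
    // and popping it again right away, keep going with it directly.
    while (!visited[block]) {
      visited[block] = true;
      order.push_back(block);

      const std::vector<int> &next = succs[block];
      if (next.empty())
        break;

      // Defer all successors except the last one.
      for (std::size_t i = 0; i + 1 < next.size(); ++i)
        worklist.push_back(next[i]);

      block = next.back();
    }
  }
  return order;
}
```

For straight-line chains of blocks the worklist is never touched at all, which is where a small saving like the one reported would come from.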
|
| |
|
|
|
|
| |
mem2reg from 2.0742->2.0522s on PR1432.
llvm-svn: 40821
|
| |
|
|
| |
llvm-svn: 40820
|
| |
|
|
| |
llvm-svn: 40819
|
| |
|
|
|
|
|
| |
faster than with the 'local to a block' fastpath. This speeds
up PR1432 from 2.1232s to 2.0686s (2.6%).
llvm-svn: 40818
|
| |
|
|
|
|
|
| |
to increment NumLocalPromoted, and didn't actually delete the
dead alloca, leading to an extra iteration of mem2reg.
llvm-svn: 40817
|
| |
|
|
| |
llvm-svn: 40816
|
| |
|
|
|
|
| |
Predsimplify fails llvm-gcc bootstrap.
llvm-svn: 40815
|
| |
|
|
| |
stored value was a non-instruction value. Doh.
This increases the number of single-store allocas from 8982 to 9026, and
speeds up mem2reg on the testcase in PR1432 from 2.17s to 2.13s.
llvm-svn: 40813
|
| |
|
|
|
|
|
|
| |
and the alloca so they don't get reprocessed.
This speeds up PR1432 from 2.20s to 2.17s.
llvm-svn: 40812
|
| |
|
|
| |
1. Check for revisiting a block before checking domination, which is faster.
2. If the stored value isn't an instruction, we don't have to check for domination.
3. If we have a value used in the same block more than once, make sure to remove the
block from the UsingBlocks vector. Not doing so forces us to go through the slow
path for the alloca.
The combination of these improvements increases the number of allocas on the fastpath
from 8935 to 8982 on PR1432. This speeds it up from 2.90s to 2.20s (31%)
llvm-svn: 40811
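The three points in the commit above can be sketched together. The first line of that commit message is not shown here, so this assumes the points refer to the fast path for allocas with a single store; SingleStoreInfo, promoteSingleStore and the dominates callback are hypothetical names standing in for the pass's own data structures and dominator query.

```cpp
#include <functional>
#include <set>
#include <vector>

// Hypothetical summary of an alloca that has exactly one store.
struct SingleStoreInfo {
  int storeBlock;                 // block containing the only store
  bool storedValueIsInstruction;  // false for constants, arguments, globals
  std::vector<int> usingBlocks;   // blocks with loads; may hold duplicates
};

// Handles every using block the fast path can deal with and leaves the
// rest in info.usingBlocks. Returns true when nothing is left, meaning
// the alloca never needs the slow path.
bool promoteSingleStore(SingleStoreInfo &info,
                        const std::function<bool(int, int)> &dominates) {
  std::set<int> processed;
  std::vector<int> remaining;

  for (int block : info.usingBlocks) {
    // 1. The cheap revisit check comes before the dominance query.
    //    3. It also drops duplicate entries for the same block.
    if (!processed.insert(block).second)
      continue;

    // 2. A non-instruction value is available everywhere, so the
    //    dominance query is skipped entirely for it.
    if (!info.storedValueIsInstruction ||
        dominates(info.storeBlock, block)) {
      // The loads in `block` would be rewritten here (omitted).
      continue;
    }

    remaining.push_back(block); // left for the general algorithm
  }

  info.usingBlocks = remaining;
  return info.usingBlocks.empty();
}
```

Ordering the checks this way means a block listed twice costs one set lookup rather than a second dominance query, and an alloca whose stored value is a constant is promoted without any dominance queries.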
|
| |
|
|
|
|
| |
testcase in PR1432 from 6.33s to 2.90s (2.22x)
llvm-svn: 40810
|
| |
|
|
| |
a using block from the list if we handle it. Not doing this prevented us
from promoting (with the fast path) allocas that have uses (whoops).
This increases the number of allocas hitting this fastpath from 4042 to 8935
on the testcase in PR1432, speeding up mem2reg by 2.6x.
llvm-svn: 40809
|
| |
|
|
|
|
|
|
| |
LLVM. It cleans up the intrinsic definitions and generally smooths the process for more complicated intrinsic writing. It will be used by the upcoming atomic intrinsics as well as vector and float intrinsics in the future.
This also changes the syntax for llvm.bswap, llvm.part.set, llvm.part.select, and llvm.ct* intrinsics. They are automatically upgraded by both the LLVM ASM reader and the bitcode reader. The test cases have been updated, with special tests added to ensure the automatic upgrading is supported.
llvm-svn: 40807
|
| |
|
|
|
|
| |
method.
llvm-svn: 40806
|
| |
|
|
| |
llvm-svn: 40805
|