llvm-svn: 123604

This fixes PR8987
llvm-svn: 123598

llvm-svn: 123597

callers to pass in an undefvalue instead.
llvm-svn: 123596

This reverts commit dd103021a889a986a181ce36ed7b0e8dc9b645e1.
llvm-svn: 123595

llvm-svn: 123593

llvm-svn: 123590

llvm-svn: 123585

This fixes the original testcase in PR8927. It also causes a clang
binary built with a patched clang to increase in size by 0.21%.
We can probably get some of the size back by writing a pass that
detects that a global never has its pointer compared and adds
unnamed_addr to it (maybe extend global opt). It is also possible that
there are some other cases clang could add unnamed_addr to.
I will investigate extending globalopt next.
llvm-svn: 123584
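A rough sketch of that globalopt extension, written against the current LLVM C++ API rather than the 2011 one (the function name is mine, and a real pass would also have to look through constant expressions rather than only direct users):

#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Sketch only: mark globals whose address is never compared as unnamed_addr.
static void addUnnamedAddrToUncomparedGlobals(Module &M) {
  for (GlobalVariable &GV : M.globals()) {
    bool AddressCompared = false;
    for (const User *U : GV.users())
      if (isa<ICmpInst>(U))        // only direct icmp users are checked here
        AddressCompared = true;
    if (!AddressCompared)
      GV.setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
  }
}

Once a global is unnamed_addr its address is not significant, so identical constants can be merged, which is where the size win would come from.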
User::dropHungOffUses().
llvm-svn: 123580

into and/shift would cause nodes to move around, leaving a dangling
pointer. The code tried to avoid this with a HandleSDNode, but
got the details wrong.
llvm-svn: 123578

User.cpp.
llvm-svn: 123575

llvm-svn: 123574

llvm-svn: 123573

llvm-svn: 123572

load/store instructions,
then don't try to decimate it into its individual pieces. This will just make a mess of the
IR and is pointless if none of the elements are individually accessed. This was generating
really terrible code for std::bitset (PR8980) because it happens to be lowered by clang
as an {[8 x i8]} structure instead of {i64}.
The testcase now is optimized to:
define i64 @test2(i64 %X) {
br label %L2
L2: ; preds = %0
ret i64 %X
}
before we generated:
define i64 @test2(i64 %X) {
%sroa.store.elt = lshr i64 %X, 56
%1 = trunc i64 %sroa.store.elt to i8
%sroa.store.elt8 = lshr i64 %X, 48
%2 = trunc i64 %sroa.store.elt8 to i8
%sroa.store.elt9 = lshr i64 %X, 40
%3 = trunc i64 %sroa.store.elt9 to i8
%sroa.store.elt10 = lshr i64 %X, 32
%4 = trunc i64 %sroa.store.elt10 to i8
%sroa.store.elt11 = lshr i64 %X, 24
%5 = trunc i64 %sroa.store.elt11 to i8
%sroa.store.elt12 = lshr i64 %X, 16
%6 = trunc i64 %sroa.store.elt12 to i8
%sroa.store.elt13 = lshr i64 %X, 8
%7 = trunc i64 %sroa.store.elt13 to i8
%8 = trunc i64 %X to i8
br label %L2
L2: ; preds = %0
%9 = zext i8 %1 to i64
%10 = shl i64 %9, 56
%11 = zext i8 %2 to i64
%12 = shl i64 %11, 48
%13 = or i64 %12, %10
%14 = zext i8 %3 to i64
%15 = shl i64 %14, 40
%16 = or i64 %15, %13
%17 = zext i8 %4 to i64
%18 = shl i64 %17, 32
%19 = or i64 %18, %16
%20 = zext i8 %5 to i64
%21 = shl i64 %20, 24
%22 = or i64 %21, %19
%23 = zext i8 %6 to i64
%24 = shl i64 %23, 16
%25 = or i64 %24, %22
%26 = zext i8 %7 to i64
%27 = shl i64 %26, 8
%28 = or i64 %27, %25
%29 = zext i8 %8 to i64
%30 = or i64 %29, %28
ret i64 %30
}
In this case, instcombine was able to eliminate the nonsense, but in PR8980 enough
PHIs are in play that instcombine backs off. It's better to not generate this stuff
in the first place.
llvm-svn: 123571

of a constant.
llvm-svn: 123570

of phis.
llvm-svn: 123569

multiple uses. In some cases, all the uses are the same operation,
so instcombine can go ahead and promote the phi. In the testcase
this pushes an add out of the loop.
llvm-svn: 123568

llvm-svn: 123567

in the first line of the function because it isn't a good idea, even for compares.
llvm-svn: 123566

llvm-svn: 123565

llvm-svn: 123564

of the stored value to the new store type is always. Also, add a testcase.
llvm-svn: 123563

llvm-svn: 123562

llvm-svn: 123561

compare.
llvm-svn: 123560

multi-instruction sequences like calls. Many thanks to Jakob for
finding a testcase.
llvm-svn: 123559

llvm-svn: 123558

external API still uses PathV1."
llvm-svn: 123557

llvm-svn: 123556

llvm-svn: 123554

still uses PathV1.
llvm-svn: 123551

llvm-svn: 123549

llvm-svn: 123548

http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
In a silly microbenchmark on a 65 nm core2 this is 1.5x faster than the old
code in 32 bit mode and about 2x faster in 64 bit mode. It's also a lot shorter,
especially when counting 64 bit population on a 32 bit target.
I hope this is fast enough to replace Kernighan-style counting loops even when
the input is rather sparse.
llvm-svn: 123547
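The technique behind that link, generalized to 64 bits, is roughly the following C++ (the names are mine, not necessarily what the patch uses), shown next to the Kernighan-style loop it is meant to replace:

#include <cstdint>

// SWAR population count, per the bithacks page referenced above.
static unsigned popCount64(uint64_t v) {
  v = v - ((v >> 1) & 0x5555555555555555ULL);                            // 2-bit sums
  v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);  // 4-bit sums
  v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                            // 8-bit sums
  return unsigned((v * 0x0101010101010101ULL) >> 56);                    // add the bytes
}

// Kernighan-style loop: one iteration per set bit, so it only wins when
// the input is very sparse.
static unsigned popCount64Kernighan(uint64_t v) {
  unsigned Count = 0;
  for (; v; ++Count)
    v &= v - 1;  // clear the lowest set bit
  return Count;
}

The SWAR version runs in a fixed number of operations regardless of how many bits are set, which is why it tends to beat the loop unless the input is very sparse.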
llvm-svn: 123545

llvm-svn: 123544

llvm-svn: 123543

opportunities. Fixes PR8978.
llvm-svn: 123541

llvm-svn: 123537

Also, replace tabs with spaces. Yes, it's 2011.
llvm-svn: 123535

half a million non-local queries, each of which would otherwise have triggered a
linear scan over a basic block.
Also fix a fixme for memory intrinsics which dereference pointers. With this,
we prove that a pointer is non-null because it was dereferenced by an intrinsic
112 times in llvm-test.
llvm-svn: 123533

llvm-svn: 123529

realize that ConstantFoldTerminator doesn't preserve dominfo.
llvm-svn: 123527

The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch of dead computation feeding
conditional branches isn't such a good idea. Just fold branches on constants
into unconditional branches.
llvm-svn: 123526
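The folding itself can be sketched in a few lines of LLVM C++ (a simplified illustration, not the actual CodeGenPrepare change; LLVM's ConstantFoldTerminator utility does this kind of folding, and more, for real):

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// If BB ends in a conditional branch on a ConstantInt, rewrite it as an
// unconditional branch to the taken successor.
static bool foldConstantBranch(BasicBlock &BB) {
  auto *BI = dyn_cast<BranchInst>(BB.getTerminator());
  if (!BI || !BI->isConditional())
    return false;
  auto *Cond = dyn_cast<ConstantInt>(BI->getCondition());
  if (!Cond)
    return false;
  BasicBlock *Taken = BI->getSuccessor(Cond->isZero() ? 1 : 0);
  BasicBlock *Dead = BI->getSuccessor(Cond->isZero() ? 0 : 1);
  if (Dead != Taken)
    Dead->removePredecessor(&BB);  // keep PHI nodes on the removed edge consistent
  BranchInst::Create(Taken, BI);   // insert "br label %Taken" before the old branch
  BI->eraseFromParent();
  return true;
}

Any computation that only fed the dead edge becomes trivially dead and can be cleaned up afterwards.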
llvm-svn: 123525

have objectsize folding recursively simplify away their result when it
folds. It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.
llvm-svn: 123524

potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.
llvm-svn: 123523

and-with-constant operations.
This fixes rdar://8808586 which observed that we used to compile:
union xy {
  struct x { _Bool b[15]; } x;
  __attribute__((packed))
  struct y {
    __attribute__((packed)) unsigned long b0to7;
    __attribute__((packed)) unsigned int b8to11;
    __attribute__((packed)) unsigned short b12to13;
    __attribute__((packed)) unsigned char b14;
  } y;
};

struct x
foo(union xy *xy)
{
  return xy->x;
}
into:
_foo: ## @foo
movq (%rdi), %rax
movabsq $1095216660480, %rcx ## imm = 0xFF00000000
andq %rax, %rcx
movabsq $-72057594037927936, %rdx ## imm = 0xFF00000000000000
andq %rax, %rdx
movzbl %al, %esi
orq %rdx, %rsi
movq %rax, %rdx
andq $65280, %rdx ## imm = 0xFF00
orq %rsi, %rdx
movq %rax, %rsi
andq $16711680, %rsi ## imm = 0xFF0000
orq %rdx, %rsi
movl %eax, %edx
andl $-16777216, %edx ## imm = 0xFFFFFFFFFF000000
orq %rsi, %rdx
orq %rcx, %rdx
movabsq $280375465082880, %rcx ## imm = 0xFF0000000000
movq %rax, %rsi
andq %rcx, %rsi
orq %rdx, %rsi
movabsq $71776119061217280, %r8 ## imm = 0xFF000000000000
andq %r8, %rax
orq %rsi, %rax
movzwl 12(%rdi), %edx
movzbl 14(%rdi), %esi
shlq $16, %rsi
orl %edx, %esi
movq %rsi, %r9
shlq $32, %r9
movl 8(%rdi), %edx
orq %r9, %rdx
andq %rdx, %rcx
movzbl %sil, %esi
shlq $32, %rsi
orq %rcx, %rsi
movl %edx, %ecx
andl $-16777216, %ecx ## imm = 0xFFFFFFFFFF000000
orq %rsi, %rcx
movq %rdx, %rsi
andq $16711680, %rsi ## imm = 0xFF0000
orq %rcx, %rsi
movq %rdx, %rcx
andq $65280, %rcx ## imm = 0xFF00
orq %rsi, %rcx
movzbl %dl, %esi
orq %rcx, %rsi
andq %r8, %rdx
orq %rsi, %rdx
ret
We now compile this into:
_foo: ## @foo
## BB#0: ## %entry
movzwl 12(%rdi), %eax
movzbl 14(%rdi), %ecx
shlq $16, %rcx
orl %eax, %ecx
shlq $32, %rcx
movl 8(%rdi), %edx
orq %rcx, %rdx
movq (%rdi), %rax
ret
A small improvement :-)
llvm-svn: 123520