still uses PathV1.
llvm-svn: 123551
llvm-svn: 123549
llvm-svn: 123548
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
In a silly microbenchmark on a 65 nm core2 this is 1.5x faster than the old
code in 32-bit mode and about 2x faster in 64-bit mode. It's also a lot shorter,
especially when counting a 64-bit population on a 32-bit target.
I hope this is fast enough to replace Kernighan-style counting loops even when
the input is rather sparse.
llvm-svn: 123547
llvm-svn: 123545
llvm-svn: 123544
llvm-svn: 123543
opportunities. Fixes PR8978.
llvm-svn: 123541
llvm-svn: 123537
Also, replace tabs with spaces. Yes, it's 2011.
llvm-svn: 123535
half a million non-local queries, each of which would otherwise have triggered a
linear scan over a basic block.
Also fix a FIXME for memory intrinsics that dereference pointers. With this,
we prove that a pointer is non-null because it was dereferenced by an intrinsic
112 times in llvm-test.
llvm-svn: 123533
llvm-svn: 123529
realize that ConstantFoldTerminator doesn't preserve dominfo.
llvm-svn: 123527
The basic issue is that isel (very reasonably!) expects conditional branches
to be folded, so CGP leaving around a bunch of dead computation feeding
conditional branches isn't such a good idea.  Just fold branches on constants
into unconditional branches.
llvm-svn: 123526
llvm-svn: 123525
have objectsize folding recursively simplify away their result when it
folds.  It is important to catch this here, because otherwise we won't
eliminate the cross-block values at isel and other times.
llvm-svn: 123524
potentially invalidate it (like inline asm lowering) to be sunk into
their proper place, cleaning up a ton of code.
llvm-svn: 123523
and-with-constant operations.
This fixes rdar://8808586 which observed that we used to compile:
union xy {
        struct x { _Bool b[15]; } x;
        __attribute__((packed))
        struct y {
                __attribute__((packed)) unsigned long b0to7;
                __attribute__((packed)) unsigned int b8to11;
                __attribute__((packed)) unsigned short b12to13;
                __attribute__((packed)) unsigned char b14;
        } y;
};
struct x
foo(union xy *xy)
{
        return xy->x;
}
into:
_foo:                                   ## @foo
	movq	(%rdi), %rax
	movabsq	$1095216660480, %rcx    ## imm = 0xFF00000000
	andq	%rax, %rcx
	movabsq	$-72057594037927936, %rdx ## imm = 0xFF00000000000000
	andq	%rax, %rdx
	movzbl	%al, %esi
	orq	%rdx, %rsi
	movq	%rax, %rdx
	andq	$65280, %rdx            ## imm = 0xFF00
	orq	%rsi, %rdx
	movq	%rax, %rsi
	andq	$16711680, %rsi         ## imm = 0xFF0000
	orq	%rdx, %rsi
	movl	%eax, %edx
	andl	$-16777216, %edx        ## imm = 0xFFFFFFFFFF000000
	orq	%rsi, %rdx
	orq	%rcx, %rdx
	movabsq	$280375465082880, %rcx  ## imm = 0xFF0000000000
	movq	%rax, %rsi
	andq	%rcx, %rsi
	orq	%rdx, %rsi
	movabsq	$71776119061217280, %r8 ## imm = 0xFF000000000000
	andq	%r8, %rax
	orq	%rsi, %rax
	movzwl	12(%rdi), %edx
	movzbl	14(%rdi), %esi
	shlq	$16, %rsi
	orl	%edx, %esi
	movq	%rsi, %r9
	shlq	$32, %r9
	movl	8(%rdi), %edx
	orq	%r9, %rdx
	andq	%rdx, %rcx
	movzbl	%sil, %esi
	shlq	$32, %rsi
	orq	%rcx, %rsi
	movl	%edx, %ecx
	andl	$-16777216, %ecx        ## imm = 0xFFFFFFFFFF000000
	orq	%rsi, %rcx
	movq	%rdx, %rsi
	andq	$16711680, %rsi         ## imm = 0xFF0000
	orq	%rcx, %rsi
	movq	%rdx, %rcx
	andq	$65280, %rcx            ## imm = 0xFF00
	orq	%rsi, %rcx
	movzbl	%dl, %esi
	orq	%rcx, %rsi
	andq	%r8, %rdx
	orq	%rsi, %rdx
	ret
We now compile this into:
_foo:                                   ## @foo
## BB#0:                                ## %entry
	movzwl	12(%rdi), %eax
	movzbl	14(%rdi), %ecx
	shlq	$16, %rcx
	orl	%eax, %ecx
	shlq	$32, %rcx
	movl	8(%rdi), %edx
	orq	%rcx, %rdx
	movq	(%rdi), %rax
	ret
A small improvement :-)
llvm-svn: 123520
no functionality change currently.
llvm-svn: 123517
llvm-svn: 123516
means that are about to disappear.
llvm-svn: 123515
llvm-svn: 123514
llvm-svn: 123505
to use it.
llvm-svn: 123501
llvm-svn: 123497
llvm-svn: 123494
llvm-svn: 123491
declaration and its assignments.
Found by clang static analyzer.
llvm-svn: 123486
llvm-svn: 123480
comments.
llvm-svn: 123479
bitcasts, at least in simple cases.  This fixes clang's CodeGenCXX/virtual-base-dtor.cpp
llvm-svn: 123477
description emission. Currently all the backends use table-based stuff.
llvm-svn: 123476
llvm-svn: 123475
llvm-svn: 123474
llvm-svn: 123473
llvm-svn: 123472
disabled in this checkin. Sorry for the large diffs due to
refactoring. New functionality is all guarded by EnableSchedCycles.
Scheduling the isel DAG is inherently imprecise, but we give it a best
effort:
- Added MayReduceRegPressure to allow stalled nodes in the queue only
  if there is a regpressure need.
- Added BUHasStall to allow checking for either dependence stalls due to
  latency or resource stalls due to pipeline hazards.
- Added BUCompareLatency to encapsulate and standardize the heuristics
  for minimizing stall cycles (vs. reducing register pressure).
- Modified the bottom-up heuristic (now in BUCompareLatency) to
  prioritize nodes by their depth rather than height. As long as it
  doesn't stall, height is irrelevant. Depth represents the critical
  path to the DAG root.
- Added hybrid_ls_rr_sort::isReady to filter stalled nodes before
  adding them to the available queue.
Related Cleanup: most of the register reduction routines do not need
to be templates.
llvm-svn: 123468
llvm-svn: 123457
"promote a bunch of load and stores" logic, allowing the code to
be shared and reused.
llvm-svn: 123456
simplification present in fully optimized code (I think instcombine fails to
transform some of these when "X-Y" has more than one use).  Fires here and
there all over the test-suite, for example it eliminates 8 subtractions in
the final IR for 445.gobmk, 2 subs in 447.dealII, 2 in paq8p etc.
llvm-svn: 123442
threading of shifts over selects and phis while there.  This fires here and
there in the testsuite, to not much effect.  For example when compiling spirit
it fires 5 times, during early-cse, resulting in 6 more cse simplifications,
and 3 more terminators being folded by jump threading, but the final bitcode
doesn't change in any interesting way: other optimizations would have caught
the opportunity anyway, only later.
llvm-svn: 123441
and one that uses SSAUpdater (-scalarrepl-ssa).
llvm-svn: 123436
static_cast from Constant* to Value* has to adjust the "this" pointer.
This is groundwork for PR889.
llvm-svn: 123435
instead of DomTree/DomFrontier.  This may be interesting for reducing compile 
time.  This is currently disabled, but seems to work just fine.
When this is enabled, we eliminate two runs of dominator frontier, one in the
"early per-function" optimizations and one in the "interlaced with inliner"
function passes.
llvm-svn: 123434
This time let's rephrase to trick gcc-4.3 into not miscompiling.
llvm-svn: 123432
llvm-gcc-i386-linux-selfhost buildbot heartburn...
llvm-svn: 123431
llvm-svn: 123427
llvm-svn: 123426
- Fixed the :upper16: fixup routine. It should shift down the top 16 bits first.
- Added support for Thumb2 :lower16: and :upper16: fixups.
- Added :upper16: and :lower16: relocation support to the Mach-O object writer.
llvm-svn: 123424
llvm-svn: 123423