path: root/llvm/lib/Transforms

* Convert a std::map to a DenseMap for another 1.7% speedup on -scalarrepl.
  Cameron Zwarich, 2011-01-18 (1 file, -3/+3)
  llvm-svn: 123732

* Make a std::vector a SmallVector<*, 32> like the other vectors in the same
  function. This seems to be about a 1.5% speedup of -scalarrepl on
  test-suite with SPEC2000 and SPEC2006.
  Cameron Zwarich, 2011-01-18 (1 file, -1/+1)
  llvm-svn: 123731
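
  Both of these changes swap a node-based standard container for an LLVM ADT.
  A minimal sketch of the pattern (illustrative variable names and key types,
  not the actual scalarrepl code):

      #include "llvm/ADT/DenseMap.h"
      #include "llvm/ADT/SmallVector.h"
      #include "llvm/IR/BasicBlock.h"

      // Before: std::map allocates a node per entry and chases pointers on
      // every lookup; std::vector heap-allocates even for tiny sizes.
      //   std::map<llvm::BasicBlock *, unsigned> BlockIndex;
      //   std::vector<llvm::BasicBlock *> Worklist;

      // After: DenseMap is a flat probed hash table, and SmallVector keeps
      // its first 32 elements inline, so small worklists never touch the
      // heap at all.
      llvm::DenseMap<llvm::BasicBlock *, unsigned> BlockIndex;
      llvm::SmallVector<llvm::BasicBlock *, 32> Worklist;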

* Reduce indentation and remove commented-out code.
  Rafael Espindola, 2011-01-18 (1 file, -122/+101)
  llvm-svn: 123729

* Remove code for updating dominance frontiers and some outdated references
  to dominance and post-dominance frontiers.
  Cameron Zwarich, 2011-01-18 (7 files, -105/+21)
  llvm-svn: 123725

* Remove outdated references to dominance frontiers.
  Cameron Zwarich, 2011-01-18 (4 files, -29/+27)
  llvm-svn: 123724

* Remove dead code that I apparently wrote a while back. We seem to be doing
  well enough without whatever this was trying to do. When/if someone has the
  time to do some empirical evaluations, it might be worth figuring out what
  this code was trying to do and seeing if it's worth resurrecting/fixing.
  Owen Anderson, 2011-01-17 (1 file, -15/+0)
  llvm-svn: 123684

* Roll r123609 back in with two changes that fix test failures with expensive
  checks enabled:
    1) Use '<' to compare integers in a comparison function rather than '<='.
    2) Use the uniqued set DefBlocks rather than Info.DefiningBlocks to
       initialize the priority queue.
  The speedup of scalarrepl on test-suite + SPEC2000 + SPEC2006 is a bit
  less, at just under 16% rather than 17%.
  Cameron Zwarich, 2011-01-17 (3 files, -61/+122)
  llvm-svn: 123662
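
  Change 1 is the classic strict-weak-ordering requirement: a comparator
  handed to std::priority_queue or std::sort must return false for equal
  elements, and '<=' violates that, which is exactly what expensive/debug
  checks catch. A small self-contained illustration (not the commit's code):

      #include <queue>
      #include <vector>

      struct ByLevel {
        // Correct: '<' is a strict weak ordering, so duplicates are fine.
        // Writing 'A <= B' here would claim two equal elements each order
        // before the other, which is undefined behavior for std::sort and
        // std::priority_queue and trips debug-mode comparator checks.
        bool operator()(unsigned A, unsigned B) const { return A < B; }
      };

      int main() {
        std::priority_queue<unsigned, std::vector<unsigned>, ByLevel> PQ;
        PQ.push(3); PQ.push(1); PQ.push(3); // duplicate keys are safe
        return PQ.top() == 3 ? 0 : 1;
      }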

* Roll out r123609 due to failures on the llvm-x86_64-linux-checks bot.
  Cameron Zwarich, 2011-01-17 (3 files, -121/+60)
  llvm-svn: 123618

* Eliminate the use of dominance frontiers in PromoteMemToReg. In addition to
  eliminating a potentially quadratic data structure, this also gives a 17%
  speedup when running -scalarrepl on test-suite + SPEC2000 + SPEC2006. My
  initial experiment gave a greater speedup around 25%, but I moved the
  dominator tree level computation from dominator tree construction to
  PromoteMemToReg. Since this approach to computing IDFs has a much lower
  overhead than the old code using precomputed DFs, it is worth looking at
  using this new code for the second scalarrepl pass as well.
  Cameron Zwarich, 2011-01-17 (3 files, -60/+121)
  llvm-svn: 123609
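
  The on-demand IDF (iterated dominance frontier) computation referenced here
  works from dominator-tree levels alone, with no precomputed frontiers. A
  sketch of the standard level-based algorithm (simplified standalone types;
  the real PromoteMemToReg code differs in detail):

      #include <queue>
      #include <set>
      #include <vector>

      struct Block {
        std::vector<Block *> Succs;        // CFG successors
        std::vector<Block *> DomChildren;  // dominator-tree children
        unsigned Level = 0;                // depth in the dominator tree
      };

      // Compute the blocks needing phis for a set of defining blocks.
      std::set<Block *> computeIDF(const std::set<Block *> &Defs) {
        auto Deeper = [](Block *A, Block *B) { return A->Level < B->Level; };
        std::priority_queue<Block *, std::vector<Block *>, decltype(Deeper)>
            PQ(Deeper);                    // deepest defs processed first
        for (Block *B : Defs) PQ.push(B);

        std::set<Block *> IDF, Walked;
        while (!PQ.empty()) {
          Block *Root = PQ.top(); PQ.pop();
          std::vector<Block *> Stack{Root};
          while (!Stack.empty()) {         // walk Root's dominator subtree
            Block *B = Stack.back(); Stack.pop_back();
            if (!Walked.insert(B).second) continue;
            for (Block *S : B->Succs)      // CFG edges leaving the subtree
              if (S->Level <= Root->Level  // are dominance-frontier edges
                  && IDF.insert(S).second && !Defs.count(S))
                PQ.push(S);                // a new phi acts as a new def
            for (Block *C : B->DomChildren) Stack.push_back(C);
          }
        }
        return IDF;
      }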

* Teach DAE to look for functions whose arguments are unused, and change all
  callers to pass in an undef value instead.
  Anders Carlsson, 2011-01-16 (1 file, -1/+61)
  llvm-svn: 123596
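
  In IR terms the rewrite looks roughly like this (hand-written illustration,
  not the commit's testcase):

      ; Before: %unused is never read inside @callee, yet every caller
      ; still computes and passes a real value for it.
      define i32 @callee(i32 %used, i32 %unused) {
        ret i32 %used
      }
      define i32 @caller(i32 %x, i32 %y) {
        %r = call i32 @callee(i32 %x, i32 %y)
        ret i32 %r
      }

      ; After DAE: callers pass undef, so whatever fed %y can die.
      define i32 @caller.after(i32 %x, i32 %y) {
        %r = call i32 @callee(i32 %x, i32 undef)
        ret i32 %r
      }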

* Tidy up a comment, as suggested by Duncan.
  Chris Lattner, 2011-01-16 (1 file, -2/+2)
  llvm-svn: 123590

* Don't merge two constants if we care about the address of both.
  This fixes the original testcase in PR8927. It also causes a clang binary
  built with a patched clang to increase in size by 0.21%. We can probably
  get some of the size back by writing a pass that detects that a global
  never has its pointer compared and adds unnamed_addr to it (maybe extend
  global opt). It is also possible that there are some other cases where
  clang could add unnamed_addr. I will investigate extending globalopt next.
  Rafael Espindola, 2011-01-16 (1 file, -22/+38)
  llvm-svn: 123584
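
  The unnamed_addr attribute is what makes merging safe: it marks a global
  whose address is not significant. A hand-written IR illustration of the
  rule (not from the commit):

      ; Both globals hold the same constant data.
      @a = private constant i32 42                ; address may be compared
      @b = private unnamed_addr constant i32 42   ; address not significant

      ; Uses of @b may be redirected to @a, because @b is unnamed_addr.
      ; Folding @a away instead would be unsound, since code like this
      ; can observe @a's identity:
      define i1 @same(i32* %p) {
        %t = icmp eq i32* %p, @a
        ret i1 %t
      }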

* Fix PR8932, a case where arg promotion could infinitely promote.
  Chris Lattner, 2011-01-16 (1 file, -24/+51)
  llvm-svn: 123574

* Simplify a little.
  Chris Lattner, 2011-01-16 (1 file, -7/+3)
  llvm-svn: 123573

* If an alloca is only ever accessed as a unit, and is accessed with
  load/store instructions, then don't try to decimate it into its individual
  pieces. This will just make a mess of the IR and is pointless if none of
  the elements are individually accessed. This was generating really terrible
  code for std::bitset (PR8980) because it happens to be lowered by clang as
  an {[8 x i8]} structure instead of {i64}.

  The testcase is now optimized to:

      define i64 @test2(i64 %X) {
        br label %L2
      L2:                                ; preds = %0
        ret i64 %X
      }

  before we generated:

      define i64 @test2(i64 %X) {
        %sroa.store.elt = lshr i64 %X, 56
        %1 = trunc i64 %sroa.store.elt to i8
        %sroa.store.elt8 = lshr i64 %X, 48
        %2 = trunc i64 %sroa.store.elt8 to i8
        %sroa.store.elt9 = lshr i64 %X, 40
        %3 = trunc i64 %sroa.store.elt9 to i8
        %sroa.store.elt10 = lshr i64 %X, 32
        %4 = trunc i64 %sroa.store.elt10 to i8
        %sroa.store.elt11 = lshr i64 %X, 24
        %5 = trunc i64 %sroa.store.elt11 to i8
        %sroa.store.elt12 = lshr i64 %X, 16
        %6 = trunc i64 %sroa.store.elt12 to i8
        %sroa.store.elt13 = lshr i64 %X, 8
        %7 = trunc i64 %sroa.store.elt13 to i8
        %8 = trunc i64 %X to i8
        br label %L2
      L2:                                ; preds = %0
        %9 = zext i8 %1 to i64
        %10 = shl i64 %9, 56
        %11 = zext i8 %2 to i64
        %12 = shl i64 %11, 48
        %13 = or i64 %12, %10
        %14 = zext i8 %3 to i64
        %15 = shl i64 %14, 40
        %16 = or i64 %15, %13
        %17 = zext i8 %4 to i64
        %18 = shl i64 %17, 32
        %19 = or i64 %18, %16
        %20 = zext i8 %5 to i64
        %21 = shl i64 %20, 24
        %22 = or i64 %21, %19
        %23 = zext i8 %6 to i64
        %24 = shl i64 %23, 16
        %25 = or i64 %24, %22
        %26 = zext i8 %7 to i64
        %27 = shl i64 %26, 8
        %28 = or i64 %27, %25
        %29 = zext i8 %8 to i64
        %30 = or i64 %29, %28
        ret i64 %30
      }

  In this case, instcombine was able to eliminate the nonsense, but in PR8980
  enough PHIs are in play that instcombine backs off. It's better to not
  generate this stuff in the first place.
  Chris Lattner, 2011-01-16 (1 file, -3/+33)
  llvm-svn: 123571

* Use an IRBuilder to get some trivial constant folding when doing a store
  of a constant.
  Chris Lattner, 2011-01-16 (1 file, -21/+17)
  llvm-svn: 123570
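
  IRBuilder folds operations whose operands are all constants at creation
  time, so no dead arithmetic is ever emitted. A minimal standalone sketch
  (the function being built is made up):

      #include "llvm/IR/IRBuilder.h"
      #include "llvm/IR/LLVMContext.h"
      #include "llvm/IR/Module.h"

      int main() {
        llvm::LLVMContext Ctx;
        llvm::Module M("demo", Ctx);
        auto *FT = llvm::FunctionType::get(llvm::Type::getInt32Ty(Ctx),
                                           /*isVarArg=*/false);
        auto *F = llvm::Function::Create(
            FT, llvm::Function::ExternalLinkage, "f", &M);
        llvm::IRBuilder<> B(llvm::BasicBlock::Create(Ctx, "entry", F));

        // Both operands are constants, so CreateAdd hands back a folded
        // ConstantInt instead of inserting an 'add' instruction.
        llvm::Value *V = B.CreateAdd(B.getInt32(40), B.getInt32(2));
        B.CreateRet(V); // ret i32 42
        return 0;
      }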

* Remove a dead check; this was needed before we had an explicit veto on uses
  of phis.
  Chris Lattner, 2011-01-16 (1 file, -5/+0)
  llvm-svn: 123569

* Enhance FoldOpIntoPhi in instcombine to try harder when a phi has multiple
  uses. In some cases, all the uses are the same operation, so instcombine
  can go ahead and promote the phi. In the testcase this pushes an add out of
  the loop.
  Chris Lattner, 2011-01-16 (2 files, -3/+20)
  llvm-svn: 123568
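
  The shape of the transform, sketched by hand (not the commit's testcase):
  when a phi's uses are all the same operation with a constant, the operation
  can be folded into the phi's incoming values:

      ; Before: the add executes after the merge, keeping %p alive.
      define i32 @before(i1 %c, i32 %x) {
      entry:
        br i1 %c, label %t, label %f
      t:
        br label %merge
      f:
        br label %merge
      merge:
        %p = phi i32 [ %x, %t ], [ 7, %f ]
        %a = add i32 %p, 1               ; the phi's only use
        ret i32 %a
      }

      ; After FoldOpIntoPhi: the add moves into the non-constant arm, and
      ; the constant arm folds to 8 at compile time.
      define i32 @after(i1 %c, i32 %x) {
      entry:
        br i1 %c, label %t, label %f
      t:
        %x1 = add i32 %x, 1
        br label %merge
      f:
        br label %merge
      merge:
        %a = phi i32 [ %x1, %t ], [ 8, %f ]
        ret i32 %a
      }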

* Remove the AllowAggressive argument to FoldOpIntoPhi. It is forced to false
  in the first line of the function because it isn't a good idea, even for
  compares.
  Chris Lattner, 2011-01-16 (3 files, -14/+6)
  llvm-svn: 123566

* More cleanups: use the IR builder.
  Chris Lattner, 2011-01-16 (1 file, -38/+39)
  llvm-svn: 123565

* Tidy up code.
  Chris Lattner, 2011-01-16 (1 file, -16/+20)
  llvm-svn: 123564

* Improve the safety of my globalopt enhancement by ensuring that the bitcast
  of the stored value to the new store type is always. Also, add a testcase.
  Owen Anderson, 2011-01-16 (1 file, -12/+22)
  llvm-svn: 123563

* Simplify this code; it is still broken, but will follow up on llvm-commits.
  Chris Lattner, 2011-01-16 (1 file, -15/+5)
  llvm-svn: 123558

* Remove the partial specialization pass. It is unmaintained and has bugs.
  Chris Lattner, 2011-01-16 (3 files, -230/+0)
  llvm-svn: 123554

* Add missing whitespace.
  Nick Lewycky, 2011-01-15 (1 file, -2/+2)
  llvm-svn: 123543

* Make constmerge a two-pass algorithm so that it won't miss merging
  opportunities. Fixes PR8978.
  Nick Lewycky, 2011-01-15 (1 file, -4/+34)
  llvm-svn: 123541
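
  One way a single scan misses merges is ordering: it can elect a replaceable
  global as the canonical copy before seeing a duplicate whose address must
  be preserved. A sketch of a two-pass scheme that avoids that trap
  (simplified stand-in types; not the actual ConstantMerge code):

      #include <map>
      #include <string>
      #include <vector>

      struct Global {
        std::string Init;              // stand-in for the initializer
        bool AddrSignificant = false;  // true if not unnamed_addr
        const Global *ReplaceWith = nullptr;
      };

      void mergeConstants(std::vector<Global> &Globals) {
        std::map<std::string, Global *> Canonical;
        // Pass 1: globals whose address matters claim the canonical slot
        // for their initializer, so nothing merges *them* away.
        for (Global &G : Globals)
          if (G.AddrSignificant)
            Canonical.emplace(G.Init, &G);
        // Pass 2: the remaining duplicates fold into an existing canonical
        // copy, or register themselves if their initializer is new.
        for (Global &G : Globals) {
          if (G.AddrSignificant) continue;
          auto Res = Canonical.emplace(G.Init, &G);
          if (!Res.second) G.ReplaceWith = Res.first->second;
        }
      }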

* Try to unbreak selfhost.
  Benjamin Kramer, 2011-01-15 (1 file, -0/+1)
  llvm-svn: 123537

* Add a cache that protects mergefunc's internals from more surprises in
  DenseSet. Also, replace tabs with spaces. Yes, it's 2011.
  Nick Lewycky, 2011-01-15 (1 file, -5/+27)
  llvm-svn: 123535

* Temporarily revert r123526. While working on a follow-on patch I realized
  that ConstantFoldTerminator doesn't preserve dominfo.
  Chris Lattner, 2011-01-15 (1 file, -3/+0)
  llvm-svn: 123527

* Fix rdar://8785296 - -fcatch-undefined-behavior generates inefficient code.
  The basic issue is that isel (very reasonably!) expects conditional
  branches to be folded, so CGP leaving around a bunch of dead computation
  feeding conditional branches isn't such a good idea. Just fold branches on
  constants into unconditional branches.
  Chris Lattner, 2011-01-15 (1 file, -0/+3)
  llvm-svn: 123526
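
  The fold itself is tiny; in IR (hand-written example, not the commit's
  testcase):

      ; Before: a conditional branch whose condition is a known constant.
      define i32 @before() {
      entry:
        br i1 true, label %taken, label %nottaken
      taken:
        ret i32 1
      nottaken:
        ret i32 0
      }

      ; After folding: an unconditional branch; %nottaken becomes dead, and
      ; anything that only fed the condition can be cleaned up too.
      define i32 @after() {
      entry:
        br label %taken
      taken:
        ret i32 1
      }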

* Simplify code, no functionality change.
  Chris Lattner, 2011-01-15 (1 file, -30/+37)
  llvm-svn: 123525

* Now that instruction optzns can update the iterator as they go, we can have
  objectsize folding recursively simplify away their result when it folds. It
  is important to catch this here, because otherwise we won't eliminate the
  cross-block values at isel and other times.
  Chris Lattner, 2011-01-15 (1 file, -10/+16)
  llvm-svn: 123524

* Make the current instruction iterator an ivar, allowing xforms that
  potentially invalidate it (like inline asm lowering) to be sunk into their
  proper place, cleaning up a ton of code.
  Chris Lattner, 2011-01-15 (1 file, -35/+38)
  llvm-svn: 123523

* Implement an instcombine xform that canonicalizes casts outside of
  and-with-constant operations. This fixes rdar://8808586, which observed
  that we used to compile:

      union xy {
        struct x { _Bool b[15]; } x;
        __attribute__((packed)) struct y {
          __attribute__((packed)) unsigned long b0to7;
          __attribute__((packed)) unsigned int b8to11;
          __attribute__((packed)) unsigned short b12to13;
          __attribute__((packed)) unsigned char b14;
        } y;
      };

      struct x foo(union xy *xy) {
        return xy->x;
      }

  into:

      _foo:                                   ## @foo
              movq    (%rdi), %rax
              movabsq $1095216660480, %rcx    ## imm = 0xFF00000000
              andq    %rax, %rcx
              movabsq $-72057594037927936, %rdx ## imm = 0xFF00000000000000
              andq    %rax, %rdx
              movzbl  %al, %esi
              orq     %rdx, %rsi
              movq    %rax, %rdx
              andq    $65280, %rdx            ## imm = 0xFF00
              orq     %rsi, %rdx
              movq    %rax, %rsi
              andq    $16711680, %rsi         ## imm = 0xFF0000
              orq     %rdx, %rsi
              movl    %eax, %edx
              andl    $-16777216, %edx        ## imm = 0xFFFFFFFFFF000000
              orq     %rsi, %rdx
              orq     %rcx, %rdx
              movabsq $280375465082880, %rcx  ## imm = 0xFF0000000000
              movq    %rax, %rsi
              andq    %rcx, %rsi
              orq     %rdx, %rsi
              movabsq $71776119061217280, %r8 ## imm = 0xFF000000000000
              andq    %r8, %rax
              orq     %rsi, %rax
              movzwl  12(%rdi), %edx
              movzbl  14(%rdi), %esi
              shlq    $16, %rsi
              orl     %edx, %esi
              movq    %rsi, %r9
              shlq    $32, %r9
              movl    8(%rdi), %edx
              orq     %r9, %rdx
              andq    %rdx, %rcx
              movzbl  %sil, %esi
              shlq    $32, %rsi
              orq     %rcx, %rsi
              movl    %edx, %ecx
              andl    $-16777216, %ecx        ## imm = 0xFFFFFFFFFF000000
              orq     %rsi, %rcx
              movq    %rdx, %rsi
              andq    $16711680, %rsi         ## imm = 0xFF0000
              orq     %rcx, %rsi
              movq    %rdx, %rcx
              andq    $65280, %rcx            ## imm = 0xFF00
              orq     %rsi, %rcx
              movzbl  %dl, %esi
              orq     %rcx, %rsi
              andq    %r8, %rdx
              orq     %rsi, %rdx
              ret

  We now compile this into:

      _foo:                                   ## @foo
      ## BB#0:                                ## %entry
              movzwl  12(%rdi), %eax
              movzbl  14(%rdi), %ecx
              shlq    $16, %rcx
              orl     %eax, %ecx
              shlq    $32, %rcx
              movl    8(%rdi), %edx
              orq     %rcx, %rdx
              movq    (%rdi), %rax
              ret

  A small improvement :-)
  Chris Lattner, 2011-01-15 (1 file, -2/+12)
  llvm-svn: 123520

* One more instcombine variant that is needed to work with future changes;
  no functionality change currently.
  Chris Lattner, 2011-01-15 (1 file, -0/+9)
  llvm-svn: 123517

* Fix typo.
  Chris Lattner, 2011-01-15 (1 file, -1/+1)
  llvm-svn: 123516

* Catch ~x < cst just like ~x < ~y; we currently handle this through means
  that are about to disappear.
  Chris Lattner, 2011-01-15 (1 file, -4/+8)
  llvm-svn: 123515
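
  The identity behind both forms: since ~x = -x - 1, comparing ~x against ~y
  flips the comparison (~x < ~y iff y < x), and a constant RHS folds to its
  complement. A hand-written example with cst = 5:

      ; ~x <s 5  is equivalent to  %x >s -6  (because ~5 == -6).
      define i1 @before(i32 %x) {
        %notx = xor i32 %x, -1
        %cmp = icmp slt i32 %notx, 5
        ret i1 %cmp
      }

      define i1 @after(i32 %x) {
        %cmp = icmp sgt i32 %x, -6
        ret i1 %cmp
      }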

* Reduce indentation.
  Chris Lattner, 2011-01-15 (1 file, -29/+29)
  llvm-svn: 123514

* Generalize LoadAndStorePromoter a bit and switch LICM to use it.
  Chris Lattner, 2011-01-15 (3 files, -190/+111)
  llvm-svn: 123501

* Fix a false-positive warning.
  Owen Anderson, 2011-01-14 (1 file, -1/+3)
  llvm-svn: 123480

* Enhance GlobalOpt to be able to evaluate initializers that involve stores
  through bitcasts, at least in simple cases. This fixes clang's
  CodeGenCXX/virtual-base-dtor.cpp.
  Owen Anderson, 2011-01-14 (1 file, -2/+49)
  llvm-svn: 123477

* Switch SRoA to use LoadAndStorePromoter instead of its own copy of the
  code.
  Chris Lattner, 2011-01-14 (1 file, -136/+26)
  llvm-svn: 123457

* Add a new LoadAndStorePromoter class, which implements the general
  "promote a bunch of loads and stores" logic, allowing the code to be shared
  and reused.
  Chris Lattner, 2011-01-14 (1 file, -0/+154)
  llvm-svn: 123456
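
  Usage follows a subclass-and-run pattern: collect the loads and stores of
  one promotable location, hand them to the helper, and let SSAUpdater wire
  up the values. A sketch of the general shape (hook names from memory of the
  API; they may differ across LLVM versions):

      #include "llvm/ADT/SmallVector.h"
      #include "llvm/IR/Instructions.h"
      #include "llvm/Transforms/Utils/SSAUpdater.h"

      using namespace llvm;

      namespace {
      // Promotes all loads/stores of a single alloca to SSA values.
      class AllocaPromoter : public LoadAndStorePromoter {
      public:
        AllocaPromoter(ArrayRef<const Instruction *> Insts, SSAUpdater &S)
            : LoadAndStorePromoter(Insts, S) {}

        // Optional hook: clients can react as each load is rewritten,
        // e.g. to transfer debug info from LI to V.
        void replaceLoadWithValue(LoadInst *LI, Value *V) const override {}
      };
      } // namespace

      // Caller side: gather the users, then run the promoter, which
      // rewrites the loads and deletes the dead loads/stores.
      void promote(AllocaInst *AI, SmallVectorImpl<Instruction *> &Users) {
        SSAUpdater SSA;
        AllocaPromoter(Users, SSA).run(Users);
      }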

* Split SROA into two passes: one that uses DomFrontiers (-scalarrepl) and
  one that uses SSAUpdater (-scalarrepl-ssa).
  Chris Lattner, 2011-01-14 (2 files, -27/+57)
  llvm-svn: 123436

* Implement full support for promoting allocas to registers using SSAUpdater
  instead of DomTree/DomFrontier. This may be interesting for reducing
  compile time. This is currently disabled, but seems to work just fine.

  When this is enabled, we eliminate two runs of dominator frontier, one in
  the "early per-function" optimizations and one in the "interlaced with
  inliner" function passes.
  Chris Lattner, 2011-01-14 (1 file, -5/+162)
  llvm-svn: 123434

* Fix indentation.
  Chris Lattner, 2011-01-14 (1 file, -1/+1)
  llvm-svn: 123426

* Move some shift transforms out of instcombine and into
  InstructionSimplify. While there, I noticed that the transform
  "undef >>a X -> undef" was wrong. For example, if X is 2 then the top two
  bits of the result must be equal, so the result cannot be just anything. I
  fixed this in the constant folder as well. Also, I made the transform for
  "X << undef" stronger: it now folds to undef always, even though X might be
  zero. This is in accordance with the LangRef, but I must admit that it is
  fairly aggressive. Also, I added "i32 X << 32 -> undef" following the
  LangRef and the constant folder, likewise fairly aggressive.
  Duncan Sands, 2011-01-14 (1 file, -26/+10)
  llvm-svn: 123417
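
  The ashr observation in concrete terms (hand-written illustration): an
  arithmetic shift right by 2 copies the sign bit into the top two bits, so
  even with an undef input the result is constrained, and plain undef (which
  may later be refined to any value at all) is too strong a fold:

      define i8 @f() {
        ; The top three bits of %r are all equal (the sign bit, duplicated
        ; twice), so %r can be 0b000xxxxx or 0b111xxxxx but never, say,
        ; 0b010xxxxx. Folding %r to undef would forget that constraint.
        %r = ashr i8 undef, 2
        ret i8 %r
      }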

* Fix whitespace.
  Bob Wilson, 2011-01-13 (1 file, -120/+120)
  llvm-svn: 123396

* Check for empty structs, and for consistency, zero-element arrays.
  Bob Wilson, 2011-01-13 (1 file, -2/+2)
  llvm-svn: 123383

* Extend SROA to handle arrays accessed as homogeneous structs and vice
  versa. This is a minor extension of SROA to handle a special case that is
  important for some ARM NEON operations. Some of the NEON intrinsics return
  multiple values, which are handled as struct types containing multiple
  elements of the same vector type. The corresponding return types declared
  in the arm_neon.h header have equivalent arrays. We need SROA to recognize
  that it can split up those arrays and structs into separate vectors, even
  though they are not always accessed with the same type. SROA already
  handles loads and stores of an entire alloca by using
  insertvalue/extractvalue to access the individual pieces, and that code
  works the same regardless of whether the type is a struct or an array. So,
  all that needs to be done is to check for compatible arrays and homogeneous
  structs.
  Bob Wilson, 2011-01-13 (1 file, -14/+57)
  llvm-svn: 123381
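
  For example (hand-written, era-appropriate IR; not the commit's testcase),
  a two-result NEON-style value may be stored as a struct of two vectors but
  reloaded as an array of the same two vectors; the layouts are identical, so
  SROA can still split the alloca into two independent vector values:

      %pair = type { <4 x i32>, <4 x i32> }

      define <4 x i32> @demo(%pair %in) {
        %a = alloca [2 x <4 x i32>]
        %p = bitcast [2 x <4 x i32>]* %a to %pair*
        store %pair %in, %pair* %p          ; written as a struct
        %arr = load [2 x <4 x i32>]* %a     ; read back as an array
        %v = extractvalue [2 x <4 x i32>] %arr, 1
        ret <4 x i32> %v
      }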