bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Use references to simplify the code a bit.	Rafael Espindola	2010-12-06	3	-14/+11
\| \| \| \|	llvm-svn: 121050
*	Adding bug fix that was supposed to be part of 121044.	Wesley Peck	2010-12-06	1	-6/+6
\| \| \| \| \| \|	patch contributed by Jack Whitham! llvm-svn: 121049
*	Fixed reversed operands for IDIV and CMP instructions in MBlaze backend.	Wesley Peck	2010-12-06	1	-24/+24
\| \| \| \| \| \| \| \|	Use BRAD instead of BRD for indirect branches in MBlaze backend. patch contributed by Jack Whitham! llvm-svn: 121044
*	Refactor ELFObjectWriter.	Jason W Kim	2010-12-06	1	-216/+106
	+ ARM/X86/MBlaze now share a common RecordRelocation
	+ ARM/X86/MBlaze arch specific routines are limited to GetRelocType()

	llvm-svn: 121043
*	replace a linear scan with a symtab lookup, reduce indentation.	Chris Lattner	2010-12-06	1	-38/+38
\| \| \| \| \| \|	No functionality change. llvm-svn: 121042
*	use getSymbolOffset.	Rafael Espindola	2010-12-06	1	-1/+1
\| \| \| \|	llvm-svn: 121041
*	Use a stronger predicate here, pointed out by Duncan	Chris Lattner	2010-12-06	1	-1/+1
\| \| \| \|	llvm-svn: 121040
*	add some DEBUG statements.	Chris Lattner	2010-12-06	1	-3/+14
\| \| \| \|	llvm-svn: 121038
*	Fix a 16-bit immediate value detection bug in the MBlaze delay slot filler.	Wesley Peck	2010-12-06	1	-48/+86
\| \| \| \| \| \| \| \|	Address more hazards in the MBlaze delay slot filler. patch contributed by Jack Whitham! llvm-svn: 121037
*	Another use of getSymbolOffset.	Rafael Espindola	2010-12-06	1	-4/+8
\| \| \| \|	llvm-svn: 121034
*	Remove the instruction fragment to data fragment lowering since it was causing	Rafael Espindola	2010-12-06	5	-116/+56
\| \| \| \| \| \|	freed data to be read. I will open a bug to track it being reenabled. llvm-svn: 121028
*	Revert r121021, which broke the buildbots.	Owen Anderson	2010-12-06	2	-34/+20
\| \| \| \|	llvm-svn: 121026
*	Trailing whitespace.	Jim Grosbach	2010-12-06	1	-1/+1
\| \| \| \|	llvm-svn: 121024
*	Improve handling of Thumb2 PC-relative loads by converting LDRpci (and	Owen Anderson	2010-12-06	2	-20/+34
\| \| \| \| \| \|	friends) to Pseudos. llvm-svn: 121021
*	Encode the register operand of ARM CondCode operands correctly. ARM::CPSR if	Jim Grosbach	2010-12-06	1	-2/+2
\| \| \| \| \| \|	the instruction is predicated, reg0 otherwise. llvm-svn: 121020
*	The ARM AsmMatcher needs to know that the CCOut operand is a register value,	Jim Grosbach	2010-12-06	2	-1/+27
\| \| \| \| \| \|	not an immediate. It stores either ARM::CPSR or reg0. llvm-svn: 121018
*	Second try at making direct object emission produce the same results	Rafael Espindola	2010-12-06	7	-53/+18
\| \| \| \| \| \| \|	as llc + llvm-mc. This time ELF is not changed and I tested that llvm-gcc bootstraps on darwin10 using darwin9's assembler and linker. llvm-svn: 121006
*	Revert previous two patches while I try to find out how to make both	Rafael Espindola	2010-12-06	3	-9/+10
\| \| \| \| \| \|	linux and darwin assemblers happy :-( llvm-svn: 121004
*	Add an EmitAbsValue helper method and use it in cases where we want to be sure	Rafael Espindola	2010-12-06	3	-10/+9
\| \| \| \| \| \| \|	that no relocations are used (on MachO). Fixes llc producing different output from llc + llvm-mc. llvm-svn: 121000
*	Fix PR8735, a really terrible problem in the inliner's "alloca merging"	Chris Lattner	2010-12-06	1	-3/+19
	optimization. Consider:

	static void foo() { A = alloca ... }
	static void bar() { B = alloca ... call foo(); }
	void main() { bar() }

	The inliner proceeds bottom up, but let's pretend it decides not to inline foo into bar. When it gets to main, it inlines bar into main() and says "hey, I just inlined an alloca "B" into main, let's remember that". Then it keeps going and finds that main now contains a call to foo. It decides to inline foo into main, and says "hey, foo has an alloca A, and I have an alloca B from another inlined call site, let's reuse it". The problem with this, of course, is that the lifetimes of A and B are nested, not disjoint.

	Unfortunately I can't create a reasonable testcase for this: the one in the PR is both huge and extremely sensitive, because minor tweaks end up causing foo to get inlined into bar too early. We already have tests for the basic alloca merging optimization and this does not break them.

	llvm-svn: 120995
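A minimal C++ sketch of why the merge described in this commit message is unsafe. All function names here are hypothetical, and the shared `slot` array only models what reusing one stack slot for both allocas would do; it is not the inliner's code.

```cpp
#include <cassert>
#include <cstring>

// Models foo(): scribbles over its own scratch buffer A.
static void foo(char* a) { std::memset(a, 'A', 4); }

// Models bar(): fills B, calls foo while B is still live, then reads B.
static char bar(char* b, char* a) {
    std::memset(b, 'B', 4);
    foo(a);        // A's lifetime is nested inside B's
    return b[0];   // B is read after foo returns
}

// A and B get separate storage: bar reads back its own data.
char with_separate_slots() {
    char a[4], b[4];
    return bar(b, a);
}

// A and B are forced into one slot (assumption: this models the bad merge).
char with_merged_slot() {
    char slot[4];
    return bar(slot, slot);
}
```

With separate slots bar reads back the value it wrote; with the merged slot it reads what foo wrote instead, which is the kind of clobbering that nested lifetimes make possible.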
*	improve comment	Chris Lattner	2010-12-06	1	-2/+1
\| \| \| \|	llvm-svn: 120994
*	improve -debug output and comments a little.	Chris Lattner	2010-12-06	1	-3/+5
\| \| \| \|	llvm-svn: 120993
*	Support/Windows: Make MinGW happy.	Michael J. Spencer	2010-12-06	2	-7/+7
\| \| \| \|	llvm-svn: 120991
*	Support/FileSystem: Add directory_iterator implementation.	Michael J. Spencer	2010-12-06	3	-3/+94
\| \| \| \|	llvm-svn: 120989
*	Support/PathV2: Fix append to not add a slash to empty or root paths.	Michael J. Spencer	2010-12-06	1	-1/+1
\| \| \| \|	llvm-svn: 120988
*	Support/Windows: Add ScopedHandle and move some clients over to it.	Michael J. Spencer	2010-12-06	2	-10/+50
\| \| \| \|	llvm-svn: 120987
*	ptx: add shift instructions	Che-Liang Chiou	2010-12-06	1	-0/+27
\| \| \| \|	llvm-svn: 120982
*	Remove the getAddress getter, initialize Ordinal in the constructor and use	Rafael Espindola	2010-12-06	2	-1/+2
\| \| \| \| \| \|	that on the ELF writer to detect a section we created. llvm-svn: 120981
*	Simplify a bit.	Rafael Espindola	2010-12-06	1	-1/+1
\| \| \| \|	llvm-svn: 120980
*	Use getSymbolOffset on the COFF writer.	Rafael Espindola	2010-12-06	1	-1/+1
\| \| \| \|	llvm-svn: 120979
*	Don't use PadSectionToAlignment on windows.	Rafael Espindola	2010-12-06	1	-1/+1
\| \| \| \|	llvm-svn: 120978
*	Add a getSymbolOffset method and use it in the ELF writer.	Rafael Espindola	2010-12-06	2	-15/+13
\| \| \| \|	llvm-svn: 120977
*	Fix PR8728, a miscompilation I recently introduced. When optimizing	Chris Lattner	2010-12-06	1	-5/+62
	memcpy's like:

	  memcpy(A, B)
	  memcpy(A, C)

	we cannot delete the first memcpy as dead if A and C might be aliases. If so, we actually get:

	  memcpy(A, B)
	  memcpy(A, A)

	which is not correct to transform into:

	  memcpy(A, A)

	This patch was heavily influenced by Jakub Staszak's patch in PR8728, thanks Jakub!

	llvm-svn: 120974
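The aliasing case can be reproduced with a small C++ sketch. The helper names are hypothetical, and `copy_buf` skips the copy when source and destination are the same object (an assumption made here because calling `std::memcpy` on overlapping storage is undefined in C++).

```cpp
#include <array>
#include <cassert>
#include <cstring>

using Buf = std::array<char, 4>;

// Hypothetical copy helper: a copy of a buffer onto itself is a no-op.
static void copy_buf(Buf& dst, const Buf& src) {
    if (&dst != &src) std::memcpy(dst.data(), src.data(), dst.size());
}

// Original sequence: memcpy(A, B); memcpy(A, C).
void both_copies(Buf& a, const Buf& b, const Buf& c) {
    copy_buf(a, b);
    copy_buf(a, c);
}

// Sequence after wrongly deleting the first copy as dead: memcpy(A, C).
void second_copy_only(Buf& a, const Buf& c) {
    copy_buf(a, c);
}
```

When C is a distinct buffer, both sequences leave the same bytes in A. When C is A itself, the original sequence leaves B's bytes in A while the shortened one leaves A untouched, so the first copy was not dead.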
*	Eliminate unneeded #include's.	Evan Cheng	2010-12-05	1	-2/+0
\| \| \| \|	llvm-svn: 120971
*	ARM/CMakeLists.txt: Add missing MLxExpansionPass.cpp since r120960.	NAKAMURA Takumi	2010-12-05	1	-0/+1
\| \| \| \|	llvm-svn: 120966
*	Code clean up.	Evan Cheng	2010-12-05	1	-6/+6
\| \| \| \|	llvm-svn: 120965
*	Remove an unused variable.	Evan Cheng	2010-12-05	1	-2/+1
\| \| \| \|	llvm-svn: 120964
*	Some cleanup before I start committing some incremental progress on	Cameron Zwarich	2010-12-05	1	-21/+22
\| \| \| \| \| \|	StrongPHIElimination. llvm-svn: 120961
*	Making use of VFP / NEON floating point multiply-accumulate / subtraction is	Evan Cheng	2010-12-05	19	-191/+771
	difficult on current ARM implementations for a few reasons.

	1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-A8) cycles can cause an additional pipeline stall. So it's frequently better to simply codegen vmul + vadd.
	2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart.
	3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back RAW vmla + vmla is very bad. But this isn't ideal either:
	     vmul
	     vadd
	     vmla
	   Instead, we want to expand the second vmla:
	     vmla
	     vmul
	     vadd
	   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster.

	Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable.

	A. Add missing isel predicates which cause vmla to be codegen'ed.
	B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla.
	C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case).
	D. Add ARM hazard recognizer to model the vmla / vmls hazards.
	E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards.

	Work in progress, only A+B are enabled.

	llvm-svn: 120960
*	Remove the PHIElimination.h header, as it is no longer needed.	Cameron Zwarich	2010-12-05	2	-82/+55
\| \| \| \|	llvm-svn: 120959
*	Fix PR 4170 by having ExtractValueInst::getIndexedType() reject	Frits van Bommel	2010-12-05	1	-7/+24
\| \| \| \| \| \| \| \|	out-of-bounds indexing. Also add asserts that the indices are valid in InsertValueInst::init(). ExtractValueInst already asserts when constructed with invalid indices. llvm-svn: 120956
*	I forgot to actually remove the FindCopyInsertPoint() declaration from	Cameron Zwarich	2010-12-05	1	-8/+0
\| \| \| \| \| \|	PHIElimination.h. llvm-svn: 120953
*	Remove the SplitCriticalEdge() method declaration from PHIElimination.h. At one	Cameron Zwarich	2010-12-05	1	-6/+0
\| \| \| \| \| \| \|	time, this method existed, but now PHIElimination uses the method of the same name on MachineBasicBlock. llvm-svn: 120952
*	Move the FindCopyInsertPoint method of PHIElimination to a new standalone	Cameron Zwarich	2010-12-05	4	-45/+89
\| \| \| \| \| \|	function so that it can be shared with StrongPHIElimination. llvm-svn: 120951
*	Refactor jump threading.	Frits van Bommel	2010-12-05	1	-69/+73
\| \| \| \| \| \| \|	Should have no functional change other than the order of two transformations that are mutually-exclusive and the exact formatting of debug output. Internally, it now stores the ConstantInts as Constants, and actual undef values instead of nulls. llvm-svn: 120946
*	Remove trailing whitespace.	Frits van Bommel	2010-12-05	1	-208/+208
\| \| \| \|	llvm-svn: 120945
*	Teach SimplifyCFG to turn	Frits van Bommel	2010-12-05	1	-2/+72
\| \| \| \| \| \| \| \| \|	(indirectbr (select cond, blockaddress(@fn, BlockA), blockaddress(@fn, BlockB))) into (br cond, BlockA, BlockB). llvm-svn: 120943
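A source-level analogue of this transformation, as a hedged C++ sketch. It relies on the GNU "labels as values" extension (`&&label` and `goto *ptr`), which GCC and Clang accept but standard C++ does not; the function names are hypothetical and the pass itself works on LLVM IR, not on source code.

```cpp
#include <cassert>

// Before: an indirect branch whose target is a select between two block addresses.
int via_indirect_branch(bool cond) {
    void* target = cond ? &&block_a : &&block_b;  // select of two blockaddresses
    goto *target;                                 // indirectbr
block_a:
    return 1;
block_b:
    return 2;
}

// After: the same control flow written as a plain conditional branch.
int via_conditional_branch(bool cond) {
    if (cond) goto block_a;                       // br cond, BlockA, BlockB
    goto block_b;
block_a:
    return 1;
block_b:
    return 2;
}
```

Both functions return the same value for each input, which is the equivalence the rewrite depends on.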
*	Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags	Chris Lattner	2010-12-05	1	-0/+3
	result. This allows us to compile:

	void *test12(long count) {
	  return new int[count];
	}

	into:

	test12:
	    movl    $4, %ecx
	    movq    %rdi, %rax
	    mulq    %rcx
	    movq    $-1, %rdi
	    cmovnoq %rax, %rdi
	    jmp     __Znam                      ## TAILCALL

	instead of:

	test12:
	    movl    $4, %ecx
	    movq    %rdi, %rax
	    mulq    %rcx
	    seto    %cl
	    testb   %cl, %cl
	    movq    $-1, %rdi
	    cmoveq  %rax, %rdi
	    jmp     __Znam

	Of course it would be even better if the regalloc inverted the cmov to 'cmovoq', which would eliminate the need for the 'movq %rdi, %rax'.

	llvm-svn: 120936
*	it turns out that when ".with.overflow" intrinsics were added to the X86	Chris Lattner	2010-12-05	5	-20/+66
	backend, they were all implemented except umul. This one fell back to the default implementation that did a hi/lo multiply and compared the top. Fix this to check the overflow flag that the 'mul' instruction sets, so we can avoid an explicit test. Now we compile:

	void *func(long count) {
	  return new int[count];
	}

	into:

	__Z4funcl:                              ## @_Z4funcl
	    movl    $4, %ecx                    ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	    movq    %rdi, %rax                  ## encoding: [0x48,0x89,0xf8]
	    mulq    %rcx                        ## encoding: [0x48,0xf7,0xe1]
	    seto    %cl                         ## encoding: [0x0f,0x90,0xc1]
	    testb   %cl, %cl                    ## encoding: [0x84,0xc9]
	    movq    $-1, %rdi                   ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	    cmoveq  %rax, %rdi                  ## encoding: [0x48,0x0f,0x44,0xf8]
	    jmp     __Znam                      ## TAILCALL

	instead of:

	__Z4funcl:                              ## @_Z4funcl
	    movl    $4, %ecx                    ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	    movq    %rdi, %rax                  ## encoding: [0x48,0x89,0xf8]
	    mulq    %rcx                        ## encoding: [0x48,0xf7,0xe1]
	    testq   %rdx, %rdx                  ## encoding: [0x48,0x85,0xd2]
	    movq    $-1, %rdi                   ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	    cmoveq  %rax, %rdi                  ## encoding: [0x48,0x0f,0x44,0xf8]
	    jmp     __Znam                      ## TAILCALL

	Other than the silly seto+test, this is using the o bit directly, so it's going in the right direction.

	llvm-svn: 120935
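The size computation behind `new int[count]` can be sketched in C++ as follows. This is an illustration under stated assumptions, not the code the backend emits: `array_alloc_size` is a hypothetical helper, and `__builtin_mul_overflow` is a GCC/Clang builtin rather than standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: number of bytes to request for `count` elements of
// `elem_size` bytes each. On overflow it returns the largest size_t value,
// mirroring the "movq $-1, %rdi" in the listings, so the allocation fails
// instead of silently returning a block that is too small.
std::size_t array_alloc_size(std::size_t count, std::size_t elem_size) {
    std::size_t bytes = 0;
    // Reports overflow of the multiply directly, much as the overflow flag
    // set by the x86 'mul' instruction does.
    if (__builtin_mul_overflow(count, elem_size, &bytes)) {
        return std::numeric_limits<std::size_t>::max();
    }
    return bytes;
}
```

A call such as `array_alloc_size(3, sizeof(int))` yields the plain product, while a count large enough to wrap the multiplication yields the all-ones size.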
*	generalize the previous check to handle -1 on either side of the	Chris Lattner	2010-12-05	1	-7/+17
	select, inserting a not to compensate. Add a missing isZero check that I lost somehow.

	This improves codegen of:

	void *func(long count) {
	  return new int[count];
	}

	from:

	__Z4funcl:                              ## @_Z4funcl
	    movl    $4, %ecx                    ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	    movq    %rdi, %rax                  ## encoding: [0x48,0x89,0xf8]
	    mulq    %rcx                        ## encoding: [0x48,0xf7,0xe1]
	    testq   %rdx, %rdx                  ## encoding: [0x48,0x85,0xd2]
	    movq    $-1, %rdi                   ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	    cmoveq  %rax, %rdi                  ## encoding: [0x48,0x0f,0x44,0xf8]
	    jmp     __Znam                      ## TAILCALL
	                                        ## encoding: [0xeb,A]

	to:

	__Z4funcl:                              ## @_Z4funcl
	    movl    $4, %ecx                    ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	    movq    %rdi, %rax                  ## encoding: [0x48,0x89,0xf8]
	    mulq    %rcx                        ## encoding: [0x48,0xf7,0xe1]
	    cmpq    $1, %rdx                    ## encoding: [0x48,0x83,0xfa,0x01]
	    sbbq    %rdi, %rdi                  ## encoding: [0x48,0x19,0xff]
	    notq    %rdi                        ## encoding: [0x48,0xf7,0xd7]
	    orq     %rax, %rdi                  ## encoding: [0x48,0x09,0xc7]
	    jmp     __Znam                      ## TAILCALL
	                                        ## encoding: [0xeb,A]

	llvm-svn: 120932