llvm-svn: 11919
llvm-svn: 11913
instructions.
llvm-svn: 11907
llvm-svn: 11905
llvm-svn: 11903
consistent with the rest and also prepare for the addition of their
memory operand variants.
llvm-svn: 11902
This is a really minor thing, but might help out the 'switch statement induction'
code in simplifycfg.
llvm-svn: 11900
llvm-svn: 11898
that they are as far away from the loads as possible.
llvm-svn: 11895
MRegisterInfo::isPhysicalRegister().
llvm-svn: 11894
llvm-svn: 11892
Functions with linkonce linkage are declared with weak linkage.
Global floating point constants used to represent unprintable values
(such as NaN and infinity) are declared static so that they don't interfere
with other CBE-generated translation units.
llvm-svn: 11884
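A minimal sketch of that second point, assuming an illustrative emitted-C shape (printStaticDoubleConstant, FPConstant_NaN, and FPConstant_Inf are made-up names, not the CBE's actual code): printing NaN/infinity as file-scope statics built from raw bit patterns gives each generated translation unit its own internal-linkage copy, so two CBE outputs linked together cannot collide.

    // Sketch: emit an unprintable double as a bit pattern behind a
    // file-scope static in the generated C.
    #include <cstdint>
    #include <cstdio>

    static void printStaticDoubleConstant(const char *Name, uint64_t Bits) {
      std::printf("static union { unsigned long long B; double D; } "
                  "%s = { 0x%016llxULL };\n",
                  Name, static_cast<unsigned long long>(Bits));
    }

    int main() {
      printStaticDoubleConstant("FPConstant_NaN", 0x7ff8000000000000ULL);
      printStaticDoubleConstant("FPConstant_Inf", 0x7ff0000000000000ULL);
      return 0;
    }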
are beastly ConstantPointerRefs in the way...
llvm-svn: 11883
MRegisterInfo::is{Physical,Virtual}Register. Apply appropriate fixes
to relevant files.
llvm-svn: 11882
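A minimal sketch of the predicate pair being standardized on here, assuming the era's register-numbering scheme (physical registers in a low range, virtual registers from a fixed boundary upward; the 1024 boundary below is an illustrative assumption standing in for the real constant on MRegisterInfo):

    // Sketch only; the real predicates live on MRegisterInfo.
    static const unsigned FirstVirtualRegister = 1024;  // assumed boundary

    static bool isPhysicalRegister(unsigned Reg) {
      return Reg > 0 && Reg < FirstVirtualRegister;  // 0 means "no register"
    }

    static bool isVirtualRegister(unsigned Reg) {
      return Reg >= FirstVirtualRegister;
    }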
multiple type names for the same structural type. Make DTE eliminate all
but one of the type names.
llvm-svn: 11879
llvm-svn: 11875
llvm-svn: 11872
having the compiler emit RTTI and vtables to EVERY translation unit.
llvm-svn: 11871
if (X == 0 || X == 2)
...where the comparisons and branches are in different blocks... into a switch
instruction. This comes up a lot in various programs, and works well with
the switch/switch merging code I checked in earlier. For example, this testcase:
int switchtest(int C) {
  return C == 0 ? f(123) :
         C == 1 ? f(3123) :
         C == 4 ? f(312) :
         C == 5 ? f(1234) : f(444);
}
is converted into this:
switch int %C, label %cond_false.3 [
        int 0, label %cond_true.0
        int 1, label %cond_true.1
        int 4, label %cond_true.2
        int 5, label %cond_true.3
]
instead of a whole bunch of conditional branches.
Admittedly the code is ugly, and incomplete. To be complete, we need to add
br -> switch merging and switch -> br merging. For example, this testcase:
struct foo { int Q, R, Z; };
#define A (X->Q+X->R * 123)
int test(struct foo *X) {
  return A == 123   ? X1() :
         A == 12321 ? X2() :
         (A == 111 || A == 222) ? X3() :
         A == 875   ? X4() : X5();
}
Gets compiled to this:
switch int %tmp.7, label %cond_false.2 [
        int 123, label %cond_true.0
        int 12321, label %cond_true.1
        int 111, label %cond_true.2
        int 222, label %cond_true.2
]
...
cond_false.2:           ; preds = %entry
        %tmp.52 = seteq int %tmp.7, 875         ; <bool> [#uses=1]
        br bool %tmp.52, label %cond_true.3, label %cond_false.3
where the branch could be folded into the switch.
This kind of thing occurs *ALL OF THE TIME*, especially in programs like
176.gcc, which is a horrible mess of code. It contains stuff like *shudder*:
#define SWITCH_TAKES_ARG(CHAR) \
  (   (CHAR) == 'D' \
   || (CHAR) == 'U' \
   || (CHAR) == 'o' \
   || (CHAR) == 'e' \
   || (CHAR) == 'u' \
   || (CHAR) == 'I' \
   || (CHAR) == 'm' \
   || (CHAR) == 'L' \
   || (CHAR) == 'A' \
   || (CHAR) == 'h' \
   || (CHAR) == 'z')
and
#define CONST_OK_FOR_LETTER_P(VALUE, C) \
  ((C) == 'I' ? SMALL_INTVAL (VALUE) \
   : (C) == 'J' ? SMALL_INTVAL (-(VALUE)) \
   : (C) == 'K' ? (unsigned)(VALUE) < 32 \
   : (C) == 'L' ? ((VALUE) & 0xffff) == 0 \
   : (C) == 'M' ? integer_ok_for_set (VALUE) \
   : (C) == 'N' ? (VALUE) < 0 \
   : (C) == 'O' ? (VALUE) == 0 \
   : (C) == 'P' ? (VALUE) >= 0 \
   : 0)
and
#define LEGITIMIZE_ADDRESS(X,OLDX,MODE,WIN) \
{ \
  if (GET_CODE (X) == PLUS && CONSTANT_ADDRESS_P (XEXP (X, 1))) \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 0), \
                   copy_to_mode_reg (SImode, XEXP (X, 1))); \
  if (GET_CODE (X) == PLUS && CONSTANT_ADDRESS_P (XEXP (X, 0))) \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 1), \
                   copy_to_mode_reg (SImode, XEXP (X, 0))); \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 0)) == MULT) \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 1), \
                   force_operand (XEXP (X, 0), 0)); \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 1)) == MULT) \
    (X) = gen_rtx (PLUS, SImode, XEXP (X, 0), \
                   force_operand (XEXP (X, 1), 0)); \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 0)) == PLUS) \
    (X) = gen_rtx (PLUS, Pmode, force_operand (XEXP (X, 0), NULL_RTX), \
                   XEXP (X, 1)); \
  if (GET_CODE (X) == PLUS && GET_CODE (XEXP (X, 1)) == PLUS) \
    (X) = gen_rtx (PLUS, Pmode, XEXP (X, 0), \
                   force_operand (XEXP (X, 1), NULL_RTX)); \
  if (GET_CODE (X) == SYMBOL_REF || GET_CODE (X) == CONST \
      || GET_CODE (X) == LABEL_REF) \
    (X) = legitimize_address (flag_pic, X, 0, 0); \
  if (memory_address_p (MODE, X)) \
    goto WIN; }
and others. These macros get used multiple times, of course. These are such
lovely candidates for macros, aren't they? :)
This code also nicely handles LLVM constructs that look like this:
if (isa<CastInst>(I))
  ...
else if (isa<BranchInst>(I))
  ...
else if (isa<SetCondInst>(I))
  ...
else if (isa<UnwindInst>(I))
  ...
else if (isa<VAArgInst>(I))
  ...
where the isa can obviously be a dyn_cast as well. Switch instructions are a
good thing.
llvm-svn: 11870
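As a source-level sketch of the transformation (f is a stand-in extern function; the pass itself rewrites the CFG, not C source):

    int f(int);

    // Before: a chain of equality tests on one value, each comparison
    // and branch in its own basic block.
    int before(int C) {
      return C == 0 ? f(123)  :
             C == 1 ? f(3123) :
             C == 4 ? f(312)  :
             C == 5 ? f(1234) : f(444);
    }

    // After: the chain collapses into one multi-way dispatch, the
    // source-level equivalent of the switch instruction built above.
    int after(int C) {
      switch (C) {
      case 0:  return f(123);
      case 1:  return f(3123);
      case 4:  return f(312);
      case 5:  return f(1234);
      default: return f(444);
      }
    }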
llvm-svn: 11868
llvm-svn: 11864
not have any globals.
llvm-svn: 11863
llvm-svn: 11862
bugs. Thanks Brian!
llvm-svn: 11859
llvm-svn: 11858
1. Functions do not make things incomplete, only variables.
2. Constant global variables no longer need to be marked incomplete, because
   we are guaranteed that the initializer for the global will be in the
   graph we are hacking on now.
This makes resolution of indirect calls happen a lot more in the bottom-up
(BU) pass, and supports things like vtables and their C counterparts
(giant constant arrays of function pointers), etc...
Testcase here: test/Regression/Analysis/DSGraph/constant_globals.ll
llvm-svn: 11852
local graph that uses the global.
llvm-svn: 11850
MRegisterInfo::is{Physical,Virtual}Register.
llvm-svn: 11849
Make the incompleteness marker faster by looping directly over the globals
instead of over the scalars to find the globals.
Fix a bug where we didn't mark a global incomplete if it didn't have any
outgoing edges. This wouldn't break any current clients, but is still wrong.
llvm-svn: 11848
llvm-svn: 11847
llvm-svn: 11846
pair, and look up varargs in the execution stack every time, instead of
just pushing iterators (which can be invalidated during callFunction())
around. (union GenericValue now has a "pair of uints" member, to support
this mechanism.) Fixes Bug 234.
llvm-svn: 11845
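A minimal sketch of the representation behind this fix, with stand-in types (Frame and ExecutionStack below are illustrative, not the interpreter's actual classes): a va_list value is a pair of indices that is re-resolved on every access, so reallocation during callFunction() cannot leave it dangling the way a cached iterator could.

    #include <cassert>
    #include <vector>

    struct GenericValue {
      unsigned UIntPair[2];  // {call-frame index, vararg index}
    };

    struct Frame {
      std::vector<GenericValue> VarArgs;  // may reallocate as calls happen
    };

    static GenericValue lookupVarArg(const std::vector<Frame> &ExecutionStack,
                                     const GenericValue &VAList) {
      unsigned FrameIdx = VAList.UIntPair[0];
      unsigned ArgIdx   = VAList.UIntPair[1];
      assert(FrameIdx < ExecutionStack.size() &&
             ArgIdx < ExecutionStack[FrameIdx].VarArgs.size());
      // Indices stay valid across reallocation; a stored iterator would not.
      return ExecutionStack[FrameIdx].VarArgs[ArgIdx];
    }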
llvm-svn: 11844
llvm-svn: 11841
to objects.
llvm-svn: 11840
assume that if they don't intend to write to a global variable, they would
mark it as constant. However, there are people who don't understand that
the compiler can do nice things for them if they give it the information
it needs.
This pass looks for blatantly obvious globals that are only ever read from.
Though it uses a trivially simple "alias analysis" of sorts, it is still able
to do amazing things to important benchmarks. 253.perlbmk, for example,
contains several ***GIANT*** function pointer tables that are not marked
constant and should be. Marking them constant allows the optimizer to turn
a whole bunch of indirect calls into direct calls. Note that only a link-time
optimizer can do this transformation, but perlbmk does have several strings
and other minor globals that can be marked constant by this pass when run
from GCCAS.
176.gcc has a ton of strings and large tables that are marked constant, both
at compile time (38 of them) and at link time (48 more). Other benchmarks
give similar results, though it seems like big ones have disproportionately
more than small ones.
This pass is extremely quick and does good things. I'm going to enable it
in gccas & gccld. Not bad for 50 SLOC.
llvm-svn: 11836
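A minimal sketch of the pass's core test, using stand-in types rather than LLVM's (Global and Use below are illustrative): a global that has an initializer and whose uses are all plain reads can safely be flipped to constant.

    #include <vector>

    struct Use { bool IsRead; };    // anything else: a store or escaping use
    struct Global {
      bool HasInitializer;          // without one, another unit may write it
      bool IsConstant;
      std::vector<Use> Uses;
    };

    static void markObviousConstants(std::vector<Global> &Globals) {
      for (Global &G : Globals) {
        if (G.IsConstant || !G.HasInitializer)
          continue;
        bool OnlyRead = true;
        for (const Use &U : G.Uses)
          if (!U.IsRead) { OnlyRead = false; break; }
        if (OnlyRead)
          G.IsConstant = true;      // later passes can now fold loads from G
      }
    }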
llvm-svn: 11835
llvm-svn: 11834
llvm-svn: 11833
llvm-svn: 11832
llvm-svn: 11830
where there did not use to be any before.
llvm-svn: 11829
llvm-svn: 11828
options and Makefiles)
llvm-svn: 11827
llvm-svn: 11826
llvm-svn: 11824
llvm-svn: 11821
into X86 scaled indexes. This allows us to compile GEPs like this:
int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}
Into a single address computation:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret
Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret
This is useful for things like int/float/double arrays, as the indexing can be
folded into the loads & stores, reducing register pressure and easing pressure
on the decode unit. With these changes, I expect our performance on 256.bzip2
and gzip to improve a lot. On bzip2, for example, we go from this:
10665 asm-printer            - Number of machine instrs printed
   40 ra-local               - Number of loads/stores folded into instructions
 1708 ra-local               - Number of loads added
 1532 ra-local               - Number of stores added
 1354 twoaddressinstruction  - Number of instructions added
 1354 twoaddressinstruction  - Number of two-address instructions
 2794 x86-peephole           - Number of peephole optimization performed
to this:
 9873 asm-printer            - Number of machine instrs printed
   41 ra-local               - Number of loads/stores folded into instructions
 1710 ra-local               - Number of loads added
 1521 ra-local               - Number of stores added
  789 twoaddressinstruction  - Number of instructions added
  789 twoaddressinstruction  - Number of two-address instructions
 2142 x86-peephole           - Number of peephole optimization performed
... and these types of instructions are often in tight loops.
Linear scan is also helped, but not as much. It goes from:
8787 asm-printer            - Number of machine instrs printed
2389 liveintervals          - Number of identity moves eliminated after coalescing
2288 liveintervals          - Number of interval joins performed
3522 liveintervals          - Number of intervals after coalescing
5810 liveintervals          - Number of original intervals
 700 spiller                - Number of loads added
 487 spiller                - Number of stores added
 303 spiller                - Number of register spills
1354 twoaddressinstruction  - Number of instructions added
1354 twoaddressinstruction  - Number of two-address instructions
 363 x86-peephole           - Number of peephole optimization performed
to:
7982 asm-printer            - Number of machine instrs printed
1759 liveintervals          - Number of identity moves eliminated after coalescing
1658 liveintervals          - Number of interval joins performed
3282 liveintervals          - Number of intervals after coalescing
4940 liveintervals          - Number of original intervals
 635 spiller                - Number of loads added
 452 spiller                - Number of stores added
 288 spiller                - Number of register spills
 789 twoaddressinstruction  - Number of instructions added
 789 twoaddressinstruction  - Number of two-address instructions
 258 x86-peephole           - Number of peephole optimization performed
Though I'm not complaining about the drop in the number of intervals. :)
llvm-svn: 11820
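To make the lea explicit: the X86 addressing mode being targeted computes base + index*scale + displacement in a single instruction, with scale restricted to 1, 2, 4, or 8. A sketch with the constants from the example above (8-byte element stride, +4 nested-field offset):

    #include <cstdint>

    // What "lea %EAX, [%EAX + 8*%ECX + 4]" computes in one step, versus
    // the older shl/add/lea sequence.
    static uintptr_t effectiveAddress(uintptr_t Base, uintptr_t Index) {
      const uintptr_t Scale = 8;  // sizeof one { int, { int } } element
      const uintptr_t Disp  = 4;  // offset of the nested field
      return Base + Index * Scale + Disp;
    }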
MachineInstr to do analysis.
*** FOLD getelementptr instructions into loads and stores when possible,
making use of some of the crazy X86 addressing modes.
For example, the following C++ program fragment:
struct complex {
  double re, im;
  complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
  return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
  return arg + complex(1,0);
}
Used to be compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %ECX
        fld QWORD PTR [%EDX]
        fld1
        faddp %ST(1)
***     add %ECX, 8
        fld QWORD PTR [%ECX]
        fldz
        faddp %ST(1)
***     mov %ECX, %EAX
        fxch %ST(1)
        fstp QWORD PTR [%ECX]
***     add %EAX, 8
        fstp QWORD PTR [%EAX]
        ret
Now it is compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        fld QWORD PTR [%ECX]
        fld1
        faddp %ST(1)
        fld QWORD PTR [%ECX + 8]
        fldz
        faddp %ST(1)
        fxch %ST(1)
        fstp QWORD PTR [%EAX]
        fstp QWORD PTR [%EAX + 8]
        ret
Other programs should see similar improvements across the board. Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86. :)
llvm-svn: 11819
llvm-svn: 11818