| Commit message (Collapse) | Author | Age | Files | Lines | |
|---|---|---|---|---|---|
| * | Remove an unreachable 'break' following a 'return'. | Craig Topper | 2013-04-22 | 1 | -1/+0 |
| | | | | | llvm-svn: 179991 | ||||
| * | Legalize vector truncates by parts rather than just splitting. | Jim Grosbach | 2013-04-21 | 5 | -38/+81 |
| | | | | | | Rather than just splitting the input type and hoping for the best, apply a bit more cleverness. Just splitting the types until the source is legal often leads to an illegal result type, which is then widened, and a scalarization step is introduced, which leads to truly horrible code generation. With the loop vectorizer, these sorts of operations are much more common, and so it's worth extra effort to do them well. Add a legalization hook for the operands of a TRUNCATE node, which will be encountered after the result type has been legalized but while the operand type is still illegal. If simple splitting of both types ends up with the result type of each half still being legal, just do that (v16i16 -> v16i8 on ARM, for example). If, however, that would result in an illegal result type (v8i32 -> v8i8 on ARM, for example), we can get more clever with power-of-two vectors. Specifically, split the input type, but also widen the result element size, then concatenate the halves and truncate again. For example, on ARM, to perform "%res = v8i8 trunc v8i32 %in" we transform to: %inlo = v4i32 extract_subvector %in, 0 %inhi = v4i32 extract_subvector %in, 4 %lo16 = v4i16 trunc v4i32 %inlo %hi16 = v4i16 trunc v4i32 %inhi %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16 %res = v8i8 trunc v8i16 %in16 This allows instruction selection to generate three VMOVN instructions instead of a sequence of moves, stores and loads. Update the ARMTargetTransformInfo to take this improved legalization into account. | ||||
| | | | | | | Consider the simplified IR: define <16 x i8> @test1(<16 x i32>* %ap) { %a = load <16 x i32>* %ap %tmp = trunc <16 x i32> %a to <16 x i8> ret <16 x i8> %tmp } define <8 x i8> @test2(<8 x i32>* %ap) { %a = load <8 x i32>* %ap %tmp = trunc <8 x i32> %a to <8 x i8> ret <8 x i8> %tmp } | ||||
| | | | | | | Previously, we would generate the truly hideous: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: push {r7} mov r7, sp sub sp, sp, #20 bic sp, sp, #7 add r1, r0, #48 add r2, r0, #32 vld1.64 {d24, d25}, [r0:128] vld1.64 {d16, d17}, [r1:128] vld1.64 {d18, d19}, [r2:128] add r1, r0, #16 vmovn.i32 d22, q8 vld1.64 {d16, d17}, [r1:128] vmovn.i32 d20, q9 vmovn.i32 d18, q12 vmov.u16 r0, d22[3] strb r0, [sp, #15] vmov.u16 r0, d22[2] strb r0, [sp, #14] vmov.u16 r0, d22[1] strb r0, [sp, #13] vmov.u16 r0, d22[0] vmovn.i32 d16, q8 strb r0, [sp, #12] vmov.u16 r0, d20[3] strb r0, [sp, #11] vmov.u16 r0, d20[2] strb r0, [sp, #10] vmov.u16 r0, d20[1] strb r0, [sp, #9] vmov.u16 r0, d20[0] strb r0, [sp, #8] vmov.u16 r0, d18[3] strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] vldmia sp, {d16, d17} vmov r0, r1, d16 vmov r2, r3, d17 mov sp, r7 pop {r7} bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: push {r7} mov r7, sp sub sp, sp, #12 bic sp, sp, #7 vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d20, d21}, [r0:128] vmovn.i32 d18, q8 vmov.u16 r0, d18[3] vmovn.i32 d16, q10 strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] ldm sp, {r0, r1} mov sp, r7 pop {r7} bx lr | ||||
| | | | | | | Now, however, we generate the much more straightforward: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: add r1, r0, #48 add r2, r0, #32 vld1.64 {d20, d21}, [r0:128] vld1.64 {d16, d17}, [r1:128] add r1, r0, #16 vld1.64 {d18, d19}, [r2:128] vld1.64 {d22, d23}, [r1:128] vmovn.i32 d17, q8 vmovn.i32 d16, q9 vmovn.i32 d18, q10 vmovn.i32 d19, q11 vmovn.i16 d17, q8 vmovn.i16 d16, q9 vmov r0, r1, d16 vmov r2, r3, d17 bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d18, d19}, [r0:128] vmovn.i32 d16, q8 vmovn.i32 d17, q9 vmovn.i16 d16, q8 vmov r0, r1, d16 bx lr llvm-svn: 179989 | ||||
| * | ARM: Split out cost model vcvt testcases. | Jim Grosbach | 2013-04-21 | 2 | -172/+171 |
| | | | | | | | They had a separate RUN line already, so may as well be in a separate file. llvm-svn: 179988 | ||||
| * | Passing arguments to varags functions under the SPARC v9 ABI. | Jakob Stoklund Olesen | 2013-04-21 | 2 | -0/+60 |
| | | | | | | | | Arguments after the fixed arguments never use the floating point registers. llvm-svn: 179987 | ||||
| * | Tidy up comment grammar. | Jim Grosbach | 2013-04-21 | 1 | -2/+2 |
| | | | | | llvm-svn: 179986 | ||||
| * | Fix the SETHIimm pattern for 64-bit code. | Jakob Stoklund Olesen | 2013-04-21 | 2 | -2/+7 |
| | | | | | | | Don't ignore the high 32 bits of the immediate. llvm-svn: 179985 | ||||
| * | Remove unused, undefined ArgFlagsTy::getArgFlagsString; add a comment about ↵ | Stephen Lin | 2013-04-21 | 1 | -5/+2 |
| | | | | | | | 'returned' llvm-svn: 179983 | ||||
| * | SROA: Don't crash on a select with two identical operands. | Benjamin Kramer | 2013-04-21 | 2 | -8/+19 |
| | | | | | | | | This is an edge case that can happen if we modify a chain of multiple selects. Update all operands in that case and remove the assert. PR15805. llvm-svn: 179982 | ||||
| * | Revert "SimplifyCFG: If convert single conditional stores" | Arnold Schwaighofer | 2013-04-21 | 2 | -171/+4 |
| | | | | | | There is the temptation to make this transform dependent on target information, as it is not going to be beneficial on all (sub)targets. Therefore, we should probably do this in MI Early-Ifconversion. This reverts commit r179957. Original commit message: "SimplifyCFG: If convert single conditional stores This transformation will transform a conditional store with a preceding unconditional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outweigh the potential case where the branch would be correctly predicted and the cost of executing the second store would be noticeably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2% performance improvement on an ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: this optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducible below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the buildbots and will revert this if anything unexpected comes up." llvm-svn: 179980 | ||||
| * | ARM: fix part of test which actually needed an asserts build | Tim Northover | 2013-04-21 | 2 | -6/+30 |
| | | | | | | | This should fix a buildbot failure that occurred after r179977. llvm-svn: 179978 | ||||
| * | ARM: Use ldrd/strd to spill 64-bit pairs when available. | Tim Northover | 2013-04-21 | 4 | -50/+133 |
| | | | | | | | | This allows common sp-offsets to be part of the instruction and is probably faster on modern CPUs too. llvm-svn: 179977 | ||||
| * | Remove the executable bit on cmake files | Sylvestre Ledru | 2013-04-21 | 2 | -0/+0 |
| | | | | | llvm-svn: 179976 | ||||
| * | SLPVectorize: Add support for vectorization of casts. | Nadav Rotem | 2013-04-21 | 2 | -0/+107 |
| | | | | | llvm-svn: 179975 | ||||
| * | SLPVectorizer: Fix a bug in the code that scans the tree in search of nodes ↵ | Nadav Rotem | 2013-04-21 | 1 | -0/+1 |
| | | | | | | | | | with multiple users. We did not terminate the switch case and we executed the search routine twice. llvm-svn: 179974 | ||||
| * | [objc-arc] Cleaned up tail-call-invariant-enforcement.ll. | Michael Gottesman | 2013-04-21 | 1 | -25/+40 |
| | | | | | | | | | | | | | Specifically: 1. Added checks that unwind is being properly added to various instructions. 2. Fixed the declaration/calling of objc_release to have a return type of void. 3. Moved all checks to precede the functions and added checks to ensure that the checks would only match inside the specific function that we are attempting to check. llvm-svn: 179973 | ||||
| * | [objc-arc] Check that objc-arc-expand properly handles all strictly ↵ | Michael Gottesman | 2013-04-21 | 1 | -5/+71 |
| | | | | | | | forwarding calls and does not touch calls which are not strictly forwarding (i.e. objc_retainBlock). llvm-svn: 179972 | ||||
| * | [objc-arc] Renamed the test file ↵ | Michael Gottesman | 2013-04-21 | 1 | -0/+0 |
| | | | | | | | clang-arc-used-intrinsic-removed-if-isolated.ll -> intrinsic-use-isolated.ll to match the other test file intrinsic-use.ll. llvm-svn: 179971 | ||||
| * | Remove tbaa metadata. | Bill Wendling | 2013-04-21 | 1 | -7/+3 |
| | | | | | llvm-svn: 179970 | ||||
| * | When we strength reduce an objc_retainBlock call to objc_retain, increment ↵ | Michael Gottesman | 2013-04-21 | 1 | -1/+6 |
| | | | | | | | NumPeeps and make sure that Changed is set to true. llvm-svn: 179968 | ||||
| * | Fixed comment typo. | Michael Gottesman | 2013-04-21 | 1 | -1/+1 |
| | | | | | llvm-svn: 179967 | ||||
| * | [objc-arc] Fixed typo in debug message. | Michael Gottesman | 2013-04-21 | 1 | -1/+1 |
| | | | | | llvm-svn: 179966 | ||||
| * | [objc-arc] Fixed comment typo. | Michael Gottesman | 2013-04-21 | 1 | -1/+1 |
| | | | | | llvm-svn: 179965 | ||||
| * | [objc-arc] Refactored OptimizeReturns so that it uses continue instead of a ↵ | Michael Gottesman | 2013-04-21 | 1 | -25/+30 |
| | | | | | | | large multi-level nested if statement. llvm-svn: 179964 | ||||
| * | [objc-arc] Added debug statement saying when we are resetting a sequence's ↵ | Michael Gottesman | 2013-04-20 | 1 | -0/+1 |
| | | | | | | | | | | progress. This will make it clearer when we are actually resetting a sequence's progress vs. just changing state. This is an important distinction because the former case clears any pointers that we are tracking while the latter does not. llvm-svn: 179963 | ||||
| * | Compile varargs functions for SPARCv9. | Jakob Stoklund Olesen | 2013-04-20 | 2 | -31/+119 |
| | | | | | | | | | | | | | With a little help from the frontend, it looks like the standard va_* intrinsics can do the job. Also clean up an old bitcast hack in LowerVAARG that dealt with unaligned double loads. Load SDNodes can specify an alignment now. Still missing: Calling varargs functions with float arguments. llvm-svn: 179961 | ||||
| * | Fix PR15800. Do not try to vectorize vectors and structs. | Nadav Rotem | 2013-04-20 | 2 | -1/+24 |
| | | | | | llvm-svn: 179960 | ||||
| * | SimplifyCFG: If convert single conditional stores | Arnold Schwaighofer | 2013-04-20 | 2 | -4/+171 |
| | | | | | | This transformation will transform a conditional store with a preceding unconditional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outweigh the potential case where the branch would be correctly predicted and the cost of executing the second store would be noticeably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2% performance improvement on an ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: this optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducible below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the builtbots and will revert this if anything unexpected comes up. llvm-svn: 179957 | ||||
| * | ARM: don't add FrameIndex offset for LDMIA (has no immediate) | Tim Northover | 2013-04-20 | 2 | -1/+37 |
| | | | | | | | | | | | | | Previously, when spilling 64-bit paired registers, an LDMIA with both a FrameIndex and an offset was produced. This kind of instruction shouldn't exist, and the extra operand was being confused with the predicate, causing aborts later on. This removes the invalid 0-offset from the instruction being produced. llvm-svn: 179956 | ||||
| * | recommit tests | Nuno Lopes | 2013-04-20 | 1 | -0/+20 |
| | | | | | llvm-svn: 179955 | ||||
| * | Minor renaming of tests (for consistency with an in-development patch) | Stephen Lin | 2013-04-20 | 1 | -10/+10 |
| | | | | | llvm-svn: 179954 | ||||
| * | AArch64: remove useless comment | Tim Northover | 2013-04-20 | 1 | -2/+0 |
| | | | | | llvm-svn: 179952 | ||||
| * | Move 'kw_align' case to proper section, reorganize function attribute ↵ | Stephen Lin | 2013-04-20 | 1 | -12/+25 |
| | | | | | | | keyword case statements to be consistent with r179119 llvm-svn: 179948 | ||||
| * | Remove unused ShouldFoldAtomicFences flag. | Tim Northover | 2013-04-20 | 5 | -32/+0 |
| | | | | | | | | | I think it's almost impossible to fold atomic fences profitably under LLVM/C++11 semantics. As a result, this is now unused and just cluttering up the target interface. llvm-svn: 179940 | ||||
| * | Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE. | Tim Northover | 2013-04-20 | 18 | -195/+2 |
| | | | | | llvm-svn: 179939 | ||||
| * | Remove dead code. | Rafael Espindola | 2013-04-20 | 1 | -4/+0 |
| | | | | | | | | | | This is part of a future patch to use yamlio that incorrectly ended up in a cleanup patch. Thanks to Benjamin Kramer for reporting it. llvm-svn: 179938 | ||||
| * | Don't litter .s files in test directory. | Benjamin Kramer | 2013-04-20 | 1 | -1/+1 |
| | | | | | llvm-svn: 179937 | ||||
| * | VecUtils: Clean up uses of dyn_cast. | Benjamin Kramer | 2013-04-20 | 1 | -4/+4 |
| | | | | | llvm-svn: 179936 | ||||
| * | SLPVectorizer: Strength reduce SmallVectors to ArrayRefs. | Benjamin Kramer | 2013-04-20 | 3 | -30/+28 |
| | | | | | | | Avoids a couple of copies and allows more flexibility in the clients. llvm-svn: 179935 | ||||
| * | SLPVectorizer: Reduce the compile time by eliminating the search for some of ↵ | Nadav Rotem | 2013-04-20 | 1 | -1/+1 |
| | | | | | | the more expensive patterns. After this change we will only check basic arithmetic trees that start at cmpinstr. llvm-svn: 179933 | ||||
| * | refactor tryToVectorizePair to a new method that supports vectorization of ↵ | Nadav Rotem | 2013-04-20 | 1 | -0/+8 |
| | | | | | | | lists. llvm-svn: 179932 | ||||
| * | Fix an unused variable warning. | Nadav Rotem | 2013-04-20 | 1 | -0/+1 |
| | | | | | llvm-svn: 179931 | ||||
| * | SLPVectorizer: Improve the cost model for loop invariant broadcast values. | Nadav Rotem | 2013-04-20 | 4 | -11/+101 |
| | | | | | llvm-svn: 179930 | ||||
| * | Report the number of stores that were found in the debug message. | Nadav Rotem | 2013-04-20 | 1 | -6/+8 |
| | | | | | llvm-svn: 179929 | ||||
| * | Fix the header comment. | Nadav Rotem | 2013-04-20 | 2 | -2/+2 |
| | | | | | llvm-svn: 179928 | ||||
| * | Use 64bit arithmetic for calculating distance between pointers. | Nadav Rotem | 2013-04-20 | 1 | -2/+2 |
| | | | | | llvm-svn: 179927 | ||||
| * | Move PPC getSwappedPredicate for reuse | Hal Finkel | 2013-04-20 | 3 | -17/+20 |
| | | | | | | | | | | | | The getSwappedPredicate function can be used in other places (such as in improvements to the PPCCTRLoops pass). Instead of trapping it as a static function in PPCInstrInfo, move it into PPCPredicates with other predicate-related things. No functionality change intended. llvm-svn: 179926 | ||||
| * | Add CodeGen support for functions that always return arguments via a new ↵ | Stephen Lin | 2013-04-20 | 20 | -43/+371 |
| | | | | | | | parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter). llvm-svn: 179925 | ||||
| * | Allow tail call opportunity detection through nested and/or multiple ↵ | Stephen Lin | 2013-04-20 | 2 | -73/+214 |
| | | | | | | | iterations of extractelement/insertelement indirection llvm-svn: 179924 | ||||
| * | These can be void. | Rafael Espindola | 2013-04-20 | 1 | -12/+7 |
| | | | | | llvm-svn: 179923 | ||||
| * | Rename obj2yaml local namespace to avoid conflicts with llvm::yaml. | Rafael Espindola | 2013-04-20 | 3 | -7/+7 |
| | | | | | llvm-svn: 179922 | ||||
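The split-and-widen truncate legalization from r179989 above can be sketched outside of LLVM as well. The following is a minimal Python model (purely illustrative, not LLVM code) of the lane semantics: a v8i32 -> v8i8 truncate is performed by splitting the input, truncating each half to i16, concatenating, and truncating once more to i8. Masking with `(1 << bits) - 1` models the bit-discarding semantics of a vector `trunc`.

```python
def trunc(vec, bits):
    """Truncate each lane to 'bits' bits (keep only the low bits)."""
    mask = (1 << bits) - 1
    return [x & mask for x in vec]

def legalized_trunc_v8i32_to_v8i8(v):
    """Two-step truncate, mirroring the r179989 expansion."""
    assert len(v) == 8
    inlo, inhi = v[:4], v[4:]   # extract_subvector %in, 0 / %in, 4
    lo16 = trunc(inlo, 16)      # v4i16 trunc v4i32 %inlo
    hi16 = trunc(inhi, 16)      # v4i16 trunc v4i32 %inhi
    in16 = lo16 + hi16          # v8i16 concat_vectors %lo16, %hi16
    return trunc(in16, 8)       # v8i8 trunc v8i16 %in16

# The two-step form is equivalent to a direct truncate for every input,
# because truncating to 16 bits and then to 8 keeps the same low 8 bits.
v = [0x12345678, 0xDEADBEEF, 1, 255, 256, 0xFFFFFFFF, 42, 0x1FF]
assert legalized_trunc_v8i32_to_v8i8(v) == trunc(v, 8)
```

Each of the three intermediate truncates maps to a single VMOVN on ARM, which is why the expansion selects to three narrowing instructions instead of the scalarized store/load sequence shown above.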

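The (since-reverted) SimplifyCFG transform from r179957 above can be illustrated with a small Python sketch. This is not the pass itself, just a model of the before/after semantics it relies on: an unconditional store followed by a conditional store to the same location is equivalent to one unconditional store of a select.

```python
def before(a, i, cond, x, y):
    """Original shape: unconditional store, then conditional store."""
    a[i] = x
    if cond:          # mispredictable branch
        a[i] = y

def after(a, i, cond, x, y):
    """If-converted shape: select, then a single unconditional store."""
    tmp = y if cond else x   # tmp = cond ? Y : X
    a[i] = tmp

# The two forms leave memory in the same state for either branch outcome.
for cond in (True, False):
    a1, a2 = [0] * 4, [0] * 4
    before(a1, 2, cond, 10, 20)
    after(a2, 2, cond, 10, 20)
    assert a1 == a2
```

The trade-off discussed in the commit message is exactly this: `after` always performs the store, betting that a second store to a hot location is cheaper on average than a mispredicted branch in `before`.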

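The idea behind the getSwappedPredicate helper moved in r179926 can be sketched in Python. The mapping below is illustrative (generic comparison names, not the actual PPC::Predicate enumeration): when the operands of a comparison are exchanged, the predicate must be replaced by its swapped counterpart, with LT/GT and LE/GE trading places while EQ and NE are symmetric.

```python
import operator

# Illustrative swapped-predicate table (not the PPC::Predicate values).
SWAPPED = {"lt": "gt", "gt": "lt", "le": "ge", "ge": "le",
           "eq": "eq", "ne": "ne"}
OPS = {"lt": operator.lt, "gt": operator.gt, "le": operator.le,
       "ge": operator.ge, "eq": operator.eq, "ne": operator.ne}

def get_swapped_predicate(pred):
    """Return the predicate that gives the same result with swapped operands."""
    return SWAPPED[pred]

# cmp(a, b, pred) == cmp(b, a, swapped(pred)) for every predicate and input.
for pred, op in OPS.items():
    for a in range(-2, 3):
        for b in range(-2, 3):
            assert op(a, b) == OPS[get_swapped_predicate(pred)](b, a)
```

This invariant is what lets a pass such as PPCCTRLoops canonicalize a comparison by reordering its operands without changing the branch's meaning.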