| Commit message (Collapse) | Author | Age | Files | Lines | |
|---|---|---|---|---|---|
| * | Remove an unreachable 'break' following a 'return'. | Craig Topper | 2013-04-22 | 1 | -1/+0 |
| | | | | | llvm-svn: 179991 | ||||
| * | Legalize vector truncates by parts rather than just splitting. | Jim Grosbach | 2013-04-21 | 5 | -38/+81 |
| | | | | | | Rather than just splitting the input type and hoping for the best, apply a bit more cleverness. Just splitting the types until the source is legal often leads to an illegal result type, which is then widened, and a scalarization step is introduced, which leads to truly horrible code generation. With the loop vectorizer, these sorts of operations are much more common, and so it's worth extra effort to do them well. Add a legalization hook for the operands of a TRUNCATE node, which will be encountered after the result type has been legalized but while the operand type is still illegal. If simple splitting of both types ends up with the result type of each half still being legal, just do that (v16i16 -> v16i8 on ARM, for example). If, however, that would result in an illegal result type (v8i32 -> v8i8 on ARM, for example), we can get more clever with power-of-two vectors. Specifically, split the input type, but also widen the result element size, then concatenate the halves and truncate again. For example, on ARM, to perform "%res = v8i8 trunc v8i32 %in" we transform to: %inlo = v4i32 extract_subvector %in, 0 %inhi = v4i32 extract_subvector %in, 4 %lo16 = v4i16 trunc v4i32 %inlo %hi16 = v4i16 trunc v4i32 %inhi %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16 %res = v8i8 trunc v8i16 %in16 This allows instruction selection to generate three VMOVN instructions instead of a sequence of moves, stores and loads. Update the ARMTargetTransformInfo to take this improved legalization into account. | ||||
| | | | | | | Consider the simplified IR: define <16 x i8> @test1(<16 x i32>* %ap) { %a = load <16 x i32>* %ap %tmp = trunc <16 x i32> %a to <16 x i8> ret <16 x i8> %tmp } define <8 x i8> @test2(<8 x i32>* %ap) { %a = load <8 x i32>* %ap %tmp = trunc <8 x i32> %a to <8 x i8> ret <8 x i8> %tmp } | ||||
| | | | | | | Previously, we would generate the truly hideous: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: push {r7} mov r7, sp sub sp, sp, #20 bic sp, sp, #7 add r1, r0, #48 add r2, r0, #32 vld1.64 {d24, d25}, [r0:128] vld1.64 {d16, d17}, [r1:128] vld1.64 {d18, d19}, [r2:128] add r1, r0, #16 vmovn.i32 d22, q8 vld1.64 {d16, d17}, [r1:128] vmovn.i32 d20, q9 vmovn.i32 d18, q12 vmov.u16 r0, d22[3] strb r0, [sp, #15] vmov.u16 r0, d22[2] strb r0, [sp, #14] vmov.u16 r0, d22[1] strb r0, [sp, #13] vmov.u16 r0, d22[0] vmovn.i32 d16, q8 strb r0, [sp, #12] vmov.u16 r0, d20[3] strb r0, [sp, #11] vmov.u16 r0, d20[2] strb r0, [sp, #10] vmov.u16 r0, d20[1] strb r0, [sp, #9] vmov.u16 r0, d20[0] strb r0, [sp, #8] vmov.u16 r0, d18[3] strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] vldmia sp, {d16, d17} vmov r0, r1, d16 vmov r2, r3, d17 mov sp, r7 pop {r7} bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: push {r7} mov r7, sp sub sp, sp, #12 bic sp, sp, #7 vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d20, d21}, [r0:128] vmovn.i32 d18, q8 vmov.u16 r0, d18[3] vmovn.i32 d16, q10 strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] ldm sp, {r0, r1} mov sp, r7 pop {r7} bx lr | ||||
| | | | | | | Now, however, we generate the much more straightforward: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: add r1, r0, #48 add r2, r0, #32 vld1.64 {d20, d21}, [r0:128] vld1.64 {d16, d17}, [r1:128] add r1, r0, #16 vld1.64 {d18, d19}, [r2:128] vld1.64 {d22, d23}, [r1:128] vmovn.i32 d17, q8 vmovn.i32 d16, q9 vmovn.i32 d18, q10 vmovn.i32 d19, q11 vmovn.i16 d17, q8 vmovn.i16 d16, q9 vmov r0, r1, d16 vmov r2, r3, d17 bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d18, d19}, [r0:128] vmovn.i32 d16, q8 vmovn.i32 d17, q9 vmovn.i16 d16, q8 vmov r0, r1, d16 bx lr llvm-svn: 179989 | ||||
| * | ARM: Split out cost model vcvt testcases. | Jim Grosbach | 2013-04-21 | 2 | -172/+171 |
| | | | | | | | They had a separate RUN line already, so may as well be in a separate file. llvm-svn: 179988 | ||||
| * | Passing arguments to varags functions under the SPARC v9 ABI. | Jakob Stoklund Olesen | 2013-04-21 | 2 | -0/+60 |
| | | | | | | | | Arguments after the fixed arguments never use the floating point registers. llvm-svn: 179987 | ||||
| * | Tidy up comment grammar. | Jim Grosbach | 2013-04-21 | 1 | -2/+2 |
| | | | | | llvm-svn: 179986 | ||||
| * | Fix the SETHIimm pattern for 64-bit code. | Jakob Stoklund Olesen | 2013-04-21 | 2 | -2/+7 |
| | | | | | | | Don't ignore the high 32 bits of the immediate. llvm-svn: 179985 | ||||
| * | Remove unused, undefined ArgFlagsTy::getArgFlagsString; add a comment about ↵ | Stephen Lin | 2013-04-21 | 1 | -5/+2 |
| | | | | | | | 'returned' llvm-svn: 179983 | ||||
| * | SROA: Don't crash on a select with two identical operands. | Benjamin Kramer | 2013-04-21 | 2 | -8/+19 |
| | | | | | | | | This is an edge case that can happen if we modify a chain of multiple selects. Update all operands in that case and remove the assert. PR15805. llvm-svn: 179982 | ||||
| * | Revert "SimplifyCFG: If convert single conditional stores" | Arnold Schwaighofer | 2013-04-21 | 2 | -171/+4 |
| | | | | | | There is the temptation to make this transform dependent on target information, as it is not going to be beneficial on all (sub)targets. Therefore, we should probably do this in MI Early-Ifconversion. This reverts commit r179957. Original commit message: "SimplifyCFG: If convert single conditional stores This transformation will transform a conditional store with a preceding unconditional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outweigh the potential case where the branch would be correctly predicted and the cost of executing the second store would be noticeably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2% performance improvement on an ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: this optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducible below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the buildbots and will revert this if anything unexpected comes up." llvm-svn: 179980 | ||||
| * | ARM: fix part of test which actually needed an asserts build | Tim Northover | 2013-04-21 | 2 | -6/+30 |
| | | | | | | | This should fix a buildbot failure that occurred after r179977. llvm-svn: 179978 | ||||
| * | ARM: Use ldrd/strd to spill 64-bit pairs when available. | Tim Northover | 2013-04-21 | 4 | -50/+133 |
| | | | | | | | | This allows common sp-offsets to be part of the instruction and is probably faster on modern CPUs too. llvm-svn: 179977 | ||||
| * | Remove the executable bit on cmake files | Sylvestre Ledru | 2013-04-21 | 2 | -0/+0 |
| | | | | | llvm-svn: 179976 | ||||
| * | SLPVectorize: Add support for vectorization of casts. | Nadav Rotem | 2013-04-21 | 2 | -0/+107 |
| | | | | | llvm-svn: 179975 | ||||
| * | SLPVectorizer: Fix a bug in the code that scans the tree in search of nodes ↵ | Nadav Rotem | 2013-04-21 | 1 | -0/+1 |
| | | | | | | | | | with multiple users. We did not terminate the switch case and we executed the search routine twice. llvm-svn: 179974 | ||||
| * | [objc-arc] Cleaned up tail-call-invariant-enforcement.ll. | Michael Gottesman | 2013-04-21 | 1 | -25/+40 |
| | | | | | | | | | | | | | Specifically: 1. Added checks that unwind is being properly added to various instructions. 2. Fixed the declaration/calling of objc_release to have a return type of void. 3. Moved all checks to precede the functions and added checks to ensure that the checks would only match inside the specific function that we are attempting to check. llvm-svn: 179973 | ||||
| * | [objc-arc] Check that objc-arc-expand properly handles all strictly ↵ | Michael Gottesman | 2013-04-21 | 1 | -5/+71 |
| | | | | | | | forwarding calls and does not touch calls which are not strictly forwarding (i.e. objc_retainBlock). llvm-svn: 179972 | ||||
| * | [objc-arc] Renamed the test file ↵ | Michael Gottesman | 2013-04-21 | 1 | -0/+0 |
| | | | | | | | clang-arc-used-intrinsic-removed-if-isolated.ll -> intrinsic-use-isolated.ll to match the other test file intrinsic-use.ll. llvm-svn: 179971 | ||||
| * | Remove tbaa metadata. | Bill Wendling | 2013-04-21 | 1 | -7/+3 |
| | | | | | llvm-svn: 179970 | ||||
| * | When we strength reduce an objc_retainBlock call to objc_retain, increment ↵ | Michael Gottesman | 2013-04-21 | 1 | -1/+6 |
| | | | | | | | NumPeeps and make sure that Changed is set to true. llvm-svn: 179968 | ||||
| * | Fixed comment typo. | Michael Gottesman | 2013-04-21 | 1 | -1/+1 |
| | | | | | llvm-svn: 179967 | ||||
| * | [objc-arc] Fixed typo in debug message. | Michael Gottesman | 2013-04-21 | 1 | -1/+1 |
| | | | | | llvm-svn: 179966 | ||||
| * | [objc-arc] Fixed comment typo. | Michael Gottesman | 2013-04-21 | 1 | -1/+1 |
| | | | | | llvm-svn: 179965 | ||||
| * | [objc-arc] Refactored OptimizeReturns so that it uses continue instead of a ↵ | Michael Gottesman | 2013-04-21 | 1 | -25/+30 |
| | | | | | | | large multi-level nested if statement. llvm-svn: 179964 | ||||
| * | [objc-arc] Added debug statement saying when we are resetting a sequence's ↵ | Michael Gottesman | 2013-04-20 | 1 | -0/+1 |
| | | | | | | | | | | progress. This will make it clearer when we are actually resetting a sequence's progress vs. just changing state. This is an important distinction because the former case clears any pointers that we are tracking while the latter does not. llvm-svn: 179963 | ||||
| * | Compile varargs functions for SPARCv9. | Jakob Stoklund Olesen | 2013-04-20 | 2 | -31/+119 |
| | | | | | | | | | | | | | With a little help from the frontend, it looks like the standard va_* intrinsics can do the job. Also clean up an old bitcast hack in LowerVAARG that dealt with unaligned double loads. Load SDNodes can specify an alignment now. Still missing: Calling varargs functions with float arguments. llvm-svn: 179961 | ||||
| * | Fix PR15800. Do not try to vectorize vectors and structs. | Nadav Rotem | 2013-04-20 | 2 | -1/+24 |
| | | | | | llvm-svn: 179960 | ||||
| * | SimplifyCFG: If convert single conditional stores | Arnold Schwaighofer | 2013-04-20 | 2 | -4/+171 |
| | | | | | | This transformation will transform a conditional store with a preceding unconditional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outweigh the potential case where the branch would be correctly predicted and the cost of executing the second store would be noticeably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2% performance improvement on an ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: this optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducible below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. I am going to watch performance numbers across the builtbots and will revert this if anything unexpected comes up. llvm-svn: 179957 | ||||
| * | ARM: don't add FrameIndex offset for LDMIA (has no immediate) | Tim Northover | 2013-04-20 | 2 | -1/+37 |
| | | | | | | | | | | | | | Previously, when spilling 64-bit paired registers, an LDMIA with both a FrameIndex and an offset was produced. This kind of instruction shouldn't exist, and the extra operand was being confused with the predicate, causing aborts later on. This removes the invalid 0-offset from the instruction being produced. llvm-svn: 179956 | ||||
| * | recommit tests | Nuno Lopes | 2013-04-20 | 1 | -0/+20 |
| | | | | | llvm-svn: 179955 | ||||
| * | Minor renaming of tests (for consistency with an in-development patch) | Stephen Lin | 2013-04-20 | 1 | -10/+10 |
| | | | | | llvm-svn: 179954 | ||||
| * | AArch64: remove useless comment | Tim Northover | 2013-04-20 | 1 | -2/+0 |
| | | | | | llvm-svn: 179952 | ||||
| * | Move 'kw_align' case to proper section, reorganize function attribute ↵ | Stephen Lin | 2013-04-20 | 1 | -12/+25 |
| | | | | | | | keyword case statements to be consistent with r179119 llvm-svn: 179948 | ||||
| * | Remove unused ShouldFoldAtomicFences flag. | Tim Northover | 2013-04-20 | 5 | -32/+0 |
| | | | | | | | | | I think it's almost impossible to fold atomic fences profitably under LLVM/C++11 semantics. As a result, this is now unused and just cluttering up the target interface. llvm-svn: 179940 | ||||
| * | Remove unused MEMBARRIER DAG node; it's been replaced by ATOMIC_FENCE. | Tim Northover | 2013-04-20 | 18 | -195/+2 |
| | | | | | llvm-svn: 179939 | ||||
| * | Remove dead code. | Rafael Espindola | 2013-04-20 | 1 | -4/+0 |
| | | | | | | | | | | This is part of a future patch to use yamlio that incorrectly ended up in a cleanup patch. Thanks to Benjamin Kramer for reporting it. llvm-svn: 179938 | ||||
| * | Don't litter .s files in test directory. | Benjamin Kramer | 2013-04-20 | 1 | -1/+1 |
| | | | | | llvm-svn: 179937 | ||||
| * | VecUtils: Clean up uses of dyn_cast. | Benjamin Kramer | 2013-04-20 | 1 | -4/+4 |
| | | | | | llvm-svn: 179936 | ||||
| * | SLPVectorizer: Strength reduce SmallVectors to ArrayRefs. | Benjamin Kramer | 2013-04-20 | 3 | -30/+28 |
| | | | | | | | Avoids a couple of copies and allows more flexibility in the clients. llvm-svn: 179935 | ||||
| * | SLPVectorizer: Reduce the compile time by eliminating the search for some of ↵ | Nadav Rotem | 2013-04-20 | 1 | -1/+1 |
| | | | | | | the more expensive patterns. After this change we will only check basic arithmetic trees that start at cmpinstr. llvm-svn: 179933 | ||||
| * | refactor tryToVectorizePair to a new method that supports vectorization of ↵ | Nadav Rotem | 2013-04-20 | 1 | -0/+8 |
| | | | | | | | lists. llvm-svn: 179932 | ||||
| * | Fix an unused variable warning. | Nadav Rotem | 2013-04-20 | 1 | -0/+1 |
| | | | | | llvm-svn: 179931 | ||||
| * | SLPVectorizer: Improve the cost model for loop invariant broadcast values. | Nadav Rotem | 2013-04-20 | 4 | -11/+101 |
| | | | | | llvm-svn: 179930 | ||||
| * | Report the number of stores that were found in the debug message. | Nadav Rotem | 2013-04-20 | 1 | -6/+8 |
| | | | | | llvm-svn: 179929 | ||||
| * | Fix the header comment. | Nadav Rotem | 2013-04-20 | 2 | -2/+2 |
| | | | | | llvm-svn: 179928 | ||||
| * | Use 64bit arithmetic for calculating distance between pointers. | Nadav Rotem | 2013-04-20 | 1 | -2/+2 |
| | | | | | llvm-svn: 179927 | ||||
| * | Move PPC getSwappedPredicate for reuse | Hal Finkel | 2013-04-20 | 3 | -17/+20 |
| | | | | | | | | | | | | The getSwappedPredicate function can be used in other places (such as in improvements to the PPCCTRLoops pass). Instead of trapping it as a static function in PPCInstrInfo, move it into PPCPredicates with other predicate-related things. No functionality change intended. llvm-svn: 179926 | ||||
| * | Add CodeGen support for functions that always return arguments via a new ↵ | Stephen Lin | 2013-04-20 | 20 | -43/+371 |
| | | | | | | | parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter). llvm-svn: 179925 | ||||
| * | Allow tail call opportunity detection through nested and/or multiple ↵ | Stephen Lin | 2013-04-20 | 2 | -73/+214 |
| | | | | | | | iterations of extractelement/insertelement indirection llvm-svn: 179924 | ||||
| * | These can be void. | Rafael Espindola | 2013-04-20 | 1 | -12/+7 |
| | | | | | llvm-svn: 179923 | ||||
| * | Rename obj2yaml local namespace to avoid conflicts with llvm::yaml. | Rafael Espindola | 2013-04-20 | 3 | -7/+7 |
| | | | | | llvm-svn: 179922 | ||||
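The split-and-widen truncate legalization from r179989 above can be sketched outside of LLVM as well. The following is a minimal Python model (purely illustrative, not LLVM code) of the lane semantics: a v8i32 -> v8i8 truncate is performed by splitting the input, truncating each half to i16, concatenating, and truncating once more to i8. Masking with `(1 << bits) - 1` models the bit-discarding semantics of a vector `trunc`.

```python
def trunc(vec, bits):
    """Truncate each lane to 'bits' bits (keep only the low bits)."""
    mask = (1 << bits) - 1
    return [x & mask for x in vec]

def legalized_trunc_v8i32_to_v8i8(v):
    """Two-step truncate, mirroring the r179989 expansion."""
    assert len(v) == 8
    inlo, inhi = v[:4], v[4:]   # extract_subvector %in, 0 / %in, 4
    lo16 = trunc(inlo, 16)      # v4i16 trunc v4i32 %inlo
    hi16 = trunc(inhi, 16)      # v4i16 trunc v4i32 %inhi
    in16 = lo16 + hi16          # v8i16 concat_vectors %lo16, %hi16
    return trunc(in16, 8)       # v8i8 trunc v8i16 %in16

# The two-step form is equivalent to a direct truncate for every input,
# because truncating to 16 bits and then to 8 keeps the same low 8 bits.
v = [0x12345678, 0xDEADBEEF, 1, 255, 256, 0xFFFFFFFF, 42, 0x1FF]
assert legalized_trunc_v8i32_to_v8i8(v) == trunc(v, 8)
```

Each of the three intermediate truncates maps to a single VMOVN on ARM, which is why the expansion selects to three narrowing instructions instead of the scalarized store/load sequence shown above.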

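The (since-reverted) SimplifyCFG transform from r179957 above can be illustrated with a small Python sketch. This is not the pass itself, just a model of the before/after semantics it relies on: an unconditional store followed by a conditional store to the same location is equivalent to one unconditional store of a select.

```python
def before(a, i, cond, x, y):
    """Original shape: unconditional store, then conditional store."""
    a[i] = x
    if cond:          # mispredictable branch
        a[i] = y

def after(a, i, cond, x, y):
    """If-converted shape: select, then a single unconditional store."""
    tmp = y if cond else x   # tmp = cond ? Y : X
    a[i] = tmp

# The two forms leave memory in the same state for either branch outcome.
for cond in (True, False):
    a1, a2 = [0] * 4, [0] * 4
    before(a1, 2, cond, 10, 20)
    after(a2, 2, cond, 10, 20)
    assert a1 == a2
```

The trade-off discussed in the commit message is exactly this: `after` always performs the store, betting that a second store to a hot location is cheaper on average than a mispredicted branch in `before`.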

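The idea behind the getSwappedPredicate helper moved in r179926 can be sketched in Python. The mapping below is illustrative (generic comparison names, not the actual PPC::Predicate enumeration): when the operands of a comparison are exchanged, the predicate must be replaced by its swapped counterpart, with LT/GT and LE/GE trading places while EQ and NE are symmetric.

```python
import operator

# Illustrative swapped-predicate table (not the PPC::Predicate values).
SWAPPED = {"lt": "gt", "gt": "lt", "le": "ge", "ge": "le",
           "eq": "eq", "ne": "ne"}
OPS = {"lt": operator.lt, "gt": operator.gt, "le": operator.le,
       "ge": operator.ge, "eq": operator.eq, "ne": operator.ne}

def get_swapped_predicate(pred):
    """Return the predicate that gives the same result with swapped operands."""
    return SWAPPED[pred]

# cmp(a, b, pred) == cmp(b, a, swapped(pred)) for every predicate and input.
for pred, op in OPS.items():
    for a in range(-2, 3):
        for b in range(-2, 3):
            assert op(a, b) == OPS[get_swapped_predicate(pred)](b, a)
```

This invariant is what lets a pass such as PPCCTRLoops canonicalize a comparison by reordering its operands without changing the branch's meaning.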