path: root/llvm/test/Analysis
Commit log, newest first. Each entry: commit subject (author, date, files changed, lines removed/added), followed by the commit message body.
* improve dependence analysis testcases (Sebastian Pop, 2013-11-12, 3 files changed, -1/+30)
  Print the name of the function on which the dependence analysis is
  performed, such that changes to the testcase are easier to review.
  llvm-svn: 194528
* delinearization of arrays (Sebastian Pop, 2013-11-12, 11 files changed, -0/+751)
  llvm-svn: 194527
* Rewrite SCEV's backedge taken count computation. (Andrew Trick, 2013-11-06, 5 files changed, -10/+13)
  Patch by Michele Scandale!
  Rewrite of the functions used to compute the backedge taken count of a
  loop on LT and GT comparisons. I decided to split the handling of the LT
  and GT cases because the trick "a > b == -a < -b" in some cases prevents
  the trip count computation, due to the multiplication by -1 of the two
  operands of the comparison. This issue comes from the conservative
  computation of the value range of SCEVs: taking the negative of a SCEV
  expression that has a small positive range (e.g. [0,31]) yields a SCEV
  whose value range is the full set.
  In the new rewritten functions I also tried to better handle the maximum
  backedge taken count computation when MAX/MIN expressions are used to
  handle the cases where no entry guard is found. Some tests have been
  modified in order to check the new values correctly (I checked them
  manually and, reasoning about possible overflow, the new values seem
  correct). I finally added a new test case for the multiplication by -1
  issue on GT comparisons.
  llvm-svn: 194116
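  For illustration, a hypothetical GT-comparison loop of the kind the
  rewritten code now handles directly, without negating both sides of the
  comparison (a sketch, not one of the committed tests):
    define void @count_down(i32* %a, i32 %n) {
    entry:
      br label %loop
    loop:
      %i = phi i32 [ %n, %entry ], [ %i.dec, %loop ]
      %gep = getelementptr inbounds i32* %a, i32 %i
      store i32 0, i32* %gep
      %i.dec = add nsw i32 %i, -1       ; nsw: the signed step cannot wrap
      %cmp = icmp sgt i32 %i.dec, 0     ; GT exit test handled by the new path
      br i1 %cmp, label %loop, label %exit
    exit:
      ret void
    }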
* Consider (x == -1) unlikely in BranchProbabilityInfo (Hal Finkel, 2013-11-01, 1 file changed, -0/+39)
  This adds another heuristic to BPI, similar to the existing heuristic that
  considers (x == 0) unlikely to be true. As suggested in the PACT'98 paper
  by Deitrich, Cheng, and Hwu, -1 is often used to indicate an invalid
  index, and equality comparisons with -1 are also unlikely to succeed.
  Local experimentation supports this hypothesis: this yields a 1-2% speedup
  in the test-suite sqlite benchmark on the PPC A2 core, with no significant
  regressions.
  llvm-svn: 193855
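  For illustration, a minimal sketch (hypothetical, not the committed test)
  of the pattern the new heuristic targets; the edge taken when the
  comparison is true should now receive a low weight:
    define i32 @find(i32 %idx) {
    entry:
      %cmp = icmp eq i32 %idx, -1    ; -1 used as an "invalid index" sentinel
      br i1 %cmp, label %invalid, label %found
    invalid:                         ; predicted unlikely by the new heuristic
      ret i32 0
    found:
      ret i32 1
    }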
* SCEV: Make the final add of an inbounds GEP nuw if we know that the index is positive. (Benjamin Kramer, 2013-10-28, 1 file changed, -2/+38)
  We can't do this in the general case, as claiming that a GEP has no
  unsigned wrap isn't valid for negative indices:
    %gep = getelementptr inbounds i32* %p, i64 -1
  But an inbounds GEP cannot run past the end of the address space, so we
  check for the very common case of a positive index and make GEPs derived
  from that NUW. Together with Andy's recent non-unit stride work this lets
  us analyze loops like
    void foo3(int *a, int *b) { for (; a < b; a++) {} }
  PR12375, PR12376.
  Differential Revision: http://llvm-reviews.chandlerc.com/D2033
  llvm-svn: 193514
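  For illustration, a minimal sketch of the positive-index case (hypothetical
  fragment, assuming i32 elements so the byte offset is index * 4):
    %gep = getelementptr inbounds i32* %p, i64 4
    ; SCEV can now model this as (16 + %p)<nuw>: a positive inbounds offset
    ; cannot run past the end of the address space, so the add cannot wrap.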
* Revert r193251: Use address-taken to disambiguate global variable and indirect memops. (Shuxin Yang, 2013-10-27, 1 file changed, -29/+0)
  llvm-svn: 193489
* X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate. (Benjamin Kramer, 2013-10-23, 1 file changed, -1/+7)
  Also update the cost model.
  llvm-svn: 193270
* Use address-taken to disambiguate global variable and indirect memops. (Shuxin Yang, 2013-10-23, 1 file changed, -0/+29)
  Major steps include:
  1) Introduce a not-addr-taken bit-field in GlobalVariable.
  2) The GlobalOpt pass sets "not-address-taken" if it proves a global
     variable doesn't have its address taken.
  3) AA uses this info for disambiguation.
  llvm-svn: 193251
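  For illustration, a sketch of the disambiguation this enables (hypothetical
  example, not the committed test):
    @g = internal global i32 0    ; GlobalOpt may prove @g is not address-taken
    define i32 @f(i32* %p) {
      store i32 1, i32* @g
      store i32 7, i32* %p        ; cannot write @g if @g's address never escapes
      %v = load i32* @g           ; so AA lets this load be forwarded the value 1
      ret i32 %v
    }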
* Simplify testing case (Thanks Rafael for the testing case). (Manman Ren, 2013-10-22, 1 file changed, -28/+22)
  llvm-svn: 193177
* TBAA: fix PR17620. (Manman Ren, 2013-10-22, 1 file changed, -0/+51)
  We can have a struct type with a single field whose field does not start
  at offset 0. In that case, we should still correctly update the offset.
  llvm-svn: 193137
* Fix creating bitcasts between address spaces in SCEV. (Matt Arsenault, 2013-10-21, 1 file changed, -1/+27)
  The test before wasn't successfully testing this, since it was missing the
  datalayout piece to change the size of the second address space.
  llvm-svn: 193102
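  For illustration, a hypothetical datalayout of the kind the fixed test
  needs, giving address space 1 a pointer size different from address
  space 0 (a sketch, not the committed string):
    target datalayout = "e-p:64:64:64-p1:16:16:16"
  With differently sized pointers, SCEV must not form a plain bitcast
  between i8* and i8 addrspace(1)* values.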
* SCEV should use NSW to get trip count for positive nonunit stride loops. (Andrew Trick, 2013-10-18, 1 file changed, -18/+25)
  SCEV currently fails to compute loop counts for nonunit stride loops. This
  comes up frequently. It prevents loop optimization and forces
  vectorization to insert extra loop checks. For example:
    void foo(int n, int *x) {
      for (int i = 0; i < n; i += 3) {
        x[i] = i;
        x[i+1] = i+1;
        x[i+2] = i+2;
      }
    }
  We need to properly handle the case in which limit > INT_MAX-stride. In
  the above case: n > INT_MAX-3. In this case the loop counter will step
  beyond the limit and overflow at the same time. However, knowing that
  signed integer overflow is undefined, we can assume the loop test behavior
  is arbitrary after overflow. This obeys both the C undefined behavior
  rules and the stricter LLVM poison value rules.
  I'm finally fixing this in response to Hal Finkel's persistence. The most
  probable reason that we never optimized this before is that we were being
  careful to handle the case where the developer expected a side-effect free
  infinite loop relying on overflow:
    for (int i = 0; i < n; i += s) { ++j; } return j;
  If INT_MAX+1 is a multiple of s and n > INT_MAX-s, then we might expect an
  infinite loop. However, there are plenty of ways to achieve this effect
  without relying on undefined behavior of signed overflow.
  llvm-svn: 193015
* Remove the very substantial, largely unmaintained legacy PGO infrastructure. (Chandler Carruth, 2013-10-02, 6 files changed, -828/+0)
  This was essentially work toward PGO based on a design that had several
  flaws, partially dating from a time when LLVM had a different
  architecture, and with an effort to modernize it abandoned without being
  completed. Since then, it has bitrotted for several years further. The
  result is nearly unusable, and isn't helping any of the modern PGO
  efforts. Instead, it is getting in the way, adding confusion about PGO in
  LLVM and distracting everyone with maintenance on essentially dead code.
  Removing it paves the way for modern efforts around PGO.
  Among other effects, this removes the last of the runtime libraries from
  LLVM. Those are being developed in the separate 'compiler-rt' project now,
  with somewhat different licensing specifically more appropriate for
  runtimes.
  llvm-svn: 191835
* Use CHECK-LABEL (Matt Arsenault, 2013-09-30, 3 files changed, -18/+20)
  llvm-svn: 191713
* TBAA: handle scalar TBAA format and struct-path aware TBAA format. (Manman Ren, 2013-09-27, 14 files changed, -65/+114)
  Remove the command line argument "struct-path-tbaa", since we should not
  depend on a command line argument to decide which format an IR file is
  using. Instead, we check the first operand of the tbaa tag node: if it is
  an MDNode, we treat it as struct-path aware TBAA format; otherwise, we
  treat it as scalar TBAA format.
  Once clang always emits the struct-path aware TBAA format regardless of
  whether struct-path-tbaa is on, and we can auto-upgrade existing bc files,
  the support for the scalar TBAA format can be dropped.
  Existing testing cases are updated to use the struct-path aware TBAA
  format.
  llvm-svn: 191538
* X86 horizontal vector reduction cost model (Yi Jiang, 2013-09-19, 1 file changed, -0/+271)
  llvm-svn: 191021
* Costmodel: Add support for horizontal vector reductions (Arnold Schwaighofer, 2013-09-17, 1 file changed, -0/+94)
  Upcoming SLP vectorization improvements will want to be able to estimate
  costs of horizontal reductions. Add infrastructure to support this.
  We model reductions as a series of (shufflevector, add) tuples ultimately
  followed by an extractelement. For example, for an add-reduction of
  <4 x float> we could generate the following sequence:

    (v0, v1, v2, v3)
     \   \   /   /
      \   \ /   /
       +     +
    (v0+v2, v1+v3, undef, undef)
         \     /
    ((v0+v2) + (v1+v3), undef, undef)

    %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef,
                <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
    %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef,
                 <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
    %r = extractelement <4 x float> %bin.rdx8, i32 0

  This commit adds a cost model interface
  "getReductionCost(Opcode, Ty, Pairwise)" that will allow clients to ask
  for the cost of such a reduction (as backends might generate more
  efficient code than the cost of the individual instructions summed up).
  This interface is exercised by the CostModel analysis pass, which looks
  for reduction patterns like the one above - starting at extractelements -
  and if it sees a matching sequence will call the cost model interface.
  We will also support a second form of pairwise reduction that is well
  supported on common architectures (haddps, vpadd, faddp):

    (v0, v1, v2, v3)
     \  /    \  /
    (v0+v1, v2+v3, undef, undef)
        \       /
    ((v0+v1)+(v2+v3), undef, undef, undef)

    %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef,
                    <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
    %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef,
                    <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
    %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
    %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
                    <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
    %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef,
                    <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
    %r = extractelement <4 x float> %bin.rdx.1, i32 0

  llvm-svn: 190876
* Teach ScalarEvolution about pointer address spaces (Matt Arsenault, 2013-09-10, 1 file changed, -0/+68)
  llvm-svn: 190425
* Fix lint assert on integer vector division (Matt Arsenault, 2013-08-26, 2 files changed, -0/+79)
  llvm-svn: 189290
* FileCheck-ize tests. (Bill Wendling, 2013-08-22, 26 files changed, -35/+98)
  llvm-svn: 188971
* [tests] Cleanup initialization of test suffixes. (Daniel Dunbar, 2013-08-16, 18 files changed, -30/+1)
  - Instead of setting the suffixes in a bunch of places, just set one
    master list in the top-level config. We now only modify the suffix list
    in a few suites that have one particular unique suffix (.ml, .mc, .yaml,
    .td, .py).
  - Aside from removing the need for a bunch of lit.local.cfg files, this
    enables 4 tests that were inadvertently being skipped (one in
    Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
    CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
    XFAILED).
  - This commit also fixes a bunch of config files to use config.root
    instead of older copy-pasted code.
  llvm-svn: 188513
* FileCheckize some of the testcases. (Bill Wendling, 2013-08-05, 13 files changed, -26/+39)
  llvm-svn: 187756
* Fixes ARM LNT bot from SLP change in O3 (Renato Golin, 2013-08-02, 1 file changed, -0/+8)
  This patch fixes the multiple breakages on the ARM test-suite after the
  SLP vectorizer was enabled by default at O3. The problem was an illegal
  vector type, <3 x i1>, reaching ARMTTI::getCmpSelInstrCost(); that type is
  not simple. The guard protects this code from breaking (the cause of the
  problems) but doesn't fix the issue that is generating the odd vector in
  the first place, which also needs to be investigated.
  llvm-svn: 187658
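  For illustration, the kind of odd type involved (a hypothetical reduction,
  not the committed test): SLP could form
    %c = fcmp olt <3 x float> %a, %b                         ; yields <3 x i1>
    %s = select <3 x i1> %c, <3 x float> %a, <3 x float> %b
  and <3 x i1> has no simple MVT, which ARMTTI::getCmpSelInstrCost() did not
  expect.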
* Add newlines at end of test files, no functionality change (Stephen Lin, 2013-07-13, 2 files changed, -2/+2)
  llvm-svn: 186263
* Add the nearbyint -> FNEARBYINT mapping to BasicTargetTransformInfo (Hal Finkel, 2013-07-08, 1 file changed, -0/+28)
  This fixes an oversight that Intrinsic::nearbyint was not being mapped to
  ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to
  nearbyint calls for some targets).
  llvm-svn: 185783
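  For illustration, a minimal sketch of a call that now receives a realistic
  cost (hypothetical function name; the intrinsic is real):
    declare double @llvm.nearbyint.f64(double)
    define double @round_nearby(double %x) {
      ; now costed via the ISD::FNEARBYINT mapping
      %r = call double @llvm.nearbyint.f64(double %x)
      ret double %r
    }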
* Extend 'readonly' and 'readnone' to work on function arguments as well as functions. (Nick Lewycky, 2013-07-06, 1 file changed, -1/+1)
  Make the function-attributes pass add them to known library functions, and
  deduce them where it can.
  llvm-svn: 185735
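  For illustration, a sketch of the argument form (hypothetical declaration):
  on an argument, the attribute describes only accesses through that
  pointer, not the whole function:
    declare i32 @strlen_like(i8* readonly)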
* Minimize precision loss when computing cyclic probabilities. (Jakob Stoklund Olesen, 2013-06-28, 1 file changed, -0/+42)
  Allow block frequencies to exceed 32 bits by using the new BlockFrequency
  division function.
  llvm-svn: 185236
* (no commit message) (Preston Briggs, 2013-06-28, 1 file changed, -0/+40)
  llvm-svn: 185187
* CostModel: improve the cost model for load/store of non power-of-two types such as <3 x float>, which are popular in graphics. (Nadav Rotem, 2013-06-27, 1 file changed, -0/+19)
  llvm-svn: 185085
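  For illustration, a sketch of the shape in question (hypothetical test):
    define <3 x float> @load_rgb(<3 x float>* %p) {
      ; non power-of-two vector load, common in graphics code
      %v = load <3 x float>* %p, align 4
      ret <3 x float> %v
    }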
* Print block frequencies in decimal form. (Jakob Stoklund Olesen, 2013-06-25, 1 file changed, -16/+16)
  This is easier to read than the internal fixed-point representation. If
  anybody knows the correct algorithm for converting fixed-point numbers to
  base 10, feel free to fix it.
  llvm-svn: 184881
* X86 cost model: Vectorizing integer division is a bad idea (Arnold Schwaighofer, 2013-06-25, 1 file changed, -0/+32)
  radar://14057959
  llvm-svn: 184872
* BlockFrequency: Bump up the entry frequency a bit. (Benjamin Kramer, 2013-06-25, 1 file changed, -16/+16)
  This is a band-aid to fix the most severe regressions we're seeing from
  basing spill decisions on block frequencies, until we have a better
  solution.
  llvm-svn: 184835
* Revert "BlockFrequency: Saturate at 1 instead of 0 when multiplying a ↵Benjamin Kramer2013-06-211-65/+0
| | | | | | | | frequency with a branch probability." This reverts commit r184584. Breaks PPC selfhost. llvm-svn: 184590
* BlockFrequency: Saturate at 1 instead of 0 when multiplying a frequency with a branch probability. (Benjamin Kramer, 2013-06-21, 1 file changed, -0/+65)
  Zero is used by BlockFrequencyInfo as a special "don't know" value. Zero
  also acts as a sink for frequencies: once a frequency reaches zero,
  further multiplies can never move it off zero.
  This recovers a 10% regression on MultiSource/Benchmarks/7zip. A zero
  frequency was propagated into an inner loop, causing excessive spilling.
  PR16402.
  llvm-svn: 184584
* Unit test for SCEV fix r182989, PR16130. (Andrew Trick, 2013-05-31, 1 file changed, -3/+28)
  llvm-svn: 183017
* Make BasicAliasAnalysis recognize the fact a noalias argument cannot alias another argument, even if the other argument is not itself marked noalias. (Michael Kuperstein, 2013-05-28, 1 file changed, -0/+23)
  llvm-svn: 182755
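  For illustration, a minimal sketch (hypothetical example):
    define void @f(i8* noalias %a, i8* %b) {
      store i8 1, i8* %a
      store i8 2, i8* %b    ; BasicAA can now report NoAlias for %a and %b,
      ret void              ; even though only %a is marked noalias
    }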
* Add a new function attribute 'cold' to functions. (Diego Novillo, 2013-05-24, 1 file changed, -0/+58)
  Other than recognizing the attribute, the patch does little else. It
  changes the branch probability analyzer so that edges into blocks
  postdominated by a call to a cold function are given low weight. Added
  analysis and code generation tests. Added documentation for the new
  attribute.
  llvm-svn: 182638
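  For illustration, a sketch of the intended effect on branch weights
  (hypothetical test):
    define void @report_error() cold {
      ret void
    }
    define void @f(i1 %fail) {
      br i1 %fail, label %bad, label %ok  ; edge to %bad receives low weight
    bad:                                  ; postdominated by a call to a cold function
      call void @report_error()
      ret void
    ok:
      ret void
    }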
* AArch64: use MCJIT by default and enable related tests. (Tim Northover, 2013-05-06, 1 file changed, -5/+0)
  This just enables some testing I'd missed after implementing MCJIT
  support.
  llvm-svn: 181215
* Fix unchecked uses of DominatorTree in MemoryDependenceAnalysis. (Matt Arsenault, 2013-05-06, 2 files changed, -0/+20)
  Use unknown results in the places where the DominatorTree would be needed
  but is unavailable.
  llvm-svn: 181176
* RegionInfo: Do not crash if unreachable block is found (Tobias Grosser, 2013-05-03, 1 file changed, -0/+29)
  llvm-svn: 181025
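  For illustration, a sketch of the shape that used to crash (hypothetical
  test): a block that is not reachable from the function entry.
    define void @f() {
    entry:
      ret void
    dead:                 ; no path from %entry reaches this block
      br label %dead
    }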
* TBAA: remove !tbaa from testing cases if not used. (Manman Ren, 2013-04-29, 6 files changed, -47/+24)
  This will make it easier to turn on struct-path aware TBAA, since the
  metadata format will change.
  llvm-svn: 180743
* Struct-path aware TBAA: change the format of TBAAStructType node. (Manman Ren, 2013-04-27, 1 file changed, -11/+11)
  We switch the order of offset and field type to make the TBAAStructType
  node (name, parent node, offset) similar to the scalar TBAA node
  (name, parent node). TypeIsImmutable is added to the TBAAStructTag node.
  llvm-svn: 180654
* ARM cost model: Integer div and rem are lowered to a function call (Arnold Schwaighofer, 2013-04-25, 1 file changed, -0/+450)
  Reflect this in the cost model. I observed this in MiBench/consumer-lame.
  radar://13354716
  llvm-svn: 180576
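  For illustration (hypothetical example): on ARM cores without a hardware
  divider, each scalarized element of
    %q = sdiv <4 x i32> %num, %den
  becomes a runtime library call (e.g. __aeabi_idiv on AEABI targets), so
  the cost model now assigns such operations a high cost.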
* Legalize vector truncates by parts rather than just splitting. (Jim Grosbach, 2013-04-21, 1 file changed, -2/+2)
  Rather than just splitting the input type and hoping for the best, apply a
  bit more cleverness. Just splitting the types until the source is legal
  often leads to an illegal result type, which is then widened and a
  scalarization step is introduced, which leads to truly horrible code
  generation. With the loop vectorizer, these sorts of operations are much
  more common, and so it's worth extra effort to do them well.
  Add a legalization hook for the operands of a TRUNCATE node, which will be
  encountered after the result type has been legalized, but if the operand
  type is still illegal. If simple splitting of both types ends up with the
  result type of each half still being legal, just do that (v16i16 -> v16i8
  on ARM, for example). If, however, that would result in an illegal result
  type (v8i32 -> v8i8 on ARM, for example), we can get more clever with
  power-of-two vectors. Specifically, split the input type, but also widen
  the result element size, then concatenate the halves and truncate again.
  For example on ARM, to perform a "%res = v8i8 trunc v8i32 %in" we
  transform to:
    %inlo = v4i32 extract_subvector %in, 0
    %inhi = v4i32 extract_subvector %in, 4
    %lo16 = v4i16 trunc v4i32 %inlo
    %hi16 = v4i16 trunc v4i32 %inhi
    %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16
    %res = v8i8 trunc v8i16 %in16
  This allows instruction selection to generate three VMOVN instructions
  instead of a sequence of moves, stores and loads.
  Update the ARMTargetTransformInfo to take this improved legalization into
  account.
  Consider the simplified IR:
    define <16 x i8> @test1(<16 x i32>* %ap) {
      %a = load <16 x i32>* %ap
      %tmp = trunc <16 x i32> %a to <16 x i8>
      ret <16 x i8> %tmp
    }
    define <8 x i8> @test2(<8 x i32>* %ap) {
      %a = load <8 x i32>* %ap
      %tmp = trunc <8 x i32> %a to <8 x i8>
      ret <8 x i8> %tmp
    }
  Previously, we would generate the truly hideous:
      .syntax unified
      .section __TEXT,__text,regular,pure_instructions
      .globl _test1
      .align 2
    _test1: @ @test1
    @ BB#0:
      push {r7}
      mov r7, sp
      sub sp, sp, #20
      bic sp, sp, #7
      add r1, r0, #48
      add r2, r0, #32
      vld1.64 {d24, d25}, [r0:128]
      vld1.64 {d16, d17}, [r1:128]
      vld1.64 {d18, d19}, [r2:128]
      add r1, r0, #16
      vmovn.i32 d22, q8
      vld1.64 {d16, d17}, [r1:128]
      vmovn.i32 d20, q9
      vmovn.i32 d18, q12
      vmov.u16 r0, d22[3]
      strb r0, [sp, #15]
      vmov.u16 r0, d22[2]
      strb r0, [sp, #14]
      vmov.u16 r0, d22[1]
      strb r0, [sp, #13]
      vmov.u16 r0, d22[0]
      vmovn.i32 d16, q8
      strb r0, [sp, #12]
      vmov.u16 r0, d20[3]
      strb r0, [sp, #11]
      vmov.u16 r0, d20[2]
      strb r0, [sp, #10]
      vmov.u16 r0, d20[1]
      strb r0, [sp, #9]
      vmov.u16 r0, d20[0]
      strb r0, [sp, #8]
      vmov.u16 r0, d18[3]
      strb r0, [sp, #3]
      vmov.u16 r0, d18[2]
      strb r0, [sp, #2]
      vmov.u16 r0, d18[1]
      strb r0, [sp, #1]
      vmov.u16 r0, d18[0]
      strb r0, [sp]
      vmov.u16 r0, d16[3]
      strb r0, [sp, #7]
      vmov.u16 r0, d16[2]
      strb r0, [sp, #6]
      vmov.u16 r0, d16[1]
      strb r0, [sp, #5]
      vmov.u16 r0, d16[0]
      strb r0, [sp, #4]
      vldmia sp, {d16, d17}
      vmov r0, r1, d16
      vmov r2, r3, d17
      mov sp, r7
      pop {r7}
      bx lr
      .globl _test2
      .align 2
    _test2: @ @test2
    @ BB#0:
      push {r7}
      mov r7, sp
      sub sp, sp, #12
      bic sp, sp, #7
      vld1.64 {d16, d17}, [r0:128]
      add r0, r0, #16
      vld1.64 {d20, d21}, [r0:128]
      vmovn.i32 d18, q8
      vmov.u16 r0, d18[3]
      vmovn.i32 d16, q10
      strb r0, [sp, #3]
      vmov.u16 r0, d18[2]
      strb r0, [sp, #2]
      vmov.u16 r0, d18[1]
      strb r0, [sp, #1]
      vmov.u16 r0, d18[0]
      strb r0, [sp]
      vmov.u16 r0, d16[3]
      strb r0, [sp, #7]
      vmov.u16 r0, d16[2]
      strb r0, [sp, #6]
      vmov.u16 r0, d16[1]
      strb r0, [sp, #5]
      vmov.u16 r0, d16[0]
      strb r0, [sp, #4]
      ldm sp, {r0, r1}
      mov sp, r7
      pop {r7}
      bx lr
  Now, however, we generate the much more straightforward:
      .syntax unified
      .section __TEXT,__text,regular,pure_instructions
      .globl _test1
      .align 2
    _test1: @ @test1
    @ BB#0:
      add r1, r0, #48
      add r2, r0, #32
      vld1.64 {d20, d21}, [r0:128]
      vld1.64 {d16, d17}, [r1:128]
      add r1, r0, #16
      vld1.64 {d18, d19}, [r2:128]
      vld1.64 {d22, d23}, [r1:128]
      vmovn.i32 d17, q8
      vmovn.i32 d16, q9
      vmovn.i32 d18, q10
      vmovn.i32 d19, q11
      vmovn.i16 d17, q8
      vmovn.i16 d16, q9
      vmov r0, r1, d16
      vmov r2, r3, d17
      bx lr
      .globl _test2
      .align 2
    _test2: @ @test2
    @ BB#0:
      vld1.64 {d16, d17}, [r0:128]
      add r0, r0, #16
      vld1.64 {d18, d19}, [r0:128]
      vmovn.i32 d16, q8
      vmovn.i32 d17, q9
      vmovn.i16 d16, q8
      vmov r0, r1, d16
      bx lr
  llvm-svn: 179989
* X86 cost model: Exit before calling getSimpleVT on non-simple VTs (Arnold Schwaighofer, 2013-04-17, 1 file changed, -0/+6)
  getSimpleVT can only handle simple value types.
  radar://13676022
  llvm-svn: 179714
* CostModel: increase the default cost of supported floating point operations from 1 to 2. (Nadav Rotem, 2013-04-12, 1 file changed, -2/+2)
  Fixed a few tests that changed because now the cost of one insert plus a
  vector operation on two doubles is lower than two scalar operations on
  doubles.
  llvm-svn: 179413
* Aliasing rules for struct-path aware TBAA. (Manman Ren, 2013-04-11, 1 file changed, -0/+392)
  Added PathAliases to check if two struct-path tags can alias. Added
  command line option -struct-path-tbaa.
  llvm-svn: 179337
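  For illustration, a schematic sketch of struct-path TBAA metadata in the
  era's syntax (hypothetical type names and layout, not the committed test):
    !0 = metadata !{metadata !"Simple C/C++ TBAA"}
    !1 = metadata !{metadata !"int", metadata !0}
    ; struct type node: name, then (field type, offset) pairs
    !2 = metadata !{metadata !"StructA", metadata !1, i64 0, metadata !1, i64 4}
    ; access tag: base type, access type, offset within the base
    !3 = metadata !{metadata !2, metadata !1, i64 4}
  PathAliases walks such paths from the base types to decide whether two
  access tags may refer to the same memory.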
* X86 cost model: Model cost for uitofp and sitofp on SSE2 (Arnold Schwaighofer, 2013-04-08, 2 files changed, -0/+643)
  The costs are overfitted so that I can still use the legalization factor.
  For example, the following kernel has about half the throughput vectorized
  as unvectorized when compiled with SSE2. Before this patch we would
  vectorize it.
    unsigned short A[1024];
    double B[1024];
    void f() {
      int i;
      for (i = 0; i < 1024; ++i) {
        B[i] = (double) A[i];
      }
    }
  radar://13599001
  llvm-svn: 179033
* TargetLowering: Fix getTypeConversion handling of extended vector types (Arnold Schwaighofer, 2013-04-07, 3 files changed, -14/+12)
  The code in getTypeConversion attempts to promote the vector element type
  before it tries to split or widen the vector. After it failed to find a
  legal vector type by promoting, it would continue using the promoted
  vector element type, thereby missing legal split vector types. For
  example, the type v32i32, which has a legal split of 8 x v4i32 on
  x86/sse2, would be transformed to v32i256 and from there on successively
  split to v16i256, v8i256, v1i256, finally ending up as an i64 type.
  By resetting the vector element type to the original vector element type
  that existed before the promotion, the code will attempt to split the
  vector type into smaller vector widths of the same type.
  llvm-svn: 178999
* X86 cost model: Differentiate cost for vector shifts of constants (Arnold Schwaighofer, 2013-04-04, 3 files changed, -0/+863)
  SSE2 has efficient support for shifts by a scalar. My previous change,
  which made shifts expensive, did not take this into account and marked all
  shifts as expensive. This would prevent vectorization where it is actually
  beneficial. With this change we differentiate between shifts of constants
  and other shifts.
  radar://13576547
  llvm-svn: 178808
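  For illustration (hypothetical example): a shift by a splat constant maps
  to a single SSE2 immediate shift and is now considered cheap,
    %r = shl <4 x i32> %x, <i32 2, i32 2, i32 2, i32 2>
  while a shift by a non-constant vector amount remains expensive.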