bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[objc-arc] KnownSafe does not imply that it is safe to perform code motion ↵	Michael Gottesman	2013-05-24	1	-0/+200
\| \| \| \| \| \| \| \|	across CFG edges since even if it is safe to remove RR pairs, we may still be able to move a retain/release into a loop. rdar://13949644 llvm-svn: 182670
*	[objc-arc] Make sure that multiple owners is propogated correctly through ↵	Michael Gottesman	2013-05-24	1	-0/+97
\| \| \| \| \| \| \| \|	the pass via the usage of a global data structure. rdar://13750319 llvm-svn: 182669
*	LoopVectorize: LoopSimplify can't canonicalize loops with an indirectbr in ↵	Benjamin Kramer	2013-05-24	1	-0/+11
\| \| \| \| \| \| \| \|	it, don't assert on those cases. Fixes PR16139. llvm-svn: 182656
*	scalarizePHI needs to insert the next ExtractElement in the same block	Joey Gouly	2013-05-24	1	-0/+25
\| \| \| \| \| \| \| \|	as the BinaryOperator, not in the block where the IRBuilder is currently inserting into. Fixes a bug where scalarizePHI would create instructions that would not dominate all uses. llvm-svn: 182639
*	SLPVectorizer: Change the order in which new instructions are added to the ↵	Nadav Rotem	2013-05-22	4	-1/+104
\| \| \| \| \| \| \| \| \| \| \| \|	function. We are not working on a DAG and I ran into a number of problems when I enabled the vectorizations of 'diamond-trees' (trees that share leafs). * Imroved the numbering API. * Changed the placement of new instructions to the last root. * Fixed a bug with external tree users with non-zero lane. * Fixed a bug in the placement of in-tree users. llvm-svn: 182508
*	This is an update to a previous commit (r181216).	Jean-Luc Duprat	2013-05-22	1	-27/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The earlier change list introduced the following inst combines: B * (uitofp i1 C) —> select C, B, 0 A * (1 - uitofp i1 C) —> select C, 0, A select C, 0, B + select C, A, 0 —> select C, A, B Together these 3 changes would simplify : A * (1 - uitofp i1 C) + B * uitofp i1 C down to : select C, B, A In practice we found that the first two substitutions can have a negative effect on performance, because they reduce opportunities to use FMA contractions; between the two options FMAs are often the better choice. This change list amends the previous one to enable just these inst combines: select C, B, 0 + select C, 0, A —> select C, B, A A * (1 - uitofp i1 C) + B * uitofp i1 C —> select C, B, A llvm-svn: 182499
*	LoopVectorize: Make Value pointers that could be RAUW'ed a VH	Arnold Schwaighofer	2013-05-22	1	-0/+50
\| \| \| \| \| \| \| \| \| \|	The Value pointers we store in the induction variable list can be RAUW'ed by a call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing in some other places where we store pointers that could potentially be RAUW'ed. Fixes PR16073. llvm-svn: 182485
*	Move the remaining simplify-libcalls tests to instcombine, merging most of ↵	Benjamin Kramer	2013-05-19	12	-138/+137
\| \| \| \| \| \|	them into a single file. llvm-svn: 182211
*	isKnownToBeAPowerOfTwo: (X & Y) + Y is a power of 2 or zero if y is also.	David Majnemer	2013-05-18	1	-0/+14
\| \| \| \| \| \| \|	This is useful if something that looks like (x & (1 << y)) ? 64 : 32 is the divisor in a modulo operation. llvm-svn: 182200
*	LoopVectorize: Handle single edge PHIs	Arnold Schwaighofer	2013-05-18	1	-0/+22
\| \| \| \| \| \| \| \|	We might encouter single edge PHIs - handle them with an identity select. Fixes PR15990. llvm-svn: 182199
*	Respect the 'nobuiltin' attribute when determining if a call is to a memory ↵	Richard Smith	2013-05-16	1	-0/+18
\| \| \| \| \| \|	builtin. llvm-svn: 181978
*	LoopVectorize: Hoist conditional loads if possible	Arnold Schwaighofer	2013-05-15	1	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \|	InstCombine can be uncooperative to vectorization and sink loads into conditional blocks. This prevents vectorization. Undo this optimization if there are unconditional memory accesses to the same addresses in the loop. radar://13815763 llvm-svn: 181860
*	GlobalOpt: fix an issue where CXAAtExitFn points to a deleted function.	Manman Ren	2013-05-14	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	CXAAtExitFn was set outside a loop and before optimizations where functions can be deleted. This patch will set CXAAtExitFn inside the loop and after optimizations. Seg fault when running LTO because of accesses to a deleted function. rdar://problem/13838828 llvm-svn: 181838
*	LoopVectorize: Handle loops with multiple forward inductions	Arnold Schwaighofer	2013-05-14	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \|	We used to give up if we saw two integer inductions. After this patch, we base further induction variables on the chosen one like we do in the reverse induction and pointer induction case. Fixes PR15720. radar://13851975 llvm-svn: 181746
*	[objc-arc-opts] In the presense of an alloca unconditionally remove RR pairs ↵	Michael Gottesman	2013-05-13	1	-0/+203
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	if and only if we are both KnownSafeBU/KnownSafeTD rather than just either or. In the presense of a block being initialized, the frontend will emit the objc_retain on the original pointer and the release on the pointer loaded from the alloca. The optimizer will through the provenance analysis realize that the two are related (albiet different), but since we only require KnownSafe in one direction, will match the inner retain on the original pointer with the guard release on the original pointer. This is fixed by ensuring that in the presense of allocas we only unconditionally remove pointers if both our retain and our release are KnownSafe (i.e. we are KnownSafe in both directions) since we must deal with the possibility that the frontend will emit what (to the optimizer) appears to be unbalanced retain/releases. An example of the miscompile is: %A = alloca retain(%x) retain(%x) <--- Inner Retain store %x, %A %y = load %A ... DO STUFF ... release(%y) call void @use(%x) release(%x) <--- Guarding Release getting optimized to: %A = alloca retain(%x) store %x, %A %y = load %A ... DO STUFF ... release(%y) call void @use(%x) rdar://13750319 llvm-svn: 181743
*	SLPVectorizer: Fix a bug in the code that generates extracts for values with ↵	Nadav Rotem	2013-05-12	1	-2/+30
\| \| \| \| \| \| \| \|	multiple users. The external user does not have to be in lane #0. We have to save the lane for each scalar so that we know which vector lane to extract. llvm-svn: 181674
*	InstCombine: Flip the order of two urem transforms	David Majnemer	2013-05-12	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two transforms in visitUrem that conflict with each other. ) One, if a divisor is a power of two, subtracts one from the divisor and turns it into a bitwise-and. ) The other unwraps both operands if they are surrounded by zext instructions. Flipping the order allows the subtraction to go beneath the sign extension. llvm-svn: 181668
*	LoopVectorize: Use the widest induction variable type	Arnold Schwaighofer	2013-05-11	1	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the widest induction type encountered for the cannonical induction variable. We used to turn the following loop into an empty loop because we used i8 as induction variable type and truncated 1024 to 0 as trip count. int a[1024]; void fail() { int reverse_induction = 1023; unsigned char forward_induction = 0; while ((reverse_induction) >= 0) { forward_induction++; a[reverse_induction] = forward_induction; --reverse_induction; } } radar://13862901 llvm-svn: 181667
*	InstCombine: Turn urem to bitwise-and more often	David Majnemer	2013-05-11	1	-2/+51
\| \| \| \| \| \| \|	Use isKnownToBeAPowerOfTwo in visitUrem so that we may more aggressively fold away urem instructions. llvm-svn: 181661
*	SLPVectorizer: Add support for trees with external users.	Nadav Rotem	2013-05-10	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	For example: bar() { int a = A[i]; int b = A[i+1]; B[i] = a; B[i+1] = b; foo(a); <--- a is used outside the vectorized expression. } llvm-svn: 181648
*	Add an additional testcase for PR15882.	Nadav Rotem	2013-05-10	1	-0/+45
\| \| \| \|	llvm-svn: 181646
*	InstCombine: Don't claim to be able to evaluate any shl in a zexted type.	Benjamin Kramer	2013-05-10	1	-0/+28
\| \| \| \| \| \| \| \| \| \|	The shift amount may be larger than the type leading to undefined behavior. Limit the transform to constant shift amounts. While there update the bits to clear in the result which may enable additional optimizations. PR15959. llvm-svn: 181604
*	InstCombine: Verify the type before transforming uitofp into select.	Benjamin Kramer	2013-05-10	1	-0/+18
\| \| \| \| \| \|	PR15952. llvm-svn: 181586
*	Don't replace an alias in llvm.used with its target.	Rafael Espindola	2013-05-09	1	-0/+42
\| \| \| \| \| \| \|	When we replace an internal alias with its target, be careful not to replace the entry in llvm.used (and llvm.compiler_used). llvm-svn: 181524
*	InstCombine: Don't just copy known bits from the first operand of an srem.	Benjamin Kramer	2013-05-09	1	-0/+12
\| \| \| \| \| \| \|	That's obviously wrong. Conservatively restrict it to the sign bit, which matches the original intention of this analysis. Fixes PR15940. llvm-svn: 181518
*	LoopVectorizer: Don't assert on the absence of induction variables	Arnold Schwaighofer	2013-05-09	1	-0/+34
\| \| \| \| \| \| \| \| \|	A computable loop exit count does not imply the presence of an induction variable. Scalar evolution can return a value for an infinite loop. Fixes PR15926. llvm-svn: 181495
*	Revert 181475 as the DebugIR tests are breaking (automake) buildbots that ↵	Daniel Malea	2013-05-08	14	-700/+0
\| \| \| \| \| \| \| \|	re-use build dirs - the temporaries "-debug.ll" files generated by DebugIR pass are considered tests, even though they are not llvm-svn: 181476
*	DebugIR tests -- lit tests for the line number transform	Daniel Malea	2013-05-08	14	-0/+700
\| \| \| \| \| \| \| \| \| \| \|	- simple one-function case - function-calling case - external function calling case - exception throwing case - vector case Note: these tests are somewhat coupled to the current format of debug metadata. llvm-svn: 181469
*	LoopVectorizer: Improve reduction variable identification	Arnold Schwaighofer	2013-05-07	1	-0/+119
\| \| \| \| \| \| \|	The two nested loops were confusing and also conservative in identifying reduction variables. This patch replaces them by a worklist based approach. llvm-svn: 181369
*	LoopVectorize: getConsecutiveVector must respect signed arithmetic	Arnold Schwaighofer	2013-05-07	1	-0/+79
\| \| \| \| \| \| \| \| \| \|	We were passing an i32 to ConstantInt::get where an i64 was needed and we must also pass the sign if we pass negatives numbers. The start index passed to getConsecutiveVector must also be signed. Should fix PR15882. llvm-svn: 181286
*	InstCombine: (X ^ signbit) + C -> X + (signbit ^ C)	David Majnemer	2013-05-06	1	-0/+10
\| \| \| \|	llvm-svn: 181249
*	Test results verified using FileCheck rather than grep \| count	Jean-Luc Duprat	2013-05-06	1	-1/+7
\| \| \| \|	llvm-svn: 181234
*	Rotate multi-exit loops even if the latch was simplified.	Andrew Trick	2013-05-06	1	-1/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Test case by Michele Scandale! Fixes PR10293: Load not hoisted out of loop with multiple exits. There are few regressions with this patch, now tracked by rdar:13817079, and a roughly equal number of improvements. The regressions are almost certainly back luck because LoopRotate has very little idea of whether rotation is profitable. Doing better requires a more comprehensive solution. This checkin is a quick fix that lacks generality (PR10293 has a counter-example). But it trivially fixes the case in PR10293 without interfering with other cases, and it does satify the criteria that LoopRotate is a loop canonicalization pass that should avoid heuristics and special cases. I can think of two approaches that would probably be better in the long run. Ultimately they may both make sense. (1) LoopRotate should check that the current header would make a good loop guard, and that the loop does not already has a sufficient guard. The artifical SimplifiedLoopLatch check would be unnecessary, and the design would be more general and canonical. Two difficulties: - We need a strong guarantee that we won't endlessly rotate, so the analysis would need to be precise in order to avoid the SimplifiedLoopLatch precondition. - Analysis like this are usually based on SCEV, which we don't want to rely on. (2) Rotate on-demand in late loop passes. This could even be done by shoving the loop back on the queue after the optimization that needs it. This could work well when we find LICM opportunities in multi-branch loops. This requires some work, and it doesn't really solve the problem of SCEV wanting a loop guard before the analysis. llvm-svn: 181230
*	Fix add4.ll test cmdline so that it passes	Jean-Luc Duprat	2013-05-06	1	-1/+1
\| \| \| \|	llvm-svn: 181219
*	Provide InstCombines for the following 3 cases:	Jean-Luc Duprat	2013-05-06	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \|	A * (1 - (uitofp i1 C)) -> select C, 0, A B * (uitofp i1 C) -> select C, B, 0 select C, 0, A + select C, B, 0 -> select C, B, A These come up in code that has been hand-optimized from a select to a linear blend, on platforms where that may have mattered. We want to undo such changes with the following transform: A(1 - uitofp i1 C) + B(uitofp i1 C) -> select C, A, B llvm-svn: 181216
*	Revert r164763 because it introduces new shuffles.	Nadav Rotem	2013-05-06	2	-71/+1
\| \| \| \| \| \|	Thanks Nick Lewycky for pointing this out. llvm-svn: 181177
*	Fix unchecked uses of DominatorTree in MemoryDependenceAnalysis.	Matt Arsenault	2013-05-06	1	-0/+14
\| \| \| \| \| \|	Use unknown results for places where it would be needed llvm-svn: 181176
*	Fix const merging when an alias of a const is llvm.used.	Rafael Espindola	2013-05-06	1	-0/+3
\| \| \| \| \| \| \|	We used to disable constant merging not only if a constant is llvm.used, but also if an alias of a constant is llvm.used. This change fixes that. llvm-svn: 181175
*	LoopVectorize: Add support for floating point min/max reductions	Arnold Schwaighofer	2013-05-05	1	-0/+480
\| \| \| \| \| \| \| \| \| \|	Add support for min/max reductions when "no-nans-float-math" is enabled. This allows us to assume we have ordered floating point math and treat ordered and unordered predicates equally. radar://13723044 llvm-svn: 181144
*	LoopVectorize: We don't need an identity element for min/max reductions	Arnold Schwaighofer	2013-05-05	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	We can just use the initial element that feeds the reduction. max(max(x, y), z) == max(max(x,y), max(x,z)) radar://13723044 llvm-svn: 181141
*	LoopVectorizer: Add support for if-conversion of PHINodes with 3+ incoming ↵	Nadav Rotem	2013-05-03	1	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	values. By supporting the vectorization of PHINodes with more than two incoming values we can increase the complexity of nested if statements. We can now vectorize this loop: int foo(int A, int B, int n) { for (int i=0; i < n; i++) { int x = 9; if (A[i] > B[i]) { if (A[i] > 19) { x = 3; } else if (B[i] < 4 ) { x = 4; } else { x = 5; } } A[i] = x; } } llvm-svn: 181037
*	TBAA: remove !tbaa from testing cases if not used.	Manman Ren	2013-05-02	5	-120/+96
\| \| \| \| \| \| \|	This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180935
*	Add a test for the foldSelectICmpAndOr fix committed in r180779.	David Majnemer	2013-05-02	1	-0/+13
\| \| \| \| \| \| \|	This tests a case where C1 and C2 were the same but X and Y were different widths. llvm-svn: 180907
*	SROA: Generate selects instead of shuffles when blending values because this ↵	Nadav Rotem	2013-05-01	1	-14/+14
\| \| \| \| \| \| \| \|	is the cannonical form. Shuffles are more difficult to lower and we usually don't touch them, while we do optimize selects more often. llvm-svn: 180875
*	Revert "InstCombine: Fold more shuffles of shuffles."	Jim Grosbach	2013-05-01	2	-10/+11
\| \| \| \| \| \| \| \| \|	This reverts commit r180802 There's ongoing discussion about whether this is the right place to make this transformation. Reverting for now while we figure it out. llvm-svn: 180834
*	InstCombine: Fold more shuffles of shuffles.	Jim Grosbach	2013-04-30	2	-11/+10
\| \| \| \| \| \| \| \| \| \| \|	Always fold a shuffle-of-shuffle into a single shuffle when there's only one input vector in the first place. Continue to be more conservative when there's multiple inputs. rdar://13402653 PR15866 llvm-svn: 180802
*	TBAA: remove !tbaa from testing cases if not used.	Manman Ren	2013-04-30	22	-183/+100
\| \| \| \| \| \| \|	This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180796
*	Fix "Combine bit test + conditional or into simple math"	David Majnemer	2013-04-30	1	-0/+109
\| \| \| \| \| \| \| \| \|	This fixes the optimization introduced in r179748 and reverted in r179750. While the optimization was sound, it did not properly respect differences in bit-width. llvm-svn: 180777
*	SimplifyCFG: If convert single conditional stores	Arnold Schwaighofer	2013-04-29	1	-0/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This resurrects r179957, but adds code that makes sure we don't touch atomic/volatile stores: This transformation will transform a conditional store with a preceeding uncondtional store to the same location: a[i] = may-alias with a[i] load if (cond) a[i] = Y into an unconditional store. a[i] = X may-alias with a[i] load tmp = cond ? Y : X; a[i] = tmp We assume that on average the cost of a mispredicted branch is going to be higher than the cost of a second store to the same location, and that the secondary benefits of creating a bigger basic block for other optimizations to work on outway the potential case where the branch would be correctly predicted and the cost of the executing the second store would be noticably reflected in performance. hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With this change we are on par with gcc's performance (gcc also performs this transformation). There was a 1.2 % performance improvement on a ARM swift chip. Other tests in the test-suite+external seem to be mostly uninfluenced in my experiments: This optimization was triggered on 41 tests such that the executable was different before/after the patch. Only 1 out of the 40 tests (dealII) was reproducable below 100% (by about .4%). Given that hmmer benefits so much I believe this to be a fair trade off. llvm-svn: 180731
*	[objc-arc] Apply the RV optimization to retains next to calls in ↵	Michael Gottesman	2013-04-29	3	-51/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ObjCARCContract instead of ObjCARCOpts. Turning retains into retainRV calls disrupts the data flow analysis in ObjCARCOpts. Thus we move it as late as we can by moving it into ObjCARCContract. We leave in the conversion from retainRV -> retain in ObjCARCOpt since it enables the dataflow analysis. rdar://10813093 llvm-svn: 180698