bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Revert r219223, it creates invalid PHI nodes.	Joerg Sonnenberger	2014-10-12	3	-79/+3
\| \| \| \|	llvm-svn: 219587
*	InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 ↵	Benjamin Kramer	2014-10-12	1	-0/+11
\| \| \| \| \| \|	<u C-1) llvm-svn: 219585
*	InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X	David Majnemer	2014-10-11	1	-0/+17
\| \| \| \| \| \| \| \| \| \|	Consider the case where X is 2. (2 <<s 31)/s-2147483648 is zero but we would fold to X. Note that this is valid when we are in the unsigned domain because we require NUW: 2 <<u 31 results in poison. This fixes PR21245. llvm-svn: 219568
*	InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow	David Majnemer	2014-10-11	2	-14/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	consider: C1 = INT_MIN C2 = -1 C1 * C2 overflows without a doubt but consider the following: %x = i32 INT_MIN This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1. N. B. Move the unsigned version of this transform to InstSimplify, it doesn't create any new instructions. This fixes PR21243. llvm-svn: 219567
*	InstCombine: mul to shl shouldn't preserve nsw	David Majnemer	2014-10-11	4	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	consider: mul i32 nsw %x, -2147483648 this instruction will not result in poison if %x is 1 however, if we transform this into: shl i32 nsw %x, 31 then we will be generating poison because we just shifted into the sign bit. This fixes PR21242. llvm-svn: 219566
*	Return undef on FP <-> Int conversions that overflow (PR21330).	Sanjay Patel	2014-10-10	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The LLVM Lang Ref states for signed/unsigned int to float conversions: "If the value cannot fit in the floating point value, the results are undefined." And for FP to signed/unsigned int: "If the value cannot fit in ty2, the results are undefined." This matches the C definitions. The existing behavior pins to infinity or a max int value, but that may just lead to more confusion as seen in: http://llvm.org/bugs/show_bug.cgi?id=21130 Returning undef will hopefully lead to a less silent failure. Differential Revision: http://reviews.llvm.org/D5603 llvm-svn: 219542
*	This patch teaches ScalarEvolution to pick and use !range metadata.	Sanjoy Das	2014-10-10	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \|	It also makes it more aggressive in querying range information by adding a call to isKnownPredicateWithRanges to isLoopBackedgeGuardedByCond and isLoopEntryGuardedByCond. phabricator: http://reviews.llvm.org/D5638 Reviewed by: atrick, hfinkel llvm-svn: 219532
*	This patch de-pessimizes the calculation of loop trip counts in	Mark Heffernan	2014-10-10	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ScalarEvolution in the presence of multiple exits. Previously all loops exits had to have identical counts for a loop trip count to be considered computable. This pessimization was implemented by calling getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock) inside of ScalarEvolution::getSmallConstantTripCount() (see the FIXME in the comments of that function). The pessimization was added to fix a corner case involving undefined behavior (pr/16130). This patch more precisely handles the undefined behavior case allowing the pessimization to be removed. ControlsExit replaces IsSubExpr to more precisely track the case where undefined behavior is expected to occur. Because undefined behavior is tracked more precisely we can remove MustExit from ExitLimit. MustExit was used to track the case where the limit was computed potentially assuming undefined behavior even if undefined behavior didn't necessarily occur. llvm-svn: 219517
*	SimplifyCFG: Don't convert phis into selects if we could remove undef behavior	Arnold Schwaighofer	2014-10-10	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instead We used to transform this: define void @test6(i1 %cond, i8* %ptr) { entry: br i1 %cond, label %bb1, label %bb2 bb1: br label %bb2 bb2: %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ] store i8 2, i8* %ptr.2, align 8 ret void } into this: define void @test6(i1 %cond, i8* %ptr) { %ptr.2 = select i1 %cond, i8* null, i8* %ptr store i8 2, i8* %ptr.2, align 8 ret void } because the simplifycfg transformation into selects would happen to happen before the simplifycfg transformation that removes unreachable control flow (We have 'unreachable control flow' due to the store to null which is undefined behavior). The existing transformation that removes unreachable control flow in simplifycfg is: /// If BB has an incoming value that will always trigger undefined behavior /// (eg. null pointer dereference), remove the branch leading here. static bool removeUndefIntroducingPredecessor(BasicBlock BB) Now we generate: define void @test6(i1 %cond, i8 %ptr) { store i8 2, i8* %ptr.2, align 8 ret void } I did not see any impact on the test-suite + externals. rdar://18596215 llvm-svn: 219462
*	[Reassociate] Don't canonicalize X - undef to X + (-undef).	Chad Rosier	2014-10-09	1	-0/+21
\| \| \| \| \| \| \|	Phabricator Revision: http://reviews.llvm.org/D5674 PR21205 llvm-svn: 219434
*	[InstCombine] Fix wrong folding of constant comparisons involving ashr and ↵	Andrea Di Biagio	2014-10-09	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	negative values. This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we wrongly computed the distance between the highest bits set of two negative values. This fixes PR21222. Differential Revision: http://reviews.llvm.org/D5700 llvm-svn: 219406
*	Inliner: Non-local functions in COMDATs shouldn't be dropped	David Majnemer	2014-10-08	1	-0/+18
\| \| \| \| \| \| \| \| \| \|	A function with discardable linkage cannot be discarded if its a member of a COMDAT group without considering all the other COMDAT members as well. This sort of thing is already handled by GlobalOpt/GlobalDCE. This fixes PR21206. llvm-svn: 219335
*	Revert "[InstCombine] re-commit r218721 with fix for pr21199"	Justin Bogner	2014-10-08	3	-153/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This seems to cause a miscompile when building clang, which causes a bootstrapped clang to fail or crash in several of its tests. See: http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184 http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813 This reverts commit r219282. llvm-svn: 219317
*	GlobalOpt: Don't drop unused memberes of a Comdat	David Majnemer	2014-10-08	1	-0/+19
\| \| \| \| \| \| \| \| \|	A linkonce_odr member of a COMDAT shouldn't be dropped if we need to keep the entire COMDAT group. This fixes PR21191. llvm-svn: 219283
*	[InstCombine] re-commit r218721 with fix for pr21199	Gerolf Hoflehner	2014-10-08	3	-1/+153
\| \| \| \| \| \| \| \|	The icmp-select-icmp optimization targets select-icmp.eq only. This is now ensured by testing the branch predicate explictly. This commit also includes the test case for pr21199. llvm-svn: 219282
*	Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization	Hans Wennborg	2014-10-08	1	-1/+1
\| \| \| \| \| \|	This seems to have caused PR21199. llvm-svn: 219264
*	LoopUnroll: Create sub-loops in LoopInfo	Duncan P. N. Exon Smith	2014-10-07	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \|	`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so. In particular, tell `LoopInfo` about copies of inner loops when unrolling the outer loop. Conservatively, also tell `ScalarEvolution` to forget about the original versions of these loops, since their inputs may have changed. Fixes PR20987. llvm-svn: 219241
*	Two case switch to select optimization	Marcello Maggioni	2014-10-07	3	-3/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block to a select or a couple of selects (depending if the default block is reachable or not). The typical case this optimization wants to be able to optimize is this one: Example: switch (a) { case 10: %0 = icmp eq i32 %a, 10 return 10; %1 = select i1 %0, i32 10, i32 4 case 20: ----> %2 = icmp eq i32 %a, 20 return 2; %3 = select i1 %2, i32 2, i32 %1 default: return 4; } It also sets the base for further optimizations that are planned and being reviewed. llvm-svn: 219223
*	DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are ↵	David Blaikie	2014-10-07	1	-47/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	updated even when DAE removes both varargs and non-varargs arguments on the same function. After some stellar (& inspired) help from Reid Kleckner providing a test case for some rather unstable undefined behavior showing up as assertions produced by r214761, I was able to fix this issue in DAE involving the application of both varargs removal, followed by normal argument removal. Indeed I introduced this same bug into ArgumentPromotion (r212128) by copying the code from DAE, and when I fixed the bug in ArgPromo (r213805) and commented in that patch that I didn't need to address the same issue in DAE because it was a single pass. Turns out it's two pass, one for the varargs and one for the normal arguments, so the same fix is needed (at least during varargs removal). So here it is. (the observable/net effect of this bug, even when it didn't result in assertion failure, is that debug info would describe the DAE'd function in the abstract, but wouldn't provide high/low_pc, variable locations, line table, etc (it would appear as though the function had been entirely optimized away), see the original PR14016 for details of the general problem) I'm not recommitting the assertion just yet, as there's been another regression of it since I last tried. It might just be a few test cases weren't adequately updated after Adrian or Duncan's recent schema changes. llvm-svn: 219210
*	Remove Extra lines. NFC.	Suyog Sarda	2014-10-07	1	-2/+0
\| \| \| \|	llvm-svn: 219201
*	GlobalDCE: Don't drop any COMDAT members	David Majnemer	2014-10-07	1	-0/+17
\| \| \| \| \| \| \| \| \|	If we require a single member of a comdat, require all of the other members as well. This fixes PR20981. llvm-svn: 219191
*	[InstCombine] re-commit r218721 icmp-select-icmp optimization	Gerolf Hoflehner	2014-10-07	1	-1/+1
\| \| \| \| \| \| \| \| \|	Takes care of the assert that caused build fails. Rather than asserting the code checks now that the definition and use are in the same block, and does not attempt to optimize when that is not the case. llvm-svn: 219175
*	Give the Reassociate pass a bit more flexibility and autonomy when ↵	Owen Anderson	2014-10-05	1	-0/+15
\| \| \| \| \| \| \| \| \| \|	optimizing expressions. Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical. Patch by Dmitri Shtilman. llvm-svn: 219092
*	[InstCombine] Remove redundant @llvm.assume intrinsics	Hal Finkel	2014-10-04	1	-0/+55
\| \| \| \| \| \| \| \| \|	For any @llvm.assume intrinsic, if there is another which dominates it and uses the same condition, then it is redundant and can be removed. While this does not alter the semantics of the @llvm.assume intrinsics, it makes subsequent handling more efficient (and the resulting IR easier to read). llvm-svn: 219067
*	PR21145: Teach LLVM about C++14 sized deallocation functions.	Richard Smith	2014-10-03	1	-0/+23
\| \| \| \| \| \| \| \|	C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014
*	Revert "Revert "DI: Fold constant arguments into a single MDString""	Duncan P. N. Exon Smith	2014-10-03	47	-498/+496
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010
*	Revert r215343.	James Molloy	2014-10-03	1	-33/+0
\| \| \| \| \| \|	This was contentious and needs invesigation. llvm-svn: 218971
*	Revert "DI: Fold constant arguments into a single MDString"	Duncan P. N. Exon Smith	2014-10-02	47	-496/+498
\| \| \| \| \| \|	This reverts commit r218914 while I investigate some bots. llvm-svn: 218918
*	DI: Fold constant arguments into a single MDString	Duncan P. N. Exon Smith	2014-10-02	47	-498/+496
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914
*	Remove unused function attribute params.	Sanjay Patel	2014-10-02	1	-2/+2
\| \| \| \|	llvm-svn: 218909
*	Optimize square root squared (PR21126).	Sanjay Patel	2014-10-02	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \|	When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X. This can happen in the real world when calculating x ** 3/2. This occurs in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c. Differential Revision: http://reviews.llvm.org/D5584 llvm-svn: 218906
*	[BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for ↵	Zinovy Nis	2014-10-02	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \|	non-commutative instructions My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]' thus inverting index expression. This patch fixes it. Thanks to Jörg Sonnenberger for pointing. Differential Revision: http://reviews.llvm.org/D5576 llvm-svn: 218867
*	Make the sqrt intrinsic return undef for a negative input.	Sanjay Patel	2014-10-01	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html And again here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html The sqrt of a negative number when using the llvm intrinsic is undefined. We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref. This change should not affect any code that isn't using "no-nans-fp-math"; ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call. Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. We knowingly approve of this difference with the other compilers in an attempt to flag code that is invoking undefined behavior. A front-end warning should also try to convince the user that the program will fail: http://llvm.org/bugs/show_bug.cgi?id=21093 Differential Revision: http://reviews.llvm.org/D5527 llvm-svn: 218803
*	Move the complex address expression out of DIVariable and into an extra	Adrian Prantl	2014-10-01	26	-119/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
*	Revert r218778 while investigating buldbot breakage.	Adrian Prantl	2014-10-01	26	-119/+119
\| \| \| \| \| \|	"Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
*	Move the complex address expression out of DIVariable and into an extra	Adrian Prantl	2014-10-01	26	-119/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
*	Revert r218721, r218735.	Evgeniy Stepanov	2014-10-01	2	-128/+1
\| \| \| \| \| \| \| \| \| \|	Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 llvm-svn: 218752
*	[InstCombine] Optimize icmp-select-icmp	Gerolf Hoflehner	2014-10-01	2	-1/+128
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In special cases select instructions can be eliminated by replacing them with a cheaper bitwise operation even when the select result is used outside its home block. The instances implemented are patterns like %x=icmp.eq %y=select %x,%r, null %z=icmp.eq\|neq %y, null br %z,true, false ==> %x=icmp.ne %y=icmp.eq %r,null %z=or %x,%y br %z,true,false The optimization is integrated into the instruction combiner and performed only when all uses of the select result can be replaced by the select operand proper. For this dominator information is used and dominance is now a required analysis pass in the combiner. The optimization itself is iterative. The critical step is to replace the select result with the non-constant select operand. So the select becomes local and the combiner iteratively works out simpler code pattern and eventually eliminates the select. rdar://17853760 llvm-svn: 218721
*	[SimplifyCFG] threshold for folding branches with common destination	Jingyue Wu	2014-09-30	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example: __global__ void foo(int a, int b, int c, int d, int e, int n, const int input, int output) { int sum = 0; for (int i = 0; i < n; ++i) sum += (((i ^ a) > b) && (((i \| c ) ^ d) > e)) ? 0 : input[i]; *output = sum; } The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i \| c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i \| c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like: sum += (((i ^ a) > b) & (((i \| c ) ^ d) > e)) ? 0 : input[i]; Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711
*	[IndVarSimplify] Widen loop unsigned compares.	Chad Rosier	2014-09-30	1	-0/+28
\| \| \| \| \| \| \|	This patch extends r217953 to handle unsigned comparison. Phabricator revision: http://reviews.llvm.org/D5526 llvm-svn: 218659
*	[AArch64] Improve cost model to handle sdiv by a pow-of-two.	Chad Rosier	2014-09-29	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \|	This patch improves the target-specific cost model to better handle signed division by a power of two. The immediate result is that this enables the SLP vectorizer to do a better job. http://reviews.llvm.org/D5469 PR20714 llvm-svn: 218607
*	Use a loop to simplify the runtime unrolling prologue.	Kevin Qin	2014-09-29	4	-19/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Runtime unrolling will create a prologue to execute the extra iterations which is can't divided by the unroll factor. It generates an if-then-else sequence to jump into a factor -1 times unrolled loop body, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: if (extraiters == loopfactor) jump L1 if (extraiters == loopfactor-1) jump L2 ... L1: LoopBody; L2: LoopBody; ... if tripcount < loopfactor jump End Loop: ... End: It means if the unroll factor is 4, the loop body will be 7 times unrolled, 3 are in loop prologue, and 4 are in the loop. This commit is to use a loop to execute the extra iterations in prologue, like extraiters = tripcount % loopfactor if (extraiters == 0) jump Loop: else jump Prol Prol: LoopBody; extraiters -= 1 // Omitted if unroll factor is 2. if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2. if (tripcount < loopfactor) jump End Loop: ... End: Then when unroll factor is 4, the loop body will be copied by only 5 times, 1 in the prologue loop, 4 in the original loop. And if the unroll factor is 2, new loop won't be created, just as the original solution. llvm-svn: 218604
*	[IndVar] Don't widen loop compare unless IV user is sign extended.	Chad Rosier	2014-09-26	1	-0/+26
\| \| \| \| \| \|	PR21030 llvm-svn: 218539
*	Ignore annotation function calls in cost computation	David Peixotto	2014-09-26	1	-0/+133
\| \| \| \| \| \| \| \| \| \| \|	The annotation instructions are dropped during codegen and have no impact on size. In some cases, the annotations were preventing the unroller from unrolling a loop because the annotation calls were pushing the cost over the unrolling threshold. Differential Revision: http://reviews.llvm.org/D5335 llvm-svn: 218525
*	Fix assertion in LICM doFinalization()	David Peixotto	2014-09-24	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The doFinalization method checks that the LoopToAliasSetMap is empty. LICM populates that map as it runs through the loop nest, deleting the entries for child loops as it goes. However, if a child loop is deleted by another pass (e.g. unrolling) then the loop will never be deleted from the map because LICM walks the loop nest to find entries it can delete. The fix is to delete the loop from the map and free the alias set when the loop is deleted from the loop nest. Differential Revision: http://reviews.llvm.org/D5305 llvm-svn: 218387
*	GlobalOpt: Preserve comdats of unoptimized initializers	Reid Kleckner	2014-09-23	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped. llvm-svn: 218337
*	[IndVarSimplify] Partially revert r217953 to see if this fixes the bots.	Chad Rosier	2014-09-17	1	-28/+0
\| \| \| \| \| \|	Specifically, disable widening of unsigned compare instructions. llvm-svn: 217962
*	[IndVarSimplify] Widen loop compare instructions.	Chad Rosier	2014-09-17	3	-5/+172
\| \| \| \| \| \| \|	This improves other optimizations such as LSR. A sext may be added to the compare's other operand, but this can often be hoisted outside of the loop. llvm-svn: 217953
*	[InstCombine] Fix wrong folding of constant comparison involving ahsr and ↵	Andrea Di Biagio	2014-09-17	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	negative quantities (PR20945). Example: define i1 @foo(i32 %a) { %shr = ashr i32 -9, %a %cmp = icmp ne i32 %shr, -5 ret i1 %cmp } Before this fix, the instruction combiner wrongly thought that %shr could have never been equal to -5. Therefore, %cmp was always folded to 'true'. However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore, in this example, it is not valid to fold %cmp to 'true'. The problem was only affecting the case where the comparison was between negative quantities where one of the quantities was obtained from arithmetic shift of a negative constant. This patch fixes the problem with the wrong folding (fixes PR20945). With this patch, the 'icmp' from the example is now simplified to a comparison between %a and 1. This still allows us to get rid of the arithmetic shift (%shr). llvm-svn: 217950
*	InstSimplify: Don't allow (x srem y) urem y -> x srem y	David Majnemer	2014-09-17	1	-2/+21
\| \| \| \| \| \| \| \| \| \| \|	Let's consider the case where: %x i16 = 32768 %y i16 = 384 %x srem %y = 65408 (%x srem %y) urem %y = 128 llvm-svn: 217939