bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander.	Andrew Trick	2013-07-13	1	-0/+113
\| \| \| \| \| \| \| \| \| \| \|	In general, one should always complete CFG modifications first, update CFG-based analyses, like Dominators and LoopInfo, then generate instruction sequences. LoopVectorizer was creating a new loop, calling SCEVExpander to generate checks, then updating LoopInfo. I just changed the order. llvm-svn: 186241
*	Add a microoptimization for urem.	Nick Lewycky	2013-07-13	1	-0/+9
\| \| \| \|	llvm-svn: 186235
*	Fix logic error optimizing "icmp pred (urem X, Y), Y" where pred is signed.	Nick Lewycky	2013-07-12	1	-1/+9
\| \| \| \| \| \|	Fixes PR16605. llvm-svn: 186229
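The commit message does not spell out the logic error. One plausible reading, which is an assumption here, is that a remainder is always unsigned-less-than its divisor but not necessarily signed-less-than it. The sketch below brute-forces that claim over all 8-bit values; `as_signed8`, `count_urem_lt_failures` and `check_urem_compare` are hypothetical names, not LLVM functions.

```c
#include <assert.h>

/* Reinterpret an 8-bit pattern as a signed two's-complement value. */
static int as_signed8(unsigned v) {
    return v < 128 ? (int)v : (int)v - 256;
}

/* Count 8-bit pairs (x, y) with y != 0 where "(x urem y) < y" is false.
   signed_compare selects a signed or an unsigned comparison. */
int count_urem_lt_failures(int signed_compare) {
    int failures = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 1; y < 256; ++y) {
            unsigned r = x % y;
            int holds = signed_compare ? as_signed8(r) < as_signed8(y)
                                       : r < y;
            if (!holds) ++failures;
        }
    }
    return failures;
}

void check_urem_compare(void) {
    /* Unsigned: the remainder is always below the divisor. */
    assert(count_urem_lt_failures(0) == 0);
    /* Signed: a divisor with the top bit set reads as negative. */
    assert(count_urem_lt_failures(1) > 0);
    /* Concrete case: 1 urem 200 = 1, but 200 is -56 as a signed byte. */
    assert(!(as_signed8(1 % 200) < as_signed8(200)));
}
```

Calling `check_urem_compare()` exercises both comparisons; the signed failures all come from divisors whose top bit is set.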
*	Fix a crash in EvaluateInDifferentElementOrder where it would generate an	Joey Gouly	2013-07-12	1	-0/+15
\| \| \| \| \| \| \| \|	undef vector of the wrong type. LGTM'd by Nick Lewycky on IRC. llvm-svn: 186224
*	LFTR improvement to avoid truncation.	Andrew Trick	2013-07-12	1	-0/+44
\| \| \| \| \| \|	This is a reimplementation of the patch originally in r186107. llvm-svn: 186215
*	X86 cost model: Add cost for vectorized gather/scatter	Arnold Schwaighofer	2013-07-12	1	-0/+86
\| \| \| \| \| \|	radar://14351991 llvm-svn: 186189
*	ARM cost model: Add cost for gather/scatter	Arnold Schwaighofer	2013-07-12	1	-0/+88
\| \| \| \| \| \| \| \| \| \|	Fixes a 35% degradation compared to unvectorized code in MiBench/automotive-susan and an equally serious regression on a private image processing benchmark. radar://14351991 llvm-svn: 186188
*	Start using CHECK-LABEL in some tests.	Stephen Lin	2013-07-12	1	-5/+5
\| \| \| \|	llvm-svn: 186163
*	Revert "indvars: Improve LFTR by eliminating truncation when comparing	Chandler Carruth	2013-07-12	1	-25/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	against a constant." This reverts commit r186107. It didn't handle wrapping arithmetic in the loop correctly and thus caused the following C program to count from 0 to UINT64_MAX instead of from 0 to 255 as intended: #include <stdio.h> int main() { unsigned char first = 0, last = 255; do { printf("%d\n", first); } while (first++ != last); } Full test case and instructions to reproduce with just the -indvars pass sent to the original review thread rather than to r186107's commit. llvm-svn: 186152
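The loop quoted in the revert message can be restated without the `printf` so that its trip count is easy to check. This is a minimal sketch: `count_iterations` and `check_loop_trip_count` are hypothetical helpers, and only the `do`/`while` shape comes from the commit message.

```c
#include <assert.h>

/* Run the do/while loop from the revert message and return how many
   times the body executes. The 8-bit counter wraps on overflow. */
unsigned count_iterations(unsigned char first, unsigned char last) {
    unsigned iterations = 0;
    do {
        ++iterations;
        /* An 8-bit counter visits at most 256 distinct values. */
        assert(iterations <= 256);
    } while (first++ != last);
    return iterations;
}

void check_loop_trip_count(void) {
    /* Counting from 0 to 255 inclusive runs the body 256 times. */
    assert(count_iterations(0, 255) == 256);
    /* Equal bounds still run the body once. */
    assert(count_iterations(5, 5) == 1);
}
```

A correct rewrite of the exit test has to preserve this trip count of 256, which is what the reverted patch failed to do according to the message.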
*	SLPVectorizer: Sink and enable CSE for ExtractElements.	Nadav Rotem	2013-07-12	3	-4/+4
\| \| \| \|	llvm-svn: 186145
*	SLPVectorize: Replace the code that checks for vectorization candidates in ↵	Nadav Rotem	2013-07-12	1	-0/+74
\| \| \| \| \| \| \| \|	successor blocks with code that scans PHINodes. Before we could vectorize PHINodes, scanning successors was a good way of finding candidates. Now we can vectorize the PHINodes, which is simpler. llvm-svn: 186139
*	indvars: Improve LFTR by eliminating truncation when comparing against a ↵	Andrew Trick	2013-07-11	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	constant. Patch by Michele Scandale! Adds special handling of the case where, during the loop exit condition rewriting, the exit value is a constant of bitwidth lower than the type of the induction variable: instead of introducing a trunc operation to make the operand types match, it converts the constant value to an equivalent constant, depending on the initial value of the induction variable and the trip count, in order to have an equivalent comparison between the induction variable and the new constant. llvm-svn: 186107
*	LoopVectorize: Vectorize all accesses in address space zero with unit stride	Arnold Schwaighofer	2013-07-11	1	-0/+61
\| \| \| \| \| \| \| \| \| \| \|	We can vectorize them because in the case where we wrap in the address space the unvectorized code would have had to access a pointer value of zero which is undefined behavior in address space zero according to the LLVM IR semantics. (Thank you Duncan, for pointing this out to me). Fixes PR16592. llvm-svn: 186088
*	TryToSimplifyUncondBranchFromEmptyBlock was checking that any common	Duncan Sands	2013-07-11	1	-1/+239
\| \| \| \| \| \| \| \| \| \|	predecessors of the two blocks it is attempting to merge supply the same incoming values to any phi in the successor block. This change allows merging in the case where there is one or more incoming values that are undef. The undef values are rewritten to match the non-undef value that flows from the other edge. Patch by Mark Lacey. llvm-svn: 186069
*	Consolidate more lit tests.	Nadav Rotem	2013-07-11	3	-62/+54
\| \| \| \|	llvm-svn: 186063
*	Consolidate some of the lit tests.	Nadav Rotem	2013-07-11	4	-75/+57
\| \| \| \|	llvm-svn: 186062
*	Consolidate some of the lit tests.	Nadav Rotem	2013-07-11	5	-61/+191
\| \| \| \|	llvm-svn: 186060
*	Teach TailRecursionElimination to handle certain cases of nocapture escaping ↵	Michael Gottesman	2013-07-11	2	-25/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	allocas. Without the changes introduced into this patch, if TRE saw any allocas at all, TRE would not perform TRE or mark callsites with the tail marker. Because TRE runs after mem2reg, this inadequacy is not a death sentence. But given a callsite A without escaping alloca argument, A may not be able to have the tail marker placed on it due to a separate callsite B having a write-back parameter passed in via an argument with the nocapture attribute. Assume that B is the only other callsite besides A and B only has nocapture escaping alloca arguments (NOTE B may have other arguments that are not passed allocas). In this case not marking A with the tail marker is unnecessarily conservative since: 1. By assumption A has no escaping alloca arguments itself so it can not access the caller's stack via its arguments. 2. Since all of B's escaping alloca arguments are passed as parameters with the nocapture attribute, we know that B does not stash said escaping allocas in a manner that outlives B itself and thus could be accessed indirectly by A. With the changes introduced by this patch: 1. If we see any escaping allocas passed as a capturing argument, we do nothing and bail early. 2. If we do not see any escaping allocas passed as captured arguments but we do see escaping allocas passed as nocapture arguments: i. We do not perform TRE to avoid PR962 since the code generator produces significantly worse code for the dynamic allocas that would be created by the TRE algorithm. ii. If we do not return twice, mark call sites without escaping allocas with the tail marker. NOTE This excludes functions with escaping nocapture allocas. 3. If we do not see any escaping allocas at all (whether captured or not): i. If we do not have usage of setjmp, mark all callsites with the tail marker. ii. If there are no dynamic/variable sized allocas in the function, attempt to perform TRE on all callsites in the function. Based off of a patch by Nick Lewycky. rdar://14324281. llvm-svn: 186057
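The three numbered cases in this message can be read as a small decision procedure. The sketch below is only an illustration of that reading, not the pass's code: every type and function name is hypothetical, "return twice" and "usage of setjmp" are assumed to be the same condition, and steps 3.i and 3.ii are assumed to apply independently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the pass has observed in one function. */
typedef struct {
    bool has_captured_escaping_alloca;   /* alloca passed as a capturing argument */
    bool has_nocapture_escaping_alloca;  /* alloca passed as a nocapture argument */
    bool returns_twice;                  /* assumed equivalent to "uses setjmp" */
    bool has_dynamic_alloca;             /* dynamic/variable sized alloca present */
} FunctionFacts;

typedef enum {
    MARK_NONE,
    MARK_CALLS_WITHOUT_ESCAPING_ALLOCAS,
    MARK_ALL
} TailMarking;

typedef struct {
    TailMarking marking;
    bool attempt_tre;
} TrePlan;

TrePlan plan_tre(FunctionFacts f) {
    TrePlan plan = { MARK_NONE, false };
    if (f.has_captured_escaping_alloca)
        return plan;                              /* case 1: bail early */
    if (f.has_nocapture_escaping_alloca) {        /* case 2: never TRE */
        if (!f.returns_twice)
            plan.marking = MARK_CALLS_WITHOUT_ESCAPING_ALLOCAS;
        return plan;
    }
    if (!f.returns_twice)                         /* case 3.i */
        plan.marking = MARK_ALL;
    if (!f.has_dynamic_alloca)                    /* case 3.ii */
        plan.attempt_tre = true;
    return plan;
}

void check_tre_plan(void) {
    TrePlan p = plan_tre((FunctionFacts){ true, true, false, false });
    assert(p.marking == MARK_NONE && !p.attempt_tre);
    p = plan_tre((FunctionFacts){ false, true, false, false });
    assert(p.marking == MARK_CALLS_WITHOUT_ESCAPING_ALLOCAS && !p.attempt_tre);
    p = plan_tre((FunctionFacts){ false, false, false, false });
    assert(p.marking == MARK_ALL && p.attempt_tre);
}
```

Returning a plan rather than mutating anything keeps the three cases visible side by side; the real pass works on call sites and IR, which this sketch does not model.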
*	InstSimplify: X >> X -> 0	David Majnemer	2013-07-09	2	-3/+19
\| \| \| \|	llvm-svn: 185973
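A quick way to see why `X >> X` folds to 0: any in-range shift amount x satisfies x < 2^x, so shifting x right by x bits leaves nothing. The sketch below checks this for a 32-bit logical shift; `shift_by_self` and `check_shift_by_self` are hypothetical names, and only the logical (unsigned) shift is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift of a value by itself. Shift amounts of 32 or more are
   undefined for a 32-bit value in C, so they are excluded here. */
uint32_t shift_by_self(uint32_t x) {
    assert(x < 32);
    return x >> x;
}

void check_shift_by_self(void) {
    /* Every in-range shift amount produces zero. */
    for (uint32_t x = 0; x < 32; ++x)
        assert(shift_by_self(x) == 0);
}
```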
*	Fix PR16571, which is a bug in the code that checks that all of the types in ↵	Nadav Rotem	2013-07-09	1	-0/+22
\| \| \| \| \| \|	the bundle are uniform. llvm-svn: 185970
*	ValueTracking: Fix bugs in isKnownToBeAPowerOfTwo	David Majnemer	2013-07-09	1	-15/+0
\| \| \| \| \| \| \|	(add nsw x, (and x, y)) isn't a power of two if x is zero; it's zero. (add nsw x, (xor x, y)) isn't a power of two if y has bits set that aren't set in x. llvm-svn: 185954
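Both counterexamples from this message are easy to reproduce with concrete numbers. The sketch below uses hypothetical helper names and plain 32-bit unsigned arithmetic; the inputs are chosen small enough that no overflow occurs, so the `nsw` flag plays no role.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

uint32_t add_and(uint32_t x, uint32_t y) { return x + (x & y); }
uint32_t add_xor(uint32_t x, uint32_t y) { return x + (x ^ y); }

void check_power_of_two_counterexamples(void) {
    /* x == 0: the sum is zero, which is not a power of two. */
    assert(add_and(0, 5) == 0);
    assert(!is_power_of_two(add_and(0, 5)));
    /* x = 4, y = 1: y has a bit that x lacks, and 4 + (4 ^ 1) = 9. */
    assert(add_xor(4, 1) == 9);
    assert(!is_power_of_two(add_xor(4, 1)));
    /* When y's bits are a subset of x's, the sum stays a power of two. */
    assert(is_power_of_two(add_xor(4, 0)));
    assert(is_power_of_two(add_xor(4, 4)));
}
```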
*	InstCombine: variations on 0xffffffff - x >= 4	David Majnemer	2013-07-09	1	-0/+18
\| \| \| \| \| \| \| \| \| \|	The following transforms are valid if -C is a power of 2: (icmp ugt (xor X, C), ~C) -> (icmp ult X, C); (icmp ult (xor X, C), -C) -> (icmp uge X, C). These are nice, they get rid of the xor. llvm-svn: 185915
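The two xor folds can be checked exhaustively at a small bit width. The sketch below does so for 8-bit values, restricted to constants C whose negation is a power of two as the message requires; the function names are hypothetical, and the 8-bit width is chosen only to keep the search small.

```c
#include <assert.h>
#include <stdbool.h>

static bool is_power_of_two8(unsigned v) {
    return v != 0 && (v & (v - 1)) == 0;
}

/* Count 8-bit (X, C) pairs, with -C a power of two, where either fold
   disagrees with the original comparison. */
int count_xor_fold_mismatches(void) {
    int mismatches = 0;
    for (unsigned c = 0; c < 256; ++c) {
        unsigned neg_c = (0u - c) & 0xFFu;  /* -C in 8 bits */
        unsigned not_c = ~c & 0xFFu;        /* ~C in 8 bits */
        if (!is_power_of_two8(neg_c))
            continue;
        for (unsigned x = 0; x < 256; ++x) {
            unsigned xored = (x ^ c) & 0xFFu;
            /* (icmp ugt (xor X, C), ~C) versus (icmp ult X, C) */
            if ((xored > not_c) != (x < c)) ++mismatches;
            /* (icmp ult (xor X, C), -C) versus (icmp uge X, C) */
            if ((xored < neg_c) != (x >= c)) ++mismatches;
        }
    }
    return mismatches;
}

void check_xor_folds(void) {
    assert(count_xor_fold_mismatches() == 0);
}
```

The folds work because such a C has all of its high bits set and all of its low bits clear, so the xor only flips the high bits of X.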
*	InstCombine: X & -C != -C -> X <= u ~C	David Majnemer	2013-07-09	1	-40/+0
\| \| \| \| \| \|	Tests were added in r185910 somehow. llvm-svn: 185912
*	Commit r185909 was a misapplied patch, fix it	David Majnemer	2013-07-09	2	-3/+60
\| \| \| \|	llvm-svn: 185910
*	InstCombine: add more transforms	David Majnemer	2013-07-09	3	-5/+45
\| \| \| \| \| \| \| \| \|	C1-X <u C2 -> (X\|(C2-1)) == C1; C1-X >u C2 -> (X\|C2) == C1; X-C1 <u C2 -> (X & -C2) == C1; X-C1 >u C2 -> (X & ~C2) == C1. llvm-svn: 185909
*	InstCombine: Fold X-C1 <u 2 -> (X & -2) == C1	David Majnemer	2013-07-08	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \|	Back in r179493 we determined that two transforms collided with each other. The fix back then was to reorder the transforms so that the preferred transform would give it a try and then we would try the secondary transform. However, it was noted that the best approach would canonicalize one transform into the other, removing the collision and allowing us to optimize IR given to us in that form. llvm-svn: 185808
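The fold named in the subject line, `X-C1 <u 2 -> (X & -2) == C1`, can be brute-forced at 8 bits. The message states no precondition on C1; the search below suggests the two forms agree exactly when C1 is even, which is an observation from this sketch rather than a statement about the committed code. All names are hypothetical.

```c
#include <assert.h>

/* For a fixed 8-bit C1, count the X values where "X - C1 <u 2" and
   "(X & -2) == C1" disagree. */
int count_sub_fold_mismatches(unsigned c1) {
    int mismatches = 0;
    for (unsigned x = 0; x < 256; ++x) {
        int before = ((x - c1) & 0xFFu) < 2u;  /* wrapping 8-bit subtract */
        int after = (x & 0xFEu) == c1;         /* -2 is 0xFE in 8 bits */
        if (before != after) ++mismatches;
    }
    return mismatches;
}

void check_sub_fold(void) {
    for (unsigned c1 = 0; c1 < 256; ++c1) {
        if (c1 % 2 == 0)
            assert(count_sub_fold_mismatches(c1) == 0);
        else
            assert(count_sub_fold_mismatches(c1) > 0);
    }
}
```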
*	[objc-arc] Committed test for r185770 as per dblaikie's suggestion.	Michael Gottesman	2013-07-08	1	-0/+19
\| \| \| \|	llvm-svn: 185782
*	Eliminate trivial redundant loads across nocapture+readonly calls to uncaptured	Nick Lewycky	2013-07-07	1	-0/+17
\| \| \| \| \| \|	pointer arguments. llvm-svn: 185776
*	SLPVectorizer: Implement DCE as part of vectorization.	Nadav Rotem	2013-07-07	16	-8/+645
\| \| \| \| \| \| \| \| \|	This is a complete rewrite of the bottom-up vectorization class. Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization. There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design. In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree. llvm-svn: 185774
*	[objc-arc] Remove the alias analysis part of r185764.	Michael Gottesman	2013-07-07	1	-18/+0
\| \| \| \| \| \| \|	Upon further reflection, the alias analysis part of r185764 is not a safe change. llvm-svn: 185770
*	[objc-arc] Teach the ARC optimizer that objc_sync_enter/objc_sync_exit do ↵	Michael Gottesman	2013-07-07	2	-5/+42
\| \| \| \| \| \|	not modify the ref count of an objc object and additionally are inert for modref purposes. llvm-svn: 185769
*	InstCombine: typo in or_icmp_eq_B_0_icmp_ult_A_B test	David Majnemer	2013-07-06	1	-2/+2
\| \| \| \|	llvm-svn: 185737
*	Extend 'readonly' and 'readnone' to work on function arguments as well as	Nick Lewycky	2013-07-06	7	-21/+79
\| \| \| \| \| \| \|	functions. Make the function attributes pass add them to known library functions and wherever it can deduce them. llvm-svn: 185735
*	[TRE] Combined another test into basic.ll	Michael Gottesman	2013-07-05	2	-13/+1
\| \| \| \|	llvm-svn: 185729
*	[TRE] Merged several tests into the test basic.ll.	Michael Gottesman	2013-07-05	5	-62/+58
\| \| \| \|	llvm-svn: 185723
*	InstCombine: (icmp eq B, 0) \| (icmp ult A, B) -> (icmp ule A, B-1)	David Majnemer	2013-07-05	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This transform allows us to turn IR that looks like: %1 = icmp eq i64 %b, 0 %2 = icmp ult i64 %a, %b %3 = or i1 %1, %2 ret i1 %3 into: %0 = add i64 %b, -1 %1 = icmp uge i64 %0, %a ret i1 %1 which means we go from lowering: cmpq %rsi, %rdi setb %cl testq %rsi, %rsi sete %al orb %cl, %al ret to lowering: decq %rsi cmpq %rdi, %rsi setae %al ret llvm-svn: 185677
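The equivalence behind this fold relies on unsigned wraparound: when B is 0, B-1 wraps to the maximum value and the comparison is always true. The sketch below checks the fold over all 8-bit pairs; the function names are hypothetical and the narrow width is only there to keep the search exhaustive.

```c
#include <assert.h>
#include <stdbool.h>

/* Count 8-bit (A, B) pairs where "(B == 0) | (A <u B)" and
   "A <=u B - 1" disagree. */
int count_or_fold_mismatches(void) {
    int mismatches = 0;
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            bool before = (b == 0) || (a < b);
            unsigned b_minus_1 = (b - 1u) & 0xFFu;  /* wraps to 255 when b == 0 */
            bool after = a <= b_minus_1;
            if (before != after) ++mismatches;
        }
    }
    return mismatches;
}

void check_or_fold(void) {
    assert(count_or_fold_mismatches() == 0);
}
```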
*	InstCombine: Reimplementation of visitUDivOperand	David Majnemer	2013-07-04	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \|	This transform was originally added in r185257 but later removed in r185415. The original transform would create instructions speculatively and then discard them if the speculation was proved incorrect. This has been replaced with a scheme that splits the transform into two parts: preflight and fold. While we preflight, we build up fold actions that inform the folding stage on how to act. llvm-svn: 185667
*	SimplifyCFG: Teach switch generation some patterns that instcombine forms.	Benjamin Kramer	2013-07-04	1	-0/+36
\| \| \| \| \| \| \| \|	This allows us to create switches even if instcombine has munged two of the incoming compares into one and some bit twiddling. This was motivated by enum compares that are common in clang. llvm-svn: 185632
*	Change the gettimeofday test to only test on a posix platform.	Michael Gottesman	2013-07-03	1	-1/+3
\| \| \| \|	llvm-svn: 185503
*	Added support in FunctionAttrs for adding relevant function/argument ↵	Michael Gottesman	2013-07-03	1	-0/+3
\| \| \| \| \| \| \| \| \| \|	attributes for the posix call gettimeofday. This implies annotating it as nounwind and its arguments as nocapture. To be conservative, we do not annotate the arguments with noalias since some platforms do not have restrict on the declaration for gettimeofday. llvm-svn: 185502
*	Revert r185257 (InstCombine: Be more aggressive optimizing 'udiv' instrs with ↵	Hal Finkel	2013-07-02	2	-0/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	'select' denoms) I'm reverting this commit because: 1. As discussed during review, it needs to be rewritten (to avoid creating and then deleting instructions). 2. This is causing optimizer crashes. Specifically, I'm seeing things like this: While deleting: i1 % Use still stuck around after Def is destroyed: <badref> = select i1 <badref>, i32 0, i32 1 opt: /src/llvm-trunk/lib/IR/Value.cpp:79: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed. I'd guess that these will go away once we're no longer creating/deleting instructions here, but just in case, I'm adding a regression test. Because the code is being rewritten, I've just XFAIL'd the original regression test. Original commit message: InstCombine: Be more aggressive optimizing 'udiv' instrs with 'select' denoms Real world code sometimes has the denominator of a 'udiv' be a 'select'. LLVM can handle such cases but only when the 'select' operands are symmetric in structure (both select operands are a constant power of two or a left shift, etc.). This falls apart if we are dealt a 'udiv' where the code is not symmetric or if the select operands lead us to more select instructions. Instead, we should treat the LHS and each select operand as a distinct divide operation and try to optimize them independently. If we can simplify each operation, then we can replace the 'udiv' with, say, a 'lshr' that has a new select with a bunch of new operands for the select. llvm-svn: 185415
*	LoopVectorize: Math functions only read rounding mode	Arnold Schwaighofer	2013-07-01	1	-0/+32
\| \| \| \| \| \| \| \|	Math functions are marked as readonly because they read the floating point rounding mode. Because we don't vectorize loops that would contain function calls that set the rounding mode, it is safe to ignore this memory read. llvm-svn: 185299
*	DeadArgumentElimination: keep return value on functions that have a live ↵	Stephen Lin	2013-06-30	1	-0/+55
\| \| \| \| \| \|	argument with the 'returned' attribute (rather than generate invalid IR); however, if both can be eliminated, both will be llvm-svn: 185290
*	ConstantFold: Check that truncating the other side is safe under a sext when ↵	Benjamin Kramer	2013-06-30	1	-3/+17
\| \| \| \| \| \| \| \|	trying to remove a sext from a compare. Fixes PR16462. llvm-svn: 185284
*	ValueTracking: Teach isKnownToBeAPowerOfTwo about (ADD X, (XOR X, Y)) where ↵	David Majnemer	2013-06-29	1	-0/+15
\| \| \| \| \| \| \| \| \|	X is a power of two This allows us to simplify urem instructions involving the add+xor to turn into simpler math. llvm-svn: 185272
*	InstCombine: Also turn selects fed by an and into arithmetic when the types ↵	Benjamin Kramer	2013-06-29	1	-0/+36
\| \| \| \| \| \| \| \| \|	don't match. Inserting a zext or trunc is sufficient. This pattern is somewhat common in LLVM's pointer mangling code. llvm-svn: 185270
*	InstCombine: FoldGEPICmp shouldn't change sign of base pointer comparison	David Majnemer	2013-06-29	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changing the sign when comparing the base pointer would introduce all sorts of unexpected things like: %gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0 %gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0 %cmp.i = icmp ult i8* %gep.i, %gep2.i %cmp.i1 = icmp ult [1 x i8]* %a, %b %cmp = icmp ne i1 %cmp.i, %cmp.i1 ret i1 %cmp into: %cmp.i = icmp slt [1 x i8]* %a, %b %cmp.i1 = icmp ult [1 x i8]* %a, %b %cmp = xor i1 %cmp.i, %cmp.i1 ret i1 %cmp By preserving the original sign, we now get: ret i1 false This fixes PR16483. llvm-svn: 185259
*	InstCombine: Be more aggressive optimizing 'udiv' instrs with 'select' denoms	David Majnemer	2013-06-29	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Real world code sometimes has the denominator of a 'udiv' be a 'select'. LLVM can handle such cases but only when the 'select' operands are symmetric in structure (both select operands are a constant power of two or a left shift, etc.). This falls apart if we are dealt a 'udiv' where the code is not symmetric or if the select operands lead us to more select instructions. Instead, we should treat the LHS and each select operand as a distinct divide operation and try to optimize them independently. If we can simplify each operation, then we can replace the 'udiv' with, say, a 'lshr' that has a new select with a bunch of new operands for the select. llvm-svn: 185257
*	InstCombine: Optimize (1 << X) Pred CstP2 to X Pred Log2(CstP2)	David Majnemer	2013-06-28	1	-0/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We may, after other optimizations, find ourselves with IR that looks like: %shl = shl i32 1, %y %cmp = icmp ult i32 %shl, 32 Instead, we should just compare the shift count: %cmp = icmp ult i32 %y, 5 llvm-svn: 185242
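The example in this message, `(1 << y) <u 32` becoming `y <u 5`, can be confirmed for every valid 32-bit shift amount. The sketch below is a brute-force check with hypothetical function names; shift amounts of 32 or more are skipped because they are undefined for a 32-bit value in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count shift amounts y in [0, 32) where "(1 << y) <u 32" and
   "y <u 5" disagree. */
int count_shl_fold_mismatches(void) {
    int mismatches = 0;
    for (uint32_t y = 0; y < 32; ++y) {
        bool before = ((uint32_t)1 << y) < 32u;
        bool after = y < 5u;  /* 5 == log2(32) */
        if (before != after) ++mismatches;
    }
    return mismatches;
}

void check_shl_fold(void) {
    assert(count_shl_fold_mismatches() == 0);
}
```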
*	SLP Vectorizer: Add support for trees with external users.	Nadav Rotem	2013-06-28	3	-6/+117
\| \| \| \| \| \| \|	To support this we have to insert 'extractelement' instructions to pick the right lane. We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated. llvm-svn: 185230