bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Debug Info: enable verifier for testing cases.	Manman Ren	2013-07-29	5	-5/+5
	llvm-svn: 187375
*	Add the C source code to the test to make it easier to update when debug ↵	Nadav Rotem	2013-07-29	1	-0/+9
	info changes. Thanks Eric. llvm-svn: 187368
*	SLPVectorizer: update the debug location for the new instructions.	Nadav Rotem	2013-07-29	1	-0/+82
	llvm-svn: 187363
*	Debug Info: update testing cases to pass verifier.	Manman Ren	2013-07-29	8	-59/+66
	llvm-svn: 187362
*	Don't vectorize when the attribute NoImplicitFloat is used.	Nadav Rotem	2013-07-29	1	-0/+25
	llvm-svn: 187340
*	SimplifyCFG: Add missing tests from r187278	Tom Stellard	2013-07-27	3	-0/+125
	llvm-svn: 187291
*	Debug Info Verifier: verify SPs in llvm.dbg.sp.	Manman Ren	2013-07-27	10	-67/+83
	Also always add DIType, DISubprogram and DIGlobalVariable to the list in DebugInfoFinder without checking them, so we can verify them later on. llvm-svn: 187285
*	SLP Vectorizer: Don't vectorize really short chains because they are already ↵	Nadav Rotem	2013-07-26	1	-1/+3
	handled by the SelectionDAG store-vectorizer, which does a better job in deciding when to vectorize. llvm-svn: 187267
*	SLP Vectorizer: Disable the vectorization of non power of two chains, such ↵	Nadav Rotem	2013-07-26	2	-33/+39
	as <3 x float>, because we don't have a good cost model for these types. llvm-svn: 187265
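The two SLP gating rules above (skip really short chains, skip non-power-of-two widths such as <3 x float>) can be sketched as one predicate. This is a hypothetical helper, not SLPVectorizer code, and `minChainLength` is an assumed parameter: the commits do not state the actual threshold.

```cpp
#include <cassert>

// True when n is a non-zero power of two (1, 2, 4, 8, ...).
bool isPowerOf2(unsigned n) { return n != 0 && (n & (n - 1)) == 0; }

// Hypothetical gate mirroring the two rules: reject chains shorter than
// minChainLength (assumed knob; left to the SelectionDAG store-vectorizer)
// and reject widths that are not a power of two (no good cost model).
bool shouldVectorizeChain(unsigned chainLength, unsigned minChainLength) {
  if (chainLength < minChainLength) return false;
  return isPowerOf2(chainLength);
}

void exampleChainGate() {
  assert(!shouldVectorizeChain(3, 2));  // <3 x float>: not a power of two
  assert(shouldVectorizeChain(4, 2));   // <4 x float>: allowed
  assert(!shouldVectorizeChain(2, 3));  // shorter than the assumed threshold
}
```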
*	next batch of -disable-debug-info-verifier	Rafael Espindola	2013-07-26	1	-1/+1
	llvm-svn: 187260
*	When InstCombine tries to fold away (fsub x, (fneg y)) into (fadd x, y), it is	Owen Anderson	2013-07-26	1	-0/+12
	also worthwhile for it to look through FP extensions and truncations, whose application commutes with fneg. llvm-svn: 187249
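The fold relies on two IEEE-754 facts: x - (-y) equals x + y exactly, and negation commutes with fpext and (under round-to-nearest) with fptrunc because both conversions treat the sign symmetrically. The C++ below only spot-checks these identities on a few finite inputs; it is an illustration, not InstCombine's pattern-matching code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>

// fsub x, (fpext (fneg y)): the negation is hidden under an extension.
double subOfExtNeg(double x, float y) { return x - static_cast<double>(-y); }
// fadd x, (fpext y): the folded form.
double addOfExt(double x, float y) { return x + static_cast<double>(y); }

// The same pair with a truncation instead of an extension.
float subOfTruncNeg(float x, double y) { return x - static_cast<float>(-y); }
float addOfTrunc(float x, double y) { return x + static_cast<float>(y); }

// Spot-check that the original and folded forms agree on a few finite inputs.
void exampleFnegFold() {
  for (float y : {0.1f, -2.5f, 3.0e10f})
    assert(subOfExtNeg(1.25, y) == addOfExt(1.25, y));
  for (double y : {0.1, -2.5, 1.0e30})
    assert(subOfTruncNeg(2.0f, y) == addOfTrunc(2.0f, y));
}
```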
*	Debug Info Verifier: enable verification of DICompileUnit.	Manman Ren	2013-07-26	3	-7/+10
	We used to call Verify before adding DICompileUnit to the list, and now we remove the check and always add DICompileUnit to the list in DebugInfoFinder, so we can verify them later on. llvm-svn: 187237
*	Next batch of -disable-debug-info-verifier.	Rafael Espindola	2013-07-26	4	-4/+4
	These tests fail without it if pipefail is enabled. llvm-svn: 187205
*	Re-implement the analysis of uses in mem2reg to be significantly more	Chandler Carruth	2013-07-26	2	-26/+70
	robust. It now uses an InstVisitor and worklist to actually walk the uses of the Alloca transitively and detect the pattern which we can directly promote: loads & stores of the whole alloca and instructions we can completely ignore. Also, with this new implementation teach both the predicate for testing whether we can promote and the promotion engine itself to use the same code so we no longer have strange divergence between the two code paths. I've added some silly test cases to demonstrate that we can handle slightly more degenerate code patterns now. See below for why this is even interesting. Performance impact: roughly 1% regression in the performance of SROA or ScalarRepl on a large C++-ish test case where most of the allocas are basically ready for promotion. The cause is silly redundant work that I've left FIXMEs for and which I'll address in the next commit. I wanted to separate this commit as it changes the behavior. Once the redundant work in removing the dead uses of the alloca is fixed, this code appears to be faster than the old version. =] So why is this useful? Because the previous requirement for promotion required a specific visit pattern of the uses of the alloca to verify: we had to look for no more than 1 intervening use. The end goal is to have SROA automatically detect when an alloca is already promotable and directly hand it to the mem2reg machinery rather than trying to partition and rewrite it. This is a 25% or more performance improvement for SROA, and a significant chunk of the delta between it and ScalarRepl. To get there, we need to make mem2reg actually capable of promoting allocas which look promotable to SROA without having SROA do tons of work to massage the code into just the right form. This is actually the tip of the iceberg.
There are tremendous potential savings we can realize here by de-duplicating work between mem2reg and SROA. llvm-svn: 187191
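A rough model of the worklist-based use walk described above. The mini-IR (`UseKind`, `Inst`) is hypothetical and far simpler than LLVM's `InstVisitor`; in particular, this sketch treats pointer casts as transparent and relies on an assumed `accessesWholeAlloca` flag to reject loads and stores that do not cover the whole alloca. Requires C++14 or later (aggregates with default member initializers).

```cpp
#include <cassert>
#include <vector>

// Hypothetical mini-IR: each instruction using a pointer derived from the
// alloca records its kind of use and, for casts, the users of its result.
enum class UseKind { Load, Store, StoreOfPointer, Cast, LifetimeMarker, Other };

struct Inst {
  UseKind kind;
  bool accessesWholeAlloca = true;  // load/store covers the entire alloca
  std::vector<const Inst*> users;   // users of a Cast's result
};

// Walk the alloca's uses transitively with a worklist. Promotable means every
// use is a whole-alloca load/store or an instruction that can be ignored.
bool isPromotable(const std::vector<const Inst*>& allocaUsers) {
  std::vector<const Inst*> worklist(allocaUsers);
  while (!worklist.empty()) {
    const Inst* I = worklist.back();
    worklist.pop_back();
    switch (I->kind) {
    case UseKind::Load:
    case UseKind::Store:
      if (!I->accessesWholeAlloca) return false;  // partial access
      break;
    case UseKind::LifetimeMarker:
      break;  // ignorable: simply dropped during promotion
    case UseKind::Cast:
      // A cast does not touch memory itself; visit the users of its result.
      worklist.insert(worklist.end(), I->users.begin(), I->users.end());
      break;
    case UseKind::StoreOfPointer:  // the alloca's address escapes
    case UseKind::Other:
      return false;
    }
  }
  return true;
}

void examplePromotable() {
  Inst load{UseKind::Load};
  Inst store{UseKind::Store};
  Inst lifetime{UseKind::LifetimeMarker};
  Inst cast{UseKind::Cast, true, {&lifetime}};
  assert(isPromotable({&load, &store, &cast}));

  Inst escape{UseKind::StoreOfPointer};
  Inst escapingCast{UseKind::Cast, true, {&escape}};
  assert(!isPromotable({&load, &escapingCast}));
}
```

Sharing one predicate like this between the "can we promote?" check and the promotion engine is the point the commit makes about removing divergence between the two code paths.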
*	Debug Info: improve the verifier to check field types.	Manman Ren	2013-07-25	2	-8/+9
	Make sure the context field of DIType is MDNode. Fix testing cases to make them pass the verifier. llvm-svn: 187150
*	Allocate local registers in order for optimal coloring.	Andrew Trick	2013-07-25	1	-1/+1
	Also avoid locals evicting locals just because they want a cheaper register. Problem: MI Sched knows exactly how many registers we have and assumes they can be colored. In cases where we have large blocks, usually from unrolled loops, greedy coloring fails. This is a source of "regressions" from the MI Scheduler on x86. I noticed this issue on x86 where we have long chains of two-address defs in the same live range. It's easy to see this in matrix multiplication benchmarks like IRSmk and even the unit test misched-matmul.ll. A fundamental difference between the LLVM register allocator and conventional graph coloring is that in our model a live range can't discover its neighbors; it can only verify its neighbors. That's why we initially went for greedy coloring and added eviction to deal with the hard cases. However, for singly defined and two-address live ranges, we can optimally color without visiting neighbors simply by processing the live ranges in instruction order. Other beneficial side effects: It is much easier to understand and debug regalloc for large blocks when the live ranges are allocated in order. Yes, global allocation is still very confusing, but it's nice to be able to comprehend what happened locally. Heuristics could be added to bias register assignment based on instruction locality (think late register pairing, banks...). Intuitively this will make some test cases that are on the threshold of register pressure more stable. llvm-svn: 187139
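Why instruction order helps: when live ranges are contiguous intervals, as for the singly defined and two-address chains mentioned above, giving each range the lowest free register in order of its start point uses exactly as many registers as the maximum number of simultaneously live ranges. The sketch below shows that textbook interval-partitioning argument; it ignores live-range holes, register classes, and eviction, so it is an idealization rather than LLVM's greedy allocator. `LiveRange` and `allocateInOrder` are hypothetical names. Requires C++14 or later.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A live range covering instruction slots [start, end), plus its assignment.
struct LiveRange {
  int start;
  int end;
  int reg = -1;
};

// Visit ranges in order of their start slot and give each one the lowest-numbered
// register whose previous occupant has already ended. Returns registers used.
int allocateInOrder(std::vector<LiveRange>& ranges) {
  std::vector<LiveRange*> order;
  for (LiveRange& r : ranges) order.push_back(&r);
  std::sort(order.begin(), order.end(),
            [](const LiveRange* a, const LiveRange* b) { return a->start < b->start; });

  std::vector<int> busyUntil;  // busyUntil[reg] = end slot of the range in reg
  for (LiveRange* r : order) {
    int chosen = -1;
    for (int reg = 0; reg < static_cast<int>(busyUntil.size()); ++reg) {
      if (busyUntil[reg] <= r->start) { chosen = reg; break; }
    }
    if (chosen < 0) {  // every register is still occupied: open a new one
      chosen = static_cast<int>(busyUntil.size());
      busyUntil.push_back(0);
    }
    busyUntil[chosen] = r->end;
    r->reg = chosen;
  }
  return static_cast<int>(busyUntil.size());
}

void exampleInOrder() {
  // At most two ranges overlap at any slot, so two registers suffice.
  std::vector<LiveRange> ranges{{0, 4}, {1, 3}, {3, 6}, {5, 8}};
  assert(allocateInOrder(ranges) == 2);
  assert(ranges[0].reg != ranges[1].reg);  // [0,4) and [1,3) overlap
}
```

No neighbor search is needed: the only state consulted is when each register becomes free, which matches the commit's point that a live range need only verify, not discover, its neighbors.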
*	Current batch of -disable-debug-info-verifier.	Rafael Espindola	2013-07-25	4	-5/+5
	llvm-svn: 187130
*	Debug Info: improve the verifier to check field types.	Manman Ren	2013-07-25	1	-7/+7
	Make sure the context and type fields are MDNodes. We will generate verification errors if those fields are non-empty strings. Fix testing cases to make them pass the verifier. llvm-svn: 187106
*	Respect llvm.used in Internalize.	Rafael Espindola	2013-07-25	1	-0/+20
	The language reference says that: "If a symbol appears in the @llvm.used list, then the compiler, assembler, and linker are required to treat the symbol as if there is a reference to the symbol that it cannot see." Since even the linker cannot see the reference, we must assume that the reference can be using the symbol table. For example, a user can add __attribute__((used)) to a debug helper function like dump and use it from a debugger. llvm-svn: 187103
*	Check that TD isn't NULL before dereferencing it down this path.	Nick Lewycky	2013-07-25	1	-0/+17
	llvm-svn: 187099
*	Update testing cases to pass debug info verifier.	Manman Ren	2013-07-24	2	-27/+29
	llvm-svn: 187083
*	add -disable-debug-info-verifier to 3 test to fix tests with pipefail.	Rafael Espindola	2013-07-24	3	-3/+3
	llvm-svn: 187064
*	Debug Info: improve the Finder.	Manman Ren	2013-07-24	2	-5/+5
	Improve the Finder to handle the context of a DIVariable used by DbgValueInst. Fix testing cases to make them pass the verifier. llvm-svn: 187052
*	Fix a problem I introduced in r187029 where we would over-eagerly	Chandler Carruth	2013-07-24	1	-0/+37
	schedule an alloca for another iteration in SROA. This only showed up with a mixture of promotable and unpromotable selects and phis. Added a test case for this. llvm-svn: 187031
*	Fix PR16687 where we were incorrectly promoting an alloca that had	Chandler Carruth	2013-07-24	1	-0/+37
	pending speculation for a phi node. The problem here is that we were using growth of the speculation set as an indicator of whether speculation would occur, and if the phi node is already in the set we don't see it grow. This is a symptom of the fact that this signal is a total hack. Unfortunately, I couldn't really come up with a non-hacky way of signaling that promotion remains valid after speculation occurs, such that we only speculate when all else looks good for promotion. In the end, I went with at least a much more explicit approach of doing the work of queuing inside the phi and select processing and setting a preposterously named flag to convey that we're in the special state of requiring speculation before promotion. Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing a testcase for this from a pretty giant, nasty assert in a big application. =] The testcase was excellent. llvm-svn: 187029
*	Add -disable-debug-info-verifier.	Rafael Espindola	2013-07-23	1	-1/+1
	Found while testing with pipefail enabled. llvm-svn: 186937
*	Debug Info Finder: use processDeclare and processValue to list debug info	Manman Ren	2013-07-23	1	-1/+1
	MDNodes used by DbgDeclareInst and DbgValueInst. Another 16 testing cases failed and they are disabled with -disable-debug-info-verifier. A total of 34 cases are disabled with -disable-debug-info-verifier and will be corrected. llvm-svn: 186902
*	When we vectorize across multiple basic blocks we may vectorize PHINodes ↵	Nadav Rotem	2013-07-22	1	-0/+58
	that create a cycle. We already break the cycle on phi-nodes, but arithmetic operations are still duplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can. llvm-svn: 186883
*	Treat nothrow forms of ::operator delete and ::operator delete[] as	Richard Smith	2013-07-21	1	-0/+24
	deallocation functions. llvm-svn: 186798
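For reference, the nothrow deallocation forms in question are the standard declarations `operator delete(void*, const std::nothrow_t&)` and `operator delete[](void*, const std::nothrow_t&)` from `<new>`. The snippet pairs them with the matching nothrow `operator new` forms; it only shows the source-level calls, not how LLVM recognizes them, and `nothrowRoundTrip` is a hypothetical name.

```cpp
#include <cassert>
#include <new>

// Allocate with the nothrow forms of ::operator new / new[] and release with the
// matching nothrow forms of ::operator delete / delete[].
// Returns true if both allocations succeeded.
bool nothrowRoundTrip() {
  void* p = ::operator new(16, std::nothrow);    // returns nullptr on failure
  void* q = ::operator new[](32, std::nothrow);
  const bool ok = (p != nullptr) && (q != nullptr);
  ::operator delete(p, std::nothrow);    // nothrow ::operator delete (null is fine)
  ::operator delete[](q, std::nothrow);  // nothrow ::operator delete[]
  return ok;
}

void exampleNothrow() { assert(nothrowRoundTrip()); }
```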
*	Don't crash when llvm.compiler.used becomes empty.	Rafael Espindola	2013-07-20	1	-0/+16
	GlobalOpt simplifies llvm.compiler.used by removing any members that are also in the more strict llvm.used. Handle the special case where llvm.compiler.used becomes empty. llvm-svn: 186778
*	InstCombine: call FoldOpIntoSelect for all floating binops, not just fmul	Stephen Lin	2013-07-20	1	-0/+71
	llvm-svn: 186759
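FoldOpIntoSelect pushes a binary operator into both arms of a select: `(select c, a, b) op k` becomes `select c, (a op k), (b op k)`, which pays off when the arms are constants and the new operations fold away. The C++ below is only a value-level illustration of that rewrite for floating-point operators; `opOfSelect` and `selectOfOps` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>

// Before: apply the operator to the selected value.
template <typename Op>
double opOfSelect(bool c, double a, double b, double k, Op op) {
  return op(c ? a : b, k);
}

// After: apply the operator inside each arm, then select.
template <typename Op>
double selectOfOps(bool c, double a, double b, double k, Op op) {
  return c ? op(a, k) : op(b, k);
}

void exampleFoldIntoSelect() {
  for (bool c : {true, false}) {
    assert(opOfSelect(c, 2.0, 4.0, 0.5, std::multiplies<double>()) ==
           selectOfOps(c, 2.0, 4.0, 0.5, std::multiplies<double>()));
    assert(opOfSelect(c, 2.0, 4.0, 0.5, std::plus<double>()) ==
           selectOfOps(c, 2.0, 4.0, 0.5, std::plus<double>()));
  }
  // With constant arms each product folds: c ? 2.0 * 0.5 : 4.0 * 0.5 is c ? 1.0 : 2.0.
  assert(selectOfOps(true, 2.0, 4.0, 0.5, std::multiplies<double>()) == 1.0);
}
```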
*	Have InlineCost check constant fcmps	Matt Arsenault	2013-07-20	1	-0/+31
	llvm-svn: 186758
*	s/compiler_used/compiler.used/.	Rafael Espindola	2013-07-19	1	-2/+2
	We were incorrectly using compiler_used instead of compiler.used. Unfortunately the passes using the broken name had tests also using the broken name. llvm-svn: 186705
*	Fix another assert failure very similar to PR16651's test case. This	Chandler Carruth	2013-07-19	1	-2/+22
	test case came from Benjamin and found the parallel bug in the vector promotion code. llvm-svn: 186666
*	Fix PR16651, an assert introduced in my recent re-work of the innards of	Chandler Carruth	2013-07-19	1	-0/+20
	SROA. The crux of the issue is that now we track uses of a partition of the alloca in two places: the iterators over the partitioning uses and the previously collected split uses vector. We weren't accounting for the fact that the split uses might invalidate integer widening in ways other than due to their width (in this case due to being volatile). Further reduced testcase added to the tests. llvm-svn: 186655
*	Reapply r186316 with a fix for one bug where the code could walk off the	Chandler Carruth	2013-07-18	1	-1/+1
	end of a vector. This was found with ASan. I've had one other report of a crasher, but have thus far been unable to reproduce the crash. It may well be fixed with this version, and if not I'd like to get more information from the build bots about what is happening. See r186316 for the full commit log for the new implementation of the SROA algorithm. llvm-svn: 186565
*	Restore r181216, which was partially reverted in r182499.	Stephen Lin	2013-07-17	1	-9/+44
	llvm-svn: 186533
*	Fix comparisons of alloca alignment in inliner merging	Hal Finkel	2013-07-17	1	-0/+33
	Duncan pointed out a mistake in my fix in r186425 when only one of the allocas being compared had the target-default alignment. This is essentially his suggested solution. Thanks! llvm-svn: 186510
*	When the inliner merges allocas, it must keep the larger alignment	Hal Finkel	2013-07-16	2	-0/+187
	For safety, the inliner cannot decrease the alignment on an alloca when merging it with another. I've included two variants of the test case for this: one with DataLayout available, and one without. When DataLayout is not available, if only one of the allocas uses the default alignment (getAlignment() == 0), then they cannot be safely merged. llvm-svn: 186425
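The merging rule from this commit and its follow-up r186510 amounts to: never decrease alignment, and without DataLayout refuse to merge when exactly one alloca uses the default alignment (`getAlignment() == 0`). A hypothetical helper expressing that rule, with `std::optional` standing in for whether the target default alignment is known (i.e. whether DataLayout is available). This is a sketch of the stated rule, not the inliner's code; requires C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical helper: alignment to give a merged alloca, or std::nullopt if the
// two allocas cannot be merged safely. An alignment of 0 means "target default";
// defaultAlign holds that default when DataLayout is available, else nullopt.
std::optional<unsigned> mergedAlignment(unsigned a1, unsigned a2,
                                        std::optional<unsigned> defaultAlign) {
  if (a1 == a2) return a1;  // same requirement (including both default)
  if (a1 == 0 || a2 == 0) {
    if (!defaultAlign) return std::nullopt;  // default unknown: cannot compare
    if (a1 == 0) a1 = *defaultAlign;
    if (a2 == 0) a2 = *defaultAlign;
  }
  return std::max(a1, a2);  // never decrease either alloca's alignment
}

void exampleMergedAlignment() {
  assert(mergedAlignment(4, 16, std::nullopt) == 16u);
  assert(mergedAlignment(0, 0, std::nullopt) == 0u);
  assert(!mergedAlignment(0, 8, std::nullopt).has_value());
  assert(mergedAlignment(0, 4, 8u) == 8u);
}
```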
*	PR16628: Fix a bug in the code that merges compares.	Nadav Rotem	2013-07-15	1	-0/+27
	Compares return i1 but they compare different types. llvm-svn: 186359
*	Revert r186316 while I track down an ASan failure and an assert from	Chandler Carruth	2013-07-15	1	-1/+1
	a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here; refer to r186316 if you're looking for how this all works and why it works that way. llvm-svn: 186332
*	Reimplement SROA yet again. Same fundamental principle, but a totally	Chandler Carruth	2013-07-15	1	-1/+1
	different core implementation strategy. Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca is or how much we're able to do in breaking it up, all of the data structure work to analyze the partitioning was done up front. The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects: 1) Much simpler data structures are used throughout. 2) No more double walk of the recursive use graph of the alloca, only walk it once. 3) No more complex algorithms for associating a particular use with a particular partition. 4) PHI and Select speculation is simplified and happens lazily. 5) More precise information is available about a specific use of the alloca, removing the need for some side data structures. Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work. I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted and split into actual partitions. Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here.
The entry describes the offset range accessed and the nature of the access. Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc.), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition. Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary. I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%; however, it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better, however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think. llvm-svn: 186316
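The sort-then-walk scheme described in r186316 can be sketched for the unsplittable case. `Slice`, `sliceLess`, and `formPartitions` are hypothetical names; the tie-breaks (unsplittable before splittable, larger end offset first) are this sketch's reading of "followed by whether they are splittable ... followed by the end offset". Splittable slices are simply skipped here, whereas the real pass carries them across partitions and may add bridging partitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One use of the alloca: the byte range [begin, end) it touches and whether it
// can be split across partitions (e.g. memcpy) or must stay whole (load/store).
struct Slice {
  std::uint64_t begin;
  std::uint64_t end;
  bool splittable;
};

// Sort key: increasing begin offset, then unsplittable before splittable, then
// larger end offset first (the last two tie-breaks are assumptions).
bool sliceLess(const Slice& a, const Slice& b) {
  if (a.begin != b.begin) return a.begin < b.begin;
  if (a.splittable != b.splittable) return !a.splittable;
  return a.end > b.end;
}

using Partition = std::pair<std::uint64_t, std::uint64_t>;  // [begin, end)

// Simplified walk: merge overlapping unsplittable slices into partitions.
std::vector<Partition> formPartitions(std::vector<Slice> slices) {
  std::sort(slices.begin(), slices.end(), sliceLess);
  std::vector<Partition> parts;
  for (const Slice& s : slices) {
    if (s.splittable) continue;  // not modeled in this sketch
    if (!parts.empty() && s.begin < parts.back().second)
      parts.back().second = std::max(parts.back().second, s.end);  // overlap: extend
    else
      parts.push_back({s.begin, s.end});  // gap or adjacency: new partition
  }
  return parts;
}

void examplePartitions() {
  std::vector<Partition> parts =
      formPartitions({{0, 4, false}, {2, 8, false}, {8, 12, false}, {0, 16, true}});
  assert(parts.size() == 2);
  assert(parts[0] == Partition(0, 8));   // [0,4) and [2,8) overlap
  assert(parts[1] == Partition(8, 12));  // [8,12) only touches [0,8)
}
```

Because slices arrive sorted by begin offset, one linear pass suffices to form partitions, which is the simplification the commit attributes to the sort order.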
*	Teach indvars to generate nsw/nuw flags when widening an induction variable.	Andrew Trick	2013-07-14	1	-0/+29
	Fixes PR16600. llvm-svn: 186272
*	Fixup to r186268 and r186269: don't append -LABEL to CHECK-NOT. No ↵	Stephen Lin	2013-07-14	4	-5/+5
	functionality change. llvm-svn: 186271
*	Catch more CHECK that can be converted to CHECK-LABEL in Transforms for ↵	Stephen Lin	2013-07-14	63	-301/+301
	easier debugging. No functionality change. This conversion was done with the following bash script:
	  find test/Transforms -name "*.ll" | \
	  while read NAME; do
	    echo "$NAME"
	    if ! grep -q "^; RUN: llc" $NAME; then
	      TEMP=`mktemp -t temp`
	      cp $NAME $TEMP
	      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
	      while read FUNC; do
	        sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)define\([^@]*\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3define\4@$FUNC(/g" $TEMP
	      done
	      mv $TEMP $NAME
	    fi
	  done
	llvm-svn: 186269
*	Update Transforms tests to use CHECK-LABEL for easier debugging. No ↵	Stephen Lin	2013-07-14	444	-2477/+2477
	functionality change. This update was done with the following bash script:
	  find test/Transforms -name "*.ll" | \
	  while read NAME; do
	    echo "$NAME"
	    if ! grep -q "^; RUN: llc" $NAME; then
	      TEMP=`mktemp -t temp`
	      cp $NAME $TEMP
	      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
	      while read FUNC; do
	        sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
	      done
	      mv $TEMP $NAME
	    fi
	  done
	llvm-svn: 186268
*	Modify two Transforms tests to explicitly check for full function names in ↵	Stephen Lin	2013-07-14	2	-2/+2
	some cases, rather than just a common prefix. No functionality change. (This is to avoid confusing a scripted mass update of these tests to use CHECK-LABEL.) llvm-svn: 186267
*	Add newlines at end of test files, no functionality change	Stephen Lin	2013-07-13	6	-6/+6
	llvm-svn: 186263
*	LoopVectorizer: Disallow reductions whose header phi is used outside the loop	Arnold Schwaighofer	2013-07-13	1	-0/+25
	If an outside loop user of the reduction value uses the header phi node we cannot just reduce the vectorized phi value in the vector code epilog because we would lose VF-1 reductions.
	  lp:
	    p = phi (0, lv)
	    lv = lv + 1
	    ...
	    brcond , lp, outside
	  outside:
	    usr = add 0, p
	(Say the loop iterates two times, the value of p coming out of the loop is one.) We cannot just transform this to:
	  vlp:
	    p = phi (<0,0>, lv)
	    lv = lv + <1,1>
	    ..
	    brcond , lp, outside
	  outside:
	    p_reduced = p[0] + p[1];
	    usr = add 0, p_reduced
	(Because the original loop iterated two times the vectorized loop would iterate one time, but p_reduced ends up being zero instead of one.) We would have to execute VF-1 iterations in the scalar remainder loop in such cases. For now, just disable vectorization. PR16522 llvm-svn: 186256
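The miscompile can be reproduced numerically. The sketch below simulates the scalar loop and the naive VF=2 transformation from the example, assuming the update is `lv = p + 1` (the message writes `lv = lv + 1`). Reducing the header phi `p` yields 0 instead of 1, while reducing the loop-carried `lv` still matches, consistent with the commit disallowing only reductions whose header phi is used outside the loop. Function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Scalar loop from the example, assuming the update is lv = p + 1.
// Returns {p, lv} as observed after the loop exits.
std::pair<int, int> scalarLoop(int tripCount) {
  int p = 0, lv = 0;
  for (int i = 0; i < tripCount; ++i) {
    p = (i == 0) ? 0 : lv;  // header phi: 0 on entry, lv from the latch
    lv = p + 1;
  }
  return {p, lv};
}

// Naive VF = 2 version: one vector iteration per two scalar iterations, each
// lane starting at 0, and both p and lv reduced by summing the lanes on exit.
std::pair<int, int> naiveVectorLoop(int tripCount) {
  std::array<int, 2> p{0, 0}, lv{0, 0};
  for (int i = 0; i < tripCount / 2; ++i) {
    p = (i == 0) ? std::array<int, 2>{0, 0} : lv;  // vector header phi
    for (int lane = 0; lane < 2; ++lane) lv[lane] = p[lane] + 1;
  }
  return {p[0] + p[1], lv[0] + lv[1]};  // {p_reduced, lv_reduced}
}

void exampleReduction() {
  // Two scalar iterations: p leaves the loop as 1 and lv as 2.
  assert(scalarLoop(2) == std::make_pair(1, 2));
  // The naive vector form gets lv right but p_reduced wrong (0 instead of 1).
  assert(naiveVectorLoop(2) == std::make_pair(0, 2));
}
```

With four iterations the scalar `p` is 3 but `p_reduced` is 2: the reduced header phi is short by VF-1 = 1, matching "we would lose VF-1 reductions".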
*	Make the new vectorizer test immune to TTI	Andrew Trick	2013-07-13	1	-1/+1
	llvm-svn: 186242