bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Fix mistake in r190442.	Eli Friedman	2013-09-10	1	-0/+7
	llvm-svn: 190446
*	Remove unused functions.	Eli Friedman	2013-09-10	1	-5/+0
	llvm-svn: 190442
*	Teach ScalarEvolution about pointer address spaces	Matt Arsenault	2013-09-10	1	-1/+1
	llvm-svn: 190425
*	Use type helper functions.	Matt Arsenault	2013-09-06	1	-2/+1
	llvm-svn: 190113
*	Teach CodeGenPrepare about address spaces	Matt Arsenault	2013-09-06	1	-4/+2
	llvm-svn: 190112
*	Revert: r189565 - Add getUnrollingPreferences to TTI	Hal Finkel	2013-08-29	1	-17/+5
	Revert unintentional commit (of an unreviewed change).
	Original commit message:
	Add getUnrollingPreferences to TTI
	Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important.
	llvm-svn: 189566
*	Add getUnrollingPreferences to TTI	Hal Finkel	2013-08-29	1	-5/+17
	Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important.
	llvm-svn: 189565
*	Turn MipsOptimizeMathLibCalls into a target-independent scalar transform	Richard Sandiford	2013-08-23	3	-0/+162
	...so that it can be used for z too. Most of the code is the same. The only real change is to use TargetTransformInfo to test when a sqrt instruction is available. The pass is opt-in because at the moment it only handles sqrt.
	llvm-svn: 189097
*	Revert r187191, which broke opt -mem2reg on the testcases included in PR16867.	Nick Lewycky	2013-08-13	2	-83/+14
	However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146 when SROA started sending more things directly down the PromoteMemToReg path.
	In order to revert r187191, I also revert dependent revisions r187296, r187322 and r188146. Fixes PR16867. Does not add the testcases from that PR, but both of them should get added for both mem2reg and sroa when this revert gets unreverted.
	llvm-svn: 188327
*	Reapply r188119 now that the bug it exposed is fixed.	Peter Collingbourne	2013-08-12	1	-160/+5
	llvm-svn: 188217
*	Re-instate r187323 which fast-tracks promotable allocas as soon as the	Chandler Carruth	2013-08-11	1	-12/+81
	SROA-based analysis has enough information. This should work now that both mem2reg and the SSAUpdater-based AllocaPromoter have been updated to be able to promote the types of allocas that the SROA analysis detects. I've included tests for the AllocaPromoter that were only possible to write once we fast-tracked promotable allocas without rewriting them. This includes a test both for r187347 and r188145.
	Original commit log for r187323:
	"""
	Now that mem2reg understands how to cope with a slightly wider set of uses of an alloca, we can pre-compute promotability while analyzing an alloca for splitting in SROA. That lets us short-circuit the common case of a bunch of trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for typical frontend-generated IR sequences I'm seeing. It gets the new SROA to within 20% of ScalarRepl for such code. My current benchmark for these numbers is PR15412, but it fits the general pattern of IR emitted by Clang so it should be widely applicable.
	"""
	llvm-svn: 188146
*	Finish fixing the SSAUpdater-based AllocaPromoter strategy in SROA to cope with	Chandler Carruth	2013-08-11	1	-2/+23
	the more general set of patterns that are now handled by mem2reg and that we can detect quickly while doing SROA's initial analysis. Notably, this allows it to promote through no-op bitcast and GEP sequences.
	A core part of the SSAUpdater approach is the ability to test whether a particular instruction is part of the set being promoted. Testing this becomes significantly more complex in the world where the operand to every load and store isn't the alloca itself. I ended up using the approach of walking up the def-chain until we find the alloca. I benchmarked this against keeping a set of pointer operands and keeping a set of the loads and stores we care about, and this one seemed faster although the difference was very small.
	No test case yet because currently the rewriting always "fixes" the inputs to not require this. The next patch which re-enables early promotion of easy cases in SROA will include a test case that specifically exercises this aspect of the alloca promoter.
	llvm-svn: 188145
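The def-chain walk described in r188145 can be pictured with a toy model. This is a minimal sketch and not the SROA code: `Kind`, `Node`, and `derivesFromAlloca` are hypothetical names invented for illustration, whereas the real pass operates on LLVM IR values and, per the commit message, only looks through no-op bitcast and GEP sequences (a check this sketch omits).

```cpp
#include <cassert>

// Hypothetical stand-in for an IR value; this is NOT the llvm::Value API.
enum class Kind { Alloca, BitCast, GEP, Other };

struct Node {
  Kind K;
  const Node *Operand; // pointer operand for BitCast/GEP, nullptr otherwise
};

// Walk up the def-chain through bitcasts and GEPs until we either reach
// the alloca being promoted or hit some other kind of value.
bool derivesFromAlloca(const Node *Ptr, const Node *TheAlloca) {
  while (Ptr) {
    if (Ptr == TheAlloca)
      return true;
    if (Ptr->K != Kind::BitCast && Ptr->K != Kind::GEP)
      return false;
    Ptr = Ptr->Operand;
  }
  return false;
}
```

The walk costs time proportional to the chain length on each query, which is the trade-off the commit message weighs against keeping a set of pointer operands or a set of loads and stores.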
*	Reformat some bits of AllocaPromoter and simplify the name and type of	Chandler Carruth	2013-08-11	1	-21/+20
	our visiting datastructures in the AllocaPromoter/SSAUpdater path of SROA. Also shift the order of clears around to be more consistent.
	No functionality changed here, this is just a cleanup.
	llvm-svn: 188144
*	Revert r188119 "Kill some duplicated code for removing unreachable BBs."	Arnold Schwaighofer	2013-08-10	1	-5/+160
	It is breaking buildbots with libgmalloc enabled on Mac OS X.
	  $ cd llvm ; mkdir release ; cd release
	  $ ../configure --enable-optimized --prefix=$PWD/install
	  $ make
	  $ make check
	  $ Release+Asserts/bin/llvm-lit -v --param use_gmalloc=1 --param \
	      gmalloc_path=/usr/lib/libgmalloc.dylib \
	      ../test/Instrumentation/DataFlowSanitizer/args-unreachable-bb.ll
	llvm-svn: 188142
*	Kill some duplicated code for removing unreachable BBs.	Peter Collingbourne	2013-08-09	1	-160/+5
	This moves removeUnreachableBlocksFromFn from SimplifyCFGPass.cpp to Utils/Local.cpp and uses it to replace the implementation of llvm::removeUnreachableBlocks, which appears to do a strict subset of what removeUnreachableBlocksFromFn does.
	Differential Revision: http://llvm-reviews.chandlerc.com/D1334
	llvm-svn: 188119
*	JumpThreading: Turn a select instruction into branching if it allows to thread one half of the select.	Benjamin Kramer	2013-08-07	1	-0/+83
	This is a common pattern coming out of simplifycfg generating gross code.
	  a: ; preds = %entry
	    %sel = select i1 %cmp1, double %add, double 0.000000e+00
	    br label %b
	  b:
	    %cond5 = phi double [ %sel, %a ], [ %sub, %entry ]
	    %cmp6 = fcmp oeq double %cond5, 0.000000e+00
	    br i1 %cmp6, label %if.then, label %if.end
	becomes
	  a:
	    br i1 %cmp1, label %b, label %if.then
	  b:
	    %cond5 = phi double [ %sub, %entry ], [ %add, %a ]
	    %cmp6 = fcmp oeq double %cond5, 0.000000e+00
	    br i1 %cmp6, label %if.then, label %if.end
	Skipping block b completely if possible.
	llvm-svn: 187880
*	Adjust file to the coding standard.	Jakub Staszak	2013-08-06	1	-53/+49
	llvm-svn: 187808
*	Factor FlattenCFG out from SimplifyCFG	Tom Stellard	2013-08-06	4	-53/+94
	Patch by: Mei Ye
	llvm-svn: 187764
*	Teach the AllocaPromoter which is wrapped around the SSAUpdater	Chandler Carruth	2013-07-29	1	-15/+51
	infrastructure to do promotion without a domtree the same smarts about looking through GEPs, bitcasts, etc., that I just taught mem2reg about. This way, if SROA chooses to promote an alloca which still has some noisy instructions this code can cope with them.
	I've not used as principled of an approach here for two reasons:
	1) This code doesn't really need it as we were already set up to zip through the instructions used by the alloca.
	2) I view the code here as more of a hack, and hopefully a temporary one.
	The SSAUpdater path in SROA is a real sore point for me. It doesn't make a lot of architectural sense for many reasons:
	- We're likely to end up needing the domtree anyways in a subsequent pass, so why not compute it earlier and use it.
	- In the future we'll likely end up needing the domtree for parts of the inliner itself.
	- If we need to we could teach the inliner to preserve the domtree. Part of the re-work of the pass manager will allow this to be very powerful even in large SCCs with many functions.
	- Ultimately, computing a domtree has gotten significantly faster since the original SSAUpdater-using code went into ScalarRepl. We no longer use domfrontiers, and much of domtree is lazily done based on queries rather than eagerly.
	- At this point keeping the SSAUpdater-based promotion saves a total of 0.7% on a build of the 'opt' tool for me. That's not a lot of performance given the complexity!
	So I'm leaving this a bit ugly in the hope that eventually we just remove all of this nonsense.
	I can't even readily test this because this code isn't reachable except through SROA. When I re-instate the patch that fast-tracks allocas already suitable for promotion, I'll add a testcase there that failed before this change. Before that, SROA will fix any test case I give it.
	llvm-svn: 187347
*	Temporarily revert r187323 until I update SSAUpdater to match mem2reg.	Chandler Carruth	2013-07-28	1	-81/+12
	I forgot that we had two totally independent things here. :: sigh ::
	llvm-svn: 187327
*	Now that mem2reg understands how to cope with a slightly wider set of	Chandler Carruth	2013-07-28	1	-12/+81
	uses of an alloca, we can pre-compute promotability while analyzing an alloca for splitting in SROA. That lets us short-circuit the common case of a bunch of trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for typical frontend-generated IR sequences I'm seeing. It gets the new SROA to within 20% of ScalarRepl for such code. My current benchmark for these numbers is PR15412, but it fits the general pattern of IR emitted by Clang so it should be widely applicable.
	llvm-svn: 187323
*	Thread DataLayout through the callers and into mem2reg. This will be	Chandler Carruth	2013-07-28	2	-2/+2
	useful in a subsequent patch, but causes an unfortunate amount of noise, so I pulled it out into a separate patch.
	llvm-svn: 187322
*	Don't use all the #ifdefs to hide the stats counters and instead rely on	Chandler Carruth	2013-07-27	1	-18/+0
	their being optimized out in debug mode. Realistically, this just isn't going to be the slow part anyways. This also fixes unused variable warnings that are breaking LLD build bots. =/ I didn't see these at first, and kept losing track of the fact that they were broken.
	llvm-svn: 187297
*	Reimplement isPotentiallyReachable to make nocapture deduction much stronger.	Nick Lewycky	2013-07-27	2	-2/+2
	Adds unit tests for it too.
	Split BasicBlockUtils into an analysis-half and a transforms-half, and put the analysis bits into a new Analysis/CFG.{h,cpp}. Promote isPotentiallyReachable into llvm::isPotentiallyReachable and move it into Analysis/CFG.
	llvm-svn: 187283
*	SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions	Tom Stellard	2013-07-27	2	-23/+61
	Merge consecutive if-regions if they contain identical statements. Both transformations reduce number of branches. The transformation is guarded by a target-hook, and is currently enabled only for +R600, but the correctness has been tested on X86 target using a variety of CPU benchmarks.
	Patch by: Mei Ye
	llvm-svn: 187278
*	TRE: Move class into anonymous namespace.	Benjamin Kramer	2013-07-24	1	-4/+6
	While there shrink a dangerously large SmallPtrSet.
	llvm-svn: 187050
*	Fix a problem I introduced in r187029 where we would over-eagerly	Chandler Carruth	2013-07-24	1	-3/+9
	schedule an alloca for another iteration in SROA. This only showed up with a mixture of promotable and unpromotable selects and phis. Added a test case for this.
	llvm-svn: 187031
*	Fix PR16687 where we were incorrectly promoting an alloca that had	Chandler Carruth	2013-07-24	1	-12/+29
	pending speculation for a phi node. The problem here is that we were using growth of the speculation set as an indicator of whether speculation would occur, and if the phi node is already in the set we don't see it grow. This is a symptom of the fact that this signal is a total hack.
	Unfortunately, I couldn't really come up with a non-hacky way of signaling that promotion remains valid after speculation occurs, such that we only speculate when all else looks good for promotion. In the end, I went with at least a much more explicit approach of doing the work of queuing inside the phi and select processing and setting a preposterously named flag to convey that we're in the special state of requiring speculating before promotion.
	Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing a testcase for this from a pretty giant, nasty assert in a big application. =] The testcase was excellent.
	llvm-svn: 187029
*	Remove extraneous null statement. No functionality change!	Nick Lewycky	2013-07-22	1	-1/+1
	llvm-svn: 186893
*	Use switch instead of if. No functionality change.	Jakub Staszak	2013-07-22	1	-14/+17
	llvm-svn: 186892
*	OldPtr is llvm::Instruction. Remove unneeded cast<>.	Jakub Staszak	2013-07-22	1	-1/+1
	llvm-svn: 186880
*	Change tabs to spaces.	Jakub Staszak	2013-07-22	1	-2/+2
	llvm-svn: 186877
*	Fix spelling and grammar	Matt Arsenault	2013-07-22	1	-12/+12
	llvm-svn: 186858
*	SROA: Microoptimization: Remove dead entries first, then sort.	Benjamin Kramer	2013-07-20	1	-9/+4
	While there replace an explicit struct with std::mem_fun.
	llvm-svn: 186761
*	Cleanup the stats counters for the new implementation. These actually	Chandler Carruth	2013-07-19	1	-12/+36
	count the right things and have the right names.
	llvm-svn: 186667
*	Fix another assert failure very similar to PR16651's test case. This	Chandler Carruth	2013-07-19	1	-0/+2
	test case came from Benjamin and found the parallel bug in the vector promotion code.
	llvm-svn: 186666
*	Try to move to a more reasonable set of naming conventions given the new	Chandler Carruth	2013-07-19	1	-322/+305
	implementation of the SROA algorithm. We were using the term 'partition' in many places that no longer ever represented an actual partition, but rather just an arbitrary slice of an alloca.
	No functionality change intended here. Mostly just renaming of types, functions, variables, and rewording of comments. Several comments were rewritten to make a lot more sense in the new structure of things.
	The stats are still weird and not reflective of how this really works. I'll fix those up in a separate patch as it is a touch more semantic of a change...
	llvm-svn: 186659
*	A long overdue cleanup in SROA to use 'DL' instead of 'TD' for the	Chandler Carruth	2013-07-19	1	-123/+123
	DataLayout variables.
	llvm-svn: 186656
*	Fix PR16651, an assert introduced in my recent re-work of the innards of	Chandler Carruth	2013-07-19	1	-9/+3
	SROA. The crux of the issue is that now we track uses of a partition of the alloca in two places: the iterators over the partitioning uses and the previously collected split uses vector. We weren't accounting for the fact that the split uses might invalidate integer widening in ways other than due to their width (in this case due to being volatile).
	Further reduced testcase added to the tests.
	llvm-svn: 186655
*	Reapply r186316 with a fix for one bug where the code could walk off the	Chandler Carruth	2013-07-18	1	-1255/+976
	end of a vector. This was found with ASan. I've had one other report of a crasher, but thus far been unable to reproduce the crash. It may well be fixed with this version, and if not I'd like to get more information from the build bots about what is happening.
	See r186316 for the full commit log for the new implementation of the SROA algorithm.
	llvm-svn: 186565
*	Add 'const' qualifiers to static const char* variables.	Craig Topper	2013-07-16	1	-1/+1
	llvm-svn: 186371
*	Remove trailing whitespace	Stephen Lin	2013-07-15	1	-36/+36
	llvm-svn: 186333
*	Revert r186316 while I track down an ASan failure and an assert from	Chandler Carruth	2013-07-15	1	-972/+1255
	a bot. This reverts the commit which introduced a new implementation of the fancy SROA pass designed to reduce its overhead. I'll skip the huge commit log here, refer to r186316 if you're looking for how this all works and why it works that way.
	llvm-svn: 186332
*	Reimplement SROA yet again. Same fundamental principle, but a totally	Chandler Carruth	2013-07-15	1	-1255/+972
	different core implementation strategy.
	Previously, SROA would build a relatively elaborate partitioning of an alloca, associate uses with each partition, and then rewrite the uses of each partition in an attempt to break apart the alloca into chunks that could be promoted. This was very wasteful in terms of memory and compile time because regardless of how complex the alloca or how much we're able to do in breaking it up, all of the datastructure work to analyze the partitioning was done up front.
	The new implementation attempts to form partitions of the alloca lazily and on the fly, rewriting the uses that make up that partition as it goes. This has a few significant effects:
	1) Much simpler data structures are used throughout.
	2) No more double walk of the recursive use graph of the alloca, only walk it once.
	3) No more complex algorithms for associating a particular use with a particular partition.
	4) PHI and Select speculation is simplified and happens lazily.
	5) More precise information is available about a specific use of the alloca, removing the need for some side datastructures.
	Ultimately, I think this is a much better implementation. It removes about 300 lines of code, but arguably removes more like 500 considering that some code grew in the process of being factored apart and cleaned up for this all to work.
	I've re-used as much of the old implementation as possible, which includes the lion's share of code in the form of the rewriting logic. The interesting new logic centers around how the uses of a partition are sorted, and split into actual partitions.
	Each instruction using a pointer derived from the alloca gets a 'Partition' entry. This name is totally wrong, but I'll do a rename in a follow-up commit as there is already enough churn here. The entry describes the offset range accessed and the nature of the access. Once we have all of these entries we sort them in a very specific way: increasing order of begin offset, followed by whether they are splittable uses (memcpy, etc), followed by the end offset or whatever. Sorting by splittability is important as it simplifies the collection of uses into a partition.
	Once we have these uses sorted, we walk from the beginning to the end building up a range of uses that form a partition of the alloca. Overlapping unsplittable uses are merged into a single partition while splittable uses are broken apart and carried from one partition to the next. A partition is also introduced to bridge splittable uses between the unsplittable regions when necessary.
	I've looked at the performance PRs fairly closely. PR15471 no longer will even load (the module is invalid). Not sure what is up there. PR15412 improves by between 5% and 10%, however it is nearly impossible to know what is holding it up as SROA (the entire pass) takes less time than reading the IR for that test case. The analysis takes the same time as running mem2reg on the final allocas. I suspect (without much evidence) that the new implementation will scale much better however, and it is just the small nature of the test cases that makes the changes small and noisy. Either way, it is still simpler and cleaner I think.
	llvm-svn: 186316
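The sort order and the merging of overlapping unsplittable uses described in r186316 can be sketched as follows. This is a simplified illustration under stated assumptions, not the SROA implementation: `Slice`, `Partition`, `sliceLess`, and `mergeUnsplittable` are hypothetical names, the tie-breaks (unsplittable before splittable, larger end offset first) are assumptions because the commit message leaves them open, and the carrying of splittable uses across partitions is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for one use of an alloca: the byte range it touches
// and whether the use may be split (memcpy and friends).
struct Slice {
  std::uint64_t Begin;
  std::uint64_t End; // one past the last byte accessed
  bool Splittable;
};

struct Partition {
  std::uint64_t Begin;
  std::uint64_t End;
};

// Order by begin offset, then splittability, then end offset.
// ASSUMPTION: unsplittable sorts first and the larger end sorts first.
bool sliceLess(const Slice &A, const Slice &B) {
  if (A.Begin != B.Begin)
    return A.Begin < B.Begin;
  if (A.Splittable != B.Splittable)
    return !A.Splittable;
  return A.End > B.End;
}

// Walk the sorted slices and merge overlapping unsplittable uses into a
// single partition. Splittable uses are skipped in this sketch.
std::vector<Partition> mergeUnsplittable(std::vector<Slice> Slices) {
  std::sort(Slices.begin(), Slices.end(), sliceLess);
  std::vector<Partition> Parts;
  for (const Slice &S : Slices) {
    if (S.Splittable)
      continue;
    if (!Parts.empty() && S.Begin < Parts.back().End)
      Parts.back().End = std::max(Parts.back().End, S.End);
    else
      Parts.push_back({S.Begin, S.End});
  }
  return Parts;
}
```

For example, unsplittable uses covering bytes [0, 8), [4, 12) and [12, 16) yield two partitions, [0, 12) and [12, 16), because the first two overlap and the third only abuts them.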
*	Use SmallVectorImpl& instead of SmallVector to avoid repeating small vector size.	Craig Topper	2013-07-14	5	-35/+39
	llvm-svn: 186274
*	LFTR improvement to avoid truncation.	Andrew Trick	2013-07-12	1	-6/+32
	This is a reimplementation of the patch originally in r186107.
	llvm-svn: 186215
*	Cleanup LFTR logic.	Andrew Trick	2013-07-12	1	-28/+9
	llvm-svn: 186214
*	Cleanup: rename a variable to make the logic easier to follow.	Andrew Trick	2013-07-12	1	-7/+7
	llvm-svn: 186213
*	Revert "indvars: Improve LFTR by eliminating truncation when comparing	Chandler Carruth	2013-07-12	1	-23/+4
	against a constant."
	This reverts commit r186107. It didn't handle wrapping arithmetic in the loop correctly and thus caused the following C program to count from 0 to UINT64_MAX instead of from 0 to 255 as intended:
	  #include <stdio.h>
	  int main() {
	    unsigned char first = 0, last = 255;
	    do { printf("%d\n", first); } while (first++ != last);
	  }
	Full test case and instructions to reproduce with just the -indvars pass sent to the original review thread rather than to r186107's commit.
	llvm-svn: 186152
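Why 8-bit wrapping matters for a loop like the one in r186152 can be shown with a small counting experiment. This is an illustrative sketch only: `countNarrow` and `countWide` are hypothetical helpers, the `cap` parameter exists just to keep the widened loop finite, and the code does not reproduce the exact transformation r186107 performed.

```cpp
#include <cassert>
#include <cstdint>

// The loop shape from the commit message, with the counter kept in 8 bits
// so that 255 + 1 wraps back to 0.
std::uint64_t countNarrow(unsigned char first, unsigned char last) {
  std::uint64_t n = 0;
  do {
    ++n;
  } while (first++ != last);
  return n;
}

// The same loop with the counter widened to 64 bits and no truncation.
// `cap` is a safety limit added for this sketch so the loop always ends.
std::uint64_t countWide(std::uint64_t first, std::uint64_t last,
                        std::uint64_t cap) {
  std::uint64_t n = 0;
  do {
    if (++n == cap)
      return n;
  } while (first++ != last);
  return n;
}
```

Starting at 0 and ending at 255, both versions run 256 iterations. Starting at 10 and ending at 5, the 8-bit loop wraps and stops after 252 iterations, while the widened loop never sees the value 5 and only stops at the cap, so dropping the truncation is valid only when the counter cannot wrap before the exit test succeeds.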
*	indvars: Improve LFTR by eliminating truncation when comparing against a constant.	Andrew Trick	2013-07-11	1	-4/+23
	Patch by Michele Scandale!
	Adds a special handling of the case where, during the loop exit condition rewriting, the exit value is a constant of bitwidth lower than the type of the induction variable: instead of introducing a trunc operation in order to match correctly the operand types, it allows to convert the constant value to an equivalent constant, depending on the initial value of the induction variable and the trip count, in order to have an equivalent comparison between the induction variable and the new constant.
	llvm-svn: 186107