bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[LoopUnrolling] Handle cast instructions.	Michael Zolotukhin	2015-07-15	1	-0/+15
\| \| \| \| \| \| \| \| \|	During estimation of unrolling effect we should be able to propagate constants through casts. Differential Revision: http://reviews.llvm.org/D10207 llvm-svn: 242257
*	Enable runtime unrolling with unroll pragma metadata	Mark Heffernan	2015-07-13	1	-2/+4
\| \| \| \| \| \| \| \| \| \|	Enable runtime unrolling for loops with unroll count metadata ("#pragma unroll N") and a runtime trip count. Also, do not unroll loops with unroll full metadata if the loop has a runtime loop count. Previously, such loops would be unrolled with a very large threshold (pragma-unroll-threshold) if runtime unrolled happened to be enabled resulting in a very large (and likely unwise) unroll factor. llvm-svn: 242047
*	Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)	Alexander Kornienko	2015-06-23	1	-1/+1
\| \| \| \| \| \|	Apparently, the style needs to be agreed upon first. llvm-svn: 240390
*	Fixed/added namespace ending comments using clang-tidy. NFC	Alexander Kornienko	2015-06-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The patch is generated using this command: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \ -checks=-,llvm-namespace-comment -header-filter='llvm/.\|clang/.*' \ llvm/lib/ Thanks to Eugene Kosov for the original patch! llvm-svn: 240137
*	Update stale comment before analyzeLoopUnrollCost. NFC.	Michael Zolotukhin	2015-06-11	1	-7/+9
\| \| \| \|	llvm-svn: 239565
*	Remove SCEVCache and FindConstantPointers from complete loop unrolling ↵	Michael Zolotukhin	2015-06-08	1	-212/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	heuristic. Summary: Using some SCEV functionality helped to entirely remove SCEVCache class and FindConstantPointers SCEV visitor. Also, this makes the code more universal - I'll take advandate of it in next patches where I start handling additional types of instructions. Test Plan: Tests would be submitted in subsequent patches. Reviewers: atrick, chandlerc Reviewed By: atrick, chandlerc Subscribers: atrick, llvm-commits Differential Revision: http://reviews.llvm.org/D10205 llvm-svn: 239282
*	[LoopUnroll] Fix truncation bug in canUnrollCompletely.	Sanjoy Das	2015-06-06	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: canUnrollCompletely takes `unsigned` values for `UnrolledCost` and `RolledDynamicCost` but is passed in `uint64_t`s that are silently truncated. Because of this, when `UnrolledSize` is a large integer that has a small remainder with UINT32_MAX, LLVM tries to completely unroll loops with high trip counts. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10293 llvm-svn: 239218
*	[Unroll] Rework the naming and structure of the new unroll heuristics.	Chandler Carruth	2015-06-05	1	-95/+121
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new naming is (to me) much easier to understand. Here is a summary of the new state of the world: - 'Threshold' is the threshold for full unrolling. It is measured against the estimated unrolled cost as computed by getUserCost in TTI (or CodeMetrics, etc). We will exceed this threshold when unrolling loops where unrolling exposes a significant degree of simplification of the logic within the loop. - 'PercentDynamicCostSavedThreshold' is the percentage of the loop's estimated dynamic execution cost which needs to be saved by unrolling to apply a discount to the estimated unrolled cost. - 'DynamicCostSavingsDiscount' is the discount applied to the estimated unrolling cost when the dynamic savings are expected to be high. When actually analyzing the loop, we now produce both an estimated unrolled cost, and an estimated rolled cost. The rolled cost is notably a dynamic estimate based on our analysis of the expected execution of each iteration. While we're still working to build up the infrastructure for making these estimates, to me it is much more clear how* to make them better when they have reasonably descriptive names. For example, we may want to apply estimated (from heuristics or profiles) dynamic execution weights to the dynamic cost estimates. If we start doing that, we would also need to track the static unrolled cost and the dynamic unrolled cost, as only the latter could reasonably be weighted by profile information. This patch is sadly not without functionality change for the new unroll analysis logic. Buried in the heuristic management were several things that surprised me. For example, we never subtracted the optimized instruction count off when comparing against the unroll heursistics! I don't know if this just got lost somewhere along the way or what, but with the new accounting of things, this is much easier to keep track of and we use the post-simplification cost estimate to compare to the thresholds, and use the dynamic cost reduction ratio to select whether we can exceed the baseline threshold. The old values of these flags also don't necessarily make sense. My impression is that none of these thresholds or discounts have been tuned yet, and so they're just arbitrary placehold numbers. As such, I've not bothered to adjust for the fact that this is now a discount and not a tow-tier threshold model. We need to tune all these values once the logic is ready to be enabled. Differential Revision: http://reviews.llvm.org/D9966 llvm-svn: 239164
*	[Unroll] Switch from an eagerly populated SCEV cache to one that is	Chandler Carruth	2015-05-25	1	-89/+116
\| \| \| \| \| \| \| \| \| \|	lazily built. Also, make it a much more generic SCEV cache, which today exposes only a reduced GEP model description but could be extended in the future to do other profitable caching of SCEV information. llvm-svn: 238124
*	[Unroll] Separate the logic for testing each iteration of the loop,	Chandler Carruth	2015-05-22	1	-106/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	accumulating estimated cost, and other loop-centric logic from the logic used to analyze instructions in a particular iteration. This makes the visitor very narrow in scope -- all it does is visit instructions, update a map of simplified values, and return whether it is able to optimize away a particular instruction. The two cost metrics are now returned as an optional struct. When the optional is left unengaged, there is no information about the unrolled cost of the loop, when it is engaged the cost metrics are available to run against the thresholds. No functionality changed. llvm-svn: 238033
*	[Unroll] Replace a hand-wavy FIXME with a FIXME that explains the actual	Chandler Carruth	2015-05-22	1	-1/+6
\| \| \| \| \| \| \|	problem instead of suggesting doing something that is trivial to do but incorrect given the current design of the libraries. llvm-svn: 237994
*	[Unroll] Extract the logic for caching SCEV-modeled GEPs with their	Chandler Carruth	2015-05-22	1	-67/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	simplified model for use simulating each iteration into a separate helper function that just returns the cache. Building this cache had nothing to do with the rest of the unroll analysis and so this removes an unnecessary coupling, etc. It should also make it easier to think about the concept of providing fast cached access to basic SCEV models as an orthogonal concept to the overall unroll simulation. I'd really like to see this kind of caching logic folded into SCEV itself, it seems weird for us to provide it at this layer rather than making repeated queries into SCEV fast all on their own. No functionality changed. llvm-svn: 237993
*	[Unroll] Refactor the accumulation of optimized instruction costs into	Chandler Carruth	2015-05-22	1	-9/+10
\| \| \| \| \| \| \| \| \| \| \| \|	a single location. This reduces code duplication a bit and will also pave the way for a better separation between the visitation algorithm and the unroll analysis. No functionality changed. llvm-svn: 237990
*	[Unrolling] Refactor the start and step offsets to simplify overflow	Chandler Carruth	2015-05-12	1	-10/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	checking and make the cache faster and smaller. I had thought that using an APInt here would be useful, but I think I was just wrong. Notably, we don't have to do any fancy overflow checking, we can just bound the values as quite small and do the math in a higher precision integer. I've switched to a signed integer so that UBSan will even point out if we ever have integer overflow. I've added various asserts to try to catch things as well and hoisted the overflow checks so that we just leave the too-large offsets out of the SCEV-GEP cache. This makes the value in the cache quite a bit smaller which is probably worthwhile. No functionality changed here (for trip counts under 1 billion). llvm-svn: 237209
*	Reimplement heuristic for estimating complete-unroll optimization effects.	Michael Zolotukhin	2015-05-12	1	-248/+300
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch reimplements heuristic that tries to estimate optimization beneftis from complete loop unrolling. In this patch I kept the minimal changes - e.g. I removed code handling branches and folding compares. That's a promising area, but now there are too many questions to discuss before we can enable it. Test Plan: Tests are included in the patch. Reviewers: hfinkel, chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8816 llvm-svn: 237156
*	[LoopUnrollRuntime] Avoid high-cost trip count computation.	Sanjoy Das	2015-04-14	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Runtime unrolling of loops needs to emit an expression to compute the loop's runtime trip-count. Avoid runtime unrolling if this computation will be expensive. Depends on D8993. Reviewers: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8994 llvm-svn: 234846
*	Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.	Benjamin Kramer	2015-03-23	1	-2/+2
\| \| \| \|	llvm-svn: 232998
*	Move private classes into anonymous namespaces	Benjamin Kramer	2015-03-23	1	-0/+2
\| \| \| \| \| \|	NFC. llvm-svn: 232944
*	DataLayout is mandatory, update the API to reflect it with references.	Mehdi Amini	2015-03-10	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Now that the DataLayout is a mandatory part of the module, let's start cleaning the codebase. This patch is a first attempt at doing that. This patch is not exactly NFC as for instance some places were passing a nullptr instead of the DataLayout, possibly just because there was a default value on the DataLayout argument to many functions in the API. Even though it is not purely NFC, there is no change in the validation. I turned as many pointer to DataLayout to references, this helped figuring out all the places where a nullptr could come up. I had initially a local version of this patch broken into over 30 independant, commits but some later commit were cleaning the API and touching part of the code modified in the previous commits, so it seemed cleaner without the intermediate state. Test Plan: Reviewers: echristo Subscribers: llvm-commits From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 231740
*	Introduce runtime unrolling disable matadata and use it to mark the scalar ↵	Kevin Qin	2015-03-09	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \|	loop from vectorization. Runtime unrolling is an expensive optimization which can bring benefit only if the loop is hot and iteration number is relatively large enough. For some loops, we know they are not worth to be runtime unrolled. The scalar loop from vectorization is one of the cases. llvm-svn: 231631
*	Transforms: Canonicalize access to function attributes, NFC	Duncan P. N. Exon Smith	2015-02-14	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229202
*	[unroll] Concede defeat and disable the unroll analyzer for now.	Chandler Carruth	2015-02-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The issues with the new unroll analyzer are more fundamental than code cleanup, algorithm, or data structure changes. I've sent an email to the original commit thread with details and a proposal for how to redesign things. I'm disabling this for now so that we don't spend time debugging issues with it in its current state. llvm-svn: 229064
*	[unroll] Merge the simplification and DCE estimation methods on the	Chandler Carruth	2015-02-13	1	-20/+16
\| \| \| \| \| \| \| \| \| \| \|	UnrollAnalyzer. Now they share a single worklist and have less implicit state between them. There was no real benefit to separating these two things out. I'm going to subsequently refactor things to share even more code. llvm-svn: 229062
*	[unroll] Remove pointless dyn_cast<>s to Instruction - the users of an	Chandler Carruth	2015-02-13	1	-12/+4
\| \| \| \| \| \|	instruction must by definition be instructions. llvm-svn: 229061
*	[unroll] Don't check the loop set for whether an instruction is	Chandler Carruth	2015-02-13	1	-4/+2
\| \| \| \| \| \| \| \| \|	contained in it each time we try to add it to the worklist, just check this when pulling it off the worklist. That way we do it at most once per instruction with the cost of the worklist set we would need to pay anyways. llvm-svn: 229060
*	[unroll] Change the other worklist in the unroll analyzer to be a set	Chandler Carruth	2015-02-13	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vector. In addition to dramatically reducing the work required for contrived example loops, this also has to correct some serious latent bugs in the cost computation. Previously, we might add an instruction onto the worklist once for every load which it used and was simplified. Then we would visit it many times and accumulate "savings" each time. I mean, fortunately this couldn't matter for things like calls with 100s of operands, but even for binary operators this code seems like it must be double counting the savings. I just noticed this by inspection and due to the runtime problems it can introduce, I don't have any test cases for cases where the cost produced by this routine is unacceptable. llvm-svn: 229059
*	[unroll] Replace a boolean, for loop, condition, and break with	Chandler Carruth	2015-02-13	1	-7/+3
\| \| \| \| \| \| \|	std::all_of and a lambda. Much cleaner, no functionality changed. llvm-svn: 229058
*	[unroll] Directly query for dead instructions.	Chandler Carruth	2015-02-13	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the unroll analyzer, it is checking each user to see if that user will become dead. However, it first checked if that user was missing from the simplified values map, and then if was also missing from the dead instructions set. We add everything from the simplified values map to the dead instructions set, so the first step is completely subsumed by the second. Moreover, the first step requires inserting something into the simplified value map which isn't what we want at all. This also replaces a dyn_cast with a cast as an instruction cannot be used by a non-instruction. llvm-svn: 229057
*	[unroll] Replace a linear time check for no uses with a constant time	Chandler Carruth	2015-02-13	1	-3/+2
\| \| \| \| \| \| \| \| \| \|	check. Also hoist this into the enqueue process as it is faster even than testing the worklist set, we should just directly filter these out much like we filter out constants and such. llvm-svn: 229056
*	[unroll] Rather than an operand set, use a setvector for the worklist.	Chandler Carruth	2015-02-13	1	-10/+14
\| \| \| \| \| \| \| \| \| \|	We don't just want to handle duplicate operands within an instruction, but also duplicates across operands of different instructions. I should have gone straight to this, but I had convinced myself that it wasn't going to be necessary briefly. I've come to my senses after chatting more with Nick, and am now happier here. llvm-svn: 229054
*	[unroll] Extract the code to enqueue operansd for the worklist in the	Chandler Carruth	2015-02-13	1	-10/+11
\| \| \| \| \| \| \|	unroll analysis into a lambda and call it. That's much simpler than duplicating all the code. llvm-svn: 229053
*	[unroll] Use a small set to de-duplicate operands prior to putting them	Chandler Carruth	2015-02-13	1	-2/+12
\| \| \| \| \| \| \|	into the worklist. This avoids allocating lots of worklist memory for them when there are large numbers of repeated operands. llvm-svn: 229052
*	[unroll] Make the unroll cost analysis terminate deterministically and	Chandler Carruth	2015-02-13	1	-23/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	reasonably quickly. I don't have a reduced test case, but for a version of FFMPEG, this makes the loop unroller start finishing at all (after over 15 minutes of running, it hadn't terminated for me, no idea if it was a true infloop or just exponential work). The key thing here is to check the DeadInstructions set when pulling things off the worklist. Without this, we would re-walk the user list of already dead instructions again and again and again. Consider phi nodes with many, many operands and other patterns. The other important aspect of this is that because we would keep re-visiting instructions that were already known dead, we kept adding their cost savings to this! This would cause our cost savings to be insanely inflated from this. While I was here, I also rotated the operand walk out of the worklist loop to make the code easier to read. There is still work to be done to minimize worklist traffic because we don't de-duplicate operands. This means we may add the same instruction onto the worklist 1000s of times if it shows up in 1000s of operansd to a PHI node for example. Still, with this patch, the ffmpeg testcase I have finishes quickly and I can't measure the runtime impact of the unroll analysis any more. I'll probably try to do a few more cleanups to this code, but not sure how much cleanup I can justify right now. llvm-svn: 229038
*	[unroll] Make range based for loops a bit more explicit and more	Chandler Carruth	2015-02-13	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	readable. The biggest thing that was causing me problems is recognizing the references vs. poniters here. I also found that for maps naming the loop variable as KeyValue helps make it obvious why you don't actually use it directly. Finally, using 'auto' instead of 'User *' doesn't seem like a good tradeoff. Much like with the other cases, I like to know its a pointer, and 'User' is just as long and tells the reader a lot more. llvm-svn: 229033
*	[unroll] Avoid the "Insn" abbreviation of Instruction. This is quite	Chandler Carruth	2015-02-13	1	-16/+17
\| \| \| \| \| \| \| \| \|	hard to type and read for me, and is inconsistent with the other abbreviation in the base class "Inst". For most of these (where they are used widely) I prefer just spelling it out as Instruction. I've changed two of the short-lived variables to use "Inst" to match the base class. llvm-svn: 229028
*	[unroll] Tidy up the integer we use to accumululate the number of	Chandler Carruth	2015-02-13	1	-2/+5
\| \| \| \| \| \| \|	instructions optimized. NFC, just separating this out from the functionality changing commit. llvm-svn: 229026
*	[unroll] Don't use a map from pointer to bool. Use a set.	Chandler Carruth	2015-02-13	1	-4/+4
\| \| \| \| \| \| \| \| \|	This is much more efficient. In particular, the query with the user instruction has to insert a false for every missing instruction into the set. This is just a cleanup a long the way to fixing the underlying algorithm problems here. llvm-svn: 228994
*	Prevent division by 0.	Michael Zolotukhin	2015-02-13	1	-1/+1
\| \| \| \| \| \| \| \|	When we try to estimate number of potentially removed instructions in loop unroller, we analyze first N iterations and then scale the computed number by TripCount/N. We should bail out early if N is 0. llvm-svn: 228988
*	[unroll] Update the new analysis logic from r228265 to use modern coding	Chandler Carruth	2015-02-13	1	-10/+10
\| \| \| \| \| \| \|	conventions for function names consistently. Some were already using this but not all. llvm-svn: 228987
*	Use estimated number of optimized insns in unroll-threshold computation.	Michael Zolotukhin	2015-02-06	1	-2/+44
\| \| \| \| \| \| \| \| \| \|	If complete-unroll could help us to optimize away N% of instructions, we might want to do this even if the final size would exceed loop-unroll threshold. However, we don't want to unroll huge loop, and we are add AbsoluteThreshold to avoid that - this threshold will never be crossed, even if we expect to optimize 99% instructions after that. llvm-svn: 228434
*	[InstSimplify] Add SimplifyFPBinOp function.	Michael Zolotukhin	2015-02-06	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is a variation of SimplifyBinOp, but it takes into account FastMathFlags. It is needed in inliner and loop-unroller to accurately predict the transformation's outcome (previously we dropped the flags and were too conservative in some cases). Example: float foo(float a, float b) { float r; if (a[1] b) r = /* a lot of expensive computations /; else r = 1; return r; } float boo(float a) { return foo(a, 0.0); } Without this patch, we don't inline 'foo' into 'boo'. llvm-svn: 228432
*	Implement new heuristic for complete loop unrolling.	Michael Zolotukhin	2015-02-05	1	-2/+332
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Complete loop unrolling can make some loads constant, thus enabling a lot of other optimizations. To catch such cases, we look for loads that might become constants and estimate number of instructions that would be simplified or become dead after substitution. Example: Suppose we have: int a[] = {0, 1, 0}; v = 0; for (i = 0; i < 3; i ++) v += b[i]a[i]; If we completely unroll the loop, we would get: v = b[0]a[0] + b[1]a[1] + b[2]a[2] Which then will be simplified to: v = b[0]* 0 + b[1]* 1 + b[2]* 0 And finally: v = b[1] llvm-svn: 228265
*	Resurrect the assertion removed by r227717	Jingyue Wu	2015-02-02	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*. Test Plan: no regression Reviewers: mkuper Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7327 llvm-svn: 227853
*	[multiversion] Kill FunctionTargetTransformInfo, TTI itself is now	Chandler Carruth	2015-02-01	1	-8/+3
\| \| \| \| \| \|	per-function and supports the exact desired interface. llvm-svn: 227743
*	[multiversion] Thread a function argument through all the callers of the	Chandler Carruth	2015-02-01	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	getTTI method used to get an actual TTI object. No functionality changed. This just threads the argument and ensures code like the inliner can correctly look up the callee's TTI rather than using a fixed one. The next change will use this to implement per-function subtarget usage by TTI. The changes after that should eliminate the need for FTTI as that will have become the default. llvm-svn: 227730
*	[NVPTX] Emit .pragma "nounroll" for loops marked with nounroll	Jingyue Wu	2015-02-01	1	-22/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA driver from unrolling a loop marked with llvm.loop.unroll.disable is not unrolled by CUDA driver, we need to emit .pragma "nounroll" at the header of that loop. This patch also extracts getting unroll metadata from loop ID metadata into a shared helper function. Test Plan: test/CodeGen/NVPTX/nounroll.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D7041 llvm-svn: 227703
*	[PM] Change the core design of the TTI analysis to use a polymorphic	Chandler Carruth	2015-01-31	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669
*	[PM] Split the LoopInfo object apart from the legacy pass, creating	Chandler Carruth	2015-01-17	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	a LoopInfoWrapperPass to wire the object up to the legacy pass manager. This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration. llvm-svn: 226373
*	[LoopUnroll] Fix the partial unrolling threshold for small loop sizes	Hal Finkel	2015-01-10	1	-5/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	When we compute the size of a loop, we include the branch on the backedge and the comparison feeding the conditional branch. Under normal circumstances, these don't get replicated with the rest of the loop body when we unroll. This led to the somewhat surprising behavior that really small loops would not get unrolled enough -- they could be unrolled more and the resulting loop would be below the threshold, because we were assuming they'd take (LoopSize * UnrollingFactor) instructions after unrolling, instead of (((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation. llvm-svn: 225565
*	[PM] Split the AssumptionTracker immutable pass into two separate APIs:	Chandler Carruth	2015-01-04	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is two fold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the many utility functions that deal in the assumptions to live in both pass manager worlds by creating an separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boiler plate to the various passes, but this kind of boiler plate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131