path: root/llvm/lib/Analysis/TargetTransformInfo.cpp
Commit message | Author | Age | Files | Lines
...
* Divergence analysis for GPU programs | Jingyue Wu | 2015-04-10 | 1 | -0/+4
  Summary: Some optimizations such as jump threading and loop unswitching can negatively affect performance when applied to divergent branches. The divergence analysis added in this patch conservatively estimates which branches in a GPU program can diverge. This information can then help LLVM to run certain optimizations selectively.
  Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
  Reviewers: resistor, hfinkel, eliben, meheff, jholewinski
  Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D8576
  llvm-svn: 234567
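  For context, a hand-written NVPTX-flavored sketch (not taken from the patch or its test; only the thread-id intrinsic name is real, the rest is invented) of the kind of branch the analysis must conservatively mark divergent, because its condition depends on the per-thread ID:

    declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()

    define void @example(i32* %out) {
    entry:
      %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
      %cond = icmp slt i32 %tid, 16          ; condition varies per thread
      br i1 %cond, label %then, label %exit  ; divergent branch
    then:
      store i32 1, i32* %out
      br label %exit
    exit:
      ret void
    }

  Jump threading or loop unswitching across such a branch can push more code under divergent control flow, which is why those passes want to back off when the analysis flags it.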
* TTI: Add getCallInstrCost. | Michael Zolotukhin | 2015-03-17 | 1 | -0/+5
  Review: http://reviews.llvm.org/D8094
  llvm-svn: 232524
* Do not restrict interleaved unrolling to small loops, depending on the target. | Olivier Sallenave | 2015-03-06 | 1 | -0/+4
  llvm-svn: 231528
* Make DataLayout Non-Optional in the Module | Mehdi Amini | 2015-03-04 | 1 | -1/+1
  Summary: DataLayout keeps the string used for its creation. As a side effect it is no longer needed in the Module. This is "almost" NFC: the string is no longer canonicalized, so you can't rely on two "equal" DataLayouts having the same string returned by getStringRepresentation().
  Get rid of DataLayoutPass: the DataLayout is in the Module. The DataLayout is "per-module"; let's enforce this by not duplicating it more than necessary. One more step toward non-optionality of the DataLayout in the module.
  Make DataLayout Non-Optional in the Module: Module->getDataLayout() will never return nullptr anymore.
  Reviewers: echristo
  Subscribers: resistor, llvm-commits, jholewinski
  Differential Revision: http://reviews.llvm.org/D7992
  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 231270
* Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity. | Chad Rosier | 2015-02-23 | 1 | -0/+4
  This patch adds the isProfitableToHoist API. For AArch64, we want to prevent a fmul from being hoisted in cases where it is more profitable to form a fmsub/fmadd.
  Phabricator Review: http://reviews.llvm.org/D7299
  Patch by Lawrence Hu <lawrence@codeaurora.org>
  llvm-svn: 230241
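  For context, a hand-written sketch (not taken from the patch or its tests; all names are invented) of the shape of code this protects. If the two fmul instructions below were hoisted into the entry block, each fsub would lose its adjacent multiply and the AArch64 backend could no longer fuse the pair into a single fmsub:

    define double @example(i1 %c, double %a, double %b, double %x, double %y) {
    entry:
      br i1 %c, label %then, label %else
    then:
      %m0 = fmul double %a, %b
      %s0 = fsub double %x, %m0   ; adjacent fmul + fsub can become fmsub
      br label %merge
    else:
      %m1 = fmul double %a, %b
      %s1 = fsub double %y, %m1   ; same here
      br label %merge
    merge:
      %r = phi double [ %s0, %then ], [ %s1, %else ]
      ret double %r
    }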
* Value soft float calls as more expensive in the inliner. | Cameron Esfahani | 2015-02-05 | 1 | -0/+4
  Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation. This keeps the default behavior the same as before. The ARM TTI has been updated to return back TCC_Expensive for targets which don't have hardware floating point.
  Reviewers: chandlerc, echristo
  Reviewed By: echristo
  Subscribers: t.p.northover, aemerson, llvm-commits
  Differential Revision: http://reviews.llvm.org/D6936
  llvm-svn: 228263
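  For context, an illustrative sketch (not from the patch; the function name is invented): on a target without hardware floating point, even the single fadd below is lowered to a runtime library call (conventionally __addsf3), so treating it as a cheap instruction understates the cost of inlining code that contains it:

    define float @softfloat_add(float %a, float %b) {
      %sum = fadd float %a, %b   ; expands to a libcall on soft-float targets
      ret float %sum
    }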
* [multiversion] Remove the function parameter from the unrolling | Chandler Carruth | 2015-02-01 | 1 | -2/+2
  preferences interface on TTI now that all of TTI is per-function.
  llvm-svn: 227741
* [multiversion] Implement the old pass manager's TTI wrapper pass in | Chandler Carruth | 2015-02-01 | 1 | -5/+10
  terms of the new pass manager's TargetIRAnalysis.
  Yep, this is one of the nicer bits of the new pass manager's design. Passes can in many cases operate in a vacuum and so we can just nest things when convenient. This is particularly convenient here as I can now consolidate all of the TargetMachine logic on this analysis.
  The most important change here is that this pushes the function we need TTI for all the way into the TargetMachine, and re-creates the TTI object for each function rather than re-using it for each function. We're now prepared to teach the targets to produce function-specific TTI objects with specific subtargets cached, etc.
  One piece of feedback I'd love here is whether it's worth renaming any of this stuff. None of the names really seem that awesome to me at this point, but TargetTransformInfoWrapperPass is particularly ... odd. TargetIRAnalysisWrapper might make more sense. I would want to do that rename separately anyways, but let me know what you think.
  llvm-svn: 227731
* [PM] Port TTI to the new pass manager, introducing a TargetIRAnalysis to | Chandler Carruth | 2015-02-01 | 1 | -0/+17
  produce it.
  This adds a function to the TargetMachine that produces this analysis via a callback for each function. This in turn paves the way to produce a *different* TTI per-function with the correct subtarget cached.
  I've also done the necessary wiring in the opt tool to thread the target machine down and make it available to the pass registry so that we can construct this analysis from a target machine when available.
  llvm-svn: 227721
* [PM] Switch the TargetMachine interface from accepting a pass manager | Chandler Carruth | 2015-01-31 | 1 | -13/+17
  base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine.
  This removes all of the pass variants for TTI. There is now a single TTI *pass* in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself.
  While the diff is large here, it is nothing more than code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything.
  With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well.
  llvm-svn: 227685
* [PM] Change the core design of the TTI analysis to use a polymorphic | Chandler Carruth | 2015-01-31 | 1 | -519/+108
  type erased interface and a single analysis pass rather than an extremely complex analysis group.
  The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR.
  I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting it into the right form.
  There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here.
  The other reason of course is that this is a *much* more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below.
  The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;]
  Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least:
  1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to *just* do what we need: build and return a TTI object that we can then insert into the pass pipeline.
  2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager.
  3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2.
  4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase.
  5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward.
  6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target.
  Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly.
  Differential Revision: http://reviews.llvm.org/D7293
  llvm-svn: 227669
* Commoning of target specific load/store intrinsics in Early CSE. | Chad Rosier | 2015-01-26 | 1 | -0/+19
  Phabricator revision: http://reviews.llvm.org/D7121
  Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
  llvm-svn: 227149
* Implemented cost model for masked load/store operations. | Elena Demikhovsky | 2015-01-25 | 1 | -0/+12
  llvm-svn: 227035
* Intrinsics: introduce llvm_any_ty aka ValueType Any | Ramkumar Ramachandra | 2015-01-22 | 1 | -0/+1
  Specifically, gc.result benefits from this greatly. Instead of:
    gc.result.int.*
    gc.result.float.*
    gc.result.ptr.*
    ...
  We now have a gc.result.* that can specialize to literally any type.
  Differential Revision: http://reviews.llvm.org/D7020
  llvm-svn: 226857
* Remove empty statement. No functionality change. | Nick Lewycky | 2015-01-08 | 1 | -1/+0
  llvm-svn: 225420
* Loop Vectorizer minor changes in the code - | Elena Demikhovsky | 2014-12-14 | 1 | -4/+4
  some comments, function names, indentation.
  Reviewed here: http://reviews.llvm.org/D6527
  llvm-svn: 224218
* Masked Load / Store Intrinsics - the CodeGen part. | Elena Demikhovsky | 2014-12-04 | 1 | -0/+11
  I'm recommitting the codegen part of the patch. The vectorizer part will be sent to review again.
  Masked Vector Load and Store Intrinsics. Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for the X86 code generator.
  Examples:
    <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
    declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
  Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
  http://reviews.llvm.org/D6191
  llvm-svn: 223348
* [Statepoints 1/4] Statepoint infrastructure for garbage collection: IR Intrinsics | Philip Reames | 2014-12-01 | 1 | -0/+4
  The statepoint intrinsics are intended to enable precise root tracking through the compiler so as to support garbage collectors of all types. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations.
  A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch series) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact that the statepoint is assumed to clobber all memory, and the optimizer's standard semantics ensure that the relocations flow through IR optimizations correctly.
  This is the first patch in a small series. This patch contains only the IR parts; the documentation and backend support will be following separately. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
  Reviewed by: atrick, ributzka
  llvm-svn: 223078
* Revert "Masked Vector Load and Store Intrinsics."Duncan P. N. Exon Smith2014-11-281-11/+0
| | | | | | | | | | | This reverts commit r222632 (and follow-up r222636), which caused a host of LNT failures on an internal bot. I'll respond to the commit on the list with a reproduction of one of the failures. Conflicts: lib/Target/X86/X86TargetTransformInfo.cpp llvm-svn: 222936
* Masked Vector Load and Store Intrinsics.Elena Demikhovsky2014-11-231-0/+11
| | | | | | | | | | | | | | Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores. Added SDNodes for masked operations and lowering patterns for X86 code generator. Examples: <16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask) declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask) Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch. http://reviews.llvm.org/D6191 llvm-svn: 222632
* Add minnum / maxnum intrinsics | Matt Arsenault | 2014-10-21 | 1 | -0/+2
  These are named following the IEEE-754 names for these functions, rather than the libm fmin / fmax, to avoid possible ambiguities. Some languages may implement something resembling fmin / fmax which returns NaN if either operand is NaN, in order to propagate errors. These implement the IEEE-754 semantics of returning the other operand if either is a NaN representing missing data.
  llvm-svn: 220341
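  For illustration (not part of the commit; the function and value names are invented), a simple clamp built from the overloaded intrinsics, relying on the IEEE-754 rule that a NaN operand yields the other operand:

    declare float @llvm.minnum.f32(float, float)
    declare float @llvm.maxnum.f32(float, float)

    define float @clamp01(float %x) {
      ; if %x is NaN, maxnum returns 0.0, so the final result is a well-defined 0.0
      %lo = call float @llvm.maxnum.f32(float %x, float 0.0)
      %hi = call float @llvm.minnum.f32(float %lo, float 1.0)
      ret float %hi
    }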
* Ignore annotation function calls in cost computation | David Peixotto | 2014-09-26 | 1 | -0/+1
  The annotation instructions are dropped during codegen and have no impact on size. In some cases, the annotations were preventing the unroller from unrolling a loop because the annotation calls were pushing the cost over the unrolling threshold.
  Differential Revision: http://reviews.llvm.org/D5335
  llvm-svn: 218525
* Add a new pass FunctionTargetTransformInfo. This pass serves as a | Eric Christopher | 2014-09-18 | 1 | -5/+6
  shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget.
  No functional change.
  llvm-svn: 218004
* Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. | Sanjay Patel | 2014-09-10 | 1 | -3/+3
  "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this.
  Differential Revision: http://reviews.llvm.org/D5066
  llvm-svn: 217528
* Remove 'virtual' keyword from methods marked with 'override' keyword. | Craig Topper | 2014-08-30 | 1 | -3/+3
  llvm-svn: 216823
* Allow vectorization of division by uniform power of 2. | Karthik Bhat | 2014-08-25 | 1 | -6/+8
  This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates the cost model for the Loop and SLP Vectorizer. The cost table is currently only updated for the X86 backend.
  Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
  llvm-svn: 216371
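  For illustration (a hand-written example, not from the patch), the pattern in question is a division whose divisor is a uniform power-of-two splat, which a target can lower with shifts instead of a true division:

    define <4 x i32> @div_by_8(<4 x i32> %x) {
      ; uniform power-of-two divisor: lowerable as arithmetic shifts plus sign fixups
      %d = sdiv <4 x i32> %x, <i32 8, i32 8, i32 8, i32 8>
      ret <4 x i32> %d
    }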
* Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost. | James Molloy | 2014-08-05 | 1 | -0/+10
  Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.
  llvm-svn: 214859
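  A sketch of the situation being costed (invented names, not from the commit): the <4 x i32> value below is live across the call, and because no AArch64 callee-saved register holds a full 128-bit vector, it has to be spilled before the call and reloaded afterwards:

    declare void @callee()

    define <4 x i32> @live_across_call(<4 x i32> %a, <4 x i32> %b) {
      %sum = add <4 x i32> %a, %b   ; %sum is live across the call below
      call void @callee()           ; forces a spill and reload of %sum
      %r = add <4 x i32> %sum, %sum
      ret <4 x i32> %r
    }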
* Add @llvm.assume, lowering, and some basic properties | Hal Finkel | 2014-07-25 | 1 | -0/+1
  This is the first commit in a series that add an @llvm.assume intrinsic which can be used to provide the optimizer with a condition it may assume to be true (when the control flow would hit the intrinsic call). Some basic properties are added here:
  - llvm.invariant(true) is dead.
  - llvm.invariant(false) is unreachable (this directly corresponds to the documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).
  The intrinsic is tagged as writing arbitrarily, in order to maintain control dependencies. BasicAA has been updated, however, to return NoModRef for any particular location-based query so that we don't unnecessarily block code motion.
  llvm-svn: 213973
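  A minimal usage sketch (not from the commit; the function and value names are invented) showing how an assumed condition lets the optimizer fold a later check:

    declare void @llvm.assume(i1)

    define i1 @example(i32 %x) {
      %positive = icmp sgt i32 %x, 0
      call void @llvm.assume(i1 %positive)   ; optimizer may now assume %x > 0
      %nonneg = icmp sge i32 %x, 0           ; foldable to true under that assumption
      ret i1 %nonneg
    }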
* [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE | Chandler Carruth | 2014-04-22 | 1 | -1/+2
  definition below all the header #include lines, lib/Analysis/... edition.
  This one has a bit extra as there were *other* #define's before #include lines in addition to DEBUG_TYPE. I've sunk all of them as a block.
  llvm-svn: 206843
* [C++11] More 'nullptr' conversion. In some cases just using a boolean check instead of comparing to nullptr. | Craig Topper | 2014-04-15 | 1 | -6/+6
  llvm-svn: 206243
* Use TopTTI->getGEPCost from within getUserCost | Hal Finkel | 2014-04-01 | 1 | -4/+4
  The implementation of getUserCost had duplicated (and hard-coded) the default logic in getGEPCost. Instead, it is better to use getGEPCost directly, which limits the default logic to the implementation of one function, and allows targets to override the behavior.
  No functionality change intended.
  llvm-svn: 205346
* [Constant Hoisting] Make the constant materialization cost operand dependent | Juergen Ributzka | 2014-03-21 | 1 | -8/+8
  Extend the target hook to take also the operand index into account when calculating the cost of the constant materialization.
  Related to <rdar://problem/16381500>
  llvm-svn: 204435
* Revert "[Constant Hoisting] Extend coverage of the constant hoisting pass." | Juergen Ributzka | 2014-03-20 | 1 | -8/+8
  I will break this up into smaller pieces for review and recommit.
  llvm-svn: 204393
* [Constant Hoisting] Extend coverage of the constant hoisting pass. | Juergen Ributzka | 2014-03-20 | 1 | -8/+8
  This commit extends the coverage of the constant hoisting pass, adds additional debug output and updates the function names according to the style guide.
  Related to <rdar://problem/16381500>
  llvm-svn: 204389
* [TTI] There is actually no realistic way to pop TTI implementations off | Chandler Carruth | 2014-03-10 | 1 | -10/+0
  the stack of the analysis group because they are all immutable passes. This is made clear by Craig's recent work to use override systematically -- we weren't overriding anything for 'finalizePass' because there is no such thing.
  This is kind of a lame restriction on the API -- we can no longer push and pop things, we just set up the stack and run. However, I'm not invested in building some better solution on top of the existing (terrifying) immutable pass and legacy pass manager.
  llvm-svn: 203437
* [Modules] Move CallSite into the IR library where it belongs. It is | Chandler Carruth | 2014-03-04 | 1 | -1/+1
  abstracting between a CallInst and an InvokeInst, both of which are IR concepts.
  llvm-svn: 202816
* Switch all uses of LLVM_OVERRIDE to just use 'override' directly. | Craig Topper | 2014-03-02 | 1 | -49/+45
  llvm-svn: 202621
* Switch all uses of LLVM_FINAL to just use 'final', and remove the macro. | Craig Topper | 2014-03-02 | 1 | -1/+1
  llvm-svn: 202618
* Make DataLayout a plain object, not a pass. | Rafael Espindola | 2014-02-25 | 1 | -1/+2
  Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM that don't handle passes to also use DataLayout.
  llvm-svn: 202168
* Make succ_iterator a real random access iterator and clean up a couple of users. | Benjamin Kramer | 2014-02-10 | 1 | -6/+1
  llvm-svn: 201088
* Revert "Revert "Add Constant Hoisting Pass" (r200034)" | Juergen Ributzka | 2014-01-25 | 1 | -1/+21
  This reverts commit r200058 and adds the using directive for ARMTargetTransformInfo to silence two g++ overload warnings.
  llvm-svn: 200062
* Revert "Add Constant Hoisting Pass" (r200034)Hans Wennborg2014-01-251-21/+1
| | | | | | | | | | | | | | | This commit caused -Woverloaded-virtual warnings. The two new TargetTransformInfo::getIntImmCost functions were only added to the superclass, and to the X86 subclass. The other targets were not updated, and the warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was hiding the two new getIntImmCost variants. We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost" to the various subclasses, or turning it off, but I suspect that it's wrong to leave the functions unimplemnted in those targets. The default implementations return TCC_Free, which I don't think is right e.g. for ARM. llvm-svn: 200058
* Add Constant Hoisting Pass | Juergen Ributzka | 2014-01-24 | 1 | -1/+21
  Retry commit r200022 with a fix for the build bot errors. Constant expressions have (unlike instructions) module scope use lists and therefore may have users in different functions. The fix is to simply ignore these out-of-function uses.
  llvm-svn: 200034
* Revert "Add Constant Hoisting Pass" | Juergen Ributzka | 2014-01-24 | 1 | -21/+1
  This reverts commit r200022 to unbreak the build bots.
  llvm-svn: 200024
* Add Constant Hoisting Pass | Juergen Ributzka | 2014-01-24 | 1 | -1/+21
  This pass identifies expensive constants to hoist and coalesces them to better prepare it for SelectionDAG-based code generation. This works around the limitations of the basic-block-at-a-time approach.
  First it scans all instructions for integer constants and calculates their cost. If the constant can be folded into the instruction (the cost is TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't consider it expensive and leave it alone. This is the default behavior and the default implementation of getIntImmCost will always return TCC_Free.
  If the cost is more than TCC_BASIC, then the integer constant can't be folded into the instruction and it might be beneficial to hoist the constant. Similar constants are coalesced to reduce register pressure and materialization code.
  When a constant is hoisted, it is also hidden behind a bitcast to force it to be live-out of the basic block. Otherwise the constant would be just duplicated and each basic block would have its own copy in the SelectionDAG. The SelectionDAG recognizes such constants as opaque and doesn't perform certain transformations on them, which would create a new expensive constant.
  This optimization is only applied to integer constants in instructions and simple (this means not nested) constant cast expressions. For example:
    %0 = load i64* inttoptr (i64 big_constant to i64*)
  Reviewed by Eric
  llvm-svn: 200022
* Add final and override keywords to TargetTransformInfo's subclasses. | Juergen Ributzka | 2014-01-24 | 1 | -45/+53
  llvm-svn: 200021
* Re-sort all of the includes with ./utils/sort_includes.py so that | Chandler Carruth | 2014-01-07 | 1 | -2/+2
  subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc.
  llvm-svn: 198685
* Costmodel: Add support for horizontal vector reductions | Arnold Schwaighofer | 2013-09-17 | 1 | -0/+9
  Upcoming SLP vectorization improvements will want to be able to estimate costs of horizontal reductions. Add infrastructure to support this.
  We model reductions as a series of (shufflevector, add) tuples ultimately followed by an extractelement. For example, for an add-reduction of <4 x float> we could generate the following sequence, which reduces (v0, v1, v2, v3) to (v0+v2, v1+v3, undef, undef) and then to ((v0+v2) + (v1+v3), undef, undef):
    %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
    %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
    %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
    %r = extractelement <4 x float> %bin.rdx8, i32 0
  This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)" that will allow clients to ask for the cost of such a reduction (as backends might generate more efficient code than the cost of the individual instructions summed up). This interface is exercised by the CostModel analysis pass which looks for reduction patterns like the one above - starting at extractelements - and if it sees a matching sequence will call the cost model interface.
  We will also support a second, pairwise form of reduction that is well supported on common architectures (haddps, vpadd, faddp), which reduces (v0, v1, v2, v3) to (v0+v1, v2+v3, undef, undef) and then to ((v0+v1)+(v2+v3), undef, undef, undef):
    %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
    %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
    %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1
    %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
    %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
    %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1
    %r = extractelement <4 x float> %bin.rdx.1, i32 0
  llvm-svn: 190876
* Add getUnrollingPreferences to TTI | Hal Finkel | 2013-09-11 | 1 | -0/+7
  Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important.
  llvm-svn: 190542
* Revert: r189565 - Add getUnrollingPreferences to TTI | Hal Finkel | 2013-08-29 | 1 | -9/+0
  Revert unintentional commit (of an unreviewed change).
  Original commit message:
  Add getUnrollingPreferences to TTI
  Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important.
  llvm-svn: 189566