bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	constify the Function parameter to the TTI creation callback and	Eric Christopher	2015-09-16	1	-1/+1
\| \| \| \| \| \|	propagate to all callers/users/etc. llvm-svn: 247864
*	Make TargetTransformInfo keeping a reference to the Module DataLayout	Mehdi Amini	2015-07-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DataLayout is no longer optional. It was initialized with or without a DataLayout, and the DataLayout when supplied could have been the one from the TargetMachine. Summary: This change is part of a series of commits dedicated to have a single DataLayout during compilation by using always the one owned by the module. Reviewers: echristo Subscribers: jholewinski, llvm-commits, rafael, yaron.keren Differential Revision: http://reviews.llvm.org/D11021 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 241774
*	Re-sort #include lines using my handy dandy ./utils/sort_includes.py	Chandler Carruth	2015-02-13	1	-1/+1
\| \| \| \| \| \|	script. This is in preparation for changes to lots of include lines. llvm-svn: 229088
*	[multiversion] Switch the TTI queries from TargetMachine to Subtarget	Chandler Carruth	2015-02-01	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	now that we have a correct and cached subtarget specific to the function. Also, finish providing a cached per-function subtarget in the core LLVMTargetMachine -- that layer hadn't switched over yet. The only use of the TargetMachine was to re-lookup a subtarget for a particular function to work around the fact that TTI was immutable. Now that it is per-function and we haved a cached subtarget, use it. This still leaves a few interfaces with real warts on them where we were passing Function objects through the TTI interface. I'll remove these and clean their usage up in subsequent commits now that this isn't necessary. llvm-svn: 227738
*	[multiversion] Remove the cached TargetMachine pointer from the	Chandler Carruth	2015-02-01	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	intermediate TTI implementation template and instead query up to the derived class for both the TargetMachine and the TargetLowering. Most of the derived types had a TLI cached already and there is no need to store a less precisely typed target machine pointer. This will in turn make it much cleaner to look up the TLI via a per-function subtarget instead of the generic subtarget, and it will pave the way toward pulling the subtarget used for unroll preferences into the same form once we are always using the function to look up the correct subtarget. llvm-svn: 227737
*	[PM] Switch the TargetMachine interface from accepting a pass manager	Chandler Carruth	2015-01-31	1	-37/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	base which it adds a single analysis pass to, to instead return the type erased TargetTransformInfo object constructed for that TargetMachine. This removes all of the pass variants for TTI. There is now a single TTI pass in the Analysis layer. All of the Analysis <-> Target communication is through the TTI's type erased interface itself. While the diff is large here, it is nothing more that code motion to make types available in a header file for use in a different source file within each target. I've tried to keep all the doxygen comments and file boilerplate in line with this move, but let me know if I missed anything. With this in place, the next step to making TTI work with the new pass manager is to introduce a really simple new-style analysis that produces a TTI object via a callback into this routine on the target machine. Once we have that, we'll have the building blocks necessary to accept a function argument as well. llvm-svn: 227685
*	[PM] Change the core design of the TTI analysis to use a polymorphic	Chandler Carruth	2015-01-31	1	-604/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	type erased interface and a single analysis pass rather than an extremely complex analysis group. The end result is that the TTI analysis can contain a type erased implementation that supports the polymorphic TTI interface. We can build one from a target-specific implementation or from a dummy one in the IR. I've also factored all of the code into "mix-in"-able base classes, including CRTP base classes to facilitate calling back up to the most specialized form when delegating horizontally across the surface. These aren't as clean as I would like and I'm planning to work on cleaning some of this up, but I wanted to start by putting into the right form. There are a number of reasons for this change, and this particular design. The first and foremost reason is that an analysis group is complete overkill, and the chaining delegation strategy was so opaque, confusing, and high overhead that TTI was suffering greatly for it. Several of the TTI functions had failed to be implemented in all places because of the chaining-based delegation making there be no checking of this. A few other functions were implemented with incorrect delegation. The message to me was very clear working on this -- the delegation and analysis group structure was too confusing to be useful here. The other reason of course is that this is much more natural fit for the new pass manager. This will lay the ground work for a type-erased per-function info object that can look up the correct subtarget and even cache it. Yet another benefit is that this will significantly simplify the interaction of the pass managers and the TargetMachine. See the future work below. The downside of this change is that it is very, very verbose. I'm going to work to improve that, but it is somewhat an implementation necessity in C++ to do type erasure. =/ I discussed this design really extensively with Eric and Hal prior to going down this path, and afterward showed them the result. No one was really thrilled with it, but there doesn't seem to be a substantially better alternative. Using a base class and virtual method dispatch would make the code much shorter, but as discussed in the update to the programmer's manual and elsewhere, a polymorphic interface feels like the more principled approach even if this is perhaps the least compelling example of it. ;] Ultimately, there is still a lot more to be done here, but this was the huge chunk that I couldn't really split things out of because this was the interface change to TTI. I've tried to minimize all the other parts of this. The follow up work should include at least: 1) Improving the TargetMachine interface by having it directly return a TTI object. Because we have a non-pass object with value semantics and an internal type erasure mechanism, we can narrow the interface of the TargetMachine to just do what we need: build and return a TTI object that we can then insert into the pass pipeline. 2) Make the TTI object be fully specialized for a particular function. This will include splitting off a minimal form of it which is sufficient for the inliner and the old pass manager. 3) Add a new pass manager analysis which produces TTI objects from the target machine for each function. This may actually be done as part of #2 in order to use the new analysis to implement #2. 4) Work on narrowing the API between TTI and the targets so that it is easier to understand and less verbose to type erase. 5) Work on narrowing the API between TTI and its clients so that it is easier to understand and less verbose to forward. 6) Try to improve the CRTP-based delegation. I feel like this code is just a bit messy and exacerbating the complexity of implementing the TTI in each target. Many thanks to Eric and Hal for their help here. I ended up blocked on this somewhat more abruptly than I expected, and so I appreciate getting it sorted out very quickly. Differential Revision: http://reviews.llvm.org/D7293 llvm-svn: 227669
*	Fix build failure with pointer vs reference.	Eric Christopher	2015-01-27	1	-1/+1
\| \| \| \| \|	NB: Saving files after editing helps. llvm-svn: 227178
*	Update a few calls to getSubtarget<> to either be getSubtargetImpl	Eric Christopher	2015-01-27	1	-1/+1
\| \| \| \| \| \| \|	when we didn't need the cast to the base class or the cached version off of the subtarget. llvm-svn: 227176
*	Implemented cost model for masked load/store operations.	Elena Demikhovsky	2015-01-25	1	-0/+4
\| \| \| \|	llvm-svn: 227035
*	[SelectionDAG] Allow targets to specify legality of extloads' result	Ahmed Bougacha	2015-01-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
*	Fix typo	Matt Arsenault	2014-10-22	1	-1/+1
\| \| \| \|	llvm-svn: 220353
*	Add minnum / maxnum codegen	Matt Arsenault	2014-10-21	1	-0/+2
\| \| \| \|	llvm-svn: 220342
*	Add a new pass FunctionTargetTransformInfo. This pass serves as a	Eric Christopher	2014-09-18	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004
*	Fix BasicTTI::getCmpSelInstrCost to deal with illegal vector types	Hal Finkel	2014-09-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The default implementation of getCmpSelInstrCost, which provides the cost of icmp/fcmp/select instructions, did not deal sensibly with illegal vector types that were scalarized. We'd ask for the legalization cost of the vector type, which would return something like (4, f64) given an input of <4 x double>, and we'd then check the TLI status of the ISD opcode on that scalar type. This would result in querying (ISD::VSELECT, f64), for example. Amusingly enough, ISD::VSELECT on scalar types is marked as Legal by default (as with most other operations), and most backends never change this because VSELECT is never generated on scalars. However, seeing the resulting operation as Legal, we'd neglect to add the scalarization cost before returning. The result is that we'd grossly under-estimate the cost of cmps/selects on illegal vector types. Now, if type legalization clearly results in scalarization, we skip the early return and add the scalarization cost. llvm-svn: 217859
*	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option ↵	Sanjay Patel	2014-09-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528
*	Reinstate "Nuke the old JIT."	Eric Christopher	2014-09-02	1	-3/+2
\| \| \| \| \| \| \| \|	Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982
*	Change MCSchedModel to be a struct of statically initialized data.	Pete Cooper	2014-09-02	1	-2/+2
\| \| \| \| \| \| \| \|	This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour Reviewed by Andy Trick and Chandler C llvm-svn: 216919
*	Allow vectorization of division by uniform power of 2.	Karthik Bhat	2014-08-25	1	-3/+5
\| \| \| \| \| \| \| \|	This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371
*	Temporarily Revert "Nuke the old JIT." as it's not quite ready to	Eric Christopher	2014-08-07	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \|	be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154
*	Nuke the old JIT.	Rafael Espindola	2014-08-07	1	-3/+2
\| \| \| \| \| \| \| \| \|	I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111
*	Remove the TargetMachine forwards for TargetSubtargetInfo based	Eric Christopher	2014-08-04	1	-1/+3
\| \| \| \| \| \|	information and update all callers. No functional change. llvm-svn: 214781
*	Add @llvm.assume, lowering, and some basic properties	Hal Finkel	2014-07-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the first commit in a series that add an @llvm.assume intrinsic which can be used to provide the optimizer with a condition it may assume to be true (when the control flow would hit the intrinsic call). Some basic properties are added here: - llvm.invariant(true) is dead. - llvm.invariant(false) is unreachable (this directly corresponds to the documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef). The intrinsic is tagged as writing arbitrarily, in order to maintain control dependencies. BasicAA has been updated, however, to return NoModRef for any particular location-based query so that we don't unnecessarily block code motion. llvm-svn: 213973
*	Add Support to Recognize and Vectorize NON SIMD instructions in SLPVectorizer.	Karthik Bhat	2014-06-20	1	-0/+23
\| \| \| \| \| \| \| \| \|	This patch adds support to recognize patterns such as fadd,fsub,fadd,fsub.../add,sub,add,sub... and vectorizes them as vector shuffles if they are profitable. These patterns of vector shuffle can later be converted to instructions such as addsubpd etc on X86. Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015 llvm-svn: 211339
*	Fix a spelling error	Hal Finkel	2014-05-08	1	-1/+1
\| \| \| \|	llvm-svn: 208314
*	Move late partial-unrolling thresholds into the processor definitions	Hal Finkel	2014-05-08	1	-1/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old method used by X86TTI to determine partial-unrolling thresholds was messy (because it worked by testing target features), and also would not correctly identify the target CPU if certain target features were disabled. After some discussions on IRC with Chandler et al., it was decided that the processor scheduling models were the right containers for this information (because it is often tied to special uop dispatch-buffer sizes). This does represent a small functionality change: - For generic x86-64 (which uses the SB model and, thus, will get some unrolling). - For AMD cores (because they still currently use the SB scheduling model) - For Haswell (based on benchmarking by Louis Gerbarg, it was decided to bump the default threshold to 50; we're working on a test case for this). Otherwise, nothing has changed for any other targets. The logic, however, has been moved into BasicTTI, so other targets may now also opt-in to this functionality simply by setting LoopMicroOpBufferSize in their processor model definitions. llvm-svn: 208289
*	TTI: Estimate @llvm.fmuladd cost as fmul + fadd when FMA's aren't legal on ↵	Benjamin Kramer	2014-05-06	1	-1/+7
\| \| \| \| \| \|	the target. llvm-svn: 208115
*	[Modules] Remove potential ODR violations by sinking the DEBUG_TYPE	Chandler Carruth	2014-04-22	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \|	define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837
*	Don't assert in BasicTTI::getMemoryOpCost for non-simple types	Hal Finkel	2014-04-14	1	-6/+8
\| \| \| \| \| \| \| \|	BasicTTI::getMemoryOpCost must explicitly check for non-simple types; setting AllowUnknown=true with TLI->getSimpleValueType is not sufficient because, for example, non-power-of-two vector types return non-simple EVTs (not MVT::Other). llvm-svn: 206150
*	[C++11] More 'nullptr' conversion. In some cases just using a boolean check ↵	Craig Topper	2014-04-14	1	-1/+1
\| \| \| \| \| \|	instead of comparing to nullptr. llvm-svn: 206142
*	Account for scalarization costs in BasicTTI::getMemoryOpCost for extending ↵	Hal Finkel	2014-04-03	1	-2/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vector loads When a vector type legalizes to a larger vector type, and the target does not support the associated extending load (or truncating store), then legalization will scalarize the load (or store) resulting in an associated scalarization cost. BasicTTI::getMemoryOpCost needs to account for this. Between this, and r205487, PowerPC on the P7 with VSX enabled shows: MultiSource/Benchmarks/PAQ8p/paq8p: 43% speedup SingleSource/Benchmarks/BenchmarkGame/puzzle: 51% speedup SingleSource/UnitTests/Vectorizer/gcc-loops 28% speedup (some of these are new; some of these, such as PAQ8p, just reverse regressions that VSX support would trigger) llvm-svn: 205495
*	Fix multi-register costs in BasicTTI::getCastInstrCost	Hal Finkel	2014-04-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	For an cast (extension, etc.), the currently logic predicts a low cost if the associated operation (keyed on the destination type) is legal (or promoted). This is not true when the number of values required to legalize the type is changing. For example, <8 x i16> being sign extended by <8 x i32> is not generically cheap on PPC with VSX, even though sign extension to v4i32 is legal, because two output v4i32 values are required compared to the single v8i16 input value, and without custom logic in the target, this conversion will scalarize. llvm-svn: 205487
*	When analyzing vectors of element type that require legalization,	Raul E. Silvera	2014-03-10	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the legalization cost must be included to get an accurate estimation of the total cost of the scalarized vector. The inaccurate cost triggered unprofitable SLP vectorization on 32-bit X86. Summary: Include legalization overhead when computing scalarization cost Reviewers: hfinkel, nadav CC: chandlerc, rnk, llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2992 llvm-svn: 203509
*	[C++11] Remove 'virtual' keyword from methods marked with 'override' keyword.	Craig Topper	2014-03-10	1	-45/+42
\| \| \| \|	llvm-svn: 203444
*	[TTI] There is actually no realistic way to pop TTI implementations off	Chandler Carruth	2014-03-10	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	the stack of the analysis group because they are all immutable passes. This is made clear by Craig's recent work to use override systematically -- we weren't overriding anything for 'finalizePass' because there is no such thing. This is kind of a lame restriction on the API -- we can no longer push and pop things, we just set up the stack and run. However, I'm not invested in building some better solution on top of the existing (terrifying) immutable pass and legacy pass manager. llvm-svn: 203437
*	Switch all uses of LLVM_OVERRIDE to just use 'override' directly.	Craig Topper	2014-03-02	1	-29/+29
\| \| \| \|	llvm-svn: 202621
*	Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.	Craig Topper	2014-03-02	1	-1/+1
\| \| \| \|	llvm-svn: 202618
*	Add final and owerride keywords to TargetTransformInfo's subclasses.	Juergen Ributzka	2014-01-24	1	-31/+34
\| \| \| \|	llvm-svn: 200021
*	Costmodel: Add support for horizontal vector reductions	Arnold Schwaighofer	2013-09-17	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Upcoming SLP vectorization improvements will want to be able to estimate costs of horizontal reductions. Add infrastructure to support this. We model reductions as a series of (shufflevector,add) tuples ultimately followed by an extractelement. For example, for an add-reduction of <4 x float> we could generate the following sequence: (v0, v1, v2, v3) \ \ / / \ \ / + + (v0+v2, v1+v3, undef, undef) \ / ((v0+v2) + (v1+v3), undef, undef) %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef> %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7 %r = extractelement <4 x float> %bin.rdx8, i32 0 This commit adds a cost model interface "getReductionCost(Opcode, Ty, Pairwise)" that will allow clients to ask for the cost of such a reduction (as backends might generate more efficient code than the cost of the individual instructions summed up). This interface is excercised by the CostModel analysis pass which looks for reduction patterns like the one above - starting at extractelements - and if it sees a matching sequence will call the cost model interface. We will also support a second form of pairwise reduction that is well supported on common architectures (haddps, vpadd, faddp). (v0, v1, v2, v3) \ / \ / (v0+v1, v2+v3, undef, undef) \ / ((v0+v1)+(v2+v3), undef, undef, undef) %rdx.shuf.0.0 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 0, i32 2 , i32 undef, i32 undef> %rdx.shuf.0.1 = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef> %bin.rdx.0 = fadd <4 x float> %rdx.shuf.0.0, %rdx.shuf.0.1 %rdx.shuf.1.0 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef> %rdx.shuf.1.1 = shufflevector <4 x float> %bin.rdx.0, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef> %bin.rdx.1 = fadd <4 x float> %rdx.shuf.1.0, %rdx.shuf.1.1 %r = extractelement <4 x float> %bin.rdx.1, i32 0 llvm-svn: 190876
*	Add getUnrollingPreferences to TTI	Hal Finkel	2013-09-11	1	-0/+3
\| \| \| \| \| \| \| \| \|	Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. llvm-svn: 190542
*	Revert: r189565 - Add getUnrollingPreferences to TTI	Hal Finkel	2013-08-29	1	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Revert unintentional commit (of an unreviewed change). Original commit message: Add getUnrollingPreferences to TTI Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. llvm-svn: 189566
*	Add getUnrollingPreferences to TTI	Hal Finkel	2013-08-29	1	-0/+5
\| \| \| \| \| \| \| \| \|	Allow targets to customize the default behavior of the generic loop unrolling transformation. This will be used by the PowerPC backend when targeting the A2 core (which is in-order with a deep pipeline), and using more aggressive defaults is important. llvm-svn: 189565
*	Turn MipsOptimizeMathLibCalls into a target-independent scalar transform	Richard Sandiford	2013-08-23	1	-0/+7
\| \| \| \| \| \| \| \| \| \|	...so that it can be used for z too. Most of the code is the same. The only real change is to use TargetTransformInfo to test when a sqrt instruction is available. The pass is opt-in because at the moment it only handles sqrt. llvm-svn: 189097
*	Add a llvm.copysign intrinsic	Hal Finkel	2013-08-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a llvm.copysign intrinsic; We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-values FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728
*	Add ISD::FROUND for libm round()	Hal Finkel	2013-08-07	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All libm floating-point rounding functions, except for round(), had their own ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm adding ISD::FROUND so that round() can be custom lowered as well. For the most part, this is straightforward. I've added an intrinsic and a matching ISD node just like those for nearbyint() and friends. The SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed fround). This will be used by the PowerPC backend in a follow-up commit. llvm-svn: 187926
*	LoopVectorize: Allow vectorization of loops with lifetime markers	Arnold Schwaighofer	2013-08-06	1	-0/+3
\| \| \| \| \| \|	Patch by Marc Jessome! llvm-svn: 187825
*	SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch ↵	Tom Stellard	2013-07-27	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	conditions Merge consecutive if-regions if they contain identical statements. Both transformations reduce number of branches. The transformation is guarded by a target-hook, and is currently enabled only for +R600, but the correctness has been tested on X86 target using a variety of CPU benchmarks. Patch by: Mei Ye llvm-svn: 187278
*	TargetTransformInfo: address calculation parameter for gather/scather	Arnold Schwaighofer	2013-07-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	Address calculation for gather/scather in vectorized code can incur a significant cost making vectorization unbeneficial. Add infrastructure to add cost. Tests and cost model for targets will be in follow-up commits. radar://14351991 llvm-svn: 186187
*	Add the nearbyint -> FNEARBYINT mapping to BasicTargetTransformInfo	Hal Finkel	2013-07-08	1	-0/+2
\| \| \| \| \| \| \| \|	This fixes an oversight that Intrinsic::nearbyint was not being mapped to ISD::FNEARBYINT (thus fixing the over-optimistic cost we were assigning to nearbyint calls for some targets). llvm-svn: 185783
*	Access the TargetLoweringInfo from the TargetMachine object instead of ↵	Bill Wendling	2013-06-19	1	-16/+23
\| \| \| \| \| \|	caching it. The TLI may change between functions. No functionality change. llvm-svn: 184349