bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Add a new pass FunctionTargetTransformInfo. This pass serves as a	Eric Christopher	2014-09-18	1	-5/+7
	shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004
*	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.	Sanjay Patel	2014-09-10	1	-2/+2
	"Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528
*	Remove 'virtual' keyword from methods marked with 'override' keyword.	Craig Topper	2014-08-30	1	-24/+21
	llvm-svn: 216823
*	Allow vectorization of division by uniform power of 2.	Karthik Bhat	2014-08-25	1	-8/+10
	This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates the cost model for the Loop and SLP Vectorizers. The cost table is currently only updated for the X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371
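As a rough illustration of what "division by uniform power of 2" means in the entry above, the sketch below checks whether every lane of a constant divisor vector holds the same power of two, and shows the shift that such an unsigned division reduces to. The helper names `isUniformPowerOf2` and `udivByPowerOf2` are hypothetical and are not taken from the LLVM sources; the actual patch works on the cost tables, not on code like this.

```cpp
#include <cassert>  // for assert-based checks when trying the sketch out
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): returns true when every lane of a
// constant divisor vector holds the same power of two.
bool isUniformPowerOf2(const std::vector<std::uint64_t> &Lanes) {
  if (Lanes.empty())
    return false;
  const std::uint64_t First = Lanes.front();
  // A power of two is non-zero and has exactly one bit set.
  if (First == 0 || (First & (First - 1)) != 0)
    return false;
  for (std::uint64_t Lane : Lanes)
    if (Lane != First)
      return false;
  return true;
}

// Hypothetical helper: unsigned division by 2^Log2Divisor is a logical right
// shift. Log2Divisor must be less than 32.
std::uint32_t udivByPowerOf2(std::uint32_t X, unsigned Log2Divisor) {
  return X >> Log2Divisor;
}
```

With a divisor of `<4, 4, 4, 4>` the check succeeds and each lane can be computed as `udivByPowerOf2(x, 2)`; with `<4, 8, 4, 4>` or `<6, 6, 6, 6>` it fails, so the cheaper cost would not apply.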
*	Remove the TargetMachine forwards for TargetSubtargetInfo based	Eric Christopher	2014-08-04	1	-1/+1
	information and update all callers. No functional change. llvm-svn: 214781
*	Fix typo.	Eric Christopher	2014-05-22	1	-1/+1
	llvm-svn: 209377
*	[C++] Use 'nullptr'. Target edition.	Craig Topper	2014-04-25	1	-1/+1
	llvm-svn: 207197
*	[Modules] Fix potential ODR violations by sinking the DEBUG_TYPE	Chandler Carruth	2014-04-22	1	-1/+2
	definition below all of the header #include lines, lib/Target/... edition. llvm-svn: 206842
*	[PowerPC] [Constant Hoisting] Enable constant hoisting on PPC	Hal Finkel	2014-04-13	1	-0/+147
	Implements the various TTI functions to enable constant hoisting on PPC. The only significant test-suite change is this: MultiSource/Benchmarks/VersaBench/bmm/bmm - 20% speedup (which essentially reverses the slowdown from r206120). llvm-svn: 206141
*	[PowerPC] Remove unused TM member variable to unbreak build	Hal Finkel	2014-04-05	1	-3/+2
	Fix "error: private field 'TM' is not used [-Werror,-Wunused-private-field]" llvm-svn: 205660
*	[PowerPC] Adjust load/store costs in PPCTTI	Hal Finkel	2014-04-04	1	-3/+23
	This provides more realistic costs for the insert/extractelement instructions (which are load/store pairs), accounts for the cheap unaligned Altivec load sequence, and for unaligned VSX load/stores.
	Bad news:
		MultiSource/Applications/sgefa/sgefa - 35% slowdown (this will require more investigation)
		SingleSource/Benchmarks/McGill/queens - 20% slowdown (we no longer vectorize this, but it was a constant store that was scalarized)
		MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 - 2% slowdown
	Good news:
		SingleSource/Benchmarks/Shootout/ary3 - 54% speedup
		SingleSource/Benchmarks/Shootout-C++/ary - 40% speedup
		MultiSource/Benchmarks/Ptrdist/ks/ks - 35% speedup
		MultiSource/Benchmarks/FreeBench/neural/neural - 30% speedup
		MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt - 20% speedup
	Unfortunately, estimating the costs of the stack-based scalarization sequences is hard, and adjusting these costs is like a game of whac-a-mole :( I'll revisit this again after we have better codegen for vector extloads and truncstores and unaligned load/stores.
	llvm-svn: 205658
*	[PowerPC] PPCTTI Cleanup	Hal Finkel	2014-04-04	1	-4/+0
	Remove the declaration of an unimplemented function. llvm-svn: 205657
*	[PowerPC] Make PPCTTI::getMemoryOpCost call BasicTTI::getMemoryOpCost	Hal Finkel	2014-04-02	1	-3/+3
	PPCTTI::getMemoryOpCost will now make use of BasicTTI::getMemoryOpCost to calculate the base cost of the memory access, and then adjust on top of that. There is no functionality change from this modification, but it will become important so that PPCTTI can take advantage of scalarization information for which BasicTTI::getMemoryOpCost will account in the near future. llvm-svn: 205476
*	[PowerPC] VSX loads and stores support unaligned access	Hal Finkel	2014-03-26	1	-0/+2
	I've not yet updated PPCTTI because I'm not sure what the actual relative cost is compared to the aligned uses. llvm-svn: 204848
*	[PowerPC] Initial support for the VSX instruction set	Hal Finkel	2014-03-13	1	-1/+9
	VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure.
	The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the first 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below).
	Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However:
	- CodeGen support causes miscompiles; test-suite runtime failures:
		MultiSource/Benchmarks/FreeBench/distray/distray
		MultiSource/Benchmarks/McCat/08-main/main
		MultiSource/Benchmarks/Olden/voronoi/voronoi
		MultiSource/Benchmarks/mafft/pairlocalalign
		MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
		SingleSource/Benchmarks/CoyoteBench/almabench
		SingleSource/Benchmarks/Misc/matmul_f64_4x4
	- The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be.
	- A lot of unnecessary copies make it past the optimizers, and this needs to be fixed.
	- Many more regression tests are needed.
	Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work on this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures.
	llvm-svn: 203768
*	[TTI] There is actually no realistic way to pop TTI implementations off	Chandler Carruth	2014-03-10	1	-4/+0
	the stack of the analysis group because they are all immutable passes. This is made clear by Craig's recent work to use override systematically -- we weren't overriding anything for 'finalizePass' because there is no such thing. This is kind of a lame restriction on the API -- we can no longer push and pop things, we just set up the stack and run. However, I'm not invested in building some better solution on top of the existing (terrifying) immutable pass and legacy pass manager. llvm-svn: 203437
*	Switch all uses of LLVM_OVERRIDE to just use 'override' directly.	Craig Topper	2014-03-02	1	-14/+14
	llvm-svn: 202621
*	Switch all uses of LLVM_FINAL to just use 'final', and remove the macro.	Craig Topper	2014-03-02	1	-1/+1
	llvm-svn: 202618
*	Add final and override keywords to TargetTransformInfo's subclasses.	Juergen Ributzka	2014-01-24	1	-15/+17
	llvm-svn: 200021
*	Re-sort all of the includes with ./utils/sort_includes.py so that	Chandler Carruth	2014-01-07	1	-1/+1
	subsequent changes are easier to review. About to fix some layering issues, and wanted to separate out the necessary churn. Also comment and sink the include of "Windows.h" in three .inc files to match the usage in Memory.inc. llvm-svn: 198685
*	Implement TTI getUnrollingPreferences for PowerPC	Hal Finkel	2013-09-11	1	-0/+9
	The PowerPC A2 core greatly benefits from aggressive concatenation unrolling; use the new getUnrollingPreferences to enable this by default when targeting the PPC A2 core. llvm-svn: 190549
*	CostModel: Add parameter to instruction cost to further classify operand values	Arnold Schwaighofer	2013-04-04	1	-3/+8
	On certain architectures we can support efficient vectorized versions of instructions if the operand value is uniform (splat) or a constant scalar. An example of this is a vector shift on x86. We can efficiently support
		for (i = 0 ; i < ; i += 4)
		  w[0:3] = v[0:3] << <2, 2, 2, 2>
	but not
		for (i = 0; i < ; i += 4)
		  w[0:3] = v[0:3] << x[0:3]
	This patch adds a parameter to getArithmeticInstrCost to further qualify operand values as uniform or uniform constant. Targets can then choose to return a different cost for instructions with such operand values. A follow-up commit will test this feature on x86. radar://13576547
	llvm-svn: 178807
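The idea in the entry above, returning a different cost depending on how the operand is classified, might be sketched roughly as follows. The enum `OperandKind`, the function `vectorShiftCost`, and all cost numbers are hypothetical and chosen only for illustration; they are not the declarations or values used in LLVM.

```cpp
#include <cassert>  // for assert-based checks when trying the sketch out

// Hypothetical operand classification (illustrative names, not LLVM's).
enum class OperandKind { AnyValue, UniformValue, UniformConstantValue };

// Toy cost for a 4-lane vector shift, keyed on the shift-amount operand.
// Assumption: a uniform shift amount maps to a single vector instruction,
// while arbitrary per-lane amounts are scalarized into an extract, a scalar
// shift, and an insert for each lane. The numbers are made up.
unsigned vectorShiftCost(OperandKind ShiftAmount) {
  const unsigned NumLanes = 4;
  const unsigned OpsPerScalarizedLane = 3; // extract + shift + insert
  switch (ShiftAmount) {
  case OperandKind::UniformValue:
  case OperandKind::UniformConstantValue:
    return 1;
  case OperandKind::AnyValue:
    break;
  }
  return NumLanes * OpsPerScalarizedLane;
}
```

Under these assumptions the first loop in the message (shift by `<2, 2, 2, 2>`) would be costed at 1 and the second (shift by `x[0:3]`) at 12, which is the kind of gap a vectorizer could use when deciding whether vectorization pays off.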
*	Add the PPC64 popcntd instruction	Hal Finkel	2013-03-28	1	-3/+2
	PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and tell TTI about it so that popcount-loop recognition will know about it. llvm-svn: 178233
*	Refine fix to bug 15041.	Bill Schmidt	2013-02-08	1	-18/+17
	Thanks to help from Nadav and Hal, I have a more reasonable (and even correct!) approach. This specifically penalizes the insertelement and extractelement operations for the performance hit that will occur on PowerPC processors. llvm-svn: 174725
*	Constrain PowerPC autovectorization to fix bug 15041.	Bill Schmidt	2013-02-07	1	-0/+19
	Certain vector operations don't vectorize well with the current PowerPC implementation. Element insert/extract performs poorly without VSX support because Altivec requires going through memory. SREM, UREM, and VSELECT all produce bad scalar code. There's a lot of work to do for the cost model before autovectorization will be tuned well, and this is not an attempt to address the larger problem. llvm-svn: 174660
*	Remove unused variables, silences -Wunused-variable	Dmitri Gribenko	2013-01-25	1	-4/+2
	llvm-svn: 173526
*	Initial implementation of PPCTargetTransformInfo	Hal Finkel	2013-01-25	1	-0/+220
	This provides a place to add customized operation cost information and control some other target-specific IR-level transformations. The only non-trivial logic in this checkin assigns a higher cost to unaligned loads and stores (covered by the included test case). llvm-svn: 173520