path: root/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
Commit history (newest first). Each entry: commit subject (author, date; files changed, lines removed/added).

* AArch64: Do not test for CPUs, use SubtargetFeatures (Matthias Braun, 2016-06-02; 1 file, -21/+6)

  Testing for specific CPUs has a number of problems; it is better to use
  subtarget features:
  - When some tweak is added for a specific CPU, it is often desirable for
    the next version of that CPU as well, yet we often forget to add it.
  - It is hard to keep track of checks scattered around the target code;
    declaring all target specifics together with the CPU in the tablegen
    file is a clearer representation.
  - Subtarget features can be tweaked from the command line.

  To discourage people from using CPU checks in the future, I removed the
  isCortexXX(), isCyclone(), ... functions. I added a getProcFamily()
  function for exceptional circumstances but made it clear in the comment
  that usage is discouraged.

  Reformatted the feature list in AArch64.td to have one feature per line
  in alphabetical order, to simplify merging and sorting for out-of-tree
  tweaks.

  No functional change intended.

  Differential Revision: http://reviews.llvm.org/D20762
  llvm-svn: 271555

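  To make the design point concrete, a minimal self-contained C++ sketch of
  the before/after; the Subtarget type and feature name here are
  illustrative stand-ins, not the backend's real API:

      #include <cstdio>

      // Illustrative stand-in, not LLVM's actual Subtarget interface.
      struct Subtarget {
        // Feature bit, normally populated from the CPU's feature list in
        // the .td file; new CPUs pick up the tweak simply by listing it.
        bool HasCheapVectorInsert = false;
      };

      // Keyed to the feature, not to a CPU name like isCyclone(), so the
      // check needs no update when the next CPU wants the same tweak.
      int vectorInsertCost(const Subtarget &ST) {
        return ST.HasCheapVectorInsert ? 2 : 3;
      }

      int main() {
        Subtarget ST;
        ST.HasCheapVectorInsert = true; // as if set by the CPU definition
        std::printf("cost = %d\n", vectorInsertCost(ST));
      }
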
* Add parentheses to silence buildbot warning (Matthew Simpson, 2016-04-27; 1 file, -2/+2)

  llvm-svn: 267734

* [TTI] Add hook for vector extract with extension (Matthew Simpson, 2016-04-27; 1 file, -0/+55)

  This change adds a new hook for estimating the cost of vector extracts
  followed by zero- and sign-extensions. The motivating example for this
  change is the SMOV and UMOV instructions on AArch64. These instructions
  move data from vector to general purpose registers while performing the
  corresponding extension (sign-extend for SMOV and zero-extend for UMOV)
  at the same time. For these operations, TargetTransformInfo can assume
  the extensions are free and only report the cost of the vector extract.
  The SLP vectorizer has been updated to make use of the new hook.

  Differential Revision: http://reviews.llvm.org/D18523
  llvm-svn: 267725

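  A minimal sketch of the costing idea, with made-up unit costs rather
  than AArch64's real cost-table values:

      #include <cstdio>

      // Illustrative cost pieces; real numbers live in target cost tables.
      constexpr int ExtractCost = 2; // move a vector lane to a GPR
      constexpr int ExtendCost  = 1; // separate sext/zext in a GPR

      // With SMOV/UMOV the extract and the extension are one instruction,
      // so the extension is reported as free.
      int extractWithExtendCost(bool HasExtendingMove) {
        return HasExtendingMove ? ExtractCost : ExtractCost + ExtendCost;
      }

      int main() {
        std::printf("AArch64-like: %d, generic: %d\n",
                    extractWithExtendCost(true), extractWithExtendCost(false));
      }
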
* [LoopDataPrefetch] Centralize the tuning cl::opts under the pass (Adam Nemet, 2016-03-29; 1 file, -21/+6)

  This is effectively NFC, minus the renaming of the options
  (-cyclone-prefetch-distance -> -prefetch-distance).

  The change was requested by Tim in D17943.

  llvm-svn: 264806

* [LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead (Adam Nemet, 2016-03-18; 1 file, -0/+13)

  Summary: It can hurt performance to prefetch ahead too much. Be
  conservative for now and don't prefetch ahead more than 3 iterations on
  Cyclone.

  Reviewers: hfinkel
  Subscribers: llvm-commits, mzolotukhin
  Differential Revision: http://reviews.llvm.org/D17949
  llvm-svn: 263772

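  A sketch of the clamp this adds; the function and parameter names are
  hypothetical and the byte counts illustrative:

      #include <algorithm>
      #include <cstdio>

      // The pass derives how many iterations ahead to prefetch from a
      // distance in bytes, then clamps it with the new TTI limit
      // (3 on Cyclone per the message above).
      unsigned iterationsAhead(unsigned PrefetchDistance,
                               unsigned BytesPerIter,
                               unsigned MaxIterationsAhead) {
        unsigned Ahead = std::max(1u, PrefetchDistance / BytesPerIter);
        return std::min(Ahead, MaxIterationsAhead);
      }

      int main() {
        // A small loop body would otherwise prefetch 32 iterations ahead.
        std::printf("%u\n", iterationsAhead(2048, 64, 3)); // prints 3
      }
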
* [LoopDataPrefetch/AArch64] Allow selective prefetching of large-strided accesses (Adam Nemet, 2016-03-18; 1 file, -0/+12)

  Summary: And use this TTI for Cyclone. As explained in the original RFC
  (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
  prefetcher works with strides of up to 2 KB.

  I am also adding tests for this and the previous change (D17943):
  * Cyclone prefetching accesses with a large stride
  * Cyclone not prefetching accesses with a small stride
  * Generic AArch64 subtarget not prefetching either

  Reviewers: hfinkel
  Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin
  Differential Revision: http://reviews.llvm.org/D17945
  llvm-svn: 263771

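  The selection rule reduces to a stride threshold; a minimal sketch with
  a hypothetical threshold value:

      #include <cstdio>

      // Only software-prefetch accesses whose stride meets the target
      // minimum; smaller strides are already covered by the HW prefetcher.
      bool shouldPrefetch(unsigned StrideBytes, unsigned MinPrefetchStride) {
        return StrideBytes >= MinPrefetchStride;
      }

      int main() {
        const unsigned MinStride = 2048; // hypothetical Cyclone-style value
        std::printf("large: %d, small: %d\n",
                    shouldPrefetch(4096, MinStride),
                    shouldPrefetch(64, MinStride));
      }
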
* [AArch64] Add pass LoopDataPrefetch for Cyclone (Adam Nemet, 2016-03-18; 1 file, -0/+17)

  Summary: This wires up the pass for Cyclone but keeps it off for now
  because we need a few more TTIs.

  The getPrefetchMinStride value is not very well tuned right now, but it
  works well with CFP2006/433.milc, which motivated this.

  Tests will be added as part of the upcoming large-stride prefetching
  patch.

  Reviewers: t.p.northover
  Subscribers: llvm-commits, aemerson, hfinkel, rengolin
  Differential Revision: http://reviews.llvm.org/D17943
  llvm-svn: 263770

* [AArch64] Reduce vector insert/extract cost for Kryo (Matthew Simpson, 2016-02-18; 1 file, -0/+2)

  Differential Revision: http://reviews.llvm.org/D17379
  llvm-svn: 261237

* [AArch64] Add support for Qualcomm Kryo CPU. (Chad Rosier, 2016-02-12; 1 file, -1/+1)

  Machine model description by Dave Estes <cestes@codeaurora.org>.

  llvm-svn: 260686

* [AArch64][ARM] Don't base interleaved op legality on type alloc size. (Ahmed Bougacha, 2015-12-09; 1 file, -1/+1)

  Otherwise, we think that most types that look like they'd fit in a legal
  vector type are legal (so, basically, *any* vector type with a size
  between 33 and 128 bits, I think, since we use pow2 alignment; e.g.,
  v2i25, v3f32, ...).

  DataLayout::getTypeAllocSize rounds up based on alignment. When checking
  for target intrinsic legality, that's not what we want: if rounding makes
  a difference, the type isn't legal, and the target intrinsics shouldn't
  be used, as they are always assumed legal.

  One could make the argument that alloc size is ultimately the most
  relevant here, since we're dealing with LD/ST intrinsics. That's only
  true if we did legalize them, though; that's a problem for another day.

  Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
  Type::getSizeInBits can't be used because that'd gratuitously break
  pointer vector support.

  Some of these uses are currently fine, because we only hit them when the
  type is already known legal (e.g., r114454). Update them for
  consistency. It's faster to avoid the rounding anyway!

  llvm-svn: 255089

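  The arithmetic behind the fix, worked for v3f32 in a self-contained
  snippet (96 data bits round up to a 128-bit alloc size):

      #include <cstdio>

      int main() {
        // v3f32: 96 bits of data, but pow2 alignment rounds the alloc
        // size up to 128 bits -- exactly the size of a legal NEON vector.
        const unsigned SizeInBits = 3 * 32; // what getTypeSizeInBits sees
        unsigned AllocBits = 1;
        while (AllocBits < SizeInBits)      // pow2 rounding, as alloc size does
          AllocBits *= 2;
        std::printf("size=%u alloc=%u legal-by-size=%d legal-by-alloc=%d\n",
                    SizeInBits, AllocBits, SizeInBits == 128, AllocBits == 128);
      }
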
* [EarlyCSE] IsSimple vs IsVolatile naming clarification (NFC) (Philip Reames, 2015-12-05; 1 file, -2/+2)

  When the notion of target-specific memory intrinsics was introduced to
  EarlyCSE, the commit confused the notions of volatile and simple memory
  access. Since I'm about to start working on this area, clean up the
  naming so that patches aren't horribly confusing. Note that the actual
  implementation was always bailing if the load or store wasn't simple.

  Reminder:
  - "volatile" - C++ volatile; can't remove any memory operations, but in
    principle unordered
  - "ordered" - imposes ordering constraints on other nearby memory
    operations
  - "atomic" - can't be split or sheared. In LLVM terms, all "ordered"
    operations are also atomic, so the predicate "isAtomic" is often used.
  - "simple" - a load which is none of the above. These are normal loads
    and what most of the optimizer works with.

  llvm-svn: 254805

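  The four categories collapse into a simple predicate; a self-contained
  toy model of that taxonomy, not LLVM's actual instruction classes:

      #include <cstdio>

      // A "simple" access is one that is neither volatile nor atomic.
      struct MemAccess {
        bool Volatile = false;
        bool Atomic   = false; // all "ordered" accesses are also atomic
        bool isSimple() const { return !Volatile && !Atomic; }
      };

      int main() {
        MemAccess Plain, Vol{true, false}, Ord{false, true};
        std::printf("plain:%d volatile:%d ordered:%d\n",
                    Plain.isSimple(), Vol.isSimple(), Ord.isSimple());
      }
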
* [AArch64] Add cost for missing extensions. (Matthew Simpson, 2015-11-18; 1 file, -17/+18)

  This patch adds a cost estimate for some missing sign and zero
  extensions. The costs were determined by counting the number of shift
  instructions generated without context for each new extension.

  Differential Revision: http://reviews.llvm.org/D14730
  llvm-svn: 253482

* Remove templates from CostTableLookup functions; all instantiations had the same type. (Craig Topper, 2015-10-28; 1 file, -2/+2)

  This also lets us remove the versions of the functions that took a
  statically sized array, as we can rely on ArrayRef implicit conversion
  now.

  llvm-svn: 251490

* Convert cost table lookup functions to return a pointer to the entry or nullptr instead of the index. (Craig Topper, 2015-10-27; 1 file, -9/+8)

  This avoids mentioning the table name an extra time and allows the
  lookup to be done directly in the ifs by relying on the bool conversion
  of the pointer. While there, make use of ArrayRef and std::find_if.

  llvm-svn: 251382

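  A self-contained sketch of the new lookup idiom; the entry struct and
  table contents are illustrative, not LLVM's real cost tables:

      #include <algorithm>
      #include <cstdio>

      struct CostTblEntry { int ISD; int Type; int Cost; };

      // New style: return a pointer to the matching entry, or nullptr, so
      // the caller can test and use the result inside a single `if`.
      const CostTblEntry *costTableLookup(const CostTblEntry *Begin,
                                          const CostTblEntry *End, int ISD,
                                          int Type) {
        const CostTblEntry *I = std::find_if(
            Begin, End, [=](const CostTblEntry &E) {
              return E.ISD == ISD && E.Type == Type;
            });
        return I != End ? I : nullptr;
      }

      int main() {
        static const CostTblEntry Tbl[] = {{1, 7, 3}, {2, 7, 5}};
        // The old index-returning style forced a second mention of `Tbl`.
        if (const CostTblEntry *Entry = costTableLookup(Tbl, Tbl + 2, 2, 7))
          std::printf("cost = %d\n", Entry->Cost);
      }
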
* Use MVT::SimpleValueType instead of MVT in template parameter. NFC (Craig Topper, 2015-10-25; 1 file, -1/+2)

  llvm-svn: 251217

* Call the version of ConvertCostTableLookup that takes a statically sized array rather than pointer and size. NFC (Craig Topper, 2015-10-24; 1 file, -3/+2)

  llvm-svn: 251196

* [CostModel][AArch64] Remove amortization factor for some of the vector select instructions (Silviu Baranga, 2015-09-09; 1 file, -4/+5)

  Summary: We are not scalarizing the wide selects in codegen for i16 and
  i32, and therefore we can remove the amortization factor. We still have
  issues with i64 vectors in codegen, though.

  Reviewers: mcrosier
  Subscribers: mcrosier, aemerson, llvm-commits, rengolin
  Differential Revision: http://reviews.llvm.org/D12724
  llvm-svn: 247156

* [CostModel][AArch64] Increase cost of vector insert element and add missing cast costs (Silviu Baranga, 2015-08-17; 1 file, -1/+33)

  Summary: Increase the estimated costs for insert/extract element
  operations on AArch64. This is motivated by results from benchmarking
  interleaved accesses.

  Add missing costs for zext/sext/trunc instructions and some integer to
  floating point conversions. These costs were previously calculated by
  scalarizing these operations and were affected by the cost increase of
  the insert/extract element operations.

  Reviewers: rengolin
  Subscribers: mcrosier, aemerson, rengolin, llvm-commits
  Differential Revision: http://reviews.llvm.org/D11939
  llvm-svn: 245226

* [TTI] Make the cost APIs in TargetTransformInfo consistently use 'int' rather than 'unsigned' for their costs. (Chandler Carruth, 2015-08-05; 1 file, -39/+38)

  For something like costs in particular there is a natural "negative"
  value, that of savings or saved cost. As a consequence, there is a lot
  of code that subtracts or creates negative values based on cost, all of
  which is prone to awkwardness or bugs when dealing with an unsigned
  type. Similarly, we *never* want these values to wrap, as that would
  cause Very Bad code generation (likely perceived as an infinite loop as
  we try to emit over 2^32 instructions or some such insanity).

  All around, 'int' seems a much better fit for these basic metrics. I've
  added asserts to ensure that at least the TTI interface never returns
  negative numbers here. If we ever have a use case for negative numbers,
  we can remove this, but this way a bug where someone used '-1' to
  produce a 'very large' cost will be caught by the assert.

  This passes all tests, and is also UBSan clean.

  No functional change intended.

  Differential Revision: http://reviews.llvm.org/D11741
  llvm-svn: 244080

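  A self-contained demonstration of the wrap hazard being removed
  (assuming the usual 32-bit unsigned):

      #include <cstdio>

      int main() {
        // Subtracting a saved cost under 'unsigned' silently wraps to a
        // huge value, which then reads as "astronomically expensive"
        // rather than "profitable".
        unsigned UBase = 2, USaved = 5;
        int      IBase = 2, ISaved = 5;
        std::printf("unsigned: %u\n", UBase - USaved); // wraps: 4294967293
        std::printf("int:      %d\n", IBase - ISaved); // -3, as intended
      }
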
* [ARM/AArch64] Fix cost model for interleaved accesses (Silviu Baranga, 2015-07-27; 1 file, -1/+1)

  Summary: Fix the cost of interleaved accesses for ARM/AArch64. We were
  calling getTypeAllocSize and using it to check the number of bits, when
  we should have called getTypeAllocSizeInBits instead. This would
  potentially cause the vectorizer to generate loads/stores and shuffles
  which cannot be matched with an interleaved access instruction.

  No performance changes are expected for now, since matching/generating
  interleaved accesses is still disabled by default.

  Reviewers: rengolin
  Subscribers: aemerson, llvm-commits, rengolin
  Differential Revision: http://reviews.llvm.org/D11524
  llvm-svn: 243270

* Remove getDataLayout() from TargetLowering (Mehdi Amini, 2015-07-09; 1 file, -1/+1)

  Summary: This change is part of a series of commits dedicated to having
  a single DataLayout during compilation, by always using the one owned by
  the module.

  Reviewers: echristo
  Subscribers: yaron.keren, rafael, llvm-commits, jholewinski
  Differential Revision: http://reviews.llvm.org/D11042

  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 241779

* Make TargetLowering::getPointerTy() take DataLayout as an argument (Mehdi Amini, 2015-07-09; 1 file, -7/+7)

  Summary: This change is part of a series of commits dedicated to having
  a single DataLayout during compilation, by always using the one owned by
  the module.

  Reviewers: echristo
  Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits
  Differential Revision: http://reviews.llvm.org/D11028

  From: Mehdi Amini <mehdi.amini@apple.com>
  llvm-svn: 241775

* [AArch64] Lower interleaved memory accesses to ldN/stN intrinsics. (Hao Liu, 2015-06-26; 1 file, -0/+20)

  This patch also adds a function to calculate the cost of interleaved
  memory accesses.

  E.g. Lower an interleaved load:

      %wide.vec = load <8 x i32>, <8 x i32>* %ptr
      %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6>
      %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7>

  into:

      %ld2 = { <4 x i32>, <4 x i32> } call llvm.aarch64.neon.ld2(%ptr)
      %vec0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
      %vec1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

  E.g. Lower an interleaved store:

      %i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1,
               <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11>
      store <12 x i32> %i.vec, <12 x i32>* %ptr

  into:

      %sub.v0 = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 1, 2, 3>
      %sub.v1 = shuffle <8 x i32> %v0, <8 x i32> %v1, <4, 5, 6, 7>
      %sub.v2 = shuffle <8 x i32> %v0, <8 x i32> %v1, <8, 9, 10, 11>
      call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)

  Differential Revision: http://reviews.llvm.org/D10533
  llvm-svn: 240754

* [AArch64] Revert r239711 again; we need to discuss how to share code between the AArch64 and ARM backends. (Hao Liu, 2015-06-15; 1 file, -12/+0)

  llvm-svn: 239713

* [AArch64] Match interleaved memory accesses into ldN/stN instructions. (Hao Liu, 2015-06-15; 1 file, -0/+12)

  Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure
  on OS X. This patch was first committed in r239514, then reverted in
  r239544 because of a syntax incompatibility failure on OS X.

  llvm-svn: 239711

* This reverts commit r239529 and r239514. (Rafael Espindola, 2015-06-11; 1 file, -12/+0)

  Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions."
  Revert "Fixing MSVC 2013 build error."

  The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was
  failing on OS X.

  llvm-svn: 239544

* [AArch64] Match interleaved memory accesses into ldN/stN instructions. (Hao Liu, 2015-06-11; 1 file, -0/+12)

  Add a pass AArch64InterleavedAccess to identify and match interleaved
  memory accesses. This pass transforms an interleaved load/store into
  ldN/stN intrinsics. As the Loop Vectorizer disables optimization on
  interleaved accesses by default, this optimization is also disabled by
  default; enable it with "-aarch64-interleaved-access-opt=true".

  E.g. Transform an interleaved load (Factor = 2):

      %wide.vec = load <8 x i32>, <8 x i32>* %ptr
      %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> ; Extract even elements
      %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> ; Extract odd elements

  into:

      %ld2 = { <4 x i32>, <4 x i32> } call aarch64.neon.ld2(%ptr)
      %v0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0
      %v1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1

  E.g. Transform an interleaved store (Factor = 2):

      %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7> ; Interleaved vec
      store <8 x i32> %i.vec, <8 x i32>* %ptr

  into:

      %v0 = shuffle %i.vec, undef, <0, 1, 2, 3>
      %v1 = shuffle %i.vec, undef, <4, 5, 6, 7>
      call void aarch64.neon.st2(%v0, %v1, %ptr)

  llvm-svn: 239514

* [X86] Disable loop unrolling in loop vectorization pass when VF is 1. (Wei Mi, 2015-05-06; 1 file, -1/+1)

  This patch disables unrolling in the loop vectorization pass when VF==1
  on x86, by setting MaxInterleaveFactor to 1. Unrolling in the loop
  vectorization pass may introduce the cost of overflow checks, memory
  boundary checks, and extra prologue/epilogue code when the regular
  unroller unrolls the loop another time. Disabling it when VF==1 removes
  this unnecessary cost on x86. The same can be done for other platforms
  after verifying that interleaving/memory-bound checking is not
  perf-critical on those platforms.

  Differential Revision: http://reviews.llvm.org/D9515
  llvm-svn: 236613

* [AArch64] Enable partial & runtime unrolling on Cortex-A57 (Kevin Qin, 2015-03-09; 1 file, -0/+10)

  The inner loop of a loop nest is more likely to be hot, and the runtime
  check can be promoted out (from patch 0001), so the overhead is lower;
  we can try a doubled threshold to unroll more loops.

  llvm-svn: 231632

* Make some non-constant static variables non-static or fully const. (Benjamin Kramer, 2015-03-01; 1 file, -1/+1)

  Otherwise we have to emit thread-safe initialization for them. NFC.

  llvm-svn: 230894

* [multiversion] Remove the function parameter from the unrolling preferences interface on TTI now that all of TTI is per-function. (Chandler Carruth, 2015-02-01; 1 file, -1/+1)

  llvm-svn: 227741

* [PM] Switch the TargetMachine interface from accepting a pass manager base which it adds a single analysis pass to, to instead returning the type-erased TargetTransformInfo object constructed for that TargetMachine. (Chandler Carruth, 2015-01-31; 1 file, -128/+2)

  This removes all of the pass variants for TTI. There is now a single TTI
  *pass* in the Analysis layer. All of the Analysis <-> Target
  communication is through the TTI's type-erased interface itself.

  While the diff is large here, it is nothing more than code motion to
  make types available in a header file for use in a different source file
  within each target.

  I've tried to keep all the doxygen comments and file boilerplate in line
  with this move, but let me know if I missed anything.

  With this in place, the next step to making TTI work with the new pass
  manager is to introduce a really simple new-style analysis that produces
  a TTI object via a callback into this routine on the target machine.
  Once we have that, we'll have the building blocks necessary to accept a
  function argument as well.

  llvm-svn: 227685

* [PM] Change the core design of the TTI analysis to use a polymorphic type-erased interface and a single analysis pass rather than an extremely complex analysis group. (Chandler Carruth, 2015-01-31; 1 file, -117/+107)

  The end result is that the TTI analysis can contain a type-erased
  implementation that supports the polymorphic TTI interface. We can build
  one from a target-specific implementation or from a dummy one in the IR.

  I've also factored all of the code into "mix-in"-able base classes,
  including CRTP base classes to facilitate calling back up to the most
  specialized form when delegating horizontally across the surface. These
  aren't as clean as I would like, and I'm planning to work on cleaning
  some of this up, but I wanted to start by putting it into the right
  form.

  There are a number of reasons for this change, and this particular
  design.

  The first and foremost reason is that an analysis group is complete
  overkill, and the chaining delegation strategy was so opaque, confusing,
  and high-overhead that TTI was suffering greatly for it. Several of the
  TTI functions had failed to be implemented in all places because the
  chaining-based delegation provided no checking of this. A few other
  functions were implemented with incorrect delegation. The message to me
  was very clear working on this -- the delegation and analysis group
  structure was too confusing to be useful here.

  The other reason of course is that this is a *much* more natural fit for
  the new pass manager. This will lay the ground work for a type-erased
  per-function info object that can look up the correct subtarget and even
  cache it.

  Yet another benefit is that this will significantly simplify the
  interaction of the pass managers and the TargetMachine. See the future
  work below.

  The downside of this change is that it is very, very verbose. I'm going
  to work to improve that, but it is somewhat an implementation necessity
  in C++ to do type erasure. =/ I discussed this design really extensively
  with Eric and Hal prior to going down this path, and afterward showed
  them the result. No one was really thrilled with it, but there doesn't
  seem to be a substantially better alternative. Using a base class and
  virtual method dispatch would make the code much shorter, but as
  discussed in the update to the programmer's manual and elsewhere, a
  polymorphic interface feels like the more principled approach even if
  this is perhaps the least compelling example of it. ;]

  Ultimately, there is still a lot more to be done here, but this was the
  huge chunk that I couldn't really split things out of because this was
  the interface change to TTI. I've tried to minimize all the other parts
  of this. The follow-up work should include at least:

  1) Improving the TargetMachine interface by having it directly return a
     TTI object. Because we have a non-pass object with value semantics
     and an internal type erasure mechanism, we can narrow the interface
     of the TargetMachine to *just* do what we need: build and return a
     TTI object that we can then insert into the pass pipeline.
  2) Make the TTI object be fully specialized for a particular function.
     This will include splitting off a minimal form of it which is
     sufficient for the inliner and the old pass manager.
  3) Add a new pass manager analysis which produces TTI objects from the
     target machine for each function. This may actually be done as part
     of #2 in order to use the new analysis to implement #2.
  4) Work on narrowing the API between TTI and the targets so that it is
     easier to understand and less verbose to type erase.
  5) Work on narrowing the API between TTI and its clients so that it is
     easier to understand and less verbose to forward.
  6) Try to improve the CRTP-based delegation. I feel like this code is
     just a bit messy and exacerbating the complexity of implementing the
     TTI in each target.

  Many thanks to Eric and Hal for their help here. I ended up blocked on
  this somewhat more abruptly than I expected, and so I appreciate getting
  it sorted out very quickly.

  Differential Revision: http://reviews.llvm.org/D7293
  llvm-svn: 227669

* Commoning of target specific load/store intrinsics in Early CSE. (Chad Rosier, 2015-01-26; 1 file, -0/+91)

  Phabricator revision: http://reviews.llvm.org/D7121
  Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!

  llvm-svn: 227149

* [AArch64] Enable partial & runtime unrolling on Cortex-A57. (Kevin Qin, 2014-10-09; 1 file, -0/+10)

  llvm-svn: 219401

* [AArch64] Improve cost model to handle sdiv by a pow-of-two. (Chad Rosier, 2014-09-29; 1 file, -0/+23)

  This patch improves the target-specific cost model to better handle
  signed division by a power of two. The immediate result is that this
  enables the SLP vectorizer to do a better job.

  http://reviews.llvm.org/D5469
  PR20714

  llvm-svn: 218607

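  Why this division deserves a low cost: it lowers to a short branch-free
  add-and-shift sequence instead of a real divide. A self-contained
  sketch, assuming 32-bit int and an arithmetic right shift (typical in
  practice, though implementation-defined in older C++):

      #include <cstdio>

      // X / (1 << K) with round-toward-zero semantics:
      // for negative X, add 2^K - 1 before the arithmetic shift.
      int sdivByPow2(int X, unsigned K) {
        int Bias = (X >> 31) & ((1 << K) - 1); // 2^K - 1 if X < 0, else 0
        return (X + Bias) >> K;
      }

      int main() {
        std::printf("%d %d\n", sdivByPow2(-7, 2), -7 / 4); // both print -1
      }
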
* [AArch64] Revert r216141 for Cyclone (Gerolf Hoflehner, 2014-09-10; 1 file, -1/+1)

  The increase of the interleave factor to 4 has side effects such as
  performance losses, e.g. due to remainder loops being executed more
  frequently, and may increase code size. It requires more analysis and
  careful heuristic tuning. Expect double-digit gains in small benchmarks
  like lowercase.c and losses in puzzle.c.

  llvm-svn: 217540

* Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. (Sanjay Patel, 2014-09-10; 1 file, -2/+2)

  "Unroll" is not the appropriate name for this variable. Clang already
  uses the term "interleave" in pragmas and metadata for this.

  Differential Revision: http://reviews.llvm.org/D5066
  llvm-svn: 217528

* Allow vectorization of division by uniform power of 2. (Karthik Bhat, 2014-08-25; 1 file, -9/+11)

  This patch adds support to recognize division by a uniform power of 2
  and modifies the cost table to vectorize division by a uniform power of
  2 whenever possible. It updates the cost model for the Loop and SLP
  Vectorizers. The cost table is currently only updated for the X86
  backend.

  Thanks to Hal, Andrea, and Sanjay for the review.
  (http://reviews.llvm.org/D4971)

  llvm-svn: 216371

* [LoopVectorize] Up the maximum unroll factor to 4 for AArch64 (James Molloy, 2014-08-21; 1 file, -1/+7)

  Only for Cortex-A57 and Cyclone for now, where it has shown wins.

  llvm-svn: 216141

* Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost. (James Molloy, 2014-08-05; 1 file, -0/+15)

  Some types, such as 128-bit vector types on AArch64, don't have any
  callee-saved registers. So if a value needs to stay live over a
  callsite, it must be spilled and refilled. This cost is now taken into
  account.

  llvm-svn: 214859

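  A sketch of the resulting cost rule with illustrative unit costs; the
  function name is hypothetical, not the actual TTI hook:

      #include <cstdio>

      // If a type has no callee-saved registers, every value of that type
      // that is live across a call costs a spill before it and a reload
      // after it.
      int costOfKeepingLiveOverCall(int NumLiveValues,
                                    bool HasCalleeSavedRegs) {
        const int SpillCost = 1, ReloadCost = 1; // illustrative unit costs
        return HasCalleeSavedRegs
                   ? 0
                   : NumLiveValues * (SpillCost + ReloadCost);
      }

      int main() {
        // Four 128-bit vectors live across a call on an AArch64-like target:
        std::printf("%d\n", costOfKeepingLiveOverCall(4, false));
      }
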
* Remove the TargetMachine forwards for TargetSubtargetInfo-based information and update all callers. (Eric Christopher, 2014-08-04; 1 file, -1/+1)

  No functional change.

  llvm-svn: 214781

* AArch64: improve handling & modelling of FP_TO_XINT nodes. (Tim Northover, 2014-06-15; 1 file, -3/+20)

  There's probably no actual change in behaviour here, just updating the
  LowerFP_TO_INT function to be more similar to the reverse implementation
  and updating costs to current CodeGen.

  llvm-svn: 210985

* AArch64: improve vector [su]itofp handling. (Tim Northover, 2014-06-15; 1 file, -14/+33)

  This somehow got missed in the AArch64 merge, so this should fix a
  performance regression present since 3.4.

  llvm-svn: 210984

* AArch64/ARM64: move ARM64 into AArch64's place (Tim Northover, 2014-05-24; 1 file, -0/+464)

  This commit starts with a "git mv ARM64 AArch64" and continues out from
  there, renaming the C++ classes, intrinsics, and other target-local
  objects for consistency.

  "ARM64" test directories are also moved, and tests that began their life
  in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple.
  Both should be equivalent though.

  This finishes the AArch64 merge, and everyone should feel free to
  continue committing as normal now.

  llvm-svn: 209577

* AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64. (Tim Northover, 2014-05-24; 1 file, -109/+0)

  I'm doing this in two phases for a better "git blame" record. This
  commit removes the previous AArch64 backend and redirects all
  functionality to ARM64. It also deduplicates test-lines and removes
  orphaned AArch64 tests.

  The next step will be "git mv ARM64 AArch64", then rewiring most of the
  tests.

  Hopefully LLVM is still functional, though it would be even better if
  no-one ever had to care because the rename happens straight afterwards.

  llvm-svn: 209576

* [C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. AArch64 edition. (Craig Topper, 2014-04-29; 1 file, -2/+2)

  llvm-svn: 207510

* [C++] Use 'nullptr'. Target edition. (Craig Topper, 2014-04-25; 1 file, -1/+1)

  llvm-svn: 207197

* [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE definition below all of the header #include lines, lib/Target/... edition. (Chandler Carruth, 2014-04-22; 1 file, -1/+2)

  llvm-svn: 206842

* This commit allows vectorized loops to be unrolled by a factor of 2 for AArch64. (Jiangning Liu, 2014-04-18; 1 file, -0/+1)

  A new test case is also added for ARM64.

  Patch by Z. Zheng.

  llvm-svn: 206563
