bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Date	Files	Lines
*	[SLPVectorizer] Add an extra parameter to cancelScheduling function, NFCI.	Dinar Temirbulatov	2017-07-05	1	-22/+23
	llvm-svn: 307158
*	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."	Teresa Johnson	2017-07-01	1	-1/+1
	This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306936
*	re-commit r306336: Enable vectorizer-maximize-bandwidth by default.	Teresa Johnson	2017-07-01	1	-1/+1
	Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306935
*	revert r306336 for breaking ppc test.	Teresa Johnson	2017-07-01	1	-1/+1
	llvm-svn: 306934
*	Enable vectorizer-maximize-bandwidth by default.	Teresa Johnson	2017-07-01	1	-1/+1
	Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
	spec/2006/fp/C++/444.namd        26.84  -0.31%
	spec/2006/fp/C++/447.dealII      46.19  +0.89%
	spec/2006/fp/C++/450.soplex      42.92  -0.44%
	spec/2006/fp/C++/453.povray      38.57  -2.25%
	spec/2006/fp/C/433.milc          24.54  -0.76%
	spec/2006/fp/C/470.lbm           41.08  +0.26%
	spec/2006/fp/C/482.sphinx3       47.58  -0.99%
	spec/2006/int/C++/471.omnetpp    22.06  +1.87%
	spec/2006/int/C++/473.astar      22.65  -0.12%
	spec/2006/int/C++/483.xalancbmk  33.69  +4.97%
	spec/2006/int/C/400.perlbench    33.43  +1.70%
	spec/2006/int/C/401.bzip2        23.02  -0.19%
	spec/2006/int/C/403.gcc          32.57  -0.43%
	spec/2006/int/C/429.mcf          40.35  +0.27%
	spec/2006/int/C/445.gobmk        26.96  +0.06%
	spec/2006/int/C/456.hmmer         24.4  +0.19%
	spec/2006/int/C/458.sjeng        27.91  -0.08%
	spec/2006/int/C/462.libquantum   57.47  -0.20%
	spec/2006/int/C/464.h264ref      46.52  +1.35%
	geometric mean                          +0.29%
	The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It would be greatly appreciated if the community could help test the performance impact of this change on other architectures so that we can decide if this should be target-dependent.
	Reviewers: hfinkel, mkuper, davidxl, chandlerc
	Reviewed By: chandlerc
	Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
	Differential Revision: https://reviews.llvm.org/D33341
	llvm-svn: 306933
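The "geometric mean" figure summarizes the per-benchmark changes listed in this commit message. A minimal sketch of that summary, assuming each percentage is turned into a ratio (1 + pct/100) and the geometric mean is taken over those ratios (the commit does not state its exact method; the function name is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of relative changes given in percent (e.g. -0.31 for -0.31%).
// Assumption: each change is a ratio (1 + pct/100); the mean is computed in log
// space (log1p/expm1 keep precision for small changes) and returned in percent.
double geomeanPercentChange(const std::vector<double>& pcts) {
  assert(!pcts.empty());
  double logSum = 0.0;
  for (double p : pcts) {
    assert(p > -100.0);  // a ratio must stay positive for the logarithm
    logSum += std::log1p(p / 100.0);
  }
  return std::expm1(logSum / static_cast<double>(pcts.size())) * 100.0;
}
```

Fed the 19 per-benchmark changes above, this model gives roughly +0.29%, which agrees with the reported figure; a plain arithmetic mean of the same numbers would be slightly higher (about +0.30%).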
*	[SLPVectorizer] Add isOdd() helper function, NFCI.	Dinar Temirbulatov	2017-06-30	1	-2/+7
	llvm-svn: 306887
*	[LV] Sink casts to unravel first order recurrence	Ayal Zaks	2017-06-30	1	-1/+17
	Check if a single cast is preventing handling a first-order-recurrence Phi, because the scheduling constraints it imposes on the first-order-recurrence shuffle are infeasible; but they can be made feasible by moving the cast downwards. Record such casts and move them when vectorizing the loop. Differential Revision: https://reviews.llvm.org/D33058 llvm-svn: 306884
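A first-order recurrence is a value carried from the previous iteration into the current one. As a rough illustration (plain C++ emulating vector lanes, not LLVM's generated IR), the vectorized loop keeps the last lane of the previous vector and splices it in front of the current vector's first VF-1 lanes; that splice plays the role of the "first-order-recurrence shuffle". The vectorization factor of 4 and the function names are assumptions of this sketch:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;  // assumed vectorization factor
using Vec = std::array<int, VF>;

// Scalar loop with a first-order recurrence: out[i] = a[i] - a[i-1], with a[-1] = init.
std::vector<int> scalarDiff(const std::vector<int>& a, int init) {
  std::vector<int> out(a.size());
  int prev = init;  // value carried from the previous iteration
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] - prev;
    prev = a[i];
  }
  return out;
}

// Emulated vector loop. Each step builds the "previous values" vector as
// {last lane of prevVec, cur[0], ..., cur[VF-2]}.
// Assumes a.size() is a multiple of VF to keep the sketch short.
std::vector<int> vectorDiff(const std::vector<int>& a, int init) {
  assert(a.size() % VF == 0);
  std::vector<int> out(a.size());
  Vec prevVec{};
  prevVec[VF - 1] = init;  // only the last lane of the initial vector is read
  for (std::size_t base = 0; base < a.size(); base += VF) {
    Vec cur;
    for (std::size_t l = 0; l < VF; ++l) cur[l] = a[base + l];
    Vec shifted;  // the recurrence shuffle: needs cur, so it comes after cur is computed
    shifted[0] = prevVec[VF - 1];
    for (std::size_t l = 1; l < VF; ++l) shifted[l] = cur[l - 1];
    for (std::size_t l = 0; l < VF; ++l) out[base + l] = cur[l] - shifted[l];
    prevVec = cur;
  }
  return out;
}
```

Because the shuffle consumes the current iteration's vector, every user of the recurrence has to be placed after it, which is the kind of scheduling constraint the commit refers to.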
*	[LV] Optimize for size when vectorizing loops with tiny trip count	Ayal Zaks	2017-06-30	1	-29/+30
	It may be detrimental to vectorize loops with very small trip counts, as various costs of the vectorized loop body as well as enclosing overheads including runtime tests and scalar iterations may outweigh the gains of vectorizing. The current cost model measures the cost of the vectorized loop body only, expecting it will amortize other costs, and loops with known or expected very small trip counts are not vectorized at all. This patch allows loops with very small trip counts to be vectorized, but under OptForSize constraints, which ensure the cost of the loop body is dominant, having no runtime guards nor scalar iterations. Patch inspired by D32451. Differential Revision: https://reviews.llvm.org/D34373 llvm-svn: 306803
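Under OptForSize the vectorized loop may have neither runtime guards nor leftover scalar iterations. A deliberately simplified model of those two constraints, assuming a compile-time-known trip count and assuming that "no scalar iterations" reduces to the vectorization factor dividing the trip count evenly (LoopVectorize's actual legality and cost checks are more involved; the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: can a loop with known trip count `tripCount` be vectorized
// by `vf` under OptForSize-style constraints? `needsRuntimeChecks` stands in for
// any memory or overflow guards the vectorized loop would require.
// A tripCount of 0 is used here to mean "not known at compile time".
bool canVectorizeForSize(std::uint64_t tripCount, unsigned vf, bool needsRuntimeChecks) {
  assert(vf >= 1);
  if (needsRuntimeChecks) return false;  // no runtime guards allowed
  if (tripCount == 0) return false;      // the trip count must be known
  return tripCount % vf == 0;            // no scalar remainder iterations
}
```

For example, a loop of 8 iterations qualifies at VF 4, while a loop of 6 iterations would need two scalar remainder iterations and is rejected in this model.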
*	Remove the BBVectorize pass.	Chandler Carruth	2017-06-30	3	-3285/+1
	It served us well, helped kick-start much of the vectorization efforts in LLVM, etc. Its time has come and gone. Back in 2014: http://lists.llvm.org/pipermail/llvm-dev/2014-November/079091.html Time to actually let go and move forward. =] I've updated the release notes both about the removal and the deprecation of the corresponding C API. llvm-svn: 306797
*	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by default."	Daniel Jasper	2017-06-30	1	-1/+1
	This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306792
*	[SLPVectorizer] Moving Entry->NeedToGather check out of inner loop,	Dinar Temirbulatov	2017-06-29	1	-4/+4
	since it is invariant there. NFCI. llvm-svn: 306749
*	[SLPVectorizer] Introducing getTreeEntry() helper function [NFC]	Dinar Temirbulatov	2017-06-29	1	-34/+33
	Differential Revision: https://reviews.llvm.org/D34756 llvm-svn: 306655
*	[LV] Fix PR33613 - retain order of insertelement per part	Ayal Zaks	2017-06-28	1	-6/+7
	r306381 caused PR33613, by reversing the order in which insertelements were generated per unroll part. This patch fixes PR33613 by retaining this order, placing each set of insertelements per part immediately after the last scalar being packed for this part. Includes a test case derived from PR33613. Reference: https://bugs.llvm.org/show_bug.cgi?id=33613 Differential Revision: https://reviews.llvm.org/D34760 llvm-svn: 306575
*	re-commit r306336: Enable vectorizer-maximize-bandwidth by default.	Dehao Chen	2017-06-27	1	-1/+1
	Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306473
*	Recommitting 306331.	Ayal Zaks	2017-06-27	1	-287/+300
	Undoing revert 306338 after fixing the bug: add metadata to the load instead of the reverse shuffle added to it, retaining the original ValueMap implementation. llvm-svn: 306381
*	revert r306336 for breaking ppc test.	Dehao Chen	2017-06-26	1	-1/+1
	llvm-svn: 306344
*	reverting 306331.	Ayal Zaks	2017-06-26	1	-293/+286
	Causes TBAA metadata to be generated on reverse shuffles, investigating. llvm-svn: 306338
*	Enable vectorizer-maximize-bandwidth by default.	Dehao Chen	2017-06-26	1	-1/+1
	Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
	spec/2006/fp/C++/444.namd        26.84  -0.31%
	spec/2006/fp/C++/447.dealII      46.19  +0.89%
	spec/2006/fp/C++/450.soplex      42.92  -0.44%
	spec/2006/fp/C++/453.povray      38.57  -2.25%
	spec/2006/fp/C/433.milc          24.54  -0.76%
	spec/2006/fp/C/470.lbm           41.08  +0.26%
	spec/2006/fp/C/482.sphinx3       47.58  -0.99%
	spec/2006/int/C++/471.omnetpp    22.06  +1.87%
	spec/2006/int/C++/473.astar      22.65  -0.12%
	spec/2006/int/C++/483.xalancbmk  33.69  +4.97%
	spec/2006/int/C/400.perlbench    33.43  +1.70%
	spec/2006/int/C/401.bzip2        23.02  -0.19%
	spec/2006/int/C/403.gcc          32.57  -0.43%
	spec/2006/int/C/429.mcf          40.35  +0.27%
	spec/2006/int/C/445.gobmk        26.96  +0.06%
	spec/2006/int/C/456.hmmer         24.4  +0.19%
	spec/2006/int/C/458.sjeng        27.91  -0.08%
	spec/2006/int/C/462.libquantum   57.47  -0.20%
	spec/2006/int/C/464.h264ref      46.52  +1.35%
	geometric mean                          +0.29%
	The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It would be greatly appreciated if the community could help test the performance impact of this change on other architectures so that we can decide if this should be target-dependent.
	Reviewers: hfinkel, mkuper, davidxl, chandlerc
	Reviewed By: chandlerc
	Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
	Differential Revision: https://reviews.llvm.org/D33341
	llvm-svn: 306336
*	[LV] Changing the interface of ValueMap, NFC.	Ayal Zaks	2017-06-26	1	-286/+293
	Instead of providing access to the internal MapStorage holding all Values associated with a given Key, used for setting or resetting them all together, ValueMap keeps its MapStorage internal; its new interface allows getting, setting or resetting a single Value, per part or per part-and-lane. Follows the discussion in https://reviews.llvm.org/D32871. Differential Revision: https://reviews.llvm.org/D34473 llvm-svn: 306331
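To make "one Value per part or per part-and-lane" concrete, here is a hypothetical, heavily simplified map in C++. It is not LLVM's ValueMap: the class name, the use of strings as stand-in values, and the set/reset split (set requires an empty slot, reset requires an occupied one) are assumptions made only for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical sketch, not LLVM's ValueMap. Keys and values are strings purely
// for illustration; an empty string marks an unset slot.
class PartLaneMap {
public:
  PartLaneMap(unsigned uf, unsigned vf) : UF(uf), VF(vf) {}

  bool hasVectorValue(const std::string& key, unsigned part) const {
    auto it = Vectors.find(key);
    return part < UF && it != Vectors.end() && !it->second[part].empty();
  }
  // Record the vector value of `key` for a single unroll part (slot must be empty).
  void setVectorValue(const std::string& key, unsigned part, const std::string& v) {
    assert(part < UF && !v.empty() && !hasVectorValue(key, part));
    std::vector<std::string>& parts = Vectors[key];
    parts.resize(UF);  // one slot per unroll part; no-op if already sized
    parts[part] = v;
  }
  // Replace one part's existing vector value, leaving the other parts untouched.
  void resetVectorValue(const std::string& key, unsigned part, const std::string& v) {
    assert(!v.empty() && hasVectorValue(key, part));
    Vectors[key][part] = v;
  }
  const std::string& getVectorValue(const std::string& key, unsigned part) const {
    assert(hasVectorValue(key, part));
    return Vectors.at(key)[part];
  }
  // Scalarized values are addressed per part and per lane.
  void setScalarValue(const std::string& key, unsigned part, unsigned lane,
                      const std::string& v) {
    assert(part < UF && lane < VF);
    Scalars[std::make_tuple(key, part, lane)] = v;
  }
  const std::string& getScalarValue(const std::string& key, unsigned part,
                                    unsigned lane) const {
    return Scalars.at(std::make_tuple(key, part, lane));
  }

private:
  unsigned UF, VF;  // unroll factor and vectorization factor
  std::map<std::string, std::vector<std::string>> Vectors;
  std::map<std::tuple<std::string, unsigned, unsigned>, std::string> Scalars;
};
```

The point of the design is that callers touch exactly one slot at a time, rather than receiving the whole per-key storage and rewriting all parts together.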
*	Revert "Enable vectorizer-maximize-bandwidth by default."	Diana Picus	2017-06-22	1	-1/+1
	This reverts commit r305960 because it broke self-hosting on AArch64. llvm-svn: 305990
*	Enable vectorizer-maximize-bandwidth by default.	Dehao Chen	2017-06-21	1	-1/+1
	Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:
	spec/2006/fp/C++/444.namd        26.84  -0.31%
	spec/2006/fp/C++/447.dealII      46.19  +0.89%
	spec/2006/fp/C++/450.soplex      42.92  -0.44%
	spec/2006/fp/C++/453.povray      38.57  -2.25%
	spec/2006/fp/C/433.milc          24.54  -0.76%
	spec/2006/fp/C/470.lbm           41.08  +0.26%
	spec/2006/fp/C/482.sphinx3       47.58  -0.99%
	spec/2006/int/C++/471.omnetpp    22.06  +1.87%
	spec/2006/int/C++/473.astar      22.65  -0.12%
	spec/2006/int/C++/483.xalancbmk  33.69  +4.97%
	spec/2006/int/C/400.perlbench    33.43  +1.70%
	spec/2006/int/C/401.bzip2        23.02  -0.19%
	spec/2006/int/C/403.gcc          32.57  -0.43%
	spec/2006/int/C/429.mcf          40.35  +0.27%
	spec/2006/int/C/445.gobmk        26.96  +0.06%
	spec/2006/int/C/456.hmmer         24.4  +0.19%
	spec/2006/int/C/458.sjeng        27.91  -0.08%
	spec/2006/int/C/462.libquantum   57.47  -0.20%
	spec/2006/int/C/464.h264ref      46.52  +1.35%
	geometric mean                          +0.29%
	The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It would be greatly appreciated if the community could help test the performance impact of this change on other architectures so that we can decide if this should be target-dependent.
	Reviewers: hfinkel, mkuper, davidxl, chandlerc
	Reviewed By: chandlerc
	Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin
	Differential Revision: https://reviews.llvm.org/D33341
	llvm-svn: 305960
*	Improve profile-guided heuristics to use estimated trip count.	Taewook Oh	2017-06-19	1	-27/+20
	Summary: The existing heuristic uses the ratio between the function entry frequency and the loop invocation frequency to find cold loops. However, even if the loop executes frequently, if it has a small trip count per invocation, vectorization is not beneficial. On the other hand, even if the loop invocation frequency is much smaller than the function invocation frequency, if the trip count is high it is still beneficial to vectorize the loop. This patch uses the estimated trip count computed from the profile metadata as the primary metric to determine coldness of the loop. If the estimated trip count cannot be computed, it falls back to the original heuristics. Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson Reviewed By: tejohnson Subscribers: tejohnson, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32451 llvm-svn: 305729
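A rough sketch of the idea, assuming the loop latch carries profile branch weights for its back edge and its exit edge. Each invocation enters the loop once and leaves it once, so the header runs about (back-edge weight + exit weight) times per exit, and that ratio, rounded to nearest, estimates iterations per invocation. This is an illustrative model, not LLVM's code; the function names and the `minTripCount` threshold are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimate of iterations per loop invocation from latch branch
// weights. Returns 0 when no estimate is possible (no weight on the exit edge).
std::uint64_t estimateTripCount(std::uint64_t backedgeWeight, std::uint64_t exitWeight) {
  if (exitWeight == 0) return 0;
  // Header executions: every back-edge arrival plus one entry per invocation
  // (the number of invocations is approximated by the exit count).
  std::uint64_t headerWeight = backedgeWeight + exitWeight;
  return (headerWeight + exitWeight / 2) / exitWeight;  // rounded to nearest
}

// Hypothetical decision: use the profile-based trip count when available and
// fall back to the older frequency-ratio heuristic otherwise.
bool worthVectorizing(std::uint64_t backedgeWeight, std::uint64_t exitWeight,
                      std::uint64_t minTripCount, bool fallbackSaysHot) {
  assert(minTripCount > 0);
  if (std::uint64_t tc = estimateTripCount(backedgeWeight, exitWeight))
    return tc >= minTripCount;
  return fallbackSaysHot;
}
```

With weights 990 (back edge) and 10 (exit), the estimate is 100 iterations per invocation, so the loop counts as worth vectorizing regardless of how often the loop itself is entered.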
*	Remove brackets, NFC.	Dinar Temirbulatov	2017-06-19	1	-4/+2
	llvm-svn: 305706
*	[LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.	George Burgess IV	2017-06-09	1	-1/+5
	If we're shrinking a binary operation, it may be the case that the new operation wraps where the old didn't. If this happens, the behavior should be well-defined. So, we can't always carry wrapping flags with us when we shrink operations. If we do, we get incorrect optimizations in cases like:
	  void foo(const unsigned char *from, unsigned char *to, int n) {
	    for (int i = 0; i < n; i++)
	      to[i] = from[i] - 128;
	  }
	which gets optimized to:
	  void foo(const unsigned char *from, unsigned char *to, int n) {
	    for (int i = 0; i < n; i++)
	      to[i] = from[i] | 128;
	  }
	Because:
	- InstCombine turned `sub i32 %from.i, 128` into `add nuw nsw i32 %from.i, 128`.
	- LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a vector full of `i8 128`s.
	- InstCombine took advantage of the fact that the newly-shrunken add "couldn't wrap", and changed the `add` to an `or`.
	InstCombine seems happy to figure out whether we can add nuw/nsw on its own, so I just decided to drop the flags. There are already a number of places in LoopVectorize where we rely on InstCombine to clean up. llvm-svn: 305053
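The miscompile can be checked exhaustively over all 8-bit inputs. Once the result is stored as an unsigned char, subtracting 128 just flips the top bit (it equals `x ^ 128`), while `x | 128` agrees with it only for `x < 128`, which is exactly the range in which an 8-bit add of 128 does not wrap. A small self-contained check (the function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// What the source loop computes per element once stored as unsigned char.
std::uint8_t intended(std::uint8_t x) {
  return static_cast<std::uint8_t>(x - 128);  // int arithmetic, then truncation mod 256
}

// What the miscompiled loop computed after the add was rewritten as an or.
std::uint8_t miscompiled(std::uint8_t x) {
  return static_cast<std::uint8_t>(x | 128);
}

// Counts how many of the 256 possible inputs produce a wrong result.
int countMismatches() {
  int bad = 0;
  for (int v = 0; v < 256; ++v) {
    std::uint8_t x = static_cast<std::uint8_t>(v);
    // Modulo 256, subtracting 128 is the same as flipping bit 7.
    assert(intended(x) == static_cast<std::uint8_t>(x ^ 128));
    if (intended(x) != miscompiled(x)) ++bad;
  }
  return bad;
}
```

Every input with the top bit already set (128 of the 256 values) is mishandled: for example, 200 should become 72 but the `or` leaves it at 200.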
*	Sort the remaining #include lines in include/... and lib/....	Chandler Carruth	2017-06-06	2	-2/+2
	I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes aren't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
*	[LV] Make scalarizeInstruction() non-virtual. NFC.	Ayal Zaks	2017-06-04	1	-2/+1
	Following the request made in https://reviews.llvm.org/D32871, scalarizeInstruction() which is no longer overridden by InnerLoopUnroller is hereby made non-virtual in InnerLoopVectorizer. Should have been part of r297580 originally. llvm-svn: 304685
*	Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.	Galina Kistanova	2017-06-03	2	-0/+3
	llvm-svn: 304636
*	[SLP] Improve comments and naming of functions/variables/members, NFC.	Alexey Bataev	2017-06-03	1	-91/+59
	Fixed some comments, added an additional description of the algorithms, improved readability of the code. Differential revision: https://reviews.llvm.org/D33320 llvm-svn: 304616
*	Revert "[SLP] Improve comments and naming of functions/variables/members, NFC."	Alexey Bataev	2017-06-02	1	-59/+91
	This reverts commit 6e311de8b907aa20da9a1a13ab07c3ce2ef4068a. llvm-svn: 304609
*	[SLP] Improve comments and naming of functions/variables/members, NFC.	Alexey Bataev	2017-06-02	1	-91/+59
	Summary: Fixed some comments, added an additional description of the algorithms, improved readability of the code. Reviewers: anemet Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33320 llvm-svn: 304593
*	[LV] Reapply r303763 with fix for PR33193	Matthew Simpson	2017-05-30	1	-10/+19
	r303763 caused build failures in some out-of-tree tests due to an assertion in TTI. The original patch updated cost estimates for induction variable update instructions marked for scalarization. However, it didn't consider that the incoming value of an induction variable phi node could be a cast instruction. This caused queries for cast instruction costs with a mix of vector and scalar types. This patch includes a fix for cast instructions and the test case from PR33193. The fix was suggested by Jonas Paulsson <paulsson@linux.vnet.ibm.com>. Reference: https://bugs.llvm.org/show_bug.cgi?id=33193 Original Differential Revision: https://reviews.llvm.org/D33457 llvm-svn: 304235
*	Revert r303763, results in asserts, e.g. while building Ruby.	Joerg Sonnenberger	2017-05-29	1	-15/+6
	llvm-svn: 304179
*	[LV] Update type in cost model for scalarization	Matthew Simpson	2017-05-24	1	-6/+15
	For non-uniform instructions marked for scalarization, we should update `VectorTy` when computing instruction costs to reflect the scalar type. In addition to determining instruction costs, this type is also used to signal that all instructions in the loop will be scalarized. This currently affects memory instructions and non-pointer induction variables and their updates. (We also mark GEPs scalar after vectorization, but their cost is computed together with memory instructions.) For scalarized induction updates, this patch also scales the scalar cost by the vectorization factor, corresponding to each induction step. llvm-svn: 303763
*	[LoopVectorizer] Let target prefer scalar addressing computations.	Jonas Paulsson	2017-05-24	1	-0/+74
	The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking shows improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744
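The effect of the hook can be sketched as a small decision function. This is a hypothetical, simplified model: only the hook name `prefersVectorizedAddressing()` comes from the commit; the struct and enum names are invented, and it assumes a default of `true` so that only targets that opt out change behaviour.

```cpp
#include <cassert>

// Hypothetical stand-in for a target-info interface (not LLVM's TTI class).
struct TargetInfo {
  virtual ~TargetInfo() = default;
  // Assumed default: address computations are widened like anything else.
  virtual bool prefersVectorizedAddressing() const { return true; }
};

// A target whose addresses must end up in scalar registers anyway.
struct ScalarAddressTarget : TargetInfo {
  bool prefersVectorizedAddressing() const override { return false; }
};

enum class Decision { Widen, Scalarize };

// How to treat an instruction whose results only feed memory addresses
// (address arithmetic, or a load of a pointer).
Decision decideAddressComputation(const TargetInfo& tti, bool feedsOnlyAddresses) {
  if (feedsOnlyAddresses && !tti.prefersVectorizedAddressing())
    return Decision::Scalarize;  // avoids extracting lanes into address registers later
  return Decision::Widen;
}
```

Keeping the default in the base class means existing targets see no change, while a target such as the one described above opts out with a one-line override.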
*	[LV] Report multiple reasons for not vectorizing under allowExtraAnalysis	Ayal Zaks	2017-05-23	1	-20/+42
	The default behavior of -Rpass-analysis=loop-vectorizer is to report only the first reason encountered for not vectorizing, if one is found, at which time the vectorizer aborts its handling of the loop. This patch allows multiple reasons for not vectorizing to be identified and reported, at the potential expense of additional compile-time, under allowExtraAnalysis which can currently be turned on by Clang's -fsave-optimization-record and opt's -pass-remarks-missed. Removed from LoopVectorizationLegality::canVectorize() the redundant checking and reporting if we CantComputeNumberOfIterations, as LAI::canAnalyzeLoop() also does that. This redundancy is caught by a lit test once multiple reasons are reported. Patch initially developed by Dror Barak. Differential Revision: https://reviews.llvm.org/D33396 llvm-svn: 303613
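The difference between "stop at the first reason" and "collect every reason" can be sketched as a legality routine that consults a flag after each failed check. Everything here is hypothetical: the loop summary struct, the check list, and the reason strings are invented for illustration and are not LoopVectorizationLegality's actual checks or remark texts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of the facts a legality check might look at.
struct LoopInfoSketch {
  bool tripCountComputable;
  bool hasUnsafeMemoryDeps;
  bool hasUnsupportedCall;
};

// Runs the legality checks in order. With allowExtraAnalysis == false it bails
// out at the first failure; with true it keeps going so every reason is recorded.
bool canVectorize(const LoopInfoSketch& L, bool allowExtraAnalysis,
                  std::vector<std::string>& reasons) {
  bool ok = true;
  auto fail = [&](const char* why) {
    reasons.push_back(why);
    ok = false;
    return !allowExtraAnalysis;  // true means "stop checking now"
  };
  if (!L.tripCountComputable && fail("cannot compute number of iterations")) return false;
  if (L.hasUnsafeMemoryDeps && fail("unsafe dependent memory operations")) return false;
  if (L.hasUnsupportedCall && fail("call instruction cannot be vectorized")) return false;
  return ok;
}
```

The verdict is the same in both modes; only the amount of diagnostic work (and compile time) differs, which is why the extra analysis is gated behind an option.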
*	Fix vector pass-through value being unused in IRBuilder::CreateMaskedGather	Amara Emerson	2017-05-19	1	-1/+1
	Also s/0/nullptr in the call site in LV. llvm-svn: 303416
*	[IR] De-virtualize ~Value to save a vptr	Reid Kleckner	2017-05-18	1	-2/+2
	Summary: Implements PR889. Removing the virtual table pointer from Value saves 1% of RSS when doing LTO of llc on Linux. The impact on time was positive, but too noisy to conclusively say that performance improved. Here is a link to the spreadsheet with the original data: https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing This change makes it invalid to directly delete a Value, User, or Instruction pointer. Instead, such code can be rewritten to a null check and a call to Value::deleteValue(). Value objects tend to have their lifetimes managed through iplist, so for the most part, this isn't a big deal. However, there are some places where LLVM deletes values, and those places had to be migrated to deleteValue. I have also created llvm::unique_value, which has a custom deleter, so it can be used in place of std::unique_ptr<Value>. I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which derives from User outside of lib/IR. Code in IR cannot include MemorySSA headers or call the MemoryAccess object destructors without introducing a circular dependency, so we need some level of indirection. Unfortunately, no class derived from User may have any virtual methods, because adding a virtual method would break User::getHungOffOperands(), which assumes that it can find the use list immediately prior to the User object. I've added a static_assert to the appropriate OperandTraits templates to help people avoid this trap. Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv Reviewed By: chandlerc Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D31261 llvm-svn: 303362
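The general pattern of dropping a virtual destructor can be sketched with a tiny hierarchy that records its dynamic kind in a field and destroys objects through a dispatching `deleteValue()`. The class names are hypothetical and the real Value/User/Instruction hierarchy is far larger; the sketch only shows that the base carries no vptr and cannot be deleted directly.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical base class: no virtual functions, so no vtable pointer.
class Node {
public:
  enum Kind : std::uint8_t { ConstantKind, InstructionKind };
  Kind getKind() const { return K; }
  void deleteValue();  // the only supported way to destroy a Node

protected:
  explicit Node(Kind k) : K(k) {}
  ~Node() = default;   // protected and non-virtual: `delete nodePtr` does not compile

private:
  Kind K;
};

static_assert(!std::is_polymorphic<Node>::value, "Node must not carry a vptr");

class ConstantNode : public Node {
public:
  explicit ConstantNode(int v) : Node(ConstantKind), Val(v) {}
  ~ConstantNode() { ++Destroyed; }
  static int Destroyed;  // counts destructor runs, for demonstration
  int Val;
};

class InstructionNode : public Node {
public:
  InstructionNode() : Node(InstructionKind) {}
  ~InstructionNode() { ++Destroyed; }
  static int Destroyed;
};

int ConstantNode::Destroyed = 0;
int InstructionNode::Destroyed = 0;

// Dispatch on the stored kind to run the most-derived destructor.
void Node::deleteValue() {
  switch (getKind()) {
  case ConstantKind:    delete static_cast<ConstantNode*>(this); break;
  case InstructionKind: delete static_cast<InstructionNode*>(this); break;
  }
}
```

The trade-off is that every new subclass must be added to the dispatch in `deleteValue()`, in exchange for one pointer less per object.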
*	Revert 303174, 303176, and 303178	Matthew Simpson	2017-05-16	1	-2/+2
	These commits are breaking the bots. Reverting to investigate. llvm-svn: 303182
*	[LV] Avoid potential division by zero when selecting IC	Matthew Simpson	2017-05-16	1	-2/+2
	llvm-svn: 303174
*	[SLP] Enable 64-bit wide vectorization on AArch64	Adam Nemet	2017-05-15	1	-1/+4
	ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
	* Performance Analysis
	This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4.
	** SPEC
	* CFP2006/482.sphinx3 -3.34%
	  A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482.
	* CFP2000/177.mesa -3.34%
	* CINT2000/256.bzip2 +6.97%
	  My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement.
	** LLVM testsuite
	* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
	  There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise.
	* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
	  The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop.
	* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
	* MultiSource/Benchmarks/Bullet/bullet -2.18%
	  This is using 3D vectors.
	* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
	  Noise, binary is unchanged.
	* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
	  There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise.
	* MultiSource/Applications/aha/aha +1.63%
	* MultiSource/Applications/JM/lencod/lencod +1.41%
	* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
	Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116
*	[ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits.	Craig Topper	2017-05-15	1	-4/+3
	This patch finishes off the conversion of ComputeSignBit to computeKnownBits. Differential Revision: https://reviews.llvm.org/D33166 llvm-svn: 303035
*	[LoopOptimizer][Fix]PR32859, PR24738	Simon Pilgrim	2017-05-13	1	-7/+9
	The loop vectorizer pass introduced an undef value while fixing the output of the LCSSA form. Here it is:
	before:
	  %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
	after:
	  %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ]
	and after this change we have:
	  %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
	  %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ]
	Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D33055 llvm-svn: 302988
*	[KnownBits] Add bit counting methods to KnownBits struct and use them where possible	Craig Topper	2017-05-12	1	-1/+1
	This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925
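A minimal sketch of what such min/max counts mean, using a hypothetical 32-bit struct with plain integer masks instead of LLVM's APInt-based KnownBits (the struct and method names here are illustrative, not the real API). A bit set in `Zero` is known to be 0, a bit set in `One` is known to be 1, and bits in neither mask are unknown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit model of known-bits information.
struct Known32 {
  std::uint32_t Zero = 0;  // bits known to be 0
  std::uint32_t One = 0;   // bits known to be 1

  bool valid() const { return (Zero & One) == 0; }  // a bit cannot be both

  static unsigned popcount(std::uint32_t v) {
    unsigned n = 0;
    for (; v; v &= v - 1) ++n;  // clear the lowest set bit each step
    return n;
  }
  static unsigned ctz(std::uint32_t v) {  // count trailing zeros, 32 for v == 0
    unsigned n = 0;
    while (n < 32 && !((v >> n) & 1u)) ++n;
    return n;
  }
  static unsigned clz(std::uint32_t v) {  // count leading zeros, 32 for v == 0
    unsigned n = 0;
    while (n < 32 && !((v >> (31 - n)) & 1u)) ++n;
    return n;
  }

  // Min counts use only known bits; max counts let every unknown bit take the
  // value that makes the count as large as possible.
  unsigned minPopulation() const { return popcount(One); }
  unsigned maxPopulation() const { return 32 - popcount(Zero); }
  unsigned minTrailingZeros() const { return ctz(~Zero); }  // run of known-zero low bits
  unsigned maxTrailingZeros() const { return ctz(One); }    // bounded by the first known one
  unsigned minLeadingZeros() const { return clz(~Zero); }   // run of known-zero high bits
  unsigned maxLeadingZeros() const { return clz(One); }     // bounded by the highest known one
};
```

For instance, if the low 4 bits and the top 8 bits are known zero and bit 4 is known one, the value has exactly 4 trailing zeros, between 8 and 27 leading zeros, and a population count between 1 and 20.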
*	[SLP] Emit optimization remarks	Adam Nemet	2017-05-11	1	-6/+36
	The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense of how far the tree is spanning I've included the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811
*	[LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC	Ayal Zaks	2017-05-11	1	-80/+101
	Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from ILV to LVP. These changes facilitate building VPlans and using them to generate code, following https://reviews.llvm.org/D28975 and its tentative breakdown. Method ILV.createEmptyLoop() is renamed to ILV.createVectorizedLoopSkeleton() to improve clarity; its contents remain intact. Differential Revision: https://reviews.llvm.org/D32200 llvm-svn: 302790
*	[AArch64] Consider widening instructions in cost calculations	Matthew Simpson	2017-05-09	1	-4/+6
	The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582
*	[LV] Fix insertion point for shuffle vectors in first order recurrence	Anna Thomas	2017-05-09	1	-2/+5
	Summary: In first order recurrence vectorization, when the previous value is a phi node, we need to set the insertion point to the first non-phi node. We can have the previous value being a phi node, due to the generation of new IVs as part of trunc optimization [1]. [1] https://reviews.llvm.org/rL294967 Reviewers: mssimpso, mkuper Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32969 llvm-svn: 302532
*	Introduce experimental generic intrinsics for horizontal vector reductions.	Amara Emerson	2017-05-09	2	-65/+20
	- This change allows targets to opt in to using them instead of the log2 shufflevector algorithm.
	- The SLP and Loop vectorizers have the common code to do shuffle reductions factored out into LoopUtils, and now have a unified interface for generating reductions regardless of the preference of the target. LoopUtils now uses TTI to determine what kind of reductions the target wants to handle.
	- For CodeGen, basic legalization support is added.
	Differential Revision: https://reviews.llvm.org/D30086 llvm-svn: 302514
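For context, the "log2 shufflevector algorithm" that targets can now opt out of reduces an N-lane vector by repeatedly shuffling the upper half of the live lanes down and adding it to the lower half, so an N-lane sum takes log2(N) vector adds. A plain C++ emulation of that idea for an add reduction (illustrative only; it does not use LLVM APIs):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Emulates a log2 shuffle reduction over N lanes. N must be a power of two.
template <std::size_t N>
int shuffleReduceAdd(std::array<int, N> v) {
  static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
  for (std::size_t width = N / 2; width >= 1; width /= 2) {
    std::array<int, N> shuffled{};  // lanes >= width would be undef in IR; zero here
    for (std::size_t i = 0; i < width; ++i)
      shuffled[i] = v[i + width];   // shufflevector: move the upper half down
    for (std::size_t i = 0; i < N; ++i)
      v[i] += shuffled[i];          // one full-width vector add
  }
  return v[0];                      // extractelement of lane 0
}
```

For an 8-lane vector this takes three shuffle-and-add steps; the experimental reduction intrinsics let a target lower the whole reduction its own way instead.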
*	Use right function in LoopVectorize.	Jonas Paulsson	2017-05-04	1	-1/+1
	-  unsigned AS = getMemInstAlignment(I);
	+  unsigned AS = getMemInstAddressSpace(I);
	Review: Hal Finkel llvm-svn: 302114
*	Rename WeakVH to WeakTrackingVH; NFC	Sanjoy Das	2017-05-01	1	-12/+15
	This relands r301424. llvm-svn: 301812