bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[LV] Reapply r303763 with fix for PR33193	Matthew Simpson	2017-05-30	1	-10/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r303763 caused build failures in some out-of-tree tests due to an assertion in TTI. The original patch updated cost estimates for induction variable update instructions marked for scalarization. However, it didn't consider that the incoming value of an induction variable phi node could be a cast instruction. This caused queries for cast instruction costs with a mix of vector and scalar types. This patch includes a fix for cast instructions and the test case from PR33193. The fix was suggested by Jonas Paulsson <paulsson@linux.vnet.ibm.com>. Reference: https://bugs.llvm.org/show_bug.cgi?id=33193 Original Differential Revision: https://reviews.llvm.org/D33457 llvm-svn: 304235
*	Revert r303763, results in asserts i.e. while building Ruby.	Joerg Sonnenberger	2017-05-29	1	-15/+6
\| \| \| \|	llvm-svn: 304179
*	[LV] Update type in cost model for scalarization	Matthew Simpson	2017-05-24	1	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	For non-uniform instructions marked for scalarization, we should update `VectorTy` when computing instruction costs to reflect the scalar type. In addition to determining instruction costs, this type is also used to signal that all instructions in the loop will be scalarized. This currently affects memory instructions and non-pointer induction variables and their updates. (We also mark GEPs scalar after vectorization, but their cost is computed together with memory instructions.) For scalarized induction updates, this patch also scales the scalar cost by the vectorization factor, corresponding to each induction step. llvm-svn: 303763
*	[LoopVectorizer] Let target prefer scalar addressing computations.	Jonas Paulsson	2017-05-24	1	-0/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking show improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744
*	[LV] Report multiple reasons for not vectorizing under allowExtraAnalysis	Ayal Zaks	2017-05-23	1	-20/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The default behavior of -Rpass-analysis=loop-vectorizer is to report only the first reason encountered for not vectorizing, if one is found, at which time the vectorizer aborts its handling of the loop. This patch allows multiple reasons for not vectorizing to be identified and reported, at the potential expense of additional compile-time, under allowExtraAnalysis which can currently be turned on by Clang's -fsave-optimization-record and opt's -pass-remarks-missed. Removed from LoopVectorizationLegality::canVectorize() the redundant checking and reporting if we CantComputeNumberOfIterations, as LAI::canAnalyzeLoop() also does that. This redundancy is caught by a lit test once multiple reasons are reported. Patch initially developed by Dror Barak. Differential Revision: https://reviews.llvm.org/D33396 llvm-svn: 303613
*	Fix vector pass-through value being unused in IRBuilder::CreateMaskedGather	Amara Emerson	2017-05-19	1	-1/+1
\| \| \| \| \| \|	Also s/0/nullptr in the call site in LV. llvm-svn: 303416
*	[IR] De-virtualize ~Value to save a vptr	Reid Kleckner	2017-05-18	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Implements PR889 Removing the virtual table pointer from Value saves 1% of RSS when doing LTO of llc on Linux. The impact on time was positive, but too noisy to conclusively say that performance improved. Here is a link to the spreadsheet with the original data: https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing This change makes it invalid to directly delete a Value, User, or Instruction pointer. Instead, such code can be rewritten to a null check and a call Value::deleteValue(). Value objects tend to have their lifetimes managed through iplist, so for the most part, this isn't a big deal. However, there are some places where LLVM deletes values, and those places had to be migrated to deleteValue. I have also created llvm::unique_value, which has a custom deleter, so it can be used in place of std::unique_ptr<Value>. I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which derives from User outside of lib/IR. Code in IR cannot include MemorySSA headers or call the MemoryAccess object destructors without introducing a circular dependency, so we need some level of indirection. Unfortunately, no class derived from User may have any virtual methods, because adding a virtual method would break User::getHungOffOperands(), which assumes that it can find the use list immediately prior to the User object. I've added a static_assert to the appropriate OperandTraits templates to help people avoid this trap. Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv Reviewed By: chandlerc Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D31261 llvm-svn: 303362
*	Revert 303174, 303176, and 303178	Matthew Simpson	2017-05-16	1	-2/+2
\| \| \| \| \| \|	These commits are breaking the bots. Reverting to investigate. llvm-svn: 303182
*	[LV] Avoid potentential division by zero when selecting IC	Matthew Simpson	2017-05-16	1	-2/+2
\| \| \| \|	llvm-svn: 303174
*	[SLP] Enable 64-bit wide vectorization on AArch64	Adam Nemet	2017-05-15	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116
*	[ValueTracking] Replace all uses of ComputeSignBit with computeKnownBits.	Craig Topper	2017-05-15	1	-4/+3
\| \| \| \| \| \| \| \|	This patch finishes off the conversion of ComputeSignBit to computeKnownBits. Differential Revision: https://reviews.llvm.org/D33166 llvm-svn: 303035
*	[LoopOptimizer][Fix]PR32859, PR24738	Simon Pilgrim	2017-05-13	1	-7/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Loop vectorizer pass introduced undef value while it is fixing output of LCSSA form. Here it is: before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ] and after this change we have: %e.0.ph = phi i32 [ 0, %for.inc.2.i ] %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ] Committed on behalf of @dtemirbulatov Differential Revision: https://reviews.llvm.org/D33055 llvm-svn: 302988
*	[KnownBits] Add bit counting methods to KnownBits struct and use them where ↵	Craig Topper	2017-05-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	possible This patch adds min/max population count, leading/trailing zero/one bit counting methods. The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give. Differential Revision: https://reviews.llvm.org/D32931 llvm-svn: 302925
*	[SLP] Emit optimization remarks	Adam Nemet	2017-05-11	1	-6/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense how far the tree is spanning I've include the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811
*	[LV] Refactor ILV.vectorize{Loop}() by introducing LVP.executePlan(); NFC	Ayal Zaks	2017-05-11	1	-80/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from ILV to LVP. These changes facilitate building VPlans and using them to generate code, following https://reviews.llvm.org/D28975 and its tentative breakdown. Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to improve clarity; it's contents remain intact. Differential Revision: https://reviews.llvm.org/D32200 llvm-svn: 302790
*	[AArch64] Consider widening instructions in cost calculations	Matthew Simpson	2017-05-09	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The AArch64 instruction set has a few "widening" instructions (e.g., uaddl, saddl, uaddw, etc.) that take one or more doubleword operands and produce quadword results. The operands are automatically sign- or zero-extended as appropriate. However, in LLVM IR, these extends are explicit. This patch updates TTI to consider these widening instructions as single operations whose cost is attached to the arithmetic instruction. It marks extends that are part of a widening operation "free" and applies a sub-target specified overhead (zero by default) to the arithmetic instructions. Differential Revision: https://reviews.llvm.org/D32706 llvm-svn: 302582
*	[LV] Fix insertion point for shuffle vectors in first order recurrence	Anna Thomas	2017-05-09	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In first order recurrence vectorization, when the previous value is a phi node, we need to set the insertion point to the first non-phi node. We can have the previous value being a phi node, due to the generation of new IVs as part of trunc optimization [1]. [1] https://reviews.llvm.org/rL294967 Reviewers: mssimpso, mkuper Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D32969 llvm-svn: 302532
*	Introduce experimental generic intrinsics for horizontal vector reductions.	Amara Emerson	2017-05-09	2	-65/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	- This change allows targets to opt-in to using them instead of the log2 shufflevector algorithm. - The SLP and Loop vectorizers have the common code to do shuffle reductions factored out into LoopUtils, and now have a unified interface for generating reductions regardless of the preference of the target. LoopUtils now uses TTI to determine what kind of reductions the target wants to handle. - For CodeGen, basic legalization support is added. Differential Revision: https://reviews.llvm.org/D30086 llvm-svn: 302514
*	Use right function in LoopVectorize.	Jonas Paulsson	2017-05-04	1	-1/+1
\| \| \| \| \| \| \| \|	- unsigned AS = getMemInstAlignment(I); + unsigned AS = getMemInstAddressSpace(I); Review: Hal Finkel llvm-svn: 302114
*	Rename WeakVH to WeakTrackingVH; NFC	Sanjoy Das	2017-05-01	1	-12/+15
\| \| \| \| \| \|	This relands r301424. llvm-svn: 301812
*	[ValueTracking] Introduce a KnownBits struct to wrap the two APInts for ↵	Craig Topper	2017-04-26	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	computeKnownBits This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit. Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch. I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases. Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero\|One) so we don't write it out everywhere. Maybe a method for (Zero\|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with. Differential Revision: https://reviews.llvm.org/D32376 llvm-svn: 301432
*	Reverts commit r301424, r301425 and r301426	Sanjoy Das	2017-04-26	1	-15/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Commits were: "Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts" "Add a new WeakVH value handle; NFC" "Rename WeakVH to WeakTrackingVH; NFC" The changes assumed pointers are 8 byte aligned on all architectures. llvm-svn: 301429
*	[LV] Handle external uses of floating-point induction variables	Matthew Simpson	2017-04-26	1	-2/+6
\| \| \| \| \| \| \|	Reference: https://bugs.llvm.org/show_bug.cgi?id=32758 Differential Revision: https://reviews.llvm.org/D32445 llvm-svn: 301428
*	Rename WeakVH to WeakTrackingVH; NFC	Sanjoy Das	2017-04-26	1	-13/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: I plan to use WeakVH to mean "nulls itself out on deletion, but does not track RAUW" in a subsequent commit. Reviewers: dblaikie, davide Reviewed By: davide Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D32266 llvm-svn: 301424
*	Skip bitcasts while looking for GEP in LoadStoreVectorizer	Stanislav Mekhanoshin	2017-04-25	1	-4/+19
\| \| \| \| \| \|	Differential Revisison: https://reviews.llvm.org/D32101 llvm-svn: 301343
*	[APInt] Use isSubsetOf, intersects, and bit counting methods to reduce ↵	Craig Topper	2017-04-25	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	temporary APInts This patch uses various APInt methods to reduce temporary APInt creation. This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking. I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time. Differential Revision: https://reviews.llvm.org/D32495 llvm-svn: 301338
*	[LV] Remove redundant basic block split	Gil Rapaport	2017-04-25	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \|	This patch is part of D28975's breakdown. Genreating the control-flow to guard predicated instructions modified to only use SplitBlockAndInsertIfThen() for producing the if-then construct. Differential Revision: https://reviews.llvm.org/D32224 llvm-svn: 301293
*	[LV] Model if-converted phi node costs	Matthew Simpson	2017-04-21	1	-2/+10
\| \| \| \| \| \| \| \| \|	Phi nodes in non-header blocks are converted to select instructions after if-conversion. This patch updates the cost model to account for the selects. Differential Revision: https://reviews.llvm.org/D31906 llvm-svn: 300980
*	[SLP vectorizer] Allow phi node reordering in tryToVectorizeList.	Easwaran Raman	2017-04-18	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In tryToVectorizeList, under a very limited circumstance (when entered from tryToVectorizePair), the values may be reordered (swapped) and the SLP tree is built with the new order. This extends that to the case when starting from phis in vectorizeChainsInBlock when there are exactly two phis. The textual order of phi nodes shouldn't really matter. Without this change, the loop body in the accompnaying test case is fully vectorized when we swap the orde of the phis but not with this order. While this doesn't solve the phi-ordering problem in a general way (for more than 2 phis), this is simple fix that piggybacks on an existing mechanism and is useful in cases like multiplying two complex numbers. Differential revision: https://reviews.llvm.org/D32065 llvm-svn: 300574
*	[LV] Cache block mask values	Gil Rapaport	2017-04-18	1	-7/+17
\| \| \| \| \| \| \| \| \| \| \| \|	This patch is part of D28975's breakdown. Add caching for block masks similar to the cache already used for edge masks, replacing generation per user with reusing the first generated value which dominates all uses. Differential Revision: https://reviews.llvm.org/D32054 llvm-svn: 300557
*	[LV] Remove implicit single basic block assumption	Gil Rapaport	2017-04-14	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is part of D28975's breakdown - no change in output intended. LV's code currently assumes the vectorized loop is a single basic block up until predicateInstructions() is called. This patch removes two manifestations of this assumption (loop phi incoming values, dominator tree update) by replacing the use of vectorLoopBody with the vectorized loop's latch/header. Differential Revision: https://reviews.llvm.org/D32040 llvm-svn: 300310
*	[LV] Fix the vector code generation for first order recurrence	Anna Thomas	2017-04-13	1	-12/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In first order recurrences where phi's are used outside the loop, we should generate an additional vector.extract of the second last element from the vectorized phi update. This is because we require the phi itself (which is the value at the second last iteration of the vector loop) and not the phi's update within the loop. Also fix the code gen when we just unroll, but don't vectorize. Fixes PR32396. Reviewers: mssimpso, mkuper, anemet Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D31979 llvm-svn: 300238
*	[LV] Refactor ILV to provide vectorizeInstruction(); NFC	Ayal Zaks	2017-04-13	1	-310/+302
\| \| \| \| \| \| \| \| \| \| \|	Refactoring InnerLoopVectorizer's vectorizeBlockInLoop() to provide vectorizeInstruction(). Aligning DeadInstructions with its only user. Facilitates driving the transformation by VPlan - follows https://reviews.llvm.org/D28975 and its tentative breakdown. Differential Revision: https://reviews.llvm.org/D31997 llvm-svn: 300183
*	[SLPVectorizer] Pass the right type argument to getCmpSelInstrCost()	Jonas Paulsson	2017-04-12	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	In getEntryCost(), make the scalar type for a compare instruction that of the operands, not i1. This is needed in order to call getCmpSelInstrCost() for a compare in a sensible way, the same way as the LoopVectorizer does. New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll Review: Matthew Simpson https://reviews.llvm.org/D31601 llvm-svn: 300061
*	[LoopVectorizer] Improve handling of branches during cost estimation.	Jonas Paulsson	2017-04-12	1	-1/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The cost for a branch after vectorization is very different depending on if the vectorizer will if-convert the block (branch is eliminated), or if scalarized and predicated blocks will be produced (branch duplicated before each block). There is also the case of remaining scalar branches, such as the back-edge branch. This patch handles these cases differently with TTI based cost estimates. Review: Matthew Simpson https://reviews.llvm.org/D31175 llvm-svn: 300058
*	[LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()	Jonas Paulsson	2017-04-12	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since SystemZ supports vector element load/store instructions, there is no need for extracts/inserts if a vector load/store gets scalarized. This patch lets Target specify that it supports such instructions by means of a new TTI hook that defaults to false. The use for this is in the LoopVectorizer getScalarizationOverhead() method, which will with this patch produce a smaller sum for a vector load/store on SystemZ. New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll Review: Adam Nemet https://reviews.llvm.org/D30680 llvm-svn: 300056
*	[SystemZ] TargetTransformInfo cost functions implemented.	Jonas Paulsson	2017-04-12	3	-23/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052
*	[LV] Avoid vectorizing first order recurrence when phi uses are outside loop	Anna Thomas	2017-04-11	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the vectorization of first order recurrence, we vectorize such that the last element in the vector will be the one extracted to pass into the scalar remainder loop. However, this is not true when there is a phi (other than the primary induction variable) is used outside the loop. In such a case, we need the value from the second last iteration (i.e. the phi value), not the last iteration (which would be the phi update). I've added a test case for this. Also see PR32396. A follow up patch would generate the correct code gen for such cases, and turn this vectorization on. Differential Revision: https://reviews.llvm.org/D31910 Reviewers: mssimpso llvm-svn: 299985
*	Reapply r298620: [LV] Vectorize GEPs	Matthew Simpson	2017-04-07	1	-79/+206
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch reapplies r298620. The original patch was reverted because of two issues. First, the patch exposed a bug in InstCombine that caused the Chromium builds to fail (PR32414). This issue was fixed in r299017. Second, the patch introduced a bug in the vectorizer's scalars analysis that caused test suite builds to fail on SystemZ. The scalars analysis was too aggressive and marked a memory instruction scalar, even though it was going to be vectorized. This issue has been fixed in the current patch and several new test cases for the scalars analysis have been added. llvm-svn: 299770
*	[LV] Transform truncations of non-primary induction variables	Matthew Simpson	2017-03-27	1	-11/+10
\| \| \| \| \| \| \| \| \| \| \| \|	The vectorizer tries to replace truncations of induction variables with new induction variables having the smaller type. After r295063, this optimization was applied to all integer induction variables, including non-primary ones. When optimizing the truncation of a non-primary induction variable, we still need to transform the new induction so that it has the correct start value. This should fix PR32419. Reference: https://bugs.llvm.org/show_bug.cgi?id=32419 llvm-svn: 298882
*	Revert r298620: [LV] Vectorize GEPs	Ivan Krasin	2017-03-24	1	-117/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes) LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413 Original change description: [LV] Vectorize GEPs This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Original Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298735
*	[LV] Vectorize GEPs	Matthew Simpson	2017-03-23	1	-67/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for vectorizing GEPs. Previously, we only generated vector GEPs on-demand when creating gather or scatter operations. All GEPs from the original loop were scalarized by default, and if a pointer was to be stored to memory, we would have to build up the pointer vector with insertelement instructions. With this patch, we will vectorize all GEPs that haven't already been marked for scalarization. The patch refines collectLoopScalars to more exactly identify the scalar GEPs. The function now more closely resembles collectLoopUniforms. And the patch moves vector GEP creation out of vectorizeMemoryInstruction and into the main vectorization loop. The vector GEPs needed for gather and scatter operations will have already been generated before vectoring the memory accesses. Differential Revision: https://reviews.llvm.org/D30710 llvm-svn: 298620
*	[LV] Delete unneeded scalar GEP creation code	Matthew Simpson	2017-03-23	1	-33/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The code for generating scalar base pointers in vectorizeMemoryInstruction is not needed. We currently scalarize all GEPs and maintain the scalarized values in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as that in scalarizeInstruction. The test cases that changed as a result of this patch changed because we were able to reuse the scalarized GEP that we previously generated instead of cloning a new one. Differential Revision: https://reviews.llvm.org/D30587 llvm-svn: 298615
*	Test commit access	Yi Kong	2017-03-21	1	-10/+10
\| \| \| \| \| \|	Remove some trailing whitespaces. llvm-svn: 298379
*	[LV] Refactor cross-iteration phi's back-patching; NFC	Gil Rapaport	2017-03-14	1	-232/+244
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch refactors the PHisToFix loop as follows: - The loop itself now resides in its own method. - The new method iterates on scalar-loop's header; the PHIsToFix map formerly propagated as an output parameter and filled during phi widening is removed. - The code handling reductions is moved into its own method, similar to the existing fixFirstOrderRecurrence(). Differential Revision: https://reviews.llvm.org/D30755 llvm-svn: 297740
*	[LV] Refactor Cost Model's selectVectorizationFactor(); NFC	Ayal Zaks	2017-03-14	1	-73/+132
\| \| \| \| \| \| \| \| \| \| \|	Refactoring Cost Model's selectVectorizationFactor() so that it handles only the selection of the best VF from a pre-computed range of candidate VF's, extracting early-exit criteria and the computation of a MaxVF upper-bound to other methods, all driven by a newly introduced LoopVectorizationPlanner. Differential Revision: https://reviews.llvm.org/D30653 llvm-svn: 297737
*	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved	Jonas Paulsson	2017-03-14	3	-26/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
*	[LV] Set memcheck metadata also for VF==1	Gil Rapaport	2017-03-13	1	-5/+1
\| \| \| \| \| \| \| \|	This commit is a follow-up on r297580. It fixes the FIXME added temporarily by that commit to keep the removal of Unroller's specialized version of scalarizeInstruction() an NFC. See https://reviews.llvm.org/D30715 for details. llvm-svn: 297610
*	[LV] A unified scalarizeInstruction() for Vectorizer and Unroller; NFC	Gil Rapaport	2017-03-12	1	-66/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's variant. OTOH Vectorizer's scalarizeInstruction() already supports the special case of VF==1 except for avoiding mask-bit extraction in that case. This patch removes Unroller's specialized version in favor of a unified method. The only functional difference between the two variants seems to be setting memcheck metadata for loads and stores only in Vectorizer's variant, which is a bug in Unroller. To keep this patch an NFC the unified method doesn't set memcheck metadata for VF==1. Differential Revision: https://reviews.llvm.org/D30715 llvm-svn: 297580
*	Test commit.	Ayal Zaks	2017-03-12	1	-0/+1
\| \| \| \|	llvm-svn: 297579