bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"	Matthew Simpson	2017-02-14	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \|	This reapplies commit r294967 with a fix for the execution time regressions caught by the clang-cmake-aarch64-quick bot. We now extend the truncate optimization to non-primary induction variables only if the truncate isn't already free. Differential Revision: https://reviews.llvm.org/D29847 llvm-svn: 295063
*	Revert "[LoopVectorize] Added address space check when analysing interleaved ↵	Karl-Johan Karlsson	2017-02-14	1	-37/+0
\| \| \| \| \| \| \| \| \|	accesses" This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed. I'm reverting to investigate. llvm-svn: 295042
*	[LoopVectorize] Added address space check when analysing interleaved accesses	Karl-Johan Karlsson	2017-02-14	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prevent memory objects of different address spaces to be part of the same load/store groups when analysing interleaved accesses. This is fixing pr31900. Reviewers: HaoLiu, mssimpso, mkuper Reviewed By: mssimpso, mkuper Subscribers: llvm-commits, efriedma, mzolotukhin Differential Revision: https://reviews.llvm.org/D29717 llvm-svn: 295038
*	[Loop Vectorizer] Cost-based decision for vectorization form of memory ↵	Elena Demikhovsky	2017-02-08	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instruction. Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction. The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution. This patch includes the following changes: - Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision. - Arrays of Uniform and Scalar values are moved from Legality to Cost Model. - Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form. - Vectorization of memory instruction is performed according to the CM decision. Differential Revision: https://reviews.llvm.org/D27919 llvm-svn: 294503
*	[LV] Add new ARM/AArch64 interleaved access cost model tests (NFC)	Matthew Simpson	2017-02-07	1	-0/+89
\| \| \| \|	llvm-svn: 294342
*	[LV] Simplify ARM/AArch64 interleaved access cost model tests (NFC)	Matthew Simpson	2017-02-07	1	-61/+50
\| \| \| \| \| \| \| \|	This patch removes unneeded instructions from the existing ARM/AArch64 interleaved access cost model tests. I'll be adding a similar set of tests in a follow-on patch to increase coverage. llvm-svn: 294336
*	Reapply "[LV] Enable vectorization of loops with conditional stores by default"	Matthew Simpson	2016-12-16	1	-1/+1
\| \| \| \| \| \| \| \|	This patch reapplies r289863. The original patch was reverted because it exposed a bug causing the loop vectorizer to crash in the Python runtime on PPC. The underlying issue was fixed with r289958. llvm-svn: 289975
*	Revert r289863: [LV] Enable vectorization of loops with conditional	Chandler Carruth	2016-12-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	stores by default This uncovers a crasher in the loop vectorizer on PPC when building the Python runtime. I'll send the testcase to the review thread for the original commit. llvm-svn: 289934
*	[LV] Enable vectorization of loops with conditional stores by default	Matthew Simpson	2016-12-15	1	-1/+1
\| \| \| \| \| \| \| \| \|	This patch sets the default value of the "-enable-cond-stores-vec" command line option to "true". Differential Revision: https://reviews.llvm.org/D27814 llvm-svn: 289863
*	[LV] Scalarize operands of predicated instructions	Matthew Simpson	2016-12-07	2	-2/+209
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909
*	Second attempt at r285517.	Dorit Nuzman	2016-10-31	1	-1/+1
\| \| \| \|	llvm-svn: 285568
*	Revert r285517 due to build failures.	Dorit Nuzman	2016-10-30	1	-1/+1
\| \| \| \|	llvm-svn: 285518
*	[LoopVectorize] Make interleaved-accesses analysis less conservative about	Dorit Nuzman	2016-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	possible pointer-wrap-around concerns, in some cases. Before this patch, collectConstStridedAccesses (part of interleaved-accesses analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when examining all candidate pointers. This is too conservative. Instead, this patch makes collectConstStridedAccesses use an optimistic approach, calling getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the candidate interleave groups have been formed, revisits the pointer-wrapping analysis but only where it matters: namely, in groups that have gaps, and where the gaps are not at the very end of the group (in which case the loop is peeled). This second time getPtrStride is called with [Assume=false, ShouldCheckWrap=true], but this could further be improved to using Assume=true, once we also add the logic to track that we are not going to meet the scev runtime checks threshold. Differential Revision: https://reviews.llvm.org/D25276 llvm-svn: 285517
*	[LV] Correct misleading comments in test (NFC)	Matthew Simpson	2016-10-28	1	-9/+5
\| \| \| \|	llvm-svn: 285402
*	[LV] Account for predicated stores in instruction costs	Matthew Simpson	2016-10-13	1	-1/+39
\| \| \| \| \| \| \|	This patch ensures that we scale the estimated cost of predicated stores by block probability. This is a follow-on patch for r284123. llvm-svn: 284126
*	[LV] Avoid rounding errors for predicated instruction costs	Matthew Simpson	2016-10-13	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \|	This patch modifies the cost calculation of predicated instructions (div and rem) to avoid the accumulation of rounding errors due to multiple truncating integer divisions. The calculation for predicated stores will be addressed in a follow-on patch since we currently don't scale the cost of predicated stores by block probability. Differential Revision: https://reviews.llvm.org/D25333 llvm-svn: 284123
*	[LV] Move insertelement sequence after scalar definitions	Matthew Simpson	2016-08-29	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	After r279649 when getting a vector value from VectorLoopValueMap, we create an insertelement sequence on-demand if the value has been scalarized instead of vectorized. We previously inserted this insertelement sequence before the value's first vector user. However, this insert location is problematic if that user is the phi node of a first-order recurrence. With this patch, we move the insertelement sequence after the last scalar instruction we created when scalarizing the value. Thus, the value's vector definition in the new loop will immediately follow its scalar definitions. This should fix PR30183. Reference: https://llvm.org/bugs/show_bug.cgi?id=30183 llvm-svn: 280001
*	[LV] Unify vector and scalar maps	Matthew Simpson	2016-08-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop are maintained in VectorLoopValueMap. The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions. If a scalarized value is used by a scalar instruction, the scalar value is used directly. However, if the scalarized value is needed by a vector instruction, we generate the needed insertelement instructions on-demand. A common idiom in several locations in the code (including the scalarization code), is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose along side getVectorValue as an interface into VectorLoopValueMap. These functions work together to return the requested values if they're available or to produce them if they're not. The mapping has also be made less permissive. Entries can be added to VectorLoopValue map with the new initVector and initScalar functions. getVectorValue has been modified to return a constant reference to the mapped entries. There's no real functional change with this patch; however, in some cases we will generate slightly different code. For example, instead of an insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes. Differential Revision: https://reviews.llvm.org/D23169 llvm-svn: 279649
*	Reapply "[TTI] Refine default cost for interleaved load groups with gaps"	Matthew Simpson	2016-06-10	1	-0/+42
\| \| \| \| \| \| \| \|	This reapplies commit r272385 with a fix. The build was failing when compiled with gcc, but not with clang. With the fix, we now get the data layout from the current TTI implementation, which will hopefully solve the issue. llvm-svn: 272395
*	Revert "[TTI] Refine default cost for interleaved load groups with gaps"	Matthew Simpson	2016-06-10	1	-42/+0
\| \| \| \| \| \| \|	This reverts commit r272385. This commit broke the build. I'm temporarily reverting to investigate. llvm-svn: 272391
*	[TTI] Refine default cost for interleaved load groups with gaps	Matthew Simpson	2016-06-10	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \|	This patch refines the default cost for interleaved load groups having gaps. If a load group has gaps, the legalized instructions corresponding to the unused elements will be dead. Thus, we don't need to account for them in the cost model. Instead, we only need to account for the fraction of legalized loads that will actually be used. Differential Revision: http://reviews.llvm.org/D20873 llvm-svn: 272385
*	[LAA] Rename forwarding conflict detection option (NFC)	Matthew Simpson	2016-05-16	1	-1/+1
\| \| \| \| \| \| \|	This patch renames the option enabling the store-to-load forwarding conflict detection optimization. This change was requested in the review of D20241. llvm-svn: 269668
*	[LV] Ensure safe VF for loops with interleaved accesses	Matthew Simpson	2016-05-16	1	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \|	The selection of the vectorization factor currently doesn't consider interleaved accesses. The vectorization factor is based on the maximum safe dependence distance computed by LAA. However, for loops with interleaved groups, we should instead base the vectorization factor on the maximum safe dependence distance divided by the maximum interleave factor of all the interleaved groups. Interleaved accesses not in a group will be scalarized. Differential Revision: http://reviews.llvm.org/D20241 llvm-svn: 269659
*	Revert "[VectorUtils] Query number of sign bits to allow more truncations"	James Molloy	2016-05-10	1	-36/+0
\| \| \| \| \| \| \| \|	This was a fairly simple patch but on closer inspection was seriously flawed and caused PR27690. This reverts commit r268921. llvm-svn: 269051
*	[VectorUtils] Query number of sign bits to allow more truncations	James Molloy	2016-05-09	1	-0/+36
\| \| \| \| \| \|	When deciding if a vector calculation can be done in a smaller bitwidth, use sign bit information from ValueTracking to add more information and allow more truncations. llvm-svn: 268921
*	[LoopUtils, LV] Fix PR27246 (first-order recurrences)	Matthew Simpson	2016-04-11	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \|	This patch ensures that when we detect first-order recurrences, we reject a phi node if its previous value is also a phi node. During vectorization the initial and previous values of the recurrence are shuffled together to create the value for the current iteration. However, phi nodes are not widened like other instructions. This fixes PR27246. Differential Revision: http://reviews.llvm.org/D18971 llvm-svn: 265983
*	Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA ↵	Silviu Baranga	2016-04-08	1	-0/+166
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and LV This re-commits r265535 which was reverted in r265541 because it broke the windows bots. The problem was that we had a PointerIntPair which took a pointer to a struct allocated with new. The problem was that new doesn't provide sufficient alignment guarantees. This pattern was already present before r265535 and it just happened to work. To fix this, we now separate the PointerToIntPair from the ExitNotTakenInfo struct into a pointer and a bool. Original commit message: Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265786
*	Revert r265535 until we know how we can fix the bots	Silviu Baranga	2016-04-06	1	-166/+0
\| \| \| \|	llvm-svn: 265541
*	[SCEV] Introduce a guarded backedge taken count and use it in LAA and LV	Silviu Baranga	2016-04-06	1	-0/+166
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265535
*	[VectorUtils] Don't try and truncate PHIs to a smaller bitwidth	James Molloy	2016-03-30	1	-0/+58
\| \| \| \| \| \| \| \| \| \|	We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means. However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018. This fixes PR17018. llvm-svn: 264852
*	[LoopUtils, LV] Fix PR26734	Matthew Simpson	2016-03-03	1	-0/+49
\| \| \| \| \| \| \| \|	The vectorization of first-order recurrences (r261346) caused PR26734. When detecting these recurrences, we need to ensure that the previous value is actually defined inside the loop. This patch includes the fix and test case. llvm-svn: 262624
*	[LV] Vectorize first-order recurrences	Matthew Simpson	2016-02-19	1	-0/+209
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops. In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction. Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org> Differential Revision: http://reviews.llvm.org/D16197 llvm-svn: 261346
*	[LV] Add support for insertelt/extractelt processing during type truncation	Silviu Baranga	2016-02-15	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 llvm-svn: 260893
*	[DemandedBits] Revert r249687 due to PR26071	James Molloy	2016-02-03	1	-34/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This regresses a test in LoopVectorize, so I'll need to go away and think about how to solve this in a way that isn't broken. From the writeup in PR26071: What's happening is that ComputeKnownZeroes is telling us that all bits except the LSB are zero. We're then deciding that only the LSB needs to be demanded from the icmp's inputs. This is where we're wrong - we're assuming that after simplification the bits that were known zero will continue to be known zero. But they're not - during trivialization the upper bits get changed (because an XOR isn't shrunk), so the icmp fails. The fault is in demandedbits - its contract does clearly state that a non-demanded bit may either be zero or one. llvm-svn: 259649
*	[LoopVectorize] Use MapVector rather than DenseMap for MinBWs.	Charlie Turner	2015-11-26	1	-0/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The order in which instructions are truncated in truncateToMinimalBitwidths effects code generation. Switch to a map with a determinisic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting. Many thanks to David Blaikie for making me aware of MapVector! Fixes PR25490. Differential Revision: http://reviews.llvm.org/D14981 llvm-svn: 254179
*	[LoopVectorize] Address post-commit feedback on r250032	James Molloy	2015-11-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Implemented as many of Michael's suggestions as were possible: * clang-format the added code while it is still fresh. * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible. * Reduce the pass list on loop-vectorization-factors.ll. * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I]. llvm-svn: 252469
*	[LoopVectorize] Shrink integer operations into the smallest type possible	James Molloy	2015-10-12	1	-0/+243
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int type (e.g. i32) whenever arithmetic is performed on them. For targets with native i8 or i16 operations, usually InstCombine can shrink the arithmetic type down again. However InstCombine refuses to create illegal types, so for targets without i8 or i16 registers, the lengthening and shrinking remains. Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when their scalar equivalents do not, so during vectorization it is important to remove these lengthens and truncates when deciding the profitability of vectorization. The algorithm this uses starts at truncs and icmps, trawling their use-def chains until they terminate or instructions outside the loop are found (or unsafe instructions like inttoptr casts are found). If the use-def chains starting from different root instructions (truncs/icmps) meet, they are unioned. The demanded bits of each node in the graph are ORed together to form an overall mask of the demanded bits in the entire graph. The minimum bitwidth that graph can be truncated to is the bitwidth minus the number of leading zeroes in the overall mask. The intention is that this algorithm should "first do no harm", so it will never insert extra cast instructions. This is why the use-def graphs are unioned, so that subgraphs with different minimum bitwidths do not need casts inserted between them. This algorithm works hard to reduce compile time impact. DemandedBits are only queried if there are extends of illegal types and if a truncate to an illegal type is seen. In the general case, this results in a simple linear scan of the instructions in the loop. No non-noise compile time impact was seen on a clang bootstrap build. llvm-svn: 250032
*	[LV] Relax Small Size Reduction Type Requirement	Matthew Simpson	2015-09-10	1	-3/+66
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch enables small size reductions in which the source types are smaller than the reduction type (e.g., computing an i16 sum from the values in an i8 array). The previous behavior was to only allow small size reductions if the source types and reduction type were the same. The change accounts for the fact that the existing sign- and zero-extend instructions in these cases should still be included in the cost model. Differential Revision: http://reviews.llvm.org/D12770 llvm-svn: 247337
*	[AArch64] Turn on by default interleaved access vectorization	Silviu Baranga	2015-09-01	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change turns on by default interleaved access vectorization for AArch64. We also clean up some tests which were spedifically enabling this behaviour. Reviewers: rengolin Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12149 llvm-svn: 246542
*	[LoopVectorize] Move test from r246149 into a target-specific folder to ↵	Chad Rosier	2015-08-27	1	-0/+128
\| \| \| \| \| \|	appease bots. llvm-svn: 246154
*	The tests added in r243270 require asserts to be enabled	Silviu Baranga	2015-07-27	1	-1/+1
\| \| \| \|	llvm-svn: 243274
*	Fix the tests added in r243270. Use 2>&1 instead of \|&	Silviu Baranga	2015-07-27	1	-1/+1
\| \| \| \|	llvm-svn: 243273
*	[ARM/AArch64] Fix cost model for interleaved accesses	Silviu Baranga	2015-07-27	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fix the cost of interleaved accesses for ARM/AArch64. We were calling getTypeAllocSize and using it to check the number of bits, when we should have called getTypeAllocSizeInBits instead. This would pottentially cause the vectorizer to generate loads/stores and shuffles which cannot be matched with an interleaved access instruction. No performance changes are expected for now since matching/generating interleaved accesses is still disabled by default. Reviewers: rengolin Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D11524 llvm-svn: 243270
*	[LoopVectorize] Teach Loop Vectorizor about interleaved memory accesses.	Hao Liu	2015-06-08	1	-20/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector. E.g. for (i = 0; i < N; i+=2) { a = A[i]; // load of even element b = A[i+1]; // load of odd element ... // operations on a, b, c, d A[i] = c; // store of even element A[i+1] = d; // store of odd element } The loads of even and odd elements are identified as an interleave load group, which will be transfered into vectorized IRs like: %wide.vec = load <8 x i32>, <8 x i32>* %ptr %vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6> %vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7> The stores of even and odd elements are identified as an interleave store group, which will be transfered into vectorized IRs like: %interleaved.vec = shufflevector <4 x i32> %vec.even, %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7> store <8 x i32> %interleaved.vec, <8 x i32>* %ptr This optimization is currently disabled by defaut. To try it by adding '-enable-interleaved-mem-accesses=true'. llvm-svn: 239291
*	[opaque pointer type] Add textual IR support for explicit type parameter to ↵	David Blaikie	2015-02-27	5	-31/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	load instruction Essentially the same as the GEP change in r230786. A similar migration script can be used to update test cases, though a few more test case improvements/changes were required this time around: (r229269-r229278) import fileinput import sys import re pat = re.compile(r"((?:=\|:\|^)\sload (?:atomic )?(?:volatile )?(.?))(\| addrspace$\d+$ )\($\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$)") for line in sys.stdin: sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line)) Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7649 llvm-svn: 230794
*	[opaque pointer type] Add textual IR support for explicit type parameter to ↵	David Blaikie	2015-02-27	5	-19/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getelementptr instruction One of several parallel first steps to remove the target type of pointers, replacing them with a single opaque pointer type. This adds an explicit type parameter to the gep instruction so that when the first parameter becomes an opaque pointer type, the type to gep through is still available to the instructions. * This doesn't modify gep operators, only instructions (operators will be handled separately) * Textual IR changes only. Bitcode (including upgrade) and changing the in-memory representation will be in separate changes. * geps of vectors are transformed as: getelementptr <4 x float> %x, ... ->getelementptr float, <4 x float> %x, ... Then, once the opaque pointer type is introduced, this will ultimately look like: getelementptr float, <4 x ptr> %x with the unambiguous interpretation that it is a vector of pointers to float. * address spaces remain on the pointer, not the type: getelementptr float addrspace(1)* %x ->getelementptr float, float addrspace(1)* %x Then, eventually: getelementptr float, ptr addrspace(1) %x Importantly, the massive amount of test case churn has been automated by same crappy python code. I had to manually update a few test cases that wouldn't fit the script's model (r228970,r229196,r229197,r229198). The python script just massages stdin and writes the result to stdout, I then wrapped that in a shell script to handle replacing files, then using the usual find+xargs to migrate all the files. update.py: import fileinput import sys import re ibrep = re.compile(r"(^.?[^%\w]getelementptr inbounds )(((?:<\d x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") normrep = re.compile( r"(^.?[^%\w]getelementptr )(((?:<\d* x )?)(.?)(\| addrspace$\d$) \(\|>)(?:$\| (?:%\|@\|null\|undef\|blockaddress\|getelementptr\|addrspacecast\|bitcast\|inttoptr\|\[\[[a-zA-Z]\|\{\{).$))") def conv(match, line): if not match: return line line = match.groups()[0] if len(match.groups()[5]) == 0: line += match.groups()[2] line += match.groups()[3] line += ", " line += match.groups()[1] line += "\n" return line for line in sys.stdin: if line.find("getelementptr ") == line.find("getelementptr inbounds"): if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("): line = conv(re.match(ibrep, line), line) elif line.find("getelementptr ") != line.find("getelementptr ("): line = conv(re.match(normrep, line), line) sys.stdout.write(line) apply.sh: for name in "$@" do python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name" rm -f "$name.tmp" done The actual commands: From llvm/src: find test/ -name .ll \| xargs ./apply.sh From llvm/src/tools/clang: find test/ -name .mm -o -name .m -o -name .cpp -o -name .c \| xargs -I '{}' ../../apply.sh "{}" From llvm/src/tools/polly: find test/ -name *.ll \| xargs ./apply.sh After that, check-all (with llvm, clang, clang-tools-extra, lld, compiler-rt, and polly all checked out). The extra 'rm' in the apply.sh script is due to a few files in clang's test suite using interesting unicode stuff that my python script was throwing exceptions on. None of those files needed to be migrated, so it seemed sufficient to ignore those cases. Reviewers: rafael, dexonsmith, grosser Differential Revision: http://reviews.llvm.org/D7636 llvm-svn: 230786
*	Move the target specific test case arbitrary-induction-step.ll to ↵	Hao Liu	2015-01-30	1	-0/+150
\| \| \| \| \| \|	test/Transforms/LoopVectorize/AArch64 folder. llvm-svn: 227561
*	Addition to r216371 (SLP and Loop Vectorization) and r218607 where	Suyog Sarda	2014-11-11	1	-0/+31
\| \| \| \| \| \| \| \| \| \|	cost model for signed division by power of 2 was improved for AArch64. The revision r218607 missed test case for Loop Vectorization. Adding it in this revision. Differential Revision: http://reviews.llvm.org/D6181 llvm-svn: 221674
*	Reduce verbiage of lit.local.cfg files	Alp Toker	2014-06-09	1	-2/+1
\| \| \| \| \| \|	We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496
*	Use AArch64 instead of now removed ARM64 in test configs	Alexey Samsonov	2014-06-05	1	-1/+1
\| \| \| \|	llvm-svn: 210229