bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[LV] Fix predication for branches with matching true and false succs.	Florian Hahn	2020-02-06	1	-0/+79
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently due to the edge caching, we create wrong predicates for branches with matching true and false successors. We will cache the condition for the edge from the true successor, and then lookup the same edge (src and dst are the same) for the edge to the false successor. If both successors match, the condition should always be true. At the moment, we cannot really create constant VPValues, but we can just create a true condition as X \| !X. Later passes will clean that up. Fixes PR44488. Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D73079 (cherry picked from commit f14f2a856802e086662d919e2ead641718b27555)
*	[LV] Do not try to sink dead instructions.	Florian Hahn	2020-01-29	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dead instructions do not need to be sunk. Currently we try to record the recipes for them, but no recipes are emitted for them and there's nothing to sink. They can be removed from SinkAfter while marking them for recording. Fixes PR44634. Reviewers: rengolin, hsaito, fhahn, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D73423 (cherry picked from commit a911fef3dd79e0a04b241be7b476dde7e99744c4)
*	[ARM][MVE] MVE-I should not be disabled by -mfpu=none	Momchil Velikov	2020-01-09	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Architecturally, it's allowed to have MVE-I without an FPU, thus -mfpu=none should not disable MVE-I, or moves to/from FP-registers. This patch removes `+/-fpregs` from features unconditionally added to target feature list, depending on FPU and moves the logic to Clang driver, where the negative form (`-fpregs`) is conditionally added to the target features list for the cases of `-mfloat-abi=soft`, or `-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative form is added by the driver, the positive one is derived from other features in the backend. Differential Revision: https://reviews.llvm.org/D71843
*	[LV] Still vectorise when tail-folding can't find a primary induction variable	Sjoerd Meijer	2020-01-09	2	-0/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This addresses a vectorisation regression for tail-folded loops that are counting down, e.g. loops as simple as this: void foo(char *A, char *B, char *C, uint32_t N) { while (N > 0) { *C++ = *A++ + *B++; N--; } } These are loops that can be vectorised, but when tail-folding is requested, it can't find a primary induction variable, which we do need for predicating the loop. As a result, the loop isn't vectorised at all, even though it can be vectorised when tail-folding is not attempted. So, this adds a check for the primary induction variable where we decide how to lower the scalar epilogue. I.e., when there isn't a primary induction variable, a scalar epilogue loop is allowed (i.e. don't request tail-folding) so that vectorisation could still be triggered. Having this check for the primary induction variable makes sense anyway, and in addition, in a follow-up of this I will look into discovering earlier the primary induction variable for counting down loops, so that this can also be tail-folded. Differential revision: https://reviews.llvm.org/D72324
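For illustration, here is the counting-down loop from the message above next to an equivalent counting-up form. This is only a sketch: the function names `add_down` and `add_up` and the `const` qualifiers are assumptions made for the example, not part of the patch. The second form has an index that starts at 0 and steps by 1, which is the kind of primary induction variable the message says tail-folding needs.

```cpp
#include <cstdint>

// Counting-down form, as in the commit message: N is decremented and the
// pointers are bumped, so no variable starts at 0 and steps by 1.
void add_down(const char *A, const char *B, char *C, uint32_t N) {
  while (N > 0) {
    *C++ = *A++ + *B++;
    N--;
  }
}

// Equivalent counting-up form (hypothetical rewrite for comparison):
// I starts at 0 and increases by 1 on every iteration.
void add_up(const char *A, const char *B, char *C, uint32_t N) {
  for (uint32_t I = 0; I < N; ++I)
    C[I] = A[I] + B[I];
}
```

Both functions write the same values to `C` for the same inputs; only the shape of the loop differs.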
*	llc: Change behavior of -mcpu with existing attribute	Matt Arsenault	2020-01-07	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	Don't overwrite existing target-cpu attributes. I've often found the replacement behavior annoying, and this is inconsistent with how the fast math command line flags interact with the function attributes. Does not yet change target-features, since I think that should behave as a concatenation.
*	[PowerPC][LoopVectorize] Extend getRegisterClassForType to consider double ↵	Jinsong Ji	2020-01-06	1	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	and other floating point type In https://reviews.llvm.org/D67148, we use isFloatTy to test floating point type, otherwise we return GPRRC. So 'double' will be classified as GPRRC, which is not accurate. This patch covers other floating point types. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D71946
*	[PowerPC][LoopVectorize] Add tests for fp128 and fp16	Jinsong Ji	2020-01-03	1	-0/+58
\| \| \| \|	Add two tests to reg-usage.ll
*	[PowerPC][LoopVectorize]Add floating point reg usage test	Jinsong Ji	2019-12-27	1	-0/+91
\| \| \| \|	Copied two tests from x86 to test floating point reg usage.
*	Migrate function attribute "no-frame-pointer-elim"="false" to ↵	Fangrui Song	2019-12-24	6	-6/+6
\| \| \| \|	"frame-pointer"="none" as cleanups after D56351
*	Migrate function attribute "no-frame-pointer-elim-non-leaf" to ↵	Fangrui Song	2019-12-24	2	-2/+2
\| \| \| \|	"frame-pointer"="non-leaf" as cleanups after D56351
*	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵	Fangrui Song	2019-12-24	13	-16/+16
\| \| \| \|	as cleanups after D56351
*	[LV] Strip wrap flags from vectorized reductions	Ayal Zaks	2019-12-20	8	-21/+79
\| \| \| \| \| \| \| \| \| \| \| \|	A sequence of additions or multiplications that is known not to wrap may wrap if its order is changed (i.e., reassociated). Therefore when vectorizing integer sum or product reductions, their no-wrap flags need to be removed. Fixes PR43828 Patch by Denis Antrushin Differential Revision: https://reviews.llvm.org/D69563
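A small sketch of why reassociation can introduce wrapping, assuming a two-lane split as a stand-in for a vectorized reduction. The helper names are hypothetical and this is not the vectorizer's code; partial sums are tracked in 64 bits so the example can check whether a 32-bit signed sum would have left its range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// True if V is representable as a 32-bit signed integer.
static bool fitsInt32(int64_t V) {
  return V >= std::numeric_limits<int32_t>::min() &&
         V <= std::numeric_limits<int32_t>::max();
}

// In-order sum: true if no partial sum leaves the int32 range.
bool scalarSumStaysInRange(const std::vector<int32_t> &A) {
  int64_t Sum = 0;
  for (int32_t X : A) {
    Sum += X;
    if (!fitsInt32(Sum))
      return false;
  }
  return true;
}

// Reassociated sum with two lanes: lane 0 adds the even-indexed elements,
// lane 1 the odd-indexed ones, and the lanes are combined at the end.
// True if no per-lane partial sum and no final sum leaves the int32 range.
bool twoLaneSumStaysInRange(const std::vector<int32_t> &A) {
  assert(A.size() % 2 == 0 && "sketch assumes an even element count");
  int64_t Lane[2] = {0, 0};
  for (std::size_t I = 0; I < A.size(); ++I) {
    Lane[I % 2] += A[I];
    if (!fitsInt32(Lane[I % 2]))
      return false;
  }
  return fitsInt32(Lane[0] + Lane[1]);
}
```

For the input `{INT32_MAX, -1, 1, -1}` the in-order sum never leaves the 32-bit range, but lane 0 computes `INT32_MAX + 1`, which does. A no-wrap flag that held for the scalar loop would therefore be wrong on the per-lane adds.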
*	[PowerPC] Add missing legalization for vector BSWAP	Nemanja Ivanovic	2019-12-17	1	-0/+97
\| \| \| \| \| \| \| \|	We somehow missed doing this when we were working on Power9 exploitation. This just adds the missing legalization and cost for producing the vector intrinsics. Differential revision: https://reviews.llvm.org/D70436
*	[ARM] Add missing REQUIRES: asserts to test. NFC	David Green	2019-12-09	1	-0/+1
\|
*	[ARM] Enable MVE masked loads and stores	David Green	2019-12-09	4	-6/+6
\| \| \| \| \| \| \|	With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does. Differential Revision: https://reviews.llvm.org/D70968
*	[ARM] Teach the Arm cost model that a Shift can be folded into other ↵	David Green	2019-12-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966
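The rule described above can be sketched roughly as follows. Everything here is hypothetical (the `Op` enum, the `ShiftInfo` struct and both functions); it is not the signature of getArithmeticInstrCost, only an illustration of "a shift with a single user of a foldable kind costs nothing".

```cpp
// Hypothetical opcode set for the sketch; not LLVM's opcode enumeration.
enum class Op { Add, Sub, And, Or, Xor, ICmp, Mul, Shl };

// Hypothetical summary of a shift instruction's users.
struct ShiftInfo {
  unsigned NumUsers; // how many instructions use the shifted value
  Op UserOp;         // opcode of the user; only meaningful if NumUsers == 1
};

// IR-level operations the message lists as able to absorb a shifted operand.
bool canFoldShiftInto(Op User) {
  switch (User) {
  case Op::Add:
  case Op::Sub:
  case Op::And:
  case Op::Or:
  case Op::Xor:
  case Op::ICmp:
    return true;
  default:
    return false;
  }
}

// A shift with exactly one user of a foldable kind is treated as free;
// otherwise the caller-supplied base cost is kept.
unsigned shiftCost(const ShiftInfo &S, unsigned BaseCost) {
  if (S.NumUsers == 1 && canFoldShiftInto(S.UserOp))
    return 0;
  return BaseCost;
}
```

The single-user restriction mirrors the message: with more than one user the shift result still has to be materialised for at least one of them.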
*	[ARM] Additional tests and minor formatting. NFC	David Green	2019-12-09	1	-0/+86
\| \| \| \| \| \|	This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.
*	[ARM] Disable VLD4 under MVE	David Green	2019-12-08	2	-22/+109
\| \| \| \| \| \| \| \| \| \|	Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109
*	[LV] Pick correct BB as insert point when fixing PHI for FORs.	Florian Hahn	2019-12-07	1	-0/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we fail to pick the right insertion point when PreviousLastPart of a first-order-recurrence is a PHI node not in the LoopVectorBody. This can happen when PreviousLastPart is produced in a predicated block. In that case, we should pick the insertion point in the BB the PHI is in. Fixes PR44020. Reviewers: hsaito, fhahn, Ayal, dorit Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D71071
*	[LV] Scalar with predication must not be uniform	Ayal Zaks	2019-12-03	1	-0/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix PR40816: avoid considering scalar-with-predication instructions as also uniform-after-vectorization. Instructions identified as "scalar with predication" will be "vectorized" using a replicating region. If such instructions are also optimized as "uniform after vectorization", namely when only the first of VF lanes is used, such a replicating region becomes erroneous - only the first instance of the region can and should be formed. Fix such cases by not considering such instructions as "uniform after vectorization". Differential Revision: https://reviews.llvm.org/D70298
*	[InstCombine] Revert rL341831: relax one-use check in foldICmpAddConstant() ↵	Roman Lebedev	2019-12-02	2	-43/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(PR44100) rL341831 moved one-use check higher up, restricting a few folds that produced a single instruction from two instructions to the case where the inner instruction would go away. Original commit message: > InstCombine: move hasOneUse check to the top of foldICmpAddConstant > > There were two combines not covered by the check before now, > neither of which actually differed from normal in the benefit analysis. > > The most recent seems to be because it was just added at the top of the > function (naturally). The older is from way back in 2008 (r46687) > when we just didn't put those checks in so routinely, and has been > diligently maintained since. From the commit message alone, there doesn't seem to be a deeper motivation, or a deeper problem that it was trying to solve, other than 'fixing the wrong one-use check'. As i have briefly discussed in IRC with Tim, the original motivation can no longer be recovered, too much time has passed. However i believe that the original fold was doing the right thing, we should be performing such a transformation even if the inner `add` will not go away - that will still unchain the comparison from `add`, it will no longer need to wait for `add` to compute. Doing so doesn't seem to break any particular idioms, at least as far as i can see. References https://bugs.llvm.org/show_bug.cgi?id=44100
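To make "unchaining the comparison from `add`" concrete, here is a minimal sketch using an equality comparison on unsigned 32-bit values. The choice of equality is an assumption for the example; the message does not list which folds in foldICmpAddConstant are affected, and the function names are hypothetical.

```cpp
#include <cstdint>

// Chained form: the comparison consumes the result of the add, so it has
// to wait for the add to finish.
bool cmpAfterAdd(uint32_t X, uint32_t C1, uint32_t C2) {
  return static_cast<uint32_t>(X + C1) == C2;
}

// Unchained form: with constants C1 and C2, the difference C2 - C1 can be
// computed ahead of time, and the comparison reads X directly.
// Unsigned arithmetic wraps modulo 2^32, so both forms agree for all inputs.
bool cmpUnchained(uint32_t X, uint32_t C1, uint32_t C2) {
  return X == static_cast<uint32_t>(C2 - C1);
}
```

If the sum `X + C1` has other users, the add stays, but the comparison no longer depends on it.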
*	[IVDescriptors] Skip FOR where we have multiple sink points for now.	Florian Hahn	2019-11-28	1	-0/+30
\| \| \| \| \|	This fixes a crash with instructions where multiple operands are first-order-recurrences.
*	[x86] make SLM extract vector element more expensive than default	Sanjay Patel	2019-11-27	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607
*	Recommit f0c2a5a "[LV] Generalize conditions for sinking instrs for first ↵	Florian Hahn	2019-11-24	2	-0/+290
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	order recurrences." This version contains 2 fixes for reported issues: 1. Make sure we do not try to sink terminator instructions. 2. Make sure we bail out, if we try to sink an instruction that needs to stay in place for another recurrence. Original message: If the recurrence PHI node has a single user, we can sink any instruction without side effects, given that all users are dominated by the instruction computing the incoming value of the next iteration ('Previous'). We can sink instructions that may cause traps, because that only causes the trap to occur later, but not on any new paths. With the relaxed check, we also have to make sure that we do not have a direct cycle (meaning PHI user == 'Previous), which indicates a reduction relation, which potentially gets missed by ReductionDescriptor. As follow-ups, we can also sink stores, iff they do not alias with other instructions we move them across and we could also support sinking chains of instructions and multiple users of the PHI. Fixes PR43398. Reviewers: hsaito, dcaballe, Ayal, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D69228
*	[LV] PreferPredicateOverEpilog respecting option	Sjoerd Meijer	2019-11-21	1	-0/+18
\| \| \| \| \| \| \| \|	Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not to predicate the loop. Differential Revision: https://reviews.llvm.org/D70382
*	[ARM] MVE interleaving load and stores.	David Green	2019-11-19	1	-54/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore. The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases. We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though. Differential Revision: https://reviews.llvm.org/D69392
*	[ARM] Add and update a lot of VLDn tests. NFC	David Green	2019-11-19	1	-0/+695
\|
*	[ARM][MVE] tail-predication	Sjoerd Meijer	2019-11-15	1	-0/+26
\| \| \| \| \| \| \|	This is a follow up of d90804d, to also flag fcmp instructions as instructions that we do not support in tail-predicated vector loops. Differential Revision: https://reviews.llvm.org/D70295
*	[LV] PreferPredicateOverEpilog respecting predicate loop hint	Sjoerd Meijer	2019-11-14	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \|	The vectoriser queries TTI->preferPredicateOverEpilogue to determine if tail-folding is preferred for a loop, but it was not respecting loop hint 'predicate' that can disable this, which has now been added. This showed that we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0 (i.e. FK_Disabled) but this should have been FK_Undefined, which has been fixed. Differential Revision: https://reviews.llvm.org/D70125
*	[ARM][MVE] canTailPredicateLoop	Sjoerd Meijer	2019-11-13	1	-27/+592
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single block loops only. With this change, the vectoriser will now determine if tail-folding scalar remainder loops is possible/desired, which is the first step to generate MVE tail-predicated vector loops. This is disabled by default for now. I.e., this depends on option -disable-mve-tail-predication, which is off by default. I will follow up on this soon with a patch for the vectoriser to respect loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125. Differential Revision: https://reviews.llvm.org/D69845
*	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)	Gil Rapaport	2019-11-09	2	-0/+84
\| \| \| \| \| \| \| \| \|	This recommits 11ed1c0239fd51fd2f064311dc7725277ed0a994 (reverted in 9f08ce0d2197d4f163dfa4633eae2347ce8fc881 for failing an assert) with a fix: tryToWidenMemory() now first checks if the widening decision is to interleave, thus maintaining previous behavior where tryToInterleaveMemory() was called first, giving priority to interleave decisions over widening/scalarization. This commit adds the test case that exposed this bug as a LIT.
*	Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations ↵	Gil Rapaport	2019-11-08	1	-35/+0
\| \| \| \| \| \|	(NFCI)" This reverts commit 11ed1c0239fd51fd2f064311dc7725277ed0a994 - causes an assert failure.
*	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)	Gil Rapaport	2019-11-08	1	-0/+35
\| \| \| \| \| \| \| \|	This recommits 100e797adb433724a17c9b42b6533cd634cb796b (reverted in 009e032634b3bd7fc32071ac2344b12142286477 for failing an assert). While the root cause was independently reverted in eaff3004019f97c64c88ab76da6b25106b659b30, this commit includes a LIT to make sure IVDescriptor's SinkAfter logic does not try to sink branch instructions.
*	Revert f0c2a5a "[LV] Generalize conditions for sinking instrs for first ↵	Hans Wennborg	2019-11-07	1	-245/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	order recurrences." It broke Chromium, causing "Instruction does not dominate all uses!" errors. See https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a reproducer. > If the recurrence PHI node has a single user, we can sink any > instruction without side effects, given that all users are dominated by > the instruction computing the incoming value of the next iteration > ('Previous'). We can sink instructions that may cause traps, because > that only causes the trap to occur later, but not on any new paths. > > With the relaxed check, we also have to make sure that we do not have a > direct cycle (meaning PHI user == 'Previous), which indicates a > reduction relation, which potentially gets missed by > ReductionDescriptor. > > As follow-ups, we can also sink stores, iff they do not alias with > other instructions we move them across and we could also support sinking > chains of instructions and multiple users of the PHI. > > Fixes PR43398. > > Reviewers: hsaito, dcaballe, Ayal, rengolin > > Reviewed By: Ayal > > Differential Revision: https://reviews.llvm.org/D69228
*	[TTI][LV] preferPredicateOverEpilogue	Sjoerd Meijer	2019-11-06	2	-2/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with: - a complete MVE implementation, see D69845. - a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint. Differential Revision: https://reviews.llvm.org/D69040
*	[LV] Generalize conditions for sinking instrs for first order recurrences.	Florian Hahn	2019-11-02	1	-0/+245
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the recurrence PHI node has a single user, we can sink any instruction without side effects, given that all users are dominated by the instruction computing the incoming value of the next iteration ('Previous'). We can sink instructions that may cause traps, because that only causes the trap to occur later, but not on any new paths. With the relaxed check, we also have to make sure that we do not have a direct cycle (meaning PHI user == 'Previous), which indicates a reduction relation, which potentially gets missed by ReductionDescriptor. As follow-ups, we can also sink stores, iff they do not alias with other instructions we move them across and we could also support sinking chains of instructions and multiple users of the PHI. Fixes PR43398. Reviewers: hsaito, dcaballe, Ayal, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D69228
*	[LV] Move interleave_short_tc.ll into the X86 directory to hopefully make ↵	Craig Topper	2019-11-01	1	-0/+0
\| \| \| \|	fix non-X86 bots.
*	[LV] Add test case that was supposed to go with D67948	Craig Topper	2019-10-31	1	-0/+59
\| \| \| \|	I forgot to git add it when I committed for Evgeniy.
*	[ConstantFold] Fold extractelement of getelementptr	Jay Foad	2019-10-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Getelementptr has vector type if any of its operands are vectors (the scalar operands being implicitly broadcast to all vector elements). Extractelement applied to a vector getelementptr can be folded by applying the extractelement in turn to all of the vector operands. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69379
*	[LV] Interleaving should not exceed estimated loop trip count.	Craig Topper	2019-10-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Currently we may do interleaving by more than the estimated trip count coming from the profile or the computed maximum trip count. The solution is to use the "best known" trip count instead of the exact one in interleaving analysis. Patch by Evgeniy Brevnov. Differential Revision: https://reviews.llvm.org/D67948
*	[DAGCombine][ARM] Enable extending masked loads	Sam Parker	2019-10-17	1	-3/+139
\| \| \| \| \| \| \| \| \| \| \|	Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085
*	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure ↵	Zi Xuan Wu	2019-10-12	3	-10/+165
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate register pressure separately for different register classes (especially for scalar types: float should not be counted together with int), so it's not accurate. Specifically, it causes too much interleaving/unrolling, resulting in too many register spills in the loop body and hurting performance. So we need to classify the register classes at the IR level; importantly, these are abstract register classes, not the target register classes the backend provides in the td file. They are used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, on the POWER target, the register number is special when VSX is enabled. When VSX is enabled, the number of int scalar registers is 32 (GPR) and float is 64 (VSR), while int and float vector registers are both 64 (VSR). So there should be 2 kinds of register class when VSX is enabled, and 3 kinds of register class when VSX is NOT enabled. On the POWER target, it makes a big (+~30%) performance improvement in one specific benchmark (503.bwaves_r) of spec2017, with no other obvious degradations. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634
*	[LV] Emitting SCEV checks with OptForSize	Sjoerd Meijer	2019-10-09	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When optimising for size and SCEV runtime checks need to be emitted to check overflow behaviour, the loop vectorizer can run into this assert: LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks( llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize() && "Cannot SCEV check stride or overflow when opt We should not generate predicates while optimising for size because code will be generated for predicates such as these SCEV overflow runtime checks. This should fix PR43371. Differential Revision: https://reviews.llvm.org/D68082 llvm-svn: 374166
*	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure ↵	Jinsong Ji	2019-10-08	3	-215/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a. This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091
*	[NFC] Add REQUIRES for r374017 in testcase	Zi Xuan Wu	2019-10-08	1	-0/+1
\| \| \| \|	llvm-svn: 374027
*	[LoopVectorize][PowerPC] Estimate int and float register pressure separately ↵	Zi Xuan Wu	2019-10-08	3	-10/+214
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate register pressure separately for different register classes (especially for scalar types: float should not be counted together with int), so it's not accurate. Specifically, it causes too much interleaving/unrolling, resulting in too many register spills in the loop body and hurting performance. So we need to classify the register classes at the IR level; importantly, these are abstract register classes, not the target register classes the backend provides in the td file. They are used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, on the POWER target, the register number is special when VSX is enabled. When VSX is enabled, the number of int scalar registers is 32 (GPR) and float is 64 (VSR), while int and float vector registers are both 64 (VSR). So there should be 2 kinds of register class when VSX is enabled, and 3 kinds of register class when VSX is NOT enabled. On the POWER target, it makes a big (+~30%) performance improvement in one specific benchmark (503.bwaves_r) of spec2017, with no other obvious degradations. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017
*	[LoopVectorize] add test that asserted after cost model change (PR43582); NFC	Sanjay Patel	2019-10-07	1	-0/+127
\| \| \| \|	llvm-svn: 373913
*	[LV] Forced vectorization with runtime checks and OptForSize	Sjoerd Meijer	2019-09-24	1	-1/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When vectorisation is forced with a pragma, we optimise for min size, and we need to emit runtime memory checks, then allow this code growth and don't run into an assert like we currently do. This is the result of D65197 and D66803, and was a use-case not really considered before. If this now happens, we emit an optimisation remark warning about the code-size expansion, which can be avoided by not forcing vectorisation or possibly source-code modifications. Differential Revision: https://reviews.llvm.org/D67764 llvm-svn: 372694
*	[LV] Add ARM MVE tail-folding tests	Sjoerd Meijer	2019-09-16	1	-0/+89
\| \| \| \| \| \| \| \|	Now that the vectorizer can do tail-folding (rL367592), and the ARM backend understands MVE masked loads/stores (rL371932), it's time to add the MVE tail-folding equivalent of the X86 tests that I added. llvm-svn: 371996
*	[ARM] Masked loads and stores	David Green	2019-09-15	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Masked loads and stores fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. Differential Revision: https://reviews.llvm.org/D67186 llvm-svn: 371932
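The passthru handling described above can be modelled in scalar code roughly like this. It is a sketch only: the four-lane width, the type aliases and the function names are assumptions for the example, not the backend's lowering code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t Lanes = 4; // assumed lane count for the sketch
using Vec = std::array<int32_t, Lanes>;
using Mask = std::array<bool, Lanes>;

// Zeroing masked load, modelling the behaviour the message attributes to
// MVE: active lanes read memory, inactive lanes become 0.
Vec zeroingMaskedLoad(const int32_t *Ptr, const Mask &M) {
  Vec R{};
  for (std::size_t I = 0; I < Lanes; ++I)
    if (M[I])
      R[I] = Ptr[I];
  return R;
}

// Per-lane select: take T where the mask is set, F elsewhere.
Vec laneSelect(const Mask &M, const Vec &T, const Vec &F) {
  Vec R{};
  for (std::size_t I = 0; I < Lanes; ++I)
    R[I] = M[I] ? T[I] : F[I];
  return R;
}

// Passthru semantics built from the two pieces above: the select pulls the
// passthru values into the lanes that the load zeroed.
Vec maskedLoadWithPassthru(const int32_t *Ptr, const Mask &M,
                           const Vec &PassThru) {
  return laneSelect(M, zeroingMaskedLoad(Ptr, M), PassThru);
}
```

When the passthru is all zeros the select is redundant, which matches the message's note that only a passthru other than 0 (or undef) needs the extra instruction.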