bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[ARM][MVE] MVE-I should not be disabled by -mfpu=none	Momchil Velikov	2020-01-09	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Architecturally, it's allowed to have MVE-I without an FPU, thus -mfpu=none should not disable MVE-I, or moves to/from FP-registers. This patch removes `+/-fpregs` from features unconditionally added to target feature list, depending on FPU and moves the logic to Clang driver, where the negative form (`-fpregs`) is conditionally added to the target features list for the cases of `-mfloat-abi=soft`, or `-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative form is added by the driver, the positive one is derived from other features in the backend. Differential Revision: https://reviews.llvm.org/D71843
*	[LV] Still vectorise when tail-folding can't find a primary inducation variable	Sjoerd Meijer	2020-01-09	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This addresses a vectorisation regression for tail-folded loops that are counting down, e.g. loops as simple as this: void foo(char A, char B, char C, uint32_t N) { while (N > 0) { C++ = A++ + B++; N--; } } These are loops that can be vectorised, but when tail-folding is requested, it can't find a primary induction variable which we do need for predicating the loop. As a result, the loop isn't vectorised at all, which it is able to do when tail-folding is not attempted. So, this adds a check for the primary induction variable where we decide how to lower the scalar epilogue. I.e., when there isn't a primary induction variable, a scalar epilogue loop is allowed (i.e. don't request tail-folding) so that vectorisation could still be triggered. Having this check for the primary induction variable make sense anyway, and in addition, in a follow-up of this I will look into discovering earlier the primary induction variable for counting down loops, so that this can also be tail-folded. Differential revision: https://reviews.llvm.org/D72324
*	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" ↵	Fangrui Song	2019-12-24	1	-2/+2
\| \| \| \|	as cleanups after D56351
*	[ARM] Add missing REQUIRES: asserts to test. NFC	David Green	2019-12-09	1	-0/+1
\|
*	[ARM] Enable MVE masked loads and stores	David Green	2019-12-09	4	-6/+6
\| \| \| \| \| \| \|	With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does. Differential Revision: https://reviews.llvm.org/D70968
*	[ARM] Teach the Arm cost model that a Shift can be folded into other ↵	David Green	2019-12-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966
*	[ARM] Additional tests and minor formatting. NFC	David Green	2019-12-09	1	-0/+86
\| \| \| \| \| \|	This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.
*	[ARM] Disable VLD4 under MVE	David Green	2019-12-08	2	-22/+109
\| \| \| \| \| \| \| \| \| \|	Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109
*	[LV] PreferPredicateOverEpilog respecting option	Sjoerd Meijer	2019-11-21	1	-0/+18
\| \| \| \| \| \| \| \|	Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not to predicate the loop. Differential Revision: https://reviews.llvm.org/D70382
*	[ARM] MVE interleaving load and stores.	David Green	2019-11-19	1	-54/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore. The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases. We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though. Differential Revision: https://reviews.llvm.org/D69392
*	[ARM] Add and update a lot of VLDn tests. NFC	David Green	2019-11-19	1	-0/+695
\|
*	[ARM][MVE] tail-predication	Sjoerd Meijer	2019-11-15	1	-0/+26
\| \| \| \| \| \| \|	This is a follow up of d90804d, to also flag fmcp instructions as instructions that we do not support in tail-predicated vector loops. Differential Revision: https://reviews.llvm.org/D70295
*	[LV] PreferPredicateOverEpilog respecting predicate loop hint	Sjoerd Meijer	2019-11-14	1	-9/+6
\| \| \| \| \| \| \| \| \| \| \|	The vectoriser queries TTI->preferPredicateOverEpilogue to determine if tail-folding is preferred for a loop, but it was not respecting loop hint 'predicate' that can disable this, which has now been added. This showed that we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0 (i.e. FK_Disabled) but this should have been FK_Undefined, which has been fixed. Differential Revision: https://reviews.llvm.org/D70125
*	[ARM][MVE] canTailPredicateLoop	Sjoerd Meijer	2019-11-13	1	-27/+592
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single block loops only. With this change, the vectoriser will now determine if tail-folding scalar remainder loops is possible/desired, which is the first step to generate MVE tail-predicated vector loops. This is disabled by default for now. I.e,, this is depends on option -disable-mve-tail-predication, which is off by default. I will follow up on this soon with a patch for the vectoriser to respect loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125. Differential Revision: https://reviews.llvm.org/D69845
*	[TTI][LV] preferPredicateOverEpilogue	Sjoerd Meijer	2019-11-06	2	-2/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with: - a complete MVE implementation, see D69845. - a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint. Differential Revision: https://reviews.llvm.org/D69040
*	[DAGCombine][ARM] Enable extending masked loads	Sam Parker	2019-10-17	1	-3/+139
\| \| \| \| \| \| \| \| \| \| \|	Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085
*	[LV] Add ARM MVE tail-folding tests	Sjoerd Meijer	2019-09-16	1	-0/+89
\| \| \| \| \| \| \| \|	Now that the vectorizer can do tail-folding (rL367592), and the ARM backend understands MVE masked loads/stores (rL371932), it's time to add the MVE tail-folding equivalent of the X86 tests that I added. llvm-svn: 371996
*	[ARM] Masked loads and stores	David Green	2019-09-15	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. Differential Revision: https://reviews.llvm.org/D67186 llvm-svn: 371932
*	[ARM] Don't pretend we know how to generate MVE VLDn	David Green	2019-08-16	1	-0/+416
\| \| \| \| \| \| \| \| \| \|	We don't yet know how to generate these instructions for MVE. And in the case of VLD3, we don't even have the instruction. For the moment don't tell the vectoriser that we have VLD4, just to end up serialising the results. Differential Revision: https://reviews.llvm.org/D66009 llvm-svn: 369101
*	[ARM] Permit auto-vectorization using MVE	David Green	2019-08-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	With enough codegen complete, we can now correctly report the number and size of vector registers for MVE, allowing auto vectorisation. This also allows FP auto-vectorization for MVE without -Ofast/-ffast-math, due to support for IEEE FP arithmetic and parity between scalar and vector FP behaviour. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63728 llvm-svn: 368529
*	[lit] Delete empty lines at the end of lit.local.cfg NFC	Fangrui Song	2019-06-17	1	-1/+0
\| \| \| \|	llvm-svn: 363538
*	Revert "Temporarily Revert "Add basic loop fusion pass.""	Eric Christopher	2019-04-17	10	-0/+1067
\| \| \| \| \| \| \| \|	The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
*	Temporarily Revert "Add basic loop fusion pass."	Eric Christopher	2019-04-17	10	-1067/+0
\| \| \| \| \| \| \| \|	As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
*	[LV] Preserve inbounds on created GEPs	Daniel Neilson	2018-05-01	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a fix for PR23997. The loop vectorizer is not preserving the inbounds property of GEPs that it creates. This is inhibiting some optimizations. This patch preserves the inbounds property in the case where a load/store is being fed by an inbounds GEP. Reviewers: mkuper, javed.absar, hsaito Reviewed By: hsaito Subscribers: dcaballe, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D46191 llvm-svn: 331269
*	[ARM] add loop vectorizer test based on 482.sphinx3 from SPEC2006; NFC	Sanjay Patel	2018-02-27	1	-0/+165
\| \| \| \| \| \| \| \|	This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326221
*	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by ↵	Teresa Johnson	2017-07-01	1	-3/+3
\| \| \| \| \| \| \| \| \|	default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306936
*	re-commit r306336: Enable vectorizer-maximize-bandwidth by default.	Teresa Johnson	2017-07-01	1	-3/+3
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306935
*	revert r306336 for breaking ppc test.	Teresa Johnson	2017-07-01	1	-3/+3
\| \| \| \|	llvm-svn: 306934
*	Enable vectorizer-maximize-bandwidth by default.	Teresa Johnson	2017-07-01	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306933
*	Revert "r306473 - re-commit r306336: Enable vectorizer-maximize-bandwidth by ↵	Daniel Jasper	2017-06-30	1	-3/+3
\| \| \| \| \| \| \| \| \|	default." This still breaks PPC tests we have. I'll forward reproduction instructions to dehao. llvm-svn: 306792
*	re-commit r306336: Enable vectorizer-maximize-bandwidth by default.	Dehao Chen	2017-06-27	1	-3/+3
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306473
*	revert r306336 for breaking ppc test.	Dehao Chen	2017-06-26	1	-3/+3
\| \| \| \|	llvm-svn: 306344
*	Enable vectorizer-maximize-bandwidth by default.	Dehao Chen	2017-06-26	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 306336
*	Revert "Enable vectorizer-maximize-bandwidth by default."	Diana Picus	2017-06-22	1	-3/+3
\| \| \| \| \| \|	This reverts commit r305960 because it broke self-hosting on AArch64. llvm-svn: 305990
*	Enable vectorizer-maximize-bandwidth by default.	Dehao Chen	2017-06-21	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact: spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58 -0.99% spec/2006/int/C++/471.omnetpp 22.06 +1.87% spec/2006/int/C++/473.astar 22.65 -0.12% spec/2006/int/C++/483.xalancbmk 33.69 +4.97% spec/2006/int/C/400.perlbench 33.43 +1.70% spec/2006/int/C/401.bzip2 23.02 -0.19% spec/2006/int/C/403.gcc 32.57 -0.43% spec/2006/int/C/429.mcf 40.35 +0.27% spec/2006/int/C/445.gobmk 26.96 +0.06% spec/2006/int/C/456.hmmer 24.4 +0.19% spec/2006/int/C/458.sjeng 27.91 -0.08% spec/2006/int/C/462.libquantum 57.47 -0.20% spec/2006/int/C/464.h264ref 46.52 +1.35% geometric mean +0.29% The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag. I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent. Reviewers: hfinkel, mkuper, davidxl, chandlerc Reviewed By: chandlerc Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33341 llvm-svn: 305960
*	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved	Jonas Paulsson	2017-03-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705
*	[ARM/AArch64] Update costs for interleaved accesses with wide types	Matthew Simpson	2017-03-02	1	-20/+35
\| \| \| \| \| \| \| \| \|	After r296750, we're able to match interleaved accesses having types wider than 128 bits. This patch updates the associated TTI costs. Differential Revision: https://reviews.llvm.org/D29675 llvm-svn: 296751
*	[ARM] Make f16 interleaved accesses expensive.	Ahmed Bougacha	2017-02-11	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \|	There are no vldN/vstN f16 variants, even with +fullfp16. We could use the i16 variants, but, in practice, even with +fullfp16, the f16 sequence leading to the i16 shuffle usually gets scalarized. We'd need to improve our support for f16 codegen before getting there. Teach the cost model to consider f16 interleaved operations as expensive. Otherwise, we are all but guaranteed to end up with a large block of scalarized vector code. llvm-svn: 294819
*	[LV] Add new ARM/AArch64 interleaved access cost model tests (NFC)	Matthew Simpson	2017-02-07	1	-0/+64
\| \| \| \|	llvm-svn: 294342
*	[LV] Simplify ARM/AArch64 interleaved access cost model tests (NFC)	Matthew Simpson	2017-02-07	1	-28/+26
\| \| \| \| \| \| \| \|	This patch removes unneeded instructions from the existing ARM/AArch64 interleaved access cost model tests. I'll be adding a similar set of tests in a follow-on patch to increase coverage. llvm-svn: 294336
*	[Verifier] Add verification for TBAA metadata	Sanjoy Das	2016-12-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change adds some verification in the IR verifier around struct path TBAA metadata. Other than some basic sanity checks (e.g. we get constant integers where we expect constant integers), this checks: - That by the time an struct access tuple `(base-type, offset)` is "reduced" to a scalar base type, the offset is `0`. For instance, in C++ you can't start from, say `("struct-a", 16)`, and end up with `("int", 4)` -- by the time the base type is `"int"`, the offset better be zero. In particular, a variant of this invariant is needed for `llvm::getMostGenericTBAA` to be correct. - That there are no cycles in a struct path. - That struct type nodes have their offsets listed in an ascending order. - That when generating the struct access path, you eventually reach the access type listed in the tbaa tag node. Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26438 llvm-svn: 289402
*	Second attempt at r285517.	Dorit Nuzman	2016-10-31	1	-1/+1
\| \| \| \|	llvm-svn: 285568
*	Revert r285517 due to build failures.	Dorit Nuzman	2016-10-30	1	-1/+1
\| \| \| \|	llvm-svn: 285518
*	[LoopVectorize] Make interleaved-accesses analysis less conservative about	Dorit Nuzman	2016-10-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	possible pointer-wrap-around concerns, in some cases. Before this patch, collectConstStridedAccesses (part of interleaved-accesses analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when examining all candidate pointers. This is too conservative. Instead, this patch makes collectConstStridedAccesses use an optimistic approach, calling getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the candidate interleave groups have been formed, revisits the pointer-wrapping analysis but only where it matters: namely, in groups that have gaps, and where the gaps are not at the very end of the group (in which case the loop is peeled). This second time getPtrStride is called with [Assume=false, ShouldCheckWrap=true], but this could further be improved to using Assume=true, once we also add the logic to track that we are not going to meet the scev runtime checks threshold. Differential Revision: https://reviews.llvm.org/D25276 llvm-svn: 285517
*	[ARM] AArch32 v8 NEON is still not IEEE-754 compliant	Renato Golin	2016-04-18	1	-14/+8
\| \| \| \|	llvm-svn: 266603
*	[test] Require 'asserts' for a test which uses -debug-only	Vedant Kumar	2016-04-14	1	-0/+1
\| \| \| \| \| \| \|	Without this line, bots which run check-all on Release compilers will break. llvm-svn: 266386
*	[ARM] Adding IEEE-754 SIMD detection to loop vectorizer	Renato Golin	2016-04-14	1	-0/+335
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin). For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags. This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normal's flag on both communitues and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP. Fixes PR16275. llvm-svn: 266363
*	Simplify testcase added in r246759. NFC	Silviu Baranga	2015-09-04	1	-43/+11
\| \| \| \|	llvm-svn: 246848
*	Fix IRBuilder CreateBitOrPointerCast for vector types	Silviu Baranga	2015-09-03	1	-0/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This function was not taking into account that the input type could be a vector, and wasn't properly working for vector types. This caused an assert when building spec2k6 perlbmk for armv8. Reviewers: rengolin, mzolotukhin Subscribers: silviu.baranga, mzolotukhin, rengolin, eugenis, jmolloy, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D12559 llvm-svn: 246759
*	[ARM] Turn on by default interleaved access vectorization	Silviu Baranga	2015-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This change turns on by default interleaved access vectorization on ARM, as it has shown to be beneficial on ARM. Reviewers: rengolin Subscribers: aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12146 llvm-svn: 246541