bcm5719-llvm/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp, branch meklort-10.0.1

Project Ortega BCM5719 LLVM
https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1
Updated: 2020-01-09T11:57:34+00:00

[ARM][MVE] Don't unroll intrinsic loops.
Updated: 2020-01-09T11:57:34+00:00
Author: Sam Parker <sam.parker@arm.com>
Published: 2020-01-09T11:57:04+00:00
ID: urn:sha1:15c7fa4d11eeb50095ae571c645427b9a267bdee

We don't unroll vector loops for MVE targets, but we miss the case when loops only contain intrinsic calls. So just move the logic a bit to catch this case.

Differential Revision: https://reviews.llvm.org/D72440

[ARM][MVE] Enable masked gathers from vector of pointers
Updated: 2020-01-08T13:43:12+00:00
Author: Anna Welker <anna.welker@arm.com>
Published: 2020-01-08T13:08:27+00:00
ID: urn:sha1:346f6b54bd1237a9a5a2d9bb1e424b57dc178998

Adds a pass to the ARM backend that takes a v4i32 gather and transforms it into a call to MVE's masked gather intrinsics.

Differential Revision: https://reviews.llvm.org/D71743

Rename TTI::getIntImmCost for instructions and intrinsics
Updated: 2019-12-12T02:00:20+00:00
Author: Reid Kleckner <rnk@google.com>
Published: 2019-12-11T19:54:58+00:00
ID: urn:sha1:85ba5f637af83336151d31f83708128372a232c9

Soon Intrinsic::ID will be a plain integer, so this overload will not be possible. Rename both overloads to ensure that downstream targets observe this as a build failure instead of a runtime failure.

Split off from D71320

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D71381

[ARM] Enable MVE masked loads and stores
Updated: 2019-12-09T11:37:34+00:00
Author: David Green <david.green@arm.com>
Published: 2019-12-08T16:10:01+00:00
ID: urn:sha1:b1aba0378e52be51cfb7fb6f03417ebf408d66cc

With the extra optimisations we have done, these should now be fine to enable by default, which is what this patch does.
Differential Revision: https://reviews.llvm.org/D70968

[ARM] Teach the Arm cost model that a Shift can be folded into other instructions
Updated: 2019-12-09T10:24:33+00:00
Author: David Green <david.green@arm.com>
Published: 2019-12-08T15:33:24+00:00
ID: urn:sha1:be7a1070700e591732b254e29f2dd703325fb52a

This attempts to teach the cost model in Arm that code such as:

    %s = shl i32 %a, 3
    %r = and i32 %s, %b

Can under Arm or Thumb2 become:

    and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check whether they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which, if available, is added as an extra parameter, much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp.

Differential Revision: https://reviews.llvm.org/D70966

[ARM] Additional tests and minor formatting. NFC
Updated: 2019-12-09T10:24:33+00:00
Author: David Green <david.green@arm.com>
Published: 2019-12-08T15:26:32+00:00
ID: urn:sha1:f008b5b8ce724d60f0f0eeafceee0119c42022d4

This adds some extra cost model tests for shifts, and makes some minor adjustments to some Neon code to make it clear what it applies to. Both NFC.

[ARM] MVE interleaving load and stores.
Updated: 2019-11-19T18:37:30+00:00
Author: David Green <david.green@arm.com>
Published: 2019-11-19T18:37:21+00:00
ID: urn:sha1:882f23caeae5ad3ec1806eb6ec387e3611649d54

Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore.
The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases.

We may need some other future adjustments; for example, VLD4 takes up half the available registers, so it should perhaps cost more. This patch should get the basics in though.

Differential Revision: https://reviews.llvm.org/D69392

[ARM][MVE] tail-predication
Updated: 2019-11-15T11:01:13+00:00
Author: Sjoerd Meijer <sjoerd.meijer@arm.com>
Published: 2019-11-15T11:01:13+00:00
ID: urn:sha1:71327707b056c1de28fb0b2c2046740ce1e5cb0d

This is a follow-up to d90804d, to also flag fcmp instructions as instructions that we do not support in tail-predicated vector loops.

Differential Revision: https://reviews.llvm.org/D70295

[ARM][MVE] canTailPredicateLoop
Updated: 2019-11-13T13:24:33+00:00
Author: Sjoerd Meijer <sjoerd.meijer@arm.com>
Published: 2019-11-13T13:02:16+00:00
ID: urn:sha1:d90804d26befeda36641fade3edba107682cc5cf

This implements the TTI hook 'preferPredicateOverEpilogue' for MVE. This is a first version and it operates on single-block loops only. With this change, the vectoriser will now determine whether tail-folding scalar remainder loops is possible/desired, which is the first step towards generating MVE tail-predicated vector loops.

This is disabled by default for now. I.e., this depends on the option -disable-mve-tail-predication, which is off by default.

I will follow up on this soon with a patch for the vectoriser to respect the loop hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled, we don't want to tail-fold and we shouldn't query this TTI hook, which is done in D70125.
Differential Revision: https://reviews.llvm.org/D69845

[TTI][LV] preferPredicateOverEpilogue
Updated: 2019-11-06T10:14:20+00:00
Author: Sjoerd Meijer <sjoerd.meijer@arm.com>
Published: 2019-11-06T09:58:36+00:00
ID: urn:sha1:6c2a4f5ff93e16c3b86c18543e02a193ced2d956

We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040