<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bcm5719-llvm/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp, branch meklort-10.0.1</title>
<subtitle>Project Ortega BCM5719 LLVM</subtitle>
<id>https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1</id>
<link rel='self' href='https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/'/>
<updated>2020-01-09T11:57:34+00:00</updated>
<entry>
<title>[ARM][MVE] Don't unroll intrinsic loops.</title>
<updated>2020-01-09T11:57:34+00:00</updated>
<author>
<name>Sam Parker</name>
<email>sam.parker@arm.com</email>
</author>
<published>2020-01-09T11:57:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=15c7fa4d11eeb50095ae571c645427b9a267bdee'/>
<id>urn:sha1:15c7fa4d11eeb50095ae571c645427b9a267bdee</id>
<content type='text'>
We don't unroll vector loops for MVE targets, but we miss the case
when loops only contain intrinsic calls. So just move the logic a
bit to catch this case.

Differential Revision: https://reviews.llvm.org/D72440
</content>
</entry>
<entry>
<title>[ARM][MVE] Enable masked gathers from vector of pointers</title>
<updated>2020-01-08T13:43:12+00:00</updated>
<author>
<name>Anna Welker</name>
<email>anna.welker@arm.com</email>
</author>
<published>2020-01-08T13:08:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=346f6b54bd1237a9a5a2d9bb1e424b57dc178998'/>
<id>urn:sha1:346f6b54bd1237a9a5a2d9bb1e424b57dc178998</id>
<content type='text'>
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.
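
As a rough illustration of the semantics involved (not code from this
patch), a masked gather loads each lane through its own pointer when the
matching mask bit is set and otherwise keeps the pass-through value. The
Python sketch below models memory as a dictionary; masked_gather and its
parameter names are hypothetical.

```python
def masked_gather(memory, pointers, mask, passthru):
    """Model of a masked gather; all names here are illustrative assumptions."""
    if not (len(pointers) == len(mask) == len(passthru)):
        raise ValueError("all operands must have the same lane count")
    result = []
    for ptr, active, fallback in zip(pointers, mask, passthru):
        # Active lanes load through their pointer; inactive lanes never touch memory.
        result.append(memory[ptr] if active else fallback)
    return result


memory = {0x100: 11, 0x104: 22, 0x108: 33}
lanes = masked_gather(
    memory,
    [0x100, 0x108, 0x999, 0x104],   # lane 2 holds a pointer that is never dereferenced
    [True, True, False, True],
    [0, 0, -1, 0],
)
```

Lane 2 is inactive, so its pointer is ignored and the lane takes the
pass-through value.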

Differential Revision: https://reviews.llvm.org/D71743
</content>
</entry>
<entry>
<title>Rename TTI::getIntImmCost for instructions and intrinsics</title>
<updated>2019-12-12T02:00:20+00:00</updated>
<author>
<name>Reid Kleckner</name>
<email>rnk@google.com</email>
</author>
<published>2019-12-11T19:54:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=85ba5f637af83336151d31f83708128372a232c9'/>
<id>urn:sha1:85ba5f637af83336151d31f83708128372a232c9</id>
<content type='text'>
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.

Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.

Split off from D71320

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D71381
</content>
</entry>
<entry>
<title>[ARM] Enable MVE masked loads and stores</title>
<updated>2019-12-09T11:37:34+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-08T16:10:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=b1aba0378e52be51cfb7fb6f03417ebf408d66cc'/>
<id>urn:sha1:b1aba0378e52be51cfb7fb6f03417ebf408d66cc</id>
<content type='text'>
With the extra optimisations we have done, these should now be fine to
enable by default, which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
</content>
</entry>
<entry>
<title>[ARM] Teach the Arm cost model that a Shift can be folded into other instructions</title>
<updated>2019-12-09T10:24:33+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-08T15:33:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=be7a1070700e591732b254e29f2dd703325fb52a'/>
<id>urn:sha1:be7a1070700e591732b254e29f2dd703325fb52a</id>
<content type='text'>
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %r = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instructions that the
shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
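
The rule described above can be sketched as follows. This is a minimal
Python model, not the code from the patch: it covers only the two
conditions stated in this message (a single user, and a user opcode from
the list), and the names shift_cost, FOLDABLE_USERS and base_cost are
hypothetical.

```python
# IR opcodes that the commit message says a shift can be folded into.
FOLDABLE_USERS = {"add", "sub", "and", "or", "xor", "icmp"}


def shift_cost(user_opcodes, base_cost=1):
    """Return 0 when the shift is expected to fold into its single user.

    `user_opcodes` lists the opcode of every instruction that uses the
    shift result. `base_cost` is an assumed cost for a shift that is not folded.
    """
    if len(user_opcodes) == 1 and user_opcodes[0] in FOLDABLE_USERS:
        return 0
    return base_cost


folded = shift_cost(["and"])             # shl feeding a single 'and'
two_users = shift_cost(["and", "add"])   # more than one user: not treated as free
mul_user = shift_cost(["mul"])           # user cannot absorb the shift
```

Keeping the decision on the shift itself, rather than discounting the
user, matches the motivation given above of not artificially adjusting
the cost of the "and".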

Differential Revision: https://reviews.llvm.org/D70966
</content>
</entry>
<entry>
<title>[ARM] Additional tests and minor formatting. NFC</title>
<updated>2019-12-09T10:24:33+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-08T15:26:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=f008b5b8ce724d60f0f0eeafceee0119c42022d4'/>
<id>urn:sha1:f008b5b8ce724d60f0f0eeafceee0119c42022d4</id>
<content type='text'>
This adds some extra cost model tests for shifts, and does some minor
adjustments to some Neon code to make it clear as to what it applies to.
Both NFC.
</content>
</entry>
<entry>
<title>[ARM] MVE interleaving load and stores.</title>
<updated>2019-11-19T18:37:30+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-11-19T18:37:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=882f23caeae5ad3ec1806eb6ec387e3611649d54'/>
<id>urn:sha1:882f23caeae5ad3ec1806eb6ec387e3611649d54</id>
<content type='text'>
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.
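
For illustration, the load/shuffle combination being recognised can be
modelled as one wide load followed by strided shuffle masks, one per
field. The Python sketch below is an assumption-laden model, not the
backend code: deinterleave_masks and mve_factor_supported are
hypothetical names, and only the interleave factor is checked.

```python
def deinterleave_masks(factor, lanes):
    """Shuffle masks that split a wide load of factor*lanes elements into `factor` vectors."""
    return [[field + factor * i for i in range(lanes)] for field in range(factor)]


def mve_factor_supported(factor):
    # Assumption: only the factor check from the message is modelled (no VLD3 on MVE).
    return factor in (2, 4)


vld2_masks = deinterleave_masks(2, 4)
vld4_masks = deinterleave_masks(4, 2)
```

With a factor of 2 and four lanes, the masks select the even and odd
elements of the wide load respectively.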

We may need some further adjustments in future; for example, VLD4 takes up
half the available registers, so it should maybe cost more. This patch should get the
basics in though.

Differential Revision: https://reviews.llvm.org/D69392
</content>
</entry>
<entry>
<title>[ARM][MVE] tail-predication</title>
<updated>2019-11-15T11:01:13+00:00</updated>
<author>
<name>Sjoerd Meijer</name>
<email>sjoerd.meijer@arm.com</email>
</author>
<published>2019-11-15T11:01:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=71327707b056c1de28fb0b2c2046740ce1e5cb0d'/>
<id>urn:sha1:71327707b056c1de28fb0b2c2046740ce1e5cb0d</id>
<content type='text'>
This is a follow up of d90804d, to also flag fcmp instructions as instructions
that we do not support in tail-predicated vector loops.

Differential Revision: https://reviews.llvm.org/D70295
</content>
</entry>
<entry>
<title>[ARM][MVE] canTailPredicateLoop</title>
<updated>2019-11-13T13:24:33+00:00</updated>
<author>
<name>Sjoerd Meijer</name>
<email>sjoerd.meijer@arm.com</email>
</author>
<published>2019-11-13T13:02:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=d90804d26befeda36641fade3edba107682cc5cf'/>
<id>urn:sha1:d90804d26befeda36641fade3edba107682cc5cf</id>
<content type='text'>
This implements TTI hook 'preferPredicateOverEpilogue' for MVE.  This is a
first version and it operates on single block loops only. With this change, the
vectoriser will now determine if tail-folding scalar remainder loops is
possible/desired, which is the first step to generate MVE tail-predicated
vector loops.
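
To make the difference concrete, the Python sketch below contrasts
tail-folding with a scalar epilogue for a given trip count and vector
width. It is only a model of the idea; tail_folded_masks and
scalar_epilogue_split are hypothetical helper names, not part of the
vectoriser.

```python
def tail_folded_masks(trip_count, vector_width):
    """Per-iteration lane masks when the remainder is folded into the vector loop."""
    masks = []
    for start in range(0, trip_count, vector_width):
        # Number of elements still to process in this vector iteration.
        active = min(vector_width, trip_count - start)
        masks.append([lane in range(active) for lane in range(vector_width)])
    return masks


def scalar_epilogue_split(trip_count, vector_width):
    """(vector iterations, scalar remainder iterations) without tail-folding."""
    return divmod(trip_count, vector_width)


masks = tail_folded_masks(10, 4)
split = scalar_epilogue_split(10, 4)
```

For 10 elements and 4 lanes, tail-folding runs three vector iterations
with the last one only partially active, whereas the epilogue form runs
two full vector iterations and two scalar ones.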

This is disabled by default for now. I.e., this depends on option
-disable-mve-tail-predication, which is off by default.

I will follow up on this soon with a patch for the vectoriser to respect loop
hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
we don't want to tail-fold and we shouldn't query this TTI hook, which is
done in D70125.

Differential Revision: https://reviews.llvm.org/D69845
</content>
</entry>
<entry>
<title>[TTI][LV] preferPredicateOverEpilogue</title>
<updated>2019-11-06T10:14:20+00:00</updated>
<author>
<name>Sjoerd Meijer</name>
<email>sjoerd.meijer@arm.com</email>
</author>
<published>2019-11-06T09:58:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=6c2a4f5ff93e16c3b86c18543e02a193ced2d956'/>
<id>urn:sha1:6c2a4f5ff93e16c3b86c18543e02a193ced2d956</id>
<content type='text'>
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.
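
The three mechanisms can be sketched as a single decision function. This
is a hypothetical Python model: the function and parameter names are
made up, and checking the two forcing mechanisms before the target hook
is an assumption, not something this message specifies.

```python
def prefer_predicated_body(option_forces, pragma_forces, target_hook):
    """Decide whether to create a predicated vector body instead of a scalar epilogue.

    Assumption: the command line option and the pragma are checked before
    the target hook is queried.
    """
    if option_forces or pragma_forces:
        return True
    # Nothing forced the choice, so ask the target for its preference.
    return bool(target_hook())


forced = prefer_predicated_body(True, False, lambda: False)
target_says_yes = prefer_predicated_body(False, False, lambda: True)
target_says_no = prefer_predicated_body(False, False, lambda: False)
```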

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
  and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040
</content>
</entry>
</feed>
