<feed xmlns='http://www.w3.org/2005/Atom'>
<title>bcm5719-llvm/llvm/test/Transforms/LoopVectorize/ARM, branch meklort-10.0.1</title>
<subtitle>Project Ortega BCM5719 LLVM</subtitle>
<id>https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1</id>
<link rel='self' href='https://git.raptorcs.com/git/bcm5719-llvm/atom?h=meklort-10.0.1'/>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/'/>
<updated>2020-01-09T14:03:25+00:00</updated>
<entry>
<title>[ARM][MVE] MVE-I should not be disabled by -mfpu=none</title>
<updated>2020-01-09T14:03:25+00:00</updated>
<author>
<name>Momchil Velikov</name>
<email>momchil.velikov@arm.com</email>
</author>
<published>2020-01-09T13:47:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=173b711e83d7b61a46f55eb44f03ea98f69a1dd6'/>
<id>urn:sha1:173b711e83d7b61a46f55eb44f03ea98f69a1dd6</id>
<content type='text'>
Architecturally, MVE-I is allowed without an FPU, so -mfpu=none should
not disable MVE-I or moves to/from FP registers.

This patch removes `+/-fpregs` from the features that are unconditionally
added to the target feature list depending on the FPU, and moves the
logic to the Clang driver. The driver adds the negative form (`-fpregs`)
to the target feature list for `-mfloat-abi=soft`, or for `-mfpu=none`
without either `+mve` or `+mve.fp`. The driver adds only the negative
form; the backend derives the positive form from other features.
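
The driver-side rule can be sketched as follows. This is an illustrative
Python model, not the Clang driver code: the function name
`driver_fpregs_features` and its parameters are hypothetical, and reading
the "without MVE" condition as applying only to `-mfpu=none` is an
assumption.

```python
def driver_fpregs_features(float_abi, fpu, features):
    """Model which fpregs feature a driver following the rule would add.

    float_abi, fpu: option values such as "soft" or "none" (hypothetical inputs).
    features: target features already requested, e.g. ["+mve"].
    """
    has_mve = "+mve" in features or "+mve.fp" in features
    if float_abi == "soft":
        # Soft float ABI: FP registers are disabled.
        return ["-fpregs"]
    if fpu == "none" and not has_mve:
        # No FPU and no MVE: nothing needs the FP registers.
        return ["-fpregs"]
    # The positive form is never added here; the backend derives it.
    return []
```

Under this reading, `-mfpu=none` combined with `+mve` adds nothing, so
MVE-I stays usable.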

Differential Revision: https://reviews.llvm.org/D71843
</content>
</entry>
<entry>
<title>[LV] Still vectorise when tail-folding can't find a primary induction variable</title>
<updated>2020-01-09T09:14:00+00:00</updated>
<author>
<name>Sjoerd Meijer</name>
<email>sjoerd.meijer@arm.com</email>
</author>
<published>2020-01-09T09:14:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=8f1887456ab4ba24a62ccb19d0d04b08972a0289'/>
<id>urn:sha1:8f1887456ab4ba24a62ccb19d0d04b08972a0289</id>
<content type='text'>
This addresses a vectorisation regression for tail-folded loops that are
counting down, e.g. loops as simple as this:

  void foo(char *A, char *B, char *C, uint32_t N) {
    while (N &gt; 0) {
      *C++ = *A++ + *B++;
       N--;
    }
  }

These loops can be vectorised, but when tail-folding is requested, the
vectoriser can't find a primary induction variable, which is needed for
predicating the loop. As a result, the loop isn't vectorised at all, even
though it is vectorised when tail-folding is not attempted. So, this adds a
check for the primary induction variable at the point where we decide how to
lower the scalar epilogue: when there isn't a primary induction variable, a
scalar epilogue loop is allowed (i.e. tail-folding is not requested), so that
vectorisation can still be triggered.
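
A minimal sketch of that decision, as a Python model rather than the
LoopVectorize code. The function name `choose_epilogue_lowering`, its
parameters, and the returned strings are hypothetical.

```python
def choose_epilogue_lowering(tail_folding_requested, has_primary_induction):
    """Model how the scalar epilogue is lowered under the check described above."""
    if tail_folding_requested and has_primary_induction:
        # Predicating the loop needs the primary induction variable.
        return "fold-tail"
    # Otherwise allow a scalar epilogue so the loop can still be vectorised.
    return "scalar-epilogue"
```

For the counting-down loop above, no primary induction variable is found,
so the model falls back to a scalar epilogue instead of giving up on
vectorisation.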

Having this check for the primary induction variable makes sense anyway. In
addition, in a follow-up I will look into discovering the primary induction
variable for counting-down loops earlier, so that these loops can also be
tail-folded.

Differential revision: https://reviews.llvm.org/D72324
</content>
</entry>
<entry>
<title>Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351</title>
<updated>2019-12-24T23:57:33+00:00</updated>
<author>
<name>Fangrui Song</name>
<email>maskray@google.com</email>
</author>
<published>2019-12-24T23:52:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=502a77f125f43ffde57af34d3fd1b900248a91cd'/>
<id>urn:sha1:502a77f125f43ffde57af34d3fd1b900248a91cd</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[ARM] Add missing REQUIRES: asserts to test. NFC</title>
<updated>2019-12-09T11:43:43+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-09T11:43:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=d6642ed1c867f97fdf951aac751c7854fbc7c51f'/>
<id>urn:sha1:d6642ed1c867f97fdf951aac751c7854fbc7c51f</id>
<content type='text'>
</content>
</entry>
<entry>
<title>[ARM] Enable MVE masked loads and stores</title>
<updated>2019-12-09T11:37:34+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-08T16:10:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=b1aba0378e52be51cfb7fb6f03417ebf408d66cc'/>
<id>urn:sha1:b1aba0378e52be51cfb7fb6f03417ebf408d66cc</id>
<content type='text'>
With the extra optimisations we have done, these should now be fine to
enable by default, which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
</content>
</entry>
<entry>
<title>[ARM] Teach the Arm cost model that a Shift can be folded into other instructions</title>
<updated>2019-12-09T10:24:33+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-08T15:33:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=be7a1070700e591732b254e29f2dd703325fb52a'/>
<id>urn:sha1:be7a1070700e591732b254e29f2dd703325fb52a</id>
<content type='text'>
This attempts to teach the Arm cost model that code such as:
  %s = shl i32 %a, 3
  %r = and i32 %s, %b
can, under Arm or Thumb2, become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instructions that the
shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. At the IR level, this translates to Add, Sub, And,
Or, Xor and ICmp.
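
The rule can be sketched as a small Python model. This is not the
ARMTargetTransformInfo code: the names `FOLDABLE_USERS` and `shift_cost`,
the lowercase opcode strings, and the default base cost of 1 are
assumptions made for illustration.

```python
# IR opcodes the commit message lists as able to absorb a shift operand.
FOLDABLE_USERS = {"add", "sub", "and", "or", "xor", "icmp"}


def shift_cost(user_opcodes, base_cost=1):
    """Model the cost of a shl given the opcodes of the instructions using it.

    user_opcodes: opcode names of all users of the shift result.
    base_cost: cost charged when the shift cannot be folded (assumed value).
    """
    if len(user_opcodes) == 1 and user_opcodes[0] in FOLDABLE_USERS:
        # Single foldable user: the shift becomes a shifted operand.
        return 0
    return base_cost
```

In this model the `shl` feeding a single `and` from the example costs
nothing, while a shift with two users, or one feeding a `mul`, keeps its
base cost.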

Differential Revision: https://reviews.llvm.org/D70966
</content>
</entry>
<entry>
<title>[ARM] Additional tests and minor formatting. NFC</title>
<updated>2019-12-09T10:24:33+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-08T15:26:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=f008b5b8ce724d60f0f0eeafceee0119c42022d4'/>
<id>urn:sha1:f008b5b8ce724d60f0f0eeafceee0119c42022d4</id>
<content type='text'>
This adds some extra cost model tests for shifts, and makes some minor
adjustments to some Neon code to make clear what it applies to.
Both are NFC.
</content>
</entry>
<entry>
<title>[ARM] Disable VLD4 under MVE</title>
<updated>2019-12-08T10:37:29+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-12-08T09:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=3a6eb5f16054e8c0f41a37542a5fc806016502a0'/>
<id>urn:sha1:3a6eb5f16054e8c0f41a37542a5fc806016502a0</id>
<content type='text'>
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
</content>
</entry>
<entry>
<title>[LV] PreferPredicateOverEpilog respecting option</title>
<updated>2019-11-21T14:06:10+00:00</updated>
<author>
<name>Sjoerd Meijer</name>
<email>sjoerd.meijer@arm.com</email>
</author>
<published>2019-11-21T14:03:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=901cd3b3f62d0c700e5d2c3f97eff97d634bec5e'/>
<id>urn:sha1:901cd3b3f62d0c700e5d2c3f97eff97d634bec5e</id>
<content type='text'>
Follow-up of cb47b8783: don't query TTI-&gt;preferPredicateOverEpilogue when
option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not
to predicate the loop.

Differential Revision: https://reviews.llvm.org/D70382
</content>
</entry>
<entry>
<title>[ARM] MVE interleaving load and stores.</title>
<updated>2019-11-19T18:37:30+00:00</updated>
<author>
<name>David Green</name>
<email>david.green@arm.com</email>
</author>
<published>2019-11-19T18:37:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.raptorcs.com/git/bcm5719-llvm/commit/?id=882f23caeae5ad3ec1806eb6ec387e3611649d54'/>
<id>urn:sha1:882f23caeae5ad3ec1806eb6ec387e3611649d54</id>
<content type='text'>
Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering
for MVE. This works the same way as Neon, recognising the load/shuffles
combination and converting them into intrinsics in a pre-isel pass,
which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad
and lowerInterleavedStore.

The main difference to Neon is that we do not have a VLD3 instruction.
Otherwise most of the code works very similarly, with just some minor
differences in the form of the intrinsics to work around. VLD3 is
disabled by making isLegalInterleavedAccessType return false for those
cases.

We may need some other adjustments in the future. For example, VLD4 takes
up half the available registers, so it should perhaps cost more. This
patch should get the basics in though.

Differential Revision: https://reviews.llvm.org/D69392
</content>
</entry>
</feed>
