| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding or removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.
Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instructions
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%r = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instructions that the
shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
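For illustration, a minimal C++ sketch (not the in-tree code) of the single-user
check described above; the helper name is made up for this example:
#include "llvm/IR/Instructions.h"
using namespace llvm;

static bool shiftIsLikelyFree(const Instruction *Shl) {
  // Only shifts with a single user are considered, as described above.
  if (!Shl->hasOneUse())
    return false;
  const auto *U = cast<Instruction>(*Shl->user_begin());
  switch (U->getOpcode()) {
  case Instruction::Add:
  case Instruction::Sub:
  case Instruction::And:
  case Instruction::Or:
  case Instruction::Xor:
  case Instruction::ICmp:
    return true; // Maps to ADD/SUB/AND/ORR/EOR/CMP etc. with a shifted operand.
  default:
    return false;
  }
}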
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, getIntImmCost returns TCC_Free for almost all intrinsics.
For most AArch64-specific intrinsics, however, it looks like integer
constants cannot be folded into them (at least the ones I checked).
Unless we know that we can fold an integer operand into the intrinsic, we
handle more cases correctly by returning the cost of materializing the
immediate rather than TCC_Free.
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D70669
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69307
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved
moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo.
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Moving the default TargetTransformInfoImplBase implementation to a default
MCSubtarget implementation
Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC
and SystemZ. AArch64 already has a custom subtarget implementation, so its
custom TTI implementation is migrated to use the new facilities in BasicTTIImpl
to invoke its custom subtarget implementation. The custom TTI implementations
continue to exist for the other targets with this change. They are not moved
over to subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to the system
model defined by the target. With this change, the default MCSubtargetInfo
implementation essentially returns the defaults that TargetTransformInfoImplBase
used to return. Existing users of TTI defaults will now hit the defaults in
MCSubtargetInfo. Targets that define their own custom TTI implementations won't
use the BasicTTIImpl implementations that route to the subtarget.
Once system models are in place for the targets that use these interfaces, their
custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 374205
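A self-contained sketch of the delegation pattern described above (class and
method names are simplified stand-ins, not the actual LLVM types):
#include <cstdio>

struct SubtargetDefaults {                 // stands in for MCSubtargetInfo
  virtual ~SubtargetDefaults() = default;
  // The old TargetTransformInfoImplBase defaults now live here.
  virtual unsigned getCacheLineSize() const { return 0; }
  virtual unsigned getPrefetchDistance() const { return 0; }
};

struct MyTargetSubtarget : SubtargetDefaults { // a target's custom subtarget
  unsigned getCacheLineSize() const override { return 64; }
};

struct BasicTTISketch {                    // stands in for BasicTTIImpl
  const SubtargetDefaults *ST;
  // TTI queries simply delegate to the subtarget implementation.
  unsigned getCacheLineSize() const { return ST->getCacheLineSize(); }
};

int main() {
  MyTargetSubtarget ST;
  BasicTTISketch TTI{&ST};
  std::printf("cache line size: %u\n", TTI.getCacheLineSize()); // prints 64
}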
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Patch D56593 by @courbet results in calls to `bcmp()` in some cases, should
the target support it, unless `TTI::MemCmpExpansionOptions()`
is overridden by the target.
In a proprietary benchmark we see a performance drop of about 12% on PNG
compression before this patch, though it passes all tests.
This patch mirrors X86 for AArch64 and initializes
`TTI::MemCmpExpansionOptions()` to then expand calls to `bcmp()` when
appropriate. No tuning of the parameters was performed, but, at this point,
it's enough to recover the performance drop above.
This problem also exists on ARM. Once a consensus is reached for AArch64, we
can work to fix ARM as well.
Authors:
- Evandro Menezes (@evandro) <e.menezes@samsung.com>
- Brian Rzycki (@brzycki) <b.rzycki@samsung.com>
Differential revision: https://reviews.llvm.org/D64805
llvm-svn: 367898
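Roughly what "mirroring X86" looks like, assuming an enableMemCmpExpansion-style
hook (the exact signature has changed over time, so treat this as a sketch
rather than the patch itself):
TTI::MemCmpExpansionOptions
AArch64TTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
  TTI::MemCmpExpansionOptions Options;
  Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
  Options.NumLoadsPerBlock = Options.MaxNumLoads;
  // Expand memcmp/bcmp with scalar loads of these sizes (largest first).
  Options.LoadSizes = {8, 4, 2, 1};
  return Options;
}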
|
|
|
|
|
|
|
|
| |
This broke some PPC prefetching tests.
This reverts commit 9fdfb045ae8bb643ab0d0455dcf9ecaea3b1eb3c.
llvm-svn: 365680
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rework the TTI cache and software prefetching APIs to prepare for the
introduction of a general system model. Changes include:
- Marking existing interfaces const and/or override as appropriate
- Adding comments
- Adding BasicTTIImpl interfaces that delegate to a subtarget
implementation
- Adding a default "no information" subtarget implementation
Only a handful of targets use these interfaces currently: AArch64,
Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget
implementation, so its custom TTI implementation is migrated to use
the new facilities in BasicTTIImpl to invoke its custom subtarget
implementation. The custom TTI implementations continue to exist for
the other targets with this change. They are not moved over to
subtarget-based implementations.
The end goal is to have the default subtarget implementation defer to
the system model defined by the target. With this change, the default
subtarget implementation essentially returns "no information" for
these interfaces. None of the existing users of TTI will hit that
implementation because they define their own custom TTI
implementations and won't use the BasicTTIImpl implementations.
Once system models are in place for the targets that use these
interfaces, their custom TTI implementations can be removed.
Differential Revision: https://reviews.llvm.org/D63614
llvm-svn: 365676
|
|
|
|
| |
llvm-svn: 361813
|
|
|
|
|
|
|
|
|
|
|
| |
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instructions used in immediate materialization.
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D58461
llvm-svn: 356391
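The costing idea, sketched (simplified from the real getIntImmCost, which also
splits wide immediates into 64-bit chunks; the header path is an assumption):
#include "AArch64ExpandImm.h"          // declares AArch64_IMM::expandMOVImm
#include "llvm/ADT/SmallVector.h"

static unsigned movImmCost(uint64_t Val) {
  llvm::SmallVector<llvm::AArch64_IMM::ImmInsnModel, 4> Insn;
  llvm::AArch64_IMM::expandMOVImm(Val, /*BitSize=*/64, Insn);
  // One cost unit per MOVZ/MOVN/MOVK (or ORR) instruction the expansion needs.
  return Insn.size();
}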
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
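A C++ shape of the loop described above, for illustration: only the even
elements are read, so the interleave group has a gap that a masked wide load
can now cover under Opt for Size:
int sumEven(const int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[2 * i]; // gap: a[2*i + 1] is never touched
  return s;
}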
|
|
|
|
|
|
|
|
|
|
| |
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles.
This patch adds SK_BROADCAST handling, but it exposes the lack of handling for this shuffle kind on ARM/AArch64, which I've fixed at the same time.
Differential Revision: https://reviews.llvm.org/D53570
llvm-svn: 345253
|
|
|
|
| |
llvm-svn: 344475
|
|
|
|
| |
llvm-svn: 344473
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
interleave-group
The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.
Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53011
llvm-svn: 344472
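An illustrative C++ loop with the kind of predicated strided access this
enables; vectorizing it as an interleave group requires the masked wide
load/store support described above:
void scaleEven(float *a, const bool *cond, int n) {
  for (int i = 0; i < n; ++i)
    if (cond[i])          // predicated...
      a[2 * i] *= 2.0f;   // ...strided (stride 2) load and store
}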
|
|
|
|
|
|
| |
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}
llvm-svn: 338293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a custom trunc store lowering for v4i8 vector types.
Since there is no v.4b register, v4i8 is promoted to v4i16 (v.4h),
and the default action for v4i8 is to extract each element and issue four
byte stores.
A better strategy would be to extend the promoted v4i16 to v8i16
(with undef elements) and extract and store the word lane which
represents the v4i8 subvector. The construction:
define void @foo(<4 x i16> %x, i8* nocapture %p) {
%0 = trunc <4 x i16> %x to <4 x i8>
%1 = bitcast i8* %p to <4 x i8>*
store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2
ret void
}
Can be optimized from:
umov w8, v0.h[3]
umov w9, v0.h[2]
umov w10, v0.h[1]
umov w11, v0.h[0]
strb w8, [x0, #3]
strb w9, [x0, #2]
strb w10, [x0, #1]
strb w11, [x0]
ret
To:
xtn v0.8b, v0.8h
str s0, [x0]
ret
The patch also adjusts the memory cost for autovectorization, so the C
code:
void foo (const int *src, int width, unsigned char *dst)
{
for (int i = 0; i < width; i++)
*dst++ = *src++;
}
can be vectorized to:
.LBB0_4: // %vector.body
// =>This Inner Loop Header: Depth=1
ldr q0, [x0], #16
subs x12, x12, #4 // =4
xtn v0.4h, v0.4s
xtn v0.8b, v0.8h
st1 { v0.s }[0], [x2], #4
b.ne .LBB0_4
Instead of byte operations.
llvm-svn: 335735
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.
This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.
I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.
Differential Revision: https://reviews.llvm.org/D48172
llvm-svn: 335329
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manual change to the docs, as the regex doesn't match them.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32). The cost is based on TargetLowering::Build{S,U}DIV, which
uses a multiply by constant plus adjustment to express a divide by
constant.
Both {u,s}mull{2} are expressed as Instruction::Mul, and the shifts as
Instruction::AShr.
llvm-svn: 331873
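The kind of loop this makes profitable to vectorize, for illustration;
Build{S,U}DIV turns the divide into a multiply by a magic constant plus an
adjustment, which maps onto {s,u}mull{2} and shifts:
void divBy7(const int *src, int *dst, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = src[i] / 7; // no vector divide: mul-by-magic-constant + shifts
}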
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers in our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.
Patch produced by
for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done
Differential Revision: https://reviews.llvm.org/D46290
llvm-svn: 331272
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.
Differential Revision: https://reviews.llvm.org/D45982
llvm-svn: 330941
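For illustration, the concrete masks for two 4-element source vectors
A = {a0,a1,a2,a3} and B = {b0,b1,b2,b3}, with indices into the concatenation
of A and B as in a shufflevector mask:
constexpr int Trn1Mask[4] = {0, 4, 2, 6}; // TRN1: {a0, b0, a2, b2}
constexpr int Trn2Mask[4] = {1, 5, 3, 7}; // TRN2: {a1, b1, a3, b3}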
|
|
|
|
|
|
|
|
|
|
| |
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.
Differential Revision: https://reviews.llvm.org/D44490
llvm-svn: 327702
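The reduction shape this costing targets, for illustration: the vectorized
accumulator of the loop below is reduced with a single 'addv' at the end:
int sum(const int *a, int n) {
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += a[i]; // horizontal reduction of the vector accumulator uses addv
  return s;
}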
|
|
|
|
|
|
|
|
|
|
| |
Since there is no instruction for integer vector division, factor in the
cost of singling out each element to be used with the scalar division
instruction.
Differential revision: https://reviews.llvm.org/D43974
llvm-svn: 326955
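Illustrative-only formula for the scalarization cost described above (the real
logic lives inside getArithmeticInstrCost):
// Per lane: extract the element, divide it in a scalar register, insert the
// result back, so for a vector of VF lanes:
unsigned scalarizedDivCost(unsigned VF, unsigned ExtractCost,
                           unsigned InsertCost, unsigned ScalarDivCost) {
  return VF * (ExtractCost + ScalarDivCost + InsertCost);
}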
|
|
|
|
|
|
| |
These arise because enums are 'int' by default.
llvm-svn: 321887
|
|
|
|
|
|
|
|
| |
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
|
|
|
|
|
|
|
| |
Many of these uses can get by with forward declarations. Hopefully this
speeds up compilation after adding a single intrinsic.
llvm-svn: 312759
|
|
|
|
| |
llvm-svn: 306588
|
|
|
|
|
|
|
|
|
|
|
|
| |
unrolling.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34533
llvm-svn: 306584
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D34531
llvm-svn: 306554
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Similar to X86, it should be safe to inline callees if their target-features
are a subset of the caller's. This change matches GCC's inlining behavior
with respect to attributes [1].
[1] https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes
Reviewers: kristof.beyls, javed.absar, rengolin, t.p.northover
Reviewed By: t.p.northover
Subscribers: aemerson, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34698
llvm-svn: 306478
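A simplified sketch of the subset check described above (not necessarily the
exact committed code, but the same idea as the X86 hook it mirrors):
bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
                                         const Function *Callee) const {
  const TargetMachine &TM = getTLI()->getTargetMachine();
  const FeatureBitset &CallerBits =
      TM.getSubtargetImpl(*Caller)->getFeatureBits();
  const FeatureBitset &CalleeBits =
      TM.getSubtargetImpl(*Callee)->getFeatureBits();
  // Inline only if every feature the callee relies on is also available in
  // the caller.
  return (CallerBits & CalleeBits) == CalleeBits;
}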
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
|
|
|
|
|
|
|
| |
The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions
which didn't have a lowering.
llvm-svn: 303211
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This caused PR33053.
Original commit message:
> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 303115
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.
The existing code to match shufflevector patterns is replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.
Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 302678
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.
Differential Revision: https://reviews.llvm.org/D32706
llvm-svn: 302582
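A C++-level shape of the pattern, for illustration; in IR the two zexts are
explicit, but on AArch64 the whole thing becomes a single widening 'uaddl', so
the extends are now costed as free:
#include <cstdint>
void addWiden(const uint8_t *a, const uint8_t *b, uint16_t *c, int n) {
  for (int i = 0; i < n; ++i)
    c[i] = uint16_t(a[i]) + uint16_t(b[i]); // zext + zext + add -> uaddl
}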
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
|
|
|
|
|
|
|
|
|
|
|
| |
This patch refactors and strengthens the type checks performed for interleaved
accesses. The primary functional change is to ensure that the interleaved
accesses have valid element types. The added test cases previously failed
because the element type is f128.
Differential Revision: https://reviews.llvm.org/D31817
llvm-svn: 299864
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Move the aarch64-type-promotion pass into the existing type promotion framework in CGP.
This change also supports forking sexts when a new sext is required for promotion.
Note that this change is based on D27853; I am submitting it early to provide a better idea of D27853.
Reviewers: jmolloy, mcrosier, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, aemerson, rengolin, mcrosier
Differential Revision: https://reviews.llvm.org/D28680
llvm-svn: 299379
|
|
|
|
|
|
| |
All this did before was assert in EarlyCSE.
llvm-svn: 298724
|
|
|
|
|
|
|
|
|
| |
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.
Differential Revision: https://reviews.llvm.org/D29675
llvm-svn: 296751
|
|
|
|
|
|
|
|
|
|
|
|
| |
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.
Added a special optimization case which replaces pmulld with a pmullw/pmulhw/pshuf sequence
when the real operand bit width is <= 16.
Differential Revision: https://reviews.llvm.org/D28104
llvm-svn: 291657
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original code considered only v2i64 as slow for this feature. This patch
considers all 128-bit vector types as slow candidates.
In internal tests, extending this feature to all 128-bit vector types
resulted in an overall improvement of 1% on Exynos M1.
Differential revision: https://reviews.llvm.org/D27998
llvm-svn: 291616
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
stride seems to be 'complex' and needs some extra cost for address computation handling.
This code seems to be target dependent and may not be the same for all targets.
The decision of whether a given stride is complex is now passed to the target by sending the stride information via SCEV to getAddressComputationCost instead of an 'IsComplex' flag.
Specifically, on X86 targets we don't see any significant address computation cost for strided accesses in general.
Differential Revision: https://reviews.llvm.org/D27518
llvm-svn: 291106
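A hedged sketch of what a target can now do with the SCEV it receives (the
helper names and cost values are illustrative, not the in-tree code):
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
using namespace llvm;

static unsigned addressComputationCostSketch(ScalarEvolution &SE,
                                             const SCEV *Ptr) {
  // A strided access shows up as an add-recurrence; a constant step is the
  // cheap case, anything else is treated as a "complex" stride.
  if (const auto *AR = dyn_cast_or_null<SCEVAddRecExpr>(Ptr))
    if (isa<SCEVConstant>(AR->getStepRecurrence(SE)))
      return 1;
  return 10;
}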
|
|
|
|
|
|
|
|
|
| |
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D27677
llvm-svn: 289845
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Testing for specific CPUs has a number of problems, better use subtarget
features:
- When some tweak is added for a specific CPU it is often desirable for
the next version of that CPU as well, yet we often forget to add it.
- It is hard to keep track of checks scattered around the target code;
Declaring all target specifics together with the CPU in the tablegen
file is a clear representation.
- Subtarget features can be tweaked from the command line.
To discourage people from using CPU checks in the future I removed the
isCortexXX(), isCyclone(), ... functions. I added a getProcFamily()
function for exceptional circumstances but made it clear in the comment
that usage is discouraged.
Reformat feature list in AArch64.td to have 1 feature per line in
alphabetical order to simplify merging and sorting for out of tree
tweaks.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D20762
llvm-svn: 271555
|
|
|
|
| |
llvm-svn: 267734
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
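A source-level example of the pattern the new hook models (using the NEON
intrinsics header for illustration): the lane extract and the sign-extension
form a single SMOV, so only the extract should be charged:
#include <arm_neon.h>
int extractAndExtend(int16x4_t v) {
  return vget_lane_s16(v, 1); // lowers to: smov w0, v0.h[1]
}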
|