| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary
In several places we need to enumerate all constrained intrinsics or IR
nodes that should be represented by them. It is easy to miss some of
the cases. To make working with these intrinsics more convenient and
robust, this change introduces file containing definitions of all
constrained intrinsics and some of their properties. This file can be
included to generate constrained intrinsics processing code.
Reviewers: kpn, andrew.w.kaylor, cameron.mcinally, uweigand
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69887
|
| |
|
|
| |
Differential Revision: https://reviews.llvm.org/D70220
|
| |
|
|
|
|
|
|
| |
A call site parameter description of a memory operand needs to
unambiguously convey the size of the operand to prevent incorrect entry
value evaluation.
Thanks for David Stenberg for pointing this issue out!
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Cleanup handling of the denormal-fp-math attribute. Consolidate places
checking the allowed names in one place.
This is in preparation for introducing FP type specific variants of
the denormal-fp-mode attribute. AMDGPU will switch to using this in
place of the current hacky use of subtarget features for the denormal
mode.
Introduce a new header for dealing with FP modes. The constrained
intrinsic classes define related enums that should also be moved into
this header for uses in other contexts.
The verifier could use a check to make sure the denorm-fp-mode
attribute is sane, but there currently isn't one.
Currently, DAGCombiner incorrectly asssumes non-IEEE behavior by
default in the one current user. Clang must be taught to start
emitting this attribute by default to avoid regressions when this is
switched to assume ieee behavior if the attribute isn't present.
|
| |
|
|
|
|
|
|
| |
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Assert in getFunctionLocalOffsetAfterInsn() fails when processing a call
MachineInstr inside a bundle and compiling with debug info. This is
because labels are added by DwarfDebug::beginInstruction() which is
called for each top-level MI by EmitFunctionBody()'s for-loop iteration
but constructCallSiteEntryDIEs() which calls
getFunctionLocalOffsetAfterInsn() iterates over all MIs.
This commit modifies constructCallSiteEntryDIEs() to get the associated
bundle MI for call MIs inside a bundle and use that to when calling
getFunctionLocalOffsetAfterInsn() and getLabelAfterInsn(). It also skips
loop iterations for bundle MIs since the loop statements are concerned
with debug info for each physical instructions and bundles represent a
group of instructions. It also fix the comment about PCAddr since the
code is getting the return address and not the call address.
Reviewers: dstenb, vsk, aprantl, djtodoro, dblaikie, NikolaPrica
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70293
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
LegalizeTypes and LegalizeDAG to one version in TaretLowering.
Reviewers: RKSimon, efriedma, spatel
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70354
|
| |
|
|
|
|
|
|
|
|
| |
Previously we mutated the node and then converted it to a libcall. But this loses the chain information.
This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But correct ordering seems more important to be right.
Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better.
Differential Revision: https://reviews.llvm.org/D70334
|
| |
|
|
|
|
|
|
| |
-ffp-model=, and -ffp-exception-behavior="
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.
This reverts commits af57dbf12e54f3a8ff48534bf1078f4de104c1cd and e6584b2b7b2de06f1e59aac41971760cac1e1b79
|
| |
|
|
|
| |
This reverts commit 423f541c1a322963cf482683fe9777ef0692082d, which
breaks llvm-c ABI.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type
to contain non-scalable types.
* Explicit casts for several places where implicit integer sign
changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types
as different.
Reviewers: greened, cameron.mcinally, sdesmalen, rovka
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D66871
|
| |
|
|
|
|
|
|
|
| |
and remove assert from the strict fp path.
These were both recently added. While the call to GetSoftenedFloat
is a little more optimal, we don't do it in the expand for
FP_TO_SINT/UINT so there's no real reason to do it here. This
avoids a FIXME for strict fp.
|
| |
|
|
| |
MVT::SimpleValueType just to assign back to EVT. NFC
|
| |
|
|
|
|
|
|
|
| |
STRICT_LLROUND and STRICT_LLRINT.
This doesn't handle softening the input type, but we don't handle
softening any of the strict nodes yet. Skipping that made it easy
to reuse an existing function for creating a libcall from a node
with a chain.
|
| |
|
|
|
|
|
|
|
| |
call GetSoftenedFloat if the floating point input needs to be softened.
Before this we were emitting a bitcast to integer from the lowering
code that itself will need to be legalized. By calling
GetSoftenedFloat we get the integer conversion in one step without
needing to relegalize a bitcast.
|
| |
|
|
|
|
|
|
|
|
|
| |
This code isn't exercised, and was in the wrong place. If we need
this, we would need to promote the type before figuring out which
libcall to use.
I'm choosing to remove it rather than fixing since we don't
support PromoteFloat for LRINT/LROUND/LLRINT/LLROUND when the
result type is legal so I don't see much reason to support it
for the case where the result type isn't legal.
|
| |
|
|
|
|
|
|
|
| |
function that handles both. NFC
These too functions are were the same except for which libcall gets
emitted. Just merge them into one.
This is prep work for some other work including strict fp support.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch, adds support for DW_AT_alignment[DWARF5] attribute, to be emitted with typdef DIE.
When explicit alignment is specified.
Patch by Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: aprantl, dblaikie, jini.susan.george, SouraVX, alok,
deadalinx
Differential Revision: https://reviews.llvm.org/D70111
|
| |
|
|
|
|
|
|
|
|
| |
relocations
This only implements the non-dwo part, but loclistx is necessary to use
location lists in DWARFv5, so it's a precursor to that work - and
generally reduces relocations (only using one reloc, then
indexes/relative offsets for all location list references) in non-split
DWARF.
|
| |
|
|
|
|
| |
operands.
NFC
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
shuffle vector
LLVM IR of 1-element vectors get lower into scalar in GISel. As a
result, shuffle vector may also produce a scalar.
This patch teaches the shuffle combiner how to deal with scalars when
they are in the destination type of a shuffle vector.
For now, we just support the easy case where this can be lowered to
a plain copy. For other cases, we leave the shuffle vector as is.
This type of IR are seen in O0 pipelines. E.g., as produced with
SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c.
rdar://problem/57198904
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://reviews.llvm.org/D70210
Previously:
Due to sensitivity of the algorithm with gaps, and extra instructions,
when diffing, often we see naming being off by a few. Makes the diff
unreadable even for tests with 7 and 8 instructions respectively.
Naming can change depending on candidates (and order of picking
candidates). Suddenly if there's one extra instruction somewhere, the
entire subtree would be named completely differently.
No consistent naming of similar instructions which occur in different
functions. If we try to do something like count the frequency
distribution of various differences across suite, then the above
sensitivity issues are going to result in poor results.
Instead:
Name instruction based on semantics of the instruction (hash of the
opcode and operands). Essentially for a given instruction that occurs in
any module/function it'll be named similarly (ie semantic). This has
some nice properties
Can easily look at many instructions and just check the hash and if
they're named similarly, then it's the same instruction. Makes it very
easy to spot the same instruction both multiple times, as well as across
many functions (useful for frequency distribution).
Independent of traversal/candidates/depth of graph. No need to keep
track of last index/gaps/skip count etc.
No off by few issues with diffs. I've tried the old vs new
implementation in files ranging from 30 to 700 instructions. In both
cases with the old algorithm, diffs are a sea of red, where as for the
semantic version, in both cases, the diffs line up beautifully.
Simplified implementation of the main loop (simple iteration) , no keep
track of what's visited and not.
Handle collision just by incrementing a counter. Roughly
bb[N]_hash_[CollisionCount].
Additionally with the new implementation, we can probably avoid doing
the hoisting of instructions to various places, as they'll likely be
named the same resulting in differences only based on collision (ie
regardless of whether the instruction is hoisted or not/close to use or
not, it'll be named the same hash which should result in use of the
instruction be identical with the only change being the collision count)
which is very easy to spot visually.
|
| |
|
|
|
|
|
|
|
|
| |
SUMMARY:
The patch will emit read-only variable assembly code for aix.
Reviewers: daltenty,Xiangling_Liao
Subscribers: rupprecht, seiyai,hiraditya
Differential Revision: https://reviews.llvm.org/D70182
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. It was however inconvenient as required to
include constrained intrinsics definitions even if they were not needed.
Also using long scope prefix reduced readability.
This change moves these definitioins to the namespace llvm::fp.
No functional changes.
Differential Revision: https://reviews.llvm.org/D69552
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SmallVector reserve() call in
MachineInstrExpressionTrait::getHashValue accounted for over 3% of all
calls to malloc() when I compiled a bunch of graphics shaders for the
AMDGPU target. Its initial size was only enough for machine instructions
with up to 7 operands, but for AMDGPU 8 and 10 operands are very common.
Here's a histogram of number of operands for each call to getHashValue,
gathered from the same collection of shaders:
1 13503
2 254273
3 135781
4 422508
5 614997
6 194953
7 287248
8 1517255
9 31218
10 1191269
11 70731
12 24
13 77
15 84
17 4692
27 16
33 705
49 6
Typical instructions with 8 and 10 operands are floating point
arithmetic and multiply-accumulate instructions like:
%83:vgpr_32 = V_MUL_F32_e64 0, killed %82:vgpr_32, 0, killed %81:vgpr_32, 0, 0, implicit $exec
%330:vgpr_32 = V_MAC_F32_e64 0, killed %327:vgpr_32, 0, killed %329:sgpr_32, 0, %328:vgpr_32(tied-def 0), 0, 0, implicit $exec
Differential Revision: https://reviews.llvm.org/D70301
|
| | |
|
| |
|
|
|
| |
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow call site paramter descriptions to reference spill slots. Spill
slots are not visible to high-level LLVM IR, so they can safely be
referenced during entry value evaluation (as they cannot be clobbered by
some other function).
This gives a 5% increase in the number of call site parameter DIEs in an
LTO x86_64 build of the xnu kernel.
This reverts commit eb4c98ca3d2590bad9f6542afbf3a7824d2b53fa (
[DebugInfo] Exclude memory location values as parameter entry values),
effectively reintroducing the portion of D60716 which dealt with memory
locations (authored by Djordje, Nikola, Ananth, and Ivan).
This partially addresses llvm.org/PR43343. However, not all memory
operands forwarded to callees live in spill slots. In the xnu build, it
may be possible to use an escape analysis to increase the number of call
site parameter by another 15% (more details in PR43343).
Differential Revision: https://reviews.llvm.org/D70254
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
reductions.
We were previously pushing all intrinsics used in a function to the
worklist. This is wasteful for memory in a function with a lot of
intrinsics.
We also ask TTI if we should expand every intrinsic, but we only
have expansion support for the reduction intrinsics. This just
wastes time for the non-reduction intrinsics.
This patch only pushes reduction intrinsics into the worklist and
skips other intrinsics.
Differential Revision: https://reviews.llvm.org/D69470
|
| |
|
|
|
|
|
|
| |
I reviewed the diff hunks of 05da2fe52162c80dfa that don't contain
'#include' lines, and found two unintended changes. I deleted a header
banner inadvertently while inserting a header, and changed the
indentation of a constructor in an odd way. Add back the banner, and
reformat the constructor.
|
| | |
|
| | |
|
| |
|
|
|
|
|
| |
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
|
| |
|
|
|
|
|
| |
This method is private and only called from this file and doesn't need
to be inline. Saves a TargetMachine.h include in MachineFunction.h, a
popular header. The include was introduced in 98603a8153086 despite the
forward decl of LLVMTargetMachine.
|
| |
|
|
|
|
|
|
|
|
| |
type break down for v256i1 and other types to be stored correctly
v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.
This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.
Differential Revision: https://reviews.llvm.org/D70138
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done with updating the live-intervals and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information on how the live-ranges that are being updated look like.
This is problematic for updates that rely on the IR to accurately
represents the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about how the code should
look like after the coalescer will have rewritten the registers.
Essentially this captures how a subregister index with be offseted
to match its position in a new register class.
E.g., let say we want to merge:
V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
V2: sub0 sub1 sub2 sub3
V1: <offset> sub0 sub1
This offset will look like a composed subregidx in the the class:
V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
=> V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.
Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompile as we would drop some live-ranges and
thus, miss some interferences.
For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)
This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.
looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.
None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1 and
and #2, so this patch also artificial enables subreg liveness for ARM,
so that a nice test case can be attached.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Entry values are considered for parameters that have register-described
DBG_VALUEs in the entry block (along with other conditions).
If a parameter's value has been propagated from the caller to the
callee, then the parameter's DBG_VALUE in the entry block may be
described using a register defined by some instruction, and entry values
should not be emitted for the parameter, which can currently occur.
One such case was seen in the attached test case, in which the second
parameter, which is described by a redefinition of the first parameter's
register, would incorrectly get an entry value using the first
parameter's register. This commit intends to solve such cases by keeping
track of register defines, and ignoring DBG_VALUEs in the entry block
that are described by such registers.
In a RelWithDebInfo build of clang-8, the average size of the set was
27, and in a RelWithDebInfo+ASan build it was 30.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D69889
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The conditions that are used to determine if entry values should be
emitted for a parameter are quite many, and will grow slightly
in a follow-up commit, so move those to a helper function, as was
suggested in the code review for D69889.
Reviewers: djtodoro, NikolaPrica
Reviewed By: djtodoro
Subscribers: probinson, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69955
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.
Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70080
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by
```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.
Updates the MSP430 target with a custom implementation.
This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this.
Existing tests apply, a few more have been added.
Reviewers: asl, spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70042
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In MachineCopyPropagation, when propagating the source of a copy into
the operand of a later instruction, bail if a destination overlaps
(partly defines) the copy source. If the instruction where the
substitution is happening is also a copy, allowing the propagation
confuses the tracking mechanism.
Differential Revision: https://reviews.llvm.org/D69953
Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch redefines freeze instruction from being UnaryOperator to a subclass of UnaryInstruction.
ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
`freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor has visitFreeze now because freeze is not unaryop anymore.
Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri
Reviewed By: craig.topper, lebedev.ri
Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69932
|
| |
|
|
|
|
|
|
| |
For XCOFF, globals mapped into the .bss section are linked as COMMON
definitions. This behaviour is incorrect for zero initialized data, so
emit those to the .data section instead.
Differential Revision: https://reviews.llvm.org/D69528
|
| |
|
|
|
|
|
|
|
| |
In current Hoist() function of machine licm pass, it will not check the source and destination basic block frequencies that a instruction is hoisted from/to.
There is a chance that instruction is hoisted from a cold to a hot basic block.
In this patch, we add options to disable machine instruction hoisting if destination block is hotter.
Differential Revision: https://reviews.llvm.org/D63676
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The new experimental expansion has a problem when a value has a data
dependency with an instruction from a previous stage. This is due to
the way we peel out the kernel. To fix that I'm changing the way we
peel out the kernel. We now peel the kernel NumberStage - 1 times.
The code would be correct at this point if we didn't have to handle
cases where the loop iteration is smaller than the number of stages.
To handle this case we move instructions between different epilogues
based on their stage and remap the PHI instructions correctly.
Differential Revision: https://reviews.llvm.org/D69538
|
| |
|
|
|
|
|
|
|
| |
Simple change to call target hook analyzeLoopForPipelining before
changing the loop. After peeling analyzing the loop may be more
complicated for target that don't have a loop instruction. This doesn't
affect Hexagone and PPC as they have hardware loop instructions.
Differential Revision: https://reviews.llvm.org/D69912
|