| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
The newly added VSX D-Form (register + offset) memory ops target the upper half
of the VSX register set. The existing ones target the lower half. In order to
unify these and have the ability to target all the VSX registers using D-Form
operations, this patch defines Pseudo-ops for the loads/stores which are
expanded post-RA. The expansion then choses the correct opcode based on the
register that was allocated for the operation.
llvm-svn: 283212
|
|
|
|
| |
llvm-svn: 283191
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D23155
This patch removes the VSHRC register class (based on D20310) and adds
exploitation of the Power9 sub-word integer loads into VSX registers as well
as vector sign extensions.
The new instructions are useful for a few purposes:
Int to Fp conversions of 1 or 2-byte values loaded from memory
Building vectors of 1 or 2-byte integers with values loaded from memory
Storing individual 1 or 2-byte elements from integer vectors
This patch implements all of those uses.
llvm-svn: 283190
|
|
|
|
|
|
|
|
| |
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.
Causing crashes on aarch64 build.
llvm-svn: 283172
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
llvm-svn: 283164
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change enables soft-float for PowerPC64, and also makes soft-float disable
all vector instruction sets for both 32-bit and 64-bit modes. This latter part
is necessary because the PPC backend canonicalizes many Altivec vector types to
floating-point types, and so soft-float breaks scalarization support for many
operations. Both for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware floating-point
also disables vector instructions (embedded targets without hardware floating
point support are unlikely to have Altivec, etc. and operating system kernels
desiring not to use floating-point registers to lower syscall cost are unlikely
to want to use vector registers either). If someone needs this to work, we'll
need to change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is enabled,
hardware floating-point support needs to be expressed as a positive feature,
like the others, and not a negative feature, because target features cannot
have dependencies on the disabling of some other feature. So +soft-float has
now become -hard-float.
Fixes PR26970.
llvm-svn: 283060
|
|
|
|
|
|
|
|
| |
UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT
llvm-svn: 282604
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
enabled.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 282600
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D24396
This patch adds support for the "vector count trailing zeroes",
"vector compare not equal" and "vector compare not equal or zero instructions"
as well as "scalar count trailing zeroes" instructions. It also changes the
vector negation to use XXLNOR (when VSX is enabled) so as not to increase
register pressure (previously this was done with a splat immediate of all
ones followed by an XXLXOR). This was done because the altivec.h
builtins (patch to follow) use vector negation and the use of an additional
register for the splat immediate is not optimal.
llvm-svn: 282478
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D21135
This patch exploits the following instructions:
mtvsrws
lxvwsx
mtvsrdd
mfvsrld
In order to improve some build_vector and extractelement patterns.
llvm-svn: 282246
|
|
|
|
|
|
|
|
| |
Atomic comparison instructions use the sub-word load instruction on
Power8 and up but the value is not sign extended prior to the signed word
compare instruction. This patch adds that sign extension.
llvm-svn: 282182
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-up to r280705: Make sure that the SP is only restored after all data
is loaded from the stack frame, if there is no red zone.
This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24466
llvm-svn: 282174
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D19825
The new lxvx/stxvx instructions do not require the swaps to line the elements
up correctly. In order to select them over the lxvd2x/lxvw4x instructions which
require swaps, the patterns for the old instruction have a predicate that
ensures they won't be selected on Power9 and newer CPUs.
llvm-svn: 282143
|
|
|
|
| |
llvm-svn: 281759
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It was previously not possible for tools to use solely the stackmap
information emitted to reconstruct the return addresses of callsites in
the map, which is necessary to use the information to walk a stack. This
patch adds per-function callsite counts when emitting the stackmap
section in order to resolve the problem. Note that this slightly alters
the stackmap format, so external tools parsing these maps will need to
be updated.
**Problem Details:**
Records only store their offset from the beginning of the function they
belong to. While these records and the functions are output in program
order, it is not possible to determine where the end of one function's
records are without the callsite count when processing the records to
compute return addresses.
Patch by Kavon Farvardin!
Reviewers: atrick, ributzka, sanjoy
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D23487
llvm-svn: 281532
|
|
|
|
|
|
|
|
|
|
|
| |
This patch corresponds to review:
https://reviews.llvm.org/D24021
In the initial implementation of this instruction, I forgot to account for
variable indices. This patch fixes PR30189 and should probably be merged into
3.9.1 (I'll open a bug according to the new instructions).
llvm-svn: 281479
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D23764
llvm-svn: 281308
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.
Fixes PR30362. A program for upgrading test cases is attached to that
bug.
Differential Revision: http://reviews.llvm.org/D20147
llvm-svn: 281284
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When folding an addi into a memory access that can take an immediate offset, we
were implicitly assuming that the existing offset was zero. This was incorrect.
If we're dealing with an addi with a plain constant, we can add it to the
existing offset (assuming that doesn't overflow the immediate, etc.), but if we
have anything else (i.e. something that will become a relocation expression),
we'll go back to requiring the existing immediate offset to be zero (because we
don't know what the requirements on that relocation expression might be - e.g.
maybe it is paired with some addis in some relevant way).
On the other hand, when dealing with a plain addi with a regular constant
immediate, the alignment restrictions (from the TOC base pointer, etc.) are
irrelevant.
I've added the test case from PR30280, which demonstrated the bug, but also
demonstrates a missed optimization opportunity (i.e. we don't need the memory
accesses at all).
Fixes PR30280.
llvm-svn: 280789
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I might have called this "r246507, the sequel". It fixes the same issue, as the
issue has cropped up in a few more places. The underlying problem is that
isSetCCEquivalent can pick up select_cc nodes with a result type that is not
legal for a setcc node to have, and if we use that type to create new setcc
nodes, nothing fixes that (and so we've violated the contract that the
infrastructure has with the backend regarding setcc node types).
Fixes PR30276.
For convenience, here's the commit message from r246507, which explains the
problem is greater detail:
[DAGCombine] Fixup SETCC legality checking
SETCC is one of those special node types for which operation actions (legality,
etc.) is keyed off of an operand type, not the node's value type. This makes
sense because the value type of a legal SETCC node is determined by its
operands' value type (via the TLI function getSetCCResultType). When the
SDAGBuilder creates SETCC nodes, it either creates them with an MVT::i1 value
type, or directly with the value type provided by TLI.getSetCCResultType.
The first problem being fixed here is that DAGCombine had several places
querying TLI.isOperationLegal on SETCC, but providing the return of
getSetCCResultType, instead of the operand type directly. This does not mean
what the author thought, and "luckily", most in-tree targets have SETCC with
Custom lowering, instead of marking them Legal, so these checks return false
anyway.
The second problem being fixed here is that two of the DAGCombines could create
SETCC nodes with arbitrary (integer) value types; specifically, those that
would simplify:
(setcc a, b, op1) and|or (setcc a, b, op2) -> setcc a, b, op3
(which is possible for some combinations of (op1, op2))
If the operands of the and|or node are actual setcc nodes, then this is not an
issue (because the and|or must share the same type), but, the relevant code in
DAGCombiner::visitANDLike and DAGCombiner::visitORLike actually calls
DAGCombiner::isSetCCEquivalent on each operand, and that function will
recognise setcc-like select_cc nodes with other return types. And, thus, when
creating new SETCC nodes, we need to be careful to respect the value-type
constraint. This is even true before type legalization, because it is quite
possible for the SELECT_CC node to have a legal type that does not happen to
match the corresponding TLI.getSetCCResultType type.
To be explicit, there is nothing that later fixes the value types of SETCC
nodes (if the type is legal, but does not happen to match
TLI.getSetCCResultType). Creating SETCCs with an MVT::i1 value type seems to
work only because, either MVT::i1 is not legal, or it is what
TLI.getSetCCResultType returns if it is legal. Fixing that is a larger change,
however. For the time being, restrict the relevant transformations to produce
only SETCC nodes with a value type matching TLI.getSetCCResultType (or MVT::i1
prior to type legalization).
Fixes PR24636.
llvm-svn: 280767
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it
there is no guarantee that this part of the stack will not be modified
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.
This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24093
llvm-svn: 280705
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As it turns out, whether we zero-extend or sign-extend i8/i16 constants, which
are illegal types promoted to i32 on PowerPC, is a choice constrained by
assumptions within the infrastructure. Specifically, the logic in
FunctionLoweringInfo::ComputePHILiveOutRegInfo assumes that constant PHI
operands will be zero extended, and so, at least when materializing constants
that are PHI operands, we must do the same.
The rest of our fast-isel implementation does not appear to depend on the fact
that we were sign-extending i8/i16 constants, and all other targets also appear
to zero-extend small-bitwidth constants in fast-isel; we'll now do the same (we
had been doing this only for i1 constants, and sign-extending the others).
Fixes PR27721.
llvm-svn: 280614
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we have an offset into a global, etc. that is accessed relative to the TOC
base pointer, and the offset is larger than the minimum alignment of the global
itself and the TOC base pointer (which is 8-byte aligned), we can still fold
the @toc@ha into the memory access, but we must update the addis instruction's
symbol reference with the offset as the symbol addend. When there is only one
use of the addi to be folded and only one use of the addis that would need its
symbol's offset adjusted, then we can make the adjustment and fold the @toc@l
into the memory access.
llvm-svn: 280545
|
|
|
|
|
|
|
|
|
|
| |
As Sanjay suggested when he added the hook, PPC should return true from
hasAndNotCompare. We have an efficient negated 'and' on PPC (which can feed a
compare).
Fixes PR27203.
llvm-svn: 280457
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Following a suggestion by Sanjay, we should lower:
%shl = shl i32 1, %y
%and = and i32 %x, %shl
%cmp = icmp eq i32 %and, %shl
ret i1 %cmp
into:
subfic r4, r4, 32
rlwnm r3, r3, r4, 31, 31
Add this pattern and some associated patterns for the 64-bit case and the
not-equal case. Fixes PR27356.
llvm-svn: 280454
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
greater than 7
When applying our address-formation PPC64 peephole, we are reusing the @ha TOC
addis value with the low parts associated with different offsets (i.e.
different effective symbol addends). We were assuming this was okay so long as
the offsets were less than the alignment of the global variable being accessed.
This ignored the fact, however, that the TOC base pointer itself need only be
8-byte aligned. As a result, what we were doing is legal only for offsets less
than 8 regardless of the alignment of the object being accessed.
Fixes PR28727.
llvm-svn: 280441
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The logic in this function assumes that the P8 supports fusion of addis/addi,
but it does not. As a result, there is no advantage to restricting our peephole
application, merging addi instructions into dependent memory accesses, even
when the addi has multiple users, regardless of whether or not we're optimizing
for size.
We might need something like this again for the P9; I suspect we'll revisit
this code when we work on P9 tuning.
llvm-svn: 280440
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible
__builtin_dwarf_cfa() builtin. As pointed out in PR26761, this is currently
broken on PowerPC (and likely on ARM as well). Currently, @llvm.eh.dwarf.cfa is
lowered using:
ADD(FRAMEADDR, FRAME_TO_ARGS_OFFSET)
where FRAME_TO_ARGS_OFFSET defaults to the constant zero. On x86,
FRAME_TO_ARGS_OFFSET is lowered to 2*SlotSize. This setup, however, does not
work for PowerPC. Because of the way that the stack layout works, the canonical
frame address is not exactly (FRAMEADDR + FRAME_TO_ARGS_OFFSET) on PowerPC
(there is a lower save-area offset as well), so it is not just a matter of
implementing FRAME_TO_ARGS_OFFSET for PowerPC (unless we redefine its
semantics -- We can do that, since it is currently used only for
@llvm.eh.dwarf.cfa lowering, but the better to directly lower the CFA construct
itself (since it can be easily represented as a fixed-offset FrameIndex)). Mips
currently does this, but by using a custom lowering for ADD that specifically
recognizes the (FRAMEADDR, FRAME_TO_ARGS_OFFSET) pattern.
This change introduces a ISD::EH_DWARF_CFA node, which by default expands using
the existing logic, but can be directly lowered by the target. Mips is updated
to use this method (which simplifies its implementation, and I suspect makes it
more robust), and updates PowerPC to do the same.
Fixes PR26761.
Differential Revision: https://reviews.llvm.org/D24038
llvm-svn: 280350
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As discussed in https://reviews.llvm.org/D22666, our current mechanism to
support -pg profiling, where we insert calls to mcount(), or some similar
function, is fundamentally broken. We insert these calls in the frontend, which
means they get duplicated when inlining, and so the accumulated execution
counts for the inlined-into functions are wrong.
Because we don't want the presence of these functions to affect optimizaton,
they should be inserted in the backend. Here's a pass which would do just that.
The knowledge of the name of the counting function lives in the frontend, so
we're passing it here as a function attribute. Clang will be updated to use
this mechanism.
Differential Revision: https://reviews.llvm.org/D22825
llvm-svn: 280347
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a function contains something, such as inline asm, which explicitly
clobbers the register used as the frame pointer, don't spill it twice. If we
need a frame pointer, it will be saved/restored in the prologue/epilogue code.
Explicitly spilling it again will reuse the same spill slot used by the
prologue/epilogue code, thus clobbering the saved value. The same applies
to the base-pointer or PIC-base register.
Partially fixes PR26856. Thanks to Ulrich for his analysis and the small
inline-asm reproducer.
llvm-svn: 280188
|
|
|
|
|
|
|
|
|
| |
Implement Bill's suggested fix for 32-bit targets for PR22711 (for the
alignment of each entry). As pointed out in the bug report, we could just force
the section alignment, since we only add pointer-sized things currently, but
this fix is somewhat more future-proof.
llvm-svn: 280049
|
|
|
|
|
|
|
|
|
|
|
| |
The "long call" option forces the use of the indirect calling sequence for all
calls (even those that don't really need it). GCC provides this option; This is
helpful, under certain circumstances, for building very-large binaries, and
some other specialized use cases.
Fixes PR19098.
llvm-svn: 280040
|
|
|
|
|
|
| |
Otherwise, running the test on Darwin systems will not work.
llvm-svn: 280034
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For little-Endian PowerPC, we generally target only P8 and later by default.
However, generic (older) 64-bit configurations are still an option, and in that
case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics
without true i8/i16 atomic operations, we emulate using i32 atomics in
combination with a bunch of shifting and masking, etc. The amount by which to
shift in little-Endian mode is different from the amount in big-Endian mode (it
is inverted -- meaning we can leave off the xor when computing the amount).
Fixes PR22923.
llvm-svn: 280022
|
|
|
|
|
|
| |
Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.
llvm-svn: 279933
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.
Differential Revision: http://reviews.llvm.org/D23850
llvm-svn: 279698
|
|
|
|
| |
llvm-svn: 279680
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
print/parser it
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, there
is not need to change it or save/parse it in a .mir file.
Make the field const and move the initialization LiveIntervalAnalysis to the
MachineRegisterInfo constructor. Also cleanup some code and fix some
instances which better use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().
llvm-svn: 279676
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.
If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.
Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.
define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
%tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1]
br i1 %tmp1434, label %bb17, label %bb.outer
bb.outer: ; preds = %cond_false, %entry
%b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
%a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
br label %bb
bb: ; preds = %cond_true, %bb.outer
%indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
%tmp. = sub i32 0, %b_addr.021.0.ph
%tmp.40 = mul i32 %indvar, %tmp.
%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
%tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
br i1 %tmp3, label %cond_true, label %cond_false
cond_true: ; preds = %bb
%tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
%tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
%indvar.next = add i32 %indvar, 1
br i1 %tmp1437, label %bb17, label %bb
cond_false: ; preds = %bb
%tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
%tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
br i1 %tmp14, label %bb17, label %bb.outer
bb17: ; preds = %cond_false, %cond_true, %entry
%a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
ret i32 %a_addr.026.1
}
Without tail-merging or diamond-tail if conversion:
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ble LBB1_3
@ BB#2: @ %cond_true
@ in Loop: Header=BB1_1 Depth=1
subs r0, r0, r1
cmp r1, r0
it ne
cmpne r0, r1
bgt LBB1_4
LBB1_3: @ %cond_false
@ in Loop: Header=BB1_1 Depth=1
subs r1, r1, r0
cmp r1, r0
bne LBB1_1
LBB1_4: @ %bb17
bx lr
With diamond-tail if conversion, but without tail-merging:
@ BB#0: @ %entry
cmp r0, r1
it eq
bxeq lr
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ite le
suble r1, r1, r0
subgt r0, r0, r1
cmp r1, r0
bne LBB1_1
@ BB#2: @ %bb17
bx lr
llvm-svn: 279671
|
|
|
|
|
|
|
|
|
| |
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.
Differential Revision: http://reviews.llvm.org/D22722
llvm-svn: 279600
|
|
|
|
|
|
| |
This reverts commit 0fda93481c4231c06b838ef476c0c404c51ff875.
llvm-svn: 279288
|
|
|
|
|
|
|
|
|
|
|
| |
The ppc64 multistage bot fails on this.
This reverts commit r279124.
Also Revert "CodeGen: Add/Factor out LiveRegUnits class; NFCI" because it depends on the previous change
This reverts commit r279171.
llvm-svn: 279199
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.
If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.
Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.
Regression on self-hosting bots with no obvious explanation. Tidied up range
handling to be more obviously correct, but there was no smoking gun.
define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
%tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1]
br i1 %tmp1434, label %bb17, label %bb.outer
bb.outer: ; preds = %cond_false, %entry
%b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
%a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
br label %bb
bb: ; preds = %cond_true, %bb.outer
%indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
%tmp. = sub i32 0, %b_addr.021.0.ph
%tmp.40 = mul i32 %indvar, %tmp.
%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
%tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
br i1 %tmp3, label %cond_true, label %cond_false
cond_true: ; preds = %bb
%tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
%tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
%indvar.next = add i32 %indvar, 1
br i1 %tmp1437, label %bb17, label %bb
cond_false: ; preds = %bb
%tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
%tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
br i1 %tmp14, label %bb17, label %bb.outer
bb17: ; preds = %cond_false, %cond_true, %entry
%a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
ret i32 %a_addr.026.1
}
Without tail-merging or diamond-tail if conversion:
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ble LBB1_3
@ BB#2: @ %cond_true
@ in Loop: Header=BB1_1 Depth=1
subs r0, r0, r1
cmp r1, r0
it ne
cmpne r0, r1
bgt LBB1_4
LBB1_3: @ %cond_false
@ in Loop: Header=BB1_1 Depth=1
subs r1, r1, r0
cmp r1, r0
bne LBB1_1
LBB1_4: @ %bb17
bx lr
With diamond-tail if conversion, but without tail-merging:
@ BB#0: @ %entry
cmp r0, r1
it eq
bxeq lr
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ite le
suble r1, r1, r0
subgt r0, r0, r1
cmp r1, r0
bne LBB1_1
@ BB#2: @ %bb17
bx lr
llvm-svn: 279168
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply r276044 with off-by-1 instruction fix for the reload placement.
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.
This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.
Differential Revision: http://reviews.llvm.org/D21885
llvm-svn: 279124
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a quick work around, because in some cases, e.g. caller's stack
size > callee's stack size, we are still able to apply sibling call
optimization even callee has any byval arg.
This patch fix: https://llvm.org/bugs/show_bug.cgi?id=28328
Reviewers: hfinkel kbarton nemanjai amehsan
Subscribers: hans, tjablin
https://reviews.llvm.org/D23441
llvm-svn: 278900
|
|
|
|
|
|
|
|
|
|
|
|
| |
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.
Submitted wit a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.
llvm-svn: 278866
|
|
|
|
|
|
|
|
|
| |
This reverts commit r278287.
This commit broke the clang-cmake-thumbv7-a15-full-sh bot.
See https://llvm.org/bugs/show_bug.cgi?id=28949
llvm-svn: 278621
|
|
|
|
|
|
|
|
|
| |
This reverts commit r278288.
r278287 broke the clang-cmake-thumbv7-a15-full-sh bot.
Revert this so we can get to r278287.
llvm-svn: 278620
|
|
|
|
|
|
|
|
|
|
|
|
| |
If AnalyzeBranch can't analyze a block and it is possible to
fallthrough, then duplicating the block doesn't make sense, as only one
block can be the layout predecessor for the un-analyzable fallthrough.
Submitted wit a test case, but NOTE: the test case doesn't currently
fail. However, the test case fails with D20505 and would have saved me
some time debugging.
llvm-svn: 278288
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.
If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.
Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.
define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
%tmp1434 = icmp eq i32 %a, %b ; <i1> [#uses=1]
br i1 %tmp1434, label %bb17, label %bb.outer
bb.outer: ; preds = %cond_false, %entry
%b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
%a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
br label %bb
bb: ; preds = %cond_true, %bb.outer
%indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
%tmp. = sub i32 0, %b_addr.021.0.ph
%tmp.40 = mul i32 %indvar, %tmp.
%a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
%tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
br i1 %tmp3, label %cond_true, label %cond_false
cond_true: ; preds = %bb
%tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
%tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
%indvar.next = add i32 %indvar, 1
br i1 %tmp1437, label %bb17, label %bb
cond_false: ; preds = %bb
%tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
%tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
br i1 %tmp14, label %bb17, label %bb.outer
bb17: ; preds = %cond_false, %cond_true, %entry
%a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
ret i32 %a_addr.026.1
}
Without tail-merging or diamond-tail if conversion:
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ble LBB1_3
@ BB#2: @ %cond_true
@ in Loop: Header=BB1_1 Depth=1
subs r0, r0, r1
cmp r1, r0
it ne
cmpne r0, r1
bgt LBB1_4
LBB1_3: @ %cond_false
@ in Loop: Header=BB1_1 Depth=1
subs r1, r1, r0
cmp r1, r0
bne LBB1_1
LBB1_4: @ %bb17
bx lr
With diamond-tail if conversion, but without tail-merging:
@ BB#0: @ %entry
cmp r0, r1
it eq
bxeq lr
LBB1_1: @ %bb
@ =>This Inner Loop Header: Depth=1
cmp r0, r1
ite le
suble r1, r1, r0
subgt r0, r0, r1
cmp r1, r0
bne LBB1_1
@ BB#2: @ %bb17
bx lr
llvm-svn: 278287
|