llvm-svn: 248647
|
Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.
llvm-svn: 248646
|
llvm-svn: 248644
|
Since r248294, we emit clrex, but it doesn't exist on v6.
llvm-svn: 248640
|
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.
Depends on D12948
Depends on D12949
The original checkin, r248608, had to be backed out due to an issue with
an ObjCXX unit test. That issue is now fixed, so re-landing.
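As a concrete instance (an editor's sketch, not part of the patch): in the C++ loop below, the branch that takes the backedge is literally the comparison i u< N, i.e. exactly the {0,+,1} u< N guard described above.
// Editor's sketch: the backedge of this loop is only taken when i < N,
// so the canonical induction variable {0,+,1} is u< N on every
// backedge-taken iteration.
void zeroFill(unsigned N, int *A) {
  for (unsigned i = 0; i < N; ++i) // backedge guarded by i u< N
    A[i] = 0;                      // facts implied by i u< N can be proven here
}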
Reviewers: atrick, reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12950
llvm-svn: 248638
|
Summary:
This change teaches SCEV's `isImpliedCond` two new identities:
A u< B u< -C => (A + C) u< (B + C)
A s< B s< INT_MIN - C => (A + C) s< (B + C)
While these are useful on their own, they're really intended to support
D12950.
The original checkin, r248606, had to be backed out due to an issue with
an ObjCXX unit test. That issue is now fixed, so re-landing.
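An editor's note on why the unsigned identity is sound, reading -C as 2^n - C for n-bit values:
\[
  A <_u B \;\wedge\; B <_u 2^n - C
  \;\Longrightarrow\; A + C \,<\, B + C \,<\, 2^n \quad (\text{as mathematical integers, so neither sum wraps})
  \;\Longrightarrow\; (A + C) <_u (B + C).
\]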
Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
Subscribers: aadg, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12948
llvm-svn: 248637
|
I realized that the live-out set computed for the return block is
missing the callee-saved registers (the non-pristine ones, to be exact).
This only affects the liveness computed for instructions inside the
function epilogue, which none of the current LivePhysRegs users in LLVM
care about, so this is just a drive-by fix without a testcase.
Differential Revision: http://reviews.llvm.org/D13180
llvm-svn: 248636
|
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723
My first attempt at this was to change what I thought was the root problem:
xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32
...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!
My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I
abandoned that idea.
I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):
bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb %sil, %dil
xorb $1, %dil
movb %dil, %al
retq
Differential Revision: http://reviews.llvm.org/D12705
llvm-svn: 248634
|
BranchProbability is currently represented by its numerator and denominator, both of uint32_t type. This patch changes that representation into a fixed point that is represented by a uint32_t numerator and a constant denominator of 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons to this change:
Pros:
1. It uses only half the space of the current representation.
2. Some operations are much faster, such as addition, subtraction, comparison, and scaling by an integer.
Cons:
1. Constructing a probability from an arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before, as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 2/3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probabilities; if we use it as a rational value for some precise calculations, we may need another construct like ValueRatio.
One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probabilities instead of weights when adding successors to an MBB. The extra space of the current BranchProbability is a concern there.
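A minimal standalone sketch of the fixed-point scheme (an editor's illustration; the names and interface are not the actual llvm::BranchProbability API):
#include <cassert>
#include <cstdint>
class FixedProb {
  static constexpr uint32_t D = 1u << 31; // constant denominator
  uint32_t N;                             // numerator, 0 <= N <= D
public:
  // Constructing from an arbitrary Num/Denom needs a scaling step
  // (the extra work listed under Con 1).
  FixedProb(uint32_t Num, uint32_t Denom)
      : N(uint64_t(Num) * D / Denom) { assert(Num <= Denom); }
  // Comparison and complement become single integer operations (Pro 2).
  bool operator<(FixedProb O) const { return N < O.N; }
  FixedProb inverse() const { FixedProb P(0, 1); P.N = D - N; return P; }
  // Scaling an integer is one multiply and one shift (assumes the
  // 64-bit product does not overflow).
  uint64_t scale(uint64_t X) const { return (uint64_t(N) * X) >> 31; }
};
Under this encoding, FixedProb(1, 3).inverse() stores numerator 1431655766 while FixedProb(2, 3) stores 1431655765, which is exactly the one-unit discrepancy described in Con 2.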
Differential revision: http://reviews.llvm.org/D12603
llvm-svn: 248633
|
Print simple operands inline instead of their pointer/value number.
Simple operands are SDNodes without predecessors like Constant(FP), Register,
UNDEF. This unifies the behaviour with dumpr() which was already doing this.
Previously:
t0: ch = EntryToken
t1: i64 = Register %vreg0
t2: i64,ch = CopyFromReg t0, t1
t3: i64 = Constant<1>
t4: i64 = add t2, t3
t5: i64 = Constant<2>
t6: i64 = add t2, t5
t10: i64 = undef
t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>
Now:
t0: ch = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
t4: i64 = add t2, Constant:i64<1>
t6: i64 = add t2, Constant:i64<2>
t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64
Differential Revision: http://reviews.llvm.org/D12567
llvm-svn: 248628
|
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.
llvm-svn: 248627
|
This is the simpler check. NFC.
llvm-svn: 248625
|
This makes it more convenient to print lane masks and leads to more
uniform printing.
llvm-svn: 248624
|
appropriate; NFC
llvm-svn: 248623
|
access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).
The patch was reverted because an AArch64 target could enter an infinite loop after the change in DAGCombiner
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth: it reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.
This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving
the existing (perhaps questionable) lowering behavior.
The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.
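The shape of the fix, as an editor's standalone sketch (illustrative names, not the real TargetLowering hook signature):
#include <cstdint>
// Editor's sketch: legality and speed of a misaligned access are separate
// answers. The bug was answering "fast" for every width even though the
// target's own combine later split 128-bit unaligned accesses as slow.
struct MisalignedAccessPolicy {
  bool AllowsUnaligned; // subtarget permits unaligned accesses (assumed flag)
  bool allowsMisalignedAccess(unsigned BitWidth, bool *Fast) const {
    if (!AllowsUnaligned)
      return false;              // not legal at all on this subtarget
    if (Fast)
      *Fast = (BitWidth != 128); // wide accesses get split later, so not fast
    return true;                 // legal either way
  }
};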
Differential Revision: http://reviews.llvm.org/D12635
llvm-svn: 248622
|
The algorithm would not modify the live-in lists of blocks below the save
point, which is correct unless the save point happens to be a restore
point at the same time.
Also fixes the benign issue of live-in registers being added twice in
some cases.
The testcase is based on a test submitted by Kit Barton.
Differential Revision: http://reviews.llvm.org/D13176
llvm-svn: 248620
|
Reviewers: arsenm, grosbach, rafael
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D12424
llvm-svn: 248619
|
omitted
Summary:
The default behavior is to omit the .section directive for .text, .data,
and sometimes .bss, but some targets may want to omit this directive for
other sections too.
The AMDGPU backend will use this to emit a simplified syntax for section
switches. For example, if the section directive is not omitted (the current
behavior), switches to the .hsatext section will be printed like this:
.section .hsatext,#alloc,#execinstr,#write
This is actually wrong, because .hsatext has some custom STT_* flags,
which MC doesn't know how to print or parse.
If the section directive is omitted (made possible by this commit),
section switches will be printed like this:
.hsatext
The motivation for this patch is to make it possible to emit sections
with custom STT_* flags without having to teach MC about all the target
specific STT_* flags.
Reviewers: rafael, grosbach
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12423
llvm-svn: 248618
|
llvm-svn: 248617
|
r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible"
r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts."
llvm-svn: 248614
|
llvm-svn: 248613
|
If a virtual register is copied and another copy was already
seen, replace it with the previous copy. This only handles the
simplest cases for now.
This pattern shows up from various operand restrictions
AMDGPU has which require inserting copies depending
on the register class of the operands.
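A toy sketch of the idea (editor's illustration; it ignores register classes, dominance, and kill flags, which the real pass must respect):
#include <map>
#include <vector>
using Reg = unsigned;
struct Copy { Reg Dst, Src; };
// If the same source register was already copied once, later copies can be
// replaced by the first copy's destination.
void reuseCopies(const std::vector<Copy> &Copies,
                 std::map<Reg, Reg> &RewriteMap) {
  std::map<Reg, Reg> FirstDstForSrc;
  for (const Copy &C : Copies) {
    auto It = FirstDstForSrc.find(C.Src);
    if (It != FirstDstForSrc.end())
      RewriteMap[C.Dst] = It->second; // reuse the earlier copy's result
    else
      FirstDstForSrc[C.Src] = C.Dst;  // first copy of this source seen
  }
}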
llvm-svn: 248611
|
llvm-svn: 248610
|
llvm-svn: 248609
|
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.
Depends on D12948
Depends on D12949
Reviewers: atrick, reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12950
llvm-svn: 248608
|
Summary:
This new helper routine will be used in a subsequent change.
Reviewers: hfinkel
Subscribers: hfinkel, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12949
llvm-svn: 248607
|
Summary:
This change teaches SCEV's `isImpliedCond` two new identities:
A u< B u< -C => (A + C) u< (B + C)
A s< B s< INT_MIN - C => (A + C) s< (B + C)
While these are useful on their own, they're really intended to support
D12950.
Reviewers: atrick, reames, majnemer, nlewycky, hfinkel
Subscribers: aadg, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12948
llvm-svn: 248606
|
This matches how it is defined in the generated implementation.
llvm-svn: 248598
|
llvm-svn: 248593
|
Don't run passes related to stack maps, garbage collection, and
exceptions, since these aren't useful for GPUs.
There might be a few more to turn off that I'm less sure about
(e.g. ShrinkWrapping), or that I'm not sure how to disable
(SafeStack and StackProtector).
llvm-svn: 248591
|
This fixes a select error when the i64 source was also
bitcasted to v2i32 in the original source.
Instead of awkwardly trying to select the modified source value and
the store, replace before isel begins.
Uses a worklist to avoid possible problems from mutating the DAG,
although it seems to work OK without it.
llvm-svn: 248589
|
SIFixSGPRCopies does not modify the CFG, but this was
being recomputed before running SIFoldOperands.
llvm-svn: 248587
|
When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
is actually emitting such a REG_SEQUENCE now.
If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.
llvm-svn: 248585
|
This avoids needing to re-legalize the new REG_SEQUENCE.
llvm-svn: 248584
|
This was only set on the final _si/_vi version, but not
on the pseudos most of codegen sees.
No test since these instructions aren't used yet.
llvm-svn: 248583
|
These were all using the default 32-bit VALU write class,
but the i64/f64 compares are half rate.
I'm not sure this is really correct, because they are still using
the write-to-VALU write class, even though they really write
to the SALU.
llvm-svn: 248582
|
Arguments to function calls marked "nocapture" can be marked as
non-escaping. However, nocapture is defined in terms of the lifetime
of the callee, and if the callee can directly or indirectly recurse to
the caller, the semantics of nocapture are invalid.
Therefore, we eagerly discover which SCC each function belongs to,
and later can check if callee and caller of a callsite belong to
the same SCC, in which case there could be recursion.
This means that we can't be so optimistic in
getModRefInfo(ImmutableCallsite) - previously we assumed all call
arguments never aliased with an escaping global. Now we need to check,
because a global could now be passed as an argument but still not
escape.
This also solves a related conformance problem: MemCpyOptimizer can
turn non-escaping stores of globals into calls to intrinsics like
llvm.memcpy/llvm.memset. This confuses GlobalsAA, which knows the
global can't escape and so returns NoModRef when queried, when
obviously a memcpy/memset call does indeed reference and modify its
arguments.
This fixes PR24800, PR24801, and PR24802.
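An editor's sketch of the SCC-based recursion check (hypothetical names, not the GlobalsAA internals): nocapture on a call argument can only be trusted if the callee cannot, directly or indirectly, call back into the caller.
#include <map>
using FuncId = unsigned;
// SCC numbers computed eagerly in a bottom-up walk of the call graph.
static std::map<FuncId, unsigned> SCCNumber;
bool mayRecurseToCaller(FuncId Caller, FuncId Callee) {
  // Same SCC means some chain of calls can lead from the callee back to
  // the caller, so a nocapture pointer may outlive the call.
  return SCCNumber.at(Caller) == SCCNumber.at(Callee);
}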
llvm-svn: 248576
|
The value was only used in an assertion. Sink the variable usage into the
assertion.
llvm-svn: 248562
|
We now emit the compiler-generated divide-by-zero check that was needed for the
MSVC routines. We construct a pseudo-instruction for the DBZ check, as the
operation requires splitting up the BB. For the 64-bit operations, we need to
custom-expand the node, as we need to insert the DBZ check and then emit the
libcall to the appropriate name. Because this is target specific, it seemed
better to reproduce the expansion operation from the target-agnostic type
legalization than to sink it there just to avoid the duplication. The division
library calls now match MSVC semantically.
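In C++ terms, the expansion is semantically this (an editor's sketch; the runtime routines below are self-contained stand-ins, not the exact MSVC helper symbols):
#include <cstdlib>
static void rtDiv0() { std::abort(); }                // stand-in DBZ handler
static long long rtSDiv64(long long N, long long D) { // stand-in libcall
  return N / D;
}
long long expandedSDiv64(long long Num, long long Den) {
  if (Den == 0)
    rtDiv0();                  // the compiler-generated divide-by-zero check
  return rtSDiv64(Num, Den);   // then the ordinary division library call
}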
llvm-svn: 248561
|
llvm-svn: 248553
|
Summary:
This also adds the first set of tests for operand bundles.
The optimizer has not been audited to ensure that it does the right
thing with operand bundles.
Depends on D12456.
Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner
Subscribers: maksfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D12457
llvm-svn: 248551
|
llvm-svn: 248549
|
In this context, MI is an add/sub instruction, not a load/store.
llvm-svn: 248540
|
element
Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H.
llvm-svn: 248536
|
llvm-svn: 248533
|
These are necessary for implementing mem_fence for
OpenCL 2.0.
The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.
llvm-svn: 248532
|
The pre- and post-increment versions update the base register, but the
post-increment version was defined incorrectly. There is no test case, as we
don't currently generate these instructions, but I plan on changing that in
the near future.
llvm-svn: 248528
|
Summary:
This change teaches `CallInst`s and `InvokeInst`s to maintain a set of
operand bundles as part of their operands. `CallInst`s and `InvokeInst`s
with operand bundles co-allocate some space before their `Use` array to
hold meta-information about which of their operands are part of an operand
bundle.
The strings corresponding to the bundle tags are interned into
`LLVMContextImpl::BundleTagCache`
This change does not include any parsing / bitcode support. That's the
next change.
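An editor's toy model of the layout idea (illustrative, not LLVM's real BundleOpInfo): descriptors owned by the instruction record which slice of the operand (Use) array belongs to which interned bundle tag.
#include <cstdint>
#include <vector>
struct BundleDesc {
  uint32_t TagId;      // index into the interned tag table
  uint32_t Begin, End; // half-open range into the operand array
};
struct CallLike {
  std::vector<BundleDesc> Bundles; // models the co-allocated prefix
  std::vector<int> Operands;       // stands in for the Use array
  bool isBundleOperand(unsigned Idx) const {
    for (const BundleDesc &B : Bundles)
      if (B.Begin <= Idx && Idx < B.End)
        return true; // this operand belongs to some operand bundle
    return false;    // ordinary callee/argument operand
  }
};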
Depends on D12455.
Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner
Subscribers: MatzeB, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12456
llvm-svn: 248527
|
Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a
hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with
a FIXME: attached.
This patch changes the handling of +t2dsp to be in line with other
architecture extensions.
Following a revert of r248152 and new review comments, this patch also includes
renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc.
The spelling of "t2dsp" is preserved, pending a further investigation of its
possible external usage.
Differential Revision: http://reviews.llvm.org/D12937
llvm-svn: 248519
|
If the shifter operand is a constant, and all of the bits shifted out
are known to be zero, then if X is known non-zero at least one
non-zero bit must remain.
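A worked instance of the rule (editor's example): X is non-zero and its top 8 bits are known zero, so X << 8 discards only zero bits and must stay non-zero.
#include <cassert>
#include <cstdint>
int main() {
  uint32_t X = 0x00ABCDEF; // non-zero, top 8 bits known zero
  assert((X >> 24) == 0);  // the bits that << 8 would shift out are all zero
  assert((X << 8) != 0);   // hence at least one set bit remains
}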
llvm-svn: 248508