| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 314105
|
| |
|
|
|
|
| |
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
llvm-svn: 314083
|
| |
|
|
|
|
|
|
| |
IFMA instructions when the load is on operand1 of the instrinsic.
We need to enable commuting during isel to catch this since the load folding tables can't handle broadcasts.
llvm-svn: 314082
|
| |
|
|
|
|
| |
commutable for the multiply operands.
llvm-svn: 314080
|
| |
|
|
|
|
| |
elements (PR22415)
llvm-svn: 314077
|
| |
|
|
|
|
|
|
|
|
| |
This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.
Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
Differential Revision: https://reviews.llvm.org/D35320
llvm-svn: 314076
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314073
|
| |
|
|
|
|
|
|
| |
instructions when VLX isn't available.
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
llvm-svn: 314072
|
| |
|
|
| |
llvm-svn: 314068
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up from D38181 (r314023). We have to put 64-bit
constants into a register using a separate instruction, so we
should try harder to avoid that.
From what I see, we're not likely to encounter this pattern in the
DAG because the upstream setcc combines from this don't (usually?)
produce this pattern. If we fix that, then this will become more
relevant. Since the cost of handling this case is just loosening
the predicate of the existing fold, we might as well do it now.
llvm-svn: 314064
|
| |
|
|
| |
llvm-svn: 314063
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314062
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314060
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314055
|
| |
|
|
| |
llvm-svn: 314027
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81.
There are also better improvements based on known-bits that allow us to eliminate the mask entirely.
As noted, this could be extended. There are potentially other wins from always shifting first, but doing
that reveals a tangle of problems in other pattern matching. We do this transform generically in
instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this
in the backend.
Differential Revision: https://reviews.llvm.org/D38181
llvm-svn: 314023
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
Reviewers: dberris, echristo
Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D38102
llvm-svn: 314005
|
| |
|
|
|
|
|
|
|
| |
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Patch by Suyog Sarda!
llvm-svn: 313993
|
| |
|
|
| |
llvm-svn: 313986
|
| |
|
|
| |
llvm-svn: 313985
|
| |
|
|
| |
llvm-svn: 313984
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
constant arguments
Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64].
One example of where it is useful is:
before (20 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ax
mov $0xffff,%cx
cmovne %ax,%cx
movzwl %cx,%eax
retq
after (18 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ecx
mov $0xffff,%eax
cmovne %ecx,%eax
retq
Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36711
llvm-svn: 313982
|
| |
|
|
|
|
|
|
|
|
| |
This patch re-commits the patch that was pulled out due to a
problem it caused, but with a fix for the problem. The fix
was reviewed separately by Eric Christopher and Hal Finkel.
Differential Revision: https://reviews.llvm.org/D38054
llvm-svn: 313978
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the following function:
double fn1(double d0, double d1, double d2) {
double a = -d0 - d1 * d2;
return a;
}
on ARM, LLVM generates code along the lines of
vneg.f64 d0, d0
vmls.f64 d0, d1, d2
i.e., a negate and a multiply-subtract.
The attached patch adds instruction selection patterns to allow it to generate the single instruction
vnmla.f64 d0, d1, d2
(multiply-add with negation) instead, like GCC does.
Committed on behalf of @gergo- (Gergö Barany)
Differential Revision: https://reviews.llvm.org/D35911
llvm-svn: 313972
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D38163
llvm-svn: 313964
|
| |
|
|
|
|
|
|
|
| |
The previous SwiftCC support for AAPCS64 was partially correct. It
setup swiftself parameters in the proper register but failed to setup
swifterror in the correct register. This would break compilation of
swift code for non-Darwin AAPCS64 conforming environments.
llvm-svn: 313956
|
| |
|
|
| |
llvm-svn: 313936
|
| |
|
|
|
|
|
|
|
|
| |
HexagonVectorLoopCarriedReuse pass"
This reverts commit r313926.
It was failing in some bots.
llvm-svn: 313931
|
| |
|
|
|
|
| |
added HexagonVectorLoopCarriedReuse pass
llvm-svn: 313926
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Avoid using XZR/WZR directly as operands to split stores of zero
vectors. Doing so can lead to the XZR/WZR being used by an instruction
that doesn't allow it (e.g. add).
Fixes bug 34674.
Reviewers: t.p.northover, efriedma, MatzeB
Subscribers: aemerson, rengolin, javed.absar, mcrosier, eraman, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38146
llvm-svn: 313916
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
SelectionDAGISel::LowerArguments is associating arguments
with frame indices (FuncInfo->setArgumentFrameIndex). That
information is later on used by EmitFuncArgumentDbgValue to
create DBG_VALUE instructions that denotes that a variable
can be found on the stack.
I discovered that for our (big endian) out-of-tree target
the association created by SelectionDAGISel::LowerArguments
sometimes is wrong. I've seen this happen when a 64-bit value
is passed on the stack. The argument will occupy two stack
slots (frame index X, and frame index X+1). The fault is
that a call to setArgumentFrameIndex is associating the
64-bit argument with frame index X+1. The effect is that the
debug information (DBG_VALUE) will point at the least significant
part of the arguement on the stack. When printing the
argument in a debugger I will get the wrong value.
I managed to create a test case for PowerPC that seems to
show the same kind of problem.
The bugfix will look at the datalayout, taking endianness into
account when examining a BUILD_PAIR node, assuming that the
least significant part is in the first operand of the BUILD_PAIR.
For big endian targets we should use the frame index from
the second operand, as the most significant part will be stored
at the lower address (using the highest frame index).
Reviewers: bogner, rnk, hfinkel, sdardis, aprantl
Reviewed By: aprantl
Subscribers: nemanjai, aprantl, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D37740
llvm-svn: 313901
|
| |
|
|
|
|
|
|
| |
instructions/intrinsics/builtins.
Differential Revision: https://reviews.llvm.org/D38148
llvm-svn: 313898
|
| |
|
|
| |
llvm-svn: 313893
|
| |
|
|
| |
llvm-svn: 313890
|
| |
|
|
|
|
|
|
|
|
| |
This patch updates register allocation to enable spilling gprs to
volatile vector registers rather than the stack. It can be enabled
for Power9 with option -ppc-enable-gpr-to-vsr-spills.
Differential Revision: https://reviews.llvm.org/D34815
llvm-svn: 313886
|
| |
|
|
| |
llvm-svn: 313883
|
| |
|
|
|
|
|
|
|
|
| |
More conversions to load-and-test can be made with this patch by adding a
forward search in optimizeCompareZero().
Review: Ulrich Weigand
https://reviews.llvm.org/D38076
llvm-svn: 313877
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: jbhateja
Reviewed By: jbhateja
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38127
llvm-svn: 313869
|
| |
|
|
|
|
|
| |
This inverts the behavior of the AlwaysInline pass to mark
every function not already marked alwaysinline as noinline.
llvm-svn: 313865
|
| |
|
|
|
|
|
|
| |
We can have a v_mac with an immediate src0.
We can still fold if it's an inline immediate,
otherwise it already uses the constant bus.
llvm-svn: 313852
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D38090
llvm-svn: 313820
|
| |
|
|
| |
llvm-svn: 313814
|
| |
|
|
|
|
|
|
|
|
|
| |
The Swift CC is identical to Win64 CC with the exception of swift error
being passed in r12 which is a CSR. However, since this calling
convention is only used in swift -> swift code, it does not impact
interoperability and can be treated entirely as Win64 CC. We would
previously incorrectly lower the frame setup as we did not treat the
frame as conforming to Win64 specifications.
llvm-svn: 313813
|
| |
|
|
|
|
|
|
| |
Also add some tests that should be able to use v_mad_mixhi_f16,
but do not yet. This is trickier because we don't really model
the partial update of the register done by 16-bit instructions.
llvm-svn: 313806
|
| |
|
|
| |
llvm-svn: 313797
|
| |
|
|
|
|
|
|
|
|
| |
Add support for passing SwiftError through a register on the Windows x64
calling convention. This allows the use of swifterror attributes on
parameters which is used by the swift front end for the `Error`
parameter. This partially enables building the swift standard library
for Windows x86_64.
llvm-svn: 313791
|
| |
|
|
| |
llvm-svn: 313755
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
BBs.
This version of the patch fixes an off-by-one error causing PR34596. We
do not need to use std::next(BlockIter) when calling updateDepths, as
BlockIter already points to the next element.
Original commit message:
> For large basic blocks with lots of combinable instructions, the
> MachineTraceMetrics computations in MachineCombiner can dominate the compile
> time, as computing the trace information is quadratic in the number of
> instructions in a BB and it's relevant successors/predecessors.
> In most cases, knowing the instruction depth should be enough to make
> combination decisions. As we already iterate over all instructions in a basic
> block, the instruction depth can be computed incrementally. This reduces the
> cost of machine-combine drastically in cases where lots of instructions
> are combined. The major drawback is that AFAIK, computing the critical path
> length cannot be done incrementally. Therefore we only compute
> instruction depths incrementally, for basic blocks with more
> instructions than inc_threshold. The -machine-combiner-inc-threshold
> option can be used to set the threshold and allows for easier
> experimenting and checking if using incremental updates for all basic
> blocks has any impact on the performance.
>
> Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn
>
> Reviewed By: fhahn
>
> Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D36619
llvm-svn: 313751
|
| |
|
|
|
|
|
| |
These tests should have been included in r310697 / D34099 but apparently
I missed them.
llvm-svn: 313737
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Also starts selecting global loads for constant address
in some cases. Some end up selecting to mubuf still, which
requires investigation.
We still get sub-optimal regalloc and extra waitcnts inserted
due to not really tracking the liveness of the separate register
halves.
llvm-svn: 313716
|