| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
| |
operands are non-negative
Since now SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior
This is a follow up of D34598
Differential Revision: https://reviews.llvm.org/D38072
llvm-svn: 314125
|
| |
|
|
|
|
| |
A ternary is clearer here. No functionality change.
llvm-svn: 314123
|
| |
|
|
|
|
| |
Noticed by inspection
llvm-svn: 314121
|
| |
|
|
| |
llvm-svn: 314118
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The transform to convert an extract-of-a-select-of-vectors was added at:
rL194013
And a question about the validity of this transform was raised in the review:
https://reviews.llvm.org/D1539:
...but not answered AFAICT>
Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with
the original commit, but they are not regressing even after we remove the transform in this patch.
The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do
those transforms as canonicalizations.
The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301:
https://bugs.llvm.org/show_bug.cgi?id=33301
...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see.
Differential Revision: https://reviews.llvm.org/D38006
llvm-svn: 314117
|
| |
|
|
| |
llvm-svn: 314115
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This code iterates the 'Orders' vector in parallel with the DbgValue
list, emitting all DBG_VALUEs that occurred between the last IR order
insertion point and the next insertion point. This assumes the
SDDbgValue list is sorted in IR order, which it usually is. However, it
is not sorted when a node with a debug value is replaced with another
one. When this happens, TransferDbgValues is called, and the new value
is added to the end of the list.
The problem can be solved by stably sorting the list by IR order.
Reviewers: aprantl, Ka-Ka
Reviewed By: aprantl
Subscribers: MatzeB, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D38197
llvm-svn: 314114
|
| |
|
|
| |
llvm-svn: 314113
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
X86InterleavedAccess (VF8 stride 4):
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2.
In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding
each 8 chars:
c0, c1, , c7
m0, m1, , m7
y0, y1, , y7
k0, k1, ., k7
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers
DavidKreitzer
Farhana
zvi
igorb
guyblank
RKSimon
Ayal
Differential Revision: https://reviews.llvm.org/D36058
Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314106
|
| |
|
|
| |
llvm-svn: 314105
|
| |
|
|
|
| |
Change-Id: I1ddc619169fae6a56308deef8dae5db3da702cf4
llvm-svn: 314103
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions.
Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions.
Patch fixes PR26956.
Reviewers: spatel, mkuper, hfinkel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27846
llvm-svn: 314101
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TargetTransformInfo::enableMemCmpExpansion.
Summary:
Right now there are two functions with the same name, one does the work
and the other one returns true if expansion is needed. Rename
TargetTransformInfo::expandMemCmp to make it more consistent with other
members of TargetTransformInfo.
Remove the unused Instruction* parameter.
Differential Revision: https://reviews.llvm.org/D38165
llvm-svn: 314096
|
| |
|
|
|
|
| |
This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us.
llvm-svn: 314083
|
| |
|
|
|
|
| |
commutable for the multiply operands.
llvm-svn: 314080
|
| |
|
|
| |
llvm-svn: 314078
|
| |
|
|
|
|
|
|
|
|
| |
This patch acts as a reverse to combineBitcastvxi1 - bitcasting a scalar integer to a boolean vector and extending it 'in place' to the requested legal type.
Currently this doesn't handle AVX512 at all - but the current mask register approach is lacking for some cases.
Differential Revision: https://reviews.llvm.org/D35320
llvm-svn: 314076
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314073
|
| |
|
|
|
|
|
|
| |
instructions when VLX isn't available.
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.
llvm-svn: 314072
|
| |
|
|
|
|
|
|
| |
we shrink 256/512 bit zeroing xors to 128-bit.
Not sure if anything really cares, but this seems like the right thing to do.
llvm-svn: 314071
|
| |
|
|
|
|
|
|
|
| |
This fixes the avr-rust issue (#75) with floating-point comparisons generating broken code.
By default, LLVM assumes these comparisons return 32-bit values, but ours are 8-bit.
Patch By Thomas Backman.
llvm-svn: 314070
|
| |
|
|
|
|
|
|
| |
The code wasn't yelling at the user when there's a reference
from a DIGlobalVariableExpression. Thanks to Adrian for the
reduced testcase. Fixes PR34672.
llvm-svn: 314069
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up from D38181 (r314023). We have to put 64-bit
constants into a register using a separate instruction, so we
should try harder to avoid that.
From what I see, we're not likely to encounter this pattern in the
DAG because the upstream setcc combines from this don't (usually?)
produce this pattern. If we fix that, then this will become more
relevant. Since the cost of handling this case is just loosening
the predicate of the existing fold, we might as well do it now.
llvm-svn: 314064
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314062
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314060
|
| |
|
|
|
|
|
|
| |
helper functions over to X86ISelDAGToDAG.cpp
Redefine them to call getI8Imm and return that directly.
llvm-svn: 314059
|
| |
|
|
|
|
| |
The only insert_subvector/extract_subvector nodes that make it to isel are guaranteed to match.
llvm-svn: 314058
|
| |
|
|
|
|
|
|
| |
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential revision.
llvm-svn: 314055
|
| |
|
|
|
|
|
| |
This class isn't similar to anything from the STL, so it shouldn't use
the STL naming conventions.
llvm-svn: 314050
|
| |
|
|
| |
llvm-svn: 314049
|
| |
|
|
|
|
| |
What You Use warnings; other minor fixes (NFC).
llvm-svn: 314046
|
| |
|
|
|
|
|
| |
Fixed suboptimal encoding of instruction memory operand when assembler is used to select 32 bit fixup rather than 8 bit immediate for encoding memory offset value.
Differential Revision: https://reviews.llvm.org/D38117
llvm-svn: 314044
|
| |
|
|
| |
llvm-svn: 314043
|
| |
|
|
|
|
|
|
| |
equality comparisons when the min/max ranges intersect in a single value.
This is the inverse of what we do for SGT/SLT/UGT/ULT.
llvm-svn: 314032
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch just adds the missing information to the P9 scheduling model to allow
the model to be marked as complete.
The model has been verified against P9 documentation. The model was verified
with utils/schedcover.py.
Differential Revision: https://reviews.llvm.org/D35695
llvm-svn: 314026
|
| |
|
|
|
|
| |
in foldICmpUsingKnownBits.
llvm-svn: 314025
|
| |
|
|
|
|
|
| |
x86 re-education camp is in session. The LLVM LangRef agrees with x86 too.
The DAG nodes are undocumented and ambiguous as always. :)
llvm-svn: 314024
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The (non-)obvious win comes from saving 3 bytes by using the 0x83 'and' opcode variant instead of 0x81.
There are also better improvements based on known-bits that allow us to eliminate the mask entirely.
As noted, this could be extended. There are potentially other wins from always shifting first, but doing
that reveals a tangle of problems in other pattern matching. We do this transform generically in
instcombine, but we often have icmp IR that doesn't match that pattern, so we must account for this
in the backend.
Differential Revision: https://reviews.llvm.org/D38181
llvm-svn: 314023
|
| |
|
|
|
|
|
|
| |
instead of calling it outside and passing its result through a flag. NFCI
The result of the isSignBitCheck isn't used anywhere else and this allows us to share the m_APInt call in the likely case that it isn't a sign bit check.
llvm-svn: 314018
|
| |
|
|
|
|
| |
foldICmpUsingKnownBits by just checking Op1Min==Op1Max rather than going through m_APInt.
llvm-svn: 314017
|
| |
|
|
|
|
| |
they use similar code. NFC
llvm-svn: 314016
|
| |
|
|
|
|
| |
Part of a patch by Jake Ehrlich!
llvm-svn: 314012
|
| |
|
|
|
|
|
| |
Before we were aligning the member after the symbol table to 4 but
other members to 8.
llvm-svn: 314010
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Conditional returns were not taken into consideration at all. Implement them by turning them into jumps and normal returns. This means there is a slightly higher performance penalty for conditional returns, but this is the best we can do, and it still disturbs little of the rest.
Reviewers: dberris, echristo
Subscribers: sanjoy, nemanjai, hiraditya, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D38102
llvm-svn: 314005
|
| |
|
|
|
|
|
| |
This is mostly for getting stricter testing in preparation for future
changes.
llvm-svn: 314000
|
| |
|
|
|
|
| |
This returns "falkor" for Falkor CPU.
llvm-svn: 313998
|
| |
|
|
|
|
|
|
|
| |
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Patch by Suyog Sarda!
llvm-svn: 313993
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
m*trunc(x)+n*trunc(y)
Summary:
A SCEV such as:
{%v2,+,((-1 * (trunc i64 (-1 * %v1) to i32)) + (-1 * (trunc i64 %v1 to i32)))}<%loop>
can be folded into, simply, {%v2,+,0}. However, the current code in ::getAddExpr()
will not try to apply the simplification m*trunc(x)+n*trunc(y) -> trunc(trunc(m)*x+trunc(n)*y)
because it only keys off having a non-multiplied trunc as the first term in the simplification.
This patch generalizes this code to try to do a more generic fold of these trunc
expressions.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37888
llvm-svn: 313988
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
constant arguments
Combine CMOV[i16]<-[SIGN,ZERO,ANY]_EXTEND to [i32,i64] into CMOV[i32,i64].
One example of where it is useful is:
before (20 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ax
mov $0xffff,%cx
cmovne %ax,%cx
movzwl %cx,%eax
retq
after (18 bytes)
<foo>:
test $0x1,%dil
mov $0x307e,%ecx
mov $0xffff,%eax
cmovne %ecx,%eax
retq
Reviewers: craig.topper, aaboud, spatel, RKSimon, zvi
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D36711
llvm-svn: 313982
|