| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
| |
from the instruction.
We were accidentally connecting it to result 0 instead of result 1. This was caught by the machine verifier that noticed the flags were dead, but we were using them somehow. I'm still not clear what actually happened downstream.
llvm-svn: 336925
|
|
|
|
|
|
|
|
| |
x86_sse_cvttss2si and similar intrinsics.
This should fix a machine verifier error.
llvm-svn: 336924
|
|
|
|
|
|
| |
These instructions are added to AArch64 only.
llvm-svn: 336913
|
|
|
|
|
|
|
|
|
|
| |
canWidenShuffleElements can do a better job if given a mask with ZeroableElements info. Apparently, ZeroableElements was being only used to identify AllZero candidates, but possibly we could plug it into more shuffle matchers.
Original Patch by Zvi Rackover @zvi
Differential Revision: https://reviews.llvm.org/D42044
llvm-svn: 336903
|
|
|
|
|
|
|
|
|
|
| |
Noticed while updating D42044, lowerV2X128VectorShuffle can improve the shuffle mask with the zeroable data to create a target shuffle mask to recognise more 'zero upper 128' patterns.
NOTE: lowerV4X128VectorShuffle could benefit as well but the code needs refactoring first to discriminate between SM_SentinelUndef and SM_SentinelZero for negative shuffle indices.
Differential Revision: https://reviews.llvm.org/D49092
llvm-svn: 336900
|
|
|
|
|
|
|
|
|
|
|
| |
Mark standard encoded instructions and pseudo "standard encoded"
as not being in MIPS16e by default.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D48379
llvm-svn: 336893
|
|
|
|
|
|
| |
i128 isn't a legal type in our x86 implementation today. So remove this and the few patterns that used it until it becomes necessary.
llvm-svn: 336889
|
|
|
|
|
|
| |
We now use llvm.fma.f32/f64 or llvm.x86.fmadd.f32/f64 intrinsics that use scalar types rather than vector types. So we don't these special ISD nodes that operate on the lowest element of a vector.
llvm-svn: 336883
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
there for a long time.
The boolean tracking whether we saw a kill of the flags was supposed to
be per-block we are scanning and instead was outside that loop and never
cleared. It requires a quite contrived test case to hit this as you have
to have multiple levels of successors and interleave them with kills.
I've included such a test case here.
This is another bug found testing SLH and extracted to its own focused
patch.
llvm-svn: 336876
|
|
|
|
|
|
|
|
| |
with zero.
These showed up in some of the upgraded FMA code. We really need to improve these test cases more, but this helps for now.
llvm-svn: 336875
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
multiple successors where some of the uses end up killing the EFLAGS
register.
There was a bug where rather than skipping to the next basic block
queued up with uses once we saw a kill, we stopped processing the blocks
entirely. =/
Test case produces completely nonsensical code w/o this tiny fix.
This was found testing Speculative Load Hardening and split out of that
work.
Differential Revision: https://reviews.llvm.org/D49211
llvm-svn: 336874
|
|
|
|
|
|
| |
This converts them to what clang is now using for codegen. Unfortunately, there seem to be a few kinks to work out still. I'll try to address with follow up patches.
llvm-svn: 336871
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is marginally helpful for removing redundant extensions, and the
code is easier to read, so it seems like an all-around win. In the new
test i8-phi-ext.ll, we used to emit an AssertSext i8; now we emit an
AssertZext i2, which allows the extension of the return value to be
eliminated.
Differential Revision: https://reviews.llvm.org/D49004
llvm-svn: 336868
|
|
|
|
|
|
|
|
| |
SITargetLowering queries SIInstrInfo in its constructor, so SIInstrInfo
must be initialized first. This fixes msan buildbot failures and was
introduced by r336851.
llvm-svn: 336861
|
|
|
|
|
|
| |
This was added in r336851.
llvm-svn: 336853
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
|
|
|
|
|
|
|
|
| |
We can instead block the load folding isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load. The PostProcessISelDAG should be able to remove the register->register move.
This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load.
llvm-svn: 336828
|
|
|
|
|
|
|
| |
size instead of calculating it again when filling
out the metadata.
llvm-svn: 336825
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.
r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.
When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.
As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).
The mca tests show the differences in the instruction info flags. Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.
Differential Revision: https://reviews.llvm.org/D49182
llvm-svn: 336818
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This mostly brings the P5600 scheduler model to a mostly complete
status. There are a number of instructions which trigger the
`error:'MipsP5600Model' lacks information for` error. These are certain
codegen only instructions relating to MIPS64 which can be addressed by
using the correct predicates for them. That will be done in a full-up
patch.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D45245
llvm-svn: 336802
|
|
|
|
|
|
|
|
|
|
| |
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.
Differential revision: https://reviews.llvm.org/D49125
llvm-svn: 336795
|
|
|
|
|
|
|
|
|
|
|
| |
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero.
e.g.
compact z0.s, p0, z1.s
llvm-svn: 336789
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.
The added variants are:
Scalar:
last(a|b) w0, p0, z0.b
last(a|b) w0, p0, z0.h
last(a|b) w0, p0, z0.s
last(a|b) x0, p0, z0.d
SIMD & FP Scalar:
last(a|b) b0, p0, z0.b
last(a|b) h0, p0, z0.h
last(a|b) s0, p0, z0.s
last(a|b) d0, p0, z0.d
The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.
The added variants are:
Scalar:
clast(a|b) w0, p0, w0, z0.b
clast(a|b) w0, p0, w0, z0.h
clast(a|b) w0, p0, w0, z0.s
clast(a|b) x0, p0, x0, z0.d
SIMD & FP Scalar:
clast(a|b) b0, p0, b0, z0.b
clast(a|b) h0, p0, h0, z0.h
clast(a|b) s0, p0, s0, z0.s
clast(a|b) d0, p0, d0, z0.d
Vector:
clast(a|b) z0.b, p0, z0.b, z1.b
clast(a|b) z0.h, p0, z0.h, z1.h
clast(a|b) z0.s, p0, z0.s, z1.s
clast(a|b) z0.d, p0, z0.d, z1.d
Please refer to the architecture specification for more details on
the semantics of the added instructions.
llvm-svn: 336783
|
|
|
|
| |
llvm-svn: 336777
|
|
|
|
|
|
|
| |
This fixes compile error in r336759. llvm::value::dump is not available
in released build.
llvm-svn: 336770
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0
%.op = add nuw nsw i32 %cnt, 1
%add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
bsfl %edi, %ecx
addl $1, %ecx
testl %edi, %edi
movl $0, %eax
cmovnel %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.
We now produce the following code:
bsfl %edi, %ecx
movl $-1, %eax
cmovnel %ecx, %eax
addl $1, %eax
Patch by Ivan Kulagin
Reviewers: davide, craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48765
llvm-svn: 336768
|
|
|
|
|
|
|
|
|
|
|
|
| |
These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.
In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.
But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.
Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But its doing -O0 so I think it missed a lot of opportunities and was just getting lucky before.
llvm-svn: 336762
|
|
|
|
|
|
|
|
| |
See https://bugs.llvm.org/show_bug.cgi?id=35385
Differential Revision: https://reviews.llvm.org/D48471
llvm-svn: 336759
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement this as it is done on GCC:
__float128 a, b, c, d;
a = __builtin_fmaf128_round_to_odd (b, c, d); // generates xsmaddqpo
a = __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsmsubqpo
a = - __builtin_fmaf128_round_to_odd (b, c, d); // generates xsnmaddqpo
a = - __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsnmsubpqp
Differential Revision: https://reviews.llvm.org/D48218
llvm-svn: 336754
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original code attempted to do this, but the std::abs() call didn't
actually do anything due to implicit type conversions. Fix the type
conversions, and perform the correct check for negative immediates.
This probably has very little practical impact, but it's worth fixing
just to avoid confusion in the future, I think.
Differential Revision: https://reviews.llvm.org/D48907
llvm-svn: 336742
|
|
|
|
|
|
|
|
| |
Some added 20 and some added 15. Its unclear when to use which value and whether they are required at all.
This patch removes them all. If we start finding real world issues we may need to add them back with proper tests.
llvm-svn: 336735
|
|
|
|
|
|
| |
class -> struct in forward declaration.
llvm-svn: 336733
|
|
|
|
|
|
|
|
|
|
| |
BLEND under optsize when the immediate allows it.
Isel currently emits movss/movsd a lot of the time and an accidental double commute turns it into a blend.
Ideally we'd select blend directly in isel under optspeed and not rely on the double commute to create blend.
llvm-svn: 336731
|
|
|
|
|
|
|
|
|
|
|
|
| |
These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.
There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.
We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.
So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.
llvm-svn: 336728
|
|
|
|
| |
llvm-svn: 336722
|
|
|
|
|
|
| |
It points to an opcode that doesn't exist.
llvm-svn: 336720
|
|
|
|
|
|
|
|
| |
I believe isProfitableToFold will stop the load folding that this was intended to overcome.
Given an (xor load, -1), isProfitableToFold will see that the immediate can be folded with the xor using a one byte immediate since it can be sign extended. It doesn't know about NOT, but the one byte immediate check is enough to stop the fold.
llvm-svn: 336712
|
|
|
|
|
|
| |
There were only 3 patterns with this node as a root and they all the same AddedComplexity. So this doesn't really do anything.
llvm-svn: 336711
|
|
|
|
|
|
|
|
| |
Move all metadata construction into AMDGPUHSAMetadataStreamer.
Differential Revision: https://reviews.llvm.org/D48176
llvm-svn: 336707
|
|
|
|
|
|
| |
The instruction selection is automatically handled by tablegen
llvm-svn: 336703
|
|
|
|
|
|
|
|
| |
amdgpu-implicitarg-num-bytes attribute
Differential Revision: https://reviews.llvm.org/D49096
llvm-svn: 336697
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for the following instructions:
CLS (Count Leading Sign bits)
CLZ (Count Leading Zeros)
CNT (Count non-zero bits)
CNOT (Logically invert boolean condition in vector)
NOT (Bitwise invert vector)
FABS (Floating-point absolute value)
FNEG (Floating-point negate)
All operations are predicated and unary, e.g.
clz z0.s, p0/m, z1.s
- CLS, CLZ, CNT, CNOT and NOT have variants for 8, 16, 32
and 64 bit elements.
- FABS and FNEG have variants for 16, 32 and 64 bit elements.
llvm-svn: 336677
|
|
|
|
|
|
| |
This reverts commit r336623
llvm-svn: 336675
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An explicit untied use is not sufficient to maintain liveness of a
register redefined in a predicated instruction. For example
%1 = COPY %0
...
%1 = A2_paddif %2, %1, 1
could become
$r1 = COPY $r0
...
$r1 = A2_paddif $p0, $r1, 1
and later
$r1 = COPY $r0 ;; this is not really dead!
...
$r1 = A2_paddif $p0, $r0, 1
llvm-svn: 336662
|
|
|
|
|
|
|
|
|
|
| |
Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data.
This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.
Differential Revision: https://reviews.llvm.org/D48936
llvm-svn: 336642
|
|
|
|
|
|
|
|
| |
insert_subreg instead of artifically increasing pattern complexity to give priority.
This is a much more direct way to solve the issue than just giving extra priority.
llvm-svn: 336639
|
|
|
|
|
|
|
|
| |
We're missing the EVEX equivalents of these patterns and seem to get along fine.
I think we end up with X86vzload for the obvious IR cases that would produce this DAG.
llvm-svn: 336638
|
|
|
|
|
|
|
|
| |
floating point load bitcasted to integer.
DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.
llvm-svn: 336626
|
|
|
|
|
|
| |
The only places it was used where places where VT was the same as FloatVT. So switch those uses to VT and drop it.
llvm-svn: 336624
|
|
|
|
|
|
|
| |
This reverts commit r336587, it was causing test failures on the
sanitizer bots.
llvm-svn: 336623
|