| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
| |
Converting InlineCost interface and its internals into CallBase usage.
Inliners themselves are still not converted.
Reviewed By: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60636
llvm-svn: 358982
|
| |
|
|
|
|
|
|
|
|
|
| |
The check for creating CBZ in constant island pass recently obtained the
ability to search backwards to find a Cmp instruction. The code in IfCvt should
mirror this to allow more conversions to the smaller form. The common code has
been pulled out into a separate function to be shared between the two places.
Differential Revision: https://reviews.llvm.org/D60090
llvm-svn: 358977
|
| |
|
|
|
|
|
|
|
|
| |
Ifcvt can replicate instructions as it converts them to be predicated. This
stops that from happening on thumb2 targets at minsize where an extra IT
instruction is likely needed.
Differential Revision: https://reviews.llvm.org/D60089
llvm-svn: 358974
|
| |
|
|
| |
llvm-svn: 358969
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
The intrinsics are described in detail in the latest
ACLE Q1 2019 documentation: https://developer.arm.com/docs/101028/latest
Reviewed by: David Spickett
Differential Revision: https://reviews.llvm.org/D60486
llvm-svn: 358963
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add missing D and Q lane VLDSTLane lowering
for fp16 elements.
Reviewers: efriedma, kosarev, SjoerdMeijer, ostannard
Reviewed By: efriedma
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60874
llvm-svn: 358962
|
| |
|
|
|
|
|
|
|
|
|
| |
This change partially reverts https://reviews.llvm.org/D54647 in favor
of bailing out during computeAddress instead.
This catches the condition earlier and handles more cases.
Differential Revision: https://reviews.llvm.org/D60986
llvm-svn: 358948
|
| |
|
|
|
|
|
|
|
|
| |
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.
There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().
llvm-svn: 358930
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
packed literals. Even an instruction is `packed`, it may have operand
requiring non-packed literal, such as `v_dot2_f32_f16`.
Reviewers: rampitec, arsenm, kzhuravl
Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60978
llvm-svn: 358922
|
| |
|
|
|
|
|
|
|
|
| |
These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address
llvm-svn: 358909
|
| |
|
|
| |
llvm-svn: 358894
|
| |
|
|
|
|
|
| |
Effectively reverts r356956. The check for isFullCopy was excessive,
but there still needs to be a check that this is a copy.
llvm-svn: 358890
|
| |
|
|
|
|
|
|
|
|
| |
See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D60624
llvm-svn: 358888
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.
The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.
The X86 changes are all definite wins.
Differential Revision: https://reviews.llvm.org/D60462
llvm-svn: 358887
|
| |
|
|
|
|
| |
not enabled. Same for 256 bit and AVX.
llvm-svn: 358872
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This does two main things, firstly adding some at least basic addressing modes
for i64 types, and secondly treats floats and doubles sensibly when there is no
fpu. The floating point change can help codesize in some cases, especially with
D60294.
Most backends seems to not consider the exact VT in isLegalAddressingMode,
instead switching on type size. That is now what this does when the target does
not have an fpu (as the float data will be loaded using LDR's). i64's currently
use the address range of an LDRD (even though they may be legalised and loaded
with an LDR). This is at least better than marking them all as illegal
addressing modes.
I have not attempted to do much with vectors yet. That will need changing once
MVE is added.
Differential Revision: https://reviews.llvm.org/D60677
llvm-svn: 358845
|
| |
|
|
|
|
| |
instructions.
llvm-svn: 358844
|
| |
|
|
|
|
| |
fpclass only has a single use.
llvm-svn: 358841
|
| |
|
|
|
|
|
| |
The last attempt fixed gcc and consumer-typeset, but Obsequi seems to fail with
a different issue.
llvm-svn: 358829
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If you pass two 1024 bit vectors in IR with AVX2 on Windows 64. Both vectors will be split in four 256 bit pieces. The four pieces of the first argument will be passed indirectly using 4 gprs. The second argument will get passed via pointers in memory.
The PartOffsets stored for the second argument are all in terms of its original 1024 bit size. So the PartOffsets for each piece are 32 bytes apart. So if we consider it for copy elision we'll only load an 8 byte pointer, but we'll move the address 32 bytes. The stack object size we create for the first part is probably wrong too.
This issue was encountered by ISPC. I'm working on getting a reduce test case, but wanted to go ahead and get feedback on the fix.
Reviewers: rnk
Reviewed By: rnk
Subscribers: dbabokin, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60801
llvm-svn: 358817
|
| |
|
|
|
|
|
|
|
|
| |
Fix for https://bugs.llvm.org/show_bug.cgi?id=41477. On the x32 ABI
with stack probing a dynamic alloca will result in a WIN_ALLOCA_32
with a 32-bit size. The current implementation tries to copy it into
RAX, resulting in a physreg copy error. Fix this by copying to EAX
instead. Also fix incorrect opcodes or registers used in subs.
llvm-svn: 358807
|
| |
|
|
|
|
|
|
|
|
| |
the original AND can represented by MOVZX.
The MOVZX doesn't require an immediate to be encoded at all. Though it does use
a 2 byte opcode so its the same size as a 1 byte immediate. But it has a
separate source and dest register so can help avoid copies.
llvm-svn: 358805
|
| |
|
|
|
|
|
|
|
| |
(C1 >> C2), C2) if the AND could match a movzx.
There's one slight regression in here because we don't check that the immediate
already allowed movzx before the shift. I'll fix that next.
llvm-svn: 358804
|
| |
|
|
|
|
|
|
|
| |
and stores""
We were shifting the wrong component of a split load when trying to combine them
back into a single value.
llvm-svn: 358800
|
| |
|
|
|
|
|
|
|
|
| |
Exactly the same as G_FCEIL, G_FABS, etc.
Add tests for the fp16/nofp16 behaviour, update arm64-vfloatintrinsics, etc.
Differential Revision: https://reviews.llvm.org/D60895
llvm-svn: 358799
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
selectCall
My understanding is that once BuildMI has been called we can't fallback
to SelectionDAG.
This change moves the fallback for when getRegForValue() fails for
that target of an indirect call. This was failing in -fPIC mode when
the callee is GlobalValue.
Add a test case that tickles this.
Differential Revision: https://reviews.llvm.org/D60908
llvm-svn: 358793
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
VK_SABS is part of the SymLoc bitfield in the variant kind which should
be compared for equality, not by checking the VK_SABS bit.
As far as I know, the existing code happened to produce the correct
results in all cases, so this is just a cleanup.
Patch by Stephen Crane.
Differential Revision: https://reviews.llvm.org/D60596
llvm-svn: 358788
|
| |
|
|
|
|
| |
This introduces some runtime failures which I'll need to investigate further.
llvm-svn: 358771
|
| |
|
|
|
|
|
|
|
|
| |
This instruction is legalized in the same way as G_FSIN, G_FCOS, G_FLOG10, etc.
Update legalize-pow.mir and arm64-vfloatintrinsics.ll to reflect the change.
Differential Revision: https://reviews.llvm.org/D60218
llvm-svn: 358764
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).
The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.
Reviewers: hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60856
llvm-svn: 358744
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Ignore edges to non-SUnits (e.g. ExitSU) when checking
for low latency instructions.
When calling the function isLowLatencyInstruction(),
an ExitSU could be on the list of successors, not necessarily
a regular SU. In other places in the code there is a check
"Succ->NodeNum >= DAGSize" to prevent further processing of
ExitSU as "Succ->getInstr()" is NULL in such a case.
Also, 8 out of 9 cases of "SUnit *Succ = SuccDep.getSUnit())"
has the guard, so it is clearly an omission here.
Change-Id: Ica86f0327c7b2e6bcb56958e804ea6c71084663b
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: MatzeB, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60864
llvm-svn: 358740
|
| |
|
|
|
|
|
|
| |
AND could match a movzx.
Could get further improvements by recognizing (i64 and (anyext (i32 shl))).
llvm-svn: 358737
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
default code after the switch in matchAddressRecursively
Summary:
There are two places where we create a HandleSDNode in address matching in order to handle the case where N is changed by CSE. But if we end up not matching, we fall back to code at the bottom of the switch that really would like N to point to something that wasn't CSEd away. So we should make sure we copy the handle back to N on any paths that can reach that code.
This appears to be the true reason we needed to check DELETED_NODE in the negation matching. In pr32329.ll we had two subtracts back to back. We recursed through the first subtract, and onto the second subtract. The second subtract called matchAddressRecursively on its LHS which caused that subtract to CSE. We ultimately failed the match and ended up in the default code. But N was pointing at the old node that had been deleted, but the default code didn't know that and took it as the base register. Then we unwound back to the first subtract and tried to access this bogus base reg requiring the check for deleted node. With this patch we now use the CSE result as the base reg instead.
matchAdd has been broken since sometime in 2015 when it was pulled out of the switch into a helper function. The assignment to N at the end was still there, but N was passed by value and not by reference so the update didn't go anywhere.
Reviewers: niravd, spatel, RKSimon, bkramer
Reviewed By: niravd
Subscribers: llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60843
llvm-svn: 358735
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds legalization for G_SEXT, G_ZEXT, and G_ANYEXT for v8s8s.
We were falling back on G_ZEXT in arm64-vabs.ll before, preventing us from
selecting the @llvm.aarch64.neon.sabd.v8i8 intrinsic.
This adds legalizer support for those 3, which gives us selection via the
importer. Update the relevant tests (legalize-ext.mir, select-int-ext.mir) and
add a GISel line to arm64-vabs.ll.
Differential Revision: https://reviews.llvm.org/D60881
llvm-svn: 358715
|
| |
|
|
|
|
|
|
| |
Add legalizer support for loads of v8s8 and update legalize-load-store.mir.
Differential Revision: https://reviews.llvm.org/D60877
llvm-svn: 358714
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
combineVectorTruncationWithPACKUS is currently splitting the upper bit bit masking into 128-bit subregs and then concatenating them back together.
This was originally done to avoid regressions that caused existing subregs to be concatenated to the larger type just for the AND masking before being extracted again. This was fixed by @spatel (notably rL303997 and rL347356).
This also lets SimplifyDemandedBits do some further improvements before it hits the recursive depth limit.
My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great.
Differential Revision: https://reviews.llvm.org/D60375#inline-539623
llvm-svn: 358692
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
power-of-2.
This replaces the MOVMSK combine introduced at D52121/rL342326
(movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))
with the more general icmp lowering so it can pick up more cases through bitcasts - notably vXi8 cases which use vXi16 shifts+masks, this patch can remove the mask and use pcmpgtb(0,x) for the sra.
Differential Revision: https://reviews.llvm.org/D60625
llvm-svn: 358651
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177
When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.
This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
`getScalarSizeInBits() / 8`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60811
llvm-svn: 358644
|
| |
|
|
|
|
|
|
|
|
|
| |
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.
Differential Revision: https://reviews.llvm.org/D60633
Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ
%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0
...so if we can fold the shift op into LEA in the 1st pattern, then we
should be able to do the same in the 2nd pattern (unnecessary 'movzbl'
is a separate bug I think).
We don't want to do this any sooner though because that would conflict
with generic transforms that try to narrow the width of the shift.
Differential Revision: https://reviews.llvm.org/D60789
llvm-svn: 358622
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do
basically the same thing (Aarch64 did not correctly handle immediates,
ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses
%a with an immediate) for a flag that should be target independent
anyways.
Reviewers: echristo, peter.smith
Reviewed By: echristo
Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60841
llvm-svn: 358618
|
| |
|
|
|
|
|
|
|
|
| |
Legalize things like i24 load/store by splitting them into smaller power of 2 operations.
This matches how SelectionDAG handles these operations.
Differential Revision: https://reviews.llvm.org/D59971
llvm-svn: 358613
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.
Reviewers: peter.smith, echristo
Reviewed By: echristo
Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60803
llvm-svn: 358603
|
| |
|
|
|
|
|
|
|
|
| |
See bug 41156: https://bugs.llvm.org/show_bug.cgi?id=41156
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D60622
llvm-svn: 358596
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.
Reviewers: arsen, nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60824
llvm-svn: 358592
|
| |
|
|
|
|
|
|
|
|
| |
See bug 41280: https://bugs.llvm.org/show_bug.cgi?id=41280
Reviewers: artem.tamazov, arsenm
Differential Revision: https://reviews.llvm.org/D60621
llvm-svn: 358581
|
| |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D60731
Change-Id: I821d93dec8b9cdd247b8172d92fb5e15340a9e7d
llvm-svn: 358579
|
| |
|
|
|
|
|
|
| |
On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest......
Differential Revision: https://reviews.llvm.org/D60403
llvm-svn: 358574
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
GR32<->XMM register copies.
We have two versions of some instructions, VR128 versions and FR32 versions that
are marked as CodeGenOnly.
This change switches to using the VR128 versions for these copies. It's after
register allocation so the class size no longer matters. This matches how GR64
works.
llvm-svn: 358555
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The printOperand function takes a default parameter, for which there are
zero call sites that explicitly pass such a parameter. As such, there
is no case to support. This means that the method
printVecModifiedImmediate is purly dead code, and can be removed.
The eventual goal for some of these AsmPrinter refactoring is to have
printOperand be a virtual method; making it easier to print operands
from the base class for more generic Asm printing. It will help if all
printOperand methods have the same function signature (ie. no Modifier
argument when not needed).
Reviewers: echristo, tra
Reviewed By: echristo
Subscribers: jholewinski, hiraditya, llvm-commits, craig.topper, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60727
llvm-svn: 358527
|