For VLX targets getSetccResultType returns vXi1, which prevents the target-independent DAG combine from doing this transform itself.
llvm-svn: 323555
|
Similar to the existing support for X86ISD::VTRUNCUS.
Differential Revision: https://reviews.llvm.org/D42544
llvm-svn: 323553
|
blank lines are printed during the isel process so that things are more sensibly grouped.
Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, this meant there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node, SelectNodeCommon would reprint the root node before walking the isel table.
It seems better to just print the message before the call to Select so that all targets behave the same, and then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search.
There were also some oddities in blank line behavior, usually due to a \n emitted after a call to SelectionDAGNode::dump, which already inserts a new line.
llvm-svn: 323551
|
gcc recently fixed this bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546
llvm-svn: 323550
|
llvm-svn: 323548
|
llvm-svn: 323545
|
v4i32/v4f32
Extension to D42431, adding support for v4i32/v4f32 as well as v2i64/v2f64 now that D42308 has landed.
llvm-svn: 323542
|
We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends.
This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to the 0th index. I don't think either of these is likely these days as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code tends to make it very difficult for various vector and load combines.
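As a rough editor's illustration (not part of the change; the function name and use of SSE intrinsics are hypothetical), this is the shape of code affected - a v4i32 value whose four lanes are all extracted to scalars:
#include <immintrin.h>
/* Extract all four i32 lanes of a 128-bit vector and sum them.  Per the
   description above, these extracts were previously coalesced into two
   v2i64 extracts plus shifts/sign-extends. */
int sum_all_lanes(__m128i v) {
  return _mm_extract_epi32(v, 0) + _mm_extract_epi32(v, 1) +
         _mm_extract_epi32(v, 2) + _mm_extract_epi32(v, 3);
}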
Differential Revision: https://reviews.llvm.org/D42308
llvm-svn: 323541
|
As mentioned in D42258, we don't need this any more.
llvm-svn: 323540
|
See bug 36000: https://bugs.llvm.org/show_bug.cgi?id=36000
Differential Revision: https://reviews.llvm.org/D42483
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323538
|
Summary: This fixes an assertion when building the FreeBSD MIPS64 kernel.
Reviewers: atanasyan, sdardis, emaste
Reviewed By: sdardis
Subscribers: krytarowski, llvm-commits
Differential Revision: https://reviews.llvm.org/D42571
llvm-svn: 323536
|
See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998
Differential Revision: https://reviews.llvm.org/D42469
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323534
|
See bug 35988: https://bugs.llvm.org/show_bug.cgi?id=35988
Differential Revision: https://reviews.llvm.org/D42186
Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323527
|
llvm-svn: 323526
|
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed the overly common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic
Reviewed by: b-sumner
Differential Revision: https://reviews.llvm.org/D42383
llvm-svn: 323516
|
load instruction
The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR`
register class. The function emits a `tLDRspi` instruction, and certainly any
subset of the `tGPR` register class is a valid destination of the load.
Differential revision: https://reviews.llvm.org/D42535
llvm-svn: 323514
|
This is the groundwork for Armv8.2-A FP16 code generation.
Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:
_Float16 sub(_Float16 a, _Float16 b) {
return a + b;
}
gets lowered to this:
define float @sub(float %a.coerce, float %b.coerce) {
entry:
%0 = bitcast float %a.coerce to i32
%tmp.0.extract.trunc = trunc i32 %0 to i16
%1 = bitcast i16 %tmp.0.extract.trunc to half
<SNIP>
%add = fadd half %1, %3
<SNIP>
}
When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.
When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.
As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issues before in
the FP16 and FullFP16 cases.
I've also added match rules to the VLDRH and VSTRH descriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not to be implemented at all, so that
has also been added.
This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 3/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.
Thanks to Sam Parker and Oliver Stannard for their help and reviews!
Differential Revision: https://reviews.llvm.org/D38315
llvm-svn: 323512
|
"in in" -> "in", "on on" -> "on" etc.
llvm-svn: 323508
|
llvm-svn: 323507
|
element type in 32-bit mode.
Type legalization would prevent any i64 operands to the build_vector from existing before we get here. The coverage bots show this code as uncovered.
llvm-svn: 323506
|
previous native IR for kunpck intrinsics.
The original autoupgrade for kunpck intrinsics used a bitcasted sequence of scalar shift/or/and instructions. This combine would turn that pattern into a concat_vectors. Now the kunpck intrinsics are autoupgraded to a vector shuffle that will become a concat_vectors.
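For reference, an editor-added C-level sketch (assuming AVX-512F; the function name is hypothetical) of one kunpck intrinsic whose IR form is what gets autoupgraded:
#include <immintrin.h>
/* Concatenate the low 8 bits of two 16-bit masks into one 16-bit mask.
   The corresponding llvm.x86.avx512.kunpck.* call is now upgraded to a
   vector shuffle that becomes a concat_vectors. */
__mmask16 unpack_masks(__mmask16 a, __mmask16 b) {
  return _mm512_kunpackb(a, b);
}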
llvm-svn: 323504
|
llvm-svn: 323503
|
This listed all legal 128-bit integer types individually, but since we already know we have a legal type and it's an integer, we can just check is128BitVector.
llvm-svn: 323502
|
type. NFC
These kinds of setccs are promoted by a DAG combine before they ever get to legalization.
llvm-svn: 323501
|
This code was added in r321967, but ultimately I fixed the issue in the legalizer and this code was no longer required.
llvm-svn: 323500
|
When the pass creates a MOV instruction for the
lea (%base,%index,1), %dst => mov %base,%dst; add %index,%dst
transformation, it should clear the kill flag on %base
if %base is equal to %index.
Otherwise the verifier complains about a use of a killed register in the add instruction.
Reviewers: lsaba, zvi, zansari, aaboud
Reviewed By: lsaba
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42522
llvm-svn: 323497
|
This patch enables aggressive FMA by default on T99, and provides a -mllvm
option to enable the same on other AArch64 microarchitectures (-mllvm
-aarch64-enable-aggressive-fma).
Test case demonstrating the effects on T99 is included.
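As an editor-added illustration (not from the patch), this is the kind of expression aggressive FMA formation targets - a separate multiply and add that, subject to the usual FP-contraction rules, can be fused:
/* The fmul and fadd generated for this return expression are candidates to be
   combined into one fused multiply-add instruction. */
double mul_add(double a, double b, double c) {
  return a * b + c;
}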
Patch by: steleman (Stefan Teleman)
Differential Revision: https://reviews.llvm.org/D40696
llvm-svn: 323474
|
parsed from the asm parser.
The asm parser puts the lock prefix in the MCInst flags so we need to check that in addition to TSFlags. This matches what the ATT printer does.
llvm-svn: 323469
|
llvm-svn: 323468
|
llvm-svn: 323452
|
Cleanup from D42544
llvm-svn: 323439
|
This reverts r323374. The fix needs a different approach.
llvm-svn: 323438
|
to some of them in SkylakeClient model.
The regular expressions and the imul names caused some instructions to be matched by multiple regexes, creating unpredictable results.
This changes them all to use explicit instrs instead.
While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.
llvm-svn: 323406
|
implemented.
The IMUL instruction names mixed with the prefix matching of the instregex led to some strange matches. The worst is that several memory instructions are using the register-form latency.
I don't know what the right answer is, so I've left TODOs and will try to work with the AMD folks to get this cleaned up.
llvm-svn: 323405
|
llvm-svn: 323403
|
consistency. NFC
MMX instructions all start with MMX_ so the 64 isn't needed for disambiguation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.
llvm-svn: 323402
|
expressions.
These were treated as optional suffixes, but the regular expressions are already prefix matches so this is unnecessary. It breaks the binary search optimization in tablegen due to the top level question mark.
llvm-svn: 323401
|
The code in EmitFunctionEntryCode needs to know the maximum stack
alignment, but it runs very early in the selection process (before
lowering). The final stack alignment may change during lowering, so
the code needs to be moved to where the alignment is known.
llvm-svn: 323374
|
The tablegen imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in the future.
There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.
Fixes/works around PR36018.
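A minimal C shape of the affected pattern (editor-added; the function name is hypothetical) - a sign-extending load of a volatile object that the imported patterns would have matched while leaving the original load behind:
/* Sign-extending load of a volatile i16.  The AArch64 GlobalISel selector now
   declines to handle the volatile load rather than risk emitting it twice. */
int widen(volatile short *p) {
  return *p;
}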
llvm-svn: 323371
|
leading zeros
As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD:
On SNB it will result in code that isn't any faster, but not any slower either, so we may as well keep it.
On KNL it only has half the throughput, so I've disabled it there - ideally there'd be a better way than this.
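An editor-added C sketch (hypothetical function) of a qualifying 'zero extended' multiply - every lane has far more than 16 leading zero bits, so the v4i32 multiply can be formed with PMADDWD instead of PMULLD:
/* Both operands are zero-extended from 8 bits, so each 32-bit product fits
   comfortably in the signed 16x16->32 multiplies that PMADDWD performs. */
void mul_bytes(const unsigned char *a, const unsigned char *b, unsigned *r) {
  for (int i = 0; i != 4; ++i)
    r[i] = (unsigned)a[i] * (unsigned)b[i];
}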
Differential Revision: https://reviews.llvm.org/D42258
llvm-svn: 323367
|
Summary:
Move reserveRegisterTuples into AMDGPURegisterInfo and use it in
R600RegisterInfo::getReservedRegs and
R600InstrInfo::reserveIndirectRegisters to ensure that all super
registers of reserved registers are also marked as reserved.
Before this change, under certain circumstances, the registers %t1_x and
%t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not
be, leading to the register allocator sometimes assigning a register to
%t1_xy, which is invalid since %t1_x is reserved.
Reviewers: arsenm, tstellar, MatzeB, qcolombet
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D42448
llvm-svn: 323356
|
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.
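An editor-added example of such a long shift in C (the function name is hypothetical); on Thumb1 this now expands to a runtime call such as __aeabi_llsl instead of a long inline sequence:
/* 64-bit logical shift left: roughly 20 instructions if inlined on Thumb1,
   now emitted as a single __aeabi_ library call instead. */
unsigned long long shift_left(unsigned long long x, unsigned n) {
  return x << n;
}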
Reviewers: samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42401
llvm-svn: 323354
|
(V)PEXTRW/(V)PINSRW
The weirdest was that PEXTRWrr was tagged as a memory operation.
llvm-svn: 323353
|
for consistency and to maybe enable more regular expression compaction in the scheduler models. NFCI
llvm-svn: 323352
|
The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark.
llvm-svn: 323351
|
llvm-svn: 323346
|
Summary:
Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but the same vector size. This is not a
problem in little-endian but, when in big-endian, it requires
additional byte reversals to preserve the lane ordering
while keeping the right endianness of the data inside each lane.
For example:
%1 = load <4 x half>, <4 x half>* %p
results in the following assembly:
ld1 { v0.2s }, [x1]
rev32 v0.4h, v0.4h
This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:
ld1 { v0.4h }, [x1]
Reviewers: olista01, SjoerdMeijer, efriedma
Reviewed By: efriedma
Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D42235
llvm-svn: 323325
|
llvm-svn: 323324
|
to target shuffles (PR32037)
Don't bother making recursive calls to combineX86ShufflesRecursively if we have more shuffle source operands than will be combined together with the remaining recursive depth.
See https://bugs.llvm.org/show_bug.cgi?id=32037#c26 and https://bugs.llvm.org/show_bug.cgi?id=32037#c27 for the reduction in compile times from this patch.
Differential Revision: https://reviews.llvm.org/D42378
llvm-svn: 323320
|
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.
On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" instruction - so
within LLVM they're not distinguishable from each other.
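For illustration (editor-added; the function name is hypothetical), a typical use of the MSVC alloca function described above; clang maps it, __builtin_alloca, and VLAs onto the same LLVM IR alloca:
#include <malloc.h>   /* MSVC declares _alloca() here */
/* Dynamically sized stack allocation via the MSVC alloca function. */
void use_scratch(unsigned n) {
  char *buf = (char *)_alloca(n);
  buf[0] = '\0';
}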
Differential Revision: https://reviews.llvm.org/D42292
llvm-svn: 323308