some test CHECK lines.
llvm-svn: 335414
used on a rip-relative address.
llvm-svn: 335413
-Ensure EIP isn't used with an index register.
-Ensure EIP isn't used as an index register.
-Ensure base register isn't a vector register.
-Ensure eiz/riz usage matches the size of their base register.
llvm-svn: 335412
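A hedged sketch of operands these checks should now diagnose (AT&T syntax; illustrative examples, not the actual test file):

    lea (%eip,%eax), %ebx    # error: EIP can't be used with an index register
    lea (%rax,%eip), %rbx    # error: EIP can't be used as an index register
    lea (%xmm0), %eax        # error: base register can't be a vector register
    lea (%eax,%eiz), %ebx    # ok: 32-bit %eiz matches the 32-bit base
    lea (%rax,%eiz), %rbx    # error: a 64-bit base requires %riz, not %eiz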
llvm-svn: 335409
Implements PR34259
Intrinsics.h is a very popular header. Most LLVM TUs care about things
like dbg_value, but they don't care how they are implemented. After I
split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU
from scanning 1.7 MB of source that gets pre-processed away.
It also means we can modify intrinsic properties without triggering a
full rebuild, but that's probably less of a win.
I think the next best thing to do would be to split out the target
intrinsics into their own header. Very, very few TUs care about
target-specific intrinsics. It's very hard to split up the target
independent intrinsics like llvm.expect, assume, and dbg.value, though.
llvm-svn: 335407
in/out instructions.
Previously, to support (%dx) we left a wide open hole in our 16-bit memory address checking. This let this address value be used with any instruction without error in the parser. It would later fail in the encoder with an assertion failure on debug builds and who knows what on release builds.
This patch passes the mnemonic down to the memory operand parsing function so we can allow the (%dx) form only on specific instructions.
llvm-svn: 335403
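A hedged sketch of the resulting behavior (AT&T syntax; the exact set of allowed mnemonics is whatever the parser now permits for (%dx)):

    outb %al, (%dx)    # accepted: port I/O through DX
    inw  (%dx), %ax    # accepted
    movw (%dx), %ax    # now rejected by the parser instead of
                       # asserting later in the encoder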
parsing an address in Intel syntax. Use it for improved error checking.
This allows us to check these:
-16-bit addressing doesn't support a scale, so we should error if we find one there.
-Multiplying ESP/RSP by a scale, even if the scale is 1, should be an error because ESP/RSP can't be an index.
llvm-svn: 335398
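A hedged sketch of inputs the improved checking should diagnose (Intel syntax; illustrative):

    mov ax, [bx+si*2]     ; error: scale factor in 16-bit addressing
    mov eax, [esp*1]      ; error: ESP can't be an index register,
                          ;        even with a scale of 1
    mov eax, [eax+esi*4]  ; still accepted: ESI is a valid index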
second register in memory expressions like [EAX+ESP].
By default, the second register gets assigned to the index register slot. But ESP can't be an index register so we need to swap it with the other register.
There's still a slight bug in that we allow [EAX+ESP*1]. The presence of the multiply, even though it's by 1, should force ESP to the index register and trigger an error, but it doesn't currently.
llvm-svn: 335394
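A hedged sketch of the swap (Intel syntax; illustrative):

    mov eax, [eax+esp]    ; ESP is swapped into the base slot:
                          ; encodes as base ESP, index EAX
    mov eax, [esp+eax]    ; equivalent, written the other way around
    mov eax, [eax+esp*1]  ; the known gap: the explicit scale should
                          ; force ESP to be the index and error, but
                          ; this is currently accepted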
The second register is the index register, and it should only be %si or %di when used with a base register; in that case the base register should be %bp or %bx.
This makes us compatible with gas.
We do still need to support both orders with Intel syntax, which uses [bp+si] and [si+bp].
llvm-svn: 335384
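A hedged sketch of the ordering rule (AT&T syntax, matching the gas behavior described above):

    movw (%bp,%si), %ax    # accepted: base %bp, index %si
    movw (%bx,%di), %ax    # accepted: base %bx, index %di
    movw (%si,%bp), %ax    # rejected: index register written first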
zero displacement.
(%bp) can't be encoded without a displacement; that encoding is instead used for a displacement alone. So a 1-byte displacement of 0 must be used. But if there is an index register we can encode without a displacement.
llvm-svn: 335379
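A hedged sketch of the resulting encodings (AT&T syntax; illustrative):

    movw %ax, (%bp)        # emitted as 0(%bp): a 1-byte displacement of 0,
                           # since bare (%bp) has no encoding of its own
    movw %ax, (%bp,%si)    # index register present: encodable with no
                           # displacement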
llvm-svn: 335373
Fixes PR37892
llvm-svn: 335370
llvm-svn: 335331
SK_PermuteSingleSrc
AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion.
This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174.
I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more.
Differential Revision: https://reviews.llvm.org/D48172
llvm-svn: 335329
llvm-svn: 335328
This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline,
because they have no support for unaligned accesses.
It looks like we always pass target feature "+strict-align" from
Clang, so this is not a user facing problem, but querying the subtarget
(in e.g. llc) for unaligned access support is incorrect.
Differential Revision: https://reviews.llvm.org/D48437
llvm-svn: 335326
Not sure why the 32/64 split is needed in the atomic_load/store
hierarchies. The regular PatFrags do this, but we don't do it in the
existing handling for global.
llvm-svn: 335325
Changing the logic of scalar mask folding to check for valid input types rather
than against invalid ones, making it more robust and fixing PR37879.
Differential Revision: https://reviews.llvm.org/D48366
llvm-svn: 335323
Summary:
We can select all instructions that are marked as legal in a full piglit run,
so now is a good time to make the TableGen'd instruction selector default
for all opcodes. This is NFC for a full piglit run, which is why there are
no tests.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48198
llvm-svn: 335319
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D48196
llvm-svn: 335318
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48195
llvm-svn: 335316
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46151
llvm-svn: 335315
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D46150
llvm-svn: 335307
MCJIT can't handle R_X86_64_GOT64 yet.
llvm-svn: 335300
llvm-svn: 335298
Summary:
The large code model allows code and data segments to exceed 2GB, which
means that some symbol references may require a displacement that cannot
be encoded as a displacement from RIP. The large PIC model even relaxes
the assumption that the GOT itself is within 2GB of all code. Therefore,
we need a special code sequence to materialize it:
.LtmpN:
    leaq .LtmpN(%rip), %rbx
    movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax  # Scratch
    addq %rax, %rbx                              # GOT base reg
From that, non-local references go through the GOT base register instead
of being PC-relative loads. Local references typically use GOTOFF
symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax
All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax
The medium code model retains the assumption that the code segment is
less than 2GB, so calls are once again direct, and the RIP-relative
loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx  # GOT base reg
DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax
Non-local data accesses will use RIP-relative addressing, which means we
may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax
Direct calls are basically the same as they are in the small code model:
They use direct, PC-relative addressing, and the PLT is used for calls
to non-local functions.
This patch adds reasonably comprehensive testing of LEA, but there are
lots of interesting folding opportunities that are unimplemented.
Reviewers: chandlerc, echristo
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47211
llvm-svn: 335297
Differential Revision: https://reviews.llvm.org/D48234
llvm-svn: 335288
Update AMDGPU assembler syntax behind the code-object-v3 feature:
* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
values).
* Provide path for backwards compatibility, even with underlying descriptor
changes.
Differential Revision: https://reviews.llvm.org/D47736
llvm-svn: 335281
llvm-svn: 335269
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.
Differential Revision: https://reviews.llvm.org/D48391
llvm-svn: 335268
and everything that comes with it from implementation
and v3 header files.
Leave definition in v2 header files for backwards
compatibility.
Differential Revision: https://reviews.llvm.org/D48191
llvm-svn: 335267
This reverts commit d8f57105010cc7e78026e511d5def873fc91e0e7.
Original Commit:
Author: Haicheng Wu <haicheng@codeaurora.org>
Date: Sun Feb 18 13:51:33 2018 +0000
[AArch64] Coalesce Copy Zero during instruction selection
Add special case for copy of zero to avoid a double copy.
Differential Revision: https://reviews.llvm.org/D36104
Author's intention is to remove a BB that has one mov instruction. In
order to do that, d8f571050 pessimizes MachineSinking by introducing a
copy, such that the mov instruction is NOT moved to the BB. Optimization
downstream gets rid of the BB with only the mov instruction. This works well
if we have only one fall-through branch, as there is only one "extra"
mov instruction.
If we have multiple fall-throughs, we will have a lot of redundant movs.
In such a case, it's better to have this BB which has one mov instruction.
This is causing degradation in jpeg, fft and other codebases. I believe
if we want to remove a BB with only one branch instruction, we should not
pessimize Machine Sinking at all, and find some other solution.
llvm-svn: 335251
This option allows codegen (such as DAGCombine or MI scheduling) to use alias
analysis information, which can help with codegen on in-order CPUs,
especially machine scheduling. Here I have done things the same way as AArch64,
adding a subtarget feature to enable this for specific cores, and enabling it
for the R52, where we have a schedule to make use of it.
Differential Revision: https://reviews.llvm.org/D48074
llvm-svn: 335249
Summary:
When expanding the PseudoTail in expandFunctionCall() we were using X6
to save the return address. Since this is a tail call, the return
address is not needed, so this patch replaces it with X0, which is ignored.
This matches the behaviour listed in the ISA V2.2 document, page 110.
    tail offset  --->  jalr x0, x6, offset
GCC exhibits the same behavior.
Reviewers: apazos, asb, mgrang
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01
Differential Revision: https://reviews.llvm.org/D48343
llvm-svn: 335239
This should help in lowering the following four intrinsics:
_mm256_cvtepi32_epi8
_mm256_cvtepi64_epi16
_mm256_cvtepi64_epi8
_mm512_cvtepi64_epi8
Differential Revision: https://reviews.llvm.org/D46957
llvm-svn: 335238
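For context, these four correspond to the AVX-512 truncating vpmov family; a sketch of the expected single-instruction lowerings (the 256-bit source forms assume AVX-512VL):

    vpmovdb %ymm0, %xmm0    # _mm256_cvtepi32_epi8
    vpmovqw %ymm0, %xmm0    # _mm256_cvtepi64_epi16
    vpmovqb %ymm0, %xmm0    # _mm256_cvtepi64_epi8
    vpmovqb %zmm0, %xmm0    # _mm512_cvtepi64_epi8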
Summary:
For sample and gather ops, we can accurately determine the set of
vaddr-size instruction variants that are required. This reduces
the size of instruction tables by ~5%.
The number of machine instruction opcodes is reduced from 10002
to 9476.
Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D48168
llvm-svn: 335232
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.
Mesa uses dimension-aware image intrinsics since
commit a9a7993441.
Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a
Reviewers: arsenm, rampitec, mareko, tpr, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48167
llvm-svn: 335231
Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.
Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.
This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.
Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6
Reviewers: arsenm, rampitec, rtaylor, tstellar
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48017
llvm-svn: 335228
Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.
This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.
Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48016
llvm-svn: 335227
Summary:
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48014
Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641
llvm-svn: 335226
Summary:
This will allow us to provide rich metadata about the instructions
in tables that are accessible by custom C++ code.
Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D48011
llvm-svn: 335224
Summary:
Kill instructions sometimes do use SCC in unusual circumstances, when
v_cmpx cannot be used due to the operands that are involved.
Additionally, even if SCC was never defined by the expansion, kill pseudos
could previously occur between an s_cmp and an s_cbranch_scc, which breaks
the SCC liveness tracking when the pseudo is expanded to split the basic
block. While it would be possible to explicitly mark the SCC as live-in for
the successor basic block, it's simpler to just mark the pseudo as using SCC,
so that such a sequence is never emitted by instruction selection in the
first place.
A similar issue affects indirect source/dest pseudos in principle, although
I haven't been able to come up with a test case where it actually matters
(this affects instruction selection, so a MIR test can't be used).
Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always
Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47761
llvm-svn: 335223
Summary:
This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.
We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instruction loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.
We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.
The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which chooses the correct variant for
other MIMG instructions is extended to cover gather4 as well.
As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).
While we're at it, delete a whole bunch of dead legacy TableGen code.
Change-Id: I89b02c2841c06f95e662541433e597f5d4553978
Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor
Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D47434
llvm-svn: 335222
These costs were overly cautious for one/two-op general shuffles - VSHUFPD doesn't have to replicate the same shuffle in both lanes like VSHUFPS does.
llvm-svn: 335216
instruction instead.
llvm-svn: 335199
These are identical but use microMIPS instructions instead of MIPS instructions.
Also, flatten the 'let AdditionalPredicates = [InMicroMips]' by using the
ISA_MICROMIPS adjective. Add tests for constant materialization.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D48275
llvm-svn: 335185
I don't believe there is any real reason to have separate X86-specific opcodes for vector compares. Setcc has the same behavior; it just uses a different encoding for the condition code.
I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering.
Differential Revision: https://reviews.llvm.org/D43608
llvm-svn: 335173
pseudos
Additionally, correct the definition of the rdhwr instruction.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D48216
llvm-svn: 335162
llvm-svn: 335161
llvm-svn: 335160