This patch adds a post-linking pass which replaces the function pointer of an enqueued
block kernel with a global variable (runtime handle) and adds a
runtime-handle attribute to the enqueued block kernel.
In LLVM CodeGen the runtime-handle metadata is translated to
RuntimeHandle metadata in the code object. The runtime allocates a global buffer
for each kernel with RuntimeHandle metadata and saves the kernel address
required for the AQL packet into the buffer. The __enqueue_kernel function
in the device library knows that the invoke function pointer in the block
literal is actually a runtime handle, so it loads the kernel address from the
handle and puts it into the AQL packet for dispatching.
This cannot be done in the FE since the FE cannot create a unique global variable
with external linkage across LLVM modules. A global variable with internal
linkage does not work either, since optimization passes will try to replace loads
of the global variable with its initialization value.
Differential Revision: https://reviews.llvm.org/D38610
llvm-svn: 315352
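A purely illustrative sketch of the data flow described above; the handle layout and the names below are invented for illustration, not taken from the actual runtime or device library:

```cpp
#include <cstdint>

// Hypothetical runtime-handle layout: a global buffer the runtime fills with
// the kernel address needed to build the AQL dispatch packet.
struct RuntimeHandle {
  uint64_t kernel_object;
};

// Conceptually what the dispatch path does: the "invoke" pointer stored in the
// block literal is really a pointer to the runtime handle, so the kernel
// address is loaded from it rather than being called directly.
uint64_t kernelAddressForDispatch(const void *InvokeFromBlockLiteral) {
  auto *Handle = static_cast<const RuntimeHandle *>(InvokeFromBlockLiteral);
  return Handle->kernel_object;
}
```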
llvm-svn: 315335
llvm-svn: 315332
llvm-svn: 315331
functions.
This makes the ownership of the resulting MCObjectWriter clear, and allows us
to remove one instance of MCObjectStreamer's bizarre "holding ownership via
someone else's reference" trick.
llvm-svn: 315327
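A minimal sketch of the ownership pattern this and the related commits later in this log (r315260, r315257, r315254, r315245) move to; these are simplified stand-in classes, not the actual MC interfaces:

```cpp
#include <memory>
#include <utility>

struct TargetObjectWriter {               // stand-in for an MC*ObjectTargetWriter
  virtual ~TargetObjectWriter() = default;
};

class ObjectWriter {                      // stand-in for MCObjectWriter
  std::unique_ptr<TargetObjectWriter> TW; // owned; destroyed with the writer
public:
  explicit ObjectWriter(std::unique_ptr<TargetObjectWriter> TW)
      : TW(std::move(TW)) {}
};

// The create function takes the target writer by unique_ptr instead of a raw
// pointer, so ownership of the helper is explicit at every hand-off.
std::unique_ptr<ObjectWriter>
createObjectWriter(std::unique_ptr<TargetObjectWriter> TW) {
  return std::make_unique<ObjectWriter>(std::move(TW));
}
```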
Summary:
The pass to fix function bitcasts generates thunks for functions that
are called directly with a mismatching signature. It was also generating
thunks in cases where the function was address-taken, causing aliasing
problems in otherwise valid cases.
This patch tightens the restrictions for when the pass runs.
Reviewers: sunfish, dschuff
Subscribers: jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D38640
llvm-svn: 315326
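A hedged illustration of the distinction the pass now makes; the function names are invented. A direct call through a mismatching bitcast can be routed through a thunk, but once the function's address is taken, substituting a thunk would change pointer identity, which is the aliasing problem mentioned above:

```cpp
// Direct call with the wrong signature: rewriting it to go through a thunk is fine.
extern "C" void callee(int x) { (void)x; }
typedef void (*NoArgFn)();

void direct_call() {
  reinterpret_cast<NoArgFn>(callee)();
}

// Address-taken: the stored pointer must still compare equal to &callee,
// so introducing a thunk here would break otherwise valid code.
NoArgFn take_address() {
  return reinterpret_cast<NoArgFn>(callee);
}
```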
Add instruction definitions for FP32 mode for recip.d and rsqrt.d.
Previously these instructions were only defined when targeting the
full 64-bit FPU model but were not guarded properly.
Reviewers: nitesh.jain, atanasyan
Differential Revision: https://reviews.llvm.org/D38400
llvm-svn: 315318
A number of record form instructions were missing from the P9 scheduling
model. Added those instructions and marked the P9 model as complete.
Differential Revision: https://reviews.llvm.org/D38560
llvm-svn: 315313
Change-Id: If6fe0b6ec01f111115fb734fe31c0e152dbc165f
llvm-svn: 315311
Previously, the parsing of the 'subu $reg, ($reg,) imm' relied on a parser
which also rendered the operand to the instruction. In some cases the
general parser could construct an MCExpr which was not a MCConstantExpr
which MipsAsmParser was expecting.
Address this by altering the special handling to cope with unexpected inputs, and
fine-tune the handling so that a register name that is not available in the
current ABI is regarded as not a match for the custom parser, but also not as
an outright error.
Also enforces the binutils restriction that only constants are accepted.
This partially resolves PR34391.
Thanks to Ed Maste for reporting the issue!
Reviewers: nitesh.jain, arichardson
Differential Revision: https://reviews.llvm.org/D37476
llvm-svn: 315310
Previously, the code that implemented the GNU assembler aliases for the
LDRD and STRD instructions (where the second register is omitted)
assumed that the input was a valid instruction. This caused assertion
failures for every example in ldrd-strd-gnu-bad-inst.s.
This improves the code so that, if the instruction is not in the expected
format, the check bails out and the asm parser is run on the unmodified
instruction.
It also relaxes the alias on thumb targets, so that unaligned pairs of
registers can be used. The restriction that Rt must be even-numbered
only applies to the ARM versions of these instructions.
Differential revision: https://reviews.llvm.org/D36732
llvm-svn: 315305
This adds diagnostic strings for the ARM floating-point register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.
One of these, DPR, requires C++ code to select the correct error
message, as that class contains different registers depending on the
FPU. The rest can all have their diagnostic strings stored in the
tablegen description of them.
Differential revision: https://reviews.llvm.org/D36693
llvm-svn: 315304
This adds diagnostic strings for the ARM general-purpose register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.
One of these, rGPR, requires C++ code to select the correct error
message, as that class contains different registers in pre-v8 and v8
targets. The rest can all have their diagnostic strings stored in the
tablegen description of them.
Differential revision: https://reviews.llvm.org/D36692
llvm-svn: 315303
Summary:
Atomic buffer operations do not work (and trap on gfx9) when the
components are unaligned, even if their sum is aligned.
Previously, we generated an offset of 4156 without an SGPR by
splitting it as 4095 + 61 (immediate + inline constant), leaving both
components unaligned. The highest offset for which we can do this correctly is
4156 = 4092 + 64: 4092 is the largest aligned value that fits in the immediate
field, and 64 is the largest inline constant.
Fixes dEQP-GLES31.functional.ssbo.atomic.*
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37850
llvm-svn: 315302
The issue is that we assume operand zero of the input to the add instruction
is a register. In this case, the input comes from inline assembly and
operand zero is not a register thereby causing a crash.
The code will bail anyway if the input instruction doesn't have the right
opcode. So do that check first and let short-circuiting prevent the crash.
llvm-svn: 315285
struct" since rL315256.
llvm-svn: 315283
createELFObjectWriter now takes a std::unique_ptr<MCELFObjectTargetWriter>
rather than a MCELFObjectTargetWriter*.
llvm-svn: 315275
This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass.
llvm-svn: 315274
This makes the .seh_ directives slightly more usable from standalone
assembly files.
This removes a large number of report_fatal_errors and recovers from the
error by ignoring the directive.
llvm-svn: 315262
to WasmObjectWriter's constructor.
Fixes the same ownership issue for Wasm that r315245 did for MachO:
WasmObjectWriter takes ownership of its MCWasmObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.
llvm-svn: 315260
llvm-svn: 315258
createWinCOFFObjectWriter to WinCOFFObjectWriter's constructor.
Fixes the same ownership issue for COFF that r315245 did for MachO:
WinCOFFObjectWriter takes ownership of its MCWinCOFFObjectTargetWriter, so we
want to pass this through to the constructor via a unique_ptr, rather than a
raw ptr.
llvm-svn: 315257
ELFObjectWriter's constructor.
Fixes the same ownership issue for ELF that r315245 did for MachO:
ELFObjectWriter takes ownership of its MCELFObjectTargetWriter, so we want to
pass this through to the constructor via a unique_ptr, rather than a raw ptr.
llvm-svn: 315254
to MCObjectWriter's constructor.
MCObjectWriter takes ownership of its MCMachObjectTargetWriter argument -- this
patch plumbs that ownership relationship through the constructor (which
previously took raw MCMachObjectTargetWriter*) and the createMachObjectWriter
function.
llvm-svn: 315245
We end up creating COPYs that are either truncating or extending, and this
should be illegal.
https://reviews.llvm.org/D37640
Patch for X86 and ARM by igorb, rovka
llvm-svn: 315240
Summary:
On behalf of julia.koval@intel.com
The patch transforms the canonical form of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)), to the special psubus instruction on targets which support it (8-bit and 16-bit unsigned integers).
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)
There is also an extra case handled, when the right-hand side of the sub is 32 bits wide and can be truncated using UMIN (this transformation was discussed in https://reviews.llvm.org/D25987).
An example of the special-case code:
```
void foo(unsigned short *p, int max, int n) {
int i;
unsigned m;
for (i = 0; i < n; i++) {
m = *--p;
*p = (unsigned short)(m >= max ? m-max : 0);
}
}
```
Max in this example is truncated to the max_short value if it is greater than m, or just truncated to 16 bits if it is not. This is a valid transformation, because if max > max_short, the result of the expression will be zero.
Here is the table of types I try to support; special-case items are bold:
| Size | 128 | 256 | 512
| ----- | ----- | ----- | -----
| i8 | v16i8 | v32i8 | v64i8
| i16 | v8i16 | v16i16 | v32i16
| i32 | | **v8i32** | **v16i32**
| i64 | | | **v8i64**
Reviewers: zvi, spatel, DavidKreitzer, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37534
llvm-svn: 315237
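A hedged sketch of source that produces the canonical umax(a,b) - b pattern above; with vectorization enabled on an SSE2-capable target the loop body can become a PSUBUSW, though the exact codegen depends on the compiler and flags:

```cpp
#include <algorithm>
#include <cstdint>

// a[i] = max(a[i], b[i]) - b[i] never wraps below zero, so it is exactly
// the unsigned saturating subtraction subus(a[i], b[i]).
void saturating_sub(uint16_t *a, const uint16_t *b, int n) {
  for (int i = 0; i < n; ++i)
    a[i] = std::max(a[i], b[i]) - b[i];
}
```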
E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an
intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the
inverted condition code for the CSEL.
rdar://28495949
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D38160
llvm-svn: 315205
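A hedged example of the kind of source that produces an (xor (overflow-bit), 1) feeding a select; the function name is illustrative:

```cpp
// The '!' below is the xor with 1: instead of materializing it, the backend
// can now invert the condition code of the CSEL that implements the select.
int pick_if_no_overflow(int a, int b, int fallback) {
  int sum;
  bool overflowed = __builtin_sadd_overflow(a, b, &sum);
  return !overflowed ? sum : fallback;
}
```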
AVX512BW type and is already present in the AVX512BW section.
llvm-svn: 315202
targeting AVX instructions.
We believe that, despite AMD's documentation, they really do support all 32 comparison predicates under AVX.
Differential Revision: https://reviews.llvm.org/D38609
llvm-svn: 315201
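For illustration, one of the predicates beyond the original SSE set of 8 (compile with -mavx; the specific predicate chosen here is just an example):

```cpp
#include <immintrin.h>

// _CMP_GE_OQ is predicate 29, one of the 32 AVX comparison predicates.
__m128 cmp_ge_ordered(__m128 a, __m128 b) {
  return _mm_cmp_ps(a, b, _CMP_GE_OQ);
}
```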
Return the combined shuffle from combineX86ShufflesRecursively and perform the combineTo in the caller.
Makes it easier for future patches to use this in functions that aren't actually shuffles themselves.
llvm-svn: 315195
llvm-svn: 315187
llvm-svn: 315185
llvm-svn: 315182
versions of scalar arithmetic patterns
Summary:
We currently disable some conversions of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later, during shuffle combining, we go back to preferring MOVSS/MOVSD.
Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe that, due to the combining using MOVSS/MOVSD, these are unnecessary.
Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. It turns out machine CSE commutes them to blends, and those blends are then commuted back into blends that are equivalent to the original movss/movsd.
This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions, except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special-casing VPBLENDW based on the immediate to widen the element type.
The rest of the patch is removing all the excess scalar patterns.
Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38023
llvm-svn: 315181
Differential Revision: https://reviews.llvm.org/D38621
llvm-svn: 315177
Adding the scheduling information for the SkylakeServer (SKX) target.
This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes the latency, the number of micro-ops, and the ports used by each instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443
Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
the table.
Get the folding table 'MemoryFoldTable2Addr' to a complete state as part of the process explained in https://reviews.llvm.org/D38028
Differential Revision: https://reviews.llvm.org/D38500
llvm-svn: 315174
while disabling it by default.
After the original commit (rL304088, https://reviews.llvm.org/rL304088) was reverted, a discussion was opened on llvm-dev about how to accomplish this task.
In the discussion we concluded that the best way to achieve our goal (which is to automate the folding tables and remove the manually maintained tables) is:
# Commit the tablegen backend, disabled by default.
# Proceed with incremental updating of the manual tables, while checking the validity of each added entry.
# Repeat the previous step until we reach a state where the generated and the manual tables are identical. Then we can safely remove the manual tables and include the generated tables instead.
# Schedule periodic (1 week / 2 weeks / 1 month) runs of the pass:
  - if changes appear (new entries):
    - make sure the entries are legal
    - if they are not, mark them as illegal for folding
  - commit the changes (if there are any).
The CMake flag added for this purpose is "X86_GEN_FOLD_TABLES". Building with this flag will run the pass and emit the X86GenFoldTables.inc file under the build/lib/Target/X86/ directory, which is a good reference for any developer who wants to take part in the effort of completing the current folding tables.
Differential Revision: https://reviews.llvm.org/D38028
llvm-svn: 315173
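A usage sketch, assuming a Ninja-based build tree; the exact invocation depends on your existing CMake configuration:

```
cmake -G Ninja -DX86_GEN_FOLD_TABLES=ON ../llvm
ninja
```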
with a v8i1/v16i1 condition when BWI is not available.
Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this.
llvm-svn: 315172
memory foldable"
This attribute will be used in a tablegen backend that generates the X86 memory folding tables, which will be added in a future pass.
Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass.
Differential Revision: https://reviews.llvm.org/D38027
llvm-svn: 315171
getExtractVEXTRACTImmediate. NFC
Replace one of the divides with a multiply.
llvm-svn: 315162
move the bitcast to the other side of the insert.
This improves detection of zeroing of upper bits during isel.
llvm-svn: 315161
Add isel patterns to make up for it.
This will allow for some flexibility in canonicalizing bitcasts around insert_subvector.
llvm-svn: 315160
llvm-svn: 315159
64-bit targets (PR34855)
Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits.
llvm-svn: 315156
targets (PR34855)
We were already doing this for 32-bit targets, but we can generate these on 64-bits as well.
llvm-svn: 315155
ComputeNumSignBitsForTargetNode.
Summary: Implementations based on ISD::SELECT.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38663
llvm-svn: 315153
Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs.
Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead.
Differential Revision: https://reviews.llvm.org/D38506
llvm-svn: 315150
The SjLj intrinsics in the X86 backend are intended for use with
SjLj exception handling as well, since SVN r271244.
Differential Revision: https://reviews.llvm.org/D38532
llvm-svn: 315146
Correct the CC type for the CMOV used with RDSEED/RDRAND.
The flag result was MVT::Glue, but should be MVT::i32. The CC type was MVT::i8, but should be MVT::i32.
llvm-svn: 315145
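For context, a small example of the intrinsic whose success flag feeds this CMOV pattern (compile with -mrdrnd); shown purely as an illustration of where the lowering applies:

```cpp
#include <immintrin.h>

// Returns 1 on success and 0 if no random value was available; the backend
// models the success flag with the CMOV discussed above.
int get_random(unsigned int *out) {
  return _rdrand32_step(out);
}
```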