| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D48099
llvm-svn: 334559
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D47831
llvm-svn: 334553
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.
Differential Revision: https://reviews.llvm.org/D46552
llvm-svn: 334531
|
| |
|
|
| |
llvm-svn: 334529
|
| |
|
|
|
|
|
|
|
|
|
|
| |
All COFF targets should use @IMGREL32 relocations for symbol differences
against __ImageBase. Do the same for getSectionForConstant, so that
immediates lowered to globals get merged across TUs.
Patch by Chris January
Differential Revision: https://reviews.llvm.org/D47783
llvm-svn: 334523
|
| |
|
|
|
|
| |
- Move section selection and alignment to AMDGPUAsmPrinter
llvm-svn: 334521
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Do not emit following assembler directives:
- .hsa_code_object_version
- .hsa_code_object_isa
- .amd_amdgpu_isa
- .amd_amdgpu_hsa_metadata
- .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
alignments
llvm-svn: 334519
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
R_X86_64_GOTPC32 instead of R_X86_64_PC32
Summary:
This is similar to D46319 (ARM). x86-64 psABI p40 gives an example:
leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc
GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32.
Reviewers: javed.absar, echristo
Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar
Differential Revision: https://reviews.llvm.org/D47507
llvm-svn: 334515
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
Differential Revision: https://reviews.llvm.org/D47985
llvm-svn: 334513
|
| |
|
|
|
|
| |
folding table.
llvm-svn: 334511
|
| |
|
|
| |
llvm-svn: 334508
|
| |
|
|
|
|
|
| |
We should be able to obsolete D48043 by easing the constraints
on this existing code.
llvm-svn: 334504
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.
Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.
Differential Revision: https://reviews.llvm.org/D47725
llvm-svn: 334497
|
| |
|
|
|
|
|
|
| |
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D47636
llvm-svn: 334491
|
| |
|
|
|
|
|
|
|
|
|
| |
Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with
additional flag, so instead of always lowering to lui %hi(...),
addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi,
%higher or %highest depending on the added flag.
Differential Revision: https://reviews.llvm.org/D47941
llvm-svn: 334490
|
| |
|
|
| |
llvm-svn: 334488
|
| |
|
|
| |
llvm-svn: 334481
|
| |
|
|
|
|
| |
These include PUSH/POP instructions that don't match the manual table. This also includes CMPXCHG which we never emit in non-locked form.
llvm-svn: 334479
|
| |
|
|
|
|
|
|
|
|
| |
the autogenerated load folding table.
Most of these are system instructions or other instructions we don't use in CodeGen. No point wasting space for them in the table. Removing them from the autogenerated table makes it easier to review the manual table.
A few are real opcode collisions where the memory and register forms are completely different instructions.
llvm-svn: 334474
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
from ffloor/fnearbyint/fceil/frint/ftrunc.
We were missing packed isel folding patterns for all of sse41, avx, and avx512.
For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.
Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.
Some of this was spotted in the review for D47993.
This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.
llvm-svn: 334460
|
| |
|
|
|
|
|
|
|
| |
The use iterator, used within findMaskOperands(), can return anything which is
not a def. isUse() requires a register, so check isReg() before calling isUse().
Differential Revision: https://reviews.llvm.org/D48047
llvm-svn: 334459
|
| |
|
|
|
|
| |
Not shown in the diff: AQ is a `vector<SUnit *>`, and SU is a `SUnit *`
llvm-svn: 334451
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D47601
llvm-svn: 334443
|
| |
|
|
|
|
|
|
| |
that need them. NFC
All of the cases are already wrapped in curly braces so declaring a variable there isn't an issue. And the variables aren't assigned or used in the larger scope.
llvm-svn: 334436
|
| |
|
|
|
|
|
|
| |
integer/fp before forcing them to be the same size.
This may be needed by another patch that I'm working on. It should have no effect on any of the generated outputs.
llvm-svn: 334430
|
| |
|
|
| |
llvm-svn: 334426
|
| |
|
|
|
|
|
|
| |
Necessary for D46276 as even though btver2 doesn't use these instructions, its now flagged as complete so complains if ANY instruction isn't tagged.....
UnsupportedFeatures wouldn't help here as these instructions don't appear to have a feature predicate (like a lot of AVX512).
llvm-svn: 334423
|
| |
|
|
|
|
|
|
|
|
|
| |
Rational: if there is indirect access that is usually an issue
because load is not ready by the use. However, if use is inside a
loop and load is outside that is potentially an issue for a first
iteration only.
Differential Revision: https://reviews.llvm.org/D47740
llvm-svn: 334420
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When program is compiled for mips3 with n64 abi, wrong register class
is used for creating an emergency spill slot. This patch fixes the
correct register class to be chosen.
This patch resolves PR35859.
Thanks to John Baldwin for reporting the issue!
Differential Revision: https://reviews.llvm.org/D47938
llvm-svn: 334419
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This sets trackLivenessAfterRegAlloc on AVRRegisterInfo.
Most existing targets set this flag. Without it, specific IR inputs
cause LLVM to fail with:
Assertion failed: (getParent()->getProperties().hasProperty( MachineFunctionProperties::Property::TracksLiveness) &&
"Liveness information is accurate"), function livein_begin
file MachineBasicBlock.cpp, line 1354.
With this commit, this no longer happens.
Patch by Peter Nimmervoll.
llvm-svn: 334409
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes most of the scheduling info for SKX vector operations.
I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.
The before/after llvm-exegesis analysis are in the phabricator diff.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47721
llvm-svn: 334407
|
| |
|
|
|
|
| |
This is part of https://reviews.llvm.org/D46356.
llvm-svn: 334391
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: In preparation for D47721. HSW and SNB still define unsupported
classes as they are used by KNL and generic models respectively.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47763
llvm-svn: 334389
|
| |
|
|
| |
llvm-svn: 334384
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: When compiling with -fpic, in contrast to -fPIC, use only the
immediate field to index into the GOT. This saves space if the GOT is
known to be small. The linker will warn if the GOT is too large for
this method.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: brad, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D47136
llvm-svn: 334383
|
| |
|
|
|
|
| |
We use the target independent intrinsics now.
llvm-svn: 334381
|
| |
|
|
| |
llvm-svn: 334377
|
| |
|
|
|
|
|
|
|
| |
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47447
llvm-svn: 334361
|
| |
|
|
|
|
| |
intrinsics. Use a select in IR instead.
llvm-svn: 334358
|
| |
|
|
|
|
|
|
| |
The separate initializeEnvironment function was sort of
useless since r217071.
ARM did this move already with r273556.
llvm-svn: 334345
|
| |
|
|
|
|
|
|
|
|
| |
It looks like this got left in by accident in r289794; I can't think of
any reason this check would be necessary. (Maybe it was meant to be a
check that the AND has one use? But we check that a few lines earlier.)
Differential Revision: https://reviews.llvm.org/D47921
llvm-svn: 334322
|
| |
|
|
|
|
|
|
| |
Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together.
Differential Revision: https://reviews.llvm.org/D47822
llvm-svn: 334309
|
| |
|
|
|
|
|
|
| |
should match the dependency-breaking 'zero-idiom'
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits).
llvm-svn: 334303
|
| |
|
|
|
|
|
|
|
|
| |
AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error.
Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341,
e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable
Differential Revision: https://reviews.llvm.org/D44920
llvm-svn: 334301
|
| |
|
|
|
|
|
|
|
|
| |
Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code.
This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting).
I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking.
llvm-svn: 334289
|
| |
|
|
|
|
|
|
| |
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D47638
llvm-svn: 334280
|
| |
|
|
|
|
|
|
|
|
|
| |
The instruction makes use of a previously ignored field in the fence
instruction. It is introduced in the version 2.3 draft of the RISC-V
specification after much work by the Memory Model Task Group.
As clarified here <https://github.com/riscv/riscv-isa-manual/issues/186>,
the fence.tso assembler mnemonic does not have operands.
llvm-svn: 334278
|
| |
|
|
|
|
|
|
|
|
| |
We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS.
PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines.
The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets.
llvm-svn: 334276
|
| |
|
|
|
|
|
| |
These won't work as expected now, so error on them to avoid
wasting time debugging this in the future.
llvm-svn: 334269
|
| |
|
|
| |
llvm-svn: 334263
|