| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
The X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.
In this patch I calculate the cost on AVX-512. This allows comparing the interleave pattern with gather/scatter and choosing the better solution (PR31426).
* Shuffle-broadcast cost will be changed in Simon's upcoming patch.
Differential Revision: https://reviews.llvm.org/D28118
llvm-svn: 290810
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fixes PR 31345
Reviewers: dylanmckay
Subscribers: fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D28186
llvm-svn: 290778
|
|
|
|
|
|
| |
an unlikely thing to regress in the future.
llvm-svn: 290757
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Fixes PR 31344
Authored by Anmol P. Paralkar
Reviewers: dylanmckay
Subscribers: fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D28121
llvm-svn: 290732
|
|
|
|
|
|
|
|
| |
This reverts commit r290694. It broke sanitizer tests on Win64. I'll
probably bring this back, but the jump tables will just live in .text
like they do for MSVC.
llvm-svn: 290714
|
|
|
|
|
|
|
|
|
|
|
| |
Among other things, this allows using the predefined
.option.machine_version_major/minor/stepping symbols in the directive.
Relevant test expanded at once (also file renamed for clarity).
Differential Revision: https://reviews.llvm.org/D28140
llvm-svn: 290710
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We were already using 32-bit jump table entries, but this was a
consequence of the default PIC model on Win64, and not an intentional
design decision. This patch ensures that we always use 32-bit label
difference jump table entries on Win64 regardless of the PIC model. This
is a good idea because it saves executable size and object file size.
Moving the jump tables to .rdata cleans up the disassembled object code
and reduces the available ROP targets, but it requires adding one more
RIP-relative lea to the code. COFF doesn't have relocations to express
the difference between two arbitrary symbols, so we can't use the jump
table label in the label difference like we do elsewhere.
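The entry format described above can be illustrated with a small sketch. It assumes that each entry stores a signed 32-bit difference between a case label and some base address, and that dispatch adds the entry back to that base. The addresses, the choice of base, and the helper names are hypothetical and do not come from the patch.

```python
# Hypothetical addresses; none of these values come from the patch.
BASE = 0x140001000                                   # assumed base for the differences
TARGETS = [0x140001040, 0x140001058, 0x140001010]    # assumed case-label addresses


def to_i32(value):
    """Wrap an integer to a signed 32-bit value, as it would be stored in a table entry."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def build_table(base, targets):
    """Each entry is the 32-bit difference 'target - base'."""
    return [to_i32(target - base) for target in targets]


def dispatch(base, table, index):
    """Recover the absolute target: base plus the sign-extended entry."""
    return base + table[index]


TABLE = build_table(BASE, TARGETS)
```

With 4-byte entries this table occupies 12 bytes, where three 8-byte absolute pointers would occupy 24.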
Fixes PR31488
Reviewers: majnemer, compnerd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28141
llvm-svn: 290694
|
|
|
|
|
|
|
|
|
|
|
|
| |
size by encoding EVEX AVX-512 instructions using the shorter VEX encoding when possible.
There are cases of AVX-512 instructions that have two possible encodings. This is the case with instructions that use vector registers with low indexes of 0 - 15 and do not use the zmm registers or the mask k registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes even though it is compiled with the AVX-512 features enabled.
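The eligibility conditions named above can be sketched as a simple predicate. This is only an illustration of those stated conditions (register index 0 - 15, no zmm registers, no mask k registers); the `Operand` type and `can_use_vex` helper are hypothetical, and the actual pass may check more than this.

```python
from dataclasses import dataclass


@dataclass
class Operand:
    kind: str   # "xmm", "ymm", "zmm", or "k" (hypothetical encoding of register classes)
    index: int  # register number within its class


def can_use_vex(operands):
    """Return True if every operand satisfies the conditions stated in the message."""
    for op in operands:
        if op.kind in ("zmm", "k"):
            return False  # zmm and mask registers have no VEX encoding
        if op.kind in ("xmm", "ymm") and op.index > 15:
            return False  # registers 16-31 are only reachable with EVEX
    return True
```

For example, an instruction using only xmm1 and xmm15 qualifies, while one touching xmm16 or any k register does not.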
Reviewers: Craig Topper, Zvi Rackoover, Elena Demikhovsky
Differential Revision: https://reviews.llvm.org/D27901
llvm-svn: 290663
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D27953
llvm-svn: 290609
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(.kernel.{v|s}gpr_count)
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.
Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.
Kernel scopes begin at a .amdgpu_hsa_kernel directive and end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also a
dummy scope that spans from the beginning of the source file until the
first .amdgpu_hsa_kernel.
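The counting rule and the scope boundaries described above can be sketched as follows. The input format is a toy one invented for illustration: `use_v N` and `use_s N` stand in for a use of VGPR or SGPR number N, and `<dummy>` names the scope before the first kernel. Real assembler input is not parsed here.

```python
def count_registers(lines):
    """Return a list of (scope_name, vgpr_count, sgpr_count), one per scope."""
    scopes = []
    current = {"name": "<dummy>", "v": 0, "s": 0}  # scope before the first kernel
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == ".amdgpu_hsa_kernel":
            # A new kernel directive closes the previous scope and resets both counts to 0.
            scopes.append(current)
            current = {"name": tokens[1], "v": 0, "s": 0}
        elif tokens[0] in ("use_v", "use_s"):
            kind = tokens[0][-1]      # "v" or "s"
            index = int(tokens[1])
            # value = max(value, register index + 1)
            current[kind] = max(current[kind], index + 1)
    scopes.append(current)            # EOF closes the last scope
    return [(s["name"], s["v"], s["s"]) for s in scopes]
```

Because the update uses max, the order in which registers are used within a scope does not affect the final count.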
Test added.
Differential Revision: https://reviews.llvm.org/D27859
llvm-svn: 290608
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: nhaustov, artem.tamazov, vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D28051
llvm-svn: 290599
|
|
|
|
|
|
| |
folding tables.
llvm-svn: 290591
|
|
|
|
|
|
| |
to unmasked intrinsics plus a select.
llvm-svn: 290583
|
|
|
|
|
|
|
|
| |
add them to InstCombine with the 128 and 256 bit versions.
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will be removed and autoupgraded.
llvm-svn: 290573
|
|
|
|
|
|
| |
into masked instructions.
llvm-svn: 290564
|
|
|
|
| |
llvm-svn: 290536
|
|
|
|
|
|
|
|
| |
constant. While clang will guarantee this, nothing in the backend will.
A non-constant value will now result in an isel error instead of just asserting or crashing due to a bad cast during lowering.
llvm-svn: 290532
|
|
|
|
| |
llvm-svn: 290517
|
|
|
|
| |
llvm-svn: 290516
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D27989
llvm-svn: 290435
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit
(W-unit in AArch64SchedA57.td, the same as cryptography instructions),
not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA).
This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW
instead of A57UnitX. Also, latencies for those instructions are
corrected.
Patch by Andrew Zhogin.
llvm-svn: 290426
|
|
|
|
| |
llvm-svn: 290425
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
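The idea of ignoring debug values when looking at neighbouring instructions can be sketched like this. Instructions are modelled as plain strings and anything starting with `DBG_VALUE` counts as a debug value; both the representation and the helper names are assumptions made for illustration, not the code from the patch.

```python
def is_debug(instr):
    """Treat any instruction whose text starts with DBG_VALUE as a debug value."""
    return instr.startswith("DBG_VALUE")


def prev_non_debug(instrs, index):
    """Index of the nearest non-debug instruction before 'index', or None."""
    i = index - 1
    while i >= 0 and is_debug(instrs[i]):
        i -= 1
    return i if i >= 0 else None


def next_non_debug(instrs, index):
    """Index of the nearest non-debug instruction after 'index', or None."""
    i = index + 1
    while i < len(instrs) and is_debug(instrs[i]):
        i += 1
    return i if i < len(instrs) else None
```

With these helpers, a block with debug values and the same block without them yield the same neighbouring instruction, which is the property needed for debug data not to affect codegen.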
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: mkuper, MatzeB, aprantl
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290423
|
|
|
|
|
|
| |
These will be used to guide the binary encoding of these immediates.
llvm-svn: 290412
|
|
|
|
|
|
|
|
| |
find.
Notable is the assert in NewGVN which had no effect because of the bug.
llvm-svn: 290400
|
|
|
|
|
|
|
|
|
|
|
| |
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because the pass is
not going to fix them.
This fixes invalid register classes on call instructions.
llvm-svn: 290377
|
|
|
|
|
|
|
|
|
|
|
| |
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.
This seems to usually be better but causes some code size regressions
in some tests.
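The canonicalization described above can be sketched on a toy representation: when the true operand of a select is a constant and the false operand is not, swap the operands and invert the condition. Here constants are Python ints, non-constants are variable names, and the condition is an integer comparison predicate; all of these modelling choices and the helper name are hypothetical.

```python
# Inverses of integer comparison predicates (floating-point predicates would
# need care around NaN and are deliberately not modelled here).
INVERSE = {"eq": "ne", "ne": "eq", "lt": "ge", "ge": "lt", "gt": "le", "le": "gt"}


def canonicalize_select(pred, true_val, false_val):
    """Return (pred, true_val, false_val) with a lone constant moved to the false side."""
    if isinstance(true_val, int) and not isinstance(false_val, int):
        # select(c, K, x) is equivalent to select(!c, x, K)
        return INVERSE[pred], false_val, true_val
    return pred, true_val, false_val
```

A select that already has its constant on the false side, or that has constants on both sides, is returned unchanged.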
llvm-svn: 290372
|
|
|
|
| |
llvm-svn: 290366
|
|
|
|
|
|
|
|
|
| |
This is for splitMergedValStore in DAG Combine to share the target query interface
with similar logic in CodeGenPrepare.
Differential Revision: https://reviews.llvm.org/D24707
llvm-svn: 290363
|
|
|
|
|
|
|
|
|
|
|
| |
As a follow-up to the D27209 fix, this patch now properly handles a single
transient instruction in a basic block.
Patch by Aleksandar Beserminji.
Differential Revision: https://reviews.llvm.org/D27856
llvm-svn: 290361
|
|
|
|
| |
llvm-svn: 290351
|
|
|
|
| |
llvm-svn: 290349
|
|
|
|
| |
llvm-svn: 290348
|
|
|
|
|
|
|
|
| |
Caused by dereferencing end iterator when trying to const cast the iterator.
Patch by Martin Sherburn
llvm-svn: 290347
|
|
|
|
| |
llvm-svn: 290345
|
|
|
|
| |
llvm-svn: 290344
|
|
|
|
|
|
|
| |
WebAssembly's load/store offsets are unsigned and don't wrap, so it's not
valid to fold in a negative offset.
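A small sketch shows why the fold is invalid. It assumes a 32-bit address space in which a 32-bit integer add wraps, while the effective address of a load or store is the unsigned base plus the unsigned offset with no wrapping. The helper names and the example values are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_i32(a, b):
    """32-bit integer add, which wraps around."""
    return (a + b) & MASK32


def wasm_effective_address(base, offset):
    """Unsigned base plus unsigned offset; the sum is not wrapped to 32 bits."""
    return (base & MASK32) + (offset & MASK32)


base = 100

# Unfolded: compute base + (-4) with a wrapping add, then access with offset 0.
unfolded = wasm_effective_address(add_i32(base, -4), 0)

# Folded: put -4 into the offset field, where it is read as the unsigned 0xFFFFFFFC.
folded = wasm_effective_address(base, -4 & MASK32)
```

The unfolded form addresses byte 96, while the folded form yields an address above 4 GiB, so the two are not equivalent.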
llvm-svn: 290342
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is needed for later SDWA support in CodeGen.
Reviewers: vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27412
llvm-svn: 290338
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: A real instruction should copy constraints from its pseudo instruction. This allows the auto-generated disassembler to correctly process tied operands.
Reviewers: nhaustov, vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27847
llvm-svn: 290336
|
|
|
|
|
|
|
|
|
|
| |
instruction.
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.
Differential Revision: https://reviews.llvm.org/D28024
llvm-svn: 290333
|
|
|
|
|
|
| |
No tests because these aren't currently used anywhere.
llvm-svn: 290316
|
|
|
|
|
|
|
| |
FMA is canonicalized to have the constant in the middle operand. Do
the same for fmad so that it matches, avoiding an extra combine step.
llvm-svn: 290313
|
|
|
|
| |
llvm-svn: 290312
|
|
|
|
|
|
|
| |
Extend the existing fadd/fsub->fmad combines to produce
FMA if allowed.
llvm-svn: 290311
|
|
|
|
| |
llvm-svn: 290309
|
|
|
|
| |
llvm-svn: 290308
|
|
|
|
| |
llvm-svn: 290307
|
|
|
|
| |
llvm-svn: 290302
|
|
|
|
| |
llvm-svn: 290301
|
|
|
|
| |
llvm-svn: 290300
|