llvm-svn: 286989
This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND, which in turn helps improve the truncation logic.
This is not necessary for AVX512 targets, where boolean vectors are legal: AVX512 manages to lower ( sint_to_fp vXi1 ) into some form of ( select mask, 1.0f , 0.0f ) in most cases.
Fixes PR13248.
Differential Revision: https://reviews.llvm.org/D26583
llvm-svn: 286979
intrinsics. NFC
llvm-svn: 286961
Summary:
Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class.
We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.
Fixes PR30981.
Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet
Subscribers: qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D26620
llvm-svn: 286958
Summary:
Add basic functionality to support call lowering for X86.
Currently only supports functions which return void and take zero arguments.
Inspired by commit 286573.
Reviewers: ab, qcolombet, t.p.northover
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26593
llvm-svn: 286935
More realistic v16i8/v32i8/v64i8 MUL costs: we have to extend to vXi16, use PMULLW, and then truncate the result.
llvm-svn: 286838
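
The widen/multiply/truncate sequence that this cost change accounts for can be modeled per lane. The following is only an illustrative scalar sketch, not LLVM code; the function name `mul_i8_via_i16` is hypothetical, and each list element stands in for one 8-bit vector lane.

```python
def mul_i8_via_i16(a, b):
    """Multiply 8-bit lanes by widening to 16 bits, multiplying, and truncating.

    Hypothetical model: each list element represents one vector lane.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    result = []
    for x, y in zip(a, b):
        wide_x = x & 0xFF                      # extend the lane to 16 bits
        wide_y = y & 0xFF
        prod16 = (wide_x * wide_y) & 0xFFFF    # low 16 bits, as PMULLW keeps
        result.append(prod16 & 0xFF)           # truncate back to 8 bits
    return result


print(mul_i8_via_i16([200, 3, 255], [2, 5, 255]))
```

The low 8 bits of the 16-bit product equal the 8-bit wrapping product, which is why the truncation step recovers the expected result; the extra extend and truncate steps are what make the operation more expensive than a native multiply.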
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32; they were missing for some reason.
This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors, whereas AVX2 still prefers 256-bit).
llvm-svn: 286832
vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. This matches the VEX behavior.
Fixes another problem from PR28850.
llvm-svn: 286790
vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions.
- Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
- Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
- Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
- Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
This should fix at least some of PR28850.
llvm-svn: 286787
intrinsics to the new unmasked versions and selects.
llvm-svn: 286786
immediate larger than 32. Fix the same bug with VLX vcmpps/vcmppd.
Fixes PR24941.
llvm-svn: 286775
llvm-svn: 286765
Differential Revision: https://reviews.llvm.org/D26128
llvm-svn: 286761
Differential Revision: https://reviews.llvm.org/D26022
llvm-svn: 286758
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.
llvm-svn: 286754
Autoupgrade them to recently introduced unmasked versions and a select.
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.
llvm-svn: 286725
element in XMM.
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls, where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would sometimes need to insert selects. Instead we'll just add the selects around the new intrinsics in the frontend.
This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.
Reviewers: RKSimon, zvi, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26333
llvm-svn: 286711
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
Reviewers: delena, RKSimon
Subscribers: Farhana, llvm-commits
Differential Revision: https://reviews.llvm.org/D26297
llvm-svn: 286709
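
To illustrate what detecting a whole-vector rotation in a shuffle mask involves, here is a simplified sketch. It is not the LLVM rotate-matching code: it assumes a single-input shuffle, uses -1 for undefined mask elements, and the function name `match_rotate` is hypothetical.

```python
def match_rotate(mask):
    """Return the element rotation amount encoded by a shuffle mask, or None.

    Simplified, hypothetical model: single input vector, -1 marks an
    undefined element. A mask is a rotation by r if every defined
    element i selects source element (i + r) modulo the vector length.
    """
    n = len(mask)
    rotation = None
    for i, m in enumerate(mask):
        if m < 0:
            continue                  # undefined elements match any rotation
        r = (m - i) % n
        if rotation is None:
            rotation = r              # first defined element fixes the amount
        elif rotation != r:
            return None               # inconsistent amounts: not a rotation
    return rotation


print(match_rotate([3, 4, 5, 6, 7, 0, 1, 2]))
```

Because the rotation is checked across the full vector rather than within each 128-bit lane, a mask like the one above qualifies even though its elements cross lane boundaries.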
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
Differential revision: https://reviews.llvm.org/D22975
llvm-svn: 286523
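
For context, the Newton series mentioned here refines a hardware estimate of 1/a or 1/sqrt(a) with a few multiply-add steps. The sketch below shows the standard Newton-Raphson iterations in scalar form; it is an illustration under that assumption, not the LLVM implementation, and the function names are hypothetical.

```python
def refine_reciprocal(a, estimate, steps=1):
    """Refine an estimate of 1/a with Newton-Raphson: x <- x * (2 - a * x)."""
    x = estimate
    for _ in range(steps):
        x = x * (2.0 - a * x)
    return x


def refine_rsqrt(a, estimate, steps=1):
    """Refine an estimate of 1/sqrt(a): x <- x * (1.5 - 0.5 * a * x * x)."""
    x = estimate
    for _ in range(steps):
        x = x * (1.5 - 0.5 * a * x * x)
    return x


print(refine_reciprocal(3.0, 0.3, steps=3))
print(refine_rsqrt(4.0, 0.45, steps=3))
```

Each step roughly squares the relative error, so a target that can produce a more accurate starting estimate, or that has its own refinement sequence, may prefer to compute the series itself rather than use the generic expansion.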
This shows up a lot when profiling LTO testcases with -time-passes, so
it is better to have a non-confusing name.
llvm-svn: 286488
instruction when available.
llvm-svn: 286435
ISD::FP_TO_SINT in the intrinsics table and delete patterns. While nearby also move CVTDQ2PS patterns into their instructions.
This allows these intrinsics to also use EVEX instructions.
llvm-svn: 286434
table. Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available.
llvm-svn: 286433
corresponding instructions. NFC
llvm-svn: 286432
should have been removed in r286344. NFC
llvm-svn: 286431
represents a relocatable immediate.", with a fix for 32-bit x86.
Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions
that take a global address operand.
llvm-svn: 286420
represents a relocatable immediate."
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
llvm-svn: 286385
immediate.
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.
Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.
Differential Revision: https://reviews.llvm.org/D25812
llvm-svn: 286384
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64 bits of the xmm result.
Differential Revision: https://reviews.llvm.org/D26331
llvm-svn: 286345
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26330
llvm-svn: 286344
256-bits of a 512-bit vector to use a 256-bit aligned store.
Previously we were only checking for 16-byte alignment instead of 32-byte alignment. Fixes PR30947.
llvm-svn: 286342
set is also enabled.
Summary:
This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.
Reviewers: delena, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26322
llvm-svn: 286339
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
Differential Revision: https://reviews.llvm.org/D25910
llvm-svn: 286233
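
The CTPOP-based expansion follows the well-known bit-smearing technique: OR the value with right-shifted copies of itself so every bit below the highest set bit becomes 1, then subtract the population count from the bit width. The scalar sketch below illustrates the idea only; it is not the LLVM legalizer code, and the function name `ctlz_via_ctpop` is hypothetical.

```python
def ctlz_via_ctpop(x, width=32):
    """Count leading zeros of a width-bit value using only shifts, ORs, and popcount."""
    x &= (1 << width) - 1          # keep only the low `width` bits
    shift = 1
    while shift < width:
        x |= x >> shift            # smear the highest set bit downwards
        shift <<= 1
    popcount = bin(x).count("1")   # stands in for CTPOP
    return width - popcount


print(ctlz_via_ctpop(0x00010000))
```

Every step is a plain shift, OR, or population count, so the sequence stays in vector form when those operations are available, which is the condition the patch description attaches to using this expansion.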
selects and native zext/sext.
This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select.
llvm-svn: 286092
legacy intrinsics and a select.
llvm-svn: 286089
upgrade them to a select and the older AVX2 intrinsic.
llvm-svn: 286073
Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286072
in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286070
lowerVectorShuffleAsElementInsertion. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
llvm-svn: 286067
addr:)) -> VCVTPS2PDZ128rm
llvm-svn: 286059
instruction when available.
llvm-svn: 286057
they can use EVEX instructions when available.
llvm-svn: 286056
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
llvm-svn: 286045
lowerVectorShuffleAsZeroOrAnyExtend. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
llvm-svn: 286044
lowering. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
llvm-svn: 286043
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
llvm-svn: 286042
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
llvm-svn: 286040
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
llvm-svn: 286039
We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs.
This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls.
llvm-svn: 286037
llvm-svn: 286034