path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
* fix formatting; NFC | Sanjay Patel | 2016-11-15 | 1 | -1/+1
  llvm-svn: 286989
* [X86][SSE] Improve SINT_TO_FP of boolean vector results (signum) | Simon Pilgrim | 2016-11-15 | 1 | -1/+4
  This patch helps avoid poor legalization of boolean vector results (e.g. 8f32 -> 8i1 -> 8i16) that feed into SINT_TO_FP by inserting an early SIGN_EXTEND, which helps improve the truncation logic. This is not necessary for AVX512 targets, where boolean vectors are legal: AVX512 manages to lower (sint_to_fp vXi1) into some form of (select mask, 1.0f, 0.0f) in most cases.
  Fix for PR13248
  Differential Revision: https://reviews.llvm.org/D26583
  llvm-svn: 286979
* [X86][FastISel] Assert that we are dealing with arithmetic with overflow intrinsics. NFC | Zvi Rackover | 2016-11-15 | 1 | -0/+3
  llvm-svn: 286961
* [X86][FastISel] Fix lowering of overflow result on AVX512 targets | Zvi Rackover | 2016-11-15 | 1 | -2/+2
  Summary: Fix a case where the overflow value of type i1, which is legal on AVX512, was assigned to a VK1 register class. We always want this value to be assigned to a GPR since the overflow return value is lowered to a SETO instruction.
  Fixes pr30981.
  Reviewers: mkuper, igorb, craig.topper, guyblank, qcolombet
  Subscribers: qcolombet, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26620
  llvm-svn: 286958
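Here is a minimal sketch of the value being lowered, assuming a signed 32-bit add. It is plain Python standing in for `llvm.sadd.with.overflow.i32`, and the helper names are made up for illustration. The overflow result is a single boolean, which is why a GPR written by SETO is a natural home for it rather than an AVX-512 mask register:

```python
def to_i32(x: int) -> int:
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sadd_with_overflow_i32(a: int, b: int) -> tuple[int, bool]:
    """Signed 32-bit add returning (wrapped result, overflow bit).

    Assumes a and b are already in signed 32-bit range. The overflow bit
    mirrors x86's OF flag, which SETO copies into a byte register: it is
    set when both operands share a sign and the result's sign differs.
    """
    result = to_i32(a + b)
    overflow = (a < 0) == (b < 0) and (result < 0) != (a < 0)
    return result, overflow
```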
* [X86][GlobalISel] Add minimal call lowering support to the IRTranslator | Zvi Rackover | 2016-11-15 | 7 | -2/+196
  Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573.
  Reviewers: ab, qcolombet, t.p.northover
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26593
  llvm-svn: 286935
* [CostModel][X86] Added mul costs for vXi8 vectors | Simon Pilgrim | 2016-11-14 | 1 | -5/+21
  More realistic v16i8/v32i8/v64i8 MUL costs: we have to extend to vXi16, use PMULLW and then truncate the result.
  llvm-svn: 286838
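The extend/multiply/truncate sequence works because the low 8 bits of a product depend only on the low 8 bits of each operand. Nothing is lost by doing the arithmetic in 16-bit lanes. Below is a scalar Python model in which lists stand in for vector registers. The function name is hypothetical:

```python
def mul_vxi8_via_i16(a: list[int], b: list[int]) -> list[int]:
    """Lane-wise i8 multiply modeled as extend -> 16-bit multiply -> truncate.

    Lanes are given as unsigned byte patterns (0..255).
    """
    # Extend each byte lane to a 16-bit lane (zero-extension is enough,
    # since only the low 8 bits of each product survive the truncation).
    wide_a = [x & 0xFF for x in a]
    wide_b = [y & 0xFF for y in b]
    # 16-bit multiply keeping the low 16 bits of each product, as PMULLW does.
    prod16 = [(x * y) & 0xFFFF for x, y in zip(wide_a, wide_b)]
    # Truncate each 16-bit lane back to 8 bits.
    return [p & 0xFF for p in prod16]
```

For example, the lanes 200 * 2 and 3 * 100 wrap to 144 and 44, exactly as a native 8-bit multiply would.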
* [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets | Simon Pilgrim | 2016-11-14 | 1 | -0/+4
  Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32; they were missing for some reason. This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit).
  llvm-svn: 286832
* [AVX-512] Add suffixless aliases for EVEX encoded vcvtsi2ss/vcvtsi2sd/vcvtusi2ss/vcvtusi2sd. | Craig Topper | 2016-11-14 | 1 | -0/+10
  This matches the VEX behavior. Fixes another problem from PR28850.
  llvm-svn: 286790
* [X86] Cleanup 'x' and 'y' mnemonic suffixes for vcvtpd2dq/vcvttpd2dq/vcvtpd2ps and similar instructions. | Craig Topper | 2016-11-14 | 3 | -23/+71
  - Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
  - Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
  - Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
  - Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
  This should fix at least some of PR28850.
  llvm-svn: 286787
* [AVX-512] Remove and autoupgrade masked dword/qword variable shift intrinsics to the new unmasked versions and selects. | Craig Topper | 2016-11-14 | 1 | -8/+0
  llvm-svn: 286786
* [AVX-512] Fix a disassembler failure for AVX-512 vcmpss/vcmpsd with an immediate larger than 32. | Craig Topper | 2016-11-13 | 1 | -4/+14
  Fix the same bug with VLX vcmpps/vcmppd. Fixes PR24941.
  llvm-svn: 286775
* revert commit r286761, some builds failed on Win platforms | Igor Breger | 2016-11-13 | 1 | -0/+4
  llvm-svn: 286765
* [X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss|sd} intrinsics. | Ayman Musa | 2016-11-13 | 1 | -4/+0
  Differential Revision: https://reviews.llvm.org/D26128
  llvm-svn: 286761
* [X86][AVX512] Add patterns for all variants of VMOVSS/VMOVSD instructions. | Ayman Musa | 2016-11-13 | 2 | -0/+91
  Differential Revision: https://reviews.llvm.org/D26022
  llvm-svn: 286758
* [AVX-512] Add unmasked intrinsics for variable shifts of dwords and qwords. | Craig Topper | 2016-11-13 | 1 | -0/+8
  These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.
  llvm-svn: 286754
* [AVX-512] Remove the remaining masked shift by immediate or by single value. | Craig Topper | 2016-11-12 | 1 | -22/+0
  Autoupgrade them to recently introduced unmasked versions and a select.
  After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.
  llvm-svn: 286725
* [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM. | Craig Topper | 2016-11-12 | 1 | -0/+22
  Summary: This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls, where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new intrinsics in the frontend.
  This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
  Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.
  Reviewers: RKSimon, zvi, delena
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26333
  llvm-svn: 286711
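The equivalence the autoupgrade relies on can be sketched in Python. A masked shift-by-immediate produces the same lanes as the unmasked shift followed by a per-lane select against the pass-through operand. The function names here are hypothetical, and the model assumes PSLLD-style semantics, where shift counts of 32 or more produce zero:

```python
MASK32 = 0xFFFFFFFF


def pslld_imm(src: list[int], imm: int) -> list[int]:
    """Unmasked logical left shift of each 32-bit lane; counts >= 32 give 0."""
    return [(x << imm) & MASK32 if imm < 32 else 0 for x in src]


def select(mask: list[bool], if_true: list[int], if_false: list[int]) -> list[int]:
    """Per-lane select, the operation the upgraded IR wraps around the shift."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def masked_pslld_imm(src: list[int], imm: int, passthru: list[int], mask: list[bool]) -> list[int]:
    """Old-style masked form: lanes with a clear mask bit keep passthru."""
    out = []
    for x, p, m in zip(src, passthru, mask):
        shifted = (x << imm) & MASK32 if imm < 32 else 0
        out.append(shifted if m else p)
    return out
```

Because `select(mask, pslld_imm(src, imm), passthru)` always matches the masked form, InstCombine only needs to understand the unmasked intrinsic.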
* [AVX-512] Add support for lowering shuffles to VALIGND/VALIGNQ | Craig Topper | 2016-11-12 | 1 | -28/+96
  Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
  Reviewers: delena, RKSimon
  Subscribers: Farhana, llvm-commits
  Differential Revision: https://reviews.llvm.org/D26297
  llvm-svn: 286709
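Here is a simplified Python model of the idea. It assumes element-granular shuffle masks: indices below n pick from V1, indices from n upward pick from V2, and -1 marks an undefined lane. The matcher only accepts a contiguous window of the concatenation V1 ++ V2 and ignores operand commutation and single-input rotations, so it is a sketch rather than LLVM's actual matching code. The function names are hypothetical:

```python
from typing import Optional


def match_window(mask: list[int]) -> Optional[int]:
    """Return r if every defined lane i reads (V1 ++ V2)[i + r], with 0 < r < n."""
    n = len(mask)
    rotation: Optional[int] = None
    for i, m in enumerate(mask):
        if m < 0:  # undefined lane: compatible with any rotation
            continue
        if rotation is None:
            rotation = m - i
        elif m - i != rotation:
            return None
    if rotation is None or not 0 < rotation < n:
        return None
    return rotation


def valign(lo: list[int], hi: list[int], imm: int) -> list[int]:
    """Whole-register concatenate-and-shift: the n elements starting at imm."""
    n = len(lo)
    return (lo + hi)[imm:imm + n]
```

Unlike PALIGNR, which shifts bytes within each 128-bit lane, the window here spans the whole register, which is the distinction the commit message draws.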
* [DAG Combiner] Fix the native computation of the Newton series for reciprocals | Evandro Menezes | 2016-11-10 | 2 | -7/+8
  The generic infrastructure to compute the Newton series for reciprocal and reciprocal square root was conceived to allow a target to compute the series itself. However, the original code did not properly consider this condition if returned by a target. This patch addresses the issues to allow a target to compute the series on its own.
  Differential revision: https://reviews.llvm.org/D22975
  llvm-svn: 286523
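For reference, the textbook Newton-Raphson updates are x <- x*(2 - a*x) for 1/a and x <- x*(1.5 - 0.5*a*x*x) for 1/sqrt(a). Each step roughly doubles the number of correct bits. The sketch below assumes a caller-supplied initial estimate; on x86 that role is played by estimate instructions such as RCPPS/RSQRTPS. The commit does not say whether the DAG combiner uses exactly these algebraic forms, and the function names are hypothetical:

```python
def recip_newton(a: float, x0: float, steps: int) -> float:
    """Refine an estimate x0 of 1/a with x <- x * (2 - a*x)."""
    x = x0
    for _ in range(steps):
        x = x * (2.0 - a * x)
    return x


def rsqrt_newton(a: float, x0: float, steps: int) -> float:
    """Refine an estimate x0 of 1/sqrt(a) with x <- x * (1.5 - 0.5*a*x*x)."""
    x = x0
    for _ in range(steps):
        x = x * (1.5 - 0.5 * a * x * x)
    return x
```

Starting from 0.3 as an estimate of 1/3, the relative error goes 0.1, 0.01, 1e-4, 1e-8, which illustrates why a target can stop after a small, fixed number of steps.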
* [Target] Rename X86/ARM Assembly printer to reflect reality. | Davide Italiano | 2016-11-10 | 1 | -1/+1
  This shows up a lot when profiling LTO test cases with -time-passes, so it is better to have a non-confusing name.
  llvm-svn: 286488
* [AVX-512] Allow legacy cvtpd2dq intrinsics to select EVEX encoded instruction when available. | Craig Topper | 2016-11-10 | 2 | -8/+12
  llvm-svn: 286435
* [AVX-512][X86] Convert avx_cvtt_ps2dq_256 and sse2_cvttps2dq intrinsics to ISD::FP_TO_SINT in the intrinsics table and delete patterns. | Craig Topper | 2016-11-10 | 2 | -54/+28
  While nearby, also move CVTDQ2PS patterns into their instructions. This allows these intrinsics to also use EVEX instructions.
  llvm-svn: 286434
* [X86] Convert int_x86_avx_cvtt_pd2dq_256 to fp_to_sint using the intrinsics table. | Craig Topper | 2016-11-10 | 2 | -7/+5
  Removes extra patterns and allows legacy intrinsic to select EVEX encoded instructions when available.
  llvm-svn: 286433
* [X86] Move some custom patterns into the currently empty pattern of their corresponding instructions. NFC | Craig Topper | 2016-11-10 | 1 | -46/+37
  llvm-svn: 286432
* [X86] Remove some patterns still referencing int_x86_sse2_cvttpd2dq that should have been removed in r286344. NFC | Craig Topper | 2016-11-10 | 1 | -9/+5
  llvm-svn: 286431
* Re-apply r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate.", with a fix for 32-bit x86. | Peter Collingbourne | 2016-11-09 | 4 | -52/+35
  Teach X86InstrInfo::analyzeCompare() not to crash on CMP and SUB instructions that take a global address operand.
  llvm-svn: 286420
* Revert r286384, "X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate." | Peter Collingbourne | 2016-11-09 | 3 | -31/+52
  Suspected to be the cause of a sanitizer-windows bot failure:
  Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
  llvm-svn: 286385
* X86: Introduce the "relocImm" ComplexPattern, which represents a relocatable immediate. | Peter Collingbourne | 2016-11-09 | 3 | -52/+31
  A relocatable immediate is either an immediate operand or an operand that can be relocated by the linker to an immediate, such as a regular symbol in non-PIC code.
  Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands of type "imm32_su". Remove a number of now-redundant patterns.
  Differential Revision: https://reviews.llvm.org/D25812
  llvm-svn: 286384
* [AVX-512] Add lowering to cvttpd2udq/cvttps2udq for fptoui v2f64/2f32 to 2i32 | Craig Topper | 2016-11-09 | 5 | -8/+26
  This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions. If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
  It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.
  Differential Revision: https://reviews.llvm.org/D26331
  llvm-svn: 286345
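Here is a Python model of the 128-bit result described above. It assumes in-range inputs; the function name is hypothetical, and out-of-range or NaN behavior is deliberately not modeled. Two doubles truncate toward zero into the low two 32-bit lanes, and the upper 64 bits of the register come back as zero. That is the property that lets the lowering skip a separate zeroing step:

```python
import math


def cvttpd2udq_xmm(src: list[float]) -> list[int]:
    """Two f64 lanes -> four u32 lanes of an xmm result, upper two lanes zero.

    Only in-range inputs are modeled; anything that does not truncate into
    [0, 2**32) raises instead of imitating the hardware's invalid result.
    """
    if len(src) != 2:
        raise ValueError("expected a v2f64 input")
    low = []
    for x in src:
        t = math.trunc(x)  # round toward zero: the extra 't' in cvttpd2udq
        if not 0 <= t < 2**32:
            raise ValueError(f"{x!r} is outside the modeled u32 range")
        low.append(t)
    return low + [0, 0]  # upper 64 bits of the xmm register are zero
```

The value 4294967295.0 fits in an unsigned 32-bit lane but not a signed one, which is why fptoui needs the unsigned instruction rather than reusing cvttpd2dq.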
* [X86] Lower AVX512 and SSE intrinsics for CVTTPD2DQ to X86ISD::CVTTPD2DQ. | Craig Topper | 2016-11-09 | 3 | -30/+34
  Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns.
  This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
  Reviewers: delena, zvi, RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26330
  llvm-svn: 286344
* [AVX-512] Use alignedstore256 in patterns that look for stores of the lower 256-bits of a 512-bit vector to use a 256-bit aligned store. | Craig Topper | 2016-11-09 | 1 | -10/+10
  Previously we were only checking for 16 byte alignment instead of 32 byte alignment.
  Fixes PR30947.
  llvm-svn: 286342
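Here is a small Python illustration of why the old check was too weak. A 256-bit aligned store (for example VMOVAPS on a ymm register) requires a 32-byte aligned address. An address such as 0x1010 is 16-byte aligned without being 32-byte aligned. The predicate names are hypothetical models, not the TableGen fragments themselves:

```python
def old_predicate(store_align: int) -> bool:
    """Pre-fix check: accepted any store known to be 16-byte aligned."""
    return store_align >= 16


def alignedstore256_predicate(store_align: int) -> bool:
    """Post-fix check: a 256-bit aligned store needs 32-byte alignment."""
    return store_align >= 32


def is_aligned(addr: int, align: int) -> bool:
    """True if addr is a multiple of align."""
    return addr % align == 0
```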
* [AVX-512] Make VBMI instruction set enabling imply that the BWI instruction set is also enabled. | Craig Topper | 2016-11-09 | 1 | -2/+2
  Summary: This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.
  Reviewers: delena, zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26322
  llvm-svn: 286339
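Here is a sketch of how feature implication can be resolved as a transitive closure. The table is hypothetical and trimmed. Only VBMI implying BWI comes from the commit above; BWI implying AVX512F is included as an assumption for illustration. LLVM declares these relationships in TableGen rather than in code like this:

```python
# Hypothetical, trimmed implication table (feature -> directly implied features).
# VBMI -> BWI is the edge this commit adds; BWI -> AVX512F is assumed here.
IMPLIES: dict[str, list[str]] = {
    "avx512vbmi": ["avx512bw"],
    "avx512bw": ["avx512f"],
    "avx512f": [],
}


def enabled_features(requested: list[str]) -> set[str]:
    """Transitive closure of the requested features under IMPLIES."""
    enabled: set[str] = set()
    worklist = list(requested)
    while worklist:
        feature = worklist.pop()
        if feature in enabled:
            continue
        enabled.add(feature)
        worklist.extend(IMPLIES.get(feature, []))
    return enabled


def v64i8_is_legal(features: set[str]) -> bool:
    """In this model, 512-bit byte/word vector types need BWI."""
    return "avx512bw" in features
```

Without the VBMI -> BWI edge, requesting only VBMI would leave v64i8 illegal, which is the situation PR30912 describes.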
* [VectorLegalizer] Expansion of CTLZ using CTPOP when possible | Simon Pilgrim | 2016-11-08 | 1 | -1/+4
  This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
  This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
  Differential Revision: https://reviews.llvm.org/D25910
  llvm-svn: 286233
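The Hacker's Delight expansion referenced above can be sketched on a single 32-bit lane in Python. First, smear the highest set bit into every lower position. The number of leading zeros then equals the population count of the complement. The function names are made up; a vector version would apply the same shifts, ORs and CTPOP per lane:

```python
def ctpop32(x: int) -> int:
    """Population count of the low 32 bits of x."""
    return bin(x & 0xFFFFFFFF).count("1")


def ctlz32_via_ctpop(x: int) -> int:
    """Count leading zeros of a 32-bit value using only shifts, ORs and CTPOP."""
    x &= 0xFFFFFFFF
    # Smear the highest set bit into every lower position.
    x |= x >> 1
    x |= x >> 2
    x |= x >> 4
    x |= x >> 8
    x |= x >> 16
    # The leading zeros are exactly the bits that are still clear.
    return ctpop32(~x)
```

For x = 0 the smear leaves 0, its complement has 32 set bits, and the result is 32, so no special case is needed.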
* [AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext. | Craig Topper | 2016-11-07 | 1 | -72/+0
  This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select.
  llvm-svn: 286092
* [AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to legacy intrinsics and a select. | Craig Topper | 2016-11-07 | 1 | -4/+0
  llvm-svn: 286089
* [AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead upgrade them to a select and the older AVX2 intrinsic. | Craig Topper | 2016-11-06 | 1 | -10/+0
  llvm-svn: 286073
* [AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. | Craig Topper | 2016-11-06 | 1 | -16/+0
  llvm-svn: 286072
* [AVX-512] Remove intrinsics for 128/256-bit masked shift by single element in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic. | Craig Topper | 2016-11-06 | 1 | -16/+0
  llvm-svn: 286070
* [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsElementInsertion. NFCI | Simon Pilgrim | 2016-11-06 | 1 | -16/+16
  Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
  llvm-svn: 286067
* [AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 addr:)) -> VCVTPS2PDZ128rm | Craig Topper | 2016-11-06 | 2 | -1/+3
  llvm-svn: 286059
* [AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available. | Craig Topper | 2016-11-06 | 3 | -12/+15
  llvm-svn: 286057
* [AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they can use EVEX instructions when available. | Craig Topper | 2016-11-06 | 2 | -18/+2
  llvm-svn: 286056
* [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBlend. NFCI | Simon Pilgrim | 2016-11-05 | 1 | -20/+24
  Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
  llvm-svn: 286045
* [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsZeroOrAnyExtend. NFCI | Simon Pilgrim | 2016-11-05 | 1 | -21/+18
  Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
  llvm-svn: 286044
* [X86][SSE] Reuse zeroable element mask in SSE4A EXTRQ/INSERTQ vector shuffle lowering. NFCI | Simon Pilgrim | 2016-11-05 | 1 | -5/+6
  Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
  llvm-svn: 286043
* [X86][SSE] Reuse zeroable element mask in PSHUFB vector shuffle lowering. NFCI | Simon Pilgrim | 2016-11-05 | 1 | -14/+13
  Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
  llvm-svn: 286042
* [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsInsertPS. NFCI | Simon Pilgrim | 2016-11-05 | 1 | -3/+5
  Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
  llvm-svn: 286040
* [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBitMask. NFCI | Simon Pilgrim | 2016-11-05 | 1 | -9/+11
  Don't regenerate a zeroable element mask with computeZeroableShuffleElements when it's already available.
  llvm-svn: 286039
* [X86][SSE] Reuse zeroable element mask instead of regenerating it. NFCI | Simon Pilgrim | 2016-11-05 | 1 | -30/+47
  We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs. This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls.
  llvm-svn: 286037
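Here is a Python sketch of the refactoring pattern: compute the zeroable mask once per shuffle and hand it to each lowering strategy, instead of letting every strategy recompute it. All names are hypothetical stand-ins for computeZeroableShuffleElements and the lowerVectorShuffleAs* helpers. The model assumes that undefined (-1) mask lanes count as zeroable:

```python
from typing import Callable, Optional

Strategy = Callable[[list[int], list[bool]], Optional[str]]


def compute_zeroable(mask: list[int], v1_zero: list[bool], v2_zero: list[bool]) -> list[bool]:
    """Lane i is zeroable if it is undef or reads a known-zero input element."""
    n = len(mask)
    out = []
    for m in mask:
        if m < 0:
            out.append(True)
        elif m < n:
            out.append(v1_zero[m])
        else:
            out.append(v2_zero[m - n])
    return out


def try_all_zero(mask: list[int], zeroable: list[bool]) -> Optional[str]:
    return "zero vector" if all(zeroable) else None


def try_blend_with_zero(mask: list[int], zeroable: list[bool]) -> Optional[str]:
    # Each lane either keeps V1[i] in place or can be replaced by zero.
    if all(z or m == i for i, (m, z) in enumerate(zip(mask, zeroable))):
        return "blend V1 with zero"
    return None


def lower_shuffle(mask: list[int], v1_zero: list[bool], v2_zero: list[bool]) -> str:
    zeroable = compute_zeroable(mask, v1_zero, v2_zero)  # computed once
    strategies: list[Strategy] = [try_all_zero, try_blend_with_zero]
    for strategy in strategies:
        result = strategy(mask, zeroable)  # reused, never regenerated
        if result is not None:
            return result
    return "generic shuffle"
```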
* Strip trailing whitespace. NFCI. | Simon Pilgrim | 2016-11-05 | 1 | -4/+4
  llvm-svn: 286034