path: root/llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
Commit message (Collapse)AuthorAgeFilesLines
...
* [X86][AVX512] extend vcvtph2ps to support xmm/ymm and sae versionsAsaf Badouh2015-10-221-0/+6
| | | | | | Differential Revision: http://reviews.llvm.org/D13945 llvm-svn: 251018
* [X86][AVX512DQ] add scalar fpclassAsaf Badouh2015-10-181-0/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D13769 llvm-svn: 250650
* [X86][XOP] Add VPROT instruction opcodesSimon Pilgrim2015-10-171-0/+7
| | | | | | Added X86ISD opcodes for VPROT vector rotate by variable and by immediate. llvm-svn: 250620
* AVX512: Implemented encoding and intrinsics for vpternlogd/q.Igor Breger2015-10-151-0/+5
| | | | | | Differential Revision: http://reviews.llvm.org/D13768 llvm-svn: 250396
* function names should start with a lower case letter; NFCSanjay Patel2015-10-131-2/+2
| | | | llvm-svn: 250174
* [X86][XOP] Added support for the lowering of 128-bit vector integer ↵Simon Pilgrim2015-10-111-0/+7
| | | | | | | | comparisons to XOP PCOM/PCOMU instructions. The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646). llvm-svn: 249976
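The remark that these comparisons "can be easily commuted" can be illustrated with a small scalar model: swapping the operands is equivalent to swapping the condition code (lt with gt, le with ge). This is only a sketch; the condition-code names and the helper `compare_lanes` are hypothetical and are not taken from the LLVM sources.

```python
import operator

# Hypothetical condition-code names, chosen for illustration only.
COMPARE = {
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
    "eq": operator.eq, "ne": operator.ne,
}

# Condition code to use after swapping the two operands.
COMMUTED = {"lt": "gt", "le": "ge", "gt": "lt", "ge": "le", "eq": "eq", "ne": "ne"}

def compare_lanes(cc, a, b):
    """Element-wise comparison of two equal-length integer vectors."""
    return [COMPARE[cc](x, y) for x, y in zip(a, b)]

# Commuting the operands and the condition code gives the same result.
a, b = [1, 5, 3, 7], [2, 5, 1, 9]
for cc in COMPARE:
    assert compare_lanes(cc, a, b) == compare_lanes(COMMUTED[cc], b, a)
```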
* [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP ↵Simon Pilgrim2015-09-301-0/+7
| | | | | | | | | | | | shift instructions The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes. Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases. Differential Revision: http://reviews.llvm.org/D8690 llvm-svn: 248878
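A minimal scalar model of the behaviour described above, in which the sign of the shift amount selects the direction: non-negative amounts shift left, negative amounts shift right. The function name `xop_style_shift`, the 32-bit element width, and the restriction to amounts with magnitude below 32 are assumptions made for illustration; how the hardware decodes the count is not modelled.

```python
MASK32 = 0xFFFFFFFF

def xop_style_shift(value, amount, arithmetic=False):
    """Shift one 32-bit element; the sign of `amount` picks the direction.

    Assumes -32 < amount < 32. This is a model, not the instruction definition.
    """
    value &= MASK32
    if amount >= 0:
        # Left shift: identical for the logical and arithmetic variants.
        return (value << amount) & MASK32
    shift = -amount
    if arithmetic and value & 0x80000000:
        # Reinterpret as a negative integer so that >> sign-extends.
        value -= 1 << 32
    return (value >> shift) & MASK32
```

For example, `xop_style_shift(1, 4)` shifts left by four bits, while `xop_style_shift(0x80000000, -4, arithmetic=True)` shifts right and replicates the sign bit.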
* [X86][AVX512] add masked version for RSQRT14 & RCP14 Scalar FPAsaf Badouh2015-09-211-0/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D12524 llvm-svn: 248147
* AVX512: Implemented encoding and intrinsics for vcmpss/sd.Igor Breger2015-09-201-4/+9
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12593 llvm-svn: 248121
* [X86][AVX512] extend support in Scalar conversionAsaf Badouh2015-09-201-1/+37
| | | | | | | | | | add scalar FP to Int conversion with truncation intrinsics add scalar conversion FP32 from/to FP64 intrinsics add rounding mode and SAE mode encoding for these intrinsics Differential Revision: http://reviews.llvm.org/D12665 llvm-svn: 248117
* AVX512: vsqrtss/sd encoding and intrinsics implementation.Igor Breger2015-09-201-4/+5
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12102 llvm-svn: 248116
* [X86][AVX512DQ] Add fpclass instruction Asaf Badouh2015-09-201-0/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D12931 llvm-svn: 248115
* [CodeGen] Make x86 nontemporal store patfrags generic. NFC.Ahmed Bougacha2015-09-101-19/+0
| | | | | | To be used by other targets. llvm-svn: 247225
* AVX512: Implemented encoding and intrinsics for vplzcntq, vplzcntd, ↵Igor Breger2015-09-031-1/+3
| | | | | | | | | | vpconflictq, vpconflictd Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11931 llvm-svn: 246750
* [X86] Require 32-byte alignment for 32-byte VMOVNTs.Ahmed Bougacha2015-09-021-2/+4
| | | | | | | | | | | | | | | | We used to accept (and even test, and generate) 16-byte alignment for 32-byte nontemporal stores, but they require 32-byte alignment, per SDM. Found by inspection. Instead of hardcoding 16 in the patfrag, check for natural alignment. Also fix the autoupgrade and the various tests. Also, use explicit -mattr instead of -mcpu: I stared at the output several minutes wondering why I get 2x movntps for the unaligned case (which is the ideal output, but needs some work: see FIXME), until I remembered corei7-avx implies +slow-unaligned-mem-32. llvm-svn: 246733
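The difference between a hardcoded 16-byte requirement and a natural-alignment check can be sketched as follows. Both function names are hypothetical, and the code models only the predicate described in the message, not the actual patfrag.

```python
def old_check(store_size_bytes, alignment_bytes):
    """Model of the previous behaviour: a fixed 16-byte requirement."""
    return alignment_bytes >= 16

def natural_alignment_check(store_size_bytes, alignment_bytes):
    """Model of the fix: the alignment must be at least the store size."""
    return alignment_bytes >= store_size_bytes

# A 32-byte store that is only 16-byte aligned passed the old check
# but is rejected by the natural-alignment check.
assert old_check(32, 16)
assert not natural_alignment_check(32, 16)
assert natural_alignment_check(32, 32)
assert natural_alignment_check(16, 16)
```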
* [X86] Cleanup nontemporal fragments. NFCI.Ahmed Bougacha2015-09-021-15/+6
| | | | | | | | We can chain other fragments to avoid repeating conditions. This also fixes a potential bug (that realistically can't happen), where we would match indexed nontemporal stores for i32/i64. llvm-svn: 246719
* AVX512: Implemented encoding and intrinsics for VGETMANTPD/S, VGETMANTSD/S ↵Igor Breger2015-09-021-7/+9
| | | | | | | | | | instructions Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11593 llvm-svn: 246642
* AVX512: ktest implementationIgor Breger2015-08-311-0/+1
| | | | | | | | Added tests for encoding. Differential Revision: http://reviews.llvm.org/D11979 llvm-svn: 246439
* AVX512: Implemented encoding and intrinsics for vdbpsadbwIgor Breger2015-08-311-0/+3
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D12491 llvm-svn: 246436
* AVX512: Implemented encoding and intrinsics for VGETEXPSS/D instructionsIgor Breger2015-07-281-1/+2
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11528 llvm-svn: 243390
* AVX-512: Implemented encoding, DAG lowering and intrinsics for Integer ↵Igor Breger2015-07-241-14/+42
| | | | | | | | | | Truncate with/without saturation Added tests for DAG lowering, encoding and intrinsics Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 243122
* Revert r242990: "AVX-512: Implemented encoding , DAG lowering and ..."Chandler Carruth2015-07-231-8/+10
| | | | | | | | | | This commit broke the build. Numerous build bots broken, and it was blocking my progress so reverting. It should be trivial to reproduce -- enable the BPF backend and it should fail when running llvm-tblgen. llvm-svn: 242992
* AVX-512: Implemented encoding, DAG lowering and intrinsics for Integer ↵Igor Breger2015-07-231-10/+8
| | | | | | | | | | Truncate with/without saturation Added tests for DAG lowering, encoding and intrinsics Differential Revision: http://reviews.llvm.org/D11218 llvm-svn: 242990
* [X86][AVX512] add reduce/range/scalef/rndScaleAsaf Badouh2015-07-221-1/+6
| | | | | | | | include encoding and intrinsics Differential Revision: http://reviews.llvm.org/D11222 llvm-svn: 242896
* AVX512: Implemented VPMADDUBSW and VPMADDWD instructions,Igor Breger2015-07-211-0/+3
| | | | | | | | Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D11351 llvm-svn: 242761
* AVX-512: Added all AVX-512 forms of Vector Convert for Float/Double/Int/Long ↵Elena Demikhovsky2015-07-131-2/+67
| | | | | | | | | | | | types. This patch contains only the encoding; intrinsics and DAG lowering will follow in the next patch. I temporarily removed the old intrinsics test (just to split this patch). Half types are not covered here. Differential Revision: http://reviews.llvm.org/D11134 llvm-svn: 242023
* [X86][SSE4A] Shuffle lowering using SSE4A EXTRQ/INSERTQ instructionsSimon Pilgrim2015-07-061-0/+8
| | | | | | | | | | | | This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it. As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and it's more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions. From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level. Differential Revision: http://reviews.llvm.org/D10146 llvm-svn: 241508
* [X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN opcodes and remove the X86 ↵Simon Pilgrim2015-07-061-5/+0
| | | | | | | | | | | | implementation With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations. This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code. Differential Revision: http://reviews.llvm.org/D10947 llvm-svn: 241506
* [X86][AVX512] Multiply Packed Unsigned Integers with Round and ScaleAsaf Badouh2015-07-061-0/+1
| | | | | | | | | pmulhrsw review: http://reviews.llvm.org/D10948 llvm-svn: 241443
* AVX-512: all forms of SCATTER instruction on SKX,Elena Demikhovsky2015-06-291-0/+24
| | | | | | encoding, intrinsics and tests. llvm-svn: 240936
* [x86][AVX512]Asaf Badouh2015-06-281-0/+1
| | | | | | | | | | | Add vscalef support include encoding and intrinsics review: http://reviews.llvm.org/D10730 llvm-svn: 240906
* AVX-512: Added all SKX forms of GATHER instructions.Elena Demikhovsky2015-06-281-0/+22
| | | | | | | Added intrinsics. Added encoding and tests. llvm-svn: 240905
* AVX-512: Added all forms of VPABS instructionElena Demikhovsky2015-06-231-0/+1
| | | | | | Added all intrinsics, tests for encoding, tests for intrinsics. llvm-svn: 240386
* AVX-512: All forms of VCOMPRESS and VEXPAND instructions,Elena Demikhovsky2015-06-221-6/+4
| | | | | | encoding tests. llvm-svn: 240272
* [AVX512]Asaf Badouh2015-06-181-0/+1
| | | | | | | | | | add instructions: VPAVGB and VPAVGW review http://reviews.llvm.org/D10504 llvm-svn: 240012
* [X86][SSE] Vectorize v2i32 to v2f64 conversionsSimon Pilgrim2015-06-161-0/+3
| | | | | | | | This patch enables support for the conversion of v2i32 to v2f64 to use the CVTDQ2PD xmm instruction and stay on the SSE unit instead of scalarizing, sign extending to i64 and using CVTSI2SDQ scalar conversions. Differential Revision: http://reviews.llvm.org/D10433 llvm-svn: 239855
* AVX-512: Implemented cvtsi2ss/d cvtusi2ss/d instructions with round control ↵Igor Breger2015-06-141-0/+6
| | | | | | | | | | | for KNL. Added intrinsics for cvtsi2ss/d instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D10430 llvm-svn: 239694
* re-apply 238809Asaf Badouh2015-06-031-0/+5
| | | | | | | | | | AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. CR: http://reviews.llvm.org/D9991 llvm-svn: 238923
* AVX-512: Implemented SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2 instructions ↵Elena Demikhovsky2015-06-031-1/+2
| | | | | | | | | | for SKX and KNL. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238917
* [X86] Removed (unused) FSRL x86 operationSimon Pilgrim2015-06-031-3/+0
| | | | | | | | | | This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....). Since the refactoring of the shuffle lowering code this no longer has any use. Differential Revision: http://reviews.llvm.org/D10169 llvm-svn: 238906
* revert 238809Asaf Badouh2015-06-021-5/+0
| | | | llvm-svn: 238810
* AVX-512: Implemented GETEXP instruction for KNL and SKXAsaf Badouh2015-06-021-0/+5
| | | | | | | Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. llvm-svn: 238809
* AVX-512: Implemented VRANGEPD and VRANGEPS instructions for SKX.Elena Demikhovsky2015-06-011-0/+1
| | | | | | | | | Implemented DAG lowering for all these forms. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238738
* AVX-512: Implemented VFIXUPIMMPD and VFIXUPIMMPS instructions for KNL and SKXElena Demikhovsky2015-06-011-0/+4
| | | | | | | | | Implemented DAG lowering for all these forms. Added tests for encoding. by Igor Breger (igor.breger@intel.com) llvm-svn: 238728
* [x86] Implement a faster vector population count based on the PSHUFBChandler Carruth2015-05-301-0/+3
| in-register LUT technique.
|
| Summary: A description of this technique can be found here: http://wm.ite.pl/articles/sse-popcount.html
|
| The core of the idea is to use an in-register lookup table and the PSHUFB instruction to compute the population count for the low and high nibbles of each byte, and then to use horizontal sums to aggregate these into vector population counts with wider element types.
|
| On x86 there is an instruction that will directly compute the horizontal sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily. Various tricks are used to get vNi32 and vNi16 from the vNi8 that the LUT computes.
|
| The base implementation of this, and most of the work, was done by Bruno in a follow up to D6531. See Bruno's detailed post there for lots of timing information about these changes.
|
| I have extended Bruno's patch in the following ways:
|
| 0) I committed the new tests with baseline sequences so this shows a diff, and regenerated the tests using the update scripts.
| 1) Bruno had noticed and mentioned in IRC a redundant mask that I removed.
| 2) I introduced a particular optimization for the i32 vector cases where we use PSHL + PSADBW to compute the low i32 popcounts, and PSHUFD + PSADBW to compute doubled high i32 popcounts. This takes advantage of the fact that to line up the high i32 popcounts we have to shift them anyways, and we can shift them by one fewer bit to effectively divide the count by two. While the PSHUFD based horizontal add is no faster, it doesn't require registers or load traffic the way a mask would, and provides more ILP as it happens on different ports with high throughput.
| 3) I did some code cleanups throughout to simplify the implementation logic.
| 4) I refactored it to continue to use the parallel bitmath lowering when SSSE3 is not available to preserve the performance of that version on SSE2 targets where it is still much better than scalarizing as we'll still do a bitmath implementation of popcount even in scalar code there.
|
| With #1 and #2 above, I analyzed the result in IACA for sandybridge, ivybridge, and haswell. In every case I measured, the throughput is the same or better using the LUT lowering, even v2i64 and v4i64, and even compared with using the native popcnt instruction! The latency of the LUT lowering is often higher than the latency of the scalarized popcnt instruction sequence, but I think those latency measurements are deeply misleading. Keeping the operation fully in the vector unit and having many chances for increased throughput seems much more likely to win.
|
| With this, we can lower every integer vector popcount implementation using the LUT strategy if we have SSSE3 or better (and thus have PSHUFB). I've updated the operation lowering to reflect this. This also fixes an issue where we were scalarizing horribly some AVX lowerings.
|
| Finally, there are some remaining cleanups. There is duplication between the two techniques in how they perform the horizontal sum once the byte population count is computed. I'm going to factor and merge those two in a separate follow-up commit.
|
| Differential Revision: http://reviews.llvm.org/D10084 llvm-svn: 238636
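The core idea of the message above (a 16-entry table indexed by each nibble, followed by a horizontal byte sum) can be modelled in scalar Python. This is a sketch of the technique only, not the LLVM lowering: the names `NIBBLE_POPCOUNT`, `popcount_bytes` and `popcount_i64_lanes` are hypothetical, the list comprehensions stand in for PSHUFB lookups, and the grouped sum stands in for the horizontal-sum instruction.

```python
# Popcount of every 4-bit value; on hardware this table would sit in a vector register.
NIBBLE_POPCOUNT = [bin(n).count("1") for n in range(16)]

def popcount_bytes(vec):
    """Per-byte popcount using one table lookup for each nibble."""
    low = [NIBBLE_POPCOUNT[b & 0x0F] for b in vec]
    high = [NIBBLE_POPCOUNT[(b >> 4) & 0x0F] for b in vec]
    return [lo + hi for lo, hi in zip(low, high)]

def popcount_i64_lanes(vec):
    """Sum the byte counts in groups of 8 to get one count per 64-bit lane."""
    counts = popcount_bytes(vec)
    return [sum(counts[i:i + 8]) for i in range(0, len(counts), 8)]

# A 16-byte vector: one lane of all-ones bytes, one lane of bytes with a single bit set.
vec = [0xFF] * 8 + [0x01] * 8
assert popcount_i64_lanes(vec) == [64, 8]
```

The per-byte step is the same regardless of element width; only the aggregation step changes for the narrower element types discussed in the message.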
* AVX-512: Added VBROADCASTF64X4, VBROADCASTF64X2, VBROADCASTI32X8, and other ↵Elena Demikhovsky2015-05-181-0/+3
| | | | | | | | instructions from this set Added encoding tests. llvm-svn: 237557
* AVX-512: Added SKX instructions and intrinsics:Elena Demikhovsky2015-05-111-2/+2
| | | | | | | | 1. {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971
* AVX-512: Added all forms of FP compare instructions for KNL and SKX.Elena Demikhovsky2015-05-071-5/+12
| | | | | | | | Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714
* AVX-512: added calling convention for i1 vectors in 32-bit mode.Elena Demikhovsky2015-05-041-1/+0
| | | | | | | Fixed some bugs in extend/truncate for AVX-512 target. Removed VBROADCASTM (masked broadcast) node, since it is not used any more. llvm-svn: 236420
* AVX-512: added integer "add" and "sub" instructions with saturation for SKXElena Demikhovsky2015-05-041-0/+3
| | | | | | | | with intrinsics and tests by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236418