path: root/llvm/lib/Target/X86/X86InstrFragmentsSIMD.td
Commit history (newest first); each entry shows the commit message, author, date, files changed, and lines removed/added.
...
* Masked gather and scatter - added DAGCombine visitors (Elena Demikhovsky, 2015-04-30, 1 file, -0/+46)
  ... and AVX-512 instruction selection patterns. All other patches, including tests, will follow.
  http://reviews.llvm.org/D7665
  llvm-svn: 236211
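  The shape of such a visitor, as a rough sketch (accessor names are assumptions modeled on
  LLVM's masked gather/scatter SDNodes, not taken from this patch; the usual SelectionDAG
  headers are assumed):

      // A scatter whose mask is known all-zero stores nothing, so the node
      // can be folded away by forwarding its incoming chain.
      SDValue DAGCombiner::visitMSCATTER(SDNode *N) {
        MaskedScatterSDNode *MSC = cast<MaskedScatterSDNode>(N);
        if (ISD::isBuildVectorAllZeros(MSC->getMask().getNode()))
          return MSC->getChain();
        return SDValue(); // no combine found
      }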
* Reapply r235977 "[DebugInfo] Add debug locations to constant SD nodes" (Sergey Dmitrouk, 2015-04-28, 1 file, -6/+6)
  [DebugInfo] Add debug locations to constant SD nodes
  This adds debug locations to constant nodes of the SelectionDAG and updates all places that
  create constants to pass debug locations (see PR13269). Can't guarantee that all locations
  are correct, but in a lot of cases the choice is obvious, so most of them should be; at
  least all tests pass. Tests for these changes do not cover everything; instead they just
  check SDNodes, ARM, and AArch64, where it's easy to get incorrect locations on constants.
  This is not a complete fix, as FastISel contains a workaround for wrong debug locations
  that drops locations from instructions when processing constants, and there isn't currently
  a way to use debug locations from constants there because llvm::Constant doesn't cache them
  (yet). That is a somewhat different issue, though, not directly related to these changes.
  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235989
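  A minimal sketch of the call-site change (the surrounding lowering code is illustrative,
  not from the patch):

      // Constants are now created with an SDLoc, so they can be attributed
      // back to a source location in debug info.
      SDValue lowerExample(SDValue Op, SelectionDAG &DAG) {
        SDLoc DL(Op);                                   // location of the node being lowered
        SDValue One = DAG.getConstant(1, DL, MVT::i32); // was: getConstant(1, MVT::i32)
        return DAG.getNode(ISD::ADD, DL, MVT::i32, Op, One);
      }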
* Revert "[DebugInfo] Add debug locations to constant SD nodes"Daniel Jasper2015-04-281-6/+6
| | | | | | | This breaks a test: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/23870 llvm-svn: 235987
* [DebugInfo] Add debug locations to constant SD nodes (Sergey Dmitrouk, 2015-04-28, 1 file, -6/+6)
  This adds debug locations to constant nodes of the SelectionDAG and updates all places that
  create constants to pass debug locations (see PR13269). Can't guarantee that all locations
  are correct, but in a lot of cases the choice is obvious, so most of them should be; at
  least all tests pass. Tests for these changes do not cover everything; instead they just
  check SDNodes, ARM, and AArch64, where it's easy to get incorrect locations on constants.
  This is not a complete fix, as FastISel contains a workaround for wrong debug locations
  that drops locations from instructions when processing constants, and there isn't currently
  a way to use debug locations from constants there because llvm::Constant doesn't cache them
  (yet). That is a somewhat different issue, though, not directly related to these changes.
  Differential Revision: http://reviews.llvm.org/D9084
  llvm-svn: 235977
* AVX-512: Added VPTESTM and VPTESTNM instructions for SKX (Elena Demikhovsky, 2015-04-21, 1 file, -4/+6)
  llvm-svn: 235383
* Reduce dyn_cast<> to isa<> or cast<> where possible. (Benjamin Kramer, 2015-04-10, 1 file, -14/+14)
  No functional change intended.
  llvm-svn: 234586
* AVX-512: Moved patterns for masked load/store under avx_store, avx_load classes. (Elena Demikhovsky, 2015-03-03, 1 file, -0/+52)
  No functional changes.
  llvm-svn: 231069
* Reverted 230471 - gather scatter handling in table gen. (Elena Demikhovsky, 2015-03-01, 1 file, -54/+0)
  llvm-svn: 230892
* AVX-512: Added mask and rounding mode for scalar arithmetic (Elena Demikhovsky, 2015-03-01, 1 file, -0/+2)
  Added more tests for scalar instructions to distinguish between AVX and AVX-512 forms.
  llvm-svn: 230891
* AVX-512: Gather and Scatter patterns (Elena Demikhovsky, 2015-02-25, 1 file, -0/+54)
  Gather and scatter instructions additionally write to one of the source operands - the
  mask register. In this case Gather has two destination values - the loaded value and the
  mask. Until now we did not support a codegen pattern for gather - the instruction was
  generated from the intrinsic only and the machine node was hardcoded. When we introduce
  the masked_gather node, we need to select the instruction automatically, in the standard
  way. I added a flag "hasTwoExplicitDefs" that allows handling two destination operands;
  see the sketch below. (Some code in X86InstrFragmentsSIMD.td is commented out, just to
  split one big patch into many small patches.)
  llvm-svn: 230471
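  A sketch of what building the two-result node looks like (API name and operand order are
  assumptions based on LLVM of this era, not this exact patch):

      // The VT list carries a value result AND a mask result besides the
      // chain, which is what hasTwoExplicitDefs lets the patterns cope with.
      SDValue buildGather(SelectionDAG &DAG, SDLoc DL, SDValue Chain,
                          SDValue PassThru, SDValue Mask, SDValue Base,
                          SDValue Index, MachineMemOperand *MMO) {
        SDVTList VTs = DAG.getVTList(MVT::v16f32, MVT::v16i1, MVT::Other);
        SDValue Ops[] = {Chain, PassThru, Mask, Base, Index};
        return DAG.getMaskedGather(VTs, MVT::v16f32, DL, Ops, MMO);
      }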
* [X86][MMX] Support folding loads in psll, psrl and psra intrinsics (Bruno Cardoso Lopes, 2015-02-23, 1 file, -0/+2)
  llvm-svn: 230225
* AVX-512: recommitted 229837 + bugfix + test (Elena Demikhovsky, 2015-02-23, 1 file, -0/+3)
  llvm-svn: 230223
* Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and ↵Eric Christopher2015-02-201-3/+0
| | | | | | | | | | intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. llvm-svn: 229942
* AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. (Elena Demikhovsky, 2015-02-19, 1 file, -0/+3)
  llvm-svn: 229837
* AVX-512: Added support for FP instructions with embedded rounding mode. (Elena Demikhovsky, 2015-02-18, 1 file, -0/+8)
  By Asaf Badouh <asaf.badouh@intel.com>
  llvm-svn: 229645
* prevent folding a scalar FP load into a packed logical FP instruction (PR22371) (Sanjay Patel, 2015-02-17, 1 file, -0/+19)
  Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors.
  That's what the hardware packed logical FP instructions define: 128-bit memory operands.
  There are no scalar versions of these instructions... because this is x86. Generating the
  wrong code (folding a scalar load into a 128-bit load) is still possible using the peephole
  optimization pass and the load folding tables. We won't completely solve this bug until we
  either fix the lowering in fabs/fneg/fcopysign and any other places where scalar FP logic
  is created, or fix the load folding in foldMemoryOperandImpl() to make sure it isn't
  changing the size of the load.
  Differential Revision: http://reviews.llvm.org/D7474
  llvm-svn: 229531
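  The hazard is easiest to see from user-level code (a hypothetical fabs via ANDPS; the
  helper below is illustrative, not from the patch):

      #include <emmintrin.h>

      // fabs on one float uses a 128-bit ANDPS under the hood. Folding the
      // 4-byte load of *p into ANDPS would read 16 bytes at p, possibly
      // crossing into an unmapped page; loading into a register first keeps
      // the 128-bit logical op register-to-register.
      float absScalar(const float *p) {
        __m128 v = _mm_set_ss(*p); // scalar 4-byte load
        __m128 noSign = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
        return _mm_cvtss_f32(_mm_and_ps(v, noSign));
      }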
* [X86] Remove 256-bit and 512-bit memop pattern fragments. They are no longer used. (Craig Topper, 2015-02-09, 1 file, -12/+0)
  llvm-svn: 228563
* [X86][MMX] Handle i32->mmx conversion using movd (Bruno Cardoso Lopes, 2015-02-05, 1 file, -0/+3)
  Implement a BITCAST dag combine to transform i32->mmx conversion patterns into an
  X86-specific node (MMX_MOVW2D) and guarantee that moves between i32 and x86mmx are better
  handled, i.e., don't use store-load to do the conversion.
  llvm-svn: 228293
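  A sketch of the described combine (the shape is assumed, not the exact patch):

      // Rewrite (x86mmx (bitcast i32)) to the dedicated move node so no
      // store/load round trip through memory is emitted.
      static SDValue combineI32ToMMXBitcast(SDNode *N, SelectionDAG &DAG) {
        SDValue Src = N->getOperand(0);
        if (N->getValueType(0) == MVT::x86mmx && Src.getValueType() == MVT::i32)
          return DAG.getNode(X86ISD::MMX_MOVW2D, SDLoc(N), MVT::x86mmx, Src);
        return SDValue(); // not the pattern we handle
      }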
* [X86][MMX] Move MMX DAG node to proper file (Bruno Cardoso Lopes, 2015-02-05, 1 file, -0/+8)
  llvm-svn: 228291
* Fix program crashes due to alignment exceptions generated for SSE memop instructions (PR22371). (Sanjay Patel, 2015-02-03, 1 file, -1/+1)
  r224330 introduced a bug by misinterpreting the "FeatureVectorUAMem" bit. The commit log
  says that change did not affect anything, but that's not correct. That change allowed SSE
  instructions to have unaligned mem operands folded into math ops, and that's not allowed
  in the default specification for any SSE variant.
  The bug is exposed when compiling for an AVX-capable CPU that had this feature flag but
  without enabling AVX codegen. Another mistake in r224330 was not adding the feature flag
  to all AVX CPUs; the AMD chips were excluded.
  This is part of the fix for PR22371 (http://llvm.org/bugs/show_bug.cgi?id=22371). This
  feature bit is SSE-specific, so I've renamed it to "FeatureSSEUnalignedMem". Changed the
  existing test case for the feature bit to reflect the new name and renamed the test file
  itself to better reflect the feature. Added runs to fold-vex.ll to check for the failing
  codegen.
  Note that the feature bit is not set by default on any CPU because it may require a
  configuration register setting to enable the enhanced unaligned behavior.
  llvm-svn: 227983
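  A sketch of the check the renamed bit now gates (the helper name is hypothetical; the
  subtarget hook follows the new feature name):

      // Fold an SSE load into a math op only if the access is 16-byte
      // aligned or the CPU explicitly tolerates unaligned SSE mem operands.
      static bool canFoldSSELoad(const LoadSDNode *Ld, const X86Subtarget &ST) {
        return ST.hasSSEUnalignedMem() || Ld->getAlignment() >= 16;
      }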
* AVX-512: Added FMA intrinsics with rounding mode (Elena Demikhovsky, 2015-01-28, 1 file, -0/+9)
  By Asaf Badouh and Elena Demikhovsky.
  Added special nodes for rounding: FMADD_RND, FMSUB_RND, etc. This prevents merging between
  nodes with rounding and other standard nodes.
  llvm-svn: 227303
* X86: Added FeatureVectorUAMem for all AVX architectures. (Elena Demikhovsky, 2014-12-16, 1 file, -14/+4)
  According to the AVX specification: "Most arithmetic and data processing instructions
  encoded using the VEX prefix and performing memory accesses have more flexible memory
  alignment requirements than instructions that are encoded without the VEX prefix.
  Specifically, with the exception of explicitly aligned 16 or 32 byte SIMD load/store
  instructions, most VEX-encoded, arithmetic and data processing instructions operate in a
  flexible environment regarding memory address alignment, i.e. VEX-encoded instructions
  with 32-byte or 16-byte load semantics will support unaligned load operation by default.
  Memory arguments for most instructions with VEX prefix operate normally without causing
  #GP(0) on any byte-granularity alignment (unlike Legacy SSE instructions)." The same
  holds for AVX-512.
  This change does not affect anything right now, because only the "memop" pattern fragment
  depends on FeatureVectorUAMem and it is not used in AVX patterns; all AVX patterns are
  based on the "unaligned load" anyway.
  llvm-svn: 224330
* AVX-512: Added EXPAND instructions and intrinsics. (Elena Demikhovsky, 2014-12-15, 1 file, -0/+3)
  llvm-svn: 224241
* AVX-512: Added all forms of COMPRESS instruction (Elena Demikhovsky, 2014-12-11, 1 file, -0/+4)
  + intrinsics + tests
  llvm-svn: 224019
* AVX-512: Scalar ERI intrinsics (Elena Demikhovsky, 2014-11-26, 1 file, -1/+6)
  Including SAE mode and memory operands. Added the AVX512_maskable_scalar template, which
  should cover all scalar instructions in the future. The main difference between
  AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect;
  I need it because I can't create a vselect node with an MVT::i1 mask for a scalar
  instruction.
  http://reviews.llvm.org/D6378
  llvm-svn: 222820
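  A sketch of the distinction (node name per the X86select fragment; types are assumed):

      // ISD::VSELECT requires a vector condition, so a scalar op with an
      // i1 mask goes through the X86-specific select node instead.
      SDValue selectScalarResult(SelectionDAG &DAG, SDLoc DL, SDValue Mask /*i1*/,
                                 SDValue NewVal, SDValue PassThru) {
        return DAG.getNode(X86ISD::SELECT, DL, MVT::f32, Mask, NewVal, PassThru);
      }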
* AVX-512: Intrinsics for ERI (Elena Demikhovsky, 2014-11-12, 1 file, -0/+6)
  Three instructions: vrcp28, vrsqrt28, vexp2 - vector forms only. The intrinsics include
  an SAE (Suppress All Exceptions) parameter.
  http://reviews.llvm.org/D6214
  llvm-svn: 221774
* [x86] Teach the AVX1 path of the new vector shuffle lowering one more trick that I missed. (Chandler Carruth, 2014-09-23, 1 file, -0/+3)
  VPERMILPS has a non-immediate memory operand mode that allows it to do asymmetric shuffles
  in the two 128-bit lanes. Use this rather than two shuffles and a blend.
  However, it turns out the variable shuffle path to VPERMILPS (and VPERMILPD, although that
  one offers no functional difference from the immediate operand other than variability)
  wasn't even plumbed through codegen. Do such plumbing so that we can reasonably emit a
  variable-masked VPERMILP instruction; see the sketch below. Also plumb basic comment
  parsing and printing through so that the tests are reasonable.
  There are still a few tests which don't show the shuffle pattern. These are tests with
  undef lanes. I'll teach the shuffle decoding and printing to handle undef mask entries in
  a follow-up. I've looked at the masks and they seem reasonable.
  llvm-svn: 218300
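  A sketch of emitting the variable-mask form (node name matches the rename in the next
  entry; types are assumed):

      // The shuffle mask arrives in a register operand rather than as an
      // immediate, allowing asymmetric per-lane shuffles in one instruction.
      SDValue lowerWithVPermilpv(SelectionDAG &DAG, SDLoc DL, SDValue V,
                                 SDValue MaskVec) {
        return DAG.getNode(X86ISD::VPERMILPV, DL, MVT::v8f32, V, MaskVec);
      }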
* [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the td pattern). (Chandler Carruth, 2014-09-22, 1 file, -4/+4)
  Currently we only model the immediate operand variation of VPERMILPS and VPERMILPD; we
  should make that clear in the pseudos used. Will be adding support for the variable mask
  variant in my next commit.
  llvm-svn: 218282
* [x86] Start fixing our emission of ADDSUBPS and ADDSUBPD instructions by introducing a synthetic X86 ISD node representing this generic operation. (Chandler Carruth, 2014-09-15, 1 file, -0/+3)
  The relevant patterns for mapping these nodes into the concrete instructions are also
  added, and a gnarly bit of C++ code in the target-specific DAG combiner is replaced with
  simple code emitting this primitive.
  The next step is to generically combine blends of adds and subs into this node so that we
  can drop the reliance on an SSE4.1 ISD node (BLENDI) when matching an SSE3 feature
  (ADDSUB).
  llvm-svn: 217819
* [x86] Fix a pretty horrible bug and inconsistency in the x86 asm parsing (and latent bug in the instruction definitions). (Chandler Carruth, 2014-09-06, 1 file, -2/+2)
  This is effectively a revert of r136287, which tried to address a specific and narrow case
  of immediate operands failing to be accepted by x86 instructions with a pretty heavy
  hammer: it introduced a new kind of operand that behaved differently. All of that is
  removed with this commit, but the test cases are both preserved and enhanced.
  The core problem that r136287 and this commit are trying to handle is that gas accepts
  both of the following instructions:
    insertps $192, %xmm0, %xmm1
    insertps $-64, %xmm0, %xmm1
  These will encode to the same byte sequence, with the immediate occupying an 8-bit entry.
  The first form was fixed by r136287, but that broke the prior handling of the second form!
  =[ Ironically, we would still emit the second form in some cases and then be unable to
  re-assemble the output.
  The reason why the first instruction failed to be handled is that prior to r136287 the
  operands were marked 'i32i8imm', which forces them to be sign-extendable. Clearly, that
  won't work for 192 in a single byte. However, making them zero-extended or "unsigned"
  doesn't really address the core issue either, because it breaks negative immediates. The
  correct fix is to make these operands 'i8imm', reflecting that they can be either signed
  or unsigned but must be 8-bit immediates.
  This patch backs out r136287 and then changes those places as well as some others to use
  'i8imm' rather than one of the extended variants. Naturally, this broke something else:
  the custom DAG nodes had to be updated to have a much more accurate type constraint of an
  i8 node, and a bunch of Pat immediates needed to be specified as i8 values.
  The fallout didn't end there though. We also then ceased to be able to match the
  instruction-specific intrinsics to the instructions so modified. Digging, this is because
  they too used i32 rather than i8 in their signatures. So I've also switched those
  intrinsics to i8 arguments in line with the instructions. In order to make the intrinsic
  adjustments, of course, I also had to add auto-upgrading for the intrinsics.
  I suspect that the intrinsic argument types may have led everything down this rabbit hole.
  Pretty happy with the result.
  llvm-svn: 217310
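  A self-contained worked example of the immediate-range problem:

      #include <cassert>
      #include <cstdint>

      int main() {
        // 192 and -64 denote the same encoded byte, 0xC0.
        assert(static_cast<uint8_t>(-64) == 192);
        // A sign-extendable class (i32i8imm) rejects 192 (> 127); an
        // unsigned class rejects -64. Plain 'i8imm' accepts both spellings.
        return 0;
      }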
* [AVX512] Add enum for the static rounding types (Adam Nemet, 2014-08-14, 1 file, -1/+3)
  No functional change. This will be used by the new FMA intrinsic lowering code.
  We can probably add NO_EXC here as well, I am just not too familiar with this part of
  AVX512 yet. We can add that later.
  llvm-svn: 215662
* [X86] Separate DAG node for valign and palignr (Adam Nemet, 2014-08-05, 1 file, -0/+1)
  They have different semantics (valign is interlane while palignr is intralane), and
  palignr is still needed even in the AVX512 context. According to the latest spec, AVX512BW
  provides these.
  llvm-svn: 214887
* [SKX] Enabling load/store instructions: encoding (Robert Khasanov, 2014-08-04, 1 file, -0/+2)
  Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32, VMOVDQA64, VMOVDQU8,
  VMOVDQU16, VMOVDQU32, VMOVDQU64, VMOVUPD, VMOVUPS.
  Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
  llvm-svn: 214719
* [x86] Make the x86 PACKSSWB, PACKSSDW, PACKUSWB, and PACKUSDW instructions available as synthetic SDNodes PACKSS and PACKUS that will select to the correct instruction variants based on the return type. (Chandler Carruth, 2014-06-20, 1 file, -0/+4)
  This allows us to use these rather important instructions when lowering vector shuffles.
  Also moves the relevant instruction definitions to be split out from the fully generic
  multiclasses to allow them to match these new SDNodes in the same way that the UNPCK
  instructions do. No functionality should actually be changed here.
  llvm-svn: 211332
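  A sketch of using the synthetic node when lowering a shuffle (types are illustrative):

      // The result type picks the concrete instruction: producing v16i8
      // from two v8i16 inputs selects PACKUSWB.
      SDValue lowerViaPackus(SelectionDAG &DAG, SDLoc DL, SDValue Lo, SDValue Hi) {
        // Lo and Hi are v8i16; each i16 lane is saturated to u8 and packed.
        return DAG.getNode(X86ISD::PACKUS, DL, MVT::v16i8, Lo, Hi);
      }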
* X86: Lower SMUL_LOHI of v4i32 to pmuldq when SSE4.1 is available. (Benjamin Kramer, 2014-04-26, 1 file, -0/+3)
  llvm-svn: 207318
* Rename X86insrtps to the proper instruction name. (Filipe Cabecinhas, 2014-04-21, 1 file, -1/+1)
  Summary: The INSERTPS pattern fragment was called insrtps (missing an 'e'), which made it
  harder to grep for the patterns related to this instruction. Renamed it to use the proper
  instruction name.
  Reviewers: nadav
  CC: llvm-commits
  Differential Revision: http://reviews.llvm.org/D3443
  llvm-svn: 206779
* AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors. (Elena Demikhovsky, 2014-02-10, 1 file, -0/+2)
  llvm-svn: 201066
* X86: deduplicate V[SZ]EXT_MOVL and V[SZ]EXT nodes (Tim Northover, 2014-02-06, 1 file, -10/+0)
  I believe VZEXT_MOVL means "zero all vector elements except the first" (and should have
  identical input & output types), whereas VZEXT means "zero extend each element of a
  vector (discarding higher elements if necessary)".
  For example, (v4i32 (vzext (v16i8 ...))) should zero extend the low 4 bytes of the
  incoming vector to 32 bits, discarding higher bytes.
  However, somewhere in the past these two concepts had become confused, even leading to a
  nonsensical VSEXT_MOVL. This re-merges the nodes where appropriate (all VSEXT_MOVL ->
  VSEXT, VZEXT_MOVL -> VZEXT when it's an actual extension).
  rdar://problem/15981990
  llvm-svn: 200918
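  A side-by-side sketch of the two semantics after the cleanup (value types assumed):

      // VZEXT_MOVL: identical input/output type; lanes 1-3 become zero.
      SDValue zeroAllButLane0(SelectionDAG &DAG, SDLoc DL, SDValue V /*v4i32*/) {
        return DAG.getNode(X86ISD::VZEXT_MOVL, DL, MVT::v4i32, V);
      }
      // VZEXT: each of the low 4 bytes widens to i32; the upper 12 are dropped.
      SDValue zeroExtendLowBytes(SelectionDAG &DAG, SDLoc DL, SDValue B /*v16i8*/) {
        return DAG.getNode(X86ISD::VZEXT, DL, MVT::v4i32, B);
      }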
* AVX-512: Added intrinsic for cvtph2ps. (Elena Demikhovsky, 2014-02-05, 1 file, -0/+3)
  Added VPTESTNM instruction. Added a pattern to vselect (lit tests will follow).
  llvm-svn: 200823
* Improve some x86 type constraints. (Craig Topper, 2014-01-26, 1 file, -11/+22)
  llvm-svn: 200120
* AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions (Elena Demikhovsky, 2014-01-23, 1 file, -0/+1)
  They give better sequences than VPERMI.
  llvm-svn: 199893
* AVX-512: added intrinsic vcvtpd2ps (with rounding mode and without) (Elena Demikhovsky, 2014-01-06, 1 file, -0/+1)
  llvm-svn: 198593
* AVX-512: Added intrinsics for vcvt, vcvtt, vrndscale, vcmp (Elena Demikhovsky, 2014-01-01, 1 file, -0/+10)
  Printing rounding control. Encoding for EVEX_RC (rounding control).
  llvm-svn: 198277
* AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader). (Elena Demikhovsky, 2013-12-17, 1 file, -0/+3)
  Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1.
  llvm-svn: 197482
* AVX-512: Added legal type MVT::i1 and VK1 register for it. (Elena Demikhovsky, 2013-12-16, 1 file, -5/+10)
  Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations.
  I replaced FSETCCss and FSETCCsd with one node type, FSETCCs. The node
  extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.
  llvm-svn: 197384
* AVX-512: aligned / unaligned load and store for 512-bit integer vectors. (Elena Demikhovsky, 2013-10-22, 1 file, -0/+1)
  llvm-svn: 193156
* AVX-512: implemented extractelement with variable index. (Elena Demikhovsky, 2013-09-12, 1 file, -0/+2)
  Added parsing of mask register and "zeroing" semantics, like {%k1} {z}.
  llvm-svn: 190595
* AVX-512: added extend and truncate instructions. (Elena Demikhovsky, 2013-08-29, 1 file, -0/+7)
  llvm-svn: 189580
* Make sure x86 instructions using ssmem/sdmem operand types are only able to parse memory operands of the proper size in Intel syntax. (Craig Topper, 2013-08-26, 1 file, -2/+2)
  Primarily affects some of the SSE cvt instructions.
  llvm-svn: 189206
* AVX-512: Added SHIFT instructions. (Elena Demikhovsky, 2013-08-21, 1 file, -0/+3)
  llvm-svn: 188899