path: root/llvm/lib/Target/X86/X86InstrAVX512.td
...
* [X86] Remove some patterns for bitcasted aligned non-temporal loads. (Craig Topper, 2017-10-14; 1 file, -18/+0)
  These select the same instruction as the non-bitcasted pattern. So this provides no additional value. llvm-svn: 315799
* [X86] Remove unnecessary bitconverts as the root of patterns for zero extended VCVTPD2UDQZ128rr and VCVTTPD2UDQZ128rr. (Craig Topper, 2017-10-14; 1 file, -4/+4)
  We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions. llvm-svn: 315798
* [X86] Add additional patterns for folding loads with 128-bit VCVTDQ2PD and VCVTUDQ2PD. (Craig Topper, 2017-10-14; 1 file, -0/+10)
  This matches the patterns we have for the SSE/AVX version. This is a prerequisite for D38714. llvm-svn: 315797
* [X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available. (Craig Topper, 2017-10-13; 1 file, -14/+14)
  This is particularly important for AVX512VL, where we are better able to recognize the VBROADCAST loads to fold with other operations. For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP. We may be able to use this for AVX1 as well, which would allow us to remove more isel patterns. I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP.
  Differential Revision: https://reviews.llvm.org/D38836
  llvm-svn: 315768
* [X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead. (Craig Topper, 2017-10-13; 1 file, -8/+20)
  There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table. llvm-svn: 315674
* [X86] Fix some patterns that select VLX instructions but were incorrectly also checking for the presence of BWI instructions. (Craig Topper, 2017-10-10; 1 file, -1/+3)
  The EVEX->VEX pass probably obscures this. llvm-svn: 315365
* [AVX512] Add patterns to commute integer comparison instructions during isel. (Craig Topper, 2017-10-10; 1 file, -0/+41)
  This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass. llvm-svn: 315274
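  A hedged sketch of what such a commuted-compare pattern can look like in TableGen. The node, instruction, and transform names below (X86cmpm, VPCMPDZrmib, CommutePCMPCC) are illustrative stand-ins, not necessarily the names the actual patch uses:

  ```td
  // Illustrative sketch only: commute a broadcast load out of the first
  // operand of an integer compare by selecting the broadcast-memory form of
  // the instruction with a swapped immediate condition code.
  def : Pat<(v16i1 (X86cmpm (v16i32 (X86VBroadcast (loadi32 addr:$src2))),
                            (v16i32 VR512:$src1), imm:$cc)),
            (VPCMPDZrmib VR512:$src1, addr:$src2,
                         (CommutePCMPCC imm:$cc))>;
  ```

  Because isel only folds loads that appear in the memory-operand position, spelling out the commuted form as its own pattern is what lets the broadcast load fold without waiting for the peephole pass.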
* [X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns. (Craig Topper, 2017-10-08; 1 file, -24/+0)
  Summary: We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later, during shuffle combining, we go back to preferring MOVSS/MOVSD. Additionally, we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.
  Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then they are commuted back into blends that are equivalent to the original movss/movsd.
  This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions, except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.
  The rest of the patch is removing all the excess scalar patterns. Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.
  Reviewers: RKSimon, zvi
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38023
  llvm-svn: 315181
* [X86] Add new attribute to X86 instructions to enable marking them as "not memory foldable". (Ayman Musa, 2017-10-08; 1 file, -8/+12)
  This attribute will be used in a tablegen backend that generates the X86 memory folding tables, which will be added in a future patch. Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass.
  Differential Revision: https://reviews.llvm.org/D38027
  llvm-svn: 315171
* [X86] Remove ISD::INSERT_SUBVECTOR handling from combineBitcastForMaskedOp. Add isel patterns to make up for it. (Craig Topper, 2017-10-08; 1 file, -0/+133)
  This will allow for some flexibility in canonicalizing bitcasts around insert_subvector. llvm-svn: 315160
* [AVX512] Fix TERNLOG when folding broadcast. (Cameron McInally, 2017-10-06; 1 file, -4/+4)
  Patch to fix ternlog instructions with a folded broadcast. The broadcast decorator, e.g. {1toX}, was missing.
  Differential Revision: https://reviews.llvm.org/D38649
  llvm-svn: 315122
* [X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64. (Craig Topper, 2017-10-04; 1 file, -34/+39)
  This patch redefines the MOVSS/MOVSD instructions to take VR128 as their second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079.
  Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build, but I'll do that and update with results.
  Differential Revision: https://reviews.llvm.org/D38449
  llvm-svn: 314914
* [X86] Remove a couple unnecessary COPY_TO_REGCLASS from some output patterns where the instruction already produces the correct register class. (Craig Topper, 2017-10-01; 1 file, -9/+7)
  llvm-svn: 314638
* [AVX-512] Add patterns to make fp compare instructions commutable during isel. (Craig Topper, 2017-09-30; 1 file, -1/+38)
  llvm-svn: 314598
* [AVX-512] Replace a large number of explicit patterns that check for insert_subvector with zero after masked compares with fewer patterns using a predicate. (Craig Topper, 2017-09-25; 1 file, -632/+0)
  This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer. This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends. This shrinks the isel table from ~590k to ~531k, a roughly 10% reduction in size.
  Differential Revision: https://reviews.llvm.org/D38217
  llvm-svn: 314133
* [X86] Make IFMA instructions commutable during isel so we can fold broadcast loads. (Craig Topper, 2017-09-24; 1 file, -7/+7)
  This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us. llvm-svn: 314083
* [X86] Add IFMA instructions to the load folding tables and make them commutable for the multiply operands. (Craig Topper, 2017-09-24; 1 file, -1/+1)
  llvm-svn: 314080
* [AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare instructions when VLX isn't available. (Craig Topper, 2017-09-24; 1 file, -0/+17)
  We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'. llvm-svn: 314072
* [X86] Convert X86ISD::SELECT to ISD::VSELECT just before instruction selection to avoid duplicate patterns. (Craig Topper, 2017-09-19; 1 file, -15/+1)
  Similar to what we do for X86ISD::SHRUNKBLEND, just turn X86ISD::SELECT into ISD::VSELECT. This allows us to remove the duplicated TRUNC patterns.
  Differential Revision: https://reviews.llvm.org/D38022
  llvm-svn: 313644
* [X86] Remove some unnecessary patterns for truncate with X86ISD::SELECT and undef preserved source. (Craig Topper, 2017-09-19; 1 file, -6/+0)
  We canonicalize undef preserved sources to zero during intrinsic lowering. llvm-svn: 313612
* [X86] Remove the X86ISD::MOVLHPD. Lowering doesn't use it and it's not a real instruction. (Craig Topper, 2017-09-18; 1 file, -4/+1)
  It was used in patterns, but we had the exact same patterns with Unpckl as well. So now just use Unpckl in the instruction patterns. llvm-svn: 313506
* [X86] Synchronize a pattern between SSE1 and AVX/AVX512. (Craig Topper, 2017-09-17; 1 file, -1/+1)
  For some reason the SSE1 pattern expected a X86Movlhps pattern to have a v4f32 type, but AVX and AVX512 expected it to have a v4i32 type. I'm not sure this pattern is even reachable post SSE1, but I'm starting with fixing this obvious bug. llvm-svn: 313495
* [X86] Remove isel patterns for X86Movhlps and X86Movlhps with integer types. Lowering doesn't emit these. (Craig Topper, 2017-09-17; 1 file, -12/+0)
  llvm-svn: 313492
* [X86] Remove isel patterns for movlpd/movlps with integer types. Lowering doesn't emit these. (Craig Topper, 2017-09-17; 1 file, -14/+0)
  llvm-svn: 313491
* X86: Improve AVX512 fptoui lowering. (Zvi Rackover, 2017-09-07; 1 file, -0/+5)
  Summary: Add patterns for fptoui <16 x float> to <16 x i8> and fptoui <16 x float> to <16 x i16>.
  Reviewers: igorb, delena, craig.topper
  Reviewed By: craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D37505
  llvm-svn: 312704
* [X86] Remove patterns for selecting a v8f32 X86ISD::MOVSS or v4f64 X86ISD::MOVSD. (Craig Topper, 2017-09-07; 1 file, -24/+0)
  I don't think we ever generate these. If we did, I would expect we would also be able to generate v16f32 and v8f64, but we don't have those patterns. llvm-svn: 312694
* [X86] Move more isel patterns to X86InstrVecCompiler.td. NFC. (Craig Topper, 2017-09-06; 1 file, -347/+0)
  This moves more of our subvector insert/extract tricks to X86InstrVecCompiler.td and refactors them into multiclasses. llvm-svn: 312661
* [X86] Introduce a new td file to hold some of the non-instruction patterns from SSE and AVX512. (Craig Topper, 2017-09-06; 1 file, -93/+0)
  This patch moves some of the similar non-instruction patterns from X86InstrSSE.td and X86InstrAVX512.td to a common file. This is intended as a starting point. There are many other optimization patterns that exist in both files that we could move here.
  Differential Revision: https://reviews.llvm.org/D37455
  llvm-svn: 312649
* [X86] Add more FMA3 patterns to cover a load in all 3 possible positions. (Craig Topper, 2017-09-06; 1 file, -2/+4)
  This matches what we already do for AVX512. The peephole pass makes up for this in most if not all cases. But this makes isel behavior for these consistent with every other instruction. llvm-svn: 312613
* [X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns. (Craig Topper, 2017-09-05; 1 file, -4/+0)
  We had already disabled the pattern for SSE4.1 and SSE4.2, but it got re-enabled for AVX and AVX512. With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroes, and a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REGCLASS to move FR32 to VR128.
  The same thing can happen for AVX with vblendps, and those separate patterns already exist. For AVX512, (v4f32 (X86vzmovl VR128)) will select a VMOVSS instruction instead of VBLENDPS due to there not being an EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway, so the larger pattern is unneeded for AVX512 too. For SSE1-SSSE3 we can rely on (v4f32 (X86vzmovl VR128)) selecting a MOVSS, similar to AVX512. Again this is what the larger pattern did too.
  So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register. llvm-svn: 312564
* [AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64. (Craig Topper, 2017-09-05; 1 file, -8/+0)
  We don't have this same pattern for AVX2, so I don't believe we should have it for AVX512. We also didn't have it for v16f32. llvm-svn: 312543
* [X86] Add hasSideEffects=0 and mayLoad=1 to some instructions that recently had their patterns removed. (Craig Topper, 2017-09-05; 1 file, -1/+3)
  llvm-svn: 312520
* [X86] Remove duplicate FMA patterns from the isel table. (Craig Topper, 2017-09-04; 1 file, -17/+16)
  This reorders some patterns to get tablegen to detect them as duplicates. Tablegen only detects duplicates when creating variants for commutable operations. It does not detect duplicates between the patterns as written in the td file. So we need to ensure all the FMA patterns in the td file are unique. This also uses null_frag to remove some other unneeded patterns. llvm-svn: 312470
* [X86] Mark the FMA nodes as commutable so tablegen will auto generate the patterns. (Craig Topper, 2017-09-04; 1 file, -38/+4)
  This uses the capability introduced in r312464 to make SDNode patterns commutable on the first two operands. This allows us to remove some of the extra FMA patterns that have to put loads and mask operands in different places to cover all cases. This even includes patterns that were missing to support matching a load in the first operand with FMA4, and non-broadcast loads with masking for AVX512.
  I believe this is causing us to generate some duplicate patterns because tablegen's isomorphism checks don't catch isomorphism between the patterns as written in the td. It only detects isomorphism in the commuted variants it tries to create. The unmasked 231 and 132 memory forms are isomorphic as written in the td file, so we end up keeping both. I think we can precommute the 132 pattern to fix this.
  We also need a follow up patch to go back to the legacy FMA3 instructions and add patterns to the 231 and 132 forms which we currently don't have. llvm-svn: 312469
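  In TableGen terms, the change above amounts to tagging the node definition with the commutativity property. A hedged sketch (the type profile name is a plausible stand-in; the actual definition in X86InstrFragmentsSIMD.td may differ):

  ```td
  // Illustrative sketch only: declaring the FMA node with SDNPCommutative
  // tells tablegen the first two operands may be swapped, so it generates
  // the operand-commuted pattern variants (e.g. load in operand 0 vs.
  // operand 1) automatically instead of each being spelled out in the td.
  def X86Fmadd : SDNode<"X86ISD::FMADD", SDTFPTernaryOp, [SDNPCommutative]>;
  ```

  This is why a single written pattern per load position suffices once the node is marked commutable.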
* [X86] Add more patterns to use moves to zero the upper portions of a vector register that I missed in r312450. (Craig Topper, 2017-09-03; 1 file, -0/+46)
  llvm-svn: 312459
* [X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements. (Craig Topper, 2017-09-03; 1 file, -0/+181)
  Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed. llvm-svn: 312450
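  A hedged sketch of one such pattern (the exact instruction names and vector types the patch covers may differ; this is illustrative, not the patch itself):

  ```td
  // Illustrative sketch only: inserting a 128-bit vector into an all-zero
  // 256-bit vector at index 0 can select a plain 128-bit register move,
  // because writing an xmm register implicitly zeroes the upper bits;
  // SUBREG_TO_REG then re-expresses the result as the wider ymm register.
  def : Pat<(v8f32 (insert_subvector (v8f32 immAllZerosV),
                                     (v4f32 VR128X:$src), (iPTR 0))),
            (SUBREG_TO_REG (i32 0), (VMOVAPSZ128rr VR128X:$src), sub_xmm)>;
  ```

  The explicit VMOVAPS copy is what the log entry says would ideally be avoidable if the producer were known to already zero the upper bits.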
* [AVX512] Suppress duplicate register-only FMA patterns. (Craig Topper, 2017-09-01; 1 file, -30/+40)
  Previously we generated a register-only pattern for each of the 3 instruction forms, but they are all identical as far as isel is concerned. So drop the others and just keep the 213 version. This removes 2968 bytes from the isel table. llvm-svn: 312313
* [X86] Remove unused multiclass. (Craig Topper, 2017-09-01; 1 file, -17/+0)
  llvm-svn: 312312
* [X86] Simplify some multiclasses by inheriting from similar ones. NFC. (Craig Topper, 2017-09-01; 1 file, -17/+9)
  llvm-svn: 312311
* [X86] Add a couple TODOs to the PMADD52 instructions about missing commuting opportunities. (Craig Topper, 2017-09-01; 1 file, -0/+3)
  llvm-svn: 312310
* [AVX512] Don't use the 32-bit element version of AND/OR/XOR/ANDN during isel unless we're matching a masked op or broadcast. (Craig Topper, 2017-08-30; 1 file, -32/+34)
  Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the AND. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does. Since there's no functional difference, just remove the extra patterns and save some isel table size.
  Differential Revision: https://reviews.llvm.org/D36854
  llvm-svn: 312138
* [AVX512] Correct isel patterns to support selecting masked vbroadcastf32x2/vbroadcasti32x2. (Craig Topper, 2017-08-30; 1 file, -31/+55)
  Summary: This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64, then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern. Fixes PR34357.
  We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch. Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one.
  Reviewers: aymanmus, zvi, igorb
  Reviewed By: aymanmus
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D37286
  llvm-svn: 312101
* [AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register. (Craig Topper, 2017-08-30; 1 file, -0/+58)
  This enables the use of a smaller encoding by using a VEX instruction when possible.
  Differential Revision: https://reviews.llvm.org/D37092
  llvm-svn: 312100
* [AVX512] Add more patterns for using masked moves for subvector extracts of the lowest subvector. This time with bitcasts between the vselect and the extract. (Craig Topper, 2017-08-27; 1 file, -123/+58)
  llvm-svn: 311856
* [AVX512] Add patterns to match masked extract_subvector with bitcasts between the vselect and the extract_subvector. Remove the late DAG combine. (Craig Topper, 2017-08-26; 1 file, -0/+106)
  We used to do a late DAG combine to move the bitcasts out of the way, but I'm starting to think that it's better to canonicalize extract_subvector's type to match the type of its input. I've seen some cases where we've formed two different extract_subvectors from the same node, where one had a bitcast and the other didn't.
  Add some more test cases to ensure we've also got most of the zero masking covered too. llvm-svn: 311837
* [AVX512] Add patterns to use masked moves to implement masked extract_subvector of the lowest subvector. (Craig Topper, 2017-08-25; 1 file, -0/+133)
  This only supports 32 and 64-bit element sizes for now. But we could probably do 16 and 8-bit elements with BWI. llvm-svn: 311821
* [X86] Remove X86ISD::FMADD in favor of ISD::FMA. (Craig Topper, 2017-08-23; 1 file, -4/+4)
  There's no reason to have a target specific node with the same semantics as a target independent opcode. This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp.
  Differential Revision: https://reviews.llvm.org/D36983
  llvm-svn: 311568
* [AVX-512] Don't change which instructions we use for unmasked subvector broadcasts when AVX512DQ is enabled. (Craig Topper, 2017-08-21; 1 file, -61/+43)
  There's no functional difference between the AVX512DQ instructions if we're not masking. This change unifies test checks and removes extra isel entries. Similar was done for subvector insert and extracts recently. llvm-svn: 311308
* [AVX-512] Use a scalar load pattern for FPCLASSSS/FPCLASSSD patterns. (Craig Topper, 2017-08-20; 1 file, -5/+5)
  llvm-svn: 311297
* [X86] Converge alignedstore/alignedstore256/alignedstore512 to a single predicate. (Craig Topper, 2017-08-19; 1 file, -18/+18)
  We can read the memoryVT and get its store size directly from the SDNode to check its alignment. llvm-svn: 311265
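  A hedged sketch of what the unified predicate can look like (the exact C++ predicate body in the patch may differ; this is illustrative):

  ```td
  // Illustrative sketch only: one store predicate that derives the required
  // alignment from the node's own memory VT, replacing three fixed-width
  // (128/256/512-bit) alignedstore variants.
  def alignedstore : PatFrag<(ops node:$val, node:$ptr),
                             (store node:$val, node:$ptr), [{
    StoreSDNode *St = cast<StoreSDNode>(N);
    return St->getAlignment() >= St->getMemoryVT().getStoreSize();
  }]>;
  ```

  Because the predicate reads the store size off the node itself, the same fragment works for any vector width instead of hard-coding one alignment per predicate.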