path: root/llvm/lib/Target/X86/X86InstrVecCompiler.td
* [X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled.
  Craig Topper, 2019-06-10 (1 file changed, -1/+22)
  llvm-svn: 362915
* [X86] Explicitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary.
  Craig Topper, 2019-05-22 (1 file changed, -70/+0)
  We effectively had a second set of isel patterns that tried to use a regular store instruction and an extract_subreg instruction, or a masked move and an extract_subreg. These patterns were intended to override the matching of VEXTRACT instructions by taking advantage of the priority of the explicit immediate 0 for the index. This patch instead just disables matching of the immediate 0 in the VEXTRACT patterns, so each of the component pieces of the larger patterns will match by itself.
  This found a bug of sorts where we didn't use a 128-bit store for a 512->128 extract on KNL. It's unclear what the right thing should be here: using the vextract avoids constraining the register allocator to xmm0-15, but it always results in a longer encoding if the register allocator ends up choosing xmm0-15 anyway.
  llvm-svn: 361431
* Recommit r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."
  Craig Topper, 2019-03-10 (1 file changed, -1/+1)
  Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table is a SwitchOpcode, which is needed by the caching optimization in the ISel Matcher.
  Original commit message:
  Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally, ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead we have to canonicalize the types of the all-zeros/ones build_vectors and insert bitcasts, and then pattern match those exact bitcasts. By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
  This removes something like 40,000 bytes from the X86 isel table.
  Differential Revision: https://reviews.llvm.org/D58595
  llvm-svn: 355784
* Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."
  Craig Topper, 2019-03-05 (1 file changed, -1/+1)
  This caused the first matcher in the isel table for many targets to be Opc_Scope instead of Opc_SwitchOpcode, which leads to a significant increase in isel match failures.
  llvm-svn: 355433
* [TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary.
  Craig Topper, 2019-03-01 (1 file changed, -1/+1)
  Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally, ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead we have to canonicalize the types of the all-zeros/ones build_vectors and insert bitcasts, and then pattern match those exact bitcasts.
  By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.
  This removes something like 40,000 bytes from the X86 isel table.
  Differential Revision: https://reviews.llvm.org/D58595
  llvm-svn: 355224
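  For context, the pre-change fragments being replaced were PatLeafs wrapping a build_vector with a C++ predicate, roughly like the sketch below (a simplified reconstruction; the exact definitions lived in TargetSelectionDAG.td and may have differed):

      // Matches any build_vector whose elements are all zero. The C++
      // helper itself looks through bitcasts, but isel cannot exploit
      // that when this is matched as an ordinary pattern leaf.
      def immAllZerosV : PatLeaf<(build_vector), [{
        return ISD::isBuildVectorAllZeros(N);
      }]>;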
* [X86] Add test cases for opportunities to fold a truncate and a masked store into a truncating masked store.
  Craig Topper, 2019-01-24 (1 file changed, -1/+1)
  llvm-svn: 352027
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
  Chandler Carruth, 2019-01-19 (1 file changed, -4/+3)
  We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* [X86] Stop changing f128 fand/for/fxor to v2i64.
  Craig Topper, 2018-10-30 (1 file changed, -2/+19)
  The additional patterns don't cost us much and it seems better than changing element widths.
  llvm-svn: 345564
* [X86] Add some isel patterns for scalar_to_vector/extract_vector_element that use the avx512 extended register classes when they are available.
  Craig Topper, 2018-10-27 (1 file changed, -12/+32)
  llvm-svn: 345448
* [X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead.
  Craig Topper, 2018-08-03 (1 file changed, -117/+0)
  There are a lot of permutations of types here, generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function.
  The test changes are because we have some shuffle patterns that have a bitcast as their root node, but the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change.
  llvm-svn: 338824
* [X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions.
  Craig Topper, 2018-08-03 (1 file changed, -0/+78)
  Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place. To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows us to reuse all the existing patterns for v2i64.
  I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously.
  llvm-svn: 338821
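  A rough sketch of the shape such f128 patterns take (illustrative only; the predicates and instruction names here are assumptions, not the commit's exact lines):

      // Bitwise XOR on f128 reuses the SSE1 packed-single XOR, since
      // the operation is bit-pattern-only and width-agnostic.
      let Predicates = [UseSSE1] in
      def : Pat<(f128 (X86fxor VR128:$src1, VR128:$src2)),
                (XORPSrr VR128:$src1, VR128:$src2)>;

      // The VEX-encoded form is used when AVX is available.
      let Predicates = [HasAVX] in
      def : Pat<(f128 (X86fxor VR128:$src1, VR128:$src2)),
                (VXORPSrr VR128:$src1, VR128:$src2)>;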
* [X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics.
  Craig Topper, 2018-07-16 (1 file changed, -1/+1)
  This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128, and then we fail when something tries to get the register class for f128, which isn't always valid.
  The test changes are because we were previously mixing FR128 and VR128 due to constrainRegClass finding FR128 first, and passes like live range shrinking weren't handling that well.
  llvm-svn: 337147
* [X86] Remove i128 type from FR128 regclass.
  Craig Topper, 2018-07-12 (1 file changed, -2/+0)
  i128 isn't a legal type in our x86 implementation today, so remove this and the few patterns that used it until it becomes necessary.
  llvm-svn: 336889
* [X86] Remove patterns for inserting a load into a zero vector.
  Craig Topper, 2018-07-11 (1 file changed, -88/+45)
  We can instead block the load folding in isProfitableToFold. Then isel will emit a register->register move for the zeroing part and a separate load, and PostProcessISelDAG should be able to remove the register->register move.
  This saves us patterns and fixes the fact that we only had unaligned load patterns. The test changes show places where we should have been using an aligned load.
  llvm-svn: 336828
* [X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artificially increasing pattern complexity to give priority.
  Craig Topper, 2018-07-10 (1 file changed, -1/+0)
  This is a much more direct way to solve the issue than just giving extra priority.
  llvm-svn: 336639
* [X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits.
  Craig Topper, 2018-03-16 (1 file changed, -61/+0)
  We previously avoided inserting these moves during isel in a few cases, implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist, especially with AVX512F without AVX512VL using 512-bit vectors to implement some 128/256-bit operations. Since isel is done bottom-up, we'd have to check the VT, opcode, and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations.
  So instead of doing that, this patch adds a post processing step that detects when the moves are unnecessary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG, so we just need to ensure the input to the move isn't one.
  Differential Revision: https://reviews.llvm.org/D44289
  llvm-svn: 327724
* [X86] Remove duplicate isel pattern. NFC
  Craig Topper, 2018-03-09 (1 file changed, -1/+0)
  llvm-svn: 327104
* [X86] Fix some isel patterns that used aligned vector load instructions with unaligned predicates.
  Craig Topper, 2018-03-08 (1 file changed, -76/+76)
  These patterns weren't checking the alignment of the load, but were using the aligned instructions. This will cause a GP fault if the data isn't aligned. I believe these were introduced in r312450.
  llvm-svn: 326967
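  The root cause, sketched from the shape of the load fragments in X86InstrFragmentsSIMD.td (simplified; details of the real definitions may differ):

      // The aligned fragment's predicate requires 16-byte alignment;
      // the plain 'load' fragment matches any alignment. Pairing an
      // alignment-checking instruction with the unaligned fragment
      // is what allowed the fault.
      def alignedload : PatFrag<(ops node:$ptr), (load node:$ptr), [{
        return cast<LoadSDNode>(N)->getAlignment() >= 16;
      }]>;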
* [X86] Don't use EXTRACT_ELEMENT from v1i1 with i8/i32 result type when we need to guarantee zeroes in the upper bits of the return.
  Craig Topper, 2018-02-28 (1 file changed, -0/+17)
  An extract_element where the result type is larger than the scalar element type is semantically an any_extend from the scalar element type to the result type. If we expect zeroes in the upper bits of the i8/i32, we need to make sure those zeroes are explicit in the DAG.
  For these cases the best way to accomplish this is to use an insert_subvector to pad zeroes to the upper bits of the v1i1 first. We extend to either v16i1 (for i32) or v8i1 (for i8), then bitcast that to a scalar and finish with a zero_extend up to i32 if necessary. We can't extend past v16i1 because that's the largest mask size on KNL. But isel is smart enough to know that a zext of a bitcast from v16i1 to i16 can use a KMOVW instruction. The insert_subvectors will be dropped during isel because we can determine that the producing instruction already zeroed the upper bits of the k-register.
  llvm-svn: 326308
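  As an aside, the bitcast step leans on mask-to-GPR bitconvert patterns of roughly this shape (a simplified sketch; the real patterns live in X86InstrAVX512.td and may differ):

      // Reinterpreting a v16i1 mask as i16 is just a cross-class copy;
      // when the value must leave a k-register, this selects a KMOVW.
      def : Pat<(i16 (bitconvert (v16i1 VK16:$src))),
                (COPY_TO_REGCLASS VK16:$src, GR16)>;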
* [X86] Various vXi1 insertion improvements.
  Craig Topper, 2018-01-23 (1 file changed, -0/+17)
  Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero upper bits before inserting an element into a vXi1 vector. Replace the kshift-based isel pattern with an insert_subvector-based pattern now that the code that caused the pattern has been fixed to emit insert_subvector.
  llvm-svn: 323173
* [X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads.
  Craig Topper, 2018-01-18 (1 file changed, -12/+12)
  Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything, just like we do for loads.
  llvm-svn: 322820
* [X86] Make v2i1 and v4i1 legal types without VLX
  Craig Topper, 2018-01-07 (1 file changed, -13/+12)
  Summary:
  There are a few oddities that occur due to v1i1, v8i1, and v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.
  It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.
  This also fixes an issue where we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out, so new tablegen patterns were added.
  I had to hack the legalizer a bit for widening the operand of a setcc so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1, then extract that and don't sign extend it at all.
  There's definitely room for improvement with some follow up patches.
  Reviewers: RKSimon, zvi, guyblank
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41560
  llvm-svn: 321967
* [X86] Use KMOV instructions to zero upper bits of vectors when possible.
  Craig Topper, 2017-12-09 (1 file changed, -12/+29)
  llvm-svn: 320268
* [X86] Teach lowering to only let through (insert_subvector (vXi1 zeros), subvec, 0) for vector sizes that have native KSHIFT support.
  Craig Topper, 2017-12-08 (1 file changed, -32/+4)
  For narrow sizes we'll widen the zero vector and widen the insert, then do an extract_subvector to get back down to the correct size.
  This allows us to remove some patterns from the isel table that had to COPY_TO_REGCLASS to an oversized register, do the shift, and then COPY_TO_REGCLASS back to the narrow register. Now this is represented explicitly in the DAG.
  This seems to have perturbed the register allocation in one of the tests, but the number of instructions didn't change.
  llvm-svn: 320190
* [X86] Always consider inserting a vXi1 vector into the lsbs of a zero vector to be legal during lowering. Add isel patterns to emit shifts.
  Craig Topper, 2017-12-08 (1 file changed, -1/+95)
  Previously we only allowed these through if the subvector came from a compare or test instruction, which we would again check for during isel. With this change we only check for the compare and test instructions during isel and have fallback patterns that emit the shifts if needed.
  I noticed that in a lot of cases we don't actually see the compare during lowering and instead rely on an odd legalization of concat_vectors with a zero vector as the second argument. This keeps the concat_vectors around long enough for a later DAG combine to expose the compare, then we re-legalize the concat_vectors and catch the compare.
  llvm-svn: 320134
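  A hypothetical sketch of the fallback idea (the register classes, shift amounts, and instruction names here are assumptions, not the commit's actual patterns): inserting a v4i1 value into the low bits of a zeroed v16i1 can be emitted as a left kshift that pushes the payload to the top of the k-register followed by a logical right kshift that pulls it back down, zeroing the upper 12 bits:

      def : Pat<(v16i1 (insert_subvector (v16i1 immAllZerosV),
                                         (v4i1 VK4:$mask), (iPTR 0))),
                (KSHIFTRWri (KSHIFTLWri (COPY_TO_REGCLASS VK4:$mask, VK16),
                                        (i8 12)),
                            (i8 12))>;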
* [X86] If we see an insert of a bitcast into a zero vector, canonicalize it to move the bitcast to the other side of the insert.
  Craig Topper, 2017-10-08 (1 file changed, -1/+2)
  This improves detection of zeroing of upper bits during isel.
  llvm-svn: 315161
* [AVX-512] Replace large number of explicit patterns that check for insert_subvector with zero after masked compares with fewer patterns that use a predicate
  Craig Topper, 2017-09-25 (1 file changed, -0/+81)
  This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer. This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
  This shrinks the isel table from ~590k to ~531k, a roughly 10% reduction in size.
  Differential Revision: https://reviews.llvm.org/D38217
  llvm-svn: 314133
* [X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction has already produced the zeros.
  Craig Topper, 2017-09-15 (1 file changed, -0/+59)
  Currently if we're inserting 0s into the upper elements of a vector register, we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero, we can skip that. This is based on a similar idea to what we do to avoid emitting explicit zero extends for GR32->GR64.
  Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way.
  So for now I'm starting with explicitly allowing only VPMADDWD, because we emit zeros in combineLoopMAddPattern, so that is placing an extra instruction into the reduction loop. I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted.
  Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block, which is something these isel tricks can't do. See PR32544.
  Differential Revision: https://reviews.llvm.org/D37653
  llvm-svn: 313365
* [X86] Move more isel patterns to X86InstrVecCompiler.td. NFC
  Craig Topper, 2017-09-06 (1 file changed, -1/+184)
  This moves more of our subvector insert/extract tricks to X86InstrVecCompiler.td and refactors them into multiclasses.
  llvm-svn: 312661
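  The multiclasses take roughly this shape (a simplified sketch of the file's contents, not verbatim):

      // Extracting the low subvector, or inserting into the low lanes
      // of an undef vector, is just a subregister operation; no
      // instruction is needed.
      multiclass subvector_subreg_lowering<RegisterClass subRC, ValueType subVT,
                                           RegisterClass RC, ValueType VT,
                                           SubRegIndex subIdx> {
        def : Pat<(subVT (extract_subvector (VT RC:$src), (iPTR 0))),
                  (subVT (EXTRACT_SUBREG RC:$src, subIdx))>;
        def : Pat<(VT (insert_subvector undef, subRC:$src, (iPTR 0))),
                  (VT (INSERT_SUBREG (IMPLICIT_DEF), subRC:$src, subIdx))>;
      }

      // For example: a 128-bit subvector of a 256-bit register via sub_xmm.
      defm : subvector_subreg_lowering<VR128, v4f32, VR256, v8f32, sub_xmm>;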
* [X86] Actually add the new file that was supposed to go with r312649.
  Craig Topper, 2017-09-06 (1 file changed, -0/+179)
  llvm-svn: 312650