path: root/llvm/lib/Target/X86/X86InstrAVX512.td
Commit message / Author / Age / Files / Lines
* [X86] Turn FP_ROUND/STRICT_FP_ROUND into X86ISD::VFPROUND/STRICT_VFPROUND ↵Craig Topper2020-01-111-60/+0
| | | | during PreprocessISelDAG to remove some duplicate isel patterns.
* [X86] Add isel patterns for bitcasting between v32i1/v64i1 and float/double.Craig Topper2020-01-081-0/+11
| | | | | | We have to do an intermediate jump to a GPR to make the cast. Fixes PR43750.
* [NFC] Fix trivial typos in commentsJames Henderson2020-01-061-1/+1
| | | | | | | | Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.
* [X86] Reorder X86any* PatFrags to put the strict node first so that chain ↵Craig Topper2020-01-031-2/+2
| | | | | | | | | | | property will be inferred for the instruction by the tablegen backend. Also use X86any_vfpround instead of X86vfpround in some instruction definitions so the strict version can be used to infer the chain property. Without these changes we don't propagate strict FP chain through isel for some instructions.
* add strict float for round operationLiu, Chen32020-01-011-3/+3
| | | | Differential Revision: https://reviews.llvm.org/D72026
* [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without ↵Craig Topper2019-12-261-49/+0
| | | | | | | | | | | | | | | | | | | | | | avx512vl. Similar for vXi64 on avx512dq without avx512vl. Summary: Previously we did this with isel patterns that used garbage in the widened part of the source. But that's not valid for strictfp. So now we custom widen and use zeroes for the widened elements for strictfp. This replaces D71864. Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3 Reviewed By: pengfei Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71879
* add custom operation for strict fpextend/fproundLiu, Chen32019-12-271-4/+4
| | | | Differential Revision: https://reviews.llvm.org/D71892
* [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without ↵Craig Topper2019-12-261-45/+0
| | | | | | | | avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl. Previously we widened these through isel patterns, but that didn't work for STRICT_ nodes. Those need to be padded with zeroes in the upper bits which is harder to do in isel patterns.
* [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, ↵Craig Topper2019-12-261-5/+0
| | | | | | | | | | | | | | | but not AVX512VL. Previously we were widening with isel patterns, but that wasn't exception safe for strict FP. So now we widen to v4i32->v4f64 during type legalization, and then let op legalization further widen to v8i32->v8f64. The vec_int_to_fp.ll changes are caused by us no longer narrowing extracts of strict_uint_to_fp to the v4i32->v2f64 instruction without AVX512VL only to have isel re-widen it. Now we just keep it wide throughout, so we don't have an opportunity to narrow the load.
* [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backendWang, Pengfei2019-12-261-15/+15
| | | | | | | | | | | | Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71871
* [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.Craig Topper2019-12-241-41/+41
| | | | Differential Revision: https://reviews.llvm.org/D71850
* [FPEnv][X86] More strict int <-> FP conversion fixesUlrich Weigand2019-12-231-49/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840
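The STRICT_UINT_TO_FP point above can be illustrated with a small model. This is a minimal sketch, not the code from D71840: it assumes the well-known "halve with a sticky bit, convert once, then double" scheme for u64 -> f64, and the helper names `sint64_to_double` and `uint64_to_double` are hypothetical. The point it shows is that the select happens on the integer input, so only one signed conversion (the step that could raise an FP exception) is ever executed.

```python
def sint64_to_double(v: int) -> float:
    """Hypothetical stand-in for a signed i64 -> f64 conversion."""
    assert -(1 << 63) <= v < (1 << 63)
    return float(v)


def uint64_to_double(x: int) -> float:
    """Convert an unsigned 64-bit value using a single signed conversion."""
    assert 0 <= x < (1 << 64)
    # Values with the top bit set do not fit in a signed i64.
    is_large = x >= (1 << 63)
    # Halve the value, OR-ing the dropped bit back in as a sticky bit so
    # that rounding of the halved value matches rounding of the original.
    halved = (x >> 1) | (x & 1)
    # Select on the integer side, before converting.
    src = halved if is_large else x
    # The only int -> FP conversion in the whole expansion.
    f = sint64_to_double(src)
    # Doubling is exact here, so it introduces no extra rounding.
    return f + f if is_large else f
```

By contrast, the scheme the commit message describes as incorrect would convert both candidate inputs and select between the two floating-point results, so the discarded conversion could still raise an exception.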
* Enable STRICT_FP_TO_SINT/UINT on X86 backendLiu, Chen32019-12-191-8/+8
| | | | | | This patch is mainly for custom lowering the vector operation. Differential Revision: https://reviews.llvm.org/D71592
* [X86] Add strict fma supportWang, Pengfei2019-12-181-6/+6
| | | | | | | | | | | | Summary: Add strict fma support Reviewers: craig.topper, RKSimon, LiuChen3 Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71604
* [FPEnv][X86] Constrained FCmp intrinsics enabling on X86Wang, Pengfei2019-12-111-10/+8
| | | | | | | | | | | | Summary: This is a follow-up to D69281; it enables X86 backend support for the FP comparison. Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70582
* add support for strict operation fpextend/fpround/fsqrt on X86 backendLiu, Chen32019-12-101-30/+30
| | | | Differential Revision: https://reviews.llvm.org/D71184
* Add strict fp support for instructions fadd/fsub/fmul/fdivLiu, Chen32019-12-061-8/+8
| | | | Differential Revision: https://reviews.llvm.org/D68757
* [X86] Model DAZ and FTZWang, Pengfei2019-12-041-19/+41
| | | | | | | | | | | | Summary: This is a follow-up of D70881. It models DAZ and FTZ for related instructions. Reviewers: craig.topper, RKSimon, andrew.w.kaylor Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70938
* [X86] Model MXCSR for all AVX512 instructionsWang, Pengfei2019-12-041-55/+76
| | | | | | | | | | | | Summary: Model MXCSR for all AVX512 instructions Reviewers: craig.topper, RKSimon, andrew.w.kaylor Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3 Tags: #llvm Differential Revision: https://reviews.llvm.org/D70881
* [X86] Add floating point execution domain to ↵Craig Topper2019-11-301-2/+5
| | | | comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions.
* [X86] Add SSEPackedSingle/Double execution domain to COMI/UCOMI SSE/AVX ↵Craig Topper2019-11-271-13/+14
| | | | instructions.
* [X86] Add proper execution domain information to the avx512vnni instructions.Craig Topper2019-11-251-0/+2
|
* [X86][AVX] Add plausible schedule classes to MASKPAIR/VP2INTERSECT/VDPBF16PS ↵Simon Pilgrim2019-11-131-20/+24
| | | | | | | | instructions These are really just placeholders that use approximately the right resources - once we have CPU scheduler models that support these instructions they will need revisiting. In the meantime this means that all instructions have a class of some kind, meaning models can be more easily flagged as complete.
* [X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these ↵Craig Topper2019-10-041-146/+32
| | | | | | | | | | | | | | | based on the immediate in MCInstLower The immediate form of VPCMP can represent these completely. The vpcmpgt/eq are just shorter encodings. This patch removes the isel patterns and just swaps the opcodes and removes the immediate in MCInstLower. This matches where we do some other encoding tricks. Removes over 10K bytes from the isel table. Differential Revision: https://reviews.llvm.org/D68446 llvm-svn: 373766
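The opcode swap described above can be modelled in a few lines. This is a rough sketch rather than the MCInstLower code: `shrink_vpcmp` and the list-of-strings operand representation are hypothetical, and the sketch assumes the usual VPCMP predicate encoding in which immediate 0 means "equal" and immediate 6 means "not less-or-equal", that is, signed greater-than.

```python
# Predicate immediates that have a dedicated, shorter instruction form.
SHORT_FORMS = {0: "VPCMPEQ", 6: "VPCMPGT"}

# Element-size suffixes of the signed integer compares.
ELEMENT_SUFFIXES = ("B", "W", "D", "Q")


def shrink_vpcmp(mnemonic, operands, imm):
    """Return (mnemonic, operands) with the immediate folded away if possible.

    `mnemonic` is something like "VPCMPD"; `operands` lists the non-immediate
    operands. Anything without a short form keeps its immediate operand.
    """
    suffix = mnemonic[len("VPCMP"):]
    short = SHORT_FORMS.get(imm)
    if (not mnemonic.startswith("VPCMP")
            or suffix not in ELEMENT_SUFFIXES
            or short is None):
        return mnemonic, operands + [imm]
    # Swap the opcode and drop the immediate.
    return short + suffix, list(operands)
```

For example, `shrink_vpcmp("VPCMPD", ["k1", "zmm0", "zmm1"], 6)` yields the `VPCMPGTD` form with no immediate, while an immediate of 1 (less-than) leaves the instruction unchanged.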
* [X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a ↵Craig Topper2019-10-031-103/+2
| | | | | | | | | | | | | | | | vbroadcast_load if the scalar size is the same. This improves broadcast load folding of i64 elements on 32-bit targets where i64 isn't legal. Previously we had to represent these as vXf64 vbroadcast_loads and a bitcast to vXi64. But we didn't have any isel patterns looking for that. This also allows us to remove or simplify some isel patterns that were looking for bitcasted vbroadcast_loads. llvm-svn: 373566
* [X86] Add broadcast load folding patterns to NoVLX ↵Craig Topper2019-10-031-7/+31
| | | | | | | | VPMULLQ/VPMAXSQ/VPMAXUQ/VPMINSQ/VPMINUQ patterns. More fixes for PR36191. llvm-svn: 373560
* [X86] Remove a couple redundant isel patterns that look to have been ↵Craig Topper2019-10-031-17/+0
| | | | | | copy/pasted from right above them. NFC llvm-svn: 373559
* [X86] Add broadcast load folding patterns to the NoVLX compare patterns.Craig Topper2019-10-021-16/+138
| | | | | | | | | These patterns use zmm registers for 128/256-bit compares when the VLX instructions aren't available. Previously we only supported registers, but as PR36191 notes we can fold broadcast loads, but not regular loads. llvm-svn: 373423
* [X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load ↵Craig Topper2019-10-011-145/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | broadcasted to a vector. Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work. This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload. There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68198 llvm-svn: 373349
* [X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to ↵Craig Topper2019-10-011-16/+25
| | | | | | | | | | the non-commutable VEX instruction. Use EVEX2VEX override to fix the scalar instructions. Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD were mapped to the same VEX instruction. But we should keep the commutableness when changing the opcode. llvm-svn: 373303
* [X86] Remove some redundant isel patterns. NFCICraig Topper2019-09-301-78/+0
| | | | | | | These are all also implemented in avx512_logical_lowering_types with support for masking. llvm-svn: 373181
* [X86] Enable isel to fold broadcast loads that have been bitcasted from FP ↵Craig Topper2019-09-291-0/+96
| | | | | | into a vpternlog. llvm-svn: 373157
* [X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cppCraig Topper2019-09-291-43/+107
| | | | | | | | | | | | This allows us to reduce the use count on the condition node before the match. This enables load folding for that operand without relying on the peephole pass. This will be improved on for broadcast load folding in a subsequent commit. This still requires a bunch of isel patterns for vXi16/vXi8 types though. llvm-svn: 373156
* [X86] Match (or (and A, B), (andn (A, C))) to VPTERNLOG with AVX512.Craig Topper2019-09-291-0/+43
| | | | | | This uses a similar isel pattern as we used for vpcmov with XOP. llvm-svn: 373154
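The bitselect-to-VPTERNLOG mapping above boils down to choosing an 8-bit truth table. The following is a minimal sketch with hypothetical helper names (`ternlog_imm`, `vpternlog`), assuming the documented VPTERNLOG semantics in which each result bit is looked up in the immediate at index `(a << 2) | (b << 1) | c`. Evaluating the expression on the canonical masks 0xF0, 0xCC and 0xAA yields the immediate directly.

```python
# Canonical truth-table columns for the three VPTERNLOG inputs.
A, B, C = 0xF0, 0xCC, 0xAA


def ternlog_imm(fn):
    """Derive the 8-bit immediate for a three-input bitwise function."""
    return fn(A, B, C) & 0xFF


def vpternlog(a, b, c, imm, width=32):
    """Bit-by-bit emulation of VPTERNLOG on `width`-bit integers."""
    out = 0
    for i in range(width):
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        out |= ((imm >> idx) & 1) << i
    return out


# (or (and A, B), (andn A, C)): take B where A is set, otherwise C.
BITSELECT_IMM = ternlog_imm(lambda a, b, c: (a & b) | (~a & c))
```

Under these assumptions `BITSELECT_IMM` evaluates to 0xCA, and `vpternlog(a, b, c, BITSELECT_IMM)` matches `(a & b) | (~a & c)` for any inputs within the chosen width.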
* [X86] Remove CodeGenOnly instructions added in r373021, but keep the isel ↵Craig Topper2019-09-261-16/+10
| | | | | | patterns and add COPY_TO_REGCLASS to them. llvm-svn: 373031
* [X86] Add CodeGenOnly instructions for (f32 (X86selects $mask, (loadf32 ↵Craig Topper2019-09-261-1/+23
| | | | | | | | | | | | addr), fp32imm0) to use masked MOVSS from memory. Similar for f64 and having a non-zero passthru value. We were previously not trying to fold the load at all. Using a CodeGenOnly instruction allows us to use FR32X/FR64X as the register class to avoid a bunch of COPY_TO_REGCLASS. llvm-svn: 373021
* [X86] Mark the EVEX encoded PSADBW instructions as commutable to enable load ↵Craig Topper2019-09-261-0/+1
| | | | | | | | folding of the other operand. The SSE and VEX versions are already correct. llvm-svn: 372941
* [X86] Fix some VCVTPS2PH isel patterns where 'i32' was used instead of 'timm'Craig Topper2019-09-221-8/+8
| | | | | | | This seems to have completely omitted any check for the opcode of the operand in the isel table. llvm-svn: 372526
* [X86][TableGen] Allow timm to appear in output patterns. Use it to remove ↵Craig Topper2019-09-221-56/+56
| | | | | | | | | | | | | | ConvertToTarget opcodes from the X86 isel table. We're now using a lot more TargetConstant nodes in SelectionDAG. But we were still telling isel to convert some of them to TargetConstants even though they already are. This is because isel emits a conversion anytime the output pattern has an 'imm'. I guess for patterns in instructions we take the 'timm' from the 'set' pattern, but for Pat patterns with explicit output we previously had to say 'imm' since 'timm' wasn't allowed in outputs. llvm-svn: 372525
* [X86] Update commutable EVEX vcmp patterns to use timm instead of imm.Craig Topper2019-09-221-6/+6
| | | | | | | We need to match TargetConstant, not Constant. This was broken in r372338, but we lacked test coverage. llvm-svn: 372523
* [X86] Use sse_load_f32/f64 and timm in patterns for memory form of ↵Craig Topper2019-09-211-4/+3
| | | | | | | | | | | | vgetmantss/sd. Previously we only matched scalar_to_vector and scalar load, but we should be able to narrow a vector load or match vzload. Also need to match TargetConstant instead of Constant. The register patterns were previously updated, but not the memory patterns. llvm-svn: 372458
* Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Matt Arsenault2019-09-191-112/+112
| | | | | | | | | This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338
* Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"Hans Wennborg2019-09-191-112/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC*. > > Since intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them.
> > Most of the work here is to clean up target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, similar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_* instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372314
* GlobalISel: Don't materialize immarg arguments to intrinsicsMatt Arsenault2019-09-191-112/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encode them directly as an imm argument to G_INTRINSIC*. Since intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to clean up target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, similar to MachineBasicBlock operands.
This should also enable handling of patterns for some G_* instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to use "imm" in an output with a "timm" pattern source. llvm-svn: 372285
* [X86] Allow masked VBROADCAST instructions to be turned into BLENDM with a ↵Craig Topper2019-09-171-47/+104
| | | | | | | | | | broadcast load to avoid a copy. The BLENDM instructions allow 2 sources and an independent destination while masked VBROADCAST has the destination tied to the source. llvm-svn: 372068
* [X86] Enable commuting of EVEX VCMP for all immediate values during isel.Craig Topper2019-09-171-12/+17
| | | | llvm-svn: 372065
* DAG/GlobalISel: Correct type profile of bitcount opsMatt Arsenault2019-09-131-6/+6
| | | | | | | | The result integer does not need to be the same width as the input. AMDGPU, NVPTX, and Hexagon all have patterns working around the types matching. GlobalISel defines these as being different type indexes. llvm-svn: 371797
* Rename nonvolatile_load/store to simple_load/store [NFC]Philip Reames2019-09-121-3/+3
| | | | | | Implement the TODO from D66318. llvm-svn: 371789
* [X86] Use xorps to create fp128 +0.0 constants.Craig Topper2019-09-091-1/+3
| | | | | | This matches what we do for f32/f64. gcc also does this for fp128. llvm-svn: 371357
* [X86] Remove call to getZeroVector from materializeVectorConstant. Add isel ↵Craig Topper2019-09-081-0/+9
| | | | | | | | | | | | | | patterns for zero vectors with all types. The change to avx512-vec-cmp.ll is a regression, but should be easy to fix. It occurs because the getZeroVector call was canonicalizing both sides to the same node, then SimplifySelect was able to simplify it. But since we only called getZeroVector on some VTs this isn't a robust way to combine this. The change to vector-shuffle-combining-ssse3.ll is more instructions, but removes a constant pool load so it's unclear if it's a regression or not. llvm-svn: 371350