path: llvm/lib/Target/X86
...
* [X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32 (Craig Topper, 2019-10-01; 1 file, +34/-0)
  The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs two registers. If we can shrink the index to i32 we can avoid the split.

  It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use a v2i32 index stored in a v4i32 register with a v2i64 data size.

  I've limited this to run before type legalization to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts, to be sure the truncate will constant fold.

  Differential Revision: https://reviews.llvm.org/D68247
  llvm-svn: 373408
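  A minimal sketch of the shape of such a combine, with assumed names (Index, DAG, DL) rather than the committed code:

      // If every index element has more than 32 sign bits, truncating
      // to vXi32 is lossless and will constant fold on a build_vector.
      if (Index.getOpcode() == ISD::BUILD_VECTOR &&
          DAG.ComputeNumSignBits(Index) > 32) {
        EVT IndexVT = Index.getValueType();
        EVT NewVT = EVT::getVectorVT(*DAG.getContext(), MVT::i32,
                                     IndexVT.getVectorNumElements());
        Index = DAG.getNode(ISD::TRUNCATE, DL, NewVT, Index);
      }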
* Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ↵Craig Topper2019-10-011-79/+1
| | | | | | | | | | | | | | | | ops." This seems to be causing some performance regresions that I'm trying to investigate. One thing that stands out is that this transform can increase the live range of the operands of the earlier logic op. This can be bad for register allocation. If there are two logic op inputs we should really combine the one that is closest, but SelectionDAG doesn't have a good way to do that. Maybe we need to do this as a basic block transform in Machine IR. llvm-svn: 373401
* [X86] convertToThreeAddress: make sure the second operand of SUB32ri is really an immediate before calling getImm() (Craig Topper, 2019-10-01; 1 file, +4/-0)
  It might be a symbol instead, and we can't fold those since we can't negate them. Similarly for the other SUB-with-immediate opcodes. Fixes PR43529.

  llvm-svn: 373397
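  Conceptually, the fix is a guard of roughly this shape (a sketch with assumed names, not the exact upstream diff):

      const MachineOperand &Src2 = MI.getOperand(2);
      if (!Src2.isImm())
        return nullptr;             // symbolic operand: cannot negate it
      int64_t Imm = -Src2.getImm(); // safe to negate a real immediate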
* [X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector (Craig Topper, 2019-10-01; 6 files, +335/-182)
  Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work.

  This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads, which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload.

  There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch.

  Reviewers: RKSimon, spatel
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D68198
  llvm-svn: 373349
* [X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction; use an EVEX2VEX override to fix the scalar instructions (Craig Topper, 2019-10-01; 1 file, +25/-16)
  Previously the match was ambiguous, and VMAXPS/PD and VMAXCPS/PD were mapped to the same VEX instruction. But we should preserve commutability when changing the opcode.

  llvm-svn: 373303
* TLI: Remove DAG argument from getRegisterByName (Matt Arsenault, 2019-10-01; 2 files, +6/-8)
  Replace it with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle to using this in GlobalISel. The other is the more tolerable EVT argument.

  The X86 use of the function seems questionable to me. It checks hasFP before frame lowering.

  llvm-svn: 373292
* [X86] Mask off upper bits of splat element in LowerBUILD_VECTORvXi1 when forming a SELECT (Craig Topper, 2019-09-30; 1 file, +12/-2)
  The i1 scalar would have been type legalized to i8, but that doesn't guarantee anything about the upper bits. If we're going to use it as a condition we need to make sure the upper bits are 0. I've special-cased ISD::SETCC conditions since those should guarantee zero upper bits. We could go further and use computeKnownBits, but we have no tests that would need that. Fixes PR43507.

  llvm-svn: 373246
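  The masking amounts to a single AND against bit 0; a sketch with assumed names (Cond, DAG, dl):

      // Only bit 0 of the promoted i8 carries the i1 value; clear the
      // undefined upper bits before using it as a select condition.
      Cond = DAG.getNode(ISD::AND, dl, MVT::i8, Cond,
                         DAG.getConstant(1, dl, MVT::i8));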
* [X86] Address post-commit review of code I accidentally committed in r373136 (Craig Topper, 2019-09-30; 1 file, +6/-3)
  See https://reviews.llvm.org/D68167

  llvm-svn: 373245
* [NewPM] Port MachineModuleInfo to the new pass manager (Yuanfang Chen, 2019-09-30; 2 files, +4/-4)
  Existing clients are converted to use MachineModuleInfoWrapperPass. The new interface is for defining a new pass manager API in CodeGen.

  Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm
  Reviewed By: arsenm, fedor.sergeev
  Differential Revision: https://reviews.llvm.org/D64183
  llvm-svn: 373240
* [X86] Add ANY_EXTEND to switch in ReplaceNodeResults, but just fall back to default handling (Craig Topper, 2019-09-30; 1 file, +6/-0)
  ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends from v8i8. But the type legalization infrastructure will call ReplaceNodeResults for v8i8 results. We should just defer to the default handling instead of asserting in the default case of the switch. Fixes PR43509.

  llvm-svn: 373234
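  The new switch arm is essentially a no-op that lets legalization take its usual path; a sketch, not the verbatim diff:

      case ISD::ANY_EXTEND:
        // v8i8 results reach here during type legalization; returning
        // without replacement results defers to the default expansion
        // instead of hitting the unreachable default case.
        return;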
* [X86] Remove some redundant isel patterns. NFCI (Craig Topper, 2019-09-30; 1 file, +0/-78)
  These are all also implemented in avx512_logical_lowering_types, with support for masking.

  llvm-svn: 373181
* [X86] Split v16i32/v8i64 bitreverse on avx512f targets without avx512bw to enable the use of vpshufb on the 256-bit halves (Craig Topper, 2019-09-30; 1 file, +12/-1)
  llvm-svn: 373177
* [X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r373174 (Fangrui Song, 2019-09-30; 1 file, +1/-0)
  llvm-svn: 373175
* [X86] Remove the -x86-experimental-vector-widening-legalization command line flag (Craig Topper, 2019-09-29; 2 files, +145/-1257)
  This was added back to allow some performance regressions to be investigated. The main perf issue was fixed shortly after adding this back, and no other major issues have been reported. So I think it's safe to remove this again.

  llvm-svn: 373174
* [X86] Add custom isel logic to match VPTERNLOG from 2 logic ops (Craig Topper, 2019-09-29; 1 file, +79/-1)
  There's room for improvement here, but this is a decent starting point. There are a few minor regressions in the vector-rotate tests, where we are now forming a vpternlog from an AND before we get a chance to form it for a bitselect that we were matching previously. This results in an AND and an ANDN feeding the vpternlog, where previously we just had an AND after the vpternlog. I think we can probably DAG combine the AND with the bitselect to get back to similar codegen.

  llvm-svn: 373172
* [X86] Enable isel to fold broadcast loads that have been bitcasted from FP into a vpternlog (Craig Topper, 2019-09-29; 1 file, +96/-0)
  llvm-svn: 373157
* [X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cpp (Craig Topper, 2019-09-29; 2 files, +160/-43)
  This allows us to reduce the use count on the condition node before the match, which enables load folding for that operand without relying on the peephole pass. This will be improved on for broadcast load folding in a subsequent commit. It still requires a bunch of isel patterns for vXi16/vXi8 types, though.

  llvm-svn: 373156
* [X86] Enable canonicalizeBitSelect for AVX512 since we can use VPTERNLOG now (Craig Topper, 2019-09-29; 1 file, +7/-5)
  llvm-svn: 373155
* [X86] Match (or (and A, B), (andn A, C)) to VPTERNLOG with AVX512 (Craig Topper, 2019-09-29; 1 file, +43/-0)
  This uses a similar isel pattern as we used for vpcmov with XOP.

  llvm-svn: 373154
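  As background (not taken from the commit), the VPTERNLOG immediate for this bitselect can be derived by evaluating the expression on the standard truth-table constants; a self-contained illustration:

      #include <cstdint>
      #include <cstdio>

      int main() {
        // Each bit position of A/B/C enumerates one (a,b,c) input row.
        uint8_t A = 0xF0, B = 0xCC, C = 0xAA;
        // The matched pattern: (A & B) | (~A & C).
        uint8_t imm = (A & B) | (~A & C);
        printf("vpternlog imm = 0x%02X\n", imm); // prints 0xCA
        return 0;
      }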
* [X86] Add broadcast load unfolding support for VPTESTMD/Q and VPTESTNMD/Q (Craig Topper, 2019-09-28; 1 file, +12/-0)
  llvm-svn: 373138
* [X86] Stop using UpdateNodeOperands in combineGatherScatter; create new nodes like most other DAG combines (Craig Topper, 2019-09-28; 1 file, +58/-35)
  Creating new nodes is what we usually do. Having to explicitly check that we don't update to an existing node, and having to manually manage the worklist, is unusual.

  We can probably add a helper function to reduce the duplication of having to check whether we should create a gather or a scatter, but I wanted to just get the simple thing done.

  llvm-svn: 373137
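  The new-node style looks roughly like this (a sketch assuming the operand order and helper signature of this era, with made-up variable names):

      // Build a fresh gather from the updated operands instead of
      // mutating the existing node with UpdateNodeOperands.
      SDValue Ops[] = { Chain, PassThru, Mask, Base, Index, Scale };
      return DAG.getMaskedGather(Gather->getVTList(), MemVT, DL, Ops,
                                 Gather->getMemOperand());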
* [X86] Split combineGatherScatter into a version for generic ISD nodes and another version for X86-specific nodes (Craig Topper, 2019-09-28; 1 file, +39/-5)
  The majority of the code doesn't run on the X86 nodes today since it's gated by isBeforeLegalizeOps, and we don't form X86 nodes until after that, except for a couple of special cases in type legalization. But I think we would probably break those if some of the transforms fired on them.

  I want to remove the hardcoded operand numbers and the unusual use of UpdateNodeOperands. Being able to know which ISD opcodes are present should help with that.

  llvm-svn: 373136
* [X86] Call SimplifyDemandedBits in combineGatherScatter any time the mask element is wider than i1, not just when AVX512 is disabled (Craig Topper, 2019-09-27; 1 file, +3/-3)
  The AVX2 intrinsics can still be used when AVX512 is enabled, and those go through this path, so we should simplify them.

  llvm-svn: 373108
* [Alignment][NFC] Remove unneeded llvm:: scoping on Align types (Guillaume Chatelet, 2019-09-27; 5 files, +9/-9)
  llvm-svn: 373081
* [X86] Remove CodeGenOnly instructions added in r373021, but keep the isel patterns and add COPY_TO_REGCLASS to them (Craig Topper, 2019-09-26; 1 file, +10/-16)
  llvm-svn: 373031
* [X86] Remove unused arguments from a tablegen multiclass. NFC (Craig Topper, 2019-09-26; 1 file, +13/-13)
  llvm-svn: 373026
* [X86] Add VMOVSSZrrk/VMOVSDZrrk/VMOVSSZrrkz/VMOVSDZrrkz to getUndefRegClearance (Craig Topper, 2019-09-26; 1 file, +15/-6)
  We have isel patterns that can put an IMPLICIT_DEF on one of the sources for these instructions, so we should make sure we break any dependencies there. This should be done by just using one of the other sources.

  llvm-svn: 373025
* [X86] Add CodeGenOnly instructions for (f32 (X86selects $mask, (loadf32 addr), fp32imm0)) to use masked MOVSS from memory; similar for f64 and for a non-zero passthru value (Craig Topper, 2019-09-26; 1 file, +23/-1)
  We were previously not trying to fold the load at all. Using a CodeGenOnly instruction allows us to use FR32X/FR64X as the register class to avoid a bunch of COPY_TO_REGCLASS.

  llvm-svn: 373021
* [CostModel][X86] Fix SLM <2 x i64> icmp costs (Simon Pilgrim, 2019-09-26; 1 file, +9/-0)
  SLM is 2x slower for <2 x i64> comparison ops than for other vector types; we should account for this like we do for the SLM <2 x i64> add/sub/mul costs. This should remove some of the SLM codegen diffs in D43582.

  llvm-svn: 372954
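  Such fixes typically land as SLM-specific cost-table entries; a sketch where the cost value is illustrative, not the committed number:

      // Hypothetical SLM override: <2 x i64> compares are much more
      // expensive than the generic SSE2 estimate.
      static const CostTblEntry SLMCostTbl[] = {
        { ISD::SETCC, MVT::v2i64, 8 },
      };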
* [X86] Remove isCodeGenOnly from (V)ROUND.*_Int and put it on the non-_Int form instead (Craig Topper, 2019-09-26; 1 file, +6/-6)
  This matches what's done for VRNDSCALE and most other instructions. This mainly determines which instruction will be preferred by the disassembler and assembly parser; the printing and encoding information is the same.

  We prefer the _Int form since it uses the VR128 class due to the intrinsic interface. For some EVEX features like embedded rounding, we only select from intrinsics today, so there is only a VR128 version. So making the VR128 version the preferred one is overall consistent.

  llvm-svn: 372947
* [X86] Mark the EVEX encoded PSADBW instructions as commutable to enable load folding of the other operand (Craig Topper, 2019-09-26; 1 file, +1/-0)
  The SSE and VEX versions are already correct.

  llvm-svn: 372941
* [X86] Use VR512_0_15RegClass instead of VR512RegClass in X86VZeroUpper (Craig Topper, 2019-09-25; 1 file, +2/-4)
  This pass is only concerned with ZMM0-15 and YMM0-15. For YMM we use VR256, which only contains YMM0-15, but for ZMM we were using VR512, which contains ZMM0-31. Using VR512_0_15 is more correct.

  Given that the ABI and register allocator will use registers in order, it's unlikely that a register from 16-31 would be used without also using 0-15, so this probably doesn't matter functionally.

  llvm-svn: 372933
* [TargetInstrInfo] Let findCommutedOpIndices take const MachineInstr& (Simon Pilgrim, 2019-09-25; 2 files, +3/-2)
  Neither the base implementation of findCommutedOpIndices nor any in-tree target modifies the instruction passed in, and there is no reason why they would in the future.

  Committed on behalf of @hvdijk (Harald van Dijk)

  Differential Revision: https://reviews.llvm.org/D66138
  llvm-svn: 372882
* [X86] Add MMX MOVD/MOVQ stores to folding tables to support stack folding (Simon Pilgrim, 2019-09-24; 1 file, +2/-0)
  llvm-svn: 372770
* Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863) (Ilya Biryukov, 2019-09-24; 2 files, +17/-129)
  Reason: this caused severe compile-time regressions in JAX. See the email thread of the original revision on llvm-commits for details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html

  llvm-svn: 372756
* MCRegisterInfo: Merge getLLVMRegNum and getLLVMRegNumFromEH (Pavel Labath, 2019-09-24; 1 file, +2/-2)
  Summary: The functions differed in two ways:
  - getLLVMRegNum could return both "eh" and "other" dwarf register numbers, while getLLVMRegNumFromEH only returned the "eh" number.
  - getLLVMRegNum asserted if the register was not found, while the second function returned -1.

  The second distinction was pretty important, but it was very hard to infer from the function names. Additionally, for the use case of dumping dwarf expressions, we needed a function which can work with both kinds of numbers but does not assert.

  This patch solves both of these issues by merging the two functions into one, returning an Optional<unsigned> value. While the same thing could be achieved by adding an "IsEH" argument to the (renamed) getLLVMRegNumFromEH function, it seemed better to avoid the confusion of two functions and put the choice of asserting into the hands of the caller: if he checks the Optional value, he can safely process "untrusted" input, and if he blindly dereferences the Optional, he gets the assertion.

  I've updated all call sites to the new API, choosing between the two options according to the function they were calling originally, except that I've updated the usage in DWARFExpression.cpp to use the "safe" method instead, and added a test case which would have previously triggered an assertion failure when processing (incorrect?) dwarf expressions.

  Reviewers: dsanders, arsenm, JDevlieghere
  Subscribers: wdng, aprantl, javed.absar, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67154
  llvm-svn: 372710
* [X86] Use TargetConstant for condition code on X86ISD::SETCC/CMOV/BRCOND nodes (Craig Topper, 2019-09-23; 4 files, +136/-141)
  This removes the need for ConvertToTarget opcodes in the isel table. It's also consistent with the recent changes to use TargetConstant for intrinsic nodes that always take immediates.

  Differential Revision: https://reviews.llvm.org/D67902
  llvm-svn: 372645
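  In lowering code the change boils down to building the condition code with getTargetConstant; a sketch with assumed names (X86CC, EFLAGS):

      // A TargetConstant is already in the form isel expects, so no
      // ConvertToTarget step is needed in the matcher table.
      SDValue CC = DAG.getTargetConstant(X86CC, dl, MVT::i8);
      return DAG.getNode(X86ISD::SETCC, dl, MVT::i8, CC, EFLAGS);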
* [x86] fix assert with horizontal math + broadcast of vector (PR43402) (Sanjay Patel, 2019-09-23; 2 files, +6/-5)
  https://bugs.llvm.org/show_bug.cgi?id=43402

  llvm-svn: 372606
* [X86] Canonicalize all-zeroes vector to RHS in X86DAGToDAGISel::tryVPTESTM (Craig Topper, 2019-09-23; 1 file, +9/-3)
  llvm-svn: 372544
* [X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop (Craig Topper, 2019-09-23; 2 files, +2/-9)
  The attached test case would previously infinite-loop after r365711. I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc to match VPTEST in 32-bit mode in a follow-up commit.

  llvm-svn: 372543
* Prefer AVX512 memcpy when applicable (David Zarzycki, 2019-09-23; 1 file, +5/-0)
  When AVX512 is available and the preferred vector width is 512 bits or more, we should prefer AVX512 for memcpy().

  https://bugs.llvm.org/show_bug.cgi?id=43240
  https://reviews.llvm.org/D67874
  llvm-svn: 372540
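  This kind of preference is expressed when picking the memop type; a simplified sketch with assumed names (the real code also checks size, alignment, and other subtarget features):

      if (Subtarget.hasAVX512() && PreferredVectorWidth >= 512)
        return MVT::v64i8; // let memcpy lowering use 512-bit moves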
* [X86] Convert Constant arguments to MMX shift-by-i32 intrinsics to TargetConstant during lowering (Craig Topper, 2019-09-23; 2 files, +8/-5)
  This allows us to use timm in the isel table, which is more consistent with other intrinsics that take an immediate now.

  We can't declare the intrinsic as taking an ImmArg because we need to match non-constants to the shift-by-MMX-register instruction, which we do by mutating the intrinsic id during lowering.

  llvm-svn: 372537
* [X86] Remove stale FIXME (Craig Topper, 2019-09-23; 1 file, +0/-1)
  This goes back to when MMX was migrated to intrinsic-only. The hack referenced here has been gone for quite a while.

  llvm-svn: 372536
* [X86][SelectionDAGBuilder] Move the hack for handling MMX shift-by-i32 intrinsics into the X86 backend (Craig Topper, 2019-09-23; 1 file, +52/-0)
  These intrinsics should be shift-by-immediate, but gcc allows any i32 scalar and clang needs to match that. So we try to detect the non-constant case and move the data from an integer register to an MMX register.

  Previously this was done by creating a v2i32 build_vector and bitcast in SelectionDAGBuilder. This had to be done early since v2i32 isn't a legal type. The bitcast+build_vector would be DAG combined to X86ISD::MMX_MOVW2D, which isel will turn into a GPR->MMX MOVD.

  This commit just moves the whole thing to lowering and emits the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The test changes just seem to be due to nodes being linearized in a different order.

  llvm-svn: 372535
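  The emitted node amounts to a single line in lowering; a sketch with assumed names (Amt is the non-constant i32 shift amount):

      // Move the GPR value straight into an MMX register, avoiding the
      // illegal v2i32 build_vector+bitcast intermediate.
      SDValue ShAmt = DAG.getNode(X86ISD::MMX_MOVW2D, DL, MVT::x86mmx, Amt);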
* [X86] Require the last argument to the LWPINS/LWPVAL builtins to be an ICE; add ImmArg to the llvm intrinsics and update the isel patterns to use timm instead of imm (Craig Topper, 2019-09-22; 1 file, +4/-4)
  llvm-svn: 372534
* [X86] X86DAGToDAGISel::matchBEXTRFromAndImm(): if we can't use BEXTR, falling back to BZHI is profitable (PR43381) (Roman Lebedev, 2019-09-22; 1 file, +48/-12)
  Summary: PR43381 notes that while we are good at matching (X >> C1) & C2 as BEXTR/BEXTRI, we only do that if we either have BEXTRI (TBM), or if BEXTR is marked as being fast (-mattr=+fast-bextr). In all other cases we don't match. But that is mainly only true for AMD CPUs. However, for all the CPUs for which we have sched models, BZHI is always fast (or the sched models are all bad). So if we decide that it's unprofitable to emit BEXTR/BEXTRI, we should consider falling back to BZHI if it is available, and follow up with the shift.

  While it's really tempting to do something because it's cool, it is wise to first think about whether it actually makes sense to do. We shouldn't just use BZHI because we can, but only if it is beneficial. In particular, it isn't really worth it if the input is a register, the mask is small, or we can fold a load. But it is worth it if the mask does not fit into 32 bits. (Careful: I don't know much about Intel CPUs, so my choice of -mcpu may be bad here.)

  Thus we manage to fold a load: https://godbolt.org/z/Er0OQz
  Or if we'd end up using BZHI anyway because the mask is large: https://godbolt.org/z/dBJ_5h
  But this isn't actually profitable in the general case, e.g. here we'd increase the micro-op count (register renaming is free; mca does not model that, it seems): https://godbolt.org/z/k6wFoz
  Likewise, not worth it if we just get load folding: https://godbolt.org/z/1M1deG

  https://bugs.llvm.org/show_bug.cgi?id=43381

  Reviewers: RKSimon, craig.topper, davezarzycki, spatel
  Reviewed By: craig.topper, davezarzycki
  Subscribers: andreadb, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67875
  llvm-svn: 372532
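  The pattern in question, as plain C++ (a self-contained illustration, not backend code): extracting len bits of x starting at bit c1. BEXTR does this in one instruction; the BZHI fallback is the shift followed by zeroing the bits above len:

      #include <cstdint>

      uint64_t extract_bits(uint64_t x, unsigned c1, unsigned len) {
        // shr then bzhi on BMI2 targets; a single bextr where profitable.
        uint64_t mask = (len < 64) ? ((1ULL << len) - 1) : ~0ULL;
        return (x >> c1) & mask;
      }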
* [X86] Fix some VCVTPS2PH isel patterns where 'i32' was used instead of 'timm' (Craig Topper, 2019-09-22; 2 files, +14/-14)
  This seems to have completely omitted any check for the opcode of the operand in the isel table.

  llvm-svn: 372526
* [X86][TableGen] Allow timm to appear in output patterns; use it to remove ConvertToTarget opcodes from the X86 isel table (Craig Topper, 2019-09-22; 3 files, +119/-119)
  We're now using a lot more TargetConstant nodes in SelectionDAG, but we were still telling isel to convert some of them to TargetConstants even though they already are. This is because isel emits a conversion any time the output pattern has an 'imm'. I guess for patterns in instructions we take the 'timm' from the 'set' pattern, but for Pat patterns with explicit output we previously had to say 'imm' since 'timm' wasn't allowed in outputs.

  llvm-svn: 372525
* [X86] Update commutable EVEX vcmp patterns to use timm instead of imm (Craig Topper, 2019-09-22; 1 file, +6/-6)
  We need to match TargetConstant, not Constant. This was broken in r372338, but we lacked test coverage.

  llvm-svn: 372523
* [Cost][X86] Add more missing vector truncation costs (Simon Pilgrim, 2019-09-22; 1 file, +6/-0)
  The AVX512 cases still need some work to correctly recognise the PMOV truncation cases.

  llvm-svn: 372514