path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
...
* [X86] Call SimplifyDemandedVectorElts on KSHIFTL/KSHIFTR nodes during DAG combine. (Craig Topper, 2019-09-17; 1 file, -0/+16)
  llvm-svn: 372154
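  A rough illustration of the pattern this follows (the helper is the real TargetLowering API, but the surrounding combine is a sketch, not the commit's code):

    // Sketch: let demanded-elements analysis prune lanes feeding a KSHIFT.
    static SDValue combineKSHIFT(SDNode *N, SelectionDAG &DAG,
                                 TargetLowering::DAGCombinerInfo &DCI) {
      EVT VT = N->getValueType(0);
      const TargetLowering &TLI = DAG.getTargetLoweringInfo();
      // All result elements are demanded; known zero/undef input lanes can
      // still be simplified away by the generic helper.
      APInt DemandedElts = APInt::getAllOnesValue(VT.getVectorNumElements());
      if (TLI.SimplifyDemandedVectorElts(SDValue(N, 0), DemandedElts, DCI))
        return SDValue(N, 0);
      return SDValue();
    }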
* [X86] Simplify some code in LowerBUILD_VECTORvXi1. NFCI (Craig Topper, 2019-09-17; 1 file, -15/+8)
  The case where Immediate is 0 and HasConstElts is true should never happen, since that would mean the constant elements were all zero; we check for an all-zeros build vector earlier. So just use HasConstElts and blindly take Immediate without checking whether it's 0. Move the code that bitcasts and extracts the immediate into the HasConstElts case, since the other path just creates an undef with the right type and needs no casting.
  llvm-svn: 372153
* [X86] Use APInt::getLowBitsSet helper. NFCI. (Simon Pilgrim, 2019-09-17; 1 file, -1/+2)
  Also avoids a static analyzer warning about out-of-range shifts.
  llvm-svn: 372103
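  For context, an illustrative snippet (not the patch itself) of why the helper is preferred over a manual shift:

    #include "llvm/ADT/APInt.h"
    using namespace llvm;

    APInt lowMask(unsigned BitWidth, unsigned N) {
      // uint64_t Mask = (1ULL << N) - 1;  // UB when N == 64, wrong for N > 64
      return APInt::getLowBitsSet(BitWidth, N); // well-defined for N <= BitWidth
    }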
* [SVE][MVT] Fixed-length vector MVT ranges (Graham Hunter, 2019-09-17; 1 file, -4/+4)
  * Reordered MVT simple types to group scalable vector types together.
  * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types.
  * Stopped backends which don't support scalable vector types from iterating over scalable types.
  Reviewers: sdesmalen, greened
  Reviewed By: greened
  Differential Revision: https://reviews.llvm.org/D66339
  llvm-svn: 372099
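  Usage in a backend constructor would look roughly like this (a sketch; the range-helper name is assumed from the summary above and D66339):

    // Sketch: iterate only the fixed-length integer vector MVTs, so a
    // backend without scalable-vector support never sees scalable types.
    for (MVT VT : MVT::integer_fixedlen_vector_valuetypes())
      setOperationAction(ISD::ADD, VT, Legal);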
* [X86] Split oversized vXi1 vector arguments and return values into scalars on avx512 targets. (Craig Topper, 2019-09-17; 1 file, -0/+30)
  Previously we tried to split them into narrower v64i1 or v16i1 pieces that each got promoted to vXi8 and then passed in a zmm or xmm register. But this crashes when you need to pass more pieces than there are registers reserved for argument passing. The scalarizing done here generates much longer and slower code, but is consistent with the behavior of avx2 and earlier targets for these types.
  Fixes PR43323.
  llvm-svn: 372069
* [X86][AVX] matchShuffleWithSHUFPD - add support for zeroable operands (Simon Pilgrim, 2019-09-16; 1 file, -15/+41)
  Determine if all of the uses of the LHS/RHS operands can be replaced with a zero vector.
  llvm-svn: 372013
* [X86] Use incDecVectorConstant to simplify the min/max code in LowerVSETCC. (Craig Topper, 2019-09-13; 1 file, -14/+12)
  incDecVectorConstant is used for a similar reason in LowerVSETCCWithSUBUS, so we might as well share the code.
  llvm-svn: 371861
* [X86] negateFMAOpcode - extend to support FMADDSUB/FMSUBADD and output negation. NFCI. (Simon Pilgrim, 2019-09-13; 1 file, -27/+40)
  Some prep work for PR42863; this change allows us to move all the FMA opcode mappings into the negateFMAOpcode helper. For the FMADDSUB/FMSUBADD cases, we can only negate the accumulator - any other negations will result in an error.
  llvm-svn: 371840
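  The mapping being centralized is essentially negation algebra; a hedged sketch of the idea (illustrative, with a hypothetical helper name - not the helper's actual body; in X86 the plain multiply-add is the generic ISD::FMA node):

    // -(a*b + c) = -(a*b) - c, and so on for the other FMA variants.
    static unsigned negateFMAResultOpcode(unsigned Opc) { // hypothetical name
      switch (Opc) {
      case ISD::FMA:       return X86ISD::FNMSUB; // a*b + c  -> -(a*b) - c
      case X86ISD::FNMSUB: return ISD::FMA;
      case X86ISD::FMSUB:  return X86ISD::FNMADD; // a*b - c  -> -(a*b) + c
      case X86ISD::FNMADD: return X86ISD::FMSUB;
      default: llvm_unreachable("FMADDSUB/FMSUBADD results cannot be negated");
      }
    }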
* [DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares. (Craig Topper, 2019-09-12; 1 file, -1/+2)
  The X86 decision assumes the compare will produce a result in an XMM register, but that can't happen for an fp128 compare, since those go to a libcall that returns an i32. Pass the VT so X86 can check the type.
  llvm-svn: 371775
* [X86] Move negateFMAOpcode helper earlier to help future patch. NFCI. (Simon Pilgrim, 2019-09-12; 1 file, -32/+32)
  llvm-svn: 371770
* [X86] Fix latent bugs in 32-bit CMPXCHG8B inserter (Reid Kleckner, 2019-09-11; 1 file, -4/+8)
  I found three issues:
  1. the loop over E[ABCD]X copies ran over the start of the BB
  2. the direct address of cmpxchg8b could be a frame index
  3. the displacement of cmpxchg8b could be a global instead of an immediate
  These were all introduced together in r287875 and should be fixed with this change.
  Issue reported by Zachary Turner.
  llvm-svn: 371678
* [X86] Move x86_64 fp128 conversion to libcalls from type legalization to DAG legalization (Craig Topper, 2019-09-11; 1 file, -20/+151)
  fp128 is considered a legal type for a register, but has almost no legal operations, so everything needs to be converted to a libcall. Previously this was implemented by tricking type legalization into softening the operations, with various checks for "is legal in hardware register" to change the behavior to still use f128 as the resulting type instead of converting to i128.
  This patch abandons that approach and instead moves the libcall conversions to LegalizeDAG. This is the approach taken by AArch64, where they also have a legal fp128 type but no legal operations. I think this is more in the spirit of how SelectionDAG's phases are supposed to work.
  I had to make some hacks for STRICT_FP_ROUND because some of the strict FP handling checks if ISD::FP_ROUND is Legal for a given result type, but I had to make ISD::FP_ROUND Custom to allow making a libcall when the input is f128. For all other types the Custom handler just returns the original node. These hacks are incomplete and don't work for a strict truncate from f128, but I don't think that worked before either, since LegalizeFloatTypes doesn't know about strict ops yet.
  I've also raised PR43209 against AArch64, which currently crashes on a strict ftrunc from f64->f32 because of FP_ROUND being marked Custom for the same reason there.
  Differential Revision: https://reviews.llvm.org/D67128
  llvm-svn: 371672
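  The end state is the usual "legal type, libcall operations" setup; schematically (a sketch of the configuration style, not the patch's literal diff):

    // Sketch: fp128 lives in a register class (a "legal" type), but nearly
    // every operation on it is turned into a libcall during DAG legalization.
    addRegisterClass(MVT::f128, &X86::VR128RegClass);
    for (unsigned Op : {ISD::FADD, ISD::FSUB, ISD::FMUL, ISD::FDIV})
      setOperationAction(Op, MVT::f128, LibCall);
    // FP_ROUND is Custom so an f128 input can become a libcall; other
    // inputs are handed back unchanged (the hack described above).
    setOperationAction(ISD::FP_ROUND, MVT::f32, Custom);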
* [X86] Updated target specific selection dag code to conservatively check for isAtomic in addition to isVolatile (Philip Reames, 2019-09-10; 1 file, -14/+14)
  See D66309 for context. This is the first sweep of x86 target specific code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time.
  Sorry for the lack of tests. As discussed in the review, most of these are vector tests (for which atomicity is not well defined) and I couldn't figure out how to exercise the anyextend cases, which aren't vector specific.
  Differential Revision: https://reviews.llvm.org/D66322
  llvm-svn: 371547
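  The typical bailout reads like this (an illustrative shape, not a literal hunk from the patch):

    // Once unordered atomics flow through LoadSDNode, a combine that was
    // only safe for simple loads must also reject atomic ones.
    if (auto *Ld = dyn_cast<LoadSDNode>(N->getOperand(0)))
      if (Ld->isVolatile() || Ld->isAtomic())
        return SDValue(); // conservatively leave volatile/atomic ops alone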
* Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling (Philip Reames, 2019-09-09; 1 file, -0/+19)
  This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, they're handled with instances of AtomicSDNode, with the result that all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding a transform which forgets about atomicity. See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context.
  Note that this patch is NFC unless the experimental flag is set.
  The basic strategy I plan on taking is:
  1. introduce infrastructure and a flag for testing (this patch)
  2. audit uses of isVolatile, and apply isAtomic conservatively*
  3. piecemeal, conservatively* update generic code and x86 backend code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection
  4. flip the flag at the end (with minimal diffs)
  5. work through the todo list identified in (2) and (3), exposing performance opportunities
  (*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there.
  We've taken a very similar strategy twice before with success - once at the IR level, and once at the MI level (post ISEL).
  Differential Revision: https://reviews.llvm.org/D66309
  llvm-svn: 371441
* [SelectionDAG] Remove ISD::FP_ROUND_INREG (Craig Topper, 2019-09-09; 1 file, -1/+0)
  I don't think anything in-tree creates this node, so all of this code appears to be dead. Code coverage agrees: http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html
  Differential Revision: https://reviews.llvm.org/D67312
  llvm-svn: 371431
* [X86] Allow _MM_FROUND_CUR_DIRECTION and _MM_FROUND_NO_EXC to be used together on instructions that only support SAE and not embedded rounding. (Craig Topper, 2019-09-09; 1 file, -2/+10)
  Currently, for SAE instructions we only allow _MM_FROUND_CUR_DIRECTION (bit 2) or _MM_FROUND_NO_EXC (bit 3) to be used as the immediate passed to the intrinsics. But these instructions don't perform rounding, so _MM_FROUND_CUR_DIRECTION is just sort of a default placeholder for when you don't want to suppress exceptions. Using _MM_FROUND_NO_EXC by itself is bit-equivalent to (_MM_FROUND_NO_EXC | _MM_FROUND_TO_NEAREST_INT), since _MM_FROUND_TO_NEAREST_INT is 0. Since we aren't rounding on these instructions, we should also accept (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC) as equivalent to (_MM_FROUND_NO_EXC). icc allows this, but gcc does not.
  Differential Revision: https://reviews.llvm.org/D67289
  llvm-svn: 371430
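  In source terms, the newly accepted combination looks like this (an illustrative example; vmaxps is one of the SAE-only instructions, i.e. it suppresses exceptions but never rounds):

    #include <immintrin.h>

    __m512 demo(__m512 a, __m512 b) {
      // Previously rejected; now accepted as equivalent to _MM_FROUND_NO_EXC
      // alone, since this instruction performs no rounding.
      return _mm512_max_round_ps(a, b,
                                 _MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC);
    }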
* [X86] Use xorps to create fp128 +0.0 constants. (Craig Topper, 2019-09-09; 1 file, -0/+2)
  This matches what we do for f32/f64. gcc also does this for fp128.
  llvm-svn: 371357
* [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add faux shuffle support. (Simon Pilgrim, 2019-09-08; 1 file, -26/+52)
  This patch decodes target and faux shuffles with getTargetShuffleInputs - a reduced version of resolveTargetShuffleInputs that doesn't resolve SM_SentinelZero cases, so we can correctly remove zero vectors if they aren't demanded.
  llvm-svn: 371353
* [X86] Add a hack to combineVSelectWithAllOnesOrZeros to turn selects with two zero/undef vector inputs into an all-zeroes vector. (Craig Topper, 2019-09-08; 1 file, -0/+9)
  If the two zero vectors have undefs in different places, they won't get combined by simplifySelect. This fixes a regression from an earlier commit.
  llvm-svn: 371351
* [X86] Remove call to getZeroVector from materializeVectorConstant. Add isel patterns for zero vectors with all types. (Craig Topper, 2019-09-08; 1 file, -9/+2)
  The change to avx512-vec-cmp.ll is a regression, but should be easy to fix. It occurs because the getZeroVector call was canonicalizing both sides to the same node, and then SimplifySelect was able to simplify it. But since we only called getZeroVector on some VTs, this wasn't a robust way to perform this combine.
  The change to vector-shuffle-combining-ssse3.ll is more instructions, but removes a constant pool load, so it's unclear whether it's a regression.
  llvm-svn: 371350
* [X86] Use DAG.getConstant instead of getZeroVector in combinePMULDQ. (Craig Topper, 2019-09-08; 1 file, -1/+1)
  getZeroVector canonicalizes the type to vXi32, but that's a legalization action. We should use the most correct type if possible.
  llvm-svn: 371345
* [X86] Teach materializeVectorConstant to not call getZeroVector/getOnesVector on the types we already have isel patterns for. (Craig Topper, 2019-09-08; 1 file, -3/+3)
  llvm-svn: 371343
* [X86][SSE] Fix out of range shift introduced in D67070/rL371328 (Simon Pilgrim, 2019-09-08; 1 file, -1/+2)
  Use APInt to create the comparison mask instead.
  llvm-svn: 371330
* [X86][SSE] Add support for <64 x i1> bool reduction (Simon Pilgrim, 2019-09-08; 1 file, -11/+14)
  This generalizes the existing <32 x i1> pre-AVX2 split code to support reductions from <64 x i1> as well; we can probably generalize to any larger pow2 case in the future if the (unlikely) need ever arises.
  We still need to tweak combineBitcastvxi1 to improve AVX512F codegen, as it assumes vXi1 types should be handled on the mask registers even when they aren't legal.
  Differential Revision: https://reviews.llvm.org/D67070
  llvm-svn: 371328
* [X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later. (Craig Topper, 2019-09-08; 1 file, -0/+2)
  isel used to require zero vectors to be canonicalized to a single type to minimize the number of patterns needed to match. This is no longer required.
  I plan to do this for integers too, but floating point was simpler to start with. Integer has a complication where v32i16/v64i8 aren't legal when the other 512-bit integer types are.
  llvm-svn: 371325
* [X86] Avoid uses of getZExtValue(). NFCI. (Simon Pilgrim, 2019-09-07; 1 file, -22/+19)
  Use getAPIntValue() directly - this is mainly a best-practice style issue to help prevent fuzz tests blowing up when an i12345 (or whatever) is generated.
  Use the getConstantOperandVal/getConstantOperandAPInt wrappers where possible.
  llvm-svn: 371315
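  The style point, sketched (illustrative code, not from the patch):

    // getZExtValue() asserts once a constant needs more than 64 bits, which
    // a fuzzer-generated i12345 easily does; getAPIntValue() never does.
    if (auto *C = dyn_cast<ConstantSDNode>(N->getOperand(1))) {
      const APInt &Imm = C->getAPIntValue(); // safe at any bit width
      // uint64_t Raw = C->getZExtValue();   // asserts for > 64-bit values
      if (Imm.isNullValue())
        return N->getOperand(0);
    }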
* [X86] Fix pshuflw formation from repeated shuffle mask (PR43230) (Nikita Popov, 2019-09-07; 1 file, -2/+2)
  Fix for https://bugs.llvm.org/show_bug.cgi?id=43230. When creating PSHUFLW from a repeated shuffle mask, we have to apply the checks to the repeated mask, not the original one. For the test case from PR43230, the inspected part of the original mask is all undef.
  Differential Revision: https://reviews.llvm.org/D67314
  llvm-svn: 371307
* Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. (Simon Pilgrim, 2019-09-07; 1 file, -1/+1)
  llvm-svn: 371302
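  The warning fires on patterns like the following (an illustrative snippet, not the fixed line itself):

    #include <cstdint>
    // The operand must be widened *before* the shift, not after: the first
    // form shifts in 32 bits and only then converts the result to 64 bits.
    uint64_t bad(unsigned Idx)  { return 1u << Idx; }   // triggers the warning
    uint64_t good(unsigned Idx) { return 1ULL << Idx; } // shift done in 64 bits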
* [Alignment][NFC] Use Align with TargetLowering::setPrefFunctionAlignment (Guillaume Chatelet, 2019-09-06; 1 file, -1/+1)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67267
  llvm-svn: 371212
* [Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment (Guillaume Chatelet, 2019-09-06; 1 file, -4/+5)
  Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790
  Reviewers: courbet
  Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D67278
  llvm-svn: 371210
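  After these two patches, call sites pass a typed value instead of a raw byte count; roughly (a sketch of the new style inside a target constructor):

    #include "llvm/Support/Alignment.h"
    // Align asserts at construction that the value is a power of two, so a
    // bad alignment fails loudly instead of silently misaligning code.
    setPrefFunctionAlignment(Align(16)); // 16-byte function alignment
    setPrefLoopAlignment(Align(16));     // 16-byte loop alignment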
* [X86] Enable BuildSDIVPow2 for i16. (Craig Topper, 2019-09-05; 1 file, -2/+3)
  We're able to use a 32-bit ADD and CMOV here, and this should work well with our other i16->i32 promotion optimizations.
  llvm-svn: 371107
* [X86] Override BuildSDIVPow2 for X86. (Craig Topper, 2019-09-05; 1 file, -0/+55)
  As noted in PR43197, we can use test+add+cmov+sra to implement signed division by a power of 2.
  This is based on the similar version in AArch64, but I've adjusted it to use target-independent nodes where AArch64 uses target-specific CMP and CSEL nodes. I've also blocked INT_MIN, as the transform isn't valid for that.
  I've limited this to i32 and i64 on 64-bit targets for now, and only when CMOV is supported. i8 and i16 need further investigation to be sure they get promoted to i32 well.
  I adjusted a few tests to enable cmov to demonstrate the new codegen. I also changed twoaddr-coalesce-3.ll to 32-bit mode without cmov to avoid perturbing the scenario that is being set up there.
  Differential Revision: https://reviews.llvm.org/D67087
  llvm-svn: 371104
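  The scalar recipe, for reference (illustrative C++ showing the select-based form; the patch emits the equivalent SDAG nodes):

    #include <cstdint>
    // Signed division by 2^K via test+add+cmov+sra: negative dividends need
    // (2^K - 1) added before the arithmetic shift so the result rounds
    // toward zero, as C signed division requires.
    int32_t sdiv_pow2(int32_t X, unsigned K) {
      int32_t Bias   = (int32_t)((1u << K) - 1);
      int32_t Biased = X + Bias;          // ADD
      X = (X < 0) ? Biased : X;           // TEST + CMOV
      return X >> K;                      // SRA
    }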
* [x86] fix horizontal math bug exposed by improved demanded elements analysis (PR43225) (Sanjay Patel, 2019-09-05; 1 file, -5/+24)
  https://bugs.llvm.org/show_bug.cgi?id=43225
  llvm-svn: 371095
* [X86] Fix stale comment. NFC (Craig Topper, 2019-09-05; 1 file, -2/+2)
  We aren't checking for a concat here. We're just always splitting 256-bit stores.
  llvm-svn: 371092
* [X86][SSE] EltsFromConsecutiveLoads - ignore non-zero offset base loads (PR43227) (Simon Pilgrim, 2019-09-05; 1 file, -0/+4)
  As discussed on D64551 and PR43227, we don't correctly handle cases where the base load has a non-zero byte offset. Until we can properly handle this, we must bail from EltsFromConsecutiveLoads.
  llvm-svn: 371078
* [LLVM][Alignment] Make functions using log of alignment explicit (Guillaume Chatelet, 2019-09-05; 1 file, -2/+2)
  Summary: This patch renames functions that take or return alignment as log2; it will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power-of-two alignment.
  A few renames uncovered dubious assignments:
  - MirParser/MirPrinter was expecting powers of two, but MachineFunction and MachineBasicBlock were dealing with log2(align). This patch fixes it and updates the documentation.
  - MachineBlockPlacement exposes two flags (align-all-blocks and align-all-nofallthru-blocks) supposedly interpreted as power-of-two alignments; internally these values are interpreted as log2(align). This patch updates the documentation.
  - MachineFunction exposes align-all-functions, also interpreted as a power-of-two alignment; internally this value is interpreted as log2(align). This patch updates the documentation.
  Reviewers: lattner, thegameg, courbet
  Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D65945
  llvm-svn: 371045
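  The ambiguity being removed, in miniature (an illustrative conversion, not patch code):

    #include "llvm/Support/MathExtras.h"
    // The same small integer can mean two very different things:
    unsigned AlignBytes = 16;                        // power-of-two alignment
    unsigned LogAlign   = llvm::Log2_32(AlignBytes); // == 4, i.e. log2(align)
    unsigned RoundTrip  = 1u << LogAlign;            // back to 16
    // Naming APIs ...LogAlignment() vs ...Alignment() makes explicit which
    // of the two encodings a given function expects.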
* Revert [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn (Reid Kleckner, 2019-09-03; 1 file, -12/+0)
  This reverts r370525 (git commit 0bb1630685fba255fa93def92603f064c2ffd203) and also reverts r370543 (git commit 185ddc08eed6542781040b8499ef7ad15c8ae9f4).
  The approach I took only works for functions marked `noreturn`. In general, a call that is not known to be noreturn may be followed by unreachable for other reasons. For example, there could be multiple call sites to a function that throws sometimes, and at some call sites it is known to always throw, so it is followed by unreachable. We need to insert an `int3` in these cases to pacify the Windows unwinder.
  I think this probably deserves its own standalone, Win64-only fixup pass that runs after block placement. Implementing that will take some time, so let's revert to TrapUnreachable in the meantime.
  llvm-svn: 370829
* [X86] Merge 2 consecutive HasInt256 branches. NFCI. (Simon Pilgrim, 2019-09-03; 1 file, -3/+2)
  llvm-svn: 370761
* [X86] Simplify the setOperationAction handling for fp_to_uint by improving the Custom handler a bit. (Craig Topper, 2019-09-03; 1 file, -19/+18)
  This merges the 32-bit and 64-bit mode code to just use Custom for both i32 and i64. We already had most of the handling in the custom handler due to AVX512 having legal fp_to_uint; we just needed to add the i32->i64 promotion handling. Refactor the fp_to_uint code in the custom handler to simplify the number of times we check things. Tweak the cost model tables to match the default handling we were getting due to Expand before.
  llvm-svn: 370700
* [X86] Don't use Expand for i32 fp_to_uint on SSE1/2 targets in 32-bit mode. (Craig Topper, 2019-09-03; 1 file, -13/+7)
  Use Custom lowering instead. Fall back to default expansion only when the scalar FP type belongs in an XMM register. This improves lowering for i32 to fp80, and also i32 to double on SSE1-only targets.
  llvm-svn: 370699
* [X86] Custom promote i32->f80 uint_to_fp on AVX512 64-bit targets. (Craig Topper, 2019-09-03; 1 file, -8/+7)
  Reuse the same code to promote all i32 uint_to_fp on 64-bit targets, to simplify the X86ISelLowering constructor.
  llvm-svn: 370693
* [X86] Enable fp128 as a legal type with SSE1 rather than with MMX. (Craig Topper, 2019-09-02; 1 file, -2/+2)
  FP128 values are passed in xmm registers, so they should be associated with an SSE feature rather than MMX, which uses a different set of registers.
  llc enables sse1 and sse2 by default with x86_64, but does not enable mmx. Clang enables all 3 features by default.
  I've tried to add command lines to test with -sse where possible, but any test that returns a value in an xmm register fails with a fatal error with -sse, since we have no defined ABI for that scenario.
  llvm-svn: 370682
* [X86] getPMOVMSKB - add MVT::v64i8 handling and remove from combineBitcastvxi1. NFCI. (Simon Pilgrim, 2019-09-02; 1 file, -11/+12)
  llvm-svn: 370670
* [X86] combineHorizontalPredicateResult - pull out repeated getTargetLoweringInfo() calls. NFCI. (Simon Pilgrim, 2019-09-02; 1 file, -2/+2)
  llvm-svn: 370637
* [X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI. (Simon Pilgrim, 2019-09-01; 1 file, -28/+30)
  Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed. Clean up the in-lane shuffle mask generation to make it more obvious what's going on.
  Some prep work noticed while investigating the poor shuffle code mentioned in D66004.
  llvm-svn: 370613
* Fix shadow variable warning. NFCI. (Simon Pilgrim, 2019-09-01; 1 file, -3/+3)
  llvm-svn: 370610
* [X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170) (Simon Pilgrim, 2019-08-31; 1 file, -11/+16)
  EltsFromConsecutiveLoads was assuming that the number of input elements was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.
  llvm-svn: 370592
* Fix shadow variable warning by making CondCodes names more explicit. NFCI. (Simon Pilgrim, 2019-08-31; 1 file, -10/+10)
  llvm-svn: 370589
* Fix shadow variable warning. NFCI. (Simon Pilgrim, 2019-08-31; 1 file, -2/+2)
  llvm-svn: 370585
* [X86ISelLowering] combineCMov - cleanup CMOV->LEA codegen. NFCI. (Simon Pilgrim, 2019-08-31; 1 file, -5/+5)
  Only compute the diff once; we don't need the truncation code (assert the bitwidth is correct just to be safe).
  llvm-svn: 370583
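  For background, the CMOV->LEA trick folds a select between two constants into arithmetic on the condition bit; schematically (an illustrative sketch, not the combine's code):

    #include <cstdint>
    // select(Cond, TrueC, FalseC) with constant arms becomes
    //   FalseC + Cond * (TrueC - FalseC), where Cond is 0 or 1,
    // letting the add/scale fold into an LEA instead of a CMOV.
    uint64_t selectConst(bool Cond, uint64_t TrueC, uint64_t FalseC) {
      uint64_t Diff = TrueC - FalseC;          // computed once (the cleanup's point)
      return FalseC + (uint64_t)Cond * Diff;   // setcc + lea-style math
    }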