path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
Commit message (Author, Date; Files changed, Lines -/+)
...
* Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821 (Fangrui Song, 2019-12-27; 1 file, -13/+0)
  Intrinsic has incorrect argument type! i32 (i32*)* @llvm.setjmp *wipes tear*
* [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl. (Craig Topper, 2019-12-26; 1 file, -15/+91)
  Summary: Previously we did this with isel patterns that used garbage in the widened part of the source. But that's not valid for strictfp. So now we custom widen and use zeroes for the widened elements for strictfp. This replaces D71864.
  Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3
  Reviewed By: pengfei
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71879
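  A minimal SelectionDAG sketch of the zero-padding idea described above (helper name and the concrete VTs are illustrative, not the patch's code):

      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      // Pad a v4i32 source out to v16i32 with zeroes so a strict conversion
      // sees well-defined upper elements instead of garbage.
      static SDValue widenWithZeroes(SelectionDAG &DAG, const SDLoc &DL,
                                     SDValue Src /* v4i32 */) {
        SDValue Zero = DAG.getConstant(0, DL, MVT::v16i32); // all-zero wide vector
        return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, MVT::v16i32, Zero, Src,
                           DAG.getIntPtrConstant(0, DL));   // Src into lane 0
      }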
* [X86] Custom widen strict v2f32->v2i32 by padding with zeroes. (Craig Topper, 2019-12-26; 1 file, -0/+12)
  For non-strict, generic type legalization will take care of this, but that doesn't happen currently for strict nodes.
* [X86] Fix -Wmisleading-indentation after D71892 (Fangrui Song, 2019-12-26; 1 file, -0/+1)
* [X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict. (Craig Topper, 2019-12-26; 1 file, -2/+9)
  The float libcalls are inlined in MSVC's math header where they just cast to double and use the double libcall. Do the same when we emit libcalls.
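  Illustratively, the header pattern being matched has this shape (a sketch, not MSVC's actual source):

      #include <cmath>

      // Shape of MSVC's inline float wrappers: cast to double, use the double
      // libcall, truncate back. The strict libcall emission now does the same.
      inline float sinf_like(float X) {
        return static_cast<float>(std::sin(static_cast<double>(X)));
      }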
* [X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32. (Craig Topper, 2019-12-26; 1 file, -7/+16)
  I believe the algorithm we use for non-strict is exception safe for strict. The fsub won't generate any exceptions. After it we will have an exact version of the i32 integer in a double. Then we just round it to f32. That rounding will generate a precision exception if it can't be represented exactly.
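  A scalar C++ model of the algorithm the message describes, assuming the usual 2^52 magic-constant construction (the vector lowering differs in detail):

      #include <cstdint>
      #include <cstring>

      float uint32ToFloat(uint32_t X) {
        // Build the double 2^52 + X exactly: 0x4330000000000000 is 2^52, and
        // X fits entirely in the low mantissa bits.
        uint64_t Bits = 0x4330000000000000ULL | X;
        double D;
        std::memcpy(&D, &Bits, sizeof(D));
        D -= 4503599627370496.0;      // subtract 2^52: exact, raises nothing
        return static_cast<float>(D); // rounding step; may raise Inexact
      }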
* add custom operation for strict fpextend/fpround (Liu, Chen3, 2019-12-27; 1 file, -8/+24)
  Differential Revision: https://reviews.llvm.org/D71892
* Remove SrcVT only used in an assert and propagate query. (Eric Christopher, 2019-12-26; 1 file, -2/+2)
* [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl. (Craig Topper, 2019-12-26; 1 file, -17/+106)
  Previously we widened these through isel patterns, but that didn't work for STRICT_ nodes. Those need to be padded with zeroes in the upper bits, which is harder to do in isel patterns.
* [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL. (Craig Topper, 2019-12-26; 1 file, -5/+23)
  Previously we were widening with isel patterns, but that wasn't exception safe for strict FP. So now we widen to v4i32->v4f64 during type legalization. And then let op legalization further widen to v8i32->v8f64.
  The vec_int_to_fp.ll changes are caused by us no longer narrowing extracts of strict_uint_to_fp to the v4i32->v2f64 instruction without AVX512VL only to have isel rewiden it. Now we just keep it wide throughout. So we don't have an opportunity to narrow the load.
* [X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl. (Craig Topper, 2019-12-26; 1 file, -6/+15)
  AVX512F added an instruction for vector fp_to_uint conversions. With AVX512VL we can use a specific instruction that does v2f64->v4i32 with zeroes in the 2 extra elements. For non-strict nodes without AVX512VL we relied on type legalization to turn it to v4f64->v4i32 which would later be widened by op legalization to v8f64->v8i32. But type legalization doesn't currently widen strict nodes since it doesn't know how to safely and efficiently pad the extra elements. But for X86 we know padding with zeroes is safe and efficient so do that ourselves.
* [X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC (Craig Topper, 2019-12-26; 1 file, -23/+11)
* [X86] Add custom lowering for v2i64->v2f32 strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets. (Craig Topper, 2019-12-26; 1 file, -8/+32)
  With avx512dq+avx512vl we have an instruction that implements this and places zeroes in the upper 64-bits of the destination xmm register.
* [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend (Wang, Pengfei, 2019-12-26; 1 file, -25/+110)
  Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
  Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor
  Subscribers: hiraditya, llvm-commits, LuoYuanke
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71871
* [X86] Use zero vector to extend to 512-bits for strict_fp_to_uint v2i1->v2f64 on targets with AVX512F, but not AVX512VL. (Craig Topper, 2019-12-25; 1 file, -3/+7)
  In the worst case, this requires a 128-bit move instruction to implicitly zero the upper bits. In the common case, we should recognize the producing instruction already zeroed the upper bits.
* [X86] Merge together some common code in LowerFP_TO_INT now that we have STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC (Craig Topper, 2019-12-25; 1 file, -17/+11)
* Add missing strict_fp_to_int (Liu, Chen3, 2019-12-25; 1 file, -0/+3)
  Differential Revision: https://reviews.llvm.org/D71867
* [X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions. (Craig Topper, 2019-12-24; 1 file, -7/+14)
  On 32-bit targets we can't use the scalar instruction so we insert the scalar into a vector and use packed conversions. Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid some complexity creating target specific ISD opcodes for v4f32->v2i64. But this causes extra vzeroupper instructions and possibly frequency throttling on Intel CPUs.
  This patch changes this to create a 128-bit vector and uses a target specific ISD opcode if needed.
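  A hedged sketch of the shape described; the opcode choice, include path, and types here are assumptions, not the patch's actual node creation:

      #include "llvm/CodeGen/SelectionDAG.h"
      #include "X86ISelLowering.h" // assumed in-tree context, for X86ISD
      using namespace llvm;

      // Keep an f64->i64 conversion inside a 128-bit vector so no 256/512-bit
      // register (and thus no vzeroupper) is ever involved.
      static SDValue cvtViaXmm(SelectionDAG &DAG, const SDLoc &DL, SDValue F64) {
        SDValue Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, MVT::v2f64, F64);
        SDValue Cvt = DAG.getNode(X86ISD::CVTTP2SI, DL, MVT::v2i64, Vec);
        return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cvt,
                           DAG.getIntPtrConstant(0, DL));
      }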
* [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP. (Craig Topper, 2019-12-24; 1 file, -83/+79)
  Differential Revision: https://reviews.llvm.org/D71850
* [FPEnv][X86] More strict int <-> FP conversion fixes (Ulrich Weigand, 2019-12-23; 1 file, -4/+9)
  Fix several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular:
  - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target.
  - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead (see the scalar model after this entry).
  - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT.
  - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
  Reviewed by: craig.topper
  Differential Revision: https://reviews.llvm.org/D71840
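  A scalar C++ model of the corrected STRICT_UINT_TO_FP expansion; this is a sketch assuming round-to-nearest, and the point is the single signed conversion:

      #include <cstdint>

      double uint64ToDouble(uint64_t X) {
        bool Large = X >> 63;                 // does X exceed INT64_MAX?
        uint64_t Halved = (X >> 1) | (X & 1); // halve, keeping a sticky bit
        uint64_t Src = Large ? Halved : X;    // select the INPUT first...
        double D = static_cast<double>(static_cast<int64_t>(Src)); // ...so only
        return Large ? D + D : D;             // one conversion ever executes
      }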
* [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support (Sanjay Patel, 2019-12-23; 1 file, -26/+0)
  This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine.
  Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4, rG0b38af89e2c0
  Patch by: @RKSimon (Simon Pilgrim)
  Differential Revision: https://reviews.llvm.org/D63815
* [X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables before a later use. (Craig Topper, 2019-12-20; 1 file, -19/+23)
  The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code. A later piece of code swaps the operands and changes the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.
  To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes is enough to fix the issue, but doing both makes this code very safe.
  I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.
  Differential Revision: https://reviews.llvm.org/D71736
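  A hedged sketch of the hardening pattern; variable names are illustrative and the actual transform condition is elided:

      #include "llvm/CodeGen/SelectionDAGNodes.h"
      #include <utility>
      using namespace llvm;

      // Inside a combineSetCC-style function, N being the SETCC node:
      static void canonicalize(SDNode *N, bool ShouldSwap) {
        const SDValue LHS = N->getOperand(0);   // const: can no longer go stale
        const SDValue RHS = N->getOperand(1);
        const ISD::CondCode CC =
            cast<CondCodeSDNode>(N->getOperand(2))->get();
        // The transform that wants swapped operands works on copies instead.
        SDValue TmpLHS = LHS, TmpRHS = RHS;
        ISD::CondCode TmpCC = CC;
        if (ShouldSwap) {
          std::swap(TmpLHS, TmpRHS);
          TmpCC = ISD::getSetCCSwappedOperands(TmpCC);
        }
        // ... later code keeps using LHS/RHS/CC, which still match each other.
        (void)TmpCC;
      }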
* [X86] Make EmitCmp into a static function and explicitly return chain result for STRICT_FCMP. NFCI (Craig Topper, 2019-12-19; 1 file, -12/+19)
  The only thing it's getting from the X86TargetLowering class is the subtarget, which we can easily pass. This function only has one call site now, so making it static might help the compiler inline it.
  Explicitly return both the flag result and the chain result for STRICT_FCMP nodes. This removes an assumption in the caller that getValue(1) is the right way to get the chain.
* [X86] Directly call EmitTest in two places instead of creating a null constant and calling EmitCmp. NFCI (Craig Topper, 2019-12-19; 1 file, -4/+2)
  EmitCmp will just immediately call EmitTest and discard the null constant only to have EmitTest create it again if it doesn't fold. So just skip all that and go directly to EmitTest.
* Enable STRICT_FP_TO_SINT/UINT on X86 backend (Liu, Chen3, 2019-12-19; 1 file, -89/+156)
  This patch mainly adds custom lowering for the vector operations.
  Differential Revision: https://reviews.llvm.org/D71592
* [X86] Add strict fma support (Wang, Pengfei, 2019-12-18; 1 file, -1/+4)
  Summary: Add strict fma support
  Reviewers: craig.topper, RKSimon, LiuChen3
  Subscribers: hiraditya, llvm-commits, LuoYuanke
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71604
* [X86] Manually format some setOperationAction calls to line up arguments to improve readability. NFC (Craig Topper, 2019-12-17; 1 file, -8/+8)
* This adds constrained intrinsics for the signed and unsigned conversions of integers to floating point. (Kevin P. Neal, 2019-12-17; 1 file, -37/+137)
  This includes some of Craig Topper's changes for promotion support from D71130.
  Differential Revision: https://reviews.llvm.org/D69275
* [NFC] Use EVT instead of bool for getSetCCInverse() (Alex Richardson, 2019-12-13; 1 file, -5/+4)
  Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void *)0x12033091e < (void *)0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge.
  Reviewers: spatel, bogner
  Reviewed By: bogner
  Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70917
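  In sketch form, the signature change this describes (per the patch's own description; header location not shown):

      // Before: callers passed a bool, typically VT.isInteger():
      //   ISD::CondCode getSetCCInverse(ISD::CondCode Op, bool isInteger);
      // After: callers pass the type itself, so a target whose pointers are
      // not integer VTs (e.g. CHERI's iFATPTR) still gets integer rules:
      //   ISD::CondCode getSetCCInverse(ISD::CondCode Op, EVT Type);
      ISD::CondCode Inv = ISD::getSetCCInverse(ISD::SETULT, MVT::i64); // SETUGE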
* Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()" (Sanjay Patel, 2019-12-11; 1 file, -5/+3)
  This reverts commit d1f0bdf2d2df9bdf11ee2ddfff3df50e53f2f042. The patch can cause infinite loops in DAGCombiner.
* [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression() (Sanjay Patel, 2019-12-11; 1 file, -3/+5)
  This is an alternate fix for the bug discussed in D70595. This also includes minimal tests for other in-tree targets to show the problem more generally.
  We check the number of uses as a predicate for whether some value is free to negate, but that use count can change as we rewrite the expression in getNegatedExpression(). So something that was marked free to negate during the cost evaluation phase becomes not free to negate during the rewrite phase (or the inverse - something that was not free becomes free). This can lead to a crash/assert because we expect everything in an expression that is negatible to be handled in the corresponding code within getNegatedExpression().
  This patch skips the use check during the rewrite phase. So we determine that some expression isNegatibleForFree (identically to without this patch), but during the rewrite, don't rely on use counts to decide how to create the optimal expression.
  Differential Revision: https://reviews.llvm.org/D70975
* [X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256 (Craig Topper, 2019-12-10; 1 file, -3/+17)
  This is an improvement to 88dacbd43625cf7aad8a01c0c3b92142c4dc0970
* [FPEnv][X86] Constrained FCmp intrinsics enabling on X86 (Wang, Pengfei, 2019-12-11; 1 file, -53/+179)
  Summary: This is a follow-up of D69281; it enables the X86 backend support for the FP comparison.
  Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor
  Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70582
* [X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256. (Craig Topper, 2019-12-10; 1 file, -47/+32)
  This reverts 3e1aee2ba717529b651a79ed4fc7e7147358043f in favor of a different approach. Scalarizing isn't great codegen, but making the type illegal was interfering with the k constraint in inline assembly.
* add support for strict operation fpextend/fpround/fsqrt on X86 backend (Liu, Chen3, 2019-12-10; 1 file, -36/+36)
  Differential Revision: https://reviews.llvm.org/D71184
* [X86] Fix prolog/epilog mismatch for stack protectors on win32-macho. (Amara Emerson, 2019-12-06; 1 file, -1/+1)
  The xor'ing behaviour is only used for msvc/crt environments; when we're targeting macho, the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent.
  Differential Revision: https://reviews.llvm.org/D71095
* [TargetLowering] Fix another potential FPE in expandFP_TO_UINT (Craig Topper, 2019-12-06; 1 file, -13/+11)
  D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
    // Sel = Src < 0x8000000000000000
    // Val = select Sel, Src, Src - 0x8000000000000000
    // Ofs = select Sel, 0, 0x8000000000000000
    // Result = fp_to_sint(Val) ^ Ofs
  The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)
  Instead, I'd suggest to use the following sequence:
    // Sel = Src < 0x8000000000000000
    // FltOfs = select Sel, 0, 0x8000000000000000
    // IntOfs = select Sel, 0, 0x8000000000000000
    // Result = fp_to_sint(Src - FltOfs) ^ IntOfs
  In the case where the value is already in range of FP_TO_SINT, we now simply compute Src - 0, which now definitely cannot trap (unless Src is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Src is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.
  There is a slight complication in the case where Src is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
  Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper.
  Differential Revision: https://reviews.llvm.org/D67105
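  A scalar C++ model of the new double->uint64 sequence quoted above (a sketch of the message's pseudocode, not the DAG code):

      #include <cstdint>

      uint64_t fpToUint64(double Src) {
        const double K = 9223372036854775808.0;    // 2^63
        bool Sel = Src < K;                        // strict mode: signaling compare
        double FltOfs = Sel ? 0.0 : K;
        uint64_t IntOfs = Sel ? 0 : (1ULL << 63);
        // Src - FltOfs is exact in both arms, so no spurious Inexact is raised;
        // the input is assumed in range of fp_to_uint, as in the expansion.
        return static_cast<uint64_t>(static_cast<int64_t>(Src - FltOfs)) ^ IntOfs;
      }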
* [X86] Don't setup and teardown memory for a musttail call (Reid Kleckner, 2019-12-06; 1 file, -2/+2)
  Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue.
  This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired.
  If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change.
  In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes.
  Reviewers: hans
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D71097
* [X86] Make X86TargetLowering::BuildFILD return a std::pair of SDValues so we explicitly return the chain instead of calling getValue on the single SDValue. (Craig Topper, 2019-12-05; 1 file, -9/+9)
  We shouldn't assume that the returned result can be used to get the other result. This is prep-work for strict FP where we will also need to pass the chain result along in more cases.
* Add strict fp support for instructions fadd/fsub/fmul/fdiv (Liu, Chen3, 2019-12-06; 1 file, -5/+17)
  Differential Revision: https://reviews.llvm.org/D68757
* [X86] Remove override of shouldUseStrictFP_TO_INT for fp80. NFC (Craig Topper, 2019-12-04; 1 file, -6/+0)
  I suspect this became unnecessary after r354161. Prior to that we may have been going through the default expansion of FP_TO_UINT on 64-bit targets and then ending up back in Custom X86 handling to handle the FP_TO_SINT for it. Now we just Custom handle the FP_TO_UINT directly. We already need to handle it for 32-bit mode during type legalization so we wouldn't save any code by using the default expansion on 64-bit.
* Add support for lowering 32-bit/64-bit pointers (Amy Huang, 2019-12-04; 1 file, -0/+55)
  Summary: This follows a previous patch that changes the X86 datalayout to represent mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces (https://reviews.llvm.org/D64931). This patch implements the address space cast lowering to the corresponding sign extension, zero extension, or truncate instructions.
  Related to https://bugs.llvm.org/show_bug.cgi?id=42359
  Reviewers: rnk, craig.topper, RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D69639
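  A hedged sketch of the cast lowering; the X86AS names follow D64931, but the enum values and helper are assumptions here:

      #include "llvm/CodeGen/SelectionDAG.h"
      using namespace llvm;

      namespace X86AS { enum : unsigned { PTR32_SPTR = 270, PTR32_UPTR = 271,
                                          PTR64 = 272 }; } // assumed numbering

      static SDValue lowerASCast(SelectionDAG &DAG, const SDLoc &DL, SDValue Src,
                                 unsigned SrcAS, unsigned DstAS) {
        if (DstAS == X86AS::PTR64)      // widening a 32-bit pointer to 64 bits
          return DAG.getNode(SrcAS == X86AS::PTR32_SPTR ? ISD::SIGN_EXTEND
                                                        : ISD::ZERO_EXTEND,
                             DL, MVT::i64, Src);
        return DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, Src); // 64 -> 32 bits
      }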
* [X86] Add support for STRICT_FP_TO_UINT/SINT from fp128. (Craig Topper, 2019-11-27; 1 file, -4/+9)
* [X86] Add strict fp support for operations of X87 instructions (Craig Topper, 2019-11-26; 1 file, -0/+20)
  This is a follow-up patch to D68854. This patch adds basic operations of X87 instructions, including +, -, *, /, fp extensions and fp truncations.
  Patch by Chen Liu (LiuChen3)
  Differential Revision: https://reviews.llvm.org/D68857
* [Codegen][ARM] Add addressing modes from masked loads and stores (David Green, 2019-11-26; 1 file, -20/+24)
  MVE has a basic symmetry between its normal load/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do.
  To enable that, this patch adds all the relevant infrastructure for treating masked load/store addressing modes in the same way as normal loads/stores. This involves:
  - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase.
  - Extending the IndexedModeActions from 8 bits to 16 bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads (see the sketch after this entry).
  - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way.
  - The ARM backend is then adjusted to make use of these indexed masked loads/stores.
  - The X86 backend is adjusted to hopefully be no functional changes.
  Differential Revision: https://reviews.llvm.org/D70176
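  A hedged sketch of how a target opts in with the new hooks; the types and modes are illustrative, mirroring how setIndexedLoadAction is normally used:

      // In a target's TargetLowering constructor:
      for (MVT VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32}) {
        setIndexedMaskedLoadAction(ISD::PRE_INC, VT, Legal);
        setIndexedMaskedLoadAction(ISD::POST_INC, VT, Legal);
        setIndexedMaskedStoreAction(ISD::PRE_INC, VT, Legal);
        setIndexedMaskedStoreAction(ISD::POST_INC, VT, Legal);
      }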
* [X86] Return Op instead of SDValue() for lowering flags_read/write intrinsics (Craig Topper, 2019-11-25; 1 file, -1/+1)
  Returning SDValue() means we didn't handle it and the common code should try to expand it. But it's a target intrinsic, so expanding won't do anything and will just leave the node alone, while printing confusing debug messages. By returning Op we tell the common code that the node is legal and shouldn't receive any further processing.
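  In sketch form, the fix amounts to this pattern (case labels assumed, shown out of context):

      case Intrinsic::x86_flags_read_u32:
      case Intrinsic::x86_flags_read_u64:
        // The node is already in its final form. Returning SDValue() would ask
        // common code to expand a target intrinsic (a no-op that only prints
        // confusing debug output); returning Op marks the node legal.
        return Op;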
* [X86] Add support for STRICT_FP_ROUND/STRICT_FP_EXTEND from/to fp128 to/from f32/f64/f80 in 64-bit mode. (Craig Topper, 2019-11-25; 1 file, -17/+37)
  These need to emit a libcall like we do for the non-strict version. 32-bit mode needs SoftenFloat support to be implemented for strict FP nodes.
  Differential Revision: https://reviews.llvm.org/D70504
* [X86][SSE] Split off generic isLaneCrossingShuffleMask helper. NFC. (Simon Pilgrim, 2019-11-23; 1 file, -3/+14)
  Avoid MVT dependency which will be needed in a future patch.
* [X86] Add test cases for most of the constrained fp libcalls with fp128. (Craig Topper, 2019-11-21; 1 file, -4/+8)
  Add explicit setOperationAction calls for some to match their non-strict counterparts. This isn't required, but makes the code self-documenting that we didn't forget about strict fp. I've used LibCall instead of Expand since that's more explicitly what we want.
  Only lrint/llrint/lround/llround are missing now.
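  A hedged sketch of the explicit markings described; the opcode list here is illustrative, not the patch's exact set:

      // In X86TargetLowering's constructor:
      for (auto Op : {ISD::STRICT_FSIN, ISD::STRICT_FCOS, ISD::STRICT_FPOW,
                      ISD::STRICT_FEXP, ISD::STRICT_FLOG})
        setOperationAction(Op, MVT::f128, LibCall); // explicit, self-documenting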
* [X86] Mark fp128 FMA as LibCall instead of Expand. Add STRICT_FMA as well. (Craig Topper, 2019-11-21; 1 file, -1/+2)
  The Expand code would fall back to LibCall, but this makes it more explicit.