path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
* [X86] Add Indirect Thunk Support to X86 to mitigate Load Value Injection (LVI)
  Scott Constable, 2020-06-24 (1 file, -0/+5)

  This pass replaces each indirect call/jump with a direct call to a thunk
  that looks like:

      lfence
      jmpq *%r11

  This ensures that if the value in register %r11 was loaded from memory,
  then the value in %r11 is (architecturally) correct prior to the jump.
  Also adds a new target feature to X86: +lvi-cfi ("cfi" meaning
  control-flow integrity). The feature can be added via the clang CLI using
  -mlvi-cfi.

  This is an alternate implementation to https://reviews.llvm.org/D75934
  that merges the thunk insertion functionality with the existing X86
  retpoline code.

  Differential Revision: https://reviews.llvm.org/D76812
* [X86][NFC] Generalize the naming of "Retpoline Thunks" and related code to "Indirect Thunks"
  Scott Constable, 2020-06-24 (1 file, -38/+42)

  There are applications for indirect call/branch thunks other than
  retpoline for Spectre v2, e.g.,
  https://software.intel.com/security-software-guidance/software-guidance/load-value-injection
  Therefore it makes sense to refactor X86RetpolineThunks as a more general
  capability.

  Differential Revision: https://reviews.llvm.org/D76810
* [X86] Fold undef elts to 0 in getTargetVShiftByConstNode.
  Craig Topper, 2020-06-16 (1 file, -3/+6)

  Similar to D81212.

  Differential Revision: https://reviews.llvm.org/D81292

  (cherry picked from commit 3408dcbdf054ac3cc32a97a6a82a3cf5844be609)
* [X86] Teach combineVectorShiftImm to constant fold undef elements to 0 not undef.
  Craig Topper, 2020-06-16 (1 file, -2/+10)

  Shifts are supposed to always shift in zeros or sign bits regardless of
  their inputs. It's possible the input value may have been replaced with
  undef by SimplifyDemandedBits, but the shifted-in zeros are still
  demanded.

  This issue was reported to me by ispc from 10.0. Unfortunately their
  failing test does not fail on trunk; it seems the shl is optimized out
  earlier now and doesn't become VSHLI.

  ispc bug: https://github.com/ispc/ispc/issues/1771

  Differential Revision: https://reviews.llvm.org/D81212

  (cherry picked from commit 7c9a89fed8f5d53d61fe3a61a2581a7c28b1b6d2)
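  A minimal C++ sketch of why 0 is the right fold here (illustrative values;
  not LLVM code):

      #include <cassert>
      #include <cstdint>

      int main() {
        // Whatever an "undef" lane holds, a shift by N leaves N known-zero
        // bits, so folding the lane to 0 preserves the demanded zeros while
        // folding to undef would not.
        uint32_t anything = 0xDEADBEEFu; // stand-in for an undef lane value
        uint32_t shifted = anything << 4;
        assert((shifted & 0xFu) == 0);   // low 4 bits are zero for ANY input
        return 0;
      }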
* [X86][SSE] combineX86ShufflesConstants - early out for zeroable vectors (PR45443)
  Simon Pilgrim, 2020-04-16 (1 file, -1/+7)

  Shuffle combining can insert byte-sized zero elements into the shuffle
  mask, which combineX86ShufflesConstants will attempt to fold without
  taking into account whether the byte-sized type is legal (e.g. on
  AVX512F-only targets).

  If we have a fully zeroable vector then we should just return a zero
  version of the root type, otherwise if the type isn't valid we should
  bail.

  Fixes PR45443

  (cherry picked from commit e3b60597769f79a8abc19fb8ef1f321d9adc1358)
* [X86] Use MVT::i8 instead of MVT::i64 for shift amount in BuildSDIVPow2
  Craig Topper, 2020-02-10 (1 file, -1/+1)

  X86 uses i8 for shift amounts. This code can fail on a 32-bit target if
  it runs after type legalization. This code was copied from AArch64 and
  modified for X86, but the shift amount wasn't changed to the correct type
  for X86.

  Fixes PR44812

  (cherry picked from commit ec9a94af4d5fb3270f2451fcbec5a3a99f4ac03a)
* [X86] Don't call LowerUINT_TO_FP_i32 for i32->f80 on 32-bit targets with sse2.
  Craig Topper, 2020-01-15 (1 file, -1/+1)

  We were performing an emulated i32->f64 in the SSE registers, then
  storing that value to memory and doing an extload into the X87 domain.

  After this patch we'll now just store the i32 to memory along with an
  i32 0, then do a 64-bit FILD to f80 completely in the X87 unit. This
  matches what we do without SSE.
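  A scalar C++ sketch of the new sequence, assuming a little-endian layout
  (helper name is illustrative; the real code emits the store pair and an
  x87 FILD):

      #include <cstdint>
      #include <cstring>

      long double u32_to_f80(uint32_t x) {
        uint32_t pair[2] = {x, 0};              // store the i32 plus an i32 0
        int64_t wide;
        std::memcpy(&wide, pair, sizeof(wide)); // the 64-bit integer load
        // The i64 is always non-negative, so the signed (FILD-style)
        // conversion to f80 is exact.
        return static_cast<long double>(wide);
      }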
* [Win64] Handle FP arguments more gracefully under -mno-sse
  Reid Kleckner, 2020-01-14 (1 file, -16/+17)

  Pass small FP values in GPRs or stack memory according to the normal
  convention. This is what gcc -mno-sse does on Win64.

  I adjusted the conditions under which we emit an error to check if the
  argument or return value would be passed in an XMM register when SSE is
  disabled. This has a side effect of no longer emitting an error for FP
  arguments marked 'inreg' when targeting x86 with SSE disabled. Our
  calling convention logic was already assigning it to FP0/FP1, and then we
  emitted this error. That seems unnecessary; we can ignore 'inreg' and
  compile it without SSE.

  Reviewers: jyknight, aemerson

  Differential Revision: https://reviews.llvm.org/D70465
* [X86] Drop an unneeded FIXME. NFC
  Craig Topper, 2020-01-14 (1 file, -1/+0)

  The extload on X87 is free.
* [X86] Swap the 0 and the fudge factor in the constant pool for the 32-bit mode i64->f32/f64/f80 uint_to_fp algorithm.
  Craig Topper, 2020-01-14 (1 file, -4/+4)

  This allows us to generate better code for selecting the fixup to load.
  Previously when the sign was set we had to load offset 0, and when it was
  clear we had to load offset 4. This required a testl, setns, zero extend,
  and finally a mul by 4. By switching the offsets we can just shift the
  sign bit into the lsb and multiply it by 4.
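  A scalar C++ sketch of the fixup selection after the swap (the helper name
  and table values are illustrative):

      #include <cstdint>

      double u64_to_f64(uint64_t x) {
        // Offset 0 now holds 0.0 and offset 4 holds 2^64: the byte offset is
        // just the sign bit shifted into the lsb, then multiplied by 4.
        static const float fudge[2] = {0.0f, 0x1p64f};
        double d = static_cast<double>(static_cast<int64_t>(x)); // signed convert
        unsigned idx = static_cast<unsigned>(x >> 63);           // sign -> lsb
        return d + fudge[idx];                                   // idx*4 = offset
      }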
* [X86] Directly emit a BROADCAST_LOAD from constant pool in lowerUINT_TO_FP_vXi32 to avoid double loads seen in D71971
  Craig Topper, 2020-01-14 (1 file, -2/+12)

  By directly emitting the constants as a constant pool load we seem to
  avoid the build_vector/extract_subvector combines that resulted in the
  duplicate loads we had before.

  Differential Revision: https://reviews.llvm.org/D72307
* [X86][AVX] Use lowerShuffleAsLanePermuteAndSHUFP to lower binary v4f64 shuffles.
  Simon Pilgrim, 2020-01-12 (1 file, -0/+12)

  Only perform this if we are shuffling lower and upper lane elements
  across the lanes (otherwise splitting to lower xmm shuffles would be
  better).

  This is a regression if we shuffle build_vectors due to getVectorShuffle
  canonicalizing 'blend of splat' build vectors; for now I've set this not
  to shuffle build_vector nodes at all to avoid this.
* [X86][AVX] lowerShuffleAsLanePermuteAndSHUFP - only set the demanded elements of the lane mask.
  Simon Pilgrim, 2020-01-12 (1 file, -2/+1)

  Fixes a cyclic dependency issue with an upcoming patch where
  getVectorShuffle canonicalizes masks with splat build vector sources.
* [X86] Don't call LowerSETCC from LowerSELECT for STRICT_FSETCC/STRICT_FSETCCS nodes.
  Craig Topper, 2020-01-11 (1 file, -3/+1)

  This causes the STRICT_FSETCC/STRICT_FSETCCS nodes to be lowered early
  while lowering SELECT, but the output chain doesn't get connected. Then
  we visit the node again when it is its turn because we haven't replaced
  the use of the chain result. In the case of the fp128 libcall lowering,
  after D72341 this will cause the libcall to be emitted twice.
* [TargetLowering][X86] Connect the chain from STRICT_FSETCC in TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper.
  Craig Topper, 2020-01-11 (1 file, -2/+4)
* [X86] Fix outdated comment
  Simon Pilgrim, 2020-01-11 (1 file, -2/+1)

  The generic saturated math opcodes are no longer widened inside
  X86TargetLowering.
* [X86][AVX] Add lowerShuffleAsLanePermuteAndSHUFP lowering
  Simon Pilgrim, 2020-01-11 (1 file, -0/+40)

  Add initial support for lowering v4f64 shuffles to
  SHUFPD(VPERM2F128(V1, V2), VPERM2F128(V1, V2)). Eventually this could be
  used for v8f32 (and maybe v8f64/v16f32), but I'm being conservative for
  the initial implementation as only v4f64 can always succeed.

  This currently is only called from lowerShuffleAsLanePermuteAndShuffle so
  it only gets used for unary shuffles, and we limit this to cases where we
  use upper elements, as otherwise concatenating 2 xmm shuffles is probably
  the better case.

  Helps with poor shuffles mentioned in D66004.
* [X86][AVX] lowerShuffleAsLanePermuteAndShuffle - consistently normalize multi-input shuffle elements
  Simon Pilgrim, 2020-01-10 (1 file, -2/+2)

  We only use lowerShuffleAsLanePermuteAndShuffle for unary shuffles at the
  moment, but we should consistently handle lane index calculations for
  multiple inputs in both the AVX1 and AVX2 paths.

  Minor (almost NFC) tidyup as I'm hoping to use
  lowerShuffleAsLanePermuteAndShuffle for binary shuffles soon.
* CodeGen: Use LLT instead of EVT in getRegisterByName
  Matt Arsenault, 2020-01-09 (1 file, -1/+1)

  Only PPC seems to be using it, and it only checks some simple cases and
  doesn't distinguish between FP. Just switch to using LLT to simplify use
  from GlobalISel.
* [X86] Custom type legalize v4i64->v4f32 uint_to_fp on sse4.1 targets in 64-bit mode
  Craig Topper, 2020-01-08 (1 file, -9/+11)

  For v4i64->v4f32 uint_to_fp on pre-avx targets where v4i64 isn't legal we
  create two v2i64->v2f32 uint_to_fps that need to be shuffled together.
  Our codegen for v2i64->v2f32 involves detecting if the number is larger
  than (2^31 - 1); if so we do a special division by 2 so we can do a
  signed conversion, which we need to scalarize, then do a multiply by 2 at
  the end if we divided earlier.

  When v4i64 isn't legal we need to split the checking for a larger number
  and dividing by 2 into two v2i64 vectors. The scalar part can extract the
  4 i64 values from those 4 splits. But we can reassemble the 4 scalar f32
  results directly into a single v4f32 vector. Then we just need to combine
  the fixup indications from the 2 halves and we can do the final multiply
  by 2 fixup on all 4 values if needed at once using a single v4f32 blend
  and v4f32 fadd.

  Differential Revision: https://reviews.llvm.org/D72368
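  A scalar C++ sketch of the per-element fixup being split here (helper name
  illustrative; the "too large" test is written as an i64 sign check):

      #include <cstdint>

      float u64_to_f32(uint64_t x) {
        if (static_cast<int64_t>(x) < 0) {    // too large for signed convert
          uint64_t half = (x >> 1) | (x & 1); // special divide by 2, sticky lsb
          return static_cast<float>(static_cast<int64_t>(half)) * 2.0f;
        }
        return static_cast<float>(static_cast<int64_t>(x));
      }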
* [X86] Adding fp128 support for strict fcmp
  Wang, Pengfei, 2020-01-08 (1 file, -5/+5)

  Summary: Adding fp128 support for strict fcmp.

  Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand

  Subscribers: hiraditya, llvm-commits, LuoYuanke

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D71897
* [X86] Enable v2i64->v2f32 uint_to_fp code in ReplaceNodeResults on SSE4.1 target
  Craig Topper, 2020-01-07 (1 file, -3/+1)

  Now that we generate decent code for (v2i64 (setlt zero, X)) on
  pre-sse4.2 targets I think we can use this now.

  Differential Revision: https://reviews.llvm.org/D72354
* [X86] Improve lowering of (v2i64 (setgt X, -1)) on pre-SSE4.2 targets. Enable v2i64 in foldVectorXorShiftIntoCmp.
  Craig Topper, 2020-01-07 (1 file, -3/+14)

  Similar to D72302 but for the canonical form for the opposite case. I've
  changed foldVectorXorShiftIntoCmp to form a target independent setcc node
  instead of PCMPGT now and enabled it for v2i64 on pre-SSE4.2 targets. The
  setcc should eventually get lowered to PCMPGT or the new v2i64 sequence.

  Differential Revision: https://reviews.llvm.org/D72318
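  A C++ sketch of the per-lane identity behind the fold (illustrative; uses
  the usual arithmetic right shift behavior on signed values):

      #include <cassert>
      #include <cstdint>

      int main() {
        for (int64_t x : {-42LL, 0LL, 7LL}) {
          // NOT of the sign-splat, (xor (sra X, 63), -1), equals the
          // all-ones/all-zeros mask produced by (setgt X, -1).
          uint64_t not_sign_splat = ~static_cast<uint64_t>(x >> 63);
          uint64_t setgt_mask = (x > -1) ? ~0ULL : 0ULL;
          assert(not_sign_splat == setgt_mask);
        }
        return 0;
      }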
* [X86] Improve lowering of v2i64 sign bit tests on pre-sse4.2 targets
  Craig Topper, 2020-01-07 (1 file, -0/+13)

  Without sse4.2 a v2i64 setlt needs to expand into a pcmpgtd, pcmpeqd,
  3 shuffles, and 2 logic ops. But if we're only interested in the sign bit
  of the i64 elements, we can just use one pcmpgtd and shuffle the odd
  elements to the even elements.

  Differential Revision: https://reviews.llvm.org/D72302
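  A sketch of the cheaper sequence with SSE2 intrinsics (function name is
  illustrative):

      #include <emmintrin.h>

      __m128i v2i64_sign_mask(__m128i x) {
        // pcmpgtd: each i32 lane becomes all-ones iff 0 > lane (signed).
        __m128i gt = _mm_cmpgt_epi32(_mm_setzero_si128(), x);
        // Copy each odd (high) dword over its even (low) dword so both
        // halves of each i64 element hold that element's sign mask.
        return _mm_shuffle_epi32(gt, _MM_SHUFFLE(3, 3, 1, 1));
      }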
* [X86] Pull out repeated SrcVT.getVectorNumElements() call. NFCI.
  Simon Pilgrim, 2020-01-07 (1 file, -2/+2)
* [X86] Standardize shuffle match/lowering function names. NFC.
  Simon Pilgrim, 2020-01-07 (1 file, -38/+39)

  We mainly use lowerShuffle*/matchShuffle* - replace the (few)
  lowerVectorShuffle*/matchVectorShuffle* cases to be consistent.
* [X86] Improve v4i32->v4f64 uint_to_fp for AVX1/AVX2 targets.
  Craig Topper, 2020-01-06 (1 file, -0/+15)

  Use zext+or+fsub to do the conversion. Similar to D71971.

  Differential Revision: https://reviews.llvm.org/D71971
* [X86] Improve v2i64->v2f32 and v4i64->v4f32 uint_to_fp on avx and avx2 targets.
  Craig Topper, 2020-01-05 (1 file, -24/+125)

  Summary: Based on Simon's D52965, but improved to handle strict fp and
  improve some of the shuffling. Rather than use v2i1/v4i1 and let type
  legalization continue, just generate all the code with legal types and
  use an explicit shuffle. I also added an explicit setcc to the v4i64 code
  to match the semantics of vselect, which doesn't just use the sign bit.
  I'm also using a v4i64->v4i32 truncate instead of the shuffle in Simon's
  original code. With the setcc this will become a pack.

  Future work can look into using X86ISD::BLENDV and a different shuffle
  that only moves the sign bit.

  Reviewers: RKSimon, spatel

  Reviewed By: RKSimon

  Subscribers: hiraditya, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D71956
* [NFC] Modify the format:
  Liu, Chen3, 2020-01-06 (1 file, -2/+1)

  Drop the else since we already returned in the if.
* [X86][SSE] Combine combineLogicBlendIntoConditionalNegate for VSELECT nodes (PR43660)
  Simon Pilgrim, 2020-01-05 (1 file, -2/+13)

  Attempt to use combineLogicBlendIntoConditionalNegate for
  (select M, (sub 0, X), X) -> (sub (xor X, M), M).

  We limit this to cases that can't easily replace the VSELECT with a
  shuffle (non-constant masks) or where a BLENDV is likely to occur (which
  tends to result in slower codegen).
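  A scalar C++ sketch of the conditional-negate identity for one lane, where
  M is an all-ones or all-zeros mask:

      #include <cassert>
      #include <cstdint>

      int main() {
        int32_t x = 42;
        for (int32_t m : {0, -1}) {
          int32_t selected = m ? -x : x; // (select M, (sub 0, X), X)
          int32_t folded = (x ^ m) - m;  // (sub (xor X, M), M)
          // m == -1: (x ^ -1) - (-1) == ~x + 1 == -x; m == 0: x unchanged.
          assert(selected == folded);
        }
        return 0;
      }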
* [X86] Move combineLogicBlendIntoConditionalNegate before combineSelect. NFCI.
  Simon Pilgrim, 2020-01-05 (1 file, -62/+62)

  Updates function order in preparation for a future fix for PR43660.
* [X86] Merge (identical) LowerGC_TRANSITION_START and LowerGC_TRANSITION_END (NFC)
  Simon Pilgrim, 2020-01-05 (1 file, -25/+3)

  Silences a copy+paste analyzer warning - all they are doing is inserting
  NOOPs in exactly the same way.
* [X86] Improve v2i32->v2f64 uint_to_fp
  Craig Topper, 2020-01-03 (1 file, -36/+14)

  This uses an alternative implementation of this conversion derived from
  our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, or it
  with the bit representation of 2.0^52, which will give us 2.0^52 plus the
  32-bit integer, since double's mantissa is 52 bits. Then we just need to
  subtract 2.0^52 as a double and let the floating point unit normalize the
  remaining bits into a valid double.

  This is fewer instructions than our previous code, but does require a
  port 5 shuffle for the zero extend or unpack.

  Differential Revision: https://reviews.llvm.org/D71945
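  A scalar C++ sketch of the trick (helper name illustrative; not the code
  from the patch):

      #include <cstdint>
      #include <cstring>

      double u32_to_f64(uint32_t x) {
        // 0x4330000000000000 is the bit pattern of the double 2.0^52, whose
        // 52 mantissa bits are all zero, so OR-ing in the zero-extended u32
        // yields exactly 2.0^52 + x.
        uint64_t bits = 0x4330000000000000ULL | x; // zext + or
        double d;
        std::memcpy(&d, &bits, sizeof(d));         // bitcast i64 -> f64
        return d - 0x1p52;                         // fsub 2.0^52, exact result
      }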
* Move tail call disabling code to target independent code
  Reid Kleckner, 2020-01-03 (1 file, -7/+1)

  When the "disable-tail-calls" attribute was added, checks were added for
  it in various backends. Now this code has proliferated, and it is
  something the target is responsible for checking. Move that
  responsibility back to the ISels (fast, global, and SD).

  There's no major functionality change, except for targets that never
  implemented this check.

  This LLVM attribute was originally added in
  d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015).

  Reviewers: echristo, MaskRay

  Differential Revision: https://reviews.llvm.org/D72118
* [X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD.
  Craig Topper, 2020-01-02 (1 file, -15/+9)

  Summary: We previously disabled this under fast math due to aggressive
  reassociation by the machine combiner. But I think we can work around
  this by using an FSUB instead of an FADD for the first operation.

  This matches the similar algorithm we do for uint_to_fp i64->f64 in
  TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem
  for that, hopefully it's not a problem here.

  Reviewers: RKSimon, spatel, scanon

  Reviewed By: spatel

  Subscribers: hiraditya, llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D71968
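  A scalar C++ sketch of the algorithm with the FSUB ordering (the bias
  constants shown are the standard ones for this style of trick, used here
  purely as an illustration, not taken from the patch):

      #include <cstdint>
      #include <cstring>

      float u32_to_f32(uint32_t x) {
        // Build 2^39 + hi*2^16 and 2^23 + lo exactly via bit-OR.
        uint32_t hi_bits = (x >> 16) | 0x53000000u;
        uint32_t lo_bits = (x & 0xFFFFu) | 0x4B000000u;
        float fhi, flo;
        std::memcpy(&fhi, &hi_bits, sizeof(fhi));
        std::memcpy(&flo, &lo_bits, sizeof(flo));
        // FSUB the combined bias first, then FADD the low part, so there is
        // no (fadd (fadd a, b), c) chain for reassociation to break.
        return (fhi - (0x1p39f + 0x1p23f)) + flo;
      }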
* [X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI.
  Wang, Pengfei, 2020-01-03 (1 file, -0/+3)
* [X86] Optimization of inserting vxi1 sub vector into vXi1 vector
  Wang, Pengfei, 2020-01-03 (1 file, -2/+20)

  Summary: After the bugfix for the undef value case here, we used more
  operations to implement inserting a vXi1 subvector into a vXi1 vector; I
  optimized it to use fewer operations. The history information is at
  https://reviews.llvm.org/D68311

  Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei,
  LiuChen3, RKSimon

  Reviewed By: craig.topper

  Subscribers: hiraditya, llvm-commits

  Patch by Xiang Zhang (xiangzhangllvm)

  Differential Revision: https://reviews.llvm.org/D71917
* [X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions.
  Craig Topper, 2020-01-01 (1 file, -24/+42)

  We might be able to bypass some nodes on the condition path.

  Differential Revision: https://reviews.llvm.org/D71984
* Add strict float for round operation
  Liu, Chen3, 2020-01-01 (1 file, -18/+34)

  Differential Revision: https://reviews.llvm.org/D72026
* [X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector.
  Craig Topper, 2019-12-31 (1 file, -0/+3)
* [X86] Use carry flag from add for (seteq (add X, -1), -1).
  Craig Topper, 2019-12-31 (1 file, -10/+31)

  If we just subtracted 1 and are checking if the result is -1, we can use
  the carry flag from the ADD instead of an explicit CMP. I'm using the
  same checks for the add users as EmitTest.

  Fixes one case from PR44412.

  Differential Revision: https://reviews.llvm.org/D72019
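  A C++ sketch of the equivalence (illustrative; models the CPU's carry
  flag with an unsigned comparison):

      #include <cassert>
      #include <cstdint>

      int main() {
        for (uint32_t x : {0u, 1u, 5u, 0xFFFFFFFFu}) {
          uint32_t sum = x + 0xFFFFFFFFu; // the ADD X, -1
          bool carry = sum < x;           // carry out of the addition
          // (add X, -1) == -1 exactly when x == 0, which is also the only
          // case producing no carry, so CF can replace the explicit CMP.
          assert((sum == 0xFFFFFFFFu) == !carry);
        }
        return 0;
      }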
* [X86] Slightly improve our attempted error recovery for 64-bit -mno-sse2 in LowerCallResult to use FP1 if there are two return values.
  Craig Topper, 2019-12-31 (1 file, -2/+8)

  If the return value is a struct of 2 doubles we need two return
  registers. If SSE2 is disabled we can't return in XMM registers like the
  ABI says. After logging an error we attempt to recover by using FP0
  instead of an XMM register. But if the return needs two registers, we may
  have already used FP0. So if the register we were supposed to copy to is
  XMM1, copy to FP1 in the recovery instead.

  This seems to fix the assertion/crash in PR44413.
* [X86] Add X86ISD::PCMPGT to SimplifyMultipleUseDemandedBitsForTargetNode.
  Craig Topper, 2019-12-30 (1 file, -0/+7)

  If only the sign bit is demanded, and the LHS is all zeroes, then we can
  bypass the PCMPGT.
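  A sketch of why the bypass is sound, using i32 lanes and MOVMSKPS (which
  reads only the sign bits) to observe the demanded bits:

      #include <cassert>
      #include <emmintrin.h>

      int main() {
        __m128i x = _mm_set_epi32(-7, 3, -1, 0);
        // Each lane of (pcmpgt 0, x) is all-ones exactly when x's lane is
        // negative, so its sign bits equal x's own sign bits.
        __m128i gt = _mm_cmpgt_epi32(_mm_setzero_si128(), x);
        int via_cmp = _mm_movemask_ps(_mm_castsi128_ps(gt));
        int direct = _mm_movemask_ps(_mm_castsi128_ps(x));
        assert(via_cmp == direct && direct == 0b1010);
        return 0;
      }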
* [X86] Use APInt::isOneValue and ConstantSDNode::isOne. NFC
  Craig Topper, 2019-12-29 (1 file, -4/+4)

  These are implemented slightly more efficiently than comparing to 1 in
  the case that the value is more than 64 bits.
* [X86] Use isOneConstant to simplify some code. NFC
  Craig Topper, 2019-12-29 (1 file, -2/+1)
* [X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSRLI/VSRAI/VSHLI. Use getConstantOperandVal and APInt operations.
  Craig Topper, 2019-12-29 (1 file, -108/+99)

  These nodes should only ever be formed with an i8 TargetConstant so we
  don't need to check for it to be a constant. It's also always 8-bits so
  we don't need to use APInt compare functions.
* [X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1-only targets.
  Craig Topper, 2019-12-28 (1 file, -2/+3)

  We had a Custom operation action for v4i32 on SSE1. But since v4i32 isn't
  legal until SSE2 this was not what was intended. The code that got
  executed was intended for op legalization and creates a bunch of v4i32
  nodes that all end up scalarized.
* [X86] Remove a redundant (scalar_to_vector (extract_vector_elt X)) in LowerUINT_TO_FP_i32. NFCI
  Craig Topper, 2019-12-28 (1 file, -6/+1)
* [X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512.
  Craig Topper, 2019-12-27 (1 file, -1/+1)

  With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With avx512f
  we get v16i32->v16f32 instructions which we can use to emulate
  v4i32->v4f32.
* [X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization.
  Craig Topper, 2019-12-27 (1 file, -3/+19)