path: root/llvm/lib/Target/X86
...
* [x86] split 256-bit store of concatenated vectors (Sanjay Patel, 2019-05-28, 1 file, -0/+11)

  This shows up as a side issue to the main problem for the AVX target
  example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 -
  https://godbolt.org/z/7tpRa3

  But as we can see in the pile of existing test diffs, it's actually a
  widespread problem that affects any AVX or later target. Apart from a
  couple of oddballs, I think these are all improvements for the reasons
  stated in the code comment: we do not want to enable YMM unnecessarily
  (avoid vzeroupper and frequency throttling) and some cores split 256-bit
  stores anyway.

  We could say that MergeConsecutiveStores() is going overboard on some of
  these examples, but that won't solve the problem completely. But that is
  the reason I'm proposing this as a lowering rather than a combine: we
  will infinite loop fighting the merge code if we try this earlier.

  Differential Revision: https://reviews.llvm.org/D62498
  llvm-svn: 361822

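  A minimal C sketch of the affected pattern, assuming AVX; the function
  and names are illustrative, not taken from the patch:

    #include <immintrin.h>
    /* A 256-bit store whose value is two concatenated 128-bit halves.
       With this change it lowers to two 128-bit (xmm) stores instead of
       one ymm store. */
    void store_concat(float *p, __m128 lo, __m128 hi) {
      __m256 v = _mm256_set_m128(hi, lo); /* the concat_vectors node */
      _mm256_storeu_ps(p, v);             /* the split 256-bit store */
    }
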
* [x86] fix 256-bit vector store splitting to honor 'volatile' (Sanjay Patel, 2019-05-28, 1 file, -14/+30)

  Forking this out of the discussion in D62498 (and assuming that will be
  committed later, so adding the helper function here). The LangRef says:
  "the backend should never split or merge target-legal volatile
  load/store instructions."

  Differential Revision: https://reviews.llvm.org/D62506
  llvm-svn: 361815

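  A hedged sketch of the case the splitting code must now leave alone
  (illustrative, not from the patch):

    #include <immintrin.h>
    /* The store is target-legal on AVX and volatile, so per the LangRef
       it must stay a single 256-bit store rather than being split into
       two 128-bit stores. */
    void store_volatile(volatile __m256 *p, __m256 v) {
      *p = v;
    }
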
* [X86] Custom lower CONCAT_VECTORS of v2i1 (Benjamin Kramer, 2019-05-28, 1 file, -7/+2)

  The generic legalizer cannot handle this. Add an assert instead of
  silently miscompiling vectors with elements smaller than 8 bits.

  llvm-svn: 361814

* [X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized variable warnings. NFCI. (Simon Pilgrim, 2019-05-28, 1 file, -1/+2)

  llvm-svn: 361804

* [X86][SSE] Add shuffle combining support for ISD::ANY_EXTEND_VECTOR_INREG (Simon Pilgrim, 2019-05-26, 4 files, -13/+23)

  Reuses what we already have in place for ISD::ZERO_EXTEND_VECTOR_INREG,
  just with a different sentinel.

  llvm-svn: 361734

* [X86][AVX] combineBitcastvxi1 - peek through bitops to determine size of original vector (Simon Pilgrim, 2019-05-26, 1 file, -3/+17)

  We were only testing for direct SETCC results - this allows us to peek
  through AND/OR/XOR combinations of the comparison results as well.

  There's a missing SEXT(PACKSS) fold that I need to investigate for v8i1
  cases before I can enable it there as well.

  llvm-svn: 361716

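  An illustrative shape this now handles (names are not from the patch):
  the movmsk, i.e. the bitcast of the vXi1 result to a scalar, can look
  through the OR to find the original v4f32 compares:

    #include <immintrin.h>
    int any_out_of_range(__m128 x, __m128 lo, __m128 hi) {
      /* OR of two SETCC results; previously only a direct SETCC was
         recognized when determining the original vector size. */
      __m128 m = _mm_or_ps(_mm_cmplt_ps(x, lo), _mm_cmpgt_ps(x, hi));
      return _mm_movemask_ps(m) != 0;
    }
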
* [X86] lowerBuildVectorToBitOp - support build_vector(shift()) -> shift(build_vector(),C) (Simon Pilgrim, 2019-05-25, 1 file, -0/+20)

  Commonly occurs in sign-extension cases.

  llvm-svn: 361706

* [X86] Combine fminnum/fmaxnum with non-nan operand to fmin/fmax (Nikita Popov, 2019-05-25, 1 file, -3/+7)

  If we have a known non-nan operand, place it in the second operand of
  fmin/fmax, which is returned if either operand is nan.

  Differential Revision: https://reviews.llvm.org/D62448
  llvm-svn: 361704

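  A sketch of why the operand order matters (relying on x86 MINSD/MAXSD
  returning their second source operand when either input is NaN; the
  function is illustrative):

    #include <math.h>
    double at_most_one(double x) {
      /* 1.0 is known non-NaN. With it placed in the second operand,
         minsd returns 1.0 when x is NaN, which is exactly fmin's
         requirement to return the non-NaN operand, so no extra NaN
         handling is needed. */
      return fmin(x, 1.0);
    }
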
* [X86FixupLEAs] Turn optIncDec into a generic two address LEA optimizer. Support LEA64_32r properly. (Craig Topper, 2019-05-25, 1 file, -48/+106)

  INC/DEC is really a special case of a more generic issue. We should also
  turn leas into add reg/reg or add reg/imm regardless of the slow lea
  flags. This also supports LEA64_32, which has 64-bit input registers and
  a 32-bit output register, so we need to convert the 64-bit inputs to
  their 32-bit equivalents to check whether they are equal to the base reg.

  One thing to note: the original code preserved the kill flags by adding
  operands to the new instruction instead of using addReg. But I think
  tied operands aren't supposed to have the kill flag set. I dropped the
  kill flags, but I could probably try to preserve them in the add reg/reg
  case if we think it's important. I'm not sure which operand they are
  supposed to go on for the LEA64_32r instruction due to the super-reg
  implicit uses, though I'm also not sure those are needed, since they
  were probably just created by an INSERT_SUBREG from a 32-bit input.

  Differential Revision: https://reviews.llvm.org/D61472
  llvm-svn: 361691

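  A hedged illustration of the kind of rewrite this enables; whether it
  fires depends on register allocation, and the names are examples only:

    unsigned narrow_add(unsigned long a, unsigned long b) {
      /* May select LEA64_32r, e.g. leal (%rdi,%rsi), %eax. When the
         allocator assigns the 32-bit output to the low half of one of
         the 64-bit inputs (say leal (%rdi,%rsi), %edi), the pass can
         rewrite it to addl %esi, %edi. */
      return (unsigned)(a + b);
    }
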
* [X86] Add zero idioms to the haswell, broadwell, and skylake schedule models. Add 256-bit fp xor to sandybridge zero idioms (Craig Topper, 2019-05-25, 5 files, -18/+395)

  This copies the Sandy Bridge zero idiom support to later CPUs, adding
  the AVX2 and AVX512F/VL instructions as appropriate.

  Differential Revision: https://reviews.llvm.org/D62360
  llvm-svn: 361690

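  For context, a zero idiom is a register self-XOR that the hardware
  recognizes as producing zero without reading the register, so the
  scheduler can model it as dependency-breaking. A minimal sketch:

    #include <immintrin.h>
    __m256 zero_ymm(void) {
      /* Lowers to vxorps ymm0, ymm0, ymm0: a zero idiom with no false
         dependency on the previous value of ymm0. */
      return _mm256_setzero_ps();
    }
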
* [SelectionDAG] computeKnownBits - support constant pool values from target (Simon Pilgrim, 2019-05-24, 2 files, -4/+14)

  This patch adds the overridable TargetLowering::getTargetConstantFromLoad
  function, which allows targets to return any constant value loaded by a
  LoadSDNode node - only X86 makes use of this so far, but everything
  should be in place for other targets. computeKnownBits then uses this
  function to improve codegen, notably vector code after legalization.

  A future commit will do the same for ComputeNumSignBits, but
  computeKnownBits sees the bigger benefit.

  This required a couple of fixes:
  * SimplifyDemandedBits must early-out for getTargetConstantFromLoad
    cases to prevent infinite loops of constant regeneration (similar to
    what we already do for BUILD_VECTOR).
  * Fix a DAGCombiner::visitTRUNCATE issue as we had
    trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops
    after legalization on AVX512 targets.

  Differential Revision: https://reviews.llvm.org/D61887
  llvm-svn: 361620

* Resubmit r360436 "[X86] Avoid SFB - Fix inconsistent codegen with/without debug info" (Robert Lougher, 2019-05-23, 1 file, -4/+10)

  Fixes https://bugs.llvm.org/show_bug.cgi?id=40969

  The functions findPotentiallyBlockedCopies and buildCopy are currently
  not accounting for the presence of debug instructions. In the former
  this results in the optimization not being triggered, and in the latter
  it results in inconsistent codegen. This patch enables the optimization
  to be performed in a debug build and ensures the codegen is consistent
  with non-debug builds.

  Patch by Chris Dawson.

  Differential Revision: https://reviews.llvm.org/D61680
  llvm-svn: 361527

* [X86] Support -fno-plt __tls_get_addr calls (Fangrui Song, 2019-05-23, 1 file, -51/+72)

  In general dynamic/local dynamic TLS models, with -fno-plt:

  * x86: emit `calll *___tls_get_addr@GOT(%ebx)` instead of
    `calll ___tls_get_addr@PLT`. Note, on x86, if we can get rid of %ebx
    as the PIC register, it may be better to use a register not preserved
    across function calls.
  * x86_64: emit `callq *__tls_get_addr@GOTPCREL(%rip)` instead of
    `callq __tls_get_addr@PLT`

  Reorganize the code by separating 32-bit and 64-bit.

  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D62106
  llvm-svn: 361453

* [X86] Explicitly disable VEXTRACT instruction matching for an immediate of 0. Remove a bunch of isel patterns that become unnecessary. (Craig Topper, 2019-05-22, 3 files, -143/+8)

  We effectively had a second set of isel patterns that tried to use a
  regular store instruction and an extract_subreg instruction, or a masked
  move and an extract_subreg. These patterns were intended to override the
  matching of VEXTRACT instructions by taking advantage of the priority of
  the explicit immediate 0 for the index.

  This patch instead just disables immediate-0 matching in the VEXTRACT
  patterns. That way each of the component pieces of the larger patterns
  will match by themselves.

  This found a bug of sorts: we didn't use a 128-bit store for a 512->128
  extract on KNL. It's unclear what the right thing here should be. Using
  the vextract avoids constraining the register allocator to use xmm0-15,
  but it always results in a longer encoding if the register allocator
  ends up choosing xmm0-15 anyway.

  llvm-svn: 361431

* [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening. (Craig Topper, 2019-05-22, 2 files, -72/+0)

  We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2
  into llvm.floor/ceil. The llvm.ceil/floor intrinsics are supposed to
  correspond to the libm functions. For the libm functions we need to
  disable the precision exception, so the llvm.floor/ceil intrinsics
  should always map to encodings 0x9 and 0xA.

  We had a mix of isel patterns where some used 0x9 and 0xA and others
  used 0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.

  Since we have no way in isel of knowing where the llvm.ceil/floor came
  from, we can't map X86-specific intrinsics with encodings 1 or 2 to
  them. We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd
  really like to see a use case and optimization advantage first.

  I've left the backend test cases to show the blend we now emit without
  the extra isel patterns. But I've removed the InstCombine tests
  completely.

  llvm-svn: 361425

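  A sketch of the immediate encoding at issue: in the SSE4.1 ROUND*
  immediate, the low 2 bits select the rounding mode (1 = toward -inf,
  2 = toward +inf) and bit 3 (0x8) suppresses the precision exception,
  so the libm-equivalent encodings are 0x9 and 0xA:

    #include <immintrin.h>
    __m128d floor_low(__m128d a, __m128d b) {
      /* 0x9 = _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC: round toward
         -inf without raising the precision exception, matching libm
         floor(). An immediate of 0x1 rounds the same way but may raise
         the exception, so it can no longer be treated as llvm.floor. */
      return _mm_round_sd(a, b, 0x9);
    }
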
* [TargetMachine] error message unsupported code model (Sjoerd Meijer, 2019-05-22, 1 file, -1/+1)

  When the tiny code model is requested for a target machine that does not
  support it, we get an error message (which is nice) but also this
  diagnostic and request to submit a bug report:

    fatal error: error in backend: Target does not support the tiny CodeModel
    [Inferior 2 (process 31509) exited with code 0106]
    clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
    (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
    Target: arm-arm-none-eabi
    Thread model: posix
    clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
    clang-9: note: diagnostic msg:
    ********************
    PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
    Preprocessed source(s) and associated run script(s) are located at:
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh

  But this is not a bug, this is a feature. :-) Not only is this not a
  bug, it is also pretty confusing. This patch causes us to just print the
  fatal error and not the diagnostic:

    fatal error: error in backend: Target does not support the tiny CodeModel

  Differential Revision: https://reviews.llvm.org/D62236
  llvm-svn: 361370

* [X86] Don't compare i128 through vector if construction not cheap (PR41971) (Nikita Popov, 2019-05-22, 1 file, -3/+8)

  Fix for https://bugs.llvm.org/show_bug.cgi?id=41971. Make the
  combineVectorSizedSetCCEquality() transform more conservative by
  checking that the bitcast to the vector type will be cheap/free for
  both operands. I'm considering it cheap if it's a constant, a load or
  already a vector. I've dropped the explicit check for f128 because it
  should fall out naturally (in the cases where it'd be detrimental).

  Differential Revision: https://reviews.llvm.org/D62220
  llvm-svn: 361352

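  A hedged sketch of the distinction (illustrative function):

    int i128_eq(const __int128 *p, const __int128 *q) {
      /* Both operands are loads, so bitcasting them to a vector type is
         considered cheap and the equality can be done as one vector
         compare (e.g. movdqu + pcmpeq + movmsk). If the operands were
         live in GPRs instead, the transform now leaves the comparison
         as two 64-bit compares. */
      return *p == *q;
    }
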
* [X86] [CET] Deal with return-twice functions such as vfork, setjmp when CET-IBT enabled (Pengfei Wang, 2019-05-22, 1 file, -12/+30)

  Return-twice functions will indirectly jump back to just after the
  caller's call site. So when CET-IBT is enabled, we should make sure an
  endbr* instruction follows each call to a return-twice function, like
  GCC does.

  Patch by Xiang Zhang (xiangzhangllvm)

  Differential Revision: https://reviews.llvm.org/D61881
  llvm-svn: 361342

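  A hedged illustration of where the new endbr* lands (assuming
  -fcf-protection=branch on x86-64):

    #include <setjmp.h>
    jmp_buf env;
    void run(void) {
      /* longjmp(env, 1) re-enters this function via an indirect jump
         to the point right after the setjmp call, so with CET-IBT an
         endbr64 must be emitted immediately after that call. */
      if (setjmp(env) != 0) {
        /* resumed from longjmp */
      }
    }
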
* [X86] Remove an unneeded ZERO_EXTEND creation from LowerINTRINSIC_W_CHAIN. NFC (Craig Topper, 2019-05-21, 1 file, -2/+1)

  We were trying to ZERO_EXTEND from an i8 X86ISD::SETCC to i8 again.

  llvm-svn: 361288

* [X86][SSE] computeKnownBitsForTargetNode - add X86ISD::ANDNP support (Simon Pilgrim, 2019-05-21, 1 file, -0/+9)

  Fixes PACKSS-PSHUFB shuffle regressions mentioned on D61692.

  llvm-svn: 361270

* [DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC) (Petar Jovanovic, 2019-05-20, 1 file, -4/+1)

  Refactor DIExpression::With* into a flag enum in order to be less
  error-prone to use (as discussed on D60866).

  Patch by Djordje Todorovic.

  Differential Revision: https://reviews.llvm.org/D61943
  llvm-svn: 361137

* [X86] Remove combineShift function. Just dispatch directly to the handler for each flavor from the main switch. NFC (Craig Topper, 2019-05-19, 1 file, -21/+3)

  llvm-svn: 361108

* [X86][SSE] Fold movmsk(not(x)) -> not(movmsk) (Simon Pilgrim, 2019-05-17, 1 file, -1/+14)

  Helps to improve folding of comparisons with movmsk results.

  llvm-svn: 361056

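  An illustrative pattern that benefits (not taken from the patch):

    #include <immintrin.h>
    int all_nonnegative(__m128 x) {
      __m128 neg = _mm_cmplt_ps(x, _mm_setzero_ps());
      __m128 not_neg =
          _mm_xor_ps(neg, _mm_castsi128_ps(_mm_set1_epi32(-1)));
      /* movmsk(not(x)) -> not(movmsk(x)): the XOR can fold away and
         the test becomes movmskps(neg) == 0. */
      return _mm_movemask_ps(not_neg) == 0xF;
    }
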
* [X86][SSE] Match all-of bool scalar reductions into a bitcast/movmsk + cmp. (Simon Pilgrim, 2019-05-17, 1 file, -0/+18)

  Same as what we do for vector reductions in
  combineHorizontalPredicateResult: use movmsk+cmp for scalar
  and(extract(x,0),extract(x,1)) reduction patterns.

  llvm-svn: 361052

* [X86][AVX] Remove LowerCTTZ's AVX1 custom vector handling. (Simon Pilgrim, 2019-05-17, 1 file, -7/+0)

  We can now rely on generic expansion to handle this.

  llvm-svn: 361038

* [X86][AVX] isNOT - add extract_subvector(xor X, -1) -> extract_subvector(X) fold. (Simon Pilgrim, 2019-05-17, 1 file, -0/+9)

  Prep work for the removal of the remaining x86 CTTZ vector lowering.

  llvm-svn: 361035

* [X86] Pull out IsNOT helper. NFCI. (Simon Pilgrim, 2019-05-17, 1 file, -8/+16)

  Return the input value for the NOT pattern: (xor X, -1) -> X

  llvm-svn: 361012

* [X86] Add FeatureFastScalarShiftMasks and FeatureFastVectorShiftMasks to the ignore list for inlining compatibility. (Craig Topper, 2019-05-17, 1 file, -0/+2)

  These are tuning flags and won't cause any codegen issue if we inline a
  function with a different value.

  llvm-svn: 360992

* [X86] Support .reloc *, R_{386,X86_64}_NONE, * (Fangrui Song, 2019-05-17, 2 files, -9/+51)

  This can be used to create references among sections. When --gc-sections
  is used, the referenced section will be retained if the origin section
  is retained.

  See R_MIPS_NONE (D13659), R_ARM_NONE (D61992), R_AARCH64_NONE (D61973)
  for similar changes.

  Reviewed By: rnk
  Differential Revision: https://reviews.llvm.org/D62014
  llvm-svn: 360983

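  A hedged sketch of the usage (offset and symbol name are illustrative):

    /* Emits an R_X86_64_NONE relocation referencing keep_me: no bytes
       are patched, but under --gc-sections retaining the section that
       contains this directive now also retains keep_me's section. */
    __asm__(".reloc 0, R_X86_64_NONE, keep_me");
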
* [X86][AsmParser] Add mnemonics missed in r360954. (David L. Jones, 2019-05-17, 1 file, -1/+2)

  These are valid Jcc, but aren't based on the EFLAGS condition codes
  (Intel 64 and IA-32 Architectures Software Developer's Manual Vol. 1,
  Appendix B). These are covered in clang/test, but not llvm/test.

  llvm-svn: 360960

* [X86][AsmParser] Ignore "short" even harder in Intel syntax ASM. (David L. Jones, 2019-05-16, 1 file, -5/+34)

  In Intel syntax, it's not uncommon to see a "short" modifier on Jcc
  conditional jumps, which indicates the offset should be a "short jump"
  (8-bit immediate offset from EIP, -128 to +127). This patch expands the
  handling to all recognized Jcc condition codes, and removes the
  inline-only restriction.

  Clang already ignores "jmp short" in inline assembly. However, only
  "jmp" and a couple of Jcc are actually checked, and only inline (i.e.,
  not when using the integrated assembler for asm sources). A quick
  search through asm-containing libraries at hand shows a pretty broad
  range of Jcc conditions spelled with "short."

  GAS ignores the "short" modifier, and instead uses an encoding based on
  the given immediate. MS inline seems to do the same, and I suspect MASM
  does, too. NASM will yield an error if presented with an out-of-range
  immediate value.

  Example of GCC 9.1 and MSVC v19.20, "jmp short" with offsets that do
  and do not fit within 8 bits: https://gcc.godbolt.org/z/aFZmjY

  Differential Revision: https://reviews.llvm.org/D61990
  llvm-svn: 360954

* [X86][AsmParser] Rename "ConditionCode" variable to "ConditionPredicate". (David L. Jones, 2019-05-16, 1 file, -9/+9)

  This better matches the verbiage in Intel documentation, and should
  help avoid confusion between these two different kinds of values, both
  of which are parsed from mnemonics.

  llvm-svn: 360953

* [X86] Deduplicate symbol lowering logic, NFC (Reid Kleckner, 2019-05-16, 2 files, -88/+62)

  Summary:
  This refactors four pieces of code that create SDNodes for references
  to symbols:
  - normal global address lowering (LEA, MOV, etc)
  - callee global address lowering (CALL)
  - external symbol address lowering (LEA, MOV, etc)
  - external symbol address lowering (CALL)

  Each of these pieces of code needs to:
  - classify the reference
  - lower the symbol
  - emit a RIP wrapper if needed
  - emit a load if needed
  - add offsets if needed

  I think handling them all in one place will make the code easier to
  maintain in the future.

  Reviewers: craig.topper, RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61690
  llvm-svn: 360952

* [X86] Use 0x9 instead of 0x1 as the immediate in some masked floor pattern. Similarly change 0x2 to 0xA for ceil. (Craig Topper, 2019-05-16, 1 file, -4/+4)

  This suppresses exceptions, which is what we should be doing for ceil
  and floor. We already use the correct immediate in patterns without
  masking.

  llvm-svn: 360915

* [CodeGen] Add lround/llround builtins (Adhemerval Zanella, 2019-05-16, 1 file, -0/+2)

  This patch adds the ISD::LROUND and ISD::LLROUND nodes along with new
  intrinsics. The changes are straightforward, as for other
  floating-point rounding functions, with just some adjustments required
  to handle the return value being an integer.

  The idea is to optimize lround/llround generation for AArch64 in a
  subsequent patch. The current semantic is just to route it to the libm
  symbol.

  llvm-svn: 360889

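  The libm semantics these nodes model: round to nearest with ties away
  from zero, returning an integer type. A minimal example:

    #include <math.h>
    long f(double x) {
      return lround(x);   /* lround(2.5) == 3, lround(-2.5) == -3 */
    }
    long long g(float x) {
      return llroundf(x); /* currently lowered to the libm call */
    }
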
* [X86] Delay creating index register negations during address matching until after we know for sure the match will succeed (Craig Topper, 2019-05-15, 1 file, -7/+15)

  If we're trying to match an LEA, it's possible the LEA match will be
  deemed unprofitable, in which case the negation we created in
  matchAddress would be left dangling in the SelectionDAG. This could
  artificially increase use counts for other nodes in the DAG, though I
  don't have an example of that. But it just seems like bad form to have
  dangling nodes in isel.

  Differential Revision: https://reviews.llvm.org/D61047
  llvm-svn: 360823

* [X86] Strengthen type constraints on some specialized X86 ISD opcodes that don't have any flexibility. NFC (Craig Topper, 2019-05-15, 1 file, -5/+17)

  These particular instructions only operate on 128-bit vectors and have
  no wider equivalents, and the element size is always known. One could
  argue that MOVSS/MOVSD could be merged, but that's probably disruptive
  to code in X86ISelLowering and probably low value.

  llvm-svn: 360815

* [X86] Use OR32mi8Locked instead of LOCK_OR32mi8 in emitLockedStackOp. (Craig Topper, 2019-05-15, 2 files, -5/+3)

  They encode the same way, but OR32mi8Locked has
  hasUnmodeledSideEffects set, which should be stronger than the
  mayLoad/mayStore on LOCK_OR32mi8. I think this makes sense since we are
  using it as a fence.

  This also seems to hide the operation from the speculative load
  hardening pass, so I've reverted r360511.

  llvm-svn: 360747

* [NFC] Reuse a helper function to eliminate duplicate code (Philip Reames, 2019-05-15, 1 file, -79/+67)

  llvm-svn: 360740

* [X86] Create a TargetInfo header. NFC (Richard Trieu, 2019-05-15, 8 files, -4/+27)

  Move the declarations of the getThe<Name>Target() functions into a new
  header in TargetInfo and make users of these functions include this new
  header. This fixes a layering problem.

  llvm-svn: 360736

* Use an offset from TOS for idempotent rmw locked op lowering (Philip Reames, 2019-05-14, 1 file, -6/+16)

  This was the portion split off D58632 so that it could follow the
  redzone API cleanup. Note that I changed the preferred offset from -8
  to -64. The difference should be very minor, but I thought it might
  help address one concern which had been previously raised.

  Differential Revision: https://reviews.llvm.org/D61862
  llvm-svn: 360719

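  A hedged sketch of the lowering being tuned (offset per this patch;
  the function is illustrative):

    #include <stdatomic.h>
    void order_only(_Atomic int *p) {
      /* An idempotent RMW (here: or with 0) mainly needs its ordering
         effect, so it can be lowered as a locked op on the stack, e.g.
         lock orl $0, -64(%rsp), instead of an mfence. */
      atomic_fetch_or_explicit(p, 0, memory_order_seq_cst);
    }
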
* [X86] Disable shouldFoldConstantShiftPairToMask for scalar shifts on AMD targets (PR40758) (Simon Pilgrim, 2019-05-14, 3 files, -7/+22)

  D61068 handled vector shifts; this patch does the same for scalars,
  where there is a similar number of pipes for shifts as for bit ops -
  this is true almost entirely for AMD targets, where the scalar ALUs are
  well balanced.

  This combine avoids an AND immediate mask, which usually means we
  reduce encoding size.

  Some tests show use of (slow, scaled) LEA instead of SHL in some cases,
  but that's due to particular shift immediates - shift+mask generate
  these just as easily.

  Differential Revision: https://reviews.llvm.org/D61830
  llvm-svn: 360684

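  The tradeoff in a sketch (illustrative):

    unsigned clear_low4(unsigned x) {
      /* The DAG normally folds this shift pair into x & 0xFFFFFFF0,
         trading two shifts for an AND with a 4-byte immediate. On AMD
         targets with well-balanced scalar shift pipes, keeping the two
         shifts (1-byte immediates) gives a smaller encoding at similar
         throughput. */
      return (x >> 4) << 4;
    }
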
* [X86] X86TargetLowering::LowerINTRINSIC_WO_CHAIN - ensure rounding control is initialized. NFCI. (Simon Pilgrim, 2019-05-14, 1 file, -7/+7)

  Fixes scan-build warnings.

  llvm-svn: 360664

* [X86] Prefer locked stack op over mfence for seq_cst 64-bit stores on 32-bit targets (Philip Reames, 2019-05-14, 1 file, -2/+3)

  This is a follow-on to D58632, with the same logic. Given a memory
  operation which needs ordering but doesn't need to modify any
  particular address, prefer to use a locked stack op over an mfence.

  Differential Revision: https://reviews.llvm.org/D61863
  llvm-svn: 360649

* [SDAG, x86] allow targets to override test for binop opcodes (Sanjay Patel, 2019-05-14, 2 files, -0/+20)

  This follows the pattern of the existing isCommutativeBinOp(). x86
  shows improvements from vector narrowing for the min/max opcodes.

  llvm-svn: 360639

* [X86] Use ISD::MERGE_VALUES to return from lowerAtomicArith instead of calling ReplaceAllUsesOfValueWith and returning SDValue(). (Craig Topper, 2019-05-13, 1 file, -4/+8)

  Returning SDValue() makes the caller think that nothing happened, and
  it will end up executing the Expand path. This generates extra nodes
  that will need to be pruned as dead code.

  Returning an ISD::MERGE_VALUES will tell the caller that we'd like to
  make a change, and it will take care of replacing uses. This will
  prevent falling into the Expand path.

  llvm-svn: 360627

* [X86] Various type corrections to the code that creates LOCK_OR32mi8/OR32mi8Locked to the stack for idempotent atomic rmw and atomic fence. (Craig Topper, 2019-05-13, 1 file, -10/+13)

  These are updates to match how the isel table would emit a LOCK_OR32mi8
  node:
  - Use i32 for the immediate zero even though only 8 bits are encoded.
  - Use i16 for the segment register.
  - Use LOCK_OR32mi8 for idempotent atomic operations in 32-bit mode to
    match 64-bit mode. I'm not sure why OR32mi8Locked and LOCK_OR32mi8
    both exist. The only difference seems to be that OR32mi8Locked is
    marked as UnmodeledSideEffects=1.
  - Emit an extra i32 result for the flags output.

  I don't know if the types here really matter; I just noticed it was
  inconsistent with normal behavior.

  llvm-svn: 360619

* Revert [X86] Avoid SFB - Fix inconsistent codegen with/without debug info (Robert Lougher, 2019-05-13, 1 file, -4/+2)

  Revert r360436 as it is causing the clang-x64-windows-msvc buildbot to
  fail.

  llvm-svn: 360606

* [TargetLowering] Handle multi depth GEPs w/ inline asm constraints (Nick Desaulniers, 2019-05-13, 1 file, -34/+6)

  Summary:
  X86TargetLowering::LowerAsmOperandForConstraint had better support than
  TargetLowering::LowerAsmOperandForConstraint for arbitrary-depth
  getelementptrs for the "i", "n", and "s" extended inline assembly
  constraints. Hoist its support from the derived class into the base
  class.

  Link: https://github.com/ClangBuiltLinux/linux/issues/469

  Reviewers: echristo, t.p.northover
  Reviewed By: t.p.northover
  Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai,
  javed.absar, eraman, hiraditya, jsji, llvm-commits, void,
  craig.topper, nathanchance, srhines
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D61560
  llvm-svn: 360604

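  A hedged sketch of the now-supported shape (kernel-style usage; all
  names are illustrative):

    struct inner { int x; };
    struct outer { struct inner in; };
    extern struct outer g;
    void record_addr(void) {
      /* The "i" constraint takes the address of a nested field - a
         multi-depth GEP - as a link-time constant; previously only the
         X86 override handled the deeper nesting. */
      __asm__(".pushsection .discard\n\t"
              ".quad %c0\n\t"
              ".popsection" :: "i"(&g.in.x));
    }
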
* [X86][SSE] LowerBuildVectorv4x32 - don't insert MOVQ for undef elts (Simon Pilgrim, 2019-05-13, 1 file, -7/+10)

  Fixes the regression noted in D61782 where a VZEXT_MOVL was being
  inserted because we weren't discriminating between 'zeroable' and 'all
  undef' for the upper elts.

  Differential Revision: https://reviews.llvm.org/D61782
  llvm-svn: 360596