path: root/llvm/lib/Target/X86
* [X86] Add a DAG combine to combine (sext (setcc)) with VLX
  Craig Topper, 2018-01-09 (1 file changed, -0/+42)
  Normally the target-independent DAG combine would do this combine based on getSetCCResultType, but with VLX getSetCCResultType returns a vXi1 type, preventing the DAG combine from kicking in. Doing this combine can allow us to remove the explicit sign extend that would otherwise be emitted. This patch adds a target-specific DAG combine to combine the sext+setcc when the result type is the same size as the input to the setcc. I've restricted this to FP compares and things that can be represented with PCMPEQ and PCMPGT, since we don't have full integer compare support on the older ISAs.
  Differential Revision: https://reviews.llvm.org/D41850
  llvm-svn: 322101
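  Illustrative sketch (hypothetical IR, not taken from the commit's tests): with VLX the fcmp below legalizes to a compare producing a v4i1 mask, so the generic combine keyed off getSetCCResultType does not fire and a separate mask-to-vector sign extend is emitted; the new target combine folds the sext and setcc into a single all-ones/all-zeros vector compare when the result width matches the compare inputs.

    ; Hypothetical example: FP compare whose result is sign-extended back to
    ; the same 128-bit width as the compare inputs.
    define <4 x i32> @cmp_sext(<4 x float> %a, <4 x float> %b) {
      %c = fcmp olt <4 x float> %a, %b
      %s = sext <4 x i1> %c to <4 x i32>
      ret <4 x i32> %s
    }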
* Instrument Control Flow For Indirect Branch Tracking
  Oren Ben Simhon, 2018-01-09 (5 files changed, -1/+176)
  CET (Control-Flow Enforcement Technology) introduces a new mechanism called IBT (Indirect Branch Tracking). Under IBT, each indirect branch should land on a dedicated ENDBR (End Branch) instruction. The new pass adds ENDBR instructions for every indirect jmp/call (including jumps using jump tables / switches). For more information, please see the following: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
  Differential Revision: https://reviews.llvm.org/D40482
  Change-Id: Icb754489faf483a95248f96982a4e8b1009eb709
  llvm-svn: 322062
* [X86] Allow more cmpps/pd immediate encodings to be commuted during isel.
  Craig Topper, 2018-01-09 (1 file changed, -2/+2)
  The code that checks the immediate wasn't masking to the lower 3 bits the way the code in X86InstrInfo.cpp used by the peephole pass does.
  llvm-svn: 322060
* [X86] Remove llvm.x86.avx512.cvt*2mask.* intrinsics and autoupgrade to (icmp slt X, 0)
  Craig Topper, 2018-01-09 (2 files changed, -26/+1)
  I had to drop fast-isel-abort from a test because we can't fast isel some of the mask stuff. When we used intrinsics we implicitly fell back to SelectionDAG for the intrinsic call without triggering the abort error. But with native IR that doesn't happen the same way.
  llvm-svn: 322050
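  A hedged sketch of the upgraded form (hypothetical IR; the removed intrinsics are assumed here to have taken a vector integer input such as v8i32): the sign-bit extraction becomes a plain signed compare against zero instead of a target-specific intrinsic call.

    define <8 x i1> @sign_to_mask(<8 x i32> %x) {
      ; After the autoupgrade, "convert to mask" is just "is each element negative?"
      %m = icmp slt <8 x i32> %x, zeroinitializer
      ret <8 x i1> %m
    }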
* [X86] Remove unnecessary isel pattern that is a combination of two other patterns.
  Craig Topper, 2018-01-09 (1 file changed, -2/+0)
  The pattern was this:
    def : Pat<(i32 (zext (i8 (bitconvert (v8i1 VK8:$src))))),
              (MOVZX32rr8 (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit))>,
              Requires<[NoDQI]>;
  But if you just let (i32 (zext X)) match by itself you'll get MOVZX32rr8, and if you let (i8 (bitconvert (v8i1 VK8:$src))) match by itself you'll get (EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS VK8:$src, GR32)), sub_8bit). So we can just let isel apply the two patterns naturally.
  llvm-svn: 322049
* [MachineOutliner] AArch64: Handle instrs that use SP and will never need fixups
  Jessica Paquette, 2018-01-09 (2 files changed, -3/+3)
  This commit does two things. Firstly, it adds a collection of flags which can be passed along to the target to encode information about the MBB that an instruction lives in to the outliner. Secondly, it adds some of those flags to the AArch64 outliner in order to add more stack instructions to the list of legal instructions that are handled by the outliner. The two flags added check if:
    - there are calls in the MachineBasicBlock containing the instruction
    - the link register is available in the entire block
  If the link register is available and there are no calls, then a stack instruction can always be outlined without fixups, regardless of what it is, since in this case the outliner will never modify the stack to create a call or outlined frame. The motivation for doing this was checking which instructions are most often missed by the outliner. Instructions like, say,
    %sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy
  are very common, but cannot be outlined in the case that the outliner might modify the stack. This commit allows us to outline instructions like this.
  llvm-svn: 322048
* [X86] Remove side-effects from determineCalleeSaves
  Francis Visoiu Mistrih, 2018-01-08 (1 file changed, -28/+27)
  (Target)FrameLowering::determineCalleeSaves can be called multiple times. I don't think it should have side effects such as creating stack objects and setting global MachineFunctionInfo state, as it does today (in other back-ends as well). This moves the creation of stack objects from determineCalleeSaves to assignCalleeSavedSpillSlots.
  Differential Revision: https://reviews.llvm.org/D41703
  llvm-svn: 321987
* [X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero.
  Craig Topper, 2018-01-08 (5 files changed, -24/+22)
  CVT2MASK is just checking the sign bit, which can be represented with a comparison with zero.
  llvm-svn: 321985
* [X86] Add patterns to allow 512-bit BWI compare instructions to be used for 128/256-bit compares when VLX is not available.
  Craig Topper, 2018-01-08 (2 files changed, -6/+27)
  llvm-svn: 321984
* [X86] Simplify some code in lower1BitVectorShuffle by relying on getNode's ability to constant fold vector SIGN_EXTEND.
  Craig Topper, 2018-01-07 (1 file changed, -15/+2)
  llvm-svn: 321979
* [X86] Add VSHUFF32X4 and similar instructions to load folding tables.
  Craig Topper, 2018-01-07 (1 file changed, -0/+24)
  llvm-svn: 321978
* [X86] Revert accidental change to CMakeLists.txt in r321952
  Craig Topper, 2018-01-07 (1 file changed, -1/+3)
  I had removed the qualifiers around the autogenerated folding table so I could compare with the manual table, but didn't intend to commit the change.
  llvm-svn: 321971
* [X86] Remove unneeded code from combineGatherScatter that used to delete SIGN_EXTEND_INREG nodes created during legalization of v2i1/v4i1 masks on KNL.
  Craig Topper, 2018-01-07 (1 file changed, -11/+1)
  v2i1/v4i1 are now legal on KNL, so no sign_extend_inreg is generated.
  llvm-svn: 321968
* [X86] Make v2i1 and v4i1 legal types without VLX
  Craig Topper, 2018-01-07 (4 files changed, -129/+123)
  Summary: There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.
  It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register anyway. A hedged sketch of the kind of operation affected follows below.
  This also fixes an issue where we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out, so new tablegen patterns got added.
  I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then we extract that and don't sign extend it at all.
  There's definitely room for improvement with some follow up patches.
  Reviewers: RKSimon, zvi, guyblank
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D41560
  llvm-svn: 321967
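  Sketch of an affected operation (hypothetical IR; the mangled masked-load intrinsic name is assumed to follow the conventions of that era): the <4 x i1> mask is what previously had to be promoted and widened before a 512-bit instruction could consume it.

    declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

    define <4 x i32> @masked_ld(<4 x i32>* %p, <4 x i1> %mask) {
      ; With v4i1 illegal, this mask was promoted to v4i32, widened, and
      ; truncated back down; making v4i1 legal lets it be widened with zeros
      ; directly in a k register instead.
      %v = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %p, i32 16, <4 x i1> %mask, <4 x i32> zeroinitializer)
      ret <4 x i32> %v
    }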
* [X86] Add the 16 and 8-bit CRC32 instructions to the load folding tables.
  Craig Topper, 2018-01-07 (1 file changed, -0/+3)
  llvm-svn: 321958
* [X86] Correct the load folding flags for xmm fp->mmx conversion instructions.
  Craig Topper, 2018-01-07 (1 file changed, -4/+4)
  The instructions that load 64-bits or an xmm register should be TB_NO_REVERSE to avoid the load being widened during unfold. The instructions that load 128-bits need to ensure 128-bit alignment.
  llvm-svn: 321956
* [X86] Add TB_NO_REVERSE to some scalar intrinsic instructions in the load folding table.
  Craig Topper, 2018-01-07 (1 file changed, -4/+4)
  llvm-svn: 321955
* [X86] Don't put any EVEX_B instructions in the tablegen generated load folding tables.
  Craig Topper, 2018-01-07 (1 file changed, -0/+1)
  EVEX_B means different things for memory and register forms. The instructions should not be considered equivalent.
  llvm-svn: 321954
* [X86] Add 128 and 256-bit VPOPCNTD/Q instructions to load folding tables.
  Craig Topper, 2018-01-07 (1 file changed, -0/+30)
  llvm-svn: 321953
* [X86] Add some 8 and 16-bit instructions to the load folding tables.
  Craig Topper, 2018-01-07 (2 files changed, -3/+7)
  llvm-svn: 321952
* [X86] Add EVEX vcvtph2ps to the load folding tables.
  Craig Topper, 2018-01-07 (1 file changed, -1/+4)
  llvm-svn: 321951
* [X86] Remove cvtps2ph xmm->xmm from store folding tables. Add the evex versions of cvtps2ph to the store folding tables.
  Craig Topper, 2018-01-07 (1 file changed, -2/+3)
  The memory form of the xmm->xmm version only writes 64 bits. If we use it in the folding tables and it gets used for a stack spill, only half the slot will be written. Then a reload may read all 128 bits, which will pull in garbage. But without the spill the upper bits of the register would have been zero. By not folding we preserve the zeros.
  llvm-svn: 321950
* [X86] Add CMP8ri8 to load folding tables.
  Craig Topper, 2018-01-07 (1 file changed, -0/+1)
  llvm-svn: 321949
* [X86] Remove assembler predicates from all AVX512 related feature flags.
  Craig Topper, 2018-01-06 (1 file changed, -26/+13)
  We don't do fine grained feature control like this on features prior to AVX512. We do still have checks in place in the assembly parser itself that prevent %zmm references or %xmm16-31 from being parsed without at least -mattr=avx512f. Same for rounding control and mask operands. That will prevent the table matcher from matching for any instructions that need those features, and that's probably good enough.
  llvm-svn: 321947
* [X86] Remove memory forms of EVEX encoded vcvttss2si/vcvttsd2si from asm matcher table.
  Craig Topper, 2018-01-06 (1 file changed, -7/+20)
  This is also needed to fix PR35837.
  llvm-svn: 321946
* [X86] Add load folding pattern to EVEX vcvttss2si/vcvtsd2si.
  Craig Topper, 2018-01-06 (1 file changed, -6/+7)
  llvm-svn: 321945
* [X86] Remove an unnecessary VCVTTSD2SIrrb/VCVTSS2SIrrb instruction with no isel pattern that only existed for the assembler. Use VCVTTSD2SIrrb_Int instead.
  Craig Topper, 2018-01-06 (1 file changed, -27/+23)
  For consistency, use the _Int versions, VCVTTSD2SIrr_Int and VCVTTSD2SIrm_Int, for the assembler as well.
  llvm-svn: 321944
* [X86] Remove memory forms of EVEX encoded vcvtsd2si/vcvtss2si from the assembler matcher table
  Craig Topper, 2018-01-06 (1 file changed, -5/+17)
  We should always prefer the VEX encoded version of these instructions. There is no advantage to the EVEX version.
  Fixes PR35837.
  llvm-svn: 321939
* [x86, MemCmpExpansion] allow 2 pairs of loads per block (PR33325)
  Sanjay Patel, 2018-01-06 (1 file changed, -0/+5)
  This is the last step needed to fix PR33325: https://bugs.llvm.org/show_bug.cgi?id=33325
  We're trading branches and compares for loads and logic ops. This makes the code smaller and hopefully faster in most cases. The 24-byte test shows an interesting construct: we load the trailing scalar elements into vector registers and generate the same pcmpeq+movmsk code that we expected for a pair of full vector elements (see the 32- and 64-byte tests).
  Differential Revision: https://reviews.llvm.org/D41714
  llvm-svn: 321934
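  Roughly, a hand-written sketch (not the pass's literal output, and names are illustrative): allowing two load pairs per block means a 32-byte memcmp()==0 can be expanded into a single block of wide loads whose differences are collected with xor/or and tested once, instead of two compare-and-branch blocks.

    define i1 @memcmp32_eq(i8* %a, i8* %b) {
      ; First 16-byte pair.
      %pa0 = bitcast i8* %a to i128*
      %pb0 = bitcast i8* %b to i128*
      %a0 = load i128, i128* %pa0
      %b0 = load i128, i128* %pb0
      ; Second 16-byte pair, now permitted in the same block.
      %ap1 = getelementptr i8, i8* %a, i64 16
      %bp1 = getelementptr i8, i8* %b, i64 16
      %pa1 = bitcast i8* %ap1 to i128*
      %pb1 = bitcast i8* %bp1 to i128*
      %a1 = load i128, i128* %pa1
      %b1 = load i128, i128* %pb1
      ; Differences are merged with xor/or and tested once.
      %x0 = xor i128 %a0, %b0
      %x1 = xor i128 %a1, %b1
      %or = or i128 %x0, %x1
      %eq = icmp eq i128 %or, 0
      ret i1 %eq
    }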
* [X86] Rename the EVEX encoded GFNI instructions to start with a 'V'. NFC
  Craig Topper, 2018-01-06 (1 file changed, -8/+8)
  This makes the names consistent with the mnemonics, like every other instruction.
  llvm-svn: 321931
* [X86] When parsing rounding mode operands, provide a proper end location so we don't crash when trying to print an error message using it.
  Craig Topper, 2018-01-06 (1 file changed, -5/+6)
  llvm-svn: 321930
* [X86] Call lowerShuffleAsRepeatedMaskAndLanePermute from lowerV4I64VectorShuffle.
  Craig Topper, 2018-01-06 (1 file changed, -0/+6)
  llvm-svn: 321929
* [X86] Add vcvtsd2sil/vcvtsd2siq etc. InstAliases to the EVEX-encoded instructions.
  Craig Topper, 2018-01-05 (1 file changed, -10/+18)
  This matches their VEX equivalents.
  llvm-svn: 321912
* [X86] Add InstAliases for 'vmovd' with GR64 registers to select EVEX encoded instructions as well.
  Craig Topper, 2018-01-05 (1 file changed, -0/+6)
  Without this we allow "vmovd %rax, %xmm0", but not "vmovd %rax, %xmm16". This exists to continue a silly bug where really old versions of the GNU assembler required movd instead of movq on these instructions. This compatibility hack then crept forward to the avx version too, but we didn't propagate it to avx512.
  llvm-svn: 321903
* [X86] Stop printing moves between VR64 and GR64 with 'movd' mnemonic. Use 'movq' instead.
  Craig Topper, 2018-01-05 (2 files changed, -7/+9)
  This behavior existed to work with an old version of the GNU assembler on MacOS that only accepted this form. Newer versions of the GNU assembler and the current LLVM derived version of the assembler on MacOS support movq as well.
  llvm-svn: 321898
* [X86] Correct the execution domain for AVX1 VBROADCASTF128 to be FP instead of integer.
  Craig Topper, 2018-01-04 (1 file changed, -1/+2)
  llvm-svn: 321821
* [X86] Remove 'else' after 'return' I forgot to clean up before committing D41691.
  Craig Topper, 2018-01-03 (1 file changed, -4/+7)
  llvm-svn: 321755
* [X86] Remove useless custom inserter for 64-bit TAILJMP and TCRETURN opcodes
  Craig Topper, 2018-01-03 (2 files changed, -12/+1)
  This custom inserter was added in r124272, at which time it added a bunch of Defs for Win64. In r150708, those defs were removed, leaving only the "return BB". So I think this means the custom inserter is a NOP these days. This patch removes the remaining code and stops tagging the instructions for custom insertion.
  Differential Revision: https://reviews.llvm.org/D41671
  llvm-svn: 321747
* [X86] Use ANY_EXTEND instead of SIGN_EXTEND in lowerMasksToReg
  Craig Topper, 2018-01-03 (1 file changed, -1/+1)
  Currently we use SIGN_EXTEND in lowerMasksToReg as part of calling convention setup, but we don't require a specific value for the upper bits. This patch changes it to ANY_EXTEND, which will be lowered as SIGN_EXTEND if it ends up sticking around.
  llvm-svn: 321746
* Remove left-over debug printout from r321692
  Hans Wennborg, 2018-01-03 (1 file changed, -1/+0)
  Besides the unsightly print-out, it was causing some buildbots to fail, e.g. http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/9311
  llvm-svn: 321711
* Thread MCSubtargetInfo through Target::createMCAsmBackend
  Alex Bradbury, 2018-01-03 (2 files changed, -8/+13)
  Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. D20830 threaded an MCSubtargetInfo reference through MCAsmBackend::relaxInstruction, but this isn't the only function that would benefit from access. This patch removes the Triple and CPUString arguments from createMCAsmBackend and replaces them with MCSubtargetInfo.
  This patch just changes the interface without making any intentional functional changes. Once in, several cleanups are possible:
    * Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
    * Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
    * Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
    * Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)
  This change initially exposed PR35686, which has since been resolved in r321026.
  Differential Revision: https://reviews.llvm.org/D41349
  llvm-svn: 321692
* Handle the case of live 16-bit subregisters in X86FixupBWInsts
  Andrew Kaylor, 2018-01-02 (1 file changed, -82/+71)
  Differential Revision: https://reviews.llvm.org/D40524
  Change-Id: Ie3a405b28503ceae999f5f3ba07a68fa733a2400
  llvm-svn: 321674
* [x86] allow pairs of PCMPEQ for vector-sized integer equality comparisons (PR33325)
  Sanjay Patel, 2018-01-02 (1 file changed, -7/+31)
  This is an extension of D31156 with the goal that we'll allow memcmp() == 0 expansion for x86 to use 2 pairs of loads per block. The memcmp expansion pass (formerly part of CGP) will generate this kind of pattern with oversized integer compares, so we want to transform these into x86-specific vector nodes before legalization splits things into scalar chunks. See PR33325 for more details: https://bugs.llvm.org/show_bug.cgi?id=33325
  Differential Revision: https://reviews.llvm.org/D41618
  llvm-svn: 321656
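  For illustration of the "oversized integer compare" shape mentioned above (hypothetical IR, not taken from the commit's tests): a 16-byte equality check reaches the backend as an i128 compare, which can be mapped to a vector compare plus movmsk instead of being split into scalar chunks.

    define i1 @memcmp16_eq(i8* %a, i8* %b) {
      %pa = bitcast i8* %a to i128*
      %pb = bitcast i8* %b to i128*
      %la = load i128, i128* %pa
      %lb = load i128, i128* %pb
      ; This oversized integer equality is the pattern the backend can lower
      ; with a pcmpeq-style vector compare followed by a mask-movmsk test.
      %eq = icmp eq i128 %la, %lb
      ret i1 %eq
    }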
* Strip trailing whitespace. NFCI
  Simon Pilgrim, 2018-01-02 (1 file changed, -2/+2)
  llvm-svn: 321644
* [X86] Promote vXi1 fp_to_uint/fp_to_sint to vXi32 to avoid scalarization.
  Craig Topper, 2018-01-01 (1 file changed, -1/+32)
  llvm-svn: 321632
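  A minimal sketch of the affected conversions (hypothetical IR): the vXi1 result type is what gets promoted to vXi32, so the packed conversion instructions can be used and then truncated, rather than converting one lane at a time.

    define <4 x i1> @fp_to_mask(<4 x float> %x) {
      ; With promotion, this becomes a v4f32 -> v4i32 conversion followed by a
      ; truncate to v4i1 instead of four scalar conversions.
      %r = fptosi <4 x float> %x to <4 x i1>
      ret <4 x i1> %r
    }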
* [X86] Replace custom lowering of vXi1 SINT_TO_FP/UINT_TO_FP with promotion.
  Craig Topper, 2018-01-01 (1 file changed, -32/+20)
  The custom lowering was just doing the same thing promotion would do.
  llvm-svn: 321630
* [SelectionDAG][X86][AArch64] Require targets to specify the promotion type when using setOperationAction Promote for INT_TO_FP and FP_TO_INT
  Craig Topper, 2018-01-01 (1 file changed, -6/+6)
  Currently the promotion for these ignores the normal getTypeToPromoteTo and instead just tries to double the element width. This is because the default behavior of getTypeToPromoteTo just adds 1 to the SimpleVT, which has the effect of increasing the element count while keeping the scalar size the same. If multiple steps are required to get to a legal operation type, int_to_fp will be promoted multiple times, and fp_to_int will keep trying wider types in a loop until it finds one that works.
  getTypeToPromoteTo does have the ability to query a promotion map to get the type and not do the increasing behavior. It seems better to just let the target specify the promotion type in the map explicitly instead of letting the legalizer iterate via widening.
  FWIW, I think for any other vector operations that need to be promoted, we have to specify the type explicitly because the default behavior of getTypeToPromoteTo isn't useful for vectors. The other kinds of promotion already require that either the element count is constant or the total vector width is constant, but neither happens by incrementing the SimpleVT enum.
  Differential Revision: https://reviews.llvm.org/D40664
  llvm-svn: 321629
* [X86] In LowerTruncateVecI1, don't add SHL if the input is known to be all sign bits.
  Craig Topper, 2018-01-01 (1 file changed, -10/+16)
  If the input is all sign bits then the LSB through MSB are all the same, so we don't need to move the LSB to the MSB.
  llvm-svn: 321617
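  A hedged sketch of the case being optimized (hypothetical IR): when the truncated value is the sign extension of a compare, every bit of each lane matches the sign bit, so the low bit does not need to be shifted up into the sign bit before forming the mask.

    define <8 x i1> @trunc_to_mask(<8 x i32> %a, <8 x i32> %b) {
      %c = icmp sgt <8 x i32> %a, %b
      ; Every bit of each lane of %s is a copy of the sign bit.
      %s = sext <8 x i1> %c to <8 x i32>
      ; Truncation keeps only the low bit; since it already equals the sign
      ; bit, no SHL is needed before testing the sign bit.
      %t = trunc <8 x i32> %s to <8 x i1>
      ret <8 x i1> %t
    }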
* [X86] Add missing NoVLX predicate around some patterns that use zmm registers to implement 128/256-bit operations without VLX.
  Craig Topper, 2018-01-01 (1 file changed, -1/+1)
  llvm-svn: 321613
* [X86] Add patterns for using zmm registers for v8i32/v8f32 vselect with the false input being zero.
  Craig Topper, 2018-01-01 (1 file changed, -19/+24)
  We can use a zmm move with zero masking for this. We already had patterns for using a masked move, but we didn't check for the zero masking case separately.
  llvm-svn: 321612