path: root/llvm/lib/Target/X86
Commit message | Author | Age | Files | Lines
...
* [X86][SSE] Use (V)PHMINPOSUW for vXi8 SMAX/SMIN/UMAX/UMIN horizontal ↵Simon Pilgrim2017-12-191-9/+27
| | | | | | | | | | | | reductions (PR32841) Extension to D39729, which performed this for vXi16. With the same bit flipping to handle the SMAX/SMIN/UMAX cases, vXi8 UMIN horizontal reductions can be performed. This makes use of the fact that by performing a pair-wise i8 SHUFFLE/UMIN before PHMINPOSUW, we both get the UMIN of each pair and zero-extend the upper bits ready for v8i16. Differential Revision: https://reviews.llvm.org/D41294 llvm-svn: 321070
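The reduction described in this commit can be modelled in scalar C++. This is a minimal sketch, not the lowering code itself: the function names are hypothetical, `phminposuwValue` models only the value result of PHMINPOSUW (the real instruction also reports the lane index), and the XOR masks (0xFF for UMAX, 0x80 for SMIN, 0x7F for SMAX) are an assumption about what "the same bit flipping" refers to.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the value produced by PHMINPOSUW: the unsigned minimum
// of eight 16-bit lanes.
static uint16_t phminposuwValue(const std::array<uint16_t, 8> &lanes) {
  return *std::min_element(lanes.begin(), lanes.end());
}

// UMIN reduction of 16 bytes: one pair-wise byte UMIN, then PHMINPOSUW.
uint8_t reduceUMin(const std::array<uint8_t, 16> &v) {
  std::array<uint16_t, 8> lanes{};
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    // The minimum of a byte pair fits in 8 bits, so placing it in a 16-bit
    // lane leaves the upper byte zero (the zero-extension for v8i16).
    lanes[i] = std::min(v[2 * i], v[2 * i + 1]);
  }
  return static_cast<uint8_t>(phminposuwValue(lanes));
}

// The other reductions map onto UMIN by flipping bits before and after.
static uint8_t reduceWithFlip(std::array<uint8_t, 16> v, uint8_t mask) {
  for (uint8_t &b : v)
    b = static_cast<uint8_t>(b ^ mask);
  return static_cast<uint8_t>(reduceUMin(v) ^ mask);
}

uint8_t reduceUMax(const std::array<uint8_t, 16> &v) {
  return reduceWithFlip(v, 0xFF); // bitwise NOT reverses unsigned order
}
int8_t reduceSMin(const std::array<uint8_t, 16> &v) {
  return static_cast<int8_t>(reduceWithFlip(v, 0x80)); // flip the sign bit
}
int8_t reduceSMax(const std::array<uint8_t, 16> &v) {
  return static_cast<int8_t>(reduceWithFlip(v, 0x7F)); // flip all but sign
}
```

Each flip is a bijection that turns the requested ordering into ascending unsigned order, so a single UMIN primitive covers all four reductions.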
* [X86] Don't extend v16i8 non-uniform shifts to v16i32 if we have BWI. Use ↵Craig Topper2017-12-191-1/+3
| | | | | | | | v16i16 instead. BWI supports shifting by word amounts. Even if VLX isn't supported we can still widen to v32i16 and extract the lower half. For SKX it's preferable to not use 512-bit vectors if we can. llvm-svn: 321059
* [X86] Use a specific list of MVTs in combineShiftRightArithmetic instead of ↵Craig Topper2017-12-191-2/+2
| | | | | | | | iterating over every integer VT and checking their size. Previously, we were checking for MVTs with sizes between 8 and 64, which only includes i8, i16, i32, and i64 today. But I don't think we should assume that and should list the types that are legal for x86. I also don't think we need i64 since type legalization is guaranteed to split those up. llvm-svn: 321058
* [X86] Remove unnecessary check for integer VT from combineShiftRightArithmetic.Craig Topper2017-12-191-1/+1
| | | | | | I doubt there's any way to create a ashr for an FP type. llvm-svn: 321057
* [X86] Remove dead code for turning vector shifts by large amounts into a ↵Craig Topper2017-12-191-36/+0
| | | | | | | | zero vector. Pretty sure these are handled by a target independent DAG combine that turns them into undef these days. llvm-svn: 321056
* [X86] Use ZERO_EXTEND instead of ANY_EXTEND when extending the shift amount ↵Craig Topper2017-12-191-1/+1
| | | | | | | | | | for a non-uniform shift. My reading of the SDM says that all bits of the shift amount are used. If the value of the element is larger than the number of bits, the shift result is zero. So I think we need to zero_extend here to avoid garbage in the upper bits. In reality we lower any_extend as zero_extend so in most cases it would be hard to hit this. llvm-svn: 321055
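The concern in this commit can be illustrated with a scalar model of one lane of a per-element shift. This is a sketch under stated assumptions: `shiftLane` and `widenAmount` are hypothetical names, the lane width is fixed at 32 bits for illustration, and `upperBits` stands in for whatever an ANY_EXTEND might leave in the widened bits.

```cpp
#include <cstdint>

// One lane of a per-element 32-bit left shift: every bit of the count is
// significant, and a count of 32 or more produces zero.
uint32_t shiftLane(uint32_t value, uint32_t count) {
  return count < 32 ? value << count : 0;
}

// Widen an 8-bit shift amount to 32 bits. upperBits models the top 24 bits:
// zero for ZERO_EXTEND, unspecified for ANY_EXTEND.
uint32_t widenAmount(uint8_t amount, uint32_t upperBits) {
  return (upperBits & 0xFFFFFF00u) | amount;
}
```

With clean upper bits a shift by 3 behaves as expected; with garbage in the upper bits the same logical amount looks like a huge count and the lane becomes zero.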
* [X86] Don't use NOPL when the assembler is passed an empty CPU string.Craig Topper2017-12-181-1/+1
| | | | | | This recommits the change from r321026. I have a fix for the lld test now. llvm-svn: 321038
* X86/AArch64/ARM: Factor out common sincos_stret logic; NFCIMatthias Braun2017-12-183-24/+7
| | | | | | | | | | | Note: - X86ISelLowering: setLibcallName(SINCOS) was superfluous as InitLibcalls() already does it. - ARMISelLowering: Setting libcallnames for sincos/sincosf seemed superfluous as in the darwin case it wouldn't be used while for all other cases InitLibcalls already does it. llvm-svn: 321036
* AArch64/X86: Factor out common bzero logic; NFCMatthias Braun2017-12-183-23/+4
| | | | llvm-svn: 321035
* Revert part of r321026 "[X86] Don't use NOPL when the assembler is passed an ↵Craig Topper2017-12-181-1/+1
| | | | | | | | empty CPU string." while I investigate how to fix an lld test failure. Looks like lld also needs to pass a -mcpu in some of its tests llvm-svn: 321033
* [X86] Don't use NOPL when the assembler is passed an empty CPU string. ↵Craig Topper2017-12-181-1/+1
| | | | | | | | | | Update tests to force a CPU with NOPL. Empty string should be equivalent to "generic" which doesn't allow NOPL. Force tests to specify 'pentiumpro' to guarantee NOPL. Fixes PR35686 llvm-svn: 321026
* [X86] Fix mistake that I made when splitting up the setOperationAction calls ↵Craig Topper2017-12-181-2/+2
| | | | | | | | recently. The block I moved things that need BWI and 512-bit or VLX into is incorrectly qualified with just hasBWI || hasVLX. Here I've qualified it with hasBWI && (hasAVX512 || hasVLX), where the hasAVX512 will be replaced with allowing 512-bit vectors in an upcoming patch. llvm-svn: 320957
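The difference between the two qualifications can be seen on a small truth table. The sketch below uses a hypothetical `Features` struct rather than the real subtarget class; only the two boolean expressions are taken from the commit message.

```cpp
// Hypothetical stand-in for the subtarget feature queries.
struct Features {
  bool hasAVX512;
  bool hasBWI;
  bool hasVLX;
};

// The qualification the commit describes as incorrect.
bool looseGuard(const Features &f) { return f.hasBWI || f.hasVLX; }

// The corrected qualification: BWI is mandatory, plus 512-bit or VLX.
bool strictGuard(const Features &f) {
  return f.hasBWI && (f.hasAVX512 || f.hasVLX);
}
```

A configuration with VLX but without BWI passes the loose guard and is rejected by the strict one, which is the case the fix is meant to exclude.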
* [X86] Make the code that creates fmaddsub from build_vector of extracts and ↵Craig Topper2017-12-171-6/+15
| | | | | | | | | | | | | | | | | | | inserts functional and add tests. Summary: We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2. The missing coverage was noticed during the review of D40335. Reviewers: RKSimon, zvi, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41133 llvm-svn: 320950
* Remove superfluous break after a return. NFCI.Simon Pilgrim2017-12-171-1/+0
| | | | llvm-svn: 320941
* [X86DomainReassignment] Store legal domains in a std::bitset instead of ↵Craig Topper2017-12-171-16/+18
| | | | | | using a SmallVector that really only ever has one element as a set. llvm-svn: 320940
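A small fixed enumeration maps naturally onto a `std::bitset`, which is the idea behind this change. The following is an illustrative sketch only: the enum values and the `DomainSet` wrapper are hypothetical and not copied from X86DomainReassignment.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical domain enumeration; the last enumerator gives the bit count.
enum DomainKind { DomainA = 0, DomainB, DomainC, NumDomainKinds };

// Set of domains stored as one bit per enumerator.
class DomainSet {
  std::bitset<NumDomainKinds> Bits;

public:
  void insert(DomainKind D) { Bits.set(D); }
  void erase(DomainKind D) { Bits.reset(D); }
  bool contains(DomainKind D) const { return Bits.test(D); }
  std::size_t size() const { return Bits.count(); }
};
```

Insertion is idempotent and membership is a single bit test, so no search or deduplication is needed, unlike a vector used as a set.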
* [X86] Use extract_vector_elt instead of X86ISD::VEXTRACT for isel of vXi1 ↵Craig Topper2017-12-174-14/+15
| | | | | | extractions. llvm-svn: 320937
* [X86] Canonicalize extract_vector_elt from vXi1 to always return MVT::i32.Craig Topper2017-12-172-6/+7
| | | | | | This allows us to remove some isel patterns that allowed MVT::i8 result type. llvm-svn: 320936
* [X86] Don't create X86ISD::VEXTRACT nodes directly. Use EXTRACT_VECTOR_ELT ↵Craig Topper2017-12-171-5/+6
| | | | | | | | and allow that to be legalized to VEXTRACT. I think we can remove the VEXTRACT node completely and use a canonicalized EXTRACT_VECTOR_ELT instead. This is a first step. llvm-svn: 320935
* Fix unused variable warning.Simon Pilgrim2017-12-161-3/+2
| | | | llvm-svn: 320934
* [X86][AVX] lowerVectorShuffleAsBroadcast - aggressively peek through BITCASTsSimon Pilgrim2017-12-161-2/+26
| | | | | | | | Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source. Fixes PR32007 llvm-svn: 320933
* [X86][AVX] Use extract128BitVector helper. NFCI.Simon Pilgrim2017-12-161-4/+1
| | | | llvm-svn: 320932
* [X86][AVX] Fix failed broadcast foldSimon Pilgrim2017-12-161-3/+7
| | | | | | Strip excess BITCASTs from EXTRACT_SUBVECTOR input llvm-svn: 320930
* [X86] Don't pass a zero input to the passthru operand of ↵Craig Topper2017-12-161-9/+6
| | | | | | | | getVectorMaskingNode/getScalarMaskingNode when it's going to emit an ISD::OR/ISD::AND. NFCI In those cases, the passthru operand of the methods isn't used. The calls to the scalar version were passing a MVT::i1 zero, which is an illegal type at the stage this code runs. llvm-svn: 320928
* [X86] Have getVectorMaskingNode return an ISD::AND for X86ISD::VPSHUFBITQMB ↵Craig Topper2017-12-161-0/+1
| | | | | | instead of creating a select with one input being 0. llvm-svn: 320927
* [X86] When using vpopcntdq for ctpop of v8i16 vectors, only promote to v8i32.Craig Topper2017-12-161-8/+7
| | | | | | Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half. llvm-svn: 320926
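The promotion described here amounts to zero-extending each 16-bit lane, counting bits at the wider width, and truncating the result. The scalar sketch below uses a hypothetical function name and models only the arithmetic, not the vector instructions or the widening to v16i32.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Population count of eight 16-bit lanes computed at 32-bit lane width.
std::array<uint16_t, 8> ctpopV8i16(const std::array<uint16_t, 8> &v) {
  std::array<uint16_t, 8> out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    uint32_t wide = v[i];                         // zero-extend i16 to i32
    std::size_t count = std::bitset<32>(wide).count(); // 32-bit popcount
    out[i] = static_cast<uint16_t>(count);        // truncate back to i16
  }
  return out;
}
```

Because the extension is a zero-extension, the extra upper bits never contribute to the count, so 32-bit lanes are already sufficient and 64-bit lanes add nothing.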
* [X86] Combine some more scheduler model entries using regular expressions.Craig Topper2017-12-164-200/+100
| | | | | | We had a lot of separate 32 and 64 instructions that had the same scheduling data. This merges them into the same regular expression. This is pretty consistent with a lot of other instructions. llvm-svn: 320924
* [X86] Use instrs instead of instregex for gather/scatter instructions in the ↵Craig Topper2017-12-164-130/+116
| | | | | | | | scheduler models. Combine into single InstrRW entries. This reduces the number of scheduler groups in subtarget info. llvm-svn: 320923
* [X86] Remove unneeded code for handling the old kunpck intrinsics.Craig Topper2017-12-162-13/+1
| | | | llvm-svn: 320917
* [X86] Add 128 and 256-bit VPOPCNTDQ instructions. Adjust some tablegen ↵Craig Topper2017-12-161-64/+33
| | | | | | | | classes LZCNT/POPCNT. I think when this instruction was first published it was only for a Knights CPU and thus VLX version was missing. llvm-svn: 320910
* [X86] Add back the assert from r320830 that was reverted in r320850Craig Topper2017-12-161-0/+2
| | | | | | Hopefully r320864 has fixed the offending case that failed the assert. llvm-svn: 320898
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-1521-141/+139
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
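The API shape described by this commit can be sketched with stand-in types. `FunctionSketch` and `MachineFunctionSketch` are hypothetical names used for illustration only; they are not the LLVM classes.

```cpp
#include <string>

// Hypothetical stand-in for the IR function type.
struct FunctionSketch {
  std::string Name;
};

// Hypothetical stand-in for the machine-level wrapper. The wrapped function
// always exists, so the accessor returns a reference instead of a pointer.
class MachineFunctionSketch {
  const FunctionSketch &F;

public:
  explicit MachineFunctionSketch(const FunctionSketch &Fn) : F(Fn) {}
  const FunctionSketch &getFunction() const { return F; }
};
```

Callers then write `MF.getFunction().Name` rather than `MF.getFunction()->Name`, and no null check is implied at the call site.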
* [X86] Use AND32ri8 instead of AND64ri8 in Asan code in EmitCallAsanReport ↵Craig Topper2017-12-151-1/+1
| | | | | | | | for 32-bit mode. This seemed to work due to a quirk in the X86 MC encoder that didn't emit a REX byte that the AND64ri8 implies when in 32-bit mode. This made the encoding the same as AND32ri8. I tried to add an assert to catch the dropped REX prefix that caught this. llvm-svn: 320864
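The encoding relationship mentioned here can be shown with a tiny hand-written encoder. This is a sketch with a hypothetical function name that covers only the register-direct form of `and reg, imm8` (opcode 0x83 /4) for the eight legacy registers; it is not the MC code emitter.

```cpp
#include <cstdint>
#include <vector>

// Encode `and reg, imm8`. reg is the 3-bit register number
// (0 = eax/rax, 1 = ecx/rcx, ...). The 64-bit form differs only by a
// leading REX.W prefix byte (0x48).
std::vector<uint8_t> encodeAndRI8(unsigned reg, int8_t imm, bool is64Bit) {
  std::vector<uint8_t> bytes;
  if (is64Bit)
    bytes.push_back(0x48); // REX.W
  bytes.push_back(0x83);   // group-1 opcode with sign-extended imm8
  // ModRM: mod = 11 (register direct), reg field = 4 (AND), rm = register.
  bytes.push_back(static_cast<uint8_t>(0xC0 | (4u << 3) | (reg & 7u)));
  bytes.push_back(static_cast<uint8_t>(imm));
  return bytes;
}
```

Dropping the prefix from the 64-bit form yields exactly the 32-bit bytes, which matches the commit's observation that the two encodings coincided. Had the 0x48 byte been emitted in 32-bit mode, it would have decoded as a separate `dec eax` instruction.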
* [X86] In LowerVectorCTPOP use ISD::ZERO_EXTEND/ISD::TRUNCATE instead of the ↵Craig Topper2017-12-151-4/+4
| | | | | | | | target specific nodes. The target independent nodes will get legalized to the target specific nodes by their own legalization process. Someday I'd like to stop using a target specific node for zero extends and truncates of legal types, so the fewer places we reference the target specific opcode the better. llvm-svn: 320863
* [X86] Remove unnecessary TODO.Craig Topper2017-12-151-1/+0
| | | | | | When I wrote it I thought we were missing a potential optimization for KNL. But investigating further shows that for KNL we still do the optimal thing by widening to v4f32 and then using special isel patterns to widen again to a zmm register. llvm-svn: 320862
* [X86] Remove assert in X86MCCodeEmitter.cpp that was added in r320830.Craig Topper2017-12-151-2/+0
| | | | | | It seems to be failing real code which is concerning, but we were silently getting away with it. I'll investigate further. llvm-svn: 320850
* [SelectionDAG][X86] Fix insert_vector_elt lowering for v32i1/v64i1 with ↵Craig Topper2017-12-151-3/+5
| | | | | | | | | | | | | | | | | | | | | | | non-constant index Summary: Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1. We also can't type legalize v64i1 insert_vector_elt correctly on KNL due to the type not being byte addressable, as the legalizing-through-memory-accesses path requires. For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that. For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them. Reviewers: RKSimon, delena, spatel, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40942 llvm-svn: 320849
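The second fix (extend to a byte-addressable type, store, reload two halves, truncate) can be modelled on a 64-bit mask. This is a scalar sketch with a hypothetical function name; it mirrors the sequence of steps in the summary, not the SelectionDAG code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Insert one i1 element into a 64-element mask at a run-time index.
uint64_t insertMaskBit(uint64_t mask, bool value, unsigned index) {
  assert(index < 64 && "index out of range");
  // Extend each i1 lane to a byte so lanes become individually addressable.
  std::array<uint8_t, 64> bytes{};
  for (unsigned i = 0; i < 64; ++i)
    bytes[i] = static_cast<uint8_t>((mask >> i) & 1u);
  // The variable-index insert is now an ordinary byte store.
  bytes[index] = value ? 1 : 0;
  // Reload as two 32-lane halves and truncate each lane back to one bit.
  uint32_t lo = 0, hi = 0;
  for (unsigned i = 0; i < 32; ++i) {
    lo |= static_cast<uint32_t>(bytes[i] & 1u) << i;
    hi |= static_cast<uint32_t>(bytes[32 + i] & 1u) << i;
  }
  return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

The byte array plays the role of the stack slot: a single bit has no address, but after the extension each element does, so an index known only at run time can be used for the store.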
* [X86] Add 'Requires<[In64BitMode]>' to a bunch of instructions that only ↵Craig Topper2017-12-154-67/+94
| | | | | | | | have memory and immediate operands. The asm parser wasn't preventing these from being accepted in 32-bit mode. Instructions that use a GR64 register are protected by the parser rejecting the register in 32-bit mode. llvm-svn: 320846
* [X86] Change BNDLDX to use anymem instead of i64mem for its memory operand.Craig Topper2017-12-151-1/+1
| | | | | | This instruction doesn't access memory. It just uses a similar looking memory encoding. Don't require Intel syntax to put "qword ptr" in front of it. llvm-svn: 320845
* [X86] Remove the 'Requires' In64BitMode/Not64BitMode from the LWP instructions.Craig Topper2017-12-151-4/+4
| | | | | | These aren't doing anything due to a top level "let Predicates =". I think the GR32/GR64 register class protects these anyway. llvm-svn: 320844
* [X86] Remove the 'Requires<[In64BitMode]>' from SHSTK instructions.Craig Topper2017-12-151-9/+5
| | | | | | This has no effect due to a top level "let Predicates =" around the instructions. But it's also not required because the GR64 usage in the instruction guarantees it can never match. llvm-svn: 320843
* Fix for bug PR35549 - Repeated schedule comments.Andrew V. Tischenko2017-12-152-2/+8
| | | | | | Differential Revision: https://reviews.llvm.org/D40960 llvm-svn: 320837
* [X86] Fix XSAVE64 and similar instructions to not be allowed by the ↵Craig Topper2017-12-153-35/+34
| | | | | | | | | | assembler in 32-bit mode. There was a top level "let Predicates =" in the .td file that was overriding the Requires on each instruction. I've added an assert to the code emitter to catch more cases like this. I'm sure this isn't the only place where the right predicates aren't being applied. This assert already found that we don't block btq/btsq/btrq in 32-bit mode. llvm-svn: 320830
* [CodeGen] Print stack object references as %(fixed-)stack.0 in both MIR and ↵Francis Visoiu Mistrih2017-12-151-5/+5
| | | | | | | | | | | | | | debug output Work towards the unification of MIR and debug output by printing `%stack.0` instead of `<fi#0>`, and `%fixed-stack.0` instead of `<fi#-4>` (supposing there are 4 fixed stack objects). Only debug syntax is affected. Differential Revision: https://reviews.llvm.org/D41027 llvm-svn: 320827
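The naming rule in this commit can be written as a small formatting function. This is a sketch with a hypothetical function name, assuming (as the `<fi#-4>` example implies) that negative frame indices denote fixed objects and that the printed fixed-stack number is the frame index plus the number of fixed objects.

```cpp
#include <cassert>
#include <string>

// Format a frame index in the unified syntax described above.
std::string stackObjectName(int frameIndex, int numFixedObjects) {
  assert(frameIndex >= -numFixedObjects && "no such fixed object");
  if (frameIndex < 0)
    return "%fixed-stack." + std::to_string(frameIndex + numFixedObjects);
  return "%stack." + std::to_string(frameIndex);
}
```

With four fixed objects, frame index -4 prints as `%fixed-stack.0` and frame index 0 prints as `%stack.0`, matching the examples in the commit message.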
* [X86] Widen (v2i32 (fp_to_uint v2f64)) to (v8i32 (fp_to_uint v8f64)) during ↵Craig Topper2017-12-152-11/+15
| | | | | | | | legalization if we have AVX512F, but not VLX. NFC Previously we widened it using isel patterns. llvm-svn: 320824
* [X86] Fix a couple bugs in my recent changes to vXi1 insert_subvector lowering.Craig Topper2017-12-151-9/+9
| | | | | | | | A couple places didn't use the same SDValue variables to connect everything all the way through. I don't have a test case for a bug in insert into the lower bits of a non-zero, non-undef vector. Not sure the best way to create that. We don't create the case when lowering concat_vectors which is the main way to get insert_subvectors. llvm-svn: 320790
* [X86] Add a TODO about v8i1 CONCAT_VECTORS.Craig Topper2017-12-151-0/+3
| | | | llvm-svn: 320784
* [X86] Further rearrange the setOperationAction calls to separate the ones ↵Craig Topper2017-12-151-66/+83
| | | | | | | | | | that require 512-bit registers OR VLX into separate sections. NFCI We have several instructions that were introduced in AVX512F that are only available in 512-bit form on KNL. We still make use of them for 128/256 by artificially widening and extracting during isel. This commit separates these operations from the true 512-bit operations. This way we can qualify the normal 512-bit operations with needing 512-bit register support. And these special operations will get qualified with needing 512-bit registers OR VLX. The 512-bit register qualification will be introduced in a future patch; this just gets everything grouped to minimize deltas on that patch. llvm-svn: 320782
* [X86] Group setOperationActions related to vXi1 masks together. NFCICraig Topper2017-12-151-74/+71
| | | | | | | | Previously they were sort of interleaved in with XMM/YMM/ZMM action related code. Trying to separate things so it's easier to split 512-bit vectors later. llvm-svn: 320781
* [X86] Make ISD::INSERT_SUBVECTOR v8i1 legal with AVX512F because we should ↵Craig Topper2017-12-151-1/+1
| | | | | | | | be custom lowering inserting v1i1 into v8i1 under this. I don't have a test case at the moment. Just noticed while auditing things. llvm-svn: 320780
* [X86] Move some of the hasVLX qualified code out of the main hasAVX512 block ↵Craig Topper2017-12-151-34/+51
| | | | | | | | | | in the X86ISelLowering constructor. NFCI Move it into the separate hasVLX block later in the constructor. I'm trying to separate 128/256 and 512-bit related code so we can eventually qualify the hasAVX512 block with support for 512-bit vectors required by the prefer-vector-width feature support being talked about in D41096. llvm-svn: 320779