path: root/llvm/lib/Target/X86
* [X86] Bitcast subvector before broadcasting it. (Ahmed Bougacha, 2017-02-10; 1 file, -1/+10)
  Since r274013, we've been looking through bitcasts on broadcast inputs. In the scalar-folding case (from a load, build_vector, or sc2vec), the input type didn't matter, as we'd simply bitcast the resulting scalar back. However, when broadcasting a 128-bit-lane-aligned element, we create an EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector of the original input type.
  llvm-svn: 294774
* [X86][SSE] Use SDValue::getConstantOperandVal helper. NFCI. (Simon Pilgrim, 2017-02-10; 1 file, -11/+6)
  Also reordered an if statement to test low-cost comparisons first.
  llvm-svn: 294748
* [X86][SSE] Add support for extracting target constants from BUILD_VECTOR (Simon Pilgrim, 2017-02-10; 1 file, -0/+17)
  In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet.
  Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode.
  llvm-svn: 294746
* [X86][SSE] Add missing comment describing combining to SHUFPS. NFCI (Simon Pilgrim, 2017-02-10; 1 file, -0/+2)
  llvm-svn: 294745
* Add #ifdef to fix a compilation error when LLVM_BUILD_GLOBAL_ISEL=OFF (Igor Breger, 2017-02-10; 1 file, -0/+2)
  llvm-svn: 294726
* [X86][GlobalISel] Add general-purpose Register Bank (Igor Breger, 2017-02-10; 9 files, -12/+369)
  Summary: Add a general-purpose register bank. Add trivial handling of G_ADD legalization. Add register bank selection for COPY and G_ADD instructions.
  Reviewers: rovka, zvi, ab, t.p.northover, qcolombet
  Reviewed By: qcolombet
  Subscribers: qcolombet, mgorny, dberris, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29771
  llvm-svn: 294723
* Temporarily revert "For X86-64 linux and PPC64 linux align int128 to 16 bytes." (Eric Christopher, 2017-02-10; 1 file, -5/+0)
  Reverting until we can get a better TargetMachine::isCompatibleDataLayout to compare; otherwise we can't code generate existing bitcode without a string-equality data layout.
  This reverts commit r294702.
  llvm-svn: 294709
* For X86-64 linux and PPC64 linux align int128 to 16 bytes. (Eric Christopher, 2017-02-10; 1 file, -0/+5)
  For other platforms we should find out what they need and likely make the same change; however, a smaller additional change is easier for platforms we know have it specified in the ABI. As part of this, rewrite some of the handling in the backends for data layout and update a bunch of testcases.
  Based on a patch by Simonas Kazlauskas!
  llvm-svn: 294702
* [X86] Remove duplicate call to getValueType. NFCI. (Simon Pilgrim, 2017-02-09; 1 file, -4/+3)
  llvm-svn: 294640
* X86: Introduce relocImm-based patterns for cmp. (Peter Collingbourne, 2017-02-09; 3 files, -0/+47)
  Differential Revision: https://reviews.llvm.org/D28690
  llvm-svn: 294636
* X86: Teach X86InstrInfo::analyzeCompare to recognize compares of symbols. (Peter Collingbourne, 2017-02-09; 1 file, -17/+22)
  This requires that we communicate to X86InstrInfo::optimizeCompareInstr that the second operand is neither a register nor an immediate. The way we do that is by setting CmpMask to zero.
  Note that there were already instructions where the second operand was not a register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero for those instructions. This seems like a latent bug, but I was unable to trigger it.
  Differential Revision: https://reviews.llvm.org/D28621
  llvm-svn: 294634
* Convert to for-range loop. NFCI. (Simon Pilgrim, 2017-02-09; 1 file, -3/+3)
  llvm-svn: 294610
* [X86][MMX] Remove the (long time) unused MMX_PINSRW ISD opcode. (Simon Pilgrim, 2017-02-09; 2 files, -2/+1)
  llvm-svn: 294596
* [X86][btver2] PR31902: Fix a crash in combineOrCmpEqZeroToCtlzSrl under fast math. (Pierre Gousseau, 2017-02-09; 1 file, -1/+1)
  In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants.
  Differential Revision: https://reviews.llvm.org/D29756
  llvm-svn: 294588
* [X86][SSE] Attempt to break register dependencies during lowerBuildVector (Simon Pilgrim, 2017-02-09; 1 file, -11/+40)
  LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into an UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register.
  This patch attempts to break the register dependency by either always zeroing the vector beforehand or (if we're inserting into the 0th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))), which lowers to (V)MOVD and performs a similar function. Additionally, (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD.
  On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros, avoiding the vector zeroing at the cost of a scalar zero extension. This can probably be brought over to the other cases in a future patch in some cases (load folding etc.).
  Differential Revision: https://reviews.llvm.org/D29720
  llvm-svn: 294581
* [X86] Remove the HLE feature flag. (Craig Topper, 2017-02-09; 5 files, -15/+5)
  We only implemented it for one of the 3 HLE instructions, and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.
  llvm-svn: 294562
* [X86] Remove INVPCID and SMAP feature flags. They aren't currently used by any instructions and are not tested. (Craig Topper, 2017-02-09; 2 files, -14/+1)
  If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing.
  llvm-svn: 294561
* [X86] Clzero intrinsic and its addition under znver1 (Craig Topper, 2017-02-09; 6 files, -2/+48)
  This patch does the following:
  1. Adds an intrinsic int_x86_clzero which works with __builtin_ia32_clzero.
  2. Identifies the clzero feature using cpuid info (function 8000_0008; checks if EBX[0] = 1).
  3. Adds the clzero feature under the znver1 architecture.
  4. Adds the custom inserter in lowering.
  5. Adds a testcase to check the intrinsic.
  6. Adds the clzero instruction to the assembler test.
  Patch by Ganesh Gopalasubramanian, with a couple formatting tweaks, a disassembler test, and use of update_llc_test.py from me.
  Differential revision: https://reviews.llvm.org/D29385
  llvm-svn: 294558
* [X86][SSE] Tidyup LowerBuildVectorv16i8 and LowerBuildVectorv8i16. NFCI. (Simon Pilgrim, 2017-02-08; 1 file, -20/+18)
  Ran clang-format and standardized variable names between the functions.
  llvm-svn: 294456
* [X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set. (Craig Topper, 2017-02-08; 2 files, -0/+3)
  llvm-svn: 294407
* [X86] Remove the VMFUNC feature flag. It was only partially implemented and we have no support for codegening vmfunc instructions today. (Craig Topper, 2017-02-08; 2 files, -6/+0)
  If that support ever gets added, the full feature flag support should come along with it.
  llvm-svn: 294406
* [X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. (Craig Topper, 2017-02-08; 3 files, -7/+0)
  Intel's documentation for the deprecation: https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction
  llvm-svn: 294405
* [X86] Disable conditional tail calls (PR31257) (Hans Wennborg, 2017-02-07; 5 files, -145/+2)
  They are currently modelled incorrectly (as calls, which clobber registers, confusing e.g. Machine Copy Propagation). Reverting until we figure out the proper solution.
  llvm-svn: 294348
* [x86] improve comments for SHRUNKBLEND node creation; NFC (Sanjay Patel, 2017-02-07; 2 files, -28/+27)
  llvm-svn: 294344
* [ImplicitNullCheck] Extend Implicit Null Check scope by using stores (Sanjoy Das, 2017-02-07; 2 files, -20/+24)
  Summary: This change allows use of store instructions for implicit null checks. Memory aliasing analysis is not used, and the change conservatively assumes that any store and load may access the same memory. As a result, re-ordering of store-store, store-load and load-store pairs is prohibited.
  Patch by Serguei Katkov!
  Reviewers: reames, sanjoy
  Reviewed By: sanjoy
  Subscribers: atrick, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29400
  llvm-svn: 294338
* [x86] use range-for loops; NFCI (Sanjay Patel, 2017-02-07; 1 file, -8/+6)
  llvm-svn: 294337
* [x86] use getSignBit() for clarity; NFCI (Sanjay Patel, 2017-02-07; 1 file, -4/+3)
  llvm-svn: 294333
* [X86][SSE] Ensure that vector shift-by-immediate inputs are correctly bitcast to the result type (Simon Pilgrim, 2017-02-07; 1 file, -6/+10)
  vXi8/vXi64 vector shifts are often performed as vYi16/vYi32 types, but we weren't always remembering to bitcast the input. Tested with a new assert, as we don't currently manipulate these shifts enough for test cases to catch them.
  llvm-svn: 294308
* [AVX-512] Add masked and unmasked shift by immediate instructions to load folding tables. (Craig Topper, 2017-02-07; 1 file, -0/+81)
  llvm-svn: 294287
* [AVX-512] Add masked shift instructions to load folding tables. (Craig Topper, 2017-02-07; 1 file, -0/+108)
  This adds the masked versions of everything but the shift-by-immediate instructions.
  llvm-svn: 294286
* [AVX-512] Add some of the shift instructions to the load folding tables. (Craig Topper, 2017-02-07; 1 file, -0/+49)
  This includes unmasked forms of variable shift and shifting by the lower element of a register. Still need to do shift by immediate, which was not foldable prior to AVX-512, and all the masked forms.
  llvm-svn: 294285
* [X86] Change the Defs list for VZEROALL/VZEROUPPER back to not including YMM16-31. (Craig Topper, 2017-02-07; 1 file, -3/+2)
  llvm-svn: 294277
* [X86] Fix some Include What You Use warnings; other minor fixes (NFC). (Eugene Zelenko, 2017-02-06; 5 files, -19/+37)
  This is preparation to reduce MCExpr.h dependencies.
  llvm-svn: 294246
* [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined. (Simon Pilgrim, 2017-02-06; 1 file, -9/+19)
  Currently we only combine shuffle nodes if they have a single user, to prevent us from causing code bloat by splitting the shuffles into several different combines. We don't take into account that in some cases we will already have combined all the users while recursively calling up the shuffle tree.
  This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all their users are in that list.
  Differential Revision: https://reviews.llvm.org/D29399
  llvm-svn: 294183
* [X86][GlobalISel] Add limited ret lowering support to the IRTranslator. (Igor Breger, 2017-02-06; 2 files, -14/+114)
  Summary: Support return lowering for i8/i16/i32/i64/float/double; vector types are supported on 64-bit platforms only. Support argument lowering for float/double types.
  Reviewers: t.p.northover, zvi, ab, rovka
  Reviewed By: zvi
  Subscribers: dberris, kristof.beyls, delena, llvm-commits
  Differential Revision: https://reviews.llvm.org/D29261
  llvm-svn: 294173
* [AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables. (Craig Topper, 2017-02-06; 1 file, -0/+6)
  llvm-svn: 294170
* [AVX-512] Add VPABSB/D/Q/W to load folding tables. (Craig Topper, 2017-02-06; 1 file, -0/+34)
  llvm-svn: 294169
* [AVX-512] Add VSHUFPS/PD to load folding tables. (Craig Topper, 2017-02-06; 1 file, -0/+16)
  llvm-svn: 294168
* [AVX-512] Add VPMULLD/Q/W instructions to load folding tables. (Craig Topper, 2017-02-06; 1 file, -0/+27)
  llvm-svn: 294164
* [AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables. (Craig Topper, 2017-02-05; 1 file, -0/+16)
  llvm-svn: 294163
* [X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle (Simon Pilgrim, 2017-02-05; 1 file, -8/+12)
  Similar to what we already do for zero-element insertion, we can quickly rematerialize 'allbits' vectors, avoiding an unnecessary GPR value and its insertion into a vector.
  llvm-svn: 294162
* [AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables. (Craig Topper, 2017-02-05; 1 file, -0/+12)
  llvm-svn: 294153
* [AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables. (Craig Topper, 2017-02-05; 1 file, -0/+24)
  llvm-svn: 294152
* [AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads. (Craig Topper, 2017-02-05; 1 file, -0/+24)
  llvm-svn: 294151
* Revamp llvm::once_flag to be closer to std::once_flag (Kamil Rytarowski, 2017-02-05; 1 file, -1/+1)
  Summary: Make this interface reusable similarly to the std::call_once and std::once_flag interface. This makes porting LLDB to NetBSD easier, as the original approach lacked a portable way to specify a non-static once_flag. With this change, translating std::once_flag to llvm::once_flag is mechanical.
  Sponsored by <The NetBSD Foundation>
  Reviewers: mehdi_amini, labath, joerg
  Reviewed By: mehdi_amini
  Subscribers: emaste, clayborg
  Differential Revision: https://reviews.llvm.org/D29566
  llvm-svn: 294143
* [X86] Fix printing of sha256rnds2 to include the implicit %xmm0 argument. (Craig Topper, 2017-02-05; 1 file, -6/+10)
  llvm-svn: 294132
* [X86] Fix printing of blendvpd/blendvps/pblendvb to include the implicit %xmm0 argument. (Craig Topper, 2017-02-05; 1 file, -14/+14)
  This makes codegen output more obvious about the %xmm0 usage.
  llvm-svn: 294131
* [X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. (Craig Topper, 2017-02-05; 1 file, -25/+13)
  This will be lowered by regular shuffle lowering to a PSHUFB later. Similar was already done for several other shuffles in this function.
  The test changes are because the old code used explicit zeroing for elements that could have been undef. While I was here, I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes; getVectorShuffle can create the UNDEF for us.
  llvm-svn: 294130
* [X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1). (Craig Topper, 2017-02-04; 1 file, -2/+25)
  llvm-svn: 294112
* [X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI (Craig Topper, 2017-02-04; 1 file, -19/+8)
  llvm-svn: 294111