summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* [X86][AVX512] Adding new LLVM TableGen backend which generates the EVEX2VEX ↵Ayman Musa2017-03-073-1193/+8
| | | | | | | | | | | | compressing tables. X86EvexToVex machine instruction pass compresses EVEX encoded instructions by replacing them with their identical VEX encoded instructions when possible. It uses manually supported 2 large tables that map the EVEX instructions to their VEX ideticals. This TableGen backend replaces the tables by automatically generating them. Differential Revision: https://reviews.llvm.org/D30451 llvm-svn: 297127
* [X86][AVX512] Add missing entries to EVEX2VEX tablesAyman Musa2017-03-071-28/+61
| | | | | | | | evex2vex pass defines 2 tables which maps EVEX instructions to their VEX identical when possible. Adding all missing entries. Differential Revision: https://reviews.llvm.org/D30501 llvm-svn: 297126
* GlobalISel: restrict G_EXTRACT instruction to just one operand.Tim Northover2017-03-062-10/+18
| | | | | | | A bit more painful than G_INSERT because it was more widely used, but this should simplify the handling of extract operations in most locations. llvm-svn: 297100
* [Outliner] Fixed Asan bot failure in r296418Jessica Paquette2017-03-062-0/+99
| | | | | | | | Fixed the asan bot failure which led to the last commit of the outliner being reverted. The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector is no longer initialized using reserve but just a standard constructor. llvm-svn: 297081
* [X86] Fix arg copy elision for illegal typesReid Kleckner2017-03-061-37/+33
| | | | | | | | | | | Use the store size of the argument type, which will be a byte-sized quantity, rather than dividing the size in bits by 8. Fixes PR32136 and re-enables copy elision from i64 arguments. Reverts the workaround in from r296950. llvm-svn: 297045
* [X86] Silence GCC enum compare warning.Benjamin Kramer2017-03-051-2/+2
| | | | | | | | X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' [-Werror=enum-compare] llvm-svn: 296986
* [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG opsSimon Pilgrim2017-03-053-93/+120
| | | | | | | | | | | | As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985
* [x86] don't require a zext when forming ADC/SBBSanjay Patel2017-03-041-24/+29
| | | | | | | | | | | | | The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978
* [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)Sanjay Patel2017-03-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977
* [X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targetsSimon Pilgrim2017-03-041-5/+0
| | | | | | | | Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks. Differential Revision: https://reviews.llvm.org/D30213 llvm-svn: 296966
* X86ISelLowering: Only perform copy elision on legal types.Matthias Braun2017-03-041-33/+37
| | | | | | | | | This fixes cases where i1 types were not properly legalized yet and lead to the creating of 0-sized stack slots. This fixes http://llvm.org/PR32136 llvm-svn: 296950
* [x86] check for commuted add pattern to find ADC/SBBSanjay Patel2017-03-041-4/+11
| | | | llvm-svn: 296933
* [x86] refactor combineAddOrSubToADCOrSBB(); NFCISanjay Patel2017-03-031-21/+25
| | | | | | | | | | | | The comments were wrong, and this is not an obvious transform. This hopefully makes it clearer that we're missing the commuted patterns for adds. It's less clear that this is actually a good transform for all micro-arch. This is prep work for trying to clean up the current adc/sbb codegen because it's definitely not happening optimally. llvm-svn: 296918
* [x86] clean up materializeSBB(); NFCISanjay Patel2017-03-031-20/+14
| | | | | | This is producing SBB where it is obviously not necessary, so it needs to be limited. llvm-svn: 296894
* [x86] fix formatting; NFCSanjay Patel2017-03-031-3/+2
| | | | llvm-svn: 296875
* Use APInt::getHighBitsSet instead of APInt::getBitsSet for upper bit mask ↵Simon Pilgrim2017-03-031-1/+1
| | | | | | creation llvm-svn: 296874
* [X86] Generate VZEROUPPER for Skylake-avx512.Amjad Aboud2017-03-034-65/+77
| | | | | | | | VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859
* [GlobalISel][X86] Support float/double and vector types.Igor Breger2017-03-037-32/+306
| | | | | | | | | | | | | | Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector. Reviewers: delena, zvi Reviewed By: zvi Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30533 llvm-svn: 296856
* [X86][MMX] Fixed i32 extraction on 32-bit targetsSimon Pilgrim2017-03-021-0/+10
| | | | | | MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal llvm-svn: 296782
* Elide argument copies during instruction selectionReid Kleckner2017-03-011-20/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Avoids tons of prologue boilerplate when arguments are passed in memory and left in memory. This can happen in a debug build or in a release build when an argument alloca is escaped. This will dramatically affect the code size of x86 debug builds, because X86 fast isel doesn't handle arguments passed in memory at all. It only handles the x86_64 case of up to 6 basic register parameters. This is implemented by analyzing the entry block before ISel to identify copy elision candidates. A copy elision candidate is an argument that is used to fully initialize an alloca before any other possibly escaping uses of that alloca. If an argument is a copy elision candidate, we set a flag on the InputArg. If the the target generates loads from a fixed stack object that matches the size and alignment requirements of the alloca, the SelectionDAG builder will delete the stack object created for the alloca and replace it with the fixed stack object. The load is left behind to satisfy any remaining uses of the argument value. The store is now dead and is therefore elided. The fixed stack object is also marked as mutable, as it may now be modified by the user, and it would be invalid to rematerialize the initial load from it. Supersedes D28388 Fixes PR26328 Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D29668 llvm-svn: 296683
* [X86] Fix creating vreg def after use. Ayman Musa2017-03-011-5/+10
| | | | llvm-svn: 296601
* Revert r296474 - [globalisel] Change LLT constructor string into an LLT ↵Daniel Sanders2017-02-281-3/+2
| | | | | | | | subclass that knows how to generate it. There's a circular dependency that's only revealed when LLVM_ENABLE_MODULES=1. llvm-svn: 296478
* [globalisel] Change LLT constructor string into an LLT subclass that knows ↵Daniel Sanders2017-02-281-2/+3
| | | | | | | | | | | | | | | | | | how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 296474
* Revert "Add MIR-level outlining pass"Matthias Braun2017-02-282-101/+0
| | | | | | | | Revert Machine Outliner for now, as it breaks the asan bot. This reverts commit r296418. llvm-svn: 296426
* Add MIR-level outlining passMatthias Braun2017-02-282-0/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | This is a patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This is useful to people working in low-memory environments, where sacrificing performance for space is acceptable. This adds an interprocedural outliner directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test. The outliner is run like so: clang -mno-red-zone -mllvm -enable-machine-outliner file.c Patch by Jessica Paquette<jpaquette@apple.com>! rdar://29166825 Differential Revision: https://reviews.llvm.org/D26872 llvm-svn: 296418
* [X86][SSE] Attempt to extract vector elements through target shufflesSimon Pilgrim2017-02-271-0/+97
| | | | | | | | | | DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered. This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it. Differential Revision: https://reviews.llvm.org/D30176 llvm-svn: 296381
* [X86] Use APInt instead of SmallBitVector tracking undef elements from ↵Craig Topper2017-02-271-25/+25
| | | | | | | | | | | | | | | | | | | getTargetConstantBitsFromNode and getConstVector. Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30392 llvm-svn: 296355
* [X86] Use APInt instead of SmallBitVector for tracking Zeroable elements in ↵Craig Topper2017-02-271-63/+57
| | | | | | | | | | | | | | | | | | | shuffle lowering Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30390 llvm-svn: 296354
* [X86] Fix SmallVector sizes in constant pool shuffle decoding to avoid heap ↵Craig Topper2017-02-271-5/+5
| | | | | | | | | | allocation Some of the vectors are under sized to avoid heap allocation. In one case the vector was oversized. Differential Revision: https://reviews.llvm.org/D30387 llvm-svn: 296353
* [X86] Use APInt instead of SmallBitVector for tracking undef elements in ↵Craig Topper2017-02-271-10/+10
| | | | | | | | | | | | | | | | | | | constant pool shuffle decoding Summary: SmallBitVector uses a malloc for more than 58 bits on a 64-bit target and more than 27 bits on a 32-bit target. Some of the vector types we deal with here use more than those number of elements and therefore cause a malloc. APInt on the other hand supports up to 64 bits without a malloc. That's the maximum number of bits we need here so we can avoid a malloc for all cases by using APInt. This will incur a minor increase in stack usage due to APInt storing the bit count separately from the data bits unlike SmallBitVector, but that should be ok. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30386 llvm-svn: 296352
* [X86] Check for less than 0 rather than explicit compare with -1. NFCCraig Topper2017-02-271-2/+3
| | | | llvm-svn: 296321
* [X86] Fix execution domain for cmpss/sd instructions.Craig Topper2017-02-261-0/+8
| | | | llvm-svn: 296293
* [AVX-512] Fix execution domain for scalar commutable min/max instructions.Craig Topper2017-02-261-1/+1
| | | | llvm-svn: 296292
* [AVX-512] Fix execution domain for vmovhpd/lpd/hps/lps.Craig Topper2017-02-261-0/+1
| | | | llvm-svn: 296291
* [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.Craig Topper2017-02-261-0/+1
| | | | llvm-svn: 296290
* [AVX-512] Disable the redundant patterns in the VPBROADCASTBr_Alt and ↵Craig Topper2017-02-261-14/+16
| | | | | | VPBROADCASTWr_Alt instructions. NFC llvm-svn: 296289
* [AVX-512] Fix execution domain for VPMADD52 instructions.Craig Topper2017-02-261-0/+2
| | | | llvm-svn: 296288
* [AVX-512] Fix the execution domain for VSCALEF instructions.Craig Topper2017-02-261-0/+4
| | | | llvm-svn: 296286
* [AVX-512] Fix execution domain of scalar VRANGE/REDUCE/GETMANT with sae.Craig Topper2017-02-261-0/+1
| | | | llvm-svn: 296285
* [X86] Fix the execution domain for scalar SQRT intrinsic instruction.Craig Topper2017-02-261-2/+2
| | | | llvm-svn: 296284
* [APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied)Simon Pilgrim2017-02-252-8/+7
| | | | | | | | | | | | | | | | The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296272
* [AVX-512] Fix the execution domain for scalar FMA instructions.Craig Topper2017-02-251-1/+2
| | | | llvm-svn: 296271
* [AVX-512] Fix the execution domain on some instructions.Craig Topper2017-02-251-4/+13
| | | | llvm-svn: 296270
* [AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS ↵Craig Topper2017-02-251-24/+11
| | | | | | using the scalar register class. We only have patterns for the masked intrinsics. llvm-svn: 296264
* Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt ↵Simon Pilgrim2017-02-242-7/+8
| | | | | | | | | | | | | | | | | | subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296147
* [APInt] Add APInt::extractBits() method to extract APInt subrangeSimon Pilgrim2017-02-242-8/+7
| | | | | | | | | | | | | | | | The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296141
* [X86][SSE] Target shuffle combine can try to combine up to 16 vectorsSimon Pilgrim2017-02-241-6/+6
| | | | | | Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth). llvm-svn: 296130
* [x86] use DAG.getAllOnesConstant(); NFCISanjay Patel2017-02-241-18/+11
| | | | llvm-svn: 296128
* [APInt] Add APInt::setBits() method to set all bits in rangeSimon Pilgrim2017-02-242-9/+6
| | | | | | | | | | | | | | | | | | The current pattern for setting bits in range is typically: Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable. This is one of the key compile time issues identified in PR32037. This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible. I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial. Differential Revision: https://reviews.llvm.org/D30265 llvm-svn: 296102
* [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD ↵Craig Topper2017-02-245-26/+38
| | | | | | opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC llvm-svn: 296094
OpenPOWER on IntegriCloud