summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86ISelLowering.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* Spelling mistakes in comments. NFCI.Simon Pilgrim2017-03-291-5/+5
| | | | llvm-svn: 299000
* [X86][AVX2] Prevent unary interleaving patterns from calling ↵Simon Pilgrim2017-03-291-3/+4
| | | | | | lowerVectorShuffleAsSplitOrBlend (PR32453) llvm-svn: 298993
* [X86][MMX] Match MMX fp_to_sint conversions from XMM registersSimon Pilgrim2017-03-281-4/+13
| | | | | | | | | | We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack). This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc. Differential Revision: https://reviews.llvm.org/D30868 llvm-svn: 298943
* [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equalitySanjay Patel2017-03-281-1/+5
| | | | | | | Follow-up to: https://reviews.llvm.org/rL298775 llvm-svn: 298933
* [X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDWSimon Pilgrim2017-03-281-28/+47
| | | | llvm-svn: 298929
* [X86][SSE] Refactored shuffle BLEND combining to make future 16i16 support ↵Simon Pilgrim2017-03-281-34/+33
| | | | | | | | easier. NFCI. Call the matchVectorShuffleAsBlend test as early as possible. llvm-svn: 298925
* Fix signed/unsigned comparison warningSimon Pilgrim2017-03-281-2/+2
| | | | llvm-svn: 298917
* [X86][SSE] Begin merging vector shuffle to BLEND for lowering and combining.Simon Pilgrim2017-03-281-70/+82
| | | | | | Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining. llvm-svn: 298914
* [X86][SSE] Set second operand to undef instead of first operand in unary ↵Simon Pilgrim2017-03-281-1/+2
| | | | | | | | shuffle combines. Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier. llvm-svn: 298910
* Strip trailing whitespaceSimon Pilgrim2017-03-281-1/+1
| | | | llvm-svn: 298909
* [X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave ↵Gadi Haber2017-03-271-0/+33
| | | | | | | | | | | | | | | | | | | | | in AVX2 This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1. The patch includes a fix for the AVX2 case where vector unpack instructions use less operations than the vector blend operations available in AVX2. In this case using vector unpack instructions is more efficient. Reviewers: zvi delena igorb craig.topper guyblank eladcohen m_zuckerman aymanmus RKSimon llvm-svn: 298840
* [X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL ↵Simon Pilgrim2017-03-261-1/+26
| | | | | | instructions llvm-svn: 298806
* [X86] Pull out repeated ScalarValueSizeInBits code. NFCI.Simon Pilgrim2017-03-251-6/+4
| | | | llvm-svn: 298783
* [X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, ↵Simon Pilgrim2017-03-251-1/+9
| | | | | | | | | | (NumSignBits-1)) Part 3 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298782
* [X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAISimon Pilgrim2017-03-251-0/+9
| | | | | | | | Part 2 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298780
* [X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASKSimon Pilgrim2017-03-251-26/+22
| | | | | | | | | | | | Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask. Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm. Part 1 of 3. Differential Revision: https://reviews.llvm.org/D31347 llvm-svn: 298779
* [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equalitySanjay Patel2017-03-251-0/+16
| | | | | | | | | This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, we can replace memcmp calls with inline code that is both smaller and faster. Differential Revision: https://reviews.llvm.org/D31290 llvm-svn: 298775
* [X86][SSE] Generalised lowerTruncate by PACKSS to work with any 'zero/all ↵Simon Pilgrim2017-03-241-17/+19
| | | | | | | | | | bits' result, not just comparisons. Added vector compare opcodes to X86TargetLowering::ComputeNumSignBitsForTargetNode Covered by existing tests added for D22814. llvm-svn: 298704
* Remove the subtarget argument from LowerFP_TO_INT since there's oneEric Christopher2017-03-231-7/+3
| | | | | | stored on X86TargetLowering. llvm-svn: 298628
* Remove unused X86Subtarget argument from getOnesVector.Eric Christopher2017-03-231-7/+6
| | | | llvm-svn: 298627
* [X86][SSE] Extract elements from narrower shuffle masks.Simon Pilgrim2017-03-231-14/+21
| | | | | | Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle. llvm-svn: 298616
* [X86][SSE] Tidyup canWidenShuffleElements. NFCI.Simon Pilgrim2017-03-231-12/+13
| | | | | | Pull out mask elements at the start, allowing us to make the widening pattern matching more readable. llvm-svn: 298594
* [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instructionMichael Zuckerman2017-03-231-11/+5
| | | | | | | | | | | Up until now, vpmovm2 instruction described its destination operand size by the source operand size. This patch adds new pattern for the vpmovm2 instruction. The node describes new expansion of the destination (from {128|256} to 512). Differential Revision: https://reviews.llvm.org/D30654 llvm-svn: 298586
* Clean up some Subtarget uses and casts in the X86 backend, removing ↵Eric Christopher2017-03-221-11/+5
| | | | | | unnecessary work or calls. llvm-svn: 298555
* Rename AttributeSet to AttributeListReid Kleckner2017-03-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393
* [x86] use PMOVMSK for vector-sized equality comparisonsSanjay Patel2017-03-211-0/+44
| | | | | | | | | | We could do better by splitting any oversized type into whatever vector size the target supports, but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte structs, so I think we can wire that up with a TLI hook that feeds into this. Differential Revision: https://reviews.llvm.org/D31156 llvm-svn: 298376
* [Fuchsia] Use %gs for ABI slots under -mcmodel=kernelEvgeniy Stepanov2017-03-201-2/+2
| | | | | | | | | | | Make x86_64-fuchsia targets under -mcmodel=kernel use %gs rather than %fs to access ABI slots for stack-protector and safe-stack Patch by Roland McGrath. Differential Revision: https://reviews.llvm.org/D30870 llvm-svn: 298302
* [AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time ↵Craig Topper2017-03-191-0/+36
| | | | | | | | | | | | | | | | | | | instead of isel Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coallescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test break. But this patch should dodge the problem entirely. Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228
* [MIR] Support Customed Register Mask and CSRsOren Ben Simhon2017-03-191-2/+0
| | | | | | | | | | | | | The MIR printer dumps a string that describe the register mask of a function. A static predefined list of register masks matches a static list of strings. However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails. This patch adds support to custom register mask printing and dumping. Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic. As such this data needs to be dumped and parsed back to the Machine Register Info. Differential Revision: https://reviews.llvm.org/D30971 llvm-svn: 298207
* ExecutionDepsFix: Normalize names; NFCMatthias Braun2017-03-181-2/+2
| | | | | | | Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and ExecutionDepsFix to the last one. llvm-svn: 298183
* Make library calls sensitive to regparm module flag (Fixes PR3997).Nirav Dave2017-03-181-7/+41
| | | | | | | | | | Reviewers: mkuper, rnk Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D27050 llvm-svn: 298179
* Capitalize ArgListEntry fields. NFC.Nirav Dave2017-03-181-4/+4
| | | | llvm-svn: 298178
* [x86] clean up setcc with negated operand transform and add missing test; NFCISanjay Patel2017-03-171-14/+15
| | | | llvm-svn: 298118
* [x86] avoid adc/sbb assert when both sides of add are zexted (PR32316)Sanjay Patel2017-03-171-2/+6
| | | | | | | | | | | | | | As noted in the comment, we might want to account for this case, but I didn't look at what that would mean for the asm. I'm also not sure why this only reproduces with avx512, but I'm putting a conservative fix in for now to avoid the crash. Also, if both sides of an add are zexted, shouldn't we shrink that add? https://bugs.llvm.org/show_bug.cgi?id=32316 llvm-svn: 298107
* [X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputsSimon Pilgrim2017-03-151-4/+5
| | | | | | | | Turns out it can happen, so the assertion was too harsh Found during fuzz testing llvm-svn: 297833
* [SelectionDAG] Add a signed integer absolute ISD nodeSimon Pilgrim2017-03-141-1/+28
| | | | | | | | | | | | Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780
* Disable Callee Saved RegistersOren Ben Simhon2017-03-141-6/+61
| | | | | | | | | | | | | | Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller. Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list. The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee. The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee. Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span). The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments. The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC. Differential Revision: https://reviews.llvm.org/D28566 llvm-svn: 297715
* [X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input ↵Craig Topper2017-03-131-0/+32
| | | | | | | | source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652
* [AVX-512] If gather mask is all ones, force the input to a zero vector.Craig Topper2017-03-131-1/+4
| | | | | | | | We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too. Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today. llvm-svn: 297651
* [AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics.Craig Topper2017-03-121-2/+3
| | | | | | The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly. llvm-svn: 297591
* [x86] don't blindly transform SETB into SBBSanjay Patel2017-03-121-39/+50
| | | | | | | | | | | | | | | | | | | | I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result. This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch. Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files where this transform still fires. The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D30611 llvm-svn: 297586
* [X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41)Simon Pilgrim2017-03-111-1/+27
| | | | | | | | | | | | Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte. This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte. Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate? Differential Revision: https://reviews.llvm.org/D29841 llvm-svn: 297568
* [X86] Remove unnecessary commented out code. NFCCraig Topper2017-03-111-1/+0
| | | | llvm-svn: 297563
* [X86] Fix Wunused-lambda-capture warningSimon Pilgrim2017-03-101-2/+2
| | | | llvm-svn: 297521
* [APInt] Add APInt::insertBits() method to insert an APInt into a larger APIntSimon Pilgrim2017-03-101-4/+3
| | | | | | | | | | | | We currently have to insert bits via a temporary variable of the same size as the target with various shift/mask stages, resulting in further temporary variables, all of which require the allocation of memory for large APInts (MaskSizeInBits > 64). This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::insertBits() helper method which avoids the temporary memory allocation and masks/inserts the raw bits directly into the target. Differential Revision: https://reviews.llvm.org/D30780 llvm-svn: 297458
* [X86][SSE] combineX86ShufflesRecursively can handle shuffle masks up to 64 ↵Simon Pilgrim2017-03-081-8/+7
| | | | | | | | elements wide By defining the mask types as SmallVector<int, 16> we were causing a lot of unnecessary heap usage. llvm-svn: 297267
* [X86] Add option to specify preferable loop alignmentSanjoy Das2017-03-071-1/+9
| | | | | | | | | | | | | | | | | | | | | Summary: Loop alignment can cause a significant change of the perfromance for short loops. To be able to evaluate the impact of loop alignment this change introduces the new option x86-experimental-pref-loop-alignment. The alignment will be 2^Value bytes, the default value is 4. Patch by Serguei Katkov! Reviewers: craig.topper Reviewed By: craig.topper Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D30391 llvm-svn: 297178
* [X86] Fix arg copy elision for illegal typesReid Kleckner2017-03-061-37/+33
| | | | | | | | | | | Use the store size of the argument type, which will be a byte-sized quantity, rather than dividing the size in bits by 8. Fixes PR32136 and re-enables copy elision from i64 arguments. Reverts the workaround in from r296950. llvm-svn: 297045
* [X86] Silence GCC enum compare warning.Benjamin Kramer2017-03-051-2/+2
| | | | | | | | X86ISelLowering.cpp:26506:36: error: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' [-Werror=enum-compare] llvm-svn: 296986
* [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG opsSimon Pilgrim2017-03-051-34/+61
| | | | | | | | | | | | As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985
OpenPOWER on IntegriCloud