summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* [x86] don't use a shuffle when a vselect will do; NFCISanjay Patel2016-03-101-16/+5
| | | | | | | | Looking at the IR definition of a masked load made me realize there was no reason to use a shuffle here, so we don't need to convert the format of the mask at all. llvm-svn: 263167
* [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ↵Simon Pilgrim2016-03-101-9/+24
| | | | | | | | | | | | ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159
* [X86] Correctly select registers to pop into for x86_64Michael Kuperstein2016-03-101-1/+2
| | | | | | | | | | | | | | When trying to replace an add to esp with pops, we need to choose dead registers to pop into. Registers clobbered by the call and not imp-def'd by it should be safe. Except that it's not enough to check the register itself isn't defined, we also need to make sure no overlapping registers are defined either. This fixes PR26711. Differential Revision: http://reviews.llvm.org/D18029 llvm-svn: 263139
* Unified the handling of returns in the X87 stackifier so that the stackifierDavid L Kreitzer2016-03-101-90/+93
| | | | | | | | runs successfully on routines containing IRETs. This fixes PR26410. Differential Revision: http://reviews.llvm.org/D17643 llvm-svn: 263120
* AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)Elena Demikhovsky2016-03-101-23/+27
| | | | | | | | (failed on instruction selection phase) Differential Revision: http://reviews.llvm.org/D17924 llvm-svn: 263111
* [X86][AVX] Improve target shuffle combining of BLEND+zeroSimon Pilgrim2016-03-101-1/+2
| | | | | | | | The BLEND+zero combine was failing to combine equivalent BLEND masks. Follow up to D17483 and D17858 llvm-svn: 263105
* [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.Simon Pilgrim2016-03-101-12/+18
| | | | | | | | | | | | This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask. This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining. There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for. Differential Revision: http://reviews.llvm.org/D17858 llvm-svn: 263102
* AVX-512: Fixed a bug in shuffle for v64i8 typeElena Demikhovsky2016-03-101-0/+2
| | | | | | | | Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on". Differential Revision: http://reviews.llvm.org/D17994 llvm-svn: 263097
* [x86] fix cost model inaccuracy for vector memory opsSanjay Patel2016-03-091-4/+4
| | | | | | | | | | | The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069
* [x86, AVX] optimize masked loads with constant masksSanjay Patel2016-03-091-2/+44
| | | | | | | | Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper. Differential Revision: http://reviews.llvm.org/D17899 llvm-svn: 263067
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.Chad Rosier2016-03-092-2/+2
| | | | | | http://reviews.llvm.org/D17967 llvm-svn: 263021
* Revert r262759 and r262760.Quentin Colombet2016-03-081-9/+0
| | | | | | | | The fix consisting in using the library call for atomic compare and swap when the instruction is not safe to use may be incorrect. Indeed the library call may not exist on all platform. In other words, we need a better fix! llvm-svn: 262943
* Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ↵Hans Wennborg2016-03-081-24/+9
| | | | | | | | ZERO_EXTEND_VECTOR_INREG" This caused PR26870. llvm-svn: 262935
* AVX512: Add extract_subvector patterns v8i1->v4i1 , v4i1->v2i1.Igor Breger2016-03-081-0/+8
| | | | | | Differential Revision: http://reviews.llvm.org/D17953 llvm-svn: 262929
* [ms-inline-asm][AVX512] Add ability to use k registers in MS inline asm + ↵Marina Yatsina2016-03-071-1/+11
| | | | | | | | | | | | | | | | | | | | | | fix bag with curly braces Until now curly braces could only be used in MS inline assembly to mark block start/end. All curly braces were removed completely at a very early stage. This approach caused bugs like: "m{o}v eax, ebx" turned into "mov eax, ebx" without any error. In addition, AVX-512 added special operands (e.g., k registers), which are also surrounded by curly braces that mark them as such. Now, we need to keep the curly braces and identify at a later stage if they are marking block start/end (if so, ignore them), or surrounding special AVX-512 operands (if so, parse them as such). This patch fixes the bug described above and enables the use of AVX-512 special operands. This commit is the the llvm part of the patch. The clang part of the review is: http://reviews.llvm.org/D17766 The llvm part of the review is: http://reviews.llvm.org/D17767 Differential Revision: http://reviews.llvm.org/D17767 llvm-svn: 262843
* [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target ↵Simon Pilgrim2016-03-063-11/+9
| | | | | | | | | | | | | | shuffle combining. Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles. This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed. Removed non-constant pool mask decode path as we have no way of testing it right now. Differential Revision: http://reviews.llvm.org/D17916 llvm-svn: 262809
* AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element ↵Igor Breger2016-03-062-37/+56
| | | | | | | | types on SKX Differential Revision: http://reviews.llvm.org/D17913 llvm-svn: 262803
* [X86] Use high bits of return value from getEncoding instead of predicate ↵Craig Topper2016-03-061-162/+101
| | | | | | functions to populate the REX and VEX prefix bits that extend register encodings. NFC llvm-svn: 262800
* [X86] Remove unnecessary masking. The assert above it already guaranteed it. NFCCraig Topper2016-03-061-2/+0
| | | | llvm-svn: 262799
* [X86] Use uint8_t instead of unsigned char as it shortens the code and more ↵Craig Topper2016-03-061-27/+26
| | | | | | explicitly reflects the desired size. llvm-svn: 262798
* AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use ↵Igor Breger2016-03-062-87/+80
| | | | | | | | kshiftw to implement VSHRI v4i1 , bits 15-4 is undef so the upper bits of v4i1 may not be zeroed. v4i1 should be zero_extend to v16i1 ( or any natively supported vector). Differential Revision: http://reviews.llvm.org/D17763 llvm-svn: 262797
* [X86][AVX] Improved VPERMILPS variable shuffle mask decoding.Simon Pilgrim2016-03-053-1/+43
| | | | | | | | | | Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool. Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization Followup to D17681 llvm-svn: 262784
* [X86] AMD Bobcat CPU (btver1) doesn't support XSAVE Simon Pilgrim2016-03-051-1/+0
| | | | | | | | btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE. Differential Revision: http://reviews.llvm.org/D17683 llvm-svn: 262782
* [X86] Fix the lowering of setjmp intrinsic on i386.Quentin Colombet2016-03-051-0/+10
| | | | | | | | | | When the lowering of the setjmp intrinsic requires a global base pointer to be set, make sure such pointer gets defined by the CGBR pass. This fixes PR26742. llvm-svn: 262762
* [X86] Do not use cmpxchgXXb when we need the base pointer (RBX).Quentin Colombet2016-03-041-0/+9
| | | | | | | | | | | | cmpxchgXXb uses RBX as one of its implicit argument. I.e., when we use that instruction we need to clobber RBX. This is generally fine, expect when RBX is a reserved register because in that case, the register allocator will not track its value and will not save and restore it when interferences occur. rdar://problem/24851412 llvm-svn: 262759
* Fix build breakageDavid Majnemer2016-03-041-4/+5
| | | | llvm-svn: 262756
* [X86] Support cleaning more than 2**16 bytes of stackDavid Majnemer2016-03-046-8/+35
| | | | | | | | | | | | | | | | | | | The x86 ret instruction has a 16 bit immediate indicating how many bytes to pop off of the stack beyond the return address. There is a problem when extremely large structs are passed by value: we might not be able to fit the number of bytes to pop into the return instruction. To fix this, expand RET_FLAG a little later and use a special sequence to clean the stack: pop %ecx ; return address is now in %ecx add $n, %esp ; clean the stack push %ecx ; bring the return address back on the stack ret ; pop the return address and jmp to it's value llvm-svn: 262755
* Make headers self-contained again.Benjamin Kramer2016-03-041-0/+1
| | | | llvm-svn: 262702
* [X86][AVX512BW] Fixed 512-bit PSHUFB shuffle mask decode and added combine test.Simon Pilgrim2016-03-031-3/+3
| | | | | | PSHUFB decoder was assuming that input was 128 or 256-bit vector only. llvm-svn: 262661
* [X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPSSimon Pilgrim2016-03-033-19/+35
| | | | | | | | | | The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic. This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle. Differential Revision: http://reviews.llvm.org/D17681 llvm-svn: 262635
* [X86] Tidied up 256-bit -> 2 x 128-bit vector shift extraction.Simon Pilgrim2016-03-031-14/+2
| | | | | | lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway. llvm-svn: 262633
* [X86] Pulled out repeated code testing for constant vector shift amount. NFCI.Simon Pilgrim2016-03-031-8/+6
| | | | llvm-svn: 262631
* MCU target has its own ABI, however X86 interrupt handler calling convention ↵Amjad Aboud2016-03-031-1/+3
| | | | | | | | | | overrides this ABI. Fixed the ordering to check first for X86 interrupt handler then for MCU target. Differential Revision: http://reviews.llvm.org/D17801 llvm-svn: 262628
* [X86] Don't assume that shuffle non-mask operands starts at #0.Ahmed Bougacha2016-03-032-32/+68
| | | | | | | | | | | | | | | | | | | | | | | | | That's not the case for VPERMV/VPERMV3, which cover all possible combinations (the C intrinsics use a different order; the AVX vs AVX512 intrinsics are different still). Since: r246981 AVX-512: Lowering for 512-bit vector shuffles. VPERMV is recognized in getTargetShuffleMask. This breaks assumptions in most callers, as they expect the non-mask operands to start at index 0. VPERMV has the mask as operand #0; VPERMV3 has it in the middle. Instead of the faulty assumption, have getTargetShuffleMask return its operands as well. One alternative we considered was to change the operand order of VPERMV, but we agreed to stick to the instruction order, as there are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular). Differential Revision: http://reviews.llvm.org/D17041 llvm-svn: 262627
* AVX512: Combine AND + TESTM instructions .Igor Breger2016-03-033-8/+25
| | | | | | Differential Revision: http://reviews.llvm.org/D17844 llvm-svn: 262621
* [X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREGSimon Pilgrim2016-03-031-9/+24
| | | | | | | | Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 262599
* [LLVM][AVX512] PSRLWI Chnage imm8 to intMichael Zuckerman2016-03-031-3/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D17753 llvm-svn: 262592
* [X86] Enable forwarding bool arguments in tail calls (PR26305)Hans Wennborg2016-03-031-0/+20
| | | | | | | | | The code was previously not able to track a boolean argument at a call site back to the formal argument of the caller. Differential Revision: http://reviews.llvm.org/D17786 llvm-svn: 262575
* [X86] Don't give catch objects a displacement of zeroDavid Majnemer2016-03-033-1/+23
| | | | | | | | | | | | | | | | | | Catch objects with a displacement of zero do not initialize a catch object. The displacement is relative to %rsp at the end of the function's prologue for x86_64 targets. If we place an object at the top-of-stack, we will end up wit a displacement of zero resulting in our catch object remaining uninitialized. Address this by creating our catch objects as fixed objects. We will ensure that the UnwindHelp object is created after the catch objects so that no catch object will have a displacement of zero. Differential Revision: http://reviews.llvm.org/D17823 llvm-svn: 262546
* Revert "[X86] Elide references to _chkstk for dynamic allocas"Reid Kleckner2016-03-021-29/+9
| | | | | | | | | | | | | | | This reverts commit r262370. It turns out there is code out there that does sequences of allocas greater than 4K: http://crbug.com/591404 The goal of this change was to improve the code size of inalloca call sequences, but we got tangled up in the mess of dynamic allocas. Instead, we should come back later with a separate MI pass that uses dominance to optimize the full sequence. This should also be able to remove the often unneeded stacksave/stackrestore pairs around the call. llvm-svn: 262505
* [LLVM][AVX512]PSRAWI Change imm8 to int.Michael Zuckerman2016-03-021-9/+9
| | | | | | Differential Revision: http://reviews.llvm.org/D17705 llvm-svn: 262480
* [X86][SSE] Lower 128-bit MOVDDUP with existing VBROADCAST mechanismsSimon Pilgrim2016-03-021-39/+51
| | | | | | | | | | | | We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of. This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well. It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts. Differential Revision: http://reviews.llvm.org/D17680 llvm-svn: 262478
* [X86] Remove unnecessary call to isReg from emitter's DestMem handling for ↵Craig Topper2016-03-021-7/+5
| | | | | | VEX prefix. The operand is always a register. NFC llvm-svn: 262468
* [X86] Make X86MCCodeEmitter::DetermineREXPrefix locate operands more like ↵Craig Topper2016-03-021-54/+50
| | | | | | how VEX prefix handling does. llvm-svn: 262467
* [X86] Permit reading of the FLAGS register without it being previously definedDavid Majnemer2016-03-022-3/+8
| | | | | | | | | | | We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS. While technically correct, this is not be desirable for folks who want to examine aspects of the FLAGS register which are not related to computation like whether or not CPUID is a valid instruction. Differential Revision: http://reviews.llvm.org/D17782 llvm-svn: 262465
* [X86] Remove assertion I accidentally left in.Craig Topper2016-03-021-1/+0
| | | | llvm-svn: 262464
* [X86] Be more structured about how we capture the register number when it is ↵Craig Topper2016-03-021-41/+39
| | | | | | | | | | encoded in bits 7:4 of the immediate. For some instructions the register is not the last operand and the immediate handling had to detect this and hardcode the index to find it. It also required CurOp to be pointing at the last operand handled in the Form switch whereas for any instruction it would be pointing at the next operand. Now we just capture the value in the Form switch when we know exactly where it is and the CurOp pointer can behave normally. llvm-svn: 262462
* [X86] Use MCPhysReg and uint16_t for static arrays of registers and opcodes ↵Craig Topper2016-03-025-16/+16
| | | | | | respectively should reduce size tiny bit. NFC llvm-svn: 262458
* [x86] use getBitcast()Sanjay Patel2016-03-011-20/+20
| | | | | | | | This isn't quite NFC because some of the SDLocs may change which could cause scheduling differences. But no regression tests are affected and there is no functional change intended. llvm-svn: 262391
* TableGen: Check scheduling models for completenessMatthias Braun2016-03-012-0/+2
| | | | | | | | | | | | | | | | | | | | | | TableGen checks at compiletime that for scheduling models with "CompleteModel = 1" one of the following holds: - Is marked with the hasNoSchedulingInfo flag - The instruction is a subclass of Sched - There are InstRW definitions in the scheduling model Typical steps necessary to complete a model: - Ensure all pseudo instructions that are expanded before machine scheduling (usually everything handled with EmitYYY() functions in XXXTargetLowering). - If a CPU does not support some instructions mark the corresponding resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }". - Add missing scheduling information. Differential Revision: http://reviews.llvm.org/D17747 llvm-svn: 262384
OpenPOWER on IntegriCloud