summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.Manman Ren2016-03-181-0/+7
| | | | | | | Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions. llvm-svn: 263856
* [X86][SSE] Simplified blend-with-zero combiningSimon Pilgrim2016-03-171-14/+13
| | | | | | | | We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines This patch stops the combine if we already have a blend in place (means we miss some domain corrections) llvm-svn: 263717
* fix function names; NFCSanjay Patel2016-03-161-58/+60
| | | | llvm-svn: 263646
* AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for ↵Igor Breger2016-03-161-0/+5
| | | | | | | | 512bit vector because PCMPGT supported only for 128/256bit. Differential Revision: http://reviews.llvm.org/D18204 llvm-svn: 263624
* Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND onEric Christopher2016-03-141-3/+1
| | | | | | | | | pre-SSE41 hardware" as it seems to be causing crashes during code generation in halide. PR forthcoming. This reverts commit r263303. llvm-svn: 263512
* [DAG] use !isUndef() ; NFCISanjay Patel2016-03-141-23/+19
| | | | llvm-svn: 263453
* [DAG] use isUndef() ; NFCISanjay Patel2016-03-141-49/+39
| | | | llvm-svn: 263448
* [x86, AVX] replace masked load with full vector load when possibleSanjay Patel2016-03-141-7/+25
| | | | | | | | | | | | | Converting masked vector loads to regular vector loads for x86 AVX should always be a win. I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any objections. 1. x86 already does this kind of optimization for multiple scalar loads -> vector load. 2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner. Differential Revision: http://reviews.llvm.org/D18094 llvm-svn: 263446
* AVX512: icmp operation should be always lowered to CMPM (AVX-512) ↵Igor Breger2016-03-141-22/+23
| | | | | | | | | | instruction on SKX. implemented by delena Differential Revision: http://reviews.llvm.org/D18054 llvm-svn: 263417
* [X86][SSE41] Avoid variable blend for constant v8i16 shiftsSimon Pilgrim2016-03-131-2/+7
| | | | | | The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if it is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering. llvm-svn: 263383
* [X86] Remove many operands that represent memory stores from outs to ins. ↵Craig Topper2016-03-136-34/+34
| | | | | | These operands are the registers and immediates that specify the memory address not the memory itself thus they are inputs. llvm-svn: 263354
* [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.Quentin Colombet2016-03-125-12/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmpxchg[8|16]b uses RBX as one of its argument. In other words, using this instruction clobbers RBX as it is defined to hold one the input. When the backend uses dynamically allocated stack, RBX is used as a reserved register for the base pointer. Reserved registers have special semantic that only the target understands and enforces, because of that, the register allocator don’t use them, but also, don’t try to make sure they are used properly (remember it does not know how they are supposed to be used). Therefore, when RBX is used as a reserved register but defined by something that is not compatible with that use, the register allocator will not fix the surrounding code to make sure it gets saved and restored properly around the broken code. This is the responsibility of the target to do the right thing with its reserved register. To fix that, when the base pointer needs to be preserved, we use a different pseudo instruction for cmpxchg that save rbx. That pseudo takes two more arguments than the regular instruction: - One is the value to be copied into RBX to set the proper value for the comparison. - The other is the virtual register holding the save of the value of RBX as the base pointer. This saving is done as part of isel (i.e., we emit a copy from rbx). cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp This gets expanded into: rbx = copy input_for_rbx_reg cmpxchg <regular cmpxchg args> rbx = save_of_rbx_as_bp Note: The actual modeling of the pseudo is a bit more complicated to make sure the interferes that appears after the pseudo gets expanded are properly modeled before that expansion. This fixes PR26883. llvm-svn: 263325
* [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardwareSimon Pilgrim2016-03-111-1/+3
| | | | | | | | | | | | Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT. Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718). Differential Revision: http://reviews.llvm.org/D17932 llvm-svn: 263303
* Fix spelling.Simon Pilgrim2016-03-111-1/+1
| | | | llvm-svn: 263266
* [X86][AVX] Fixed issue where a long chain of shuffles could attempt to ↵Simon Pilgrim2016-03-111-1/+4
| | | | | | | | combine to a single (illegal) PSHUFB instruction. Its not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for 256/512 bit vector cases. llvm-svn: 263239
* [x86] don't use a shuffle when a vselect will do; NFCISanjay Patel2016-03-101-16/+5
| | | | | | | | Looking at the IR definition of a masked load made me realize there was no reason to use a shuffle here, so we don't need to convert the format of the mask at all. llvm-svn: 263167
* [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ↵Simon Pilgrim2016-03-101-9/+24
| | | | | | | | | | | | ZERO_EXTEND_VECTOR_INREG Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns. Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion). Differential Revision: http://reviews.llvm.org/D17691 llvm-svn: 263159
* [X86] Correctly select registers to pop into for x86_64Michael Kuperstein2016-03-101-1/+2
| | | | | | | | | | | | | | When trying to replace an add to esp with pops, we need to choose dead registers to pop into. Registers clobbered by the call and not imp-def'd by it should be safe. Except that it's not enough to check the register itself isn't defined, we also need to make sure no overlapping registers are defined either. This fixes PR26711. Differential Revision: http://reviews.llvm.org/D18029 llvm-svn: 263139
* Unified the handling of returns in the X87 stackifier so that the stackifierDavid L Kreitzer2016-03-101-90/+93
| | | | | | | | runs successfully on routines containing IRETs. This fixes PR26410. Differential Revision: http://reviews.llvm.org/D17643 llvm-svn: 263120
* AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512)Elena Demikhovsky2016-03-101-23/+27
| | | | | | | | (failed on instruction selection phase) Differential Revision: http://reviews.llvm.org/D17924 llvm-svn: 263111
* [X86][AVX] Improve target shuffle combining of BLEND+zeroSimon Pilgrim2016-03-101-1/+2
| | | | | | | | The BLEND+zero combine was failing to combine equivalent BLEND masks. Follow up to D17483 and D17858 llvm-svn: 263105
* [X86][SSE] Basic combining of unary target shuffles of binary target shuffles.Simon Pilgrim2016-03-101-12/+18
| | | | | | | | | | | | This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask. This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining. There is a lot more work before we can properly support binary target shuffle masks but this was an easy case to add support for. Differential Revision: http://reviews.llvm.org/D17858 llvm-svn: 263102
* AVX-512: Fixed a bug in shuffle for v64i8 typeElena Demikhovsky2016-03-101-0/+2
| | | | | | | | Operation SCALAR_TO_VECTOR for v64i8 and v32i16 should be lowered if BW feature is "on". Differential Revision: http://reviews.llvm.org/D17994 llvm-svn: 263097
* [x86] fix cost model inaccuracy for vector memory opsSanjay Patel2016-03-091-4/+4
| | | | | | | | | | | The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge. Differential Revision: http://reviews.llvm.org/D18000 llvm-svn: 263069
* [x86, AVX] optimize masked loads with constant masksSanjay Patel2016-03-091-2/+44
| | | | | | | | Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper. Differential Revision: http://reviews.llvm.org/D17899 llvm-svn: 263067
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.Chad Rosier2016-03-092-2/+2
| | | | | | http://reviews.llvm.org/D17967 llvm-svn: 263021
* Revert r262759 and r262760.Quentin Colombet2016-03-081-9/+0
| | | | | | | | The fix consisting in using the library call for atomic compare and swap when the instruction is not safe to use may be incorrect. Indeed the library call may not exist on all platform. In other words, we need a better fix! llvm-svn: 262943
* Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ↵Hans Wennborg2016-03-081-24/+9
| | | | | | | | ZERO_EXTEND_VECTOR_INREG" This caused PR26870. llvm-svn: 262935
* AVX512: Add extract_subvector patterns v8i1->v4i1 , v4i1->v2i1.Igor Breger2016-03-081-0/+8
| | | | | | Differential Revision: http://reviews.llvm.org/D17953 llvm-svn: 262929
* [ms-inline-asm][AVX512] Add ability to use k registers in MS inline asm + ↵Marina Yatsina2016-03-071-1/+11
| | | | | | | | | | | | | | | | | | | | | | fix bag with curly braces Until now curly braces could only be used in MS inline assembly to mark block start/end. All curly braces were removed completely at a very early stage. This approach caused bugs like: "m{o}v eax, ebx" turned into "mov eax, ebx" without any error. In addition, AVX-512 added special operands (e.g., k registers), which are also surrounded by curly braces that mark them as such. Now, we need to keep the curly braces and identify at a later stage if they are marking block start/end (if so, ignore them), or surrounding special AVX-512 operands (if so, parse them as such). This patch fixes the bug described above and enables the use of AVX-512 special operands. This commit is the the llvm part of the patch. The clang part of the review is: http://reviews.llvm.org/D17766 The llvm part of the review is: http://reviews.llvm.org/D17767 Differential Revision: http://reviews.llvm.org/D17767 llvm-svn: 262843
* [X86][AVX512] Fixed VPERMT2* shuffle mask decoding and enabled target ↵Simon Pilgrim2016-03-063-11/+9
| | | | | | | | | | | | | | shuffle combining. Patch to add support for target shuffle combining of X86ISD::VPERMV3 nodes, including support for detecting unary shuffles. This uncovered several issues with the X86ISD::VPERMV3 shuffle mask decoding of non-64 bit shuffle mask elements - the bit masking wasn't being correctly computed. Removed non-constant pool mask decode path as we have no way of testing it right now. Differential Revision: http://reviews.llvm.org/D17916 llvm-svn: 262809
* AVX512BW: Support llvm intrinsic masked vector load/store for i8/i16 element ↵Igor Breger2016-03-062-37/+56
| | | | | | | | types on SKX Differential Revision: http://reviews.llvm.org/D17913 llvm-svn: 262803
* [X86] Use high bits of return value from getEncoding instead of predicate ↵Craig Topper2016-03-061-162/+101
| | | | | | functions to populate the REX and VEX prefix bits that extend register encodings. NFC llvm-svn: 262800
* [X86] Remove unnecessary masking. The assert above it already guaranteed it. NFCCraig Topper2016-03-061-2/+0
| | | | llvm-svn: 262799
* [X86] Use uint8_t instead of unsigned char as it shortens the code and more ↵Craig Topper2016-03-061-27/+26
| | | | | | explicitly reflects the desired size. llvm-svn: 262798
* AVX512: Remove VSHRI kmask patterns from TD file. It is incorrect to use ↵Igor Breger2016-03-062-87/+80
| | | | | | | | kshiftw to implement VSHRI v4i1 , bits 15-4 is undef so the upper bits of v4i1 may not be zeroed. v4i1 should be zero_extend to v16i1 ( or any natively supported vector). Differential Revision: http://reviews.llvm.org/D17763 llvm-svn: 262797
* [X86][AVX] Improved VPERMILPS variable shuffle mask decoding.Simon Pilgrim2016-03-053-1/+43
| | | | | | | | | | Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool. Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization Followup to D17681 llvm-svn: 262784
* [X86] AMD Bobcat CPU (btver1) doesn't support XSAVE Simon Pilgrim2016-03-051-1/+0
| | | | | | | | btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE. Differential Revision: http://reviews.llvm.org/D17683 llvm-svn: 262782
* [X86] Fix the lowering of setjmp intrinsic on i386.Quentin Colombet2016-03-051-0/+10
| | | | | | | | | | When the lowering of the setjmp intrinsic requires a global base pointer to be set, make sure such pointer gets defined by the CGBR pass. This fixes PR26742. llvm-svn: 262762
* [X86] Do not use cmpxchgXXb when we need the base pointer (RBX).Quentin Colombet2016-03-041-0/+9
| | | | | | | | | | | | cmpxchgXXb uses RBX as one of its implicit argument. I.e., when we use that instruction we need to clobber RBX. This is generally fine, expect when RBX is a reserved register because in that case, the register allocator will not track its value and will not save and restore it when interferences occur. rdar://problem/24851412 llvm-svn: 262759
* Fix build breakageDavid Majnemer2016-03-041-4/+5
| | | | llvm-svn: 262756
* [X86] Support cleaning more than 2**16 bytes of stackDavid Majnemer2016-03-046-8/+35
| | | | | | | | | | | | | | | | | | | The x86 ret instruction has a 16 bit immediate indicating how many bytes to pop off of the stack beyond the return address. There is a problem when extremely large structs are passed by value: we might not be able to fit the number of bytes to pop into the return instruction. To fix this, expand RET_FLAG a little later and use a special sequence to clean the stack: pop %ecx ; return address is now in %ecx add $n, %esp ; clean the stack push %ecx ; bring the return address back on the stack ret ; pop the return address and jmp to it's value llvm-svn: 262755
* Make headers self-contained again.Benjamin Kramer2016-03-041-0/+1
| | | | llvm-svn: 262702
* [X86][AVX512BW] Fixed 512-bit PSHUFB shuffle mask decode and added combine test.Simon Pilgrim2016-03-031-3/+3
| | | | | | PSHUFB decoder was assuming that input was 128 or 256-bit vector only. llvm-svn: 262661
* [X86][AVX] Better support for the variable mask form of VPERMILPD/VPERMILPSSimon Pilgrim2016-03-033-19/+35
| | | | | | | | | | The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic. This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle. Differential Revision: http://reviews.llvm.org/D17681 llvm-svn: 262635
* [X86] Tidied up 256-bit -> 2 x 128-bit vector shift extraction.Simon Pilgrim2016-03-031-14/+2
| | | | | | lowerShift was manually splitting BUILD_VECTOR cases when it could just call Extract128BitVector which does this anyway. llvm-svn: 262633
* [X86] Pulled out repeated code testing for constant vector shift amount. NFCI.Simon Pilgrim2016-03-031-8/+6
| | | | llvm-svn: 262631
* MCU target has its own ABI, however X86 interrupt handler calling convention ↵Amjad Aboud2016-03-031-1/+3
| | | | | | | | | | overrides this ABI. Fixed the ordering to check first for X86 interrupt handler then for MCU target. Differential Revision: http://reviews.llvm.org/D17801 llvm-svn: 262628
* [X86] Don't assume that shuffle non-mask operands starts at #0.Ahmed Bougacha2016-03-032-32/+68
| | | | | | | | | | | | | | | | | | | | | | | | | That's not the case for VPERMV/VPERMV3, which cover all possible combinations (the C intrinsics use a different order; the AVX vs AVX512 intrinsics are different still). Since: r246981 AVX-512: Lowering for 512-bit vector shuffles. VPERMV is recognized in getTargetShuffleMask. This breaks assumptions in most callers, as they expect the non-mask operands to start at index 0. VPERMV has the mask as operand #0; VPERMV3 has it in the middle. Instead of the faulty assumption, have getTargetShuffleMask return its operands as well. One alternative we considered was to change the operand order of VPERMV, but we agreed to stick to the instruction order, as there are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular). Differential Revision: http://reviews.llvm.org/D17041 llvm-svn: 262627
* AVX512: Combine AND + TESTM instructions .Igor Breger2016-03-033-8/+25
| | | | | | Differential Revision: http://reviews.llvm.org/D17844 llvm-svn: 262621
OpenPOWER on IntegriCloud