path: root/llvm/lib/Target/X86
Commit log: subject (author, date, files changed, lines +/-)
* [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors (Simon Pilgrim, 2016-03-26, 1 file, +124/-0)
  Currently this is mainly to prevent scalarization of integer division by constants.
  Differential Revision: http://reviews.llvm.org/D18307
  llvm-svn: 264511
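  A minimal LLVM IR sketch of what an unsigned v16i8 multiply-high computes (MULHU has no direct IR spelling; %x and %y are hypothetical names):
    %xw = zext <16 x i8> %x to <16 x i16>
    %yw = zext <16 x i8> %y to <16 x i16>
    %pw = mul <16 x i16> %xw, %yw                  ; full 16-bit products
    %hw = lshr <16 x i16> %pw, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
    %h  = trunc <16 x i16> %hw to <16 x i8>        ; keep the high 8 bits of each product
  Division by a constant is lowered to this multiply-high pattern plus shifts, which is why custom MULH lowering keeps it from scalarizing.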
* [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies (Simon Pilgrim, 2016-03-26, 1 file, +3/-2)
  Only pre-AVX512BW targets need to split v32i8 vectors.
  llvm-svn: 264509
* [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC. (Simon Pilgrim, 2016-03-26, 1 file, +5/-13)
  LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith, we simplify by using that here instead.
  llvm-svn: 264506
* [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr (David Majnemer, 2016-03-25, 1 file, +1/-1)
  We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI.
  This fixes PR27071.
  llvm-svn: 264465
* [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize (Hans Wennborg, 2016-03-25, 1 file, +12/-0)
  64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with an 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions.
  Differential Revision: http://reviews.llvm.org/D18374
  llvm-svn: 264440
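  A sketch of the size difference in AT&T syntax (standard ModRM encodings; exact bytes depend on the addressing mode):
    movq $0, (%rax)     # 48 C7 00 00 00 00 00   7 bytes
    movl $0, (%rax)     # C7 00 00 00 00 00      6 bytes
    andl $0, (%rax)     # 83 20 00               3 bytes, read-modify-write
    orl $-1, (%rax)     # 83 08 FF               3 bytes, read-modify-write
  The and/or forms read memory before writing it, which is why the commit gates them on minsize rather than enabling them everywhere.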
* [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerShift. NFC. (Simon Pilgrim, 2016-03-25, 1 file, +2/-20)
  LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith.
  llvm-svn: 264403
* fixed typo (Elena Demikhovsky, 2016-03-25, 1 file, +1/-1)
  llvm-svn: 264395
* X86: Use push-pop for materializing 8-bit immediates for minsize (take 2) (Hans Wennborg, 2016-03-25, 7 files, +93/-4)
  This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023).
  Differential Revision: http://reviews.llvm.org/D18246
  llvm-svn: 264375
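  A sketch of the trick (standard encodings, 64-bit mode):
    movl $42, %eax      # B8 2A 00 00 00   5 bytes
  becomes, at minsize:
    pushq $42           # 6A 2A            2 bytes (sign-extended imm8)
    popq %rax           # 58               1 byte
  The push stores below the old %rsp, which is exactly why the red-zone interaction (PR26023) had to be handled before re-landing.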
* [X86][XOP] Fixed instruction postfixes to more closely match operands (Simon Pilgrim, 2016-03-24, 2 files, +91/-91)
  Suggested by Sanjay in D18189, as the multiple folding options in XOP instructions can be tricky.
  llvm-svn: 264305
* AVX-512: Generate KTEST instead of TEST for i1 vectors (Elena Demikhovsky, 2016-03-24, 1 file, +27/-5)
  A KTEST instruction may be used instead of TEST in this case:
    %int_sel3 = bitcast <8 x i1> %sel3 to i8
    %res = icmp eq i8 %int_sel3, zeroinitializer
    br i1 %res, label %L2, label %L1
  Differential Revision: http://reviews.llvm.org/D18444
  llvm-svn: 264298
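  A hedged sketch of the selection difference (register choice illustrative; KTESTB itself requires AVX512DQ, which SKX has):
    # before: move the mask to a GPR and test it
    kmovb %k0, %eax
    testb %al, %al
    je .L2
    # after: test the mask register directly
    ktestb %k0, %k0     # ZF set if the 8-bit mask is all zero
    je .L2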
* [X86][XOP] Merged 128/256 bit 4op instruction definitions. NFCI. (Simon Pilgrim, 2016-03-24, 1 file, +14/-15)
  llvm-svn: 264294
* [X86][XOP] Support for VPPERM byte shuffle instruction (Simon Pilgrim, 2016-03-24, 5 files, +49/-3)
  This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.
  Differential Revision: http://reviews.llvm.org/D18189
  llvm-svn: 264260
* [PS4] Guarantee an instruction after a 'noreturn' call. (Paul Robinson, 2016-03-24, 1 file, +3/-1)
  We need the "return address" of a noreturn call to be within the bounds of the calling function; TrapUnreachable turns 'unreachable' into a 'ud2' instruction, which has that desired effect.
  Differential Revision: http://reviews.llvm.org/D18414
  llvm-svn: 264224
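  A sketch of the emitted code (symbol names hypothetical):
    foo:
      ...
      callq abort       # noreturn; the return address points just past this call
      ud2               # emitted for 'unreachable', keeping that address inside foo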
* Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. (Cong Hou, 2016-03-23, 2 files, +115/-74)
  Currently, AnalyzeBranch() fails on non-equality comparisons between floating-point values on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because the function can modify the branch by reversing the conditional jump and removing the unconditional jump if there is a proper fall-through. However, in the case of a non-equality floating-point comparison, this can turn the branch "unanalyzable". Consider the following case:
    jne .BB1
    jp .BB1
    jmp .BB2
  .BB1:
    ...
  .BB2:
    ...
  AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed:
    jne .BB1
    jnp .BB2
  .BB1:
    ...
  .BB2:
    ...
  However, AnalyzeBranch() cannot analyze this branch anymore, as there are two conditional jumps with different targets. This may disable some optimizations like block placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in the block-placement pass, which means we can remove it from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there are no defined negation conditions for them.
  In order to reverse them, this patch defines two new CondCodes, X86::COND_E_AND_NP and X86::COND_P_AND_NE, and defines how to synthesize instructions for them. Here only the second conditional jump is reversed; this is valid, as we only need them for this "unconditional jump removal" optimization.
  Differential Revision: http://reviews.llvm.org/D11393
  llvm-svn: 264199
* [x86] make peekThroughBitcasts() a helper function (Sanjay Patel, 2016-03-23, 1 file, +31/-60)
  This should be hoisted further up so it can be used in DAGCombiner and other backends, but I'm limiting the scope in the interest of patch minimalism. It's not quite NFC because some of the replaced code was using an 'if' check rather than a 'while' loop, so those cases would only look through a single bitcast.
  llvm-svn: 264186
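  A minimal C++ sketch of the helper's shape (the in-tree signature and placement may differ):
    // Walk through any number of bitcasts to the underlying value.
    // The loop (rather than a single 'if') is the behavioral fix noted above.
    static SDValue peekThroughBitcasts(SDValue V) {
      while (V.getOpcode() == ISD::BITCAST)
        V = V.getOperand(0);
      return V;
    }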
* [X86] Introduction of FeatureX87. (Andrey Turetskiy, 2016-03-23, 4 files, +100/-69)
  Add FeatureX87 to the X86 backend to be able to define CPUs which don't have x87.
  Differential Revision: http://reviews.llvm.org/D13979
  llvm-svn: 264148
* Typo (Joerg Sonnenberger, 2016-03-22, 1 file, +1/-1)
  llvm-svn: 264110
* [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware (Simon Pilgrim, 2016-03-22, 1 file, +3/-1)
  Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as this only properly helps for lowering to VSEXT/VZEXT.
  Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).
  Reapplied with a fix for PR26953 (missing vector widening legalization).
  Differential Revision: http://reviews.llvm.org/D17932
  llvm-svn: 264062
* [X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements (Simon Pilgrim, 2016-03-20, 1 file, +4/-4)
  Based on feedback for D14261.
  llvm-svn: 263911
* [X86][SSE] Detect zeroable shuffle elements from different value types (Simon Pilgrim, 2016-03-20, 1 file, +42/-8)
  Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes with different element sizes than the shuffle mask.
  Differential Revision: http://reviews.llvm.org/D14261
  llvm-svn: 263906
* AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR (Igor Breger, 2016-03-20, 1 file, +2/-0)
  Differential Revision: http://reviews.llvm.org/D18211
  llvm-svn: 263898
* Use a range-based for loop. NFC. (Michael Kuperstein, 2016-03-20, 1 file, +4/-4)
  llvm-svn: 263889
* [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched. (Manman Ren, 2016-03-18, 1 file, +7/-0)
  Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions.
  llvm-svn: 263856
* [X86][SSE] Simplified blend-with-zero combining (Simon Pilgrim, 2016-03-17, 1 file, +13/-14)
  We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in an endless loop of contrasting combines.
  This patch stops the combine if we already have a blend in place (meaning we miss some domain corrections).
  llvm-svn: 263717
* fix function names; NFC (Sanjay Patel, 2016-03-16, 1 file, +60/-58)
  llvm-svn: 263646
* AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (cmp result in k register) for 512-bit vectors, because PCMPGT is supported only for 128/256-bit. (Igor Breger, 2016-03-16, 1 file, +5/-0)
  Differential Revision: http://reviews.llvm.org/D18204
  llvm-svn: 263624
* Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware" (Eric Christopher, 2016-03-14, 1 file, +1/-3)
  Reverted as it seems to be causing crashes during code generation in halide. PR forthcoming.
  This reverts commit r263303.
  llvm-svn: 263512
* [DAG] use !isUndef() ; NFCI (Sanjay Patel, 2016-03-14, 1 file, +19/-23)
  llvm-svn: 263453
* [DAG] use isUndef() ; NFCI (Sanjay Patel, 2016-03-14, 1 file, +39/-49)
  llvm-svn: 263448
* [x86, AVX] replace masked load with full vector load when possible (Sanjay Patel, 2016-03-14, 1 file, +25/-7)
  Converting masked vector loads to regular vector loads for x86 AVX should always be a win. I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any objections.
  1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
  2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
  Differential Revision: http://reviews.llvm.org/D18094
  llvm-svn: 263446
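  A hedged IR sketch of the kind of rewrite described (intrinsic mangling is approximate for this era of LLVM; legal only when loading the whole vector is known safe):
    ; before: masked load of 8 floats
    %v = call <8 x float> @llvm.masked.load.v8f32(<8 x float>* %p, i32 4, <8 x i1> %m, <8 x float> %passthru)
    ; after: full load plus a select on the original mask
    %full = load <8 x float>, <8 x float>* %p, align 4
    %v2 = select <8 x i1> %m, <8 x float> %full, <8 x float> %passthru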
* AVX512: icmp operation should always be lowered to the CMPM (AVX-512) instruction on SKX (Igor Breger, 2016-03-14, 1 file, +23/-22)
  Implemented by delena.
  Differential Revision: http://reviews.llvm.org/D18054
  llvm-svn: 263417
* [X86][SSE41] Avoid variable blend for constant v8i16 shifts (Simon Pilgrim, 2016-03-13, 1 file, +7/-2)
  The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if the amount is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
  llvm-svn: 263383
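  A hedged sketch of the idea for the constant shift amounts <4,4,4,4,2,2,2,2> (illustrative; not necessarily the exact sequence the lowering picks):
    movdqa %xmm0, %xmm1
    psllw $4, %xmm1                 # whole vector shifted by 4
    psllw $2, %xmm0                 # whole vector shifted by 2
    pblendw $0x0F, %xmm1, %xmm0     # words 0-3 from the shift-by-4 copy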
* [X86] Remove many operands that represent memory stores from outs to ins (Craig Topper, 2016-03-13, 6 files, +34/-34)
  These operands are the registers and immediates that specify the memory address, not the memory itself; thus they are inputs.
  llvm-svn: 263354
* [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer. (Quentin Colombet, 2016-03-12, 5 files, +147/-12)
  cmpxchg[8|16]b uses RBX as one of its arguments. In other words, using this instruction clobbers RBX, as it is defined to hold one of the inputs.
  When the backend uses a dynamically allocated stack, RBX is used as a reserved register for the base pointer. Reserved registers have special semantics that only the target understands and enforces; because of that, the register allocator doesn't use them, but also doesn't try to make sure they are used properly (remember, it does not know how they are supposed to be used). Therefore, when RBX is used as a reserved register but defined by something that is not compatible with that use, the register allocator will not fix the surrounding code to make sure it gets saved and restored properly around the broken code. It is the responsibility of the target to do the right thing with its reserved register.
  To fix that, when the base pointer needs to be preserved, we use a different pseudo instruction for cmpxchg that saves RBX. That pseudo takes two more arguments than the regular instruction:
  - One is the value to be copied into RBX to set the proper value for the comparison.
  - The other is the virtual register holding the save of the value of RBX as the base pointer. This saving is done as part of isel (i.e., we emit a copy from rbx).
    cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp
  This gets expanded into:
    rbx = copy input_for_rbx_reg
    cmpxchg <regular cmpxchg args>
    rbx = save_of_rbx_as_bp
  Note: The actual modeling of the pseudo is a bit more complicated, to make sure the interferences that appear after the pseudo gets expanded are properly modeled before that expansion.
  This fixes PR26883.
  llvm-svn: 263325
* [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware (Simon Pilgrim, 2016-03-11, 1 file, +3/-1)
  Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as this only properly helps for lowering to VSEXT/VZEXT.
  Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).
  Differential Revision: http://reviews.llvm.org/D17932
  llvm-svn: 263303
* Fix spelling. (Simon Pilgrim, 2016-03-11, 1 file, +1/-1)
  llvm-svn: 263266
* [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction (Simon Pilgrim, 2016-03-11, 1 file, +4/-1)
  It's not enough that we test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.
  llvm-svn: 263239
* [x86] don't use a shuffle when a vselect will do; NFCI (Sanjay Patel, 2016-03-10, 1 file, +5/-16)
  Looking at the IR definition of a masked load made me realize there was no reason to use a shuffle here, so we don't need to convert the format of the mask at all.
  llvm-svn: 263167
* [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG (Simon Pilgrim, 2016-03-10, 1 file, +24/-9)
  Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well, and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
  Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).
  Differential Revision: http://reviews.llvm.org/D17691
  llvm-svn: 263159
* [X86] Correctly select registers to pop into for x86_64 (Michael Kuperstein, 2016-03-10, 1 file, +2/-1)
  When trying to replace an add to esp with pops, we need to choose dead registers to pop into. Registers clobbered by the call and not imp-def'd by it should be safe. Except that it's not enough to check the register itself isn't defined; we also need to make sure no overlapping registers are defined either.
  This fixes PR26711.
  Differential Revision: http://reviews.llvm.org/D18029
  llvm-svn: 263139
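  A hedged sketch of the transform and the overlap hazard (register choice illustrative):
    callq foo           # say the call returns its result in %rax
    addq $8, %rsp       # 4-byte encoding, can shrink to:
    popq %rcx           # 1 byte; OK only if %rcx is truly dead here
  Checking that the pop's register is not itself listed as defined is insufficient when only an overlapping register (a sub- or super-register) is; those must be checked too, which is the fix.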
* Unified the handling of returns in the X87 stackifier so that the stackifier runs successfully on routines containing IRETs. (David L Kreitzer, 2016-03-10, 1 file, +93/-90)
  This fixes PR26410.
  Differential Revision: http://reviews.llvm.org/D17643
  llvm-svn: 263120
* AVX-512: Fixed a bug in i1 vector zero extending. (Skylake-avx512) (Elena Demikhovsky, 2016-03-10, 1 file, +27/-23)
  (failed on instruction selection phase)
  Differential Revision: http://reviews.llvm.org/D17924
  llvm-svn: 263111
* [X86][AVX] Improve target shuffle combining of BLEND+zero (Simon Pilgrim, 2016-03-10, 1 file, +2/-1)
  The BLEND+zero combine was failing to combine equivalent BLEND masks.
  Follow up to D17483 and D17858.
  llvm-svn: 263105
* [X86][SSE] Basic combining of unary target shuffles of binary target shuffles. (Simon Pilgrim, 2016-03-10, 1 file, +18/-12)
  This patch reorders the combining of target shuffle masks so that when a unary shuffle takes a binary shuffle as its input but only references one of its inputs it can correctly combine into a unary shuffle mask.
  This is starting to encroach on the purpose of resolveTargetShuffleInputs, but I don't want to remove it until we definitely know we won't need it for full binary shuffle combining.
  There is a lot more work before we can properly support binary target shuffle masks, but this was an easy case to add support for.
  Differential Revision: http://reviews.llvm.org/D17858
  llvm-svn: 263102
* AVX-512: Fixed a bug in shuffle for v64i8 type (Elena Demikhovsky, 2016-03-10, 1 file, +2/-0)
  The SCALAR_TO_VECTOR operation for v64i8 and v32i16 should be lowered if the BW feature is "on".
  Differential Revision: http://reviews.llvm.org/D17994
  llvm-svn: 263097
* [x86] fix cost model inaccuracy for vector memory ops (Sanjay Patel, 2016-03-09, 1 file, +4/-4)
  The irony of this patch is that one CPU that is affected is AMD Jaguar, and Jaguar has a completely double-pumped AVX implementation. But getting the cost model to reflect that is a much bigger problem. The small goal here is simply to improve on the lie that !AVX2 == SandyBridge.
  Differential Revision: http://reviews.llvm.org/D18000
  llvm-svn: 263069
* [x86, AVX] optimize masked loads with constant masks (Sanjay Patel, 2016-03-09, 1 file, +44/-2)
  Instead of a variable-blend instruction, form a blend with immediate because those are always cheaper.
  Differential Revision: http://reviews.llvm.org/D17899
  llvm-svn: 263067
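  A hedged AVX sketch (AT&T syntax, registers illustrative):
    # variable blend: the mask lives in a vector register
    vblendvps %ymm0, %ymm2, %ymm1, %ymm3
    # with a compile-time-constant mask, the blend takes an immediate instead
    vblendps $0x0F, %ymm2, %ymm1, %ymm3   # lanes 0-3 from %ymm2, lanes 4-7 from %ymm1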
* [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC. (Chad Rosier, 2016-03-09, 2 files, +2/-2)
  http://reviews.llvm.org/D17967
  llvm-svn: 263021
* Revert r262759 and r262760. (Quentin Colombet, 2016-03-08, 1 file, +0/-9)
  The fix, consisting in using the library call for atomic compare and swap when the instruction is not safe to use, may be incorrect. Indeed, the library call may not exist on all platforms. In other words, we need a better fix!
  llvm-svn: 262943
* Revert r262599 "[X86][SSE] Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG" (Hans Wennborg, 2016-03-08, 1 file, +9/-24)
  This caused PR26870.
  llvm-svn: 262935