summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Declare SSE4.1/AVX2 vector extloads covered by PMOV[SZ]X legal.Ahmed Bougacha2015-01-212-9/+70
| | | | | | | | | | | | | | | | | | Now that we can fully specify extload legality, we can declare them legal for the PMOVSX/PMOVZX instructions. This for instance enables a DAGCombine to fire on code such as (and (<zextload-equivalent> ...), <redundant mask>) to turn it into: (zextload ...) as seen in the testcase changes. There is one regression, in widen_load-2.ll: we're no longer able to do store-to-load forwarding with illegal extload memory types. This will be addressed separately. Differential Revision: http://reviews.llvm.org/D6533 llvm-svn: 226676
* [x32] Fast ISel should use LEA64_32r instead of LEA32r to adjust addresses ↵Michael Kuperstein2015-01-211-2/+8
| | | | | | in x32 mode. llvm-svn: 226661
* [x86] Remove some unnecessary and slightly confusing typecasts from some ↵Craig Topper2015-01-211-4/+4
| | | | | | patterns. I think it actually went i32->iPtr->i32 in some of these cases. llvm-svn: 226647
* [X86] Convert all the i8imm used by AVX512 and MMX instructions to u8imm.Craig Topper2015-01-212-27/+27
| | | | llvm-svn: 226646
* [X86] Convert all the i8imm used by SSE and AVX instructions to u8imm.Craig Topper2015-01-213-77/+66
| | | | | | This makes the assembler check their size and removes a hack from the disassembler to avoid sign extending the immediate. llvm-svn: 226645
* [x86] Add assembly parser bounds checking to the immediate value for ↵Craig Topper2015-01-215-14/+40
| | | | | | cmpss/cmpsd/cmpps/cmppd. llvm-svn: 226642
* [x86] Add some mayLoad/hasSideEffects flags. Remove one that was already ↵Craig Topper2015-01-202-3/+7
| | | | | | covered by a pattern. llvm-svn: 226562
* [X86][AVX] Missing AVX1 memory folding float instructionsSimon Pilgrim2015-01-191-2/+54
| | | | | | | | | | Now that we can create much more exhaustive X86 memory folding tests, this patch adds the missing AVX1/F16C floating point instruction stack foldings we can easily test for including the scalar intrinsics (add, div, max, min, mul, sub), conversions float/int to double, half precision conversions, rounding, dot product and bit test. The patch also adds a couple of obviously missing SSE instructions (more to follow once we have full SSE testing). Now that scalar folding is working it broke a very old test (2006-10-07-ScalarSSEMiscompile.ll) - this test appears to make no sense as its trying to ensure that a scalar subtraction isn't folded as it 'would zero the top elts of the loaded vector' - this test just appears to be wrong to me. Differential Revision: http://reviews.llvm.org/D7055 llvm-svn: 226513
* Add r224985 back with fixes.Rafael Espindola2015-01-192-83/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fixes are to note that AArch64 has additional restrictions on when local relocations can be used. In particular, ld64 requires that relocations to cstring/cfstrings use linker visible symbols. Original message: In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 226503
* [x86] Change AVX512 intrinsics to take a 8-bit immediate for the comparision ↵Craig Topper2015-01-192-8/+4
| | | | | | kind instead of a 32-bit immediate. This better aligns with the emitted instruction. It also matches SSE and AVX1 equivalents. Also add auto upgrade support. llvm-svn: 226430
* std::unique_ptrify the MCStreamer argument to createAsmPrinterDavid Blaikie2015-01-181-2/+3
| | | | llvm-svn: 226414
* X86: fix comment typo in AsmParserSaleem Abdulrasool2015-01-161-1/+1
| | | | | | Fix a typo. NFC. llvm-svn: 226313
* [AVX512] Add intrinsics for masked aligned FP loads and storesAdam Nemet2015-01-161-0/+25
| | | | | | | | | | Similar to the unaligned cases. Test was generated with update_llc_test_checks.py. Part of <rdar://problem/17688758> llvm-svn: 226296
* [X86][DAG] Disable target specific combine on INSERTPS dag nodes at -O0.Andrea Di Biagio2015-01-161-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch disables target specific combine on X86ISD::INSERTPS dag nodes if optlevel is CodeGenOpt::None. The backend currently implements a target specific combine rule that converts a vector load used by an INSERTPS dag node into a scalar load plus a scalar_to_vector. This allows ISel to select a single INSERTPSrm instead of two instructions (i.e. a vector load plus INSERTPSrr). However, the existing target combine rule on INSERTPS nodes only works under the assumption that ISel will always be able to match an INSERTPSrm. This is not true in general at -O0, since the backend only allows folding a load into the memory operand of an instruction if the optimization level is not CodeGenOpt::None. In the example below: // __m128 test(__m128 a, __m128 *b) { __m128 c = _mm_insert_ps(a, *b, 1 << 6); return c; } // Before this patch, at -O0, the backend would have canonicalized the load to 'b' into a scalar load plus scalar_to_vector. Later on, ISel would have selected an INSERTPSrr leaving the insertps mask in an inconsistent state: movss 4(%rdi), %xmm1 insertps $64, %xmm1, %xmm0 # xmm0 = xmm1[1],xmm0[1,2,3]. With this patch, the backend avoids folding the vector load into the operand of the INSERTPS. The new codegen at -O0 is: movaps (%rdi), %xmm1 insertps $64, %xmm1, %xmm0 # %xmm1[1],xmm0[1,2,3]. llvm-svn: 226277
* Support @PLT loads on 32bit x86.Joerg Sonnenberger2015-01-151-0/+3
| | | | llvm-svn: 226182
* Hide some redundant AVX512 instructions from the asm parser, but force them ↵Craig Topper2015-01-151-1/+1
| | | | | | to show up in the disassembler. llvm-svn: 226155
* Revert "Add r224985 back with two fixes."Rafael Espindola2015-01-142-46/+83
| | | | | | This reverts commit r225644 while I debug a regression. llvm-svn: 226022
* Use the operand vector instead so inline assembly can be validated tooDavid Majnemer2015-01-141-5/+5
| | | | | | The buildbots got upset after r225941, this should hopefully fix things. llvm-svn: 225954
* X86: only access operands if they are presentSaleem Abdulrasool2015-01-141-0/+2
| | | | | | | | If there is no associated immediate (MS style inline asm), do not try to access the operand, assume that it is valid. This should fix the buildbots after SVN r225941. llvm-svn: 225950
* Revert "Insert random noops to increase security against ROP attacks (llvm)"JF Bastien2015-01-142-68/+0
| | | | | | | This reverts commit: http://reviews.llvm.org/D3392 llvm-svn: 225948
* X86: validate 'int' instructionSaleem Abdulrasool2015-01-141-0/+21
| | | | | | | The int instruction takes as an operand an 8-bit immediate value. Validate that the input is valid rather than silently truncating the value. llvm-svn: 225941
* Insert random noops to increase security against ROP attacks (llvm)JF Bastien2015-01-142-0/+68
| | | | | | | | | | | | | | | | | | | | A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks. Command line options: -noop-insertion // Enable noop insertion. -noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion) -max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions. In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future). -fdiversify This is the llvm part of the patch. clang part: D3393 http://reviews.llvm.org/D3392 Patch by Stephen Crane (@rinon) llvm-svn: 225908
* [AVX512] Unpack support in new shuffle loweringAdam Nemet2015-01-131-0/+34
| | | | | | | | | | | This now handles both 32 and 64-bit element sizes. In this version, the test are in vector-shuffle-512-v8.ll, canonicalized by Chandler's update_llc_test_checks.py. Part of <rdar://problem/17688758> llvm-svn: 225838
* [AVX512] Add pretty-printing of shuffle mask for unpacksAdam Nemet2015-01-131-0/+64
| | | | llvm-svn: 225837
* Rename llvm.recoverframeallocation to llvm.framerecoverReid Kleckner2015-01-131-1/+1
| | | | | | | | This name is less descriptive, but it sort of puts things in the 'llvm.frame...' namespace, relating it to frameallocate and frameaddress. It also avoids using "allocate" and "allocation" together. llvm-svn: 225752
* Add the llvm.frameallocate and llvm.recoverframeallocation intrinsicsReid Kleckner2015-01-132-0/+7
| | | | | | | | | | | | | | | | | | | | | These intrinsics allow multiple functions to share a single stack allocation from one function's call frame. The function with the allocation may only perform one allocation, and it must be in the entry block. Functions accessing the allocation call llvm.recoverframeallocation with the function whose frame they are accessing and a frame pointer from an active call frame of that function. These intrinsics are very difficult to inline correctly, so the intention is that they be introduced rarely, or at least very late during EH preparation. Reviewers: echristo, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D6493 llvm-svn: 225746
* [X86][SSE] Minor regression fix for r225551Simon Pilgrim2015-01-121-1/+2
| | | | | | r225551 vector byte shuffle optimization caused an assertion as fully zeroable vectors can be produced under certain circumstances. This fix drops the assert and returns a zero vector where the assert would have failed. llvm-svn: 225718
* [X86] Also create+widen FMIN/FMAX nodes for v2f32.Ahmed Bougacha2015-01-121-1/+18
| | | | | | | | | | | | | | | | This happens in the HINT benchmark, where the SLP-vectorizer created v2f32 fcmp/select code. The "correct" solution would have been to teach the vectorizer cost model that v2f32 isn't legal (because really, it isn't), but if we can vectorize we might as well do so. We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on. v3f32 were already widened to v4f32 by the generic unroll-and-build-vector legalization. rdar://15763436 Differential Revision: http://reviews.llvm.org/D6557 llvm-svn: 225691
* Add r224985 back with two fixes.Rafael Espindola2015-01-122-83/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One is that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. The other is that ld64 requires the relocations to cstring to use linker visible symbols on AArch64. Thanks to Michael Zolotukhin for testing this! Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225644
* [X86][SSE] Minor fix to VPBLENDW AVX2 commutation.Simon Pilgrim2015-01-111-3/+3
| | | | | | D6015 / rL221313 enabled commutation for SSE immediate blend instructions, but due to a typo the AVX2 VPBLENDW ymm instructions weren't flagged as commutative along with the others in the tables, but were still being commuted in code and tested for. llvm-svn: 225612
* Revert most of r225597David Majnemer2015-01-114-71/+44
| | | | | | We can't rely on a DataLayout enlightened constant folder. llvm-svn: 225599
* X86: Properly decode shuffle masks when the constant pool type is weirdDavid Majnemer2015-01-114-66/+80
| | | | | | | | | | | | | It's possible for the constant pool entry for the shuffle mask to come from a completely different operation. This occurs when Constants have the same bit pattern but have different types. Make DecodePSHUFBMask tolerant of types which, after a bitcast, are appropriately sized vector types. This fixes PR22188. llvm-svn: 225597
* X86: teach X86TargetLowering about L,M,O constraintsSaleem Abdulrasool2015-01-111-0/+25
| | | | | | | | | Teach the ISelLowering for X86 about the L,M,O target specific constraints. Although, for the moment, clang performs constraint validation and prevents passing along inline asm which may have immediate constant constraints violated, the backend should be able to cope with the invalid inline asm a bit better. llvm-svn: 225596
* [X86][SSE] Improved (v)insertps shuffle matchingSimon Pilgrim2015-01-101-42/+82
| | | | | | | | | | | | In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable. This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced. We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function. Differential Revision: http://reviews.llvm.org/D6879 llvm-svn: 225589
* [X86][SSE] Avoid vector byte shuffles with zero by using pshufb to create zerosSimon Pilgrim2015-01-091-12/+30
| | | | | | | | pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and ORing the result when the used inputs from a vector are all zeroable. Differential Revision: http://reviews.llvm.org/D6878 llvm-svn: 225551
* Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revisionLang Hames2015-01-092-21/+2
| | | | | | | | | | introduced. A test case for the bug was already committed in r225385. Patch by Rafael Espindola. llvm-svn: 225534
* [x86] Add a flag to control the vector shuffle legality predicates thatChandler Carruth2015-01-091-0/+23
| | | | | | | | | | complements the new vector shuffle lowering code path. This flag, naturally, is *off* because we've not tested or evaluated the results of this at all. However, the flag will make it much easier to evaluate whether we can be this aggressive and whether there are missing vector shuffle lowering optimizations. llvm-svn: 225491
* [X86] Reflow comment. NFC.Ahmed Bougacha2015-01-081-3/+4
| | | | llvm-svn: 225455
* [X86] Don't try to generate direct calls to TLS globalsMichael Kuperstein2015-01-081-1/+2
| | | | | | | | | The call lowering assumes that if the callee is a global, we want to emit a direct call. This is correct for regular globals, but not for TLS ones. Differential Revision: http://reviews.llvm.org/D6862 llvm-svn: 225438
* [X86] Don't print 'dword ptr' or 'qword ptr' on the operand to some of the ↵Craig Topper2015-01-084-4/+14
| | | | | | LEA variants in Intel syntax. The memory operand is inherently unsized. llvm-svn: 225432
* [SelectionDAG] Allow targets to specify legality of extloads' resultAhmed Bougacha2015-01-081-26/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type (in addition to the memory type). The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
* X86: VZeroUpperInserter: shortcut should not trigger if we have any function ↵Matthias Braun2015-01-081-8/+12
| | | | | | live-ins. llvm-svn: 225419
* [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.Ahmed Bougacha2015-01-071-26/+9
| | | | | | | | A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392
* [X86] Fix 512->256 typo in comments. NFC.Ahmed Bougacha2015-01-071-2/+2
| | | | llvm-svn: 225367
* X86: Allow the stack probe size to be configurable per functionDavid Majnemer2015-01-071-3/+9
| | | | | | | | | | | | | | | | | | | LLVM emits stack probes on Windows targets to ensure that the stack is correctly accessed. However, the amount of stack allocated before emitting such a probe is hardcoded to 4096. It is desirable to have this be configurable so that a function might opt-out of stack probes. Our level of granularity is at the function level instead of, say, the module level to permit proper generation of code after LTO. Patch by Andrew H! N.B. The inliner needs to be updated to properly consider what happens after inlining a function with a specific stack-probe-size into another function with a different stack-probe-size. llvm-svn: 225360
* [X86] Teach FCOPYSIGN lowering to recognize constant magnitudes.Ahmed Bougacha2015-01-071-6/+19
| | | | | | | | | | | | | | | | | | | | | For code like: float foo(float x) { return copysign(1.0, x); } We used to generate: andps <-0.000000e+00,0,0,0>, %xmm0 movss <1.000000e+00>, %xmm1 andps <nan>, %xmm1 orps %xmm0, %xmm1 Basically doing an abs(1.0f) in the two middle instructions. We now generate: andps <-0.000000e+00,0,0,0>, %xmm0 orps <1.000000e+00,0,0,0>, %xmm0 Builds on cleanups r223415, r223542. rdar://19049548 Differential Revision: http://reviews.llvm.org/D6555 llvm-svn: 225357
* [X86] Merge a switch statement inside a default case of another switch ↵Craig Topper2015-01-071-160/+155
| | | | | | statement on the same variable. There was no additional code in the default so this should be no functional change. llvm-svn: 225345
* [X86] Don't mark the shift by 1 instructions as isConvertibleToThreeAddress. ↵Craig Topper2015-01-071-1/+1
| | | | | | There is no handling for them. llvm-svn: 225344
* [X86] Remove some unused TYPE enums from the disassembler.Craig Topper2015-01-073-18/+1
| | | | llvm-svn: 225343
* Revert r224935 "Refactor duplicated code. No intended functionality change."Lang Hames2015-01-062-2/+21
| | | | | | | | This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin. Reverting to get the bots green while I track down the source of the changed behavior. llvm-svn: 225311
OpenPOWER on IntegriCloud