path: root/llvm/test/CodeGen/X86
Commit message | Author | Age | Files | Lines
...
* Fix PR17764 | Michael Liao | 2013-11-02 | 1 | -0/+10
  - When selecting BLEND from vselect, the operands need swapping due to the
    difference between vselect and SSE/AVX's BLEND insn.
  llvm-svn: 193900
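The operand swap can be illustrated with a small model. This is a minimal sketch, not the backend code: `vselect`, `blend`, `mask_to_imm`, and `vselect_via_blend` are hypothetical helper names, and the model assumes the usual semantics that vselect takes its first value operand where the mask is true, while an immediate BLEND takes its second source where the immediate bit is set.

```python
from typing import List, Sequence


def vselect(mask: Sequence[bool], a: Sequence[int], b: Sequence[int]) -> List[int]:
    """vselect semantics: lane i comes from a when mask[i] is true, else from b."""
    return [x if m else y for m, x, y in zip(mask, a, b)]


def blend(src1: Sequence[int], src2: Sequence[int], imm: int) -> List[int]:
    """Immediate BLEND semantics: lane i comes from src2 when bit i of imm is set."""
    return [src2[i] if (imm >> i) & 1 else src1[i] for i in range(len(src1))]


def mask_to_imm(mask: Sequence[bool]) -> int:
    """Pack a per-lane boolean mask into an immediate, lane 0 in bit 0."""
    return sum(1 << i for i, m in enumerate(mask) if m)


def vselect_via_blend(mask: Sequence[bool], a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Express vselect with blend: note that a and b trade places."""
    return blend(b, a, mask_to_imm(mask))
```

Passing the operands to `blend` in vselect order would pick the wrong lane everywhere, which is the kind of mismatch the commit message describes.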
* These test cases for experimental features are a bit too darwin-specific still. | Andrew Trick | 2013-10-31 | 2 | -4/+4
  Use a triple.
  llvm-svn: 193820
* Add new calling convention for WebKit Java Script. | Andrew Trick | 2013-10-31 | 1 | -0/+20
  llvm-svn: 193812
* Add support for stack map generation in the X86 backend. | Andrew Trick | 2013-10-31 | 2 | -0/+253
  Originally implemented by Lang Hames.
  llvm-svn: 193811
* Merge and filecheckize. | Roman Divacky | 2013-10-31 | 2 | -8/+16
  llvm-svn: 193778
* Add AVX512 unmasked integer broadcast intrinsics and support. | Cameron McInally | 2013-10-31 | 1 | -0/+28
  llvm-svn: 193748
* AVX-512: Implemented CMOV for 512-bit vectors | Elena Demikhovsky | 2013-10-31 | 1 | -0/+22
  llvm-svn: 193747
* Legalize: Improve legalization of long vector extends. | Jim Grosbach | 2013-10-31 | 1 | -0/+18
  When an extend more than doubles the size of the elements (e.g., a zext
  from v16i8 to v16i32), the normal legalization method of splitting the
  vectors will run into problems as by the time the destination vector is
  legal, the source vector is illegal. The end result is the operation often
  becoming scalarized, with the typical horrible performance. For example,
  on x86_64, the simple input of:

    define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind {
      %tmp = zext <16 x i8> %a to <16 x i32>
      store <16 x i32> %tmp, <16 x i32>*%p
      ret void
    }

  Generates:

      .section __TEXT,__text,regular,pure_instructions
      .section __TEXT,__const
      .align 5
    LCPI0_0:
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .long 255 ## 0xff
      .section __TEXT,__text,regular,pure_instructions
      .globl _bar
      .align 4, 0x90
    _bar:
      vpunpckhbw %xmm0, %xmm0, %xmm1
      vpunpckhwd %xmm0, %xmm1, %xmm2
      vpmovzxwd %xmm1, %xmm1
      vinsertf128 $1, %xmm2, %ymm1, %ymm1
      vmovaps LCPI0_0(%rip), %ymm2
      vandps %ymm2, %ymm1, %ymm1
      vpmovzxbw %xmm0, %xmm3
      vpunpckhwd %xmm0, %xmm3, %xmm3
      vpmovzxbd %xmm0, %xmm0
      vinsertf128 $1, %xmm3, %ymm0, %ymm0
      vandps %ymm2, %ymm0, %ymm0
      vmovaps %ymm0, (%rdi)
      vmovaps %ymm1, 32(%rdi)
      vzeroupper
      ret

  So instead we can check if there are legal types that enable us to split
  more cleverly when the input vector is already legal such that we don't
  turn it into an illegal type. If the extend is such that it's more than
  doubling the size of the input we check if
    - the number of vector elements is even,
    - the source type is legal,
    - the type of a split source is illegal,
    - the type of an extended (by doubling element size) source is legal, and
    - the type of that extended source when split is legal.

  If the conditions are met, instead of just splitting both the destination
  and the source types, we create an extend that only goes up one "step"
  (doubling the element width), and then continue legalizing the rest of the
  operation normally. The result is that this operates as a new, more
  efficient, termination condition for the loop of "split the operation
  until the destination type is legal."

  With this change, the above example now compiles to:

    _bar:
      vpxor %xmm1, %xmm1, %xmm1
      vpunpcklbw %xmm1, %xmm0, %xmm2
      vpunpckhwd %xmm1, %xmm2, %xmm3
      vpunpcklwd %xmm1, %xmm2, %xmm2
      vinsertf128 $1, %xmm3, %ymm2, %ymm2
      vpunpckhbw %xmm1, %xmm0, %xmm0
      vpunpckhwd %xmm1, %xmm0, %xmm3
      vpunpcklwd %xmm1, %xmm0, %xmm0
      vinsertf128 $1, %xmm3, %ymm0, %ymm0
      vmovaps %ymm0, 32(%rdi)
      vmovaps %ymm2, (%rdi)
      vzeroupper
      ret

  This generalizes a custom lowering that was added a while back to the ARM
  backend. That lowering is no longer necessary, and is removed. The
  testcases for it, however, provide excellent ARM tests for this change and
  so remain.

  rdar://14735100
  llvm-svn: 193727
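The five conditions listed in this commit message can be modelled as a single predicate. This is a minimal sketch under stated assumptions, not the legalizer's code: `VecTy`, `split`, `widen_elts`, `should_extend_one_step`, and the `AVX_LIKE_LEGAL` set are all hypothetical, and the legal-type set is only a rough stand-in for a target with 128-bit and 256-bit integer vectors.

```python
from typing import NamedTuple, Set


class VecTy(NamedTuple):
    """A vector type such as v16i8: 16 elements of 8 bits each."""
    num_elts: int
    elt_bits: int


def split(ty: VecTy) -> VecTy:
    """Halve the element count, as vector splitting does."""
    return VecTy(ty.num_elts // 2, ty.elt_bits)


def widen_elts(ty: VecTy) -> VecTy:
    """Double the element width: one extend 'step'."""
    return VecTy(ty.num_elts, ty.elt_bits * 2)


def should_extend_one_step(src: VecTy, dst: VecTy, legal: Set[VecTy]) -> bool:
    """Return True when the commit's conditions for a one-step extend all hold."""
    if dst.elt_bits <= 2 * src.elt_bits:
        return False  # the extend does not more than double the element size
    stepped = widen_elts(src)
    return (
        src.num_elts % 2 == 0          # the number of vector elements is even
        and src in legal               # the source type is legal
        and split(src) not in legal    # a split source would be illegal
        and stepped in legal           # the one-step extended source is legal
        and split(stepped) in legal    # and it stays legal when split
    )


# Hypothetical legal set: 128-bit and 256-bit integer vectors.
AVX_LIKE_LEGAL = {
    VecTy(16, 8), VecTy(8, 16), VecTy(4, 32), VecTy(2, 64),
    VecTy(32, 8), VecTy(16, 16), VecTy(8, 32), VecTy(4, 64),
}
```

With this assumed legal set, the message's example of zext v16i8 to v16i32 satisfies the predicate, while a plain v16i8 to v16i16 extend does not, because it only doubles the element width.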
* Produce .weak_def_can_be_hidden for some linkonce_odr values | Rafael Espindola | 2013-10-30 | 1 | -0/+26
  With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr if
  they are also unnamed_addr or don't have their address taken.

  There is not a lot of documentation about .weak_def_can_be_hidden, but
  from the old discussion about linkonce_odr_auto_hide and the name of the
  directive this looks correct: these symbols can be hidden.

  Testing this with the ld64 in Xcode 5 linking clang reduces the number of
  exported symbols from 21053 to 19049.
  llvm-svn: 193718
* Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too." | Juergen Ributzka | 2013-10-30 | 1 | -42/+0
  Now Hexagon and SystemZ are not happy with it :-(
  llvm-svn: 193677
* SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too. | Juergen Ributzka | 2013-10-30 | 1 | -0/+42
  The Type Legalizer recognizes that VSELECT needs to be split, because the
  type is too wide for the given target. The same does not always apply to
  SETCC, because less space is required to encode the result of a
  comparison. As a result VSELECT is split and SETCC is unrolled into scalar
  comparisons.

  This commit fixes the issue by checking for VSELECT-SETCC patterns in the
  DAG Combiner. If a matching pattern is found, then the result mask of
  SETCC is promoted to the expected vector mask type for the given target.
  This mask usually has the same size as the VSELECT return type (except for
  Intel KNL). Now the type legalizer will split both VSELECT and SETCC.

  This allows the following X86 DAG Combine code to successfully detect the
  MIN/MAX pattern. This fixes PR16695, PR17002, and
  <rdar://problem/14594431>.

  Reviewed by Nadav
  llvm-svn: 193676
* AVX-512: PMIN/PMAX intrinsics and patterns | Elena Demikhovsky | 2013-10-27 | 1 | -0/+56
  Patch by Cameron McInally <cameron.mcinally@nyu.edu>
  llvm-svn: 193497
* [X86][AVX512] Add patterns that match the AVX512 floating point register vbroadcast intrinsics. | Quentin Colombet | 2013-10-25 | 1 | -0/+14
  Patch by Cameron McInally <cameron.mcinally@nyu.edu>
  llvm-svn: 193422
* [X86][AVX512] Add patterns that match the AVX512 floating point vbroadcast intrinsics. | Quentin Colombet | 2013-10-25 | 1 | -0/+14
  Patch by Cameron McInally <cameron.mcinally@nyu.edu>
  llvm-svn: 193421
* Added test for -elf configuration, to see that _alloca call is properly generated. | Yaron Keren | 2013-10-24 | 1 | -9/+16
  See: http://llvm.org/viewvc/llvm-project?view=revision&revision=193289
  llvm-svn: 193321
* AVX-512: added VCVTPH2PS, VCVTPS2PH with intrinsics | Elena Demikhovsky | 2013-10-24 | 1 | -0/+15
  llvm-svn: 193312
* Replace sse41/sse42 with sse4.1/sse4.2 in test command lines to fix bots. | Craig Topper | 2013-10-24 | 2 | -2/+2
  llvm-svn: 193311
* Add non-AVX tests for AES intrinsics. | Craig Topper | 2013-10-24 | 1 | -0/+48
  llvm-svn: 193310
* Add tests for SSE intrinsics in non-avx mode by copying from the AVX test cases. | Craig Topper | 2013-10-24 | 6 | -0/+1704
  Some of these may have been tested by other tests, but most weren't.
  Patch by Cameron McInally.
  llvm-svn: 193309
* X86: Custom lower sext v16i8 to v16i16, and the corresponding truncate. | Benjamin Kramer | 2013-10-23 | 4 | -3/+37
  Also update the cost model.
  llvm-svn: 193270
* X86: Custom lower zext v16i8 to v16i16. | Benjamin Kramer | 2013-10-23 | 2 | -0/+21
  On sandy bridge (PR17654) we now get

    vpxor %xmm1, %xmm1, %xmm1
    vpunpckhbw %xmm1, %xmm0, %xmm2
    vpunpcklbw %xmm1, %xmm0, %xmm0
    vinsertf128 $1, %xmm2, %ymm0, %ymm0

  On haswell it's a simple

    vpmovzxbw %xmm0, %ymm0

  There is a maze of duplicated and dead transforms and patterns in this
  area. Remove the dead custom lowering of zext v8i16 to v8i32, that's
  already handled by LowerAVXExtend.
  llvm-svn: 193262
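Why unpacking against a zero register amounts to a zero extension can be checked with a byte-level model. This is a minimal sketch: the helper names (`punpcklbw`, `punpckhbw`, `bytes_to_words`, `zext_v16i8_to_v16i16`) are hypothetical Python functions that mirror the instructions in the Sandy Bridge sequence above, with registers modelled as little-endian byte lists.

```python
from typing import List


def punpcklbw(src1: List[int], src2: List[int]) -> List[int]:
    """Interleave the low 8 bytes of two 16-byte registers: s1[0], s2[0], s1[1], ..."""
    out: List[int] = []
    for a, b in zip(src1[:8], src2[:8]):
        out += [a, b]
    return out


def punpckhbw(src1: List[int], src2: List[int]) -> List[int]:
    """Interleave the high 8 bytes of two 16-byte registers."""
    out: List[int] = []
    for a, b in zip(src1[8:], src2[8:]):
        out += [a, b]
    return out


def bytes_to_words(data: List[int]) -> List[int]:
    """Reinterpret a little-endian byte list as 16-bit words."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


def zext_v16i8_to_v16i16(data: List[int]) -> List[int]:
    """Model of the vpxor / vpunpck{l,h}bw / vinsertf128 sequence."""
    zero = [0] * 16                  # vpxor: an all-zero register
    low = punpcklbw(data, zero)      # low 8 bytes, each followed by a zero byte
    high = punpckhbw(data, zero)     # high 8 bytes, each followed by a zero byte
    return bytes_to_words(low + high)  # vinsertf128 places `high` in the upper half
```

Because each data byte lands in the low half of a 16-bit lane and the zero byte in the high half, the resulting words equal the original unsigned byte values.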
* Fix PR17631 | Michael Liao | 2013-10-23 | 1 | -0/+22
  - Skip instructions added in prolog. For specific targets, prolog may
    insert helper function calls (e.g. _chkstk will be called when there're
    more than 4K bytes allocated on stack). However, these helpers don't
    use/def YMM/XMM registers.
  llvm-svn: 193261
* AVX-512: aligned / unaligned load and store for 512-bit integer vectors. | Elena Demikhovsky | 2013-10-22 | 1 | -0/+28
  llvm-svn: 193156
* Add testcase for PR3168. It was fixed over time. | Bill Wendling | 2013-10-22 | 1 | -0/+21
  PR3168
  llvm-svn: 193152
* Fix spelling, grammar, and match naming convention for test files. | Eric Christopher | 2013-10-21 | 1 | -1/+1
  llvm-svn: 193130
* X86 vector element shift-by-immediate instructions take i8 immediates. Make the instruction definitions and ISEL reflect this. | Lang Hames | 2013-10-21 | 2 | -4/+4
  Prior to this patch these instructions took an i32i8imm, and the high bits
  were dropped during encoding. This led to incorrect behavior for shifts by
  immediates higher than 255. This patch fixes that issue by detecting large
  immediate shifts and returning constant zero (for logical shifts) or
  capping the shift amount at an encodable value (for arithmetic shifts).

  Fixes <rdar://problem/14968098>
  llvm-svn: 193096
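The difference between dropping the high bits of the immediate and the behavior described in this commit can be shown on a single 16-bit lane. This is a minimal sketch with assumptions: `hw_shift` is a hypothetical model of one lane of a vector shift-by-immediate (counts of the element width or more give zero for logical shifts and sign fill for arithmetic shifts), and `lower_shift` is a hypothetical stand-in for the fixed lowering, with 255 chosen as the cap because it is the largest value an 8-bit immediate can encode.

```python
def hw_shift(kind: str, value: int, imm8: int, elt_bits: int = 16) -> int:
    """Model one lane of a vector shift whose count is an 8-bit immediate."""
    assert 0 <= imm8 <= 255, "the encoding only has room for 8 bits"
    mask = (1 << elt_bits) - 1
    value &= mask
    if kind == "shl":
        return 0 if imm8 >= elt_bits else (value << imm8) & mask
    if kind == "srl":
        return 0 if imm8 >= elt_bits else value >> imm8
    if kind == "sra":
        # Reinterpret as signed; oversized counts behave like elt_bits - 1.
        signed = value - (1 << elt_bits) if value >> (elt_bits - 1) else value
        return (signed >> min(imm8, elt_bits - 1)) & mask
    raise ValueError(f"unknown shift kind: {kind}")


def lower_shift(kind: str, value: int, amount: int, elt_bits: int = 16) -> int:
    """Sketch of the fixed lowering for an arbitrary constant shift amount."""
    if amount > 255:
        if kind in ("shl", "srl"):
            return 0      # logical shift by a huge amount: constant zero
        amount = 255      # arithmetic shift: cap at an encodable value
    return hw_shift(kind, value, amount, elt_bits)
```

For a shift amount of 256, truncating to 8 bits yields a count of 0 and leaves the lane unchanged, whereas `lower_shift` returns the zero that a logical shift by 256 should produce.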
* AVX-512: MUL operation lowering for v8i64 | Elena Demikhovsky | 2013-10-21 | 1 | -1/+10
  llvm-svn: 193083
* Emit prefix data after debug and EH directives. | Peter Collingbourne | 2013-10-20 | 1 | -0/+2
  This ensures that the prefix data is treated as part of the function for
  the purpose of debug info. This provides a better debugging experience,
  among other things by allowing a debug info client to correctly look up a
  function in debug info given a function pointer.
  llvm-svn: 193042
* Test case for r192957 | David Majnemer | 2013-10-18 | 1 | -0/+21
  Forgot to 'svn add'
  llvm-svn: 192978
* Revert "Re-commit r192758 - MC: quote tricky symbol names in asm output" | Hans Wennborg | 2013-10-18 | 3 | -7/+5
  This caused the clang-native-mingw32-win7 buildbot to break. The assembler
  was complaining about the following lines that were showing up in the asm
  for CrashRecoveryContext.cpp:

    movl $"__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4", 4(%eax)
    calll "_AddVectoredExceptionHandler@8"
    .def "__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4";
    "__ZL16ExceptionHandlerP19_EXCEPTION_POINTERS@4":
    calll "_RemoveVectoredExceptionHandler@4"

  Reverting for now.
  llvm-svn: 192940
* Add testcase to make sure we don't generate a compact unwind section for ELF binaries. | Bill Wendling | 2013-10-17 | 1 | -0/+48
  This tests r190354.
  llvm-svn: 192903
* Fix tests not to depend on specific regalloc or instruction order. | Benjamin Kramer | 2013-10-17 | 2 | -4/+4
  They were failing with -mcpu=atom.
  llvm-svn: 192890
* Fix edge condition in DAGCombiner to improve codegen of shift sequences. | Andrea Di Biagio | 2013-10-17 | 1 | -0/+8
  When canonicalizing dags according to the rule

    (shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1))

  remember to add the new shl dag to the DAGCombiner worklist of nodes. If
  we don't explicitly add it to the worklist of nodes to visit, we may not
  trigger later on the rule that folds the shift left + logical shift right
  into an AND instruction with bitmask.
  llvm-svn: 192883
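Both the canonicalization and the later fold into an AND can be checked exhaustively on small integers. This is a minimal sketch, assuming an 8-bit X zero-extended to 16 bits and a logical shift right; `before`, `after`, and `folded` are hypothetical names for the three forms of the expression.

```python
def mask(bits: int) -> int:
    """All-ones value of the given width."""
    return (1 << bits) - 1


def srl(x: int, c: int, bits: int) -> int:
    """Logical shift right at a fixed width."""
    return (x & mask(bits)) >> c


def shl(x: int, c: int, bits: int) -> int:
    """Shift left at a fixed width, discarding bits shifted out."""
    return (x << c) & mask(bits)


def zext(x: int, from_bits: int) -> int:
    """Zero extension: keep the low from_bits bits, upper bits become zero."""
    return x & mask(from_bits)


def before(x: int, c: int) -> int:
    """(shl (zext (shr X, c1)), c1): the shift left happens at 16 bits."""
    return shl(zext(srl(x, c, 8), 8), c, 16)


def after(x: int, c: int) -> int:
    """(zext (shl (shr X, c1), c1)): the shift left happens at 8 bits."""
    return zext(shl(srl(x, c, 8), c, 8), 8)


def folded(x: int, c: int) -> int:
    """The shl/srl pair folded into a single AND that clears the low c bits."""
    return zext(x & (mask(8) & ~mask(c)), 8)
```

The three forms agree for every 8-bit input and every shift amount below 8, which is why getting the new shl node back onto the worklist matters: only then can the combiner reach the `folded` form.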
* x86: Move bitcasts outside concat_vector. | Jim Grosbach | 2013-10-17 | 1 | -1/+24
  Consider the following:

    typedef unsigned short ushort4U __attribute__((ext_vector_type(4),
                                                   aligned(2)));
    typedef unsigned short ushort4 __attribute__((ext_vector_type(4)));
    typedef unsigned short ushort8 __attribute__((ext_vector_type(8)));
    typedef int int4 __attribute__((ext_vector_type(4)));

    int4 __bbase_cvt_int(ushort4 v) {
      ushort8 a;
      a.lo = v;
      return _mm_cvtepu16_epi32(a);
    }

  This generates the, not unreasonable, IR:

    define <4 x i32> @foo0(double %v.coerce) nounwind ssp {
      %tmp = bitcast double %v.coerce to <4 x i16>
      %tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef,
              <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef,
                         i32 undef, i32 undef>
      %tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1)
      ret <4 x i32> %tmp2
    }

  The problem is when type legalization gets hold of the v4i16. It legalizes
  that by spilling to the stack, then doing a zero-extending load. Things go
  even more silly from there, ending up with something like:

    _foo0:
      movsd %xmm0, -8(%rsp)        <== Spill to the stack.
      movq -8(%rsp), %xmm0         <== Reload it right back out.
      pmovzxwd %xmm0, %xmm1        <== Here's what we actually asked for.
      pblendw $1, %xmm1, %xmm0     <== We don't need this at all
      pmovzxwd %xmm0, %xmm0        <== We already did this
      ret

  The v8i8 to v8i16 zext intrinsic gives even worse results, with two table
  lookups via pshufb instructions(!!).

  To avoid all that, we can move the bitcasting until after we've formed the
  wider (legal) vector type. Then our normal codegen flows along nicely and
  we get the expected:

    _foo0:
      pmovzxwd %xmm0, %xmm0
      ret

  rdar://15245794
  llvm-svn: 192866
* Re-commit r192758 - MC: quote tricky symbol names in asm output | Hans Wennborg | 2013-10-17 | 3 | -5/+7
  The reason this got reverted was that the @feat.00 symbol which was
  emitted for every TU became quoted, and on cygwin/mingw we use the gas
  assembler which couldn't handle the quotes.

  This commit fixes the problem by only emitting @feat.00 for win32, where
  we use clang -cc1as to assemble. gas would just drop this symbol anyway,
  so there is no loss there.

  With @feat.00 gone, there shouldn't be quoted symbols showing up on cygwin
  since it uses the Itanium ABI, which doesn't put these funny characters in
  symbols.

  > Because of win32 mangling, we produce symbol and section names with
  > funny characters in them, most notably @ characters.
  >
  > MC would choke on trying to parse its own assembly output. This patch
  > addresses that by:
  >
  > - Making @ trigger quoting of symbol names
  > - Also quote section names in the same way
  > - Just parse section names like other identifiers (to allow for quotes)
  > - Don't assume @ signifies a symbol variant if it is in a string.
  llvm-svn: 192859
* Enabling 3DNow! prefetch instruction for a few AMD processors: bobcat, jaguar, bulldozer and piledriver. | Yunzhong Gao | 2013-10-16 | 1 | -0/+3
  Support for the instruction itself seems to have already been added in
  r178040.

  Differential Revision: http://llvm-reviews.chandlerc.com/D1933
  llvm-svn: 192828
* DAGCombiner: Don't fold xor into not if getNOT would introduce an illegal constant. | Benjamin Kramer | 2013-10-16 | 1 | -0/+14
  This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated
  directly because i64 is illegal. It would be nice if getNOT would handle
  this transparently, but I don't see a way to generate a legal constant
  there right now.

  Fixes PR17487.
  llvm-svn: 192795
* Revert r192758 (and r192759), "MC: Better handling of tricky symbol and section names" | NAKAMURA Takumi | 2013-10-16 | 3 | -4/+4
  GNU AS didn't like quotes in symbol names.

    Error: junk at end of line, first unrecognized character is `"'
      .def "@feat.00";
      "@feat.00" = 1

  Reproduced on Cygwin's 2.23.52.20130309 and mingw32's 2.20.1.20100303.
  llvm-svn: 192775
* Add a triple to this test. | Rafael Espindola | 2013-10-16 | 1 | -1/+1
  llvm-svn: 192767
* Add support for metadata representing .ident directives. | Rafael Espindola | 2013-10-16 | 1 | -0/+9
  llvm-svn: 192764
* MC: Better handling of tricky symbol and section names | Hans Wennborg | 2013-10-16 | 3 | -4/+4
  Because of win32 mangling, we produce symbol and section names with funny
  characters in them, most notably @ characters.

  MC would choke on trying to parse its own assembly output. This patch
  addresses that by:

  - Making @ trigger quoting of symbol names
  - Also quote section names in the same way
  - Just parse section names like other identifiers (to allow for quotes)
  - Don't assume @ signifies a symbol variant if it is in a string.

  Differential Revision: http://llvm-reviews.chandlerc.com/D1945
  llvm-svn: 192758
* Enable MI Sched for x86. | Andrew Trick | 2013-10-15 | 65 | -278/+336
  This changes the SelectionDAG scheduling preference to source order. Soon,
  the SelectionDAG scheduler can be bypassed saving a nice chunk of compile
  time.

  Performance differences that result from this change are often a
  consequence of register coalescing. The register coalescer is far from
  perfect. Bugs can be filed for deficiencies.

  On x86 SandyBridge/Haswell, the source order schedule is often preserved,
  particularly for small blocks. Register pressure is generally improved
  over the SD scheduler's ILP mode. However, we are still able to handle
  large blocks that require latency hiding, unlike the SD scheduler's BURR
  mode. MI scheduler also attempts to discover the critical path in
  single-block loops and adjust heuristics accordingly.

  The MI scheduler relies on the new machine model. This is currently
  unimplemented for AVX, so we may not be generating the best code yet.

  Unit tests are updated so they don't depend on SD scheduling heuristics.
  llvm-svn: 192750
* Fix PR17546 | Michael Liao | 2013-10-15 | 1 | -0/+10
  - The type of the index used in extract_vector_elt or insert_vector_elt is
    supposed to be TLI.getVectorIdxTy(), which is the pointer type on most
    targets. It is better to truncate it (or zero-extend it, in case that
    changes later) to the mask element type to guarantee they match, instead
    of asserting that.
  llvm-svn: 192722
* Fix PR16807 | Michael Liao | 2013-10-15 | 1 | -0/+18
  - Lower signed division by constant powers-of-2 to target-independent DAG
    operators instead of target-dependent ones to support them better on
    targets where vector types are legal but shift operators on those types
    are illegal. E.g., on AVX, PSRAW is only available on <8 x i16> though
    <16 x i16> is a legal type.
  llvm-svn: 192721
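A signed division by a power of two can be written with generic shifts and an add. This is a minimal sketch of the textbook sequence (arithmetic shift for the sign, logical shift for the bias, add, arithmetic shift); whether the commit emits exactly these nodes is an assumption, and `to_signed`, `sdiv_pow2`, and `trunc_div` are hypothetical helper names.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sdiv_pow2(x: int, k: int, bits: int = 16) -> int:
    """Compute x / 2**k, rounding toward zero, using only shifts and an add."""
    assert 1 <= k < bits
    m = (1 << bits) - 1
    ux = x & m
    sign = to_signed(ux, bits) >> (bits - 1)   # SRA: 0 for x >= 0, -1 for x < 0
    bias = (sign & m) >> (bits - k)            # SRL: 2**k - 1 for negative x, else 0
    biased = (ux + bias) & m                   # ADD, wrapping at the lane width
    return to_signed(biased, bits) >> k        # SRA by k


def trunc_div(x: int, d: int) -> int:
    """Reference: division that rounds toward zero, as sdiv does."""
    q = abs(x) // abs(d)
    return -q if (x < 0) != (d < 0) else q
```

The bias step is what distinguishes this from a bare arithmetic shift, which would round negative values toward negative infinity instead of toward zero.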
* llvm/test/CodeGen/X86/break-avx-dep.ll: Relax an expression to be matched to also r[89], not only rXX. | NAKAMURA Takumi | 2013-10-15 | 1 | -1/+1
  llvm-svn: 192675
* Improve on r192635, ExeDepsFix for avx, and add a test case. | Andrew Trick | 2013-10-15 | 1 | -0/+29
  rdar:15221834 False AVX register dependencies cause 5x slowdown on
  flops-5/6 and significant slowdown on several others.

  This was blocking the switch to MI-Sched.
  llvm-svn: 192669
* [X86][FastISel] During X86 fastisel, the address of indirect call was resolved through bitcast, ptrtoint, and inttoptr instructions. | Quentin Colombet | 2013-10-14 | 1 | -0/+132
  This is valid only if the related instructions are in that same basic
  block, otherwise we may reference variables that were not live across
  basic blocks resulting in undefined virtual registers.

  The bug was exposed when both SDISel and FastISel were used within the
  same function, i.e., one basic block is issued with FastISel and another
  with SDISel, as demonstrated with the testcase.

  <rdar://problem/15192473>
  llvm-svn: 192636
* Fix a typo, in a comment, in a test. | Nick Lewycky | 2013-10-14 | 1 | -1/+1
  llvm-svn: 192632
* Revert part of a fix from 2010, changes since then: | Eric Christopher | 2013-10-14 | 1 | -1/+5
  a) x86-64 TLS has been documented
  b) the code path should use movq for the correct relocation to be
     generated.

  I've also added a fixme for the test case that we should improve the code
  generated, it should look something like what is documented in the tls abi
  document.
  llvm-svn: 192631
* MachineSink: Fix and tweak critical-edge breaking heuristic. | Will Dietz | 2013-10-14 | 4 | -5/+15
  Per original comment, the intention of this loop is to go ahead and break
  the critical edge (in order to sink this instruction) if there's reason to
  believe doing so might "unblock" the sinking of additional instructions
  that define registers used by this one. The idea is that if we have a few
  instructions to sink "together" breaking the edge might be worthwhile.

  This commit makes a few small changes to help better realize this goal:

  First, modify the loop to ignore registers defined by this instruction. We
  don't sink definitions of physical registers, and sinking an SSA
  definition isn't going to unblock an upstream instruction.

  Second, ignore uses of physical registers. Instructions that define
  physical registers are rejected for sinking, and so moving this one won't
  enable moving any defining instructions. As an added bonus, while virtual
  register use-def chains are generally small due to SSA goodness, iteration
  over the uses and definitions (used by hasOneNonDBGUse) for physical
  registers like EFLAGS can be rather expensive in practice. (This is the
  original reason for looking at this)

  Finally, to keep things simple continue to only consider this trick for
  registers that have a single use (via hasOneNonDBGUse), but to avoid
  spuriously breaking critical edges only do so if the definition resides in
  the same MBB and therefore this one directly blocks it from being sunk as
  well. If sinking them together is meant to be, let the iterative nature of
  this pass sink the definition into this block first.

  Update tests to accommodate this change, add new testcase where sinking
  avoids pipeline stalls.
  llvm-svn: 192608