summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* These test cases for experimental features are a bit too darwin-specific ↵Andrew Trick2013-10-312-4/+4
| | | | | | still. Use a triple. llvm-svn: 193820
* [AArch64] Add support for NEON scalar fixed-point convert to floating-point ↵Chad Rosier2013-10-311-0/+48
| | | | | | instructions. llvm-svn: 193816
* Add new calling convention for WebKit Java Script.Andrew Trick2013-10-311-0/+20
| | | | llvm-svn: 193812
* Add support for stack map generation in the X86 backend.Andrew Trick2013-10-312-0/+253
| | | | | | Originally implemented by Lang Hames. llvm-svn: 193811
* [AArch64] Add support for NEON scalar shift immediate instructions.Chad Rosier2013-10-311-0/+527
| | | | llvm-svn: 193790
* SparcV9 doesnt have rem instruction either.Roman Divacky2013-10-311-0/+23
| | | | llvm-svn: 193789
* Merge and filecheckize.Roman Divacky2013-10-312-8/+16
| | | | llvm-svn: 193778
* Add AVX512 unmasked integer broadcast intrinsics and support.Cameron McInally2013-10-311-0/+28
| | | | llvm-svn: 193748
* AVX-512: Implemented CMOV for 512-bit vectorsElena Demikhovsky2013-10-311-0/+22
| | | | llvm-svn: 193747
* [SystemZ] Automatically detect zEC12 and z196 hostsRichard Sandiford2013-10-3120-42/+72
| | | | | | | | | | As on other hosts, the CPU identification instruction is priveleged, so we need to look through /proc/cpuinfo. I copied the PowerPC way of handling "generic". Several tests were implicitly assuming z10 and so failed on z196. llvm-svn: 193742
* [AArch64] Make the use of FP instructions optional, but enabled by default.Amara Emerson2013-10-3110-4/+127
| | | | | | | This adds a new subtarget feature called FPARMv8 (implied by NEON), and predicates the support of the FP instructions and registers on this feature. llvm-svn: 193739
* Legalize: Improve legalization of long vector extends.Jim Grosbach2013-10-311-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an extend more than doubles the size of the elements (e.g., a zext from v16i8 to v16i32), the normal legalization method of splitting the vectors will run into problems as by the time the destination vector is legal, the source vector is illegal. The end result is the operation often becoming scalarized, with the typical horrible performance. For example, on x86_64, the simple input of: define void @bar(<16 x i8> %a, <16 x i32>* %p) nounwind { %tmp = zext <16 x i8> %a to <16 x i32> store <16 x i32> %tmp, <16 x i32>*%p ret void } Generates: .section __TEXT,__text,regular,pure_instructions .section __TEXT,__const .align 5 LCPI0_0: .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .long 255 ## 0xff .section __TEXT,__text,regular,pure_instructions .globl _bar .align 4, 0x90 _bar: vpunpckhbw %xmm0, %xmm0, %xmm1 vpunpckhwd %xmm0, %xmm1, %xmm2 vpmovzxwd %xmm1, %xmm1 vinsertf128 $1, %xmm2, %ymm1, %ymm1 vmovaps LCPI0_0(%rip), %ymm2 vandps %ymm2, %ymm1, %ymm1 vpmovzxbw %xmm0, %xmm3 vpunpckhwd %xmm0, %xmm3, %xmm3 vpmovzxbd %xmm0, %xmm0 vinsertf128 $1, %xmm3, %ymm0, %ymm0 vandps %ymm2, %ymm0, %ymm0 vmovaps %ymm0, (%rdi) vmovaps %ymm1, 32(%rdi) vzeroupper ret So instead we can check if there are legal types that enable us to split more cleverly when the input vector is already legal such that we don't turn it into an illegal type. If the extend is such that it's more than doubling the size of the input we check if - the number of vector elements is even, - the source type is legal, - the type of a split source is illegal, - the type of an extended (by doubling element size) source is legal, and - the type of that extended source when split is legal. If the conditions are met, instead of just splitting both the destination and the source types, we create an extend that only goes up one "step" (doubling the element width), and the continue legalizing the rest of the operation normally. The result is that this operates as a new, more effecient, termination condition for the loop of "split the operation until the destination type is legal." With this change, the above example now compiles to: _bar: vpxor %xmm1, %xmm1, %xmm1 vpunpcklbw %xmm1, %xmm0, %xmm2 vpunpckhwd %xmm1, %xmm2, %xmm3 vpunpcklwd %xmm1, %xmm2, %xmm2 vinsertf128 $1, %xmm3, %ymm2, %ymm2 vpunpckhbw %xmm1, %xmm0, %xmm0 vpunpckhwd %xmm1, %xmm0, %xmm3 vpunpcklwd %xmm1, %xmm0, %xmm0 vinsertf128 $1, %xmm3, %ymm0, %ymm0 vmovaps %ymm0, 32(%rdi) vmovaps %ymm2, (%rdi) vzeroupper ret This generalizes a custom lowering that was added a while back to the ARM backend. That lowering is no longer necessary, and is removed. The testcases for it, however, provide excellent ARM tests for this change and so remain. rdar://14735100 llvm-svn: 193727
* Fix CodeGen for unaligned loads with address spacesMatt Arsenault2013-10-301-0/+19
| | | | llvm-svn: 193721
* Produce .weak_def_can_be_hidden for some linkonce_odr valuesRafael Espindola2013-10-301-0/+26
| | | | | | | | | | | | | | With this patch llvm produces a weak_def_can_be_hidden for linkonce_odr if they are also unnamed_addr or don't have their address taken. There is not a lot of documentation about .weak_def_can_be_hidden, but from the old discussion about linkonce_odr_auto_hide and the name of the directive this looks correct: these symbols can be hidden. Testing this with the ld64 in Xcode 5 linking clang reduces the number of exported symbols from 21053 to 19049. llvm-svn: 193718
* R600: Custom lower f32 = uint_to_fp i64Tom Stellard2013-10-301-4/+19
| | | | llvm-svn: 193701
* [mips][msa] Correct definition of bins[lr] and CHECK-DAG-ize related testsDaniel Sanders2013-10-301-73/+121
| | | | llvm-svn: 193695
* [mips][msa] Added support for matching bmnz, bmnzi, bmz, and bmzi from ↵Daniel Sanders2013-10-304-162/+299
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | normal IR (i.e. not intrinsics) Also corrected the definition of the intrinsics for these instructions (the result register is also the first operand), and added intrinsics for bsel and bseli to clang (they already existed in the backend). These four operations are mostly equivalent to bsel, and bseli (the difference is which operand is tied to the result). As a result some of the tests changed as described below. bitwise.ll: - bsel.v test adapted so that the mask is unknown at compile-time. This stops it emitting bmnzi.b instead of the intended bsel.v. - The bseli.b test now tests the right thing. Namely the case when one of the values is an uimm8, rather than when the condition is a uimm8 (which is covered by bmnzi.b) compare.ll: - bsel.v tests now (correctly) emits bmnz.v instead of bsel.v because this is the same operation (see MSA.txt). i8.ll - CHECK-DAG-ized test. - bmzi.b test now (correctly) emits equivalent bmnzi.b with swapped operands because this is the same operation (see MSA.txt). - bseli.b still emits bseli.b though because the immediate makes it distinguishable from bmnzi.b. vec.ll: - CHECK-DAG-ized test. - bmz.v tests now (correctly) emits bmnz.v with swapped operands (see MSA.txt). - bsel.v tests now (correctly) emits bmnz.v with swapped operands (see MSA.txt). llvm-svn: 193693
* [AArch64] Add support for NEON scalar floating-point compare instructions.Chad Rosier2013-10-302-19/+335
| | | | llvm-svn: 193691
* [mips][msa] Added support for matching bins[lr]i.[bhwd] from normal IR (i.e. ↵Daniel Sanders2013-10-302-84/+216
| | | | | | | | | | | | | | | | | not intrinsics) This required correcting the definition of the bins[lr]i intrinsics because the result is also the first operand. It also required removing the (arbitrary) check for 32-bit immediates in MipsSEDAGToDAGISel::selectVSplat(). Currently using binsli.d with 2 bits set in the mask doesn't select binsli.d because the constant is legalized into a ConstantPool. Similar things can happen with binsri.d with more than 10 bits set in the mask. The resulting code when this happens is correct but not optimal. llvm-svn: 193687
* [mips][msa] Combine binsri-like DAG of AND and OR into equivalent VSELECTDaniel Sanders2013-10-301-0/+164
| | | | | | | | | | | | | | | (or (and $a, $mask), (and $b, $inverse_mask)) => (vselect $mask, $a, $b). where $mask is a constant splat. This allows bitwise operations to make use of bsel. It's also a stepping stone towards matching bins[lr], and bins[lr]i from normal IR. Two sets of similar tests have been added in this commit. The bsel_* functions test the case where binsri cannot be used. The binsr_*_i functions will start to use the binsri instruction in the next commit. llvm-svn: 193682
* [mips][msa] Added support for matching splat.[bhw] from normal IR (i.e. not ↵Daniel Sanders2013-10-301-33/+44
| | | | | | | | | | intrinsics) splat.d is implemented but this subtest is currently disabled. This is because it is difficult to match the appropriate IR on MIPS32. There is a patch under review that should help with this so I hope to enable the subtest soon. llvm-svn: 193680
* Revert "SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs ↵Juergen Ributzka2013-10-301-42/+0
| | | | | | | | splitting too." Now Hexagon and SystemZ are not happy with it :-( llvm-svn: 193677
* SelectionDAG: Teach the legalizer to split SETCC if VSELECT needs splitting too.Juergen Ributzka2013-10-301-0/+42
| | | | | | | | | | | | | | | | | | | | The Type Legalizer recognizes that VSELECT needs to be split, because the type is to wide for the given target. The same does not always apply to SETCC, because less space is required to encode the result of a comparison. As a result VSELECT is split and SETCC is unrolled into scalar comparisons. This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG Combiner. If a matching pattern is found, then the result mask of SETCC is promoted to the expected vector mask type for the given target. This mask has usually the same size as the VSELECT return type (except for Intel KNL). Now the type legalizer will split both VSELECT and SETCC. This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>. Reviewed by Nadav llvm-svn: 193676
* [mips] Align the stack to 16-bytes for mfp64.Akira Hatanaka2013-10-291-0/+14
| | | | llvm-svn: 193641
* add test cases for frameaddr and returnaddr for aarch64Weiming Zhao2013-10-292-0/+41
| | | | llvm-svn: 193626
* R600/SI: Add compute support for CI v2Tom Stellard2013-10-291-7/+11
| | | | | | | | v2: - Fix LDS size calculation Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 193621
* R600: Expand vector FSQRT opsTom Stellard2013-10-291-0/+54
| | | | llvm-svn: 193620
* AArch64: add 'a' inline asm operand modifierTim Northover2013-10-291-0/+9
| | | | | | | This is used in the Linux kernel, and effectively just means "print an address". llvm-svn: 193593
* Convert another llc -filetype=obj test.Rafael Espindola2013-10-281-29/+0
| | | | llvm-svn: 193548
* Convert another llc -filetype=obj test.Rafael Espindola2013-10-281-34/+0
| | | | llvm-svn: 193547
* Convert another llc -filetype=obj test.Rafael Espindola2013-10-281-31/+0
| | | | llvm-svn: 193546
* Convert another llc -filetype=obj test.Rafael Espindola2013-10-281-27/+0
| | | | llvm-svn: 193539
* Convert another llc -filetype=obj test.Rafael Espindola2013-10-281-16/+0
| | | | llvm-svn: 193538
* Convert another llc -filetype=obj test.Rafael Espindola2013-10-281-17/+0
| | | | llvm-svn: 193537
* Convert another llc -filetype=obj test.Rafael Espindola2013-10-281-17/+0
| | | | llvm-svn: 193536
* Convert a llc -filetype=obj test into a llvm-mc test.Rafael Espindola2013-10-281-16/+0
| | | | llvm-svn: 193534
* [arm] Implement eabi_attribute, cpu, and fpu directives.Logan Chien2013-10-282-38/+259
| | | | | | | | | | | | | | | | | | | | | | | | | | This commit allows the ARM integrated assembler to parse and assemble the code with .eabi_attribute, .cpu, and .fpu directives. To implement the feature, this commit moves the code from AttrEmitter to ARMTargetStreamers, and several new test cases related to cortex-m4, cortex-r5, and cortex-a15 are added. Besides, this commit also change the Subtarget->isFPOnlySP() to Subtarget->hasD16() to match the usage of .fpu directive. This commit changes the test cases: * Several .eabi_attribute directives in 2010-09-29-mc-asm-header-test.ll are removed because the .fpu directive already cover the functionality. * In the Cortex-A15 test case, the value for Tag_Advanced_SIMD_arch has be changed from 1 to 2, which is more precise. llvm-svn: 193524
* [SystemZ] Set usaAA to trueRichard Sandiford2013-10-285-18/+19
| | | | | | | | | | | | | | | | useAA significantly improves the handling of vector code that has TBAA information attached. It also helps other cases, as shown by the testsuite changes here. The only real downside I've seen is that it interferes with MergeConsecutiveStores. The problem is that that optimization works top down, starting at the first store in the chain, and looks for cases where the chain result is only used by a single related store. These related stores don't alias, so useAA will have rewritten all the later stores to use a different chain input (typically the same one as the first store). I think the advantages outweigh the disadvantages though, so for now I've just disabled alias analysis for the unaligned-01.ll test. llvm-svn: 193521
* [DAGCombiner] Respect volatility when checking for aliasesRichard Sandiford2013-10-281-1/+2
| | | | | | | | Making useAA() default to true for SystemZ showed that the combiner alias analysis wasn't handling volatile accesses. This hit many of the SystemZ tests, but I arbitrarily picked one for the purpose of this patch. llvm-svn: 193518
* Keep TBAA info when rewriting SelectionDAG loads and storesRichard Sandiford2013-10-281-0/+20
| | | | | | | | | | | | | | | | | Most SelectionDAG code drops the TBAA info when creating a new form of a load and store (e.g. during legalization, or when converting a plain load to an extending one). This patch tries to catch all cases where the TBAA information can legitimately be carried over. The patch adds alternative forms of getLoad() and getExtLoad() that take a MachineMemOperand instead of individual fields. (The corresponding getTruncStore() already exists.) The idea is to use the MachineMemOperand forms when all fields are carried over (size, pointer info, isVolatile, isNonTemporal, alignment and TBAA info). If some adjustment is being made, e.g. to narrow the load, then we still pass the individual fields but also pass the TBAA info. llvm-svn: 193517
* Make first substantial checkin of my port of ARM constant islands code to Mips.Reed Kotler2013-10-271-0/+35
| | | | | | | | | | | | Before I just ported the shell of the pass. I've tried to keep everything nearly identical to the ARM version. I think it will be very easy to eventually merge these two and create a new more general pass that other targets can use. I have some improvements I would like to make to allow pools to be shared across functions and some other things. When I'm all done we can think about making a more general pass. More to be ported but the basic mechanism works now almost as good as gcc mips16. llvm-svn: 193509
* AVX-512: PMIN/PMAX intrinsics and patternsElena Demikhovsky2013-10-271-0/+56
| | | | | | Patch by Cameron McInally <cameron.mcinally@nyu.edu> llvm-svn: 193497
* [X86][AVX512] Add patterns that match the AVX512 floating point register ↵Quentin Colombet2013-10-251-0/+14
| | | | | | | | vbroadcast intrinsics. Patch by Cameron McInally <cameron.mcinally@nyu.edu> llvm-svn: 193422
* [X86][AVX512] Add patterns that match the AVX512 floating point vbroadcast ↵Quentin Colombet2013-10-251-0/+14
| | | | | | | | intrinsics. Patch by Cameron McInally <cameron.mcinally@nyu.edu> llvm-svn: 193421
* ARM: don't expand atomicrmw inline on Cortex-M0Tim Northover2013-10-251-0/+1
| | | | | | | | | | There's a barrier instruction so that should still be used, but most actual atomic operations are going to need a platform decision on the correct behaviour (either nop if single-threaded or OS-support otherwise). rdar://problem/15287210 llvm-svn: 193399
* LegalizeDAG: allow libcalls for max/min atomic operationsTim Northover2013-10-251-0/+24
| | | | | | | | | | | ARM processors without ldrex/strex need to be able to make libcalls for all atomic operations, including the newer min/max versions. The alternative would probably be expanding these operations in terms of cmpxchg (as x86 does always), but in the configurations where this matters code-size tends to be paramount so the libcall is more desirable. llvm-svn: 193398
* ARM: Test r193381 a bit more thoroughly.Jim Grosbach2013-10-241-0/+2
| | | | | | Make sure we're predicating right based on CPU even if the triple is 'wrong'. llvm-svn: 193382
* ARM: Tweak usage of '*vfp' compiler_rt functions.Jim Grosbach2013-10-241-2/+2
| | | | | | | | | Only use them if the subtarget has ARM mode, as these routines are implemented as ARM code. rdar://15302004 llvm-svn: 193381
* ARM: Use non-VFP softcalls on embedded Darwinish targetsTim Northover2013-10-241-0/+22
| | | | | | | | | | | | | The compiler-rt functions __adddf3vfp and so on exist purely to allow Thumb1 code to make use of VFP instructions by switching back to ARM mode, they make no sense for M-class processors which don't even have an ARM mode. Given that justification, in practice this is a platform ABI decision so the actual check is based on that rather than CPU features. rdar://problem/15302004 llvm-svn: 193327
* Added test for -elf configuration, to see that _alloca call is properly Yaron Keren2013-10-241-9/+16
| | | | | | | | generated. See: http://llvm.org/viewvc/llvm-project?view=revision&revision=193289 llvm-svn: 193321
OpenPOWER on IntegriCloud