path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [ARM] Add earlyclobber constraint to pre/post-indexed ARM STR instructions.Tilmann Scheller2014-07-181-4/+6
| | | | | | | | | | The post-indexed instructions were missing the constraint, causing unpredictable STR instructions to be emitted. The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, because instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions that already define the constraint. However, it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well: at some point someone might create instances of them programmatically, and then the constraint is definitely needed. This fixes PR20323. llvm-svn: 213369
* Refactor ARM subarchitecture parsingRenato Golin2014-07-182-82/+99
| | | | | | | | | Re-commit of a patch to rework the triple parsing on ARM to a more sane model. Patch by Gabor Ballabas. llvm-svn: 213367
* extracting swapStruct into include/llvm/Support/MachO.h (no functional change)Artyom Skrobov2014-07-181-208/+9
| | | | llvm-svn: 213361
* R600: support f16 -> f64 conversion intrinsic.Tim Northover2014-07-181-0/+2
| | | | | | | | Unfortunately, we don't seem to have a direct truncation, but the extension can be legally split into two operations so we should support that. llvm-svn: 213357
* NVPTX: support direct f16 <-> f64 conversions via intrinsics.Tim Northover2014-07-181-0/+5
| | | | | | | | Clang may well start emitting these soon, and while it may not be directly relevant for OpenCL or GLSL, the instructions were just sitting there waiting to be used. llvm-svn: 213356
* Rename AlignAttribute to IntAttributeHal Finkel2014-07-184-27/+27
| | | | | | | | | | | | Currently the only kind of integer IR attributes that we have are alignment attributes, and so the attribute kind that takes an integer parameter is called AlignAttr, but that will change (we'll soon be adding a dereferenceable attribute that also takes an integer value). Accordingly, rename AlignAttribute to IntAttribute (class names, enums, etc.). No functionality change intended. llvm-svn: 213352
* R600: Implement TTI:getPopcntSupportMatt Arsenault2014-07-182-2/+12
| | | | | | | The test is just copied from X86, and I don't know of a better way to test it. llvm-svn: 213351
* X86: Constant fold converting vector setcc results to float.Jim Grosbach2014-07-181-0/+51
| Since the result of a SETCC for X86 is 0 or -1 in each lane, we can move
| unary operations, in this case [su]int_to_fp through the mask operation and
| constant fold the operation away. Generally speaking:
|
|   UNARYOP(AND(VECTOR_CMP(x,y), constant))
|       --> AND(VECTOR_CMP(x,y), constant2)
|
| where constant2 is UNARYOP(constant).
|
| This implements the transform where UNARYOP is [su]int_to_fp.
|
| For example, consider the simple function:
|
|   define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
|     %cmp = fcmp oeq <4 x float> %val, %test
|     %ext = zext <4 x i1> %cmp to <4 x i32>
|     %result = sitofp <4 x i32> %ext to <4 x float>
|     ret <4 x float> %result
|   }
|
| Before this change, the SSE code is generated as:
|
|   LCPI0_0:
|     .long 1 ## 0x1
|     .long 1 ## 0x1
|     .long 1 ## 0x1
|     .long 1 ## 0x1
|     .section __TEXT,__text,regular,pure_instructions
|     .globl _foo
|     .align 4, 0x90
|   _foo: ## @foo
|     cmpeqps %xmm1, %xmm0
|     andps LCPI0_0(%rip), %xmm0
|     cvtdq2ps %xmm0, %xmm0
|     retq
|
| After, the code is improved to:
|
|   LCPI0_0:
|     .long 1065353216 ## float 1.000000e+00
|     .long 1065353216 ## float 1.000000e+00
|     .long 1065353216 ## float 1.000000e+00
|     .long 1065353216 ## float 1.000000e+00
|     .section __TEXT,__text,regular,pure_instructions
|     .globl _foo
|     .align 4, 0x90
|   _foo: ## @foo
|     cmpeqps %xmm1, %xmm0
|     andps LCPI0_0(%rip), %xmm0
|     retq
|
| The cvtdq2ps has been constant folded away and the floating point 1.0f
| vector lanes are materialized directly via the ModRM operand of andps.
|
| llvm-svn: 213342
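The identity behind this fold can be checked on a single lane in plain C++. The sketch below is not the SelectionDAG code from the commit; the helper names (`bitsOf`, `floatOf`, `convertAfterMask`, `maskAfterConvert`) are made up for illustration. It assumes, as the commit message states, that a compare lane is either all zeros or all ones, and that floats are IEEE-754 binary32.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a 32-bit unsigned integer.
uint32_t bitsOf(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Reinterpret a 32-bit unsigned integer as a float.
float floatOf(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Unfolded form: AND the integer constant with the lane mask, then convert
// the result to float (the role cvtdq2ps plays in the "before" code).
float convertAfterMask(uint32_t laneMask, int32_t constant) {
  uint32_t masked = laneMask & static_cast<uint32_t>(constant);
  return static_cast<float>(static_cast<int32_t>(masked));
}

// Folded form: convert the constant once, then AND its bit pattern with the
// lane mask. No per-lane conversion remains.
float maskAfterConvert(uint32_t laneMask, int32_t constant) {
  return floatOf(laneMask & bitsOf(static_cast<float>(constant)));
}
```

For a lane mask of `0xFFFFFFFF` both forms yield `1.0f` when the constant is 1, and for a mask of `0` both yield `0.0f`. The bit pattern of `1.0f` is 1065353216, which is the `.long` value in the "after" constant pool.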
* AArch64: Constant fold converting vector setcc results to float.Jim Grosbach2014-07-182-0/+77
| Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can move
| unary operations, in this case [su]int_to_fp through the mask operation and
| constant fold the operation away. Generally speaking:
|
|   UNARYOP(AND(VECTOR_CMP(x,y), constant))
|       --> AND(VECTOR_CMP(x,y), constant2)
|
| where constant2 is UNARYOP(constant).
|
| This implements the transform where UNARYOP is [su]int_to_fp.
|
| For example, consider the simple function:
|
|   define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
|     %cmp = fcmp oeq <4 x float> %val, %test
|     %ext = zext <4 x i1> %cmp to <4 x i32>
|     %result = sitofp <4 x i32> %ext to <4 x float>
|     ret <4 x float> %result
|   }
|
| Before this change, the code is generated as:
|
|   fcmeq.4s v0, v0, v1
|   movi.4s v1, #0x1 // Integer splat value.
|   and.16b v0, v0, v1 // Mask lanes based on the comparison.
|   scvtf.4s v0, v0 // Convert each lane to f32.
|   ret
|
| After, the code is improved to:
|
|   fcmeq.4s v0, v0, v1
|   fmov.4s v1, #1.00000000 // f32 splat value.
|   and.16b v0, v0, v1 // Mask lanes based on the comparison.
|   ret
|
| The scvtf.4s has been constant folded away and the floating point 1.0f
| vector lanes are materialized directly via fmov.4s.
|
| Rather than do the folding manually in the target code, teach getNode() in
| the generic SelectionDAG to handle folding constant operands of vector
| [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do additional
| constant folding there as well, but I don't have test cases for those
| operations, so leaving them for another time when it becomes appropriate.
|
| rdar://17693791
|
| llvm-svn: 213341
* Revert "[x86] Fold extract_vector_elt of a load into the Load's address ↵Michael J. Spencer2014-07-181-124/+90
| | | | | | | | | computation." There's a bug where this can create cycles in the DAG. It will take a bit to fix, so I'm backing it out for now. llvm-svn: 213339
* Reset the Subtarget in the AsmPrinter for each machine functionEric Christopher2014-07-182-6/+11
| | | | | | | and add explanatory comment about dual initialization. Fix use of the Subtarget to grab the information off of the target machine. llvm-svn: 213336
* Avoid resetting the UseSoftFloat and FloatABIType on the TargetMachineEric Christopher2014-07-187-18/+15
| | | | | | | | | | Options struct and move the comment to inMips16HardFloat. Use the fact that we now know whether or not we cared about soft float to set the libcalls. Accordingly rename mipsSEUsesSoftFloat to abiUsesSoftFloat and propagate since it's no longer CPU specific. llvm-svn: 213335
* [MCJIT] Fix the alignment requirements for ARM and AArch64 which were mistakenlyLang Hames2014-07-172-2/+2
| | | | | | | | | | relaxed in the big RuntimeDyldMachO cleanup of r213293. No test case yet - this was found via inspection and there's no easy way to test GOT alignment in RuntimeDyldChecker at the moment. I'm working on adding support for this now, and hope to have a test case for this soon. llvm-svn: 213331
* ms inline asm: Don't add x86 segment registers to the clobber list.Nico Weber2014-07-172-1/+7
| | | | | | | Clang tries to check the clobber list but doesn't list segment registers in its x86 register list. This fixes PR20343. llvm-svn: 213303
* Drop the udis86 wrapper from llvm::sysAlp Toker2014-07-173-82/+0
| | | | | | | | This optional dependency on the udis86 library was added some time back to aid JIT development, but doesn't make much sense to link into LLVM binaries these days. llvm-svn: 213300
* [AArch64] Cleanup AsmParser: no need to use dyn_cast + assert. cast does it ↵Arnaud A. de Grandmaison2014-07-171-41/+21
| | | | | | for us. llvm-svn: 213296
* Rectify r213231. Use proper version of 'ComputeNumSignBits'.Suyog Sarda2014-07-171-1/+1
| | | | | | | | | Earlier when the code was in InstCombine, we were calling the version of ComputeNumSignBits in InstCombine.h that automatically added the DataLayout* before calling into ValueTracking. When the code moved to InstSimplify, we are calling into ValueTracking directly without passing in the DataLayout*. This patch rectifies the same by passing DataLayout in ComputeNumSignBits. llvm-svn: 213295
* [MCJIT] Significantly refactor the RuntimeDyldMachO class.Lang Hames2014-07-177-826/+1097
| The previous implementation of RuntimeDyldMachO mixed logic for all targets
| within a single class, creating problems for readability, maintainability,
| and performance. To address these issues, this patch strips the
| RuntimeDyldMachO class down to just target-independent functionality, and
| moves all target-specific functionality into target-specific subclasses of
| RuntimeDyldMachO.
|
| The new class hierarchy is as follows:
|
| class RuntimeDyldMachO
|   Implemented in RuntimeDyldMachO.{h,cpp}
|   Contains logic that is completely independent of the target. This consists
|   mostly of MachO helper utilities which the derived classes use to get
|   their work done.
|
| template <typename Impl>
| class RuntimeDyldMachOCRTPBase<Impl> : public RuntimeDyldMachO
|   Implemented in RuntimeDyldMachO.h
|   Contains generic MachO algorithms/data structures that defer to the Impl
|   class for target-specific behaviors.
|
| RuntimeDyldMachOARM : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOARM>
| RuntimeDyldMachOARM64 : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOARM64>
| RuntimeDyldMachOI386 : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOI386>
| RuntimeDyldMachOX86_64 : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOX86_64>
|   Implemented in their respective *.h files in
|   lib/ExecutionEngine/RuntimeDyld/MachOTargets
|   Each of these contains the relocation logic specific to their target
|   architecture.
|
| llvm-svn: 213293
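The three-layer shape described in this commit (target-independent base, CRTP middle layer, per-target leaf classes) can be sketched in a few lines of standalone C++. Everything below is hypothetical: the `Sketch*` class names, `tableSize`, `pointerSize`, and the 16-byte alignment are stand-ins chosen for illustration and are not taken from RuntimeDyldMachO.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the target-independent base class: it exposes a
// virtual interface and holds helpers that every target can share.
class SketchDyldBase {
public:
  virtual ~SketchDyldBase() = default;
  virtual uint64_t tableSize(unsigned numEntries) const = 0;

protected:
  // Shared helper: round value up to the next multiple of align.
  static uint64_t alignTo(uint64_t value, uint64_t align) {
    return (value + align - 1) / align * align;
  }
};

// CRTP layer: the generic algorithm is written once and defers to Impl for
// the target-specific detail, resolved at compile time via static_cast.
template <typename Impl>
class SketchDyldCRTPBase : public SketchDyldBase {
public:
  uint64_t tableSize(unsigned numEntries) const override {
    const Impl &impl = static_cast<const Impl &>(*this);
    return alignTo(uint64_t(numEntries) * impl.pointerSize(), 16);
  }
};

// Leaf classes supply only what differs per target.
class SketchDyldARM : public SketchDyldCRTPBase<SketchDyldARM> {
public:
  unsigned pointerSize() const { return 4; } // 32-bit target
};

class SketchDyldX86_64 : public SketchDyldCRTPBase<SketchDyldX86_64> {
public:
  unsigned pointerSize() const { return 8; } // 64-bit target
};
```

Callers can still hold a `SketchDyldBase` reference and call `tableSize` virtually, while the call to `pointerSize` inside the CRTP layer is a direct, inlinable call. That split is one plausible reading of why the commit mentions performance alongside readability.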
* [ASan] Don't instrument load/stores with !nosanitize metadata.Alexey Samsonov2014-07-171-0/+3
| | | | | | | | | This is used to avoid instrumentation of instructions added by UBSan in Clang frontend (see r213291). This fixes PR20085. Reviewed in http://reviews.llvm.org/D4544. llvm-svn: 213292
* Typo: exists -> exitsHans Wennborg2014-07-171-1/+1
| | | | llvm-svn: 213290
* [NVPTX] Improve handling of FP fusionJustin Holewinski2014-07-175-48/+62
| | | | | | | | | We now consider the FPOpFusion flag when determining whether to fuse ops. We also explicitly emit add.rn when fusion is disabled to prevent ptxas from fusing the operations on its own. llvm-svn: 213287
* Fix typosMatt Arsenault2014-07-171-3/+3
| | | | llvm-svn: 213285
* [X86] AVX512: Add disassembler support for compressed displacementAdam Nemet2014-07-173-3/+21
| | | | | | | | | | | | There are two parts here. First is to modify tablegen to adjust the encoding type ENCODING_RM with the scaling factor. The second is to use the new encoding types to compute the correct displacement in the decoder. Fixes <rdar://problem/17608489> llvm-svn: 213281
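As background for the decoder half of this change: with EVEX compressed displacement, the one-byte displacement in the instruction is sign-extended and multiplied by a per-instruction scaling factor to obtain the real byte offset. The sketch below shows that arithmetic only. The function names are invented for illustration and the scale is passed in by the caller; how the X86 disassembler tables actually carry the factor is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Decoder side: expand a compressed 8-bit displacement into a byte offset.
// 'scale' is the scaling factor N for the instruction (assumed to be known).
int32_t expandCompressedDisp8(int8_t disp8, unsigned scale) {
  return static_cast<int32_t>(disp8) * static_cast<int32_t>(scale);
}

// Encoder side: a displacement can use the compressed form only if it is a
// multiple of the scale and the quotient fits in a signed byte.
bool compressDisp(int32_t disp, unsigned scale, int8_t &disp8) {
  if (scale == 0)
    return false;
  const int32_t n = static_cast<int32_t>(scale);
  if (disp % n != 0)
    return false;
  const int32_t quotient = disp / n;
  if (quotient < -128 || quotient > 127)
    return false;
  disp8 = static_cast<int8_t>(quotient);
  return true;
}
```

For example, with a scale of 64 a stored byte of 2 expands to an offset of 128, while an offset of 100 is not a multiple of 64 and would need the full 32-bit displacement form.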
* [X86] AVX512: Rename EVEX_CD8V to CD8_FormAdam Nemet2014-07-171-5/+5
| | | | | | | | This is to match the naming of CD8_EltSize, CD8_Scale, etc. No functional change. llvm-svn: 213280
* [X86] AVX512: Use the TD version of CD8_Scale in the assemblerAdam Nemet2014-07-173-62/+16
| | | | | | | | | | | Passes the computed scaling factor in TSFlags rather than the old attributes. Also removes the C++ version of computing the scaling factor (MemObjSize) along with the asserts added by the previous patch. No functional change. llvm-svn: 213279
* [X86] AVX512: Move compressed displacement logic to TDAdam Nemet2014-07-172-0/+34
| | | | | | | | | | | | | This does not actually move the logic yet but reimplements it in the TableGen language, then asserts that the new implementation results in the same value. The next patch will remove the assert and the temporary use of the TSFlags and remove the C++ implementation. The formula requires a limited form of the logical left and right shift operators. I implemented these with the bit-extract/insert operator (i.e. blah{bits}). No functional change. llvm-svn: 213278
* [TableGen] Allow shift operators to take bits<n>Adam Nemet2014-07-171-2/+4
| | | | | | | | | | | | | Convert the operand to int if possible, i.e. if the value is properly initialized. (I suppose there is further room for improvement here to also perform the shift if the uninitialized bits are shifted out.) With this little change we can now compute the scaling factor for compressed displacement with pure tablegen code in the X86 backend. This is useful because both the X86-disassembler-specific part of tablegen and the assembler need this and TD is the natural sharing place. The patch also adds the missing documentation for the shift and add operators. llvm-svn: 213277
* [NVPTX] Add missing .v4 qualifier on vector store instructionJustin Holewinski2014-07-171-1/+1
| | | | llvm-svn: 213276
* MC: correct DWARF header for PE/COFF assembly inputSaleem Abdulrasool2014-07-171-4/+5
| | | | | | | | | | | | The header contains an offset to the DWARF abbreviations for the CU. The offset must be section relative for COFF and absolute for others. The non-assembly code path for the DWARF header generation already had the correct emission for the headers. This corrects just the assembly path. Due to the invalid relocation, processing of the debug information would halt previously on the first assembly input as the associated abbreviations would be out of range as they would have the location increased by image base and the section offset. This addresses PR20332. llvm-svn: 213275
* MC: fix MCAsmInfo usage for windows-itaniumSaleem Abdulrasool2014-07-171-1/+2
| | | | | | Windows itanium uses the GNUCOFF assembly format, not ELF. llvm-svn: 213274
* MC: collapse emission of producerSaleem Abdulrasool2014-07-171-7/+3
| | | | | | | | Rather than use three EmitBytes, concatenate the string at compile time, constructing a single StringRef and emitting the data in one shot. This also creates nicer assembly output. NFC. llvm-svn: 213273
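The "concatenate at compile time" idea relies on the language rule that adjacent string literals are merged into one literal during translation. A minimal sketch follows. `SketchStreamer`, `SKETCH_VERSION`, and the producer text are all hypothetical and stand in for the real MC streamer and version macro, which are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical version macro; any string-literal macro behaves the same way.
#define SKETCH_VERSION "1.2.3"

// Hypothetical stand-in for a streamer: records output and counts calls.
struct SketchStreamer {
  std::string buffer;
  unsigned calls = 0;
  void emitBytes(const std::string &bytes) {
    buffer += bytes;
    ++calls;
  }
};

// Older shape: three separate emissions for prefix, version, and suffix.
void emitProducerInPieces(SketchStreamer &s) {
  s.emitBytes("sketch-tool (based on SKETCH ");
  s.emitBytes(SKETCH_VERSION);
  s.emitBytes(")");
}

// Newer shape: adjacent literals are joined by the compiler, so a single
// string is emitted in one call.
void emitProducerAtOnce(SketchStreamer &s) {
  s.emitBytes("sketch-tool (based on SKETCH " SKETCH_VERSION ")");
}
```

Both functions produce the same bytes. The single-call form emits them as one unit, which is consistent with the commit's remark about nicer assembly output.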
* [NVPTX] Flag surface/texture query instructions with IsTexSurfQueryJustin Holewinski2014-07-171-0/+6
| | | | | | | Also, add some tests to make sure we can handle surface/texture queries on both Fermi and Kepler+. llvm-svn: 213268
* [NVPTX] Add more surface/texture intrinsics, including CUDA unified texture ↵Justin Holewinski2014-07-179-801/+6542
| | | | | | | | | | fetch This also uses TSFlags to mark machine instructions that are surface/texture accesses, as well as the vector width for surface operations. This is used to simplify some of the switch statements that need to detect surface/texture instructions. llvm-svn: 213256
* ARM: support direct f16 <-> f64 conversionsTim Northover2014-07-172-7/+21
| | | | | | ARMv8 has instructions to handle it, otherwise a libcall is needed. llvm-svn: 213254
* CodeGen: generate single libcall for fptrunc -> f16 operations.Tim Northover2014-07-175-20/+32
| | | | | | | | | | | | Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM. I've followed the much more generic "__truncST2" naming, as opposed to the odd name for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to. llvm-svn: 213252
* X86: support double extension of f16 type.Tim Northover2014-07-171-0/+4
| | | | | | | | | | | x86 has no native ability to extend an f16 to f64, but the same result is obtained if we expand it into two separate extensions: f16 -> f32 -> f64. Unfortunately the same is not true for truncate, so that still results in a compilation failure. llvm-svn: 213251
* CodeGen: extend f16 conversions to permit types > float.Tim Northover2014-07-1713-53/+80
| | | | | | | | | | | | | | | | | | | This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too. During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here. Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here. llvm-svn: 213248
* Port memory barriers intrinsics to AArch64Yi Kong2014-07-172-6/+22
| | | | | | | | | | | Memory barrier __builtin_arm_[dmb, dsb, isb] intrinsics are required to implement their corresponding ACLE and MSVC intrinsics. This patch ports ARM dmb, dsb, isb intrinsic to AArch64. Differential Revision: http://reviews.llvm.org/D4520 llvm-svn: 213247
* [mips] .reginfo is 8 byte aligned on N32.Daniel Sanders2014-07-171-1/+2
| | | | | | Differential Revision: http://reviews.llvm.org/D4540 llvm-svn: 213246
* [mips] Correct ELF e_flags for the N32 ABI when using a mips-* triple rather ↵Daniel Sanders2014-07-171-15/+11
| | | | | | | | | | | | | | | | | | | than a mips64-* triple Summary: Generally speaking, mips-* vs mips64-* should not be used to make decisions about the content or format of the ELF. This should be based on the ABI and CPU in use. For example, `mips-linux-gnu-clang -mips64r2 -mabi=64` should produce an ELF64 as should `mips64-linux-gnu-clang -mabi=64`. Conversely, `mips64-linux-gnu-clang -mabi=n32` should produce an ELF32 as should `mips-linux-gnu-clang -mips64r2 -mabi=n32`. This patch fixes the e_flags but leaves the ELF32 vs ELF64 issue for now since there is no apparent way to base this decision on the ABI and CPU. Differential Revision: http://reviews.llvm.org/D4539 llvm-svn: 213244
* [mips] Correct .MIPS.abiflags for -mfpxx on MIPS32r6Daniel Sanders2014-07-172-4/+10
| | | | | | | | | | Summary: The cpr1_size field describes the minimum register width to run the program rather than the size of the registers on the target. MIPS32r6 was acting as if -mfp64 had been given because it starts off with 64-bit FPU registers. Differential Revision: http://reviews.llvm.org/D4538 llvm-svn: 213243
* [mips] Fix ELF e_flags related to -mabicalls and -mplt.Daniel Sanders2014-07-171-0/+6
| | | | | | | | | | | | | | Summary: These options are not implemented yet but we act as if they are always given. The integrated assembler is driven by the clang driver so the e_flag test cases should match the e_flags emitted by GCC+GAS rather than GAS by itself. Differential Revision: http://reviews.llvm.org/D4536 llvm-svn: 213242
* Fix the prefix for arm64 tripleYi Kong2014-07-171-3/+2
| | | | | | | | | | | Triple.cpp still returns "arm64" as the prefix for the arm64 triple, which prevents Clang from selecting the correct GCCBuiltin IR. This patch changes the value to the correct prefix, "aarch64". A regression test will be added in an upcoming patch. Differential Revision: http://reviews.llvm.org/D4516 llvm-svn: 213240
* [msan] Avoid redundant origin stores.Evgeniy Stepanov2014-07-171-1/+4
| | | | | | | | | | | | | Origin is meaningless for fully initialized values. Avoid storing origin for function arguments that are known to be always initialized (i.e. shadow is a compile-time null constant). This is not about correctness, but purely an optimization. Seems to affect compilation time of blacklisted functions significantly. llvm-svn: 213239
* Move ashr optimization from InstCombineShift to InstSimplify.Suyog Sarda2014-07-172-5/+5
| | | | | | | | | Refactor code, no functionality change, test case moved from instcombine to instsimplify. Differential Revision: http://reviews.llvm.org/D4102 llvm-svn: 213231
* Use range forMatt Arsenault2014-07-171-6/+4
| | | | llvm-svn: 213230
* R600: Short circuit alloca check if address space isn't private.Matt Arsenault2014-07-171-1/+1
| | | | | | | Skip calling GetUnderlyingObject in cases where it obviously isn't from an alloca. This should only be a compile time improvement. llvm-svn: 213229
* Fix Typo (first commit to test commit access)Suyog Sarda2014-07-171-1/+1
| | | | llvm-svn: 213228
* MC: make WinEH opcode an opaque valueSaleem Abdulrasool2014-07-172-16/+29
| | | | | | | | | | | This makes the opcode an opaque value (unsigned int) rather than the enumeration. This permits the use of target specific operands. Split out the generic type into a MCWinEH header and add a supporting MCWin64EH::Instruction to abstract out the selection of the opcode and construction of the actual instruction. llvm-svn: 213221
* Improve BasicAA CS-CS queries (redux)Hal Finkel2014-07-173-130/+151
| This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it
| causes PR20303." with a fix for the bug in pr20303. As it turned out, the
| relevant code was both wrong and over-conservative (because, as with the
| code it replaced, it would return the overall ModRef mask even if just Ref
| had been implied by the argument aliasing results). Hopefully, this
| correctly fixes both problems.
|
| Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've
| cleaned up a little and added in DSE's test directory). The BasicAA test has
| also been updated to check for this error.
|
| Original commit message:
|
| BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
| and uses that information to form more-accurate answers to CallSite vs. Loc
| ModRef queries. Unfortunately, it did not use this information when
| answering CallSite vs. CallSite queries.
|
| Generically, when an intrinsic takes one or more pointers and the intrinsic
| is marked only to read/write from its arguments, the offset/size is unknown.
| As a result, the generic code that answers CallSite vs. CallSite (and
| CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an
| intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use
| more-accurate size information for some intrinsics, it did not do the same
| for CallSite vs. CallSite queries.
|
| This change refactors the intrinsic-specific logic in BasicAA into a generic
| AA query function: getArgLocation, which is overridden by BasicAA to supply
| the intrinsic-specific knowledge, and used by AA's generic implementation.
| This allows the intrinsic-specific knowledge to be used by both CallSite vs.
| Loc and CallSite vs. CallSite queries, and simplifies the BasicAA
| implementation.
|
| Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
| (all the rest are intrinsics). As a side-effect of this refactoring,
| BasicAA's getModRefBehavior override now also returns
| OnlyAccessesArgumentPointees for this function (which is an improvement).
|
| llvm-svn: 213219