path: root/llvm/lib
* [MCJIT] Fix the alignment requirements for ARM and AArch64, which were mistakenly relaxed in the big RuntimeDyldMachO cleanup of r213293 (Lang Hames, 2014-07-17; 2 files, -2/+2)
  No test case yet - this was found via inspection and there's no easy way to test GOT alignment in RuntimeDyldChecker at the moment. I'm working on adding support for this now, and hope to have a test case for this soon.
  llvm-svn: 213331
* ms inline asm: Don't add x86 segment registers to the clobber list (Nico Weber, 2014-07-17; 2 files, -1/+7)
  Clang tries to check the clobber list but doesn't list segment registers in its x86 register list. This fixes PR20343.
  llvm-svn: 213303
* Drop the udis86 wrapper from llvm::sys (Alp Toker, 2014-07-17; 3 files, -82/+0)
  This optional dependency on the udis86 library was added some time back to aid JIT development, but doesn't make much sense to link into LLVM binaries these days.
  llvm-svn: 213300
* [AArch64] Clean up AsmParser: no need to use dyn_cast + assert; cast does it for us (Arnaud A. de Grandmaison, 2014-07-17; 1 file, -41/+21)
  (An illustration of the idiom follows this entry.)
  llvm-svn: 213296
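  A minimal illustration of the idiom behind this cleanup; the function, variable, and type names here are made up for the example, not taken from the patch. LLVM's cast<> asserts internally (in asserts builds) that the cast is valid, so pairing dyn_cast with a manual assert is redundant:

    #include "llvm/MC/MCExpr.h"
    #include "llvm/Support/Casting.h"

    int64_t getConstant(const llvm::MCExpr *Expr) {
      // Before: dyn_cast plus a manual assert.
      //   const auto *CE = llvm::dyn_cast<llvm::MCConstantExpr>(Expr);
      //   assert(CE && "expected a constant expression");
      // After: cast<> performs the same type-validity assertion itself.
      const auto *CE = llvm::cast<llvm::MCConstantExpr>(Expr);
      return CE->getValue();
    }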
* Rectify r213231: use the proper version of ComputeNumSignBits (Suyog Sarda, 2014-07-17; 1 file, -1/+1)
  Earlier, when the code was in InstCombine, we were calling the version of ComputeNumSignBits in InstCombine.h that automatically added the DataLayout* before calling into ValueTracking. When the code moved to InstSimplify, we started calling into ValueTracking directly without passing in the DataLayout*. This patch fixes that by passing the DataLayout* to ComputeNumSignBits.
  llvm-svn: 213295
* [MCJIT] Significantly refactor the RuntimeDyldMachO class (Lang Hames, 2014-07-17; 7 files, -826/+1097)
  The previous implementation of RuntimeDyldMachO mixed logic for all targets within a single class, creating problems for readability, maintainability, and performance. To address these issues, this patch strips the RuntimeDyldMachO class down to just target-independent functionality and moves all target-specific functionality into target-specific subclasses of RuntimeDyldMachO.

  The new class hierarchy is as follows:

  class RuntimeDyldMachO
    Implemented in RuntimeDyldMachO.{h,cpp}. Contains logic that is completely independent of the target; this consists mostly of MachO helper utilities which the derived classes use to get their work done.

  template <typename Impl>
  class RuntimeDyldMachOCRTPBase<Impl> : public RuntimeDyldMachO
    Implemented in RuntimeDyldMachO.h. Contains generic MachO algorithms/data structures that defer to the Impl class for target-specific behaviors.

  RuntimeDyldMachOARM : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOARM>
  RuntimeDyldMachOARM64 : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOARM64>
  RuntimeDyldMachOI386 : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOI386>
  RuntimeDyldMachOX86_64 : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOX86_64>
    Implemented in their respective *.h files in lib/ExecutionEngine/RuntimeDyld/MachOTargets. Each of these contains the relocation logic specific to its target architecture. (A minimal sketch of the CRTP pattern follows this entry.)
  llvm-svn: 213293
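  A minimal sketch of the curiously recurring template pattern (CRTP) used by the hierarchy above; the member names are illustrative stand-ins, not the actual RuntimeDyld code:

    #include <cstdio>

    class RuntimeDyldMachO {
      // Target-independent MachO helpers would live here.
    };

    template <typename Impl>
    class RuntimeDyldMachOCRTPBase : public RuntimeDyldMachO {
    public:
      // A generic algorithm that defers target-specific steps to Impl,
      // resolved at compile time with no virtual dispatch.
      void finalizeLoad() { static_cast<Impl *>(this)->processRelocation(); }
    };

    class RuntimeDyldMachOARM
        : public RuntimeDyldMachOCRTPBase<RuntimeDyldMachOARM> {
    public:
      void processRelocation() { std::puts("ARM-specific relocation logic"); }
    };

    int main() {
      RuntimeDyldMachOARM Dyld;
      Dyld.finalizeLoad(); // statically dispatches to the ARM implementation
    }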
* [ASan] Don't instrument load/stores with !nosanitize metadata (Alexey Samsonov, 2014-07-17; 1 file, -0/+3)
  This is used to avoid instrumentation of instructions added by UBSan in the Clang frontend (see r213291). This fixes PR20085. Reviewed in http://reviews.llvm.org/D4544. (A sketch of the check follows this entry.)
  llvm-svn: 213292
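  A sketch of the kind of check this describes; the shape is assumed for illustration, not copied from the ASan pass. An instrumentation pass can skip any instruction carrying the !nosanitize metadata kind:

    #include "llvm/IR/Instruction.h"

    static bool shouldInstrument(const llvm::Instruction &I) {
      // Instructions UBSan emits are tagged !nosanitize so that ASan
      // does not instrument the sanitizer's own bookkeeping code.
      return I.getMetadata("nosanitize") == nullptr;
    }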
* Typo: exists -> exits (Hans Wennborg, 2014-07-17; 1 file, -1/+1)
  llvm-svn: 213290
* [NVPTX] Improve handling of FP fusion (Justin Holewinski, 2014-07-17; 5 files, -48/+62)
  We now consider the FPOpFusion flag when determining whether to fuse ops. We also explicitly emit add.rn when fusion is disabled to prevent ptxas from fusing the operations on its own.
  llvm-svn: 213287
* Fix typos (Matt Arsenault, 2014-07-17; 1 file, -3/+3)
  llvm-svn: 213285
* [X86] AVX512: Add disassembler support for compressed displacement (Adam Nemet, 2014-07-17; 3 files, -3/+21)
  There are two parts here. The first is to modify tablegen to adjust the encoding type ENCODING_RM with the scaling factor. The second is to use the new encoding types to compute the correct displacement in the decoder. Fixes <rdar://problem/17608489>. (A worked example of the disp8*N arithmetic follows this entry.)
  llvm-svn: 213281
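  A worked example of AVX-512's compressed (disp8*N) displacement, which the decoder must undo: the stored 8-bit displacement is implicitly multiplied by a scale factor N derived from the instruction's memory access size. The numbers below are illustrative, not taken from the patch:

    #include <cassert>
    #include <cstdint>

    // Recover the byte offset from a compressed 8-bit displacement.
    int64_t decodeCompressedDisp8(int8_t Disp8, unsigned ScaleN) {
      return static_cast<int64_t>(Disp8) * static_cast<int64_t>(ScaleN);
    }

    int main() {
      // With a 64-byte scale (e.g. a full 512-bit memory operand), an
      // encoded disp8 of 2 denotes a byte offset of 128.
      assert(decodeCompressedDisp8(2, 64) == 128);
      // Negative displacements scale the same way: -1 => -64 bytes.
      assert(decodeCompressedDisp8(-1, 64) == -64);
    }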
* [X86] AVX512: Rename EVEX_CD8V to CD8_Form (Adam Nemet, 2014-07-17; 1 file, -5/+5)
  This is to match the naming of CD8_EltSize, CD8_Scale, etc. No functional change.
  llvm-svn: 213280
* [X86] AVX512: Use the TD version of CD8_Scale in the assembler (Adam Nemet, 2014-07-17; 3 files, -62/+16)
  Passes the computed scaling factor in TSFlags rather than the old attributes. Also removes the C++ version of computing the scaling factor (MemObjSize), along with the asserts added by the previous patch. No functional change.
  llvm-svn: 213279
* [X86] AVX512: Move compressed displacement logic to TD (Adam Nemet, 2014-07-17; 2 files, -0/+34)
  This does not actually move the logic yet, but reimplements it in the TableGen language and then asserts that the new implementation results in the same value. The next patch will remove the assert and the temporary use of the TSFlags, and remove the C++ implementation.

  The formula requires a limited form of the logical left and right shift operators. I implemented these with the bit-extract/insert operator (i.e. blah{bits}). No functional change.
  llvm-svn: 213278
* [TableGen] Allow shift operators to take bits<n> (Adam Nemet, 2014-07-17; 1 file, -2/+4)
  Convert the operand to int if possible, i.e. if the value is properly initialized. (I suppose there is further room for improvement here: we could also perform the shift if the uninitialized bits are shifted out.)

  With this little change we can now compute the scaling factor for compressed displacement with pure TableGen code in the X86 backend. This is useful because both the X86-disassembler-specific part of tablegen and the assembler need this value, and TD is the natural place to share it. The patch also adds the missing documentation for the shift and add operators.
  llvm-svn: 213277
* [NVPTX] Add missing .v4 qualifier on vector store instruction (Justin Holewinski, 2014-07-17; 1 file, -1/+1)
  llvm-svn: 213276
* MC: correct DWARF header for PE/COFF assembly input (Saleem Abdulrasool, 2014-07-17; 1 file, -4/+5)
  The header contains an offset to the DWARF abbreviations for the CU. The offset must be section-relative for COFF and absolute for other formats. The non-assembly code path for DWARF header generation already emitted the headers correctly; this corrects just the assembly path. Due to the invalid relocation, processing of the debug information would previously halt on the first assembly input, as the associated abbreviations would be out of range: their location was increased by both the image base and the section offset. This addresses PR20332.
  llvm-svn: 213275
* MC: fix MCAsmInfo usage for windows-itanium (Saleem Abdulrasool, 2014-07-17; 1 file, -1/+2)
  Windows itanium uses the GNU COFF assembly format, not ELF.
  llvm-svn: 213274
* MC: collapse emission of producer (Saleem Abdulrasool, 2014-07-17; 1 file, -7/+3)
  Rather than using three EmitBytes calls, concatenate the string at compile time, constructing a single StringRef and emitting the data in one shot. This also produces nicer assembly output. NFC.
  llvm-svn: 213273
* [NVPTX] Flag surface/texture query instructions with IsTexSurfQuery (Justin Holewinski, 2014-07-17; 1 file, -0/+6)
  Also add some tests to make sure we can handle surface/texture queries on both Fermi and Kepler+.
  llvm-svn: 213268
* [NVPTX] Add more surface/texture intrinsics, including CUDA unified texture fetch (Justin Holewinski, 2014-07-17; 9 files, -801/+6542)
  This also uses TSFlags to mark machine instructions that are surface/texture accesses, as well as to record the vector width for surface operations. This is used to simplify some of the switch statements that need to detect surface/texture instructions.
  llvm-svn: 213256
* ARM: support direct f16 <-> f64 conversions (Tim Northover, 2014-07-17; 2 files, -7/+21)
  ARMv8 has instructions to handle these conversions directly; otherwise a libcall is needed.
  llvm-svn: 213254
* CodeGen: generate single libcall for fptrunc -> f16 operations (Tim Northover, 2014-07-17; 5 files, -20/+32)
  Previously we asserted on this code. Currently compiler-rt doesn't actually implement any of these new libcalls, but external help is pretty much the only viable option for LLVM. I've followed the much more generic "__truncST2" naming, as opposed to the odd name used for f32 -> f16 truncation. This can obviously be changed later, or overridden by any targets that need to.
  llvm-svn: 213252
* X86: support double extension of f16 type (Tim Northover, 2014-07-17; 1 file, -0/+4)
  x86 has no native ability to extend an f16 to f64, but the same result is obtained if we expand the operation into two separate extensions: f16 -> f32 -> f64. Unfortunately the same is not true for truncation, so that still results in a compilation failure. (A short sketch of why the two-step extension is exact follows this entry.)
  llvm-svn: 213251
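  A short sketch of why the two-step extension is exact: every f16 value is exactly representable as an f32, and every f32 as an f64, so extending twice loses nothing. Truncation lacks this property because rounding twice (f64 -> f32 -> f16) can give a different result than rounding once (f64 -> f16). Illustrative C++ only; __fp16 is a Clang extension whose availability varies by target:

    double extendF16(__fp16 H) {
      float F = static_cast<float>(H); // exact: f32 holds every f16 value
      return static_cast<double>(F);   // exact: f64 holds every f32 value
    }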
* CodeGen: extend f16 conversions to permit types > float (Tim Northover, 2014-07-17; 13 files, -53/+80)
  This makes the two intrinsics @llvm.convert.from.f16 and @llvm.convert.to.f16 accept types other than simple "float". This is only strictly needed for the truncate operation, since otherwise double rounding occurs and there's no way to represent the strict IEEE conversion. However, for symmetry we allow larger types in the extend too.

  During legalization, we can expand an "fp16_to_double" operation into two extends for convenience, but abort when the truncate isn't legal. A new libcall is probably needed here.

  Even after this commit, various target tweaks are needed to actually use the extended intrinsics. I've put these into separate commits for clarity, so there are no actual tests of f64 conversion here.
  llvm-svn: 213248
* Port memory barrier intrinsics to AArch64 (Yi Kong, 2014-07-17; 2 files, -6/+22)
  The memory barrier __builtin_arm_{dmb,dsb,isb} intrinsics are required to implement their corresponding ACLE and MSVC intrinsics. This patch ports the ARM dmb, dsb, and isb intrinsics to AArch64. (A usage sketch follows this entry.)

  Differential Revision: http://reviews.llvm.org/D4520
  llvm-svn: 213247
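  A usage sketch for the Clang builtins named above; the argument is the ACLE barrier-option immediate (0xF is SY, the full-system barrier). This compiles only when targeting ARM or AArch64:

    void fullSystemBarriers() {
      __builtin_arm_dmb(0xF); // data memory barrier
      __builtin_arm_dsb(0xF); // data synchronization barrier
      __builtin_arm_isb(0xF); // instruction synchronization barrier
    }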
* [mips] .reginfo is 8 byte aligned on N32 (Daniel Sanders, 2014-07-17; 1 file, -1/+2)
  Differential Revision: http://reviews.llvm.org/D4540
  llvm-svn: 213246
* [mips] Correct ELF e_flags for the N32 ABI when using a mips-* triple rather than a mips64-* triple (Daniel Sanders, 2014-07-17; 1 file, -15/+11)
  Summary: Generally speaking, mips-* vs. mips64-* should not be used to make decisions about the content or format of the ELF. This should be based on the ABI and CPU in use. For example:

    mips-linux-gnu-clang -mips64r2 -mabi=64

  should produce an ELF64, as should:

    mips64-linux-gnu-clang -mabi=64

  Conversely:

    mips64-linux-gnu-clang -mabi=n32

  should produce an ELF32, as should:

    mips-linux-gnu-clang -mips64r2 -mabi=n32

  This patch fixes the e_flags but leaves the ELF32 vs. ELF64 issue for now, since there is no apparent way to base this decision on the ABI and CPU.

  Differential Revision: http://reviews.llvm.org/D4539
  llvm-svn: 213244
* [mips] Correct .MIPS.abiflags for -mfpxx on MIPS32r6 (Daniel Sanders, 2014-07-17; 2 files, -4/+10)
  Summary: The cpr1_size field describes the minimum register width needed to run the program, rather than the size of the registers on the target. MIPS32r6 was acting as if -mfp64 had been given, because it starts off with 64-bit FPU registers.

  Differential Revision: http://reviews.llvm.org/D4538
  llvm-svn: 213243
* [mips] Fix ELF e_flags related to -mabicalls and -mplt (Daniel Sanders, 2014-07-17; 1 file, -0/+6)
  Summary: These options are not implemented yet, but we currently act as if they are always given. Since the integrated assembler is driven by the Clang driver, the e_flags test cases should match the e_flags emitted by GCC+GAS rather than by GAS alone.

  Differential Revision: http://reviews.llvm.org/D4536
  llvm-svn: 213242
* Fix the prefix for the arm64 triple (Yi Kong, 2014-07-17; 1 file, -3/+2)
  Triple.cpp still returns "arm64" as the prefix for the arm64 triple, causing Clang to be unable to select the correct GCCBuiltin IR. This patch changes the value to the correct prefix, "aarch64". A regression test will be added in a coming patch.

  Differential Revision: http://reviews.llvm.org/D4516
  llvm-svn: 213240
* [msan] Avoid redundant origin stores (Evgeniy Stepanov, 2014-07-17; 1 file, -1/+4)
  Origin is meaningless for fully initialized values. Avoid storing origin for function arguments that are known to be always initialized (i.e. whose shadow is a compile-time null constant). This is not about correctness, but is purely an optimization; it seems to affect compilation time of blacklisted functions significantly. (A sketch of the check follows this entry.)
  llvm-svn: 213239
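  An assumed-shape sketch of the check described above, not the actual MSan code: if an argument's shadow is a compile-time all-zero constant, the value is provably fully initialized and the origin store can be skipped:

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Value.h"

    static bool needsOriginStore(const llvm::Value *Shadow) {
      // A null (all-zero) shadow constant means "fully initialized",
      // so there is no meaningful origin to record.
      if (const auto *C = llvm::dyn_cast<llvm::Constant>(Shadow))
        return !C->isNullValue();
      return true; // shadow not known at compile time; keep the store
    }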
* Move ashr optimization from InstCombineShift to InstSimplify (Suyog Sarda, 2014-07-17; 2 files, -5/+5)
  Refactor code; no functionality change. The test case moved from instcombine to instsimplify.

  Differential Revision: http://reviews.llvm.org/D4102
  llvm-svn: 213231
* Use range for (Matt Arsenault, 2014-07-17; 1 file, -6/+4)
  llvm-svn: 213230
* R600: Short-circuit alloca check if the address space isn't private (Matt Arsenault, 2014-07-17; 1 file, -1/+1)
  Skip calling GetUnderlyingObject in cases where the pointer obviously doesn't come from an alloca. This should only be a compile-time improvement.
  llvm-svn: 213229
* Fix Typo (first commit to test commit access) (Suyog Sarda, 2014-07-17; 1 file, -1/+1)
  llvm-svn: 213228
* MC: make WinEH opcode an opaque value (Saleem Abdulrasool, 2014-07-17; 2 files, -16/+29)
  This makes the opcode an opaque value (an unsigned int) rather than an enumeration, which permits the use of target-specific operands. Split out the generic type into an MCWinEH header and add a supporting MCWin64EH::Instruction to abstract out the selection of the opcode and the construction of the actual instruction.
  llvm-svn: 213221
* Improve BasicAA CS-CS queries (redux) (Hal Finkel, 2014-07-17; 3 files, -130/+151)
  This reverts "r213024 - Revert r212572 'improve BasicAA CS-CS queries', it causes PR20303" and re-applies the change with a fix for the bug in PR20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems.

  Thanks to Nick Lewycky for reducing the test case for PR20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error.

  Original commit message:

  BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries.

  Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries.

  This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function, getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation. (A conceptual sketch follows this entry.)

  Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement).
  llvm-svn: 213219
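  A conceptual sketch of the getArgLocation refactoring described above. The types and signature here are simplified stand-ins assumed for illustration, not copied from the LLVM sources: the generic AA layer asks one hook for the memory location a call argument touches, and BasicAA overrides it with intrinsic-specific size knowledge, so both query kinds share the same facts:

    #include <cstdint>

    struct Location {
      const void *Ptr;
      uint64_t Size; // UnknownSize when not provable
    };
    static const uint64_t UnknownSize = ~0ULL;

    struct AliasAnalysis {
      virtual ~AliasAnalysis() {}
      // Generic default: the argument's accessed extent is unknown.
      virtual Location getArgLocation(const void *Arg, uint64_t /*Len*/) {
        return Location{Arg, UnknownSize};
      }
    };

    struct BasicAliasAnalysis : AliasAnalysis {
      // Intrinsic-specific knowledge: e.g. memset(dst, v, len) accesses
      // exactly len bytes through dst. Both CallSite-vs-Loc and
      // CallSite-vs-CallSite queries now benefit from this one override.
      Location getArgLocation(const void *Arg, uint64_t Len) override {
        return Location{Arg, Len};
      }
    };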
* Partially revert r210444 due to performance regression (Jingyue Wu, 2014-07-16; 1 file, -57/+1)
  Summary: Converting an outermost zext(a) to sext(a) causes worse code when the computation of zext(a) could be reused. For example, after converting

    ... = array[zext(a)]
    ... = array[zext(a) + 1]

  to

    ... = array[sext(a)]
    ... = array[zext(a) + 1]

  the program computes sext(a), which is actually unnecessary. I added one test in split-gep-and-gvn.ll to illustrate this scenario.

  Also, with r211281 and r211084, we annotate more "nuw" tags on computations involving CUDA intrinsics such as threadIdx.x. These annotations help with splitting GEPs a lot, rendering the benefit we get from this reverted optimization only marginal.

  Test Plan: make check-all
  Reviewers: eliben, meheff
  Reviewed By: meheff
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D4542
  llvm-svn: 213209
* Fixed formatting, removed bug reference, renamed testcase (Sanjay Patel, 2014-07-16; 1 file, -3/+4)
  Thanks to Duncan Exon Smith for reviewing and for the cleanup suggestions.
  llvm-svn: 213205
* [FastISel] Local values shouldn't be alive across an inline asm call with side effects (Juergen Ributzka, 2014-07-16; 1 file, -0/+5)
  This fixes an issue where a local value is defined before, and used after, an inline asm call with side effects. The fix simply flushes the local value map, which updates the insertion point for the inline asm call to be above any previously defined local values.

  This fixes <rdar://problem/17694203>.
  llvm-svn: 213203
* [MCJIT] Improve a RuntimeDyldChecker diagnostic (Lang Hames, 2014-07-16; 1 file, -3/+7)
  When a RuntimeDyldChecker test requests an invalid operand for an instruction, print the decoded instruction to aid diagnosis.
  llvm-svn: 213202
* trivial fix for PR20314 (Sanjay Patel, 2014-07-16; 1 file, -1/+4)
  Make sure that the AddrInst is an Instruction. (An illustrative guard follows this entry.)
  llvm-svn: 213197
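  An illustrative guard of the kind the message describes; this is an assumed shape, not the actual patch. In LLVM, a Value must be checked with dyn_cast before Instruction-only APIs can be used on it:

    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/Value.h"

    static bool isInstructionWithParent(llvm::Value *AddrInst) {
      // dyn_cast returns nullptr when AddrInst is not an Instruction
      // (e.g. a Constant or an Argument), avoiding an invalid access.
      if (auto *I = llvm::dyn_cast<llvm::Instruction>(AddrInst))
        return I->getParent() != nullptr;
      return false;
    }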
* Remove Atom references in description (Sanjay Patel, 2014-07-16; 1 file, -4/+3)
  Any CPU can run this pass.
  llvm-svn: 213190
* Utilize CastInst::CreatePointerBitCastOrAddrSpaceCast here (Manuel Jacob, 2014-07-16; 1 file, -9/+6)
  (A usage sketch follows this entry.)
  llvm-svn: 213189
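  A usage sketch for the helper named in this entry; the wrapper function and variable names are made up for illustration. CreatePointerBitCastOrAddrSpaceCast chooses between a bitcast and an addrspacecast based on the source and destination address spaces, replacing hand-rolled selection logic:

    #include "llvm/IR/InstrTypes.h"

    llvm::Value *castPointer(llvm::Value *V, llvm::Type *DestTy,
                             llvm::Instruction *InsertBefore) {
      if (V->getType() == DestTy)
        return V; // nothing to do
      // Emits either a bitcast or an addrspacecast, as appropriate.
      return llvm::CastInst::CreatePointerBitCastOrAddrSpaceCast(
          V, DestTy, "", InsertBefore);
    }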
* [RegisterCoalescer] Moving the RegisterCoalescer subtarget hook onto the TargetRegisterInfo instead of the TargetSubtargetInfo (Chris Bieneman, 2014-07-16; 5 files, -66/+68)
  llvm-svn: 213188
* [NVPTX] Honor alignment on vector loads/stores (Justin Holewinski, 2014-07-16; 1 file, -5/+31)
  We were not considering the stated alignment on vector loads/stores, leading us to generate vector instructions even when we do not have sufficient alignment. Now, for IR like:

    %1 = load <4 x float>, <4 x float>* %ptr, align 4

  we will generate correct, conservative PTX like:

    ld.f32 ... [%ptr]
    ld.f32 ... [%ptr+4]
    ld.f32 ... [%ptr+8]
    ld.f32 ... [%ptr+12]

  Or, if we have an alignment of 8 (for example), we can generate code like:

    ld.v2.f32 ... [%ptr]
    ld.v2.f32 ... [%ptr+8]
  llvm-svn: 213186
* Remove unnecessary/redundant std::move (David Blaikie, 2014-07-16; 1 file, -1/+1)
  (run returns a unique_ptr by value already.)
  llvm-svn: 213174
* Added documentation for SizeMultiplier in the ARM subtarget hook for register coalescing (Chris Bieneman, 2014-07-16; 1 file, -2/+11)
  Also fixed some 80-column violations. No functional code changes.
  llvm-svn: 213169
* [NVPTX] Rename registers %fl -> %fd and %rl -> %rd (Justin Holewinski, 2014-07-16; 4 files, -8/+8)
  This matches the internal behavior of NVIDIA tools like libnvvm.
  llvm-svn: 213168