bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.	Jim Grosbach	2014-07-23	2	-6/+6
\| \| \| \| \| \| \| \|	The transform to constant fold unary operations with an AND across a vector comparison applies when the constant is not a splat of a scalar as well. llvm-svn: 213800
*	X86: restrict combine to when type sizes are safe.	Jim Grosbach	2014-07-23	2	-6/+10
\| \| \| \| \| \| \| \|	The folding of unary operations through a vector compare and mask operation is only safe if the unary operation result is of the same size as its input. For example, it's not safe for [su]itofp from v4i32 to v4f64. llvm-svn: 213799
*	DAG: fp->int conversion for non-splat constants.	Jim Grosbach	2014-07-23	1	-12/+11
\| \| \| \| \| \| \| \| \| \|	Constant fold the lanes of the input constant build_vector individually so we correctly handle when the vector elements are not all the same constant value. PR20394 llvm-svn: 213798
*	[NVPTX] Silence a GCC warning found by the buildbots	Justin Holewinski	2014-07-23	1	-1/+1
\| \| \| \| \| \| \|	The cast to NVPTXTargetLowering was missing a 'const', but let's just access the right pointer through the subtarget anyway. llvm-svn: 213793
*	Do not add unroll disable metadata after unrolling pass for loops with ↵	Mark Heffernan	2014-07-23	1	-3/+4
\| \| \| \| \| \|	#pragma clang loop unroll(full). llvm-svn: 213789
*	[FastISel][AArch64] Fix return type in FastLowerCall.	Juergen Ributzka	2014-07-23	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	I used the wrong method to obtain the return type inside FinishCall. This fix simply uses the return type from FastLowerCall, which we already determined to be a valid type. Reduced test case from Chad. Thanks. llvm-svn: 213788
*	[NVPTX] mul.wide generation works for any smaller integer source types, not ↵	Justin Holewinski	2014-07-23	1	-2/+2
\| \| \| \| \| \|	just the next smaller power of two llvm-svn: 213784
*	AsmParser: remove deprecated LLIR support	Saleem Abdulrasool	2014-07-23	3	-19/+0
\| \| \| \| \| \| \|	linker_private and linker_private_weak were deprecated in 3.5. Remove support for them now that the 3.5 branch has been created. llvm-svn: 213777
*	ExecutionEngine: remove a stray semicolon	Saleem Abdulrasool	2014-07-23	1	-1/+1
\| \| \| \| \| \|	Detected via GCC 4.8 [-Wpedantic]. llvm-svn: 213776
*	[NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations ↵	Justin Holewinski	2014-07-23	1	-2/+1
\| \| \| \| \| \| \| \| \| \|	are disabled. With optimizations disabled, we disable the isel patterns for mul.wide, but we were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE ISD nodes in DAGCombine if the optimization level is not zero. llvm-svn: 213773
*	In unroll pragma syntax and loop hint metadata, change "enable" forms to a ↵	Mark Heffernan	2014-07-23	1	-42/+34
\| \| \| \| \| \|	new form using the string "full". llvm-svn: 213772
*	[AArch64] Lower sdiv x, pow2 using add + select + shift.	Chad Rosier	2014-07-23	3	-3/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The target-independent DAGcombiner will generate:

		asr w1, X, #31           w1 = splat sign bit.
		add X, X, w1, lsr #28    X = X + 0 or pow2-1
		asr w0, X, asr #4        w0 = X/pow2

	However, the add + shifts is expensive, so generate:

		add w0, X, 15            w0 = X + pow2-1
		cmp X, wzr               X - 0
		csel X, w0, X, lt        X = (X < 0) ? X + pow2-1 : X;
		asr w0, X, asr 4         w0 = X/pow2

	llvm-svn: 213758
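The add + select + shift sequence described in this commit can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not the AArch64 lowering code; the helper names `to_signed32`, `sdiv_pow2` and `trunc_div` are made up for illustration, and 32-bit wraparound is simulated by hand because Python integers are unbounded.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sdiv_pow2(x, k):
    """Divide the signed 32-bit x by 2**k, rounding toward zero."""
    biased = to_signed32(x + (1 << k) - 1)  # add:  X + (pow2 - 1)
    chosen = biased if x < 0 else x         # csel: keep the bias only when X < 0
    return chosen >> k                      # asr:  arithmetic shift right by k


def trunc_div(x, d):
    """Reference result: division that truncates toward zero, as sdiv does."""
    q = abs(x) // d
    return -q if x < 0 else q
```

The bias matters only for negative inputs: a plain arithmetic shift rounds toward negative infinity, while sdiv rounds toward zero, so `pow2 - 1` is added before shifting whenever `X < 0`.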
*	[SKX] Enabling mask instructions: encoding, lowering	Robert Khasanov	2014-07-23	3	-24/+111
\| \| \| \| \| \| \| \|	KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 213757
*	ARM: spot SBFX-compatible code expressed with sign_extend_inreg	Tim Northover	2014-07-23	1	-0/+20
\| \| \| \| \| \| \| \|	We were assuming all SBFX-like operations would have the shl/asr form, but often when the field being extracted is an i8 or i16, we end up with a SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for though. llvm-svn: 213754
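The two forms mentioned in this commit compute the same signed bitfield extract. The Python sketch below models both on 32-bit values. It is not taken from the ARM backend: the function names are hypothetical, and the shift feeding the sign extension is assumed to be a logical right shift.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sbfx_shl_asr(x, lsb, width):
    """shl/asr form: move the field to the top, then shift back arithmetically."""
    shifted = (x << (32 - lsb - width)) & MASK32
    return to_signed(shifted, 32) >> (32 - width)


def sbfx_sext_inreg(x, lsb, width):
    """sign_extend_inreg form: shift the field down, then sign-extend it in place."""
    return to_signed((x & MASK32) >> lsb, width)
```

For example, extracting the 8-bit field at bit 8 of `0x0000F300` yields -13 with either function.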
*	ARM: add patterns for [su]xta[bh] from just a shift.	Tim Northover	2014-07-23	2	-0/+20
\| \| \| \| \| \| \| \|	Although the final shifter operand is a rotate, this actually only matters for the half-word extends when the amount == 24. Otherwise folding a shift in is just as good. llvm-svn: 213753
*	Enable partial libcall inlining for all targets by default.	James Molloy	2014-07-23	3	-2/+5
\| \| \| \| \| \| \| \|	This pass attempts to speculatively use a sqrt instruction if one exists on the target, falling back to a libcall if the target instruction returned NaN. This was enabled for MIPS and System-Z, but is well guarded and is good for most targets - GCC does this for (that I've checked) X86, ARM and AArch64. llvm-svn: 213752
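The speculate-then-fall-back shape described here can be sketched as follows. This is a model of the control flow only: `hw_sqrt` and `libcall_sqrt` are hypothetical stand-ins for the target instruction and the libm call, and the `calls` list exists purely so the fallback is observable.

```python
import math


def hw_sqrt(x):
    """Stand-in for a target sqrt instruction: NaN for inputs it cannot handle."""
    return math.sqrt(x) if x >= 0 else math.nan


def libcall_sqrt(x, calls):
    """Stand-in for the libm sqrt call, which would also set errno."""
    calls.append(x)
    return math.nan


def sqrt_with_fallback(x, calls):
    """Use the fast instruction first; take the libcall only if it returned NaN."""
    result = hw_sqrt(x)
    if result != result:  # NaN is the only value that is not equal to itself
        result = libcall_sqrt(x, calls)
    return result
```

With this structure the common case (a non-negative input) never reaches the library call.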
*	[ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRB ↵	Tilmann Scheller	2014-07-23	1	-1/+5
\| \| \| \| \| \| \| \|	instructions. The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior. llvm-svn: 213750
*	AArch64: remove "arm64_be" support in favour of "aarch64_be".	Tim Northover	2014-07-23	8	-49/+17
\| \| \| \| \| \| \| \| \|	There really is no arm64_be: it was a useful fiction to test big-endian support while both backends existed in parallel, but now the only platform that uses the name (iOS) doesn't have a big-endian variant, let alone one called "arm64_be". llvm-svn: 213748
*	[ARM] Make the assembler reject unpredictable pre/post-indexed ARM STR ↵	Tilmann Scheller	2014-07-23	1	-0/+13
\| \| \| \| \| \| \| \|	instructions. The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior. llvm-svn: 213745
*	AArch64: remove arm64 triple enumerator.	Tim Northover	2014-07-23	11	-45/+24
\| \| \| \| \| \| \| \| \| \| \| \|	Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and invites bugs where only one is checked. In reality, the only legitimate difference between the two (arm64 usually means iOS) is also present in the OS part of the triple and that's what should be checked. We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so there aren't any LLVM-side test changes. llvm-svn: 213743
*	Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub ↵	Andrea Di Biagio	2014-07-23	1	-43/+0
\| \| \| \| \| \| \| \| \| \| \| \| \|	instructions". This change fully reverts r211771. That revision added a canonicalization rule which has the potential to cause a combine-cycle in the target-independent canonicalizing DAG combine. The plan is to move the logic that forms target specific addsub nodes as part of the lowering of shuffles. llvm-svn: 213736
*	[ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions.	Tilmann Scheller	2014-07-23	1	-2/+4
\| \| \| \| \| \| \| \|	The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted. The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined, however it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically and then the constraint is definitely needed. llvm-svn: 213729
*	[SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate	Chandler Carruth	2014-07-23	1	-52/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	insertions.

	The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist, where we null-out removed entries and only add new entries to the worklist. My measurements of codegen time show a slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself I suspect).

	This change results in subtle, frustrating churn in the particular order in which DAG combines are applied, which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better.

	A major change required to make this work is to significantly harden the way in which the DAG combiner handles nodes which become dead (zero-uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner.

	I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor. The consensus in review is to fix any regressions that show up after the fact here.

	Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit.

	Differential Revision: http://reviews.llvm.org/D4616 llvm-svn: 213727
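The worklist behaviour described in this commit (ignore duplicate insertions, null out removed entries rather than erasing them) can be sketched as below. This is a simplified model and not the DAGCombiner code; the `Worklist` class and its method names are assumptions made for illustration.

```python
class Worklist:
    """LIFO worklist that never holds the same node twice."""

    def __init__(self):
        self._items = []  # pending nodes; removed slots hold None
        self._index = {}  # node -> position of that node in _items

    def add(self, node):
        if node in self._index:  # already queued, so the list does not grow
            return
        self._index[node] = len(self._items)
        self._items.append(node)

    def remove(self, node):
        pos = self._index.pop(node, None)
        if pos is not None:
            self._items[pos] = None  # null out the slot instead of shifting entries

    def pop(self):
        while self._items:
            node = self._items.pop()
            if node is not None:  # skip slots that were nulled out by remove()
                del self._index[node]
                return node
        return None

    def size(self):
        return len(self._index)
```

Because `add` consults the index first, re-adding a node that is already pending costs one dictionary lookup and no extra memory.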
*	We may visit a call that uses an alloca multiple times in ↵	Nick Lewycky	2014-07-23	1	-5/+3
\| \| \| \| \| \|	callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405! llvm-svn: 213726
*	RuntimeDyldMachOAArch64.h: Fix a warning. [-Wunused-variable]	NAKAMURA Takumi	2014-07-23	1	-0/+1
\| \| \| \|	llvm-svn: 213710
*	[MCJIT] Make stub_addr functionality in RuntimeDyldChecker work in release mode.	Lang Hames	2014-07-22	1	-2/+0
\| \| \| \| \| \| \| \| \|	There's no reason to restrict this particular piece of RuntimeDyldChecker functionality to +Asserts builds. This should fix failures in MachO_x86-64_PIC_relocations.s on release bots. llvm-svn: 213708
*	[MCJIT] Teach RuntimeDyldChecker to handle underscores at the start of symbols.	Lang Hames	2014-07-22	1	-1/+1
\| \| \| \| \| \| \| \|	RuntimeDyldChecker had been testing isalpha(Expr[0]) to recognise symbol tokens, and throwing unrecognized token errors when it hit symbols with leading underscores. This fixes that. llvm-svn: 213706
*	[FastIsel][AArch64] Add support for the FastLowerCall and ↵	Juergen Ributzka	2014-07-22	1	-136/+81
\| \| \| \| \| \| \| \| \| \| \| \| \|	FastLowerIntrinsicCall target-hooks. This commit modifies the existing call lowering functions to be used as the FastLowerCall and FastLowerIntrinsicCall target-hooks instead. This enables patchpoint intrinsic lowering for AArch64. This fixes <rdar://problem/17733076> llvm-svn: 213704
*	[MCJIT] Improve stub_addr file-not-found diagnostic to help track down a	Lang Hames	2014-07-22	1	-2/+17
\| \| \| \| \| \|	buildbot failure. llvm-svn: 213701
*	[MCJIT] Refactor and add stub inspection to the RuntimeDyldChecker framework.	Lang Hames	2014-07-22	5	-548/+777
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces a 'stub_addr' builtin that can be used to find the address of the stub for a given (<file>, <section>, <symbol>) tuple. This address can be used both to verify the contents of stubs (by loading from the returned address) and to verify references to stubs (by comparing against the returned address).

	Example (1) - Verifying stub contents:

	Load 8 bytes (assuming a 64-bit target) from the stub for 'x' in the __text section of f.o, and compare that value against the address of 'x'.

		# rtdyld-check: *{8}(stub_addr(f.o, __text, x)) = x

	Example (2) - Verifying references to stubs:

	Decode the immediate of the instruction at label 'l', and verify that it's equal to the offset from the next instruction's PC to the stub for 'y' in the __text section of f.o (i.e. it's the correct PC-rel difference).

		# rtdyld-check: decode_operand(l, 4) = stub_addr(f.o, __text, y) - next_pc(l)
		l:
		    movq y@GOTPCREL(%rip), %rax

	Since stub inspection requires cooperation with RuntimeDyldImpl, this patch pimpl-ifies RuntimeDyldChecker. Its implementation is moved into a new class, RuntimeDyldCheckerImpl, that has access to the definition of RuntimeDyldImpl.

	llvm-svn: 213698
*	Appease the buildbots.	Juergen Ributzka	2014-07-22	1	-0/+1
\| \| \| \|	llvm-svn: 213694
*	[RuntimeDyld][MachO][AArch64] Add a helper function for encoding addends in ↵	Juergen Ributzka	2014-07-22	1	-75/+110
\| \| \| \| \| \| \| \| \| \| \| \| \|	instructions. Factor out the addend encoding into a helper function and simplify the processRelocationRef. Also add a few simple rtdyld tests. More tests to come once GOTs can be tested too. Related to <rdar://problem/17768539> llvm-svn: 213689
*	[RuntimeDyld][MachO][AArch64] Implement the decodeAddend method.	Juergen Ributzka	2014-07-22	1	-0/+92
\| \| \| \| \| \| \|	This adds the required functionality to decode the immediate encoded in an instruction that is referenced in a relocation entry. llvm-svn: 213688
*	[RuntimeDyld][MachO][AArch64] Add assertion to check for duplicate addend ↵	Juergen Ributzka	2014-07-22	1	-4/+4
\| \| \| \| \| \| \| \| \| \|	definition. In MachO for AArch64 it is possible to have an explicit addend defined by the ARM64_RELOC_ADDEND relocation or an addend encoded within the instruction. Only one of them is allowed per relocation. llvm-svn: 213687
*	[RuntimeDyld] Change the return type of decodeAddend to match the storage type.	Juergen Ributzka	2014-07-22	2	-6/+6
\| \| \| \|	llvm-svn: 213686
*	This patch implements optimization as mentioned in PR19753: Optimize ↵	Suyog Sarda	2014-07-22	2	-0/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	comparisons with "ashr/lshr exact" of a constant. It handles the errors which were seen in PR19958, where wrong code was being emitted due to an earlier patch. Added code for lshr as well as non-exact right shifts. It implements: (icmp eq/ne (ashr/lshr const2, A), const1) -> (icmp eq/ne A, Log2(const2/const1)) -> (icmp eq/ne A, Log2(const2) - Log2(const1)) Differential Revision: http://reviews.llvm.org/D4068 llvm-svn: 213678
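A quick way to convince oneself of the fold is to check it exhaustively on a small bit width. The sketch below covers only the simplest case, lshr with power-of-two constants; the commit itself also handles ashr and other constants, which this check does not model. `WIDTH`, `log2_exact` and `fold_holds` are names chosen for illustration.

```python
WIDTH = 8  # small bit width so every shift amount can be enumerated


def log2_exact(c):
    """Log2 of a positive power of two."""
    assert c > 0 and c & (c - 1) == 0
    return c.bit_length() - 1


def fold_holds(const2, const1):
    """Check (lshr const2, A) == const1  <=>  A == Log2(const2) - Log2(const1)."""
    expected = log2_exact(const2) - log2_exact(const1)
    return all(
        ((const2 >> a) == const1) == (a == expected)
        for a in range(WIDTH)
    )
```

When `const1` is larger than `const2` the expected shift amount is negative, so neither side of the equivalence can hold for any valid `A`, and the check still passes.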
*	Added InstCombine transform for pattern "(A & B) ^ (A ^ B) -> (A \| B)"	Suyog Sarda	2014-07-22	1	-0/+8
\| \| \| \| \| \| \| \|	Patch idea by Ankit Jain ! Differential Revision: http://reviews.llvm.org/D4618 llvm-svn: 213677
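The identity behind this transform is bitwise, so checking every pair of small values is enough to confirm it. A minimal sketch, with a function name chosen for illustration:

```python
def and_xor_xor_is_or(bits=4):
    """Check (A & B) ^ (A ^ B) == (A | B) for all values of the given width."""
    limit = 1 << bits
    return all(
        ((a & b) ^ (a ^ b)) == (a | b)
        for a in range(limit)
        for b in range(limit)
    )
```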
*	Added InstCombine Transform for patterns:	Suyog Sarda	2014-07-22	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	"((~A & B) \| A) -> (A \| B)" and "((A & B) \| ~A) -> (~A \| B)" Original Patch credit to Ankit Jain !! Differential Revision: http://reviews.llvm.org/D4591 llvm-svn: 213676
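Both patterns involve a NOT, so a fixed-width model needs an explicit mask. The sketch below checks them on 4-bit values; `BITS`, `MASK`, `inv` and the two checker functions are illustrative names, not anything from InstCombine.

```python
BITS = 4
MASK = (1 << BITS) - 1


def inv(a):
    """Bitwise NOT restricted to BITS bits."""
    return ~a & MASK


def first_pattern_holds():
    """((~A & B) | A) == (A | B)"""
    return all(((inv(a) & b) | a) == (a | b)
               for a in range(MASK + 1) for b in range(MASK + 1))


def second_pattern_holds():
    """((A & B) | ~A) == (~A | B)"""
    return all(((a & b) | inv(a)) == (inv(a) | b)
               for a in range(MASK + 1) for b in range(MASK + 1))
```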
*	[ASan] Fix comments about __sanitizer_cov function	Alexey Samsonov	2014-07-22	1	-3/+2
\| \| \| \|	llvm-svn: 213673
*	Make use of the align parameter attribute for all pointer arguments	Hal Finkel	2014-07-22	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We previously supported the align attribute on all (pointer) parameters, but we only used it for byval parameters. However, it is completely consistent at the IR level to treat 'align n' on all pointer parameters as an alignment assumption on the pointer, and now we will. Specifically, this causes computeKnownBits to use the align attribute on all pointer parameters, not just byval parameters. I've also added an explicit parameter attribute test for this to test/Bitcode/attributes.ll. And I've updated the LangRef to document the align parameter attribute (as it turns out, it was not documented at all previously, although the byval documentation mentioned that it could be used).

	There are (at least) two benefits to doing this:
	- It allows enhancing alignment based on the pointer alignment after inlining callees.
	- It allows simplification of pointer arithmetic.

	llvm-svn: 213670
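The second benefit follows from what an alignment assumption says about the low address bits. The Python sketch below models that reasoning with plain integers standing in for addresses. It is not computeKnownBits, and both function names are hypothetical.

```python
def known_zero_low_bits(align):
    """Number of low address bits known to be zero for a power-of-two alignment."""
    assert align > 0 and align & (align - 1) == 0
    return align.bit_length() - 1


def misalignment(ptr, offset, align):
    """Low bits of (ptr + offset) relative to the given alignment."""
    return (ptr + offset) & (align - 1)
```

If `ptr` is known to be a multiple of `align`, then `misalignment(ptr, offset, align)` depends only on `offset`, so an expression of that form can be folded to `offset & (align - 1)` without knowing the pointer value.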
*	X86: drop relocations on __eh_frame sections globally.	Tim Northover	2014-07-22	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without this, we produce non-extern relocations when targeting older OS X versions that ld64 can't cope with in the particular context of __eh_frame sections (who'd want generic relocation-processing anyway?). This means that an updated linker (ld64 from Xcode 3.2.6 or later) may be needed when targeting such platforms with a modern version of LLVM, but this is probably the case anyway and a reasonable requirement. PR20212, rdar://problem/17544795 llvm-svn: 213665
*	This patch implements transform for pattern "(A \| B) ^ (~A) -> (A \| ~B)".	Suyog Sarda	2014-07-22	1	-0/+6
\| \| \| \| \| \| \| \|	Patch Credit to Ankit Jain !! Differential Revision: http://reviews.llvm.org/D4588 llvm-svn: 213662
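Since the pattern acts on each bit independently, a four-row truth table over single bits is enough to confirm it. A small sketch, with an illustrative function name:

```python
def or_xor_not_table():
    """Rows of (A, B, lhs, rhs) for (A | B) ^ (~A) and (A | ~B) on single bits."""
    rows = []
    for a in (0, 1):
        for b in (0, 1):
            lhs = (a | b) ^ (a ^ 1)  # on one bit, NOT is XOR with 1
            rhs = a | (b ^ 1)
            rows.append((a, b, lhs, rhs))
    return rows
```

Every row has `lhs == rhs`, which is the equivalence the transform relies on.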
*	[mips] Fix two patterns that select i32's (for MIPS32r6) / i64's (for MIPS64r6)	Sasa Stankovic	2014-07-22	2	-4/+4
\| \| \| \| \| \| \| \| \| \|	from setne comparison with an i32. The patterns that are fixed: * (select (i32 (setne i32, immZExt16)), i32, i32) (for MIPS32r6) * (select (i32 (setne i32, immZExt16)), i64, i64) (for MIPS64r6) llvm-svn: 213653
*	AVX-512: Fixed intrinsic of VSQRTPS/PD instructions.	Elena Demikhovsky	2014-07-22	1	-23/+7
\| \| \| \| \| \|	I set number and types of parameters according to GCC intrinsics. llvm-svn: 213640
*	fixed typo in comment	Sanjay Patel	2014-07-22	1	-1/+1
\| \| \| \|	llvm-svn: 213614
*	[SDAG] Refactor the code for inserting a newly allocated SDNode into the	Chandler Carruth	2014-07-22	1	-96/+86
\| \| \| \| \| \| \| \| \| \| \|	DAG into a helper function. This adds a trip through the (very minimal) verification logic in a bunch of places that were missing it, but shouldn't have any other impact outside of refactoring. I'm hoping to use this to do more clever things when DAG nodes are inserted into the graph. llvm-svn: 213612
*	[SDAG] Remove a giant pile of asserts that may have helped track down	Chandler Carruth	2014-07-22	1	-40/+3
\| \| \| \| \| \| \| \| \| \| \|	a bug in 2010 when they were added but are adding no value today. In fact, they are utter lies. NodeAllocator is used to allocate almost all of these node types. I don't know what we were trying to assert here, and the docs don't give any answer. Until we once again stumble upon a bug needing help, let's clear the path for improvements. llvm-svn: 213610
*	Rename metadata llvm.loop.vectorize.unroll to llvm.loop.vectorize.interleave.	Mark Heffernan	2014-07-21	2	-14/+18
\| \| \| \|	llvm-svn: 213588
*	Match semantics of PointerMayBeCapturedBefore to its name by default	Hal Finkel	2014-07-21	2	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As it turns out, the capture tracker named CaptureBefore used by AA, and now available via the PointerMayBeCapturedBefore function, would have been more-aptly named CapturedBeforeOrAt, because it considers captures at the instruction provided. This is not always what one wants, and it is difficult to get the strictly-before behavior given only the current interface. This adds an additional parameter which controls whether or not you want to include captures at the provided instruction. The default is not to include the instruction provided, so that 'Before' matches its name. No functionality change intended. llvm-svn: 213582
*	Revert "Recommit r212203: Don't try to construct debug LexicalScopes ↵	David Blaikie	2014-07-21	4	-33/+4
\| \| \| \| \| \| \| \|	hierarchy for functions that do not have top level debug information." This reverts commit r212649 while I investigate/reduce/etc PR20367. llvm-svn: 213581