path: root/llvm/lib
* R600: Implement enableClusterLoads()
  Matt Arsenault, 2014-07-24 (2 files, -0/+7)
  llvm-svn: 213831

* [AArch64] Fix a bug generating an incorrect instruction when building a small vector.
  Kevin Qin, 2014-07-24 (1 file, -38/+63)
  This bug was introduced by r211144. The elements of the operand may be
  smaller than the elements of the result, but the previous commit could
  only handle the opposite case. This commit handles this scenario as well
  and generates optimized code such as ZIP1.
  llvm-svn: 213830

* [AArch64] Disable some optimization cases for type conversion from sint to fp.
  Jiangning Liu, 2014-07-24 (1 file, -3/+4)
  Those optimization cases are micro-architecture dependent and only make
  sense for Cyclone. A new predicate, Cyclone, is introduced in the .td file.
  llvm-svn: 213827

* Fixed PR20411 - bug in getINSERTPS()
  Filipe Cabecinhas, 2014-07-24 (1 file, -0/+14)
  When we had a vector_shuffle with an input from each vector, we could
  miscompile it because we assumed the input from V2 would not be moved
  from its original position in the vector. Added a test case.
  llvm-svn: 213826

* SimplifyCFG: fix a bug in switch-to-table conversion
  Manman Ren, 2014-07-23 (1 file, -4/+13)
  We use a gep to access the global array "switch.table", and the table
  index should be treated as unsigned. When the highest bit is 1, this
  commit zero-extends the index to an integer type of larger size.

  For a switch on i2, we used to generate:
    %switch.tableidx = sub i2 %0, -2
    getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
  This is incorrect when %switch.tableidx is 2 or 3. The fix is to generate:
    %switch.tableidx = sub i2 %0, -2
    %switch.tableidx.zext = zext i2 %switch.tableidx to i3
    getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext

  rdar://17735071
  llvm-svn: 213815

* Fix the build when building with only the ARM backend.
  Rafael Espindola, 2014-07-23 (1 file, -1/+1)
  llvm-svn: 213814

* Fix indenting.
  Eric Christopher, 2014-07-23 (1 file, -13/+14)
  llvm-svn: 213811

* Reorganize and simplify local variables.
  Eric Christopher, 2014-07-23 (1 file, -13/+11)
  llvm-svn: 213809

* Finish inverting the MC -> Object dependency.
  Rafael Espindola, 2014-07-23 (8 files, -9/+9)
  There were still some disassembler bits in lib/MC, but their use of
  Object was only visible in the includes they used, not in the symbols.
  llvm-svn: 213808

* Remove the query for TargetMachine and TargetInstrInfo since we're already inside TargetInstrInfo.
  Eric Christopher, 2014-07-23 (1 file, -3/+1)
  llvm-svn: 213806

* ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion.
  David Blaikie, 2014-07-23 (1 file, -3/+7)
  While the subprogram map cache used by Dead Argument Elimination works
  there, I made a mistake when reusing it for Argument Promotion in
  r212128: ArgPromo may transform functions more than once, whereas DAE
  transforms each function only once, removing all the dead arguments in
  one go.

  To address this, ensure that the map is updated after each argument
  promotion.

  In retrospect it might be a little wasteful to create a map of all
  subprograms when only handling a single CGSCC, but the alternative is
  walking the debug info for each function in the CGSCC that gets updated.
  It's not clear to me what the right tradeoff is there, but since the
  current tradeoff seems to be working OK (and the code to keep things
  updated is very cheap), let's stick with that for now.
  llvm-svn: 213805

* [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants.
  Jim Grosbach, 2014-07-23 (2 files, -6/+6)
  The transform to constant fold unary operations with an AND across a
  vector comparison also applies when the constant is not a splat of a
  scalar.
  llvm-svn: 213800

* X86: restrict combine to when type sizes are safe.
  Jim Grosbach, 2014-07-23 (2 files, -6/+10)
  The folding of unary operations through a vector compare and mask
  operation is only safe if the unary operation result is of the same size
  as its input. For example, it's not safe for [su]itofp from v4i32 to
  v4f64.
  llvm-svn: 213799

* DAG: fp->int conversion for non-splat constants.
  Jim Grosbach, 2014-07-23 (1 file, -12/+11)
  Constant fold the lanes of the input constant build_vector individually
  so we correctly handle the case where the vector elements are not all
  the same constant value. PR20394
  llvm-svn: 213798

* [NVPTX] Silence a GCC warning found by the buildbots
  Justin Holewinski, 2014-07-23 (1 file, -1/+1)
  The cast to NVPTXTargetLowering was missing a 'const', but let's just
  access the right pointer through the subtarget anyway.
  llvm-svn: 213793

* Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full).
  Mark Heffernan, 2014-07-23 (1 file, -3/+4)
  llvm-svn: 213789

* [FastISel][AArch64] Fix return type in FastLowerCall.
  Juergen Ributzka, 2014-07-23 (1 file, -4/+4)
  I used the wrong method to obtain the return type inside FinishCall.
  This fix simply uses the return type from FastLowerCall, which we
  already determined to be a valid type.

  Reduced test case from Chad. Thanks.
  llvm-svn: 213788

* [NVPTX] mul.wide generation works for any smaller integer source type, not just the next smaller power of two.
  Justin Holewinski, 2014-07-23 (1 file, -2/+2)
  llvm-svn: 213784

* AsmParser: remove deprecated LLIR support
  Saleem Abdulrasool, 2014-07-23 (3 files, -19/+0)
  linker_private and linker_private_weak were deprecated in 3.5. Remove
  support for them now that the 3.5 branch has been created.
  llvm-svn: 213777

* ExecutionEngine: remove a stray semicolon
  Saleem Abdulrasool, 2014-07-23 (1 file, -1/+1)
  Detected via GCC 4.8 [-Wpedantic].
  llvm-svn: 213776

* [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled.
  Justin Holewinski, 2014-07-23 (1 file, -2/+1)
  With optimizations disabled, we disable the isel patterns for mul.wide,
  but we were still generating MULWIDE ISD nodes. Now we only try to
  generate MULWIDE ISD nodes in DAGCombine if the optimization level is
  not zero.
  llvm-svn: 213773

* In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full".
  Mark Heffernan, 2014-07-23 (1 file, -42/+34)
  llvm-svn: 213772

* [AArch64] Lower sdiv x, pow2 using add + select + shift.
  Chad Rosier, 2014-07-23 (3 files, -3/+74)
  The target-independent DAGCombiner will generate:
    asr w1, X, #31           ; w1 = splat sign bit
    add X, X, w1, lsr #28    ; X = X + (0 or pow2-1)
    asr w0, X, #4            ; w0 = X/pow2
  However, the add + shifts are expensive, so instead generate:
    add w0, X, 15            ; w0 = X + pow2-1
    cmp X, wzr               ; X - 0
    csel X, w0, X, lt        ; X = (X < 0) ? X + pow2-1 : X
    asr w0, X, #4            ; w0 = X/pow2
  llvm-svn: 213758

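  A C-level sketch of the selected sequence for pow2 == 16 (the function
  and variable names are illustrative, not the backend's; it assumes
  arithmetic right shift on signed int, as AArch64 provides):

    // Signed division truncates toward zero, but an arithmetic shift
    // rounds toward negative infinity, so negative inputs need a +15 bias.
    int sdivByPow2(int x) {
      int biased = x + 15;              // add w0, X, #15
      int src = (x < 0) ? biased : x;   // cmp X, wzr ; csel X, w0, X, lt
      return src >> 4;                  // asr w0, X, #4
    }
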
* [SKX] Enabling mask instructions: encoding, lowering
  Robert Khasanov, 2014-07-23 (3 files, -24/+111)
  KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ

  Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
  llvm-svn: 213757

* ARM: spot SBFX-compatible code expressed with sign_extend_inreg
  Tim Northover, 2014-07-23 (1 file, -0/+20)
  We were assuming all SBFX-like operations would have the shl/asr form,
  but often when the field being extracted is an i8 or i16, we end up
  with a SIGN_EXTEND_INREG acting on a shift instead. It is simple enough
  to check for, though.
  llvm-svn: 213754

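  Two C++ forms of the same 8-bit signed field extract, both of which
  should now match a single SBFX (a sketch under my own naming; the i8
  width and the cast idioms are chosen for illustration):

    #include <cstdint>

    // shl/asr form: shift the field to the top, then arithmetic-shift it
    // back down, sign-extending it.
    int32_t extractShlAsr(int32_t x, unsigned lsb) {
      return static_cast<int32_t>(static_cast<uint32_t>(x) << (24 - lsb)) >> 24;
    }

    // SIGN_EXTEND_INREG-on-a-shift form: shift the field down, then
    // sign-extend the low byte, which is what the node models.
    int32_t extractSextInreg(int32_t x, unsigned lsb) {
      return static_cast<int8_t>(static_cast<uint32_t>(x) >> lsb);
    }
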
* ARM: add patterns for [su]xta[bh] from just a shift.
  Tim Northover, 2014-07-23 (2 files, -0/+20)
  Although the final shifter operand is a rotate, this actually only
  matters for the half-word extends when the amount == 24. Otherwise
  folding a shift in is just as good.
  llvm-svn: 213753

* Enable partial libcall inlining for all targets by default.
  James Molloy, 2014-07-23 (3 files, -2/+5)
  This pass attempts to speculatively use a sqrt instruction if one exists
  on the target, falling back to a libcall if the target instruction
  returned NaN. This was enabled for MIPS and SystemZ, but it is well
  guarded and is good for most targets - GCC does this for X86, ARM and
  AArch64 (those I've checked).
  llvm-svn: 213752

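  A minimal sketch of the shape of the partially-inlined call (the
  function name is mine, and the pass actually works at the IR level;
  __builtin_sqrt stands in for the target sqrt instruction):

    #include <cmath>

    double speculativeSqrt(double x) {
      double fast = __builtin_sqrt(x); // typically lowers to the hw sqrt
      if (fast != fast)                // NaN != NaN: hw sqrt failed (x < 0)
        return sqrt(x);                // slow libcall path, can set errno
      return fast;
    }
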
* [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRB instructions.
  Tilmann Scheller, 2014-07-23 (1 file, -1/+5)
  The ARM ARM prohibits STRB instructions with writeback into the source
  register. With this commit this constraint is now enforced and we stop
  assembling STRB instructions with unpredictable behavior.
  llvm-svn: 213750

* AArch64: remove "arm64_be" support in favour of "aarch64_be".Tim Northover2014-07-238-49/+17
| | | | | | | | | There really is no arm64_be: it was a useful fiction to test big-endian support while both backends existed in parallel, but now the only platform that uses the name (iOS) doesn't have a big-endian variant, let alone one called "arm64_be". llvm-svn: 213748
* [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STR instructions.
  Tilmann Scheller, 2014-07-23 (1 file, -0/+13)
  The ARM ARM prohibits STR instructions with writeback into the source
  register. With this commit this constraint is now enforced and we stop
  assembling STR instructions with unpredictable behavior.
  llvm-svn: 213745

* AArch64: remove arm64 triple enumerator.
  Tim Northover, 2014-07-23 (11 files, -45/+24)
  Having both Triple::arm64 and Triple::aarch64 is extremely confusing,
  and invites bugs where only one is checked. In reality, the only
  legitimate difference between the two (arm64 usually means iOS) is also
  present in the OS part of the triple, and that's what should be checked.

  We still parse the "arm64" triple, just canonicalise it to
  Triple::aarch64, so there aren't any LLVM-side test changes.
  llvm-svn: 213743

* Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions".
  Andrea Di Biagio, 2014-07-23 (1 file, -43/+0)
  This change fully reverts r211771. That revision added a
  canonicalization rule which has the potential to cause a combine cycle
  in the target-independent canonicalizing DAG combine.

  The plan is to move the logic that forms target-specific addsub nodes
  as part of the lowering of shuffles.
  llvm-svn: 213736

* [ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions.
  Tilmann Scheller, 2014-07-23 (1 file, -2/+4)
  The post-indexed instructions were missing the constraint, causing
  unpredictable STRH instructions to be emitted.

  The earlyclobber constraint on the pre-indexed STR instructions is not
  strictly necessary, as instruction selection for pre-indexed STR
  instructions goes through an additional layer of pseudo instructions
  which have the constraint defined. However, it doesn't hurt to specify
  the constraint directly on the pre-indexed instructions as well, since
  at some point someone might create instances of them programmatically,
  and then the constraint is definitely needed.
  llvm-svn: 213729

* [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions.
  Chandler Carruth, 2014-07-23 (1 file, -52/+71)
  The old behavior could cause arbitrarily bad memory usage in the DAG
  combiner if there was heavy traffic of adding nodes already on the
  worklist to it. This commit switches the DAG combine worklist to work
  the same way as the instcombine worklist, where we null out removed
  entries and only add new entries to the worklist. My measurements of
  codegen time show slight improvement. The memory utilization is
  unsurprisingly dominated by other factors (the IR and DAG itself, I
  suspect).

  This change results in subtle, frustrating churn in the particular order
  in which DAG combines are applied, which causes a number of minor
  regressions where we fail to match a pattern previously matched by
  accident. AFAICT, all of these should be using AddToWorklist directly or
  should be written in a less brittle way. None of the changes seem
  drastically bad, and a few of the changes seem distinctly better.

  A major change required to make this work is to significantly harden the
  way in which the DAG combiner handles nodes which become dead
  (zero-uses). Previously, we relied on the ability to "priority-bump"
  them on the combine worklist to achieve recursive deletion of these
  nodes and ensure that the frontier of remaining live nodes all were
  added to the worklist. Instead, I've introduced a routine to just
  implement that precise logic with no indirection. It is a significantly
  simpler operation than that of the combiner worklist proper. I suspect
  this will also fix some other problems with the combiner.

  I think the x86 changes are really minor and uninteresting, but the
  avx512 change at least is hiding a "regression" (despite the test case
  being just noise, not testing some performance invariant) that might be
  looked into. Not sure if any of the others impact specific "important"
  code paths, but they didn't look terribly interesting to me, or the
  changes were really minor. The consensus in review is to fix any
  regressions that show up after the fact here.

  Thanks to the other reviewers for checking the output on other
  architectures. There is a specific regression on ARM that Tim already
  has a fix prepped to commit.

  Differential Revision: http://reviews.llvm.org/D4616
  llvm-svn: 213727

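  An illustrative sketch of the instcombine-style worklist described
  above (std containers and all names are mine, not the actual DAGCombiner
  code): removal nulls out the slot instead of erasing, so pending entries
  keep stable positions, and the side map rejects duplicate insertions.

    #include <unordered_map>
    #include <vector>

    struct Node;

    class CombineWorklist {
      std::vector<Node *> Worklist;                   // may contain nullptrs
      std::unordered_map<Node *, size_t> WorklistMap; // node -> index
    public:
      void add(Node *N) {
        // Only genuinely new nodes are appended; duplicates are no-ops.
        if (WorklistMap.try_emplace(N, Worklist.size()).second)
          Worklist.push_back(N);
      }
      void remove(Node *N) {
        auto It = WorklistMap.find(N);
        if (It == WorklistMap.end())
          return;
        Worklist[It->second] = nullptr; // null out; never shift entries
        WorklistMap.erase(It);
      }
      Node *pop() {
        while (!Worklist.empty()) {
          Node *N = Worklist.back();
          Worklist.pop_back();
          if (N) { // skip nulled-out (removed) entries
            WorklistMap.erase(N);
            return N;
          }
        }
        return nullptr;
      }
    };
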
* We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false.
  Nick Lewycky, 2014-07-23 (1 file, -5/+3)
  We accidentally skipped work we needed to do in the IsNocapture=false
  case if we were called with IsNocapture=true the first time. Fixes
  PR20405!
  llvm-svn: 213726

* RuntimeDyldMachOAArch64.h: Fix a warning. [-Wunused-variable]
  NAKAMURA Takumi, 2014-07-23 (1 file, -0/+1)
  llvm-svn: 213710

* [MCJIT] Make stub_addr functionality in RuntimeDyldChecker work in release mode.
  Lang Hames, 2014-07-22 (1 file, -2/+0)
  There's no reason to restrict this particular piece of
  RuntimeDyldChecker functionality to +Asserts builds. This should fix
  failures in MachO_x86-64_PIC_relocations.s on release bots.
  llvm-svn: 213708

* [MCJIT] Teach RuntimeDyldChecker to handle underscores at the start of symbols.
  Lang Hames, 2014-07-22 (1 file, -1/+1)
  RuntimeDyldChecker had been testing isalpha(Expr[0]) to recognise symbol
  tokens, and throwing unrecognized token errors when it hit symbols with
  leading underscores. This fixes that.
  llvm-svn: 213706

* [FastIsel][AArch64] Add support for the FastLowerCall and FastLowerIntrinsicCall target-hooks.
  Juergen Ributzka, 2014-07-22 (1 file, -136/+81)
  This commit modifies the existing call lowering functions to be used as
  the FastLowerCall and FastLowerIntrinsicCall target-hooks instead.

  This enables patchpoint intrinsic lowering for AArch64.

  This fixes <rdar://problem/17733076>
  llvm-svn: 213704

* [MCJIT] Improve stub_addr file-not-found diagnostic to help track down a buildbot failure.
  Lang Hames, 2014-07-22 (1 file, -2/+17)
  llvm-svn: 213701

* [MCJIT] Refactor and add stub inspection to the RuntimeDyldChecker framework.
  Lang Hames, 2014-07-22 (5 files, -548/+777)
  This patch introduces a 'stub_addr' builtin that can be used to find the
  address of the stub for a given (<file>, <section>, <symbol>) tuple.
  This address can be used both to verify the contents of stubs (by
  loading from the returned address) and to verify references to stubs (by
  comparing against the returned address).

  Example (1) - Verifying stub contents:

  Load 8 bytes (assuming a 64-bit target) from the stub for 'x' in the
  __text section of f.o, and compare that value against the address of
  'x'.

    # rtdyld-check: *{8}(stub_addr(f.o, __text, x)) = x

  Example (2) - Verifying references to stubs:

  Decode the immediate of the instruction at label 'l', and verify that
  it's equal to the offset from the next instruction's PC to the stub for
  'y' in the __text section of f.o (i.e. it's the correct PC-relative
  difference).

    # rtdyld-check: decode_operand(l, 4) = stub_addr(f.o, __text, y) - next_pc(l)
    l: movq y@GOTPCREL(%rip), %rax

  Since stub inspection requires cooperation with RuntimeDyldImpl, this
  patch pimpl-ifies RuntimeDyldChecker. Its implementation is moved into a
  new class, RuntimeDyldCheckerImpl, that has access to the definition of
  RuntimeDyldImpl.
  llvm-svn: 213698

* Appease the buildbots.
  Juergen Ributzka, 2014-07-22 (1 file, -0/+1)
  llvm-svn: 213694

* [RuntimeDyld][MachO][AArch64] Add a helper function for encoding addends in instructions.
  Juergen Ributzka, 2014-07-22 (1 file, -75/+110)
  Factor out the addend encoding into a helper function and simplify
  processRelocationRef.

  Also add a few simple rtdyld tests. More tests to come once GOTs can be
  tested too.

  Related to <rdar://problem/17768539>
  llvm-svn: 213689

* [RuntimeDyld][MachO][AArch64] Implement the decodeAddend method.
  Juergen Ributzka, 2014-07-22 (1 file, -0/+92)
  This adds the required functionality to decode the immediate encoded in
  an instruction that is referenced in a relocation entry.
  llvm-svn: 213688

* [RuntimeDyld][MachO][AArch64] Add assertion to check for duplicate addend definition.
  Juergen Ributzka, 2014-07-22 (1 file, -4/+4)
  In MachO for AArch64 it is possible to have an explicit addend defined
  by the ARM64_RELOC_ADDEND relocation, or to have an addend encoded
  within the instruction. Only one of them is allowed per relocation.
  llvm-svn: 213687

* [RuntimeDyld] Change the return type of decodeAddend to match the storage type.
  Juergen Ributzka, 2014-07-22 (2 files, -6/+6)
  llvm-svn: 213686

* Optimize comparisons with "ashr/lshr exact" of a constant (PR19753).
  Suyog Sarda, 2014-07-22 (2 files, -0/+95)
  This patch implements the optimization mentioned in PR19753. It handles
  the errors which were seen in PR19958, where wrong code was being
  emitted due to the earlier patch, and adds code for lshr as well as for
  non-exact right shifts.

  It implements:
    (icmp eq/ne (ashr/lshr const2, A), const1)
      -> (icmp eq/ne A, Log2(const2/const1))
      -> (icmp eq/ne A, Log2(const2) - Log2(const1))

  Differential Revision: http://reviews.llvm.org/D4068
  llvm-svn: 213678

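  A quick sanity check of the fold with concrete powers of two (const2 =
  32 and const1 = 4, values chosen here for illustration): "(32 >> A) == 4"
  holds exactly when A == Log2(32) - Log2(4) == 3, and the shift is exact
  for every A <= 5.

    #include <cassert>

    int main() {
      for (unsigned a = 0; a < 6; ++a) {
        bool lhs = (32u >> a) == 4u; // icmp eq (lshr exact 32, A), 4
        bool rhs = (a == 3);         // icmp eq A, Log2(32) - Log2(4)
        assert(lhs == rhs);
      }
    }
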
* Added InstCombine transform for pattern "(A & B) ^ (A ^ B) -> (A | B)"
  Suyog Sarda, 2014-07-22 (1 file, -0/+8)
  Patch idea by Ankit Jain!

  Differential Revision: http://reviews.llvm.org/D4618
  llvm-svn: 213677

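  The identity can be checked exhaustively over 8-bit values (a standalone
  sketch, not part of the commit):

    #include <cassert>

    int main() {
      for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b)
          assert(((a & b) ^ (a ^ b)) == (a | b)); // (A & B) ^ (A ^ B) == A | B
    }
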
* Added InstCombine transforms for the patterns "((~A & B) | A) -> (A | B)" and "((A & B) | ~A) -> (~A | B)"
  Suyog Sarda, 2014-07-22 (1 file, -0/+10)
  Original patch credit to Ankit Jain!

  Differential Revision: http://reviews.llvm.org/D4591
  llvm-svn: 213676

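  Both identities admit the same exhaustive 8-bit check (again a
  standalone sketch; the 0xff masking only truncates ~a to 8 bits):

    #include <cassert>

    int main() {
      for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b) {
          assert((((~a & b) | a) & 0xff) == ((a | b) & 0xff));  // (~A & B) | A == A | B
          assert((((a & b) | ~a) & 0xff) == ((~a | b) & 0xff)); // (A & B) | ~A == ~A | B
        }
    }
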
* [ASan] Fix comments about __sanitizer_cov function
  Alexey Samsonov, 2014-07-22 (1 file, -3/+2)
  llvm-svn: 213673