path: root/llvm
Commit log (each entry shows subject, author, date, files changed, and lines removed/added)
* R600: Implement enableClusterLoads() (Matt Arsenault, 2014-07-24, 2 files changed, -0/+7)
  llvm-svn: 213831
* [AArch64] Fix a bug generating an incorrect instruction when building a small vector. (Kevin Qin, 2014-07-24, 2 files changed, -38/+70)
  This bug was introduced by r211144. The element of the operand may be smaller than the element of the result, but the previous commit could only handle the contrary condition. This commit handles this scenario and generates optimized code like ZIP1.
  llvm-svn: 213830
* [AArch64] Disable some optimization cases for type conversion from sint to fp, because those optimization cases are micro-architecture dependent and only make sense for Cyclone. (Jiangning Liu, 2014-07-24, 2 files changed, -4/+30)
  A new predicate, Cyclone, is introduced in the .td file.
  llvm-svn: 213827
* Fixed PR20411 - bug in getINSERTPS() (Filipe Cabecinhas, 2014-07-24, 2 files changed, -0/+32)
  When we had a vector_shuffle with an input from each vector, we could miscompile it because we assumed the input from V2 wouldn't be moved from where it was on the vector. Added a test case.
  llvm-svn: 213826
* IR: Add Value::sortUseList() (Duncan P. N. Exon Smith, 2014-07-24, 3 files changed, -0/+176)
  Add `Value::sortUseList()`, templated on the comparison function to use. The sort is an iterative merge sort that uses a binomial vector of already-merged lists to limit the size overhead to `O(1)`. This is part of PR5680.
  llvm-svn: 213824
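The "binomial vector" idea above can be sketched outside LLVM: keep at most one sorted run of 2**i nodes in slot i, and adding a node "carries" like binary addition. This Python model is a hypothetical stand-in (`Node`, `sort_use_list`, and the comparison callback are not LLVM's names), not the actual use-list code:

```python
class Node:
    """Hypothetical singly linked list node standing in for a Use."""
    def __init__(self, val, next=None):
        self.val, self.next = val, next

def sort_use_list(head, cmp_less):
    def merge(a, b):
        # Standard two-way merge of two sorted runs.
        dummy = Node(None)
        tail = dummy
        while a and b:
            if cmp_less(b.val, a.val):
                tail.next, b = b, b.next
            else:
                tail.next, a = a, a.next
            tail = tail.next
        tail.next = a or b
        return dummy.next

    slots = []  # slots[i] holds a sorted run of 2**i nodes, or None
    node = head
    while node:
        run, node = node, node.next
        run.next = None                    # detach a single-node run
        i = 0
        while i < len(slots) and slots[i] is not None:
            run = merge(slots[i], run)     # "carry", like binary addition
            slots[i] = None
            i += 1
        if i == len(slots):
            slots.append(None)
        slots[i] = run
    result = None
    for s in slots:                        # fold the remaining runs together
        if s is not None:
            result = merge(s, result) if result else s
    return result
```

Only the pointers in `slots` are extra state; the list nodes themselves are relinked in place, which is what keeps the overhead constant per node.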
* Add a VS "14" msbuild toolset (Reid Kleckner, 2014-07-23, 5 files changed, -0/+75)
  This allows people to try clang inside MSBuild with the VS "14" CTP releases. Fixes PR20341.
  Patch by Marcel Raad!
  llvm-svn: 213819
* SimplifyCFG: fix a bug in switch-to-table conversion (Manman Ren, 2014-07-23, 2 files changed, -4/+54)
  We use a gep to access the global array "switch.table", and the table index should be treated as unsigned. When the highest bit is 1, this commit zero-extends the index to an integer type with a larger size. For a switch on i2, we used to generate:
    %switch.tableidx = sub i2 %0, -2
    getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx
  This is incorrect when %switch.tableidx is 2 or 3. The fix is to generate:
    %switch.tableidx = sub i2 %0, -2
    %switch.tableidx.zext = zext i2 %switch.tableidx to i3
    getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext
  rdar://17735071
  llvm-svn: 213815
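To see why the zext matters: getelementptr sign-extends an index narrower than the pointer width, so the i2 values 2 and 3 (high bit set) become negative offsets. A small Python model of the two widening choices (`gep_index` is a hypothetical helper written for illustration, not LLVM code):

```python
def gep_index(value, bits, zext_first):
    """Model how a `bits`-wide gep index reaches pointer width. Without an
    explicit zext the index is sign-extended, so values with the high bit
    set (2 or 3 for i2) turn negative and would index before the table."""
    if zext_first:
        return value & ((1 << bits) - 1)        # zext: always non-negative
    sign = 1 << (bits - 1)                       # otherwise: sign-extend
    return (value & (sign - 1)) - (value & sign)

table = [10, 20, 30, 40]                         # models @switch.table
for idx in range(4):                             # every i2 value stays in bounds
    assert 0 <= gep_index(idx, 2, zext_first=True) < len(table)
assert gep_index(2, 2, zext_first=False) == -2   # would read before the table
assert gep_index(3, 2, zext_first=False) == -1
```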
* Fix the build when building with only the ARM backend. (Rafael Espindola, 2014-07-23, 1 file changed, -1/+1)
  llvm-svn: 213814
* Document what backwards compatibility we provide for bitcode. (Rafael Espindola, 2014-07-23, 1 file changed, -0/+23)
  llvm-svn: 213813
* Let llvm/test/CodeGen/X86/avx512*-mask-op.ll(s) be aware of the Win32 x64 calling convention. (NAKAMURA Takumi, 2014-07-23, 3 files changed, -10/+10)
  llvm-svn: 213812
* Fix indenting. (Eric Christopher, 2014-07-23, 1 file changed, -13/+14)
  llvm-svn: 213811
* [x86] Rip out some broken test cases for avx512 i1 store support. (Chandler Carruth, 2014-07-23, 1 file changed, -29/+0)
  It isn't reasonable to test storing things through undef pointers -- storing through those is at best "good luck" and really should be transformed to "unreachable". Random changes in the combiner can randomly break these tests for no good reason. I'm following up on the original commit regarding the right long-term strategy here.
  llvm-svn: 213810
* Reorganize and simplify local variables. (Eric Christopher, 2014-07-23, 1 file changed, -13/+11)
  llvm-svn: 213809
* Finish inverting the MC -> Object dependency. (Rafael Espindola, 2014-07-23, 8 files changed, -9/+9)
  There were still some disassembler bits in lib/MC, but their use of Object was only visible in the includes they used, not in the symbols.
  llvm-svn: 213808
* [RuntimeDyld][AArch64] Update relocation tests and also add a simple GOT test. (Juergen Ributzka, 2014-07-23, 1 file changed, -25/+40)
  llvm-svn: 213807
* Remove the query for TargetMachine and TargetInstrInfo since we're already inside TargetInstrInfo. (Eric Christopher, 2014-07-23, 1 file changed, -3/+1)
  llvm-svn: 213806
* ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion. (David Blaikie, 2014-07-23, 2 files changed, -9/+14)
  While the subprogram map cache used by Dead Argument Elimination works there, I made a mistake when reusing it for Argument Promotion in r212128, because ArgPromo may transform functions more than once whereas DAE transforms each function only once, removing all the dead arguments in one go. To address this, ensure that the map is updated after each argument promotion.
  In retrospect it might be a little wasteful to create a map of all subprograms when only handling a single CGSCC, but the alternative is walking the debug info for each function in the CGSCC that gets updated. It's not clear to me what the right tradeoff is there, but since the current tradeoff seems to be working OK (and the code to keep things updated is very cheap), let's stick with that for now.
  llvm-svn: 213805
* Test debug info in arg promotion with an actual promotion case, rather than a degenerate arg promotion that's actually DAE performed by ArgPromo. (David Blaikie, 2014-07-23, 1 file changed, -5/+8)
  Also the debug location I had here was bogus, describing the location of the call site as in the callee - and unnecessary, so just drop it.
  llvm-svn: 213803
* Use an explicit triple in testcase. (Jim Grosbach, 2014-07-23, 1 file changed, -1/+1)
  Make the test work better on non-darwin hosts. Hopefully.
  llvm-svn: 213801
* [X86,AArch64] Extend vcmp w/ unary op combine to work w/ more constants. (Jim Grosbach, 2014-07-23, 4 files changed, -6/+44)
  The transform to constant fold unary operations with an AND across a vector comparison also applies when the constant is not a splat of a scalar.
  llvm-svn: 213800
* X86: restrict combine to when type sizes are safe. (Jim Grosbach, 2014-07-23, 4 files changed, -9/+51)
  The folding of unary operations through a vector compare and mask operation is only safe if the unary operation result is of the same size as its input. For example, it's not safe for [su]itofp from v4i32 to v4f64.
  llvm-svn: 213799
* DAG: fp->int conversion for non-splat constants. (Jim Grosbach, 2014-07-23, 2 files changed, -13/+26)
  Constant fold the lanes of the input constant build_vector individually, so we correctly handle the case where the vector elements are not all the same constant value. PR20394
  llvm-svn: 213798
* [NVPTX] Add some extra tests for mul.wide to test non-power-of-two source types (Justin Holewinski, 2014-07-23, 1 file changed, -0/+22)
  llvm-svn: 213794
* [NVPTX] Silence a GCC warning found by the buildbots (Justin Holewinski, 2014-07-23, 1 file changed, -1/+1)
  The cast to NVPTXTargetLowering was missing a 'const', but let's just access the right pointer through the subtarget anyway.
  llvm-svn: 213793
* Do not add unroll disable metadata after the unrolling pass for loops with #pragma clang loop unroll(full). (Mark Heffernan, 2014-07-23, 2 files changed, -17/+50)
  llvm-svn: 213789
* [FastISel][AArch64] Fix return type in FastLowerCall. (Juergen Ributzka, 2014-07-23, 2 files changed, -4/+16)
  I used the wrong method to obtain the return type inside FinishCall. This fix simply uses the return type from FastLowerCall, which we already determined to be a valid type.
  Reduced test case from Chad. Thanks.
  llvm-svn: 213788
* [NVPTX] mul.wide generation works for any smaller integer source types, not just the next smaller power of two. (Justin Holewinski, 2014-07-23, 2 files changed, -2/+24)
  llvm-svn: 213784
* [SKX] Added missed test files for rev 213757 (Robert Khasanov, 2014-07-23, 4 files changed, -0/+236)
  llvm-svn: 213780
* AsmParser: remove deprecated LLIR support (Saleem Abdulrasool, 2014-07-23, 4 files changed, -36/+0)
  linker_private and linker_private_weak were deprecated in 3.5. Remove support for them now that the 3.5 branch has been created.
  llvm-svn: 213777
* ExecutionEngine: remove a stray semicolon (Saleem Abdulrasool, 2014-07-23, 1 file changed, -1/+1)
  Detected via GCC 4.8 [-Wpedantic].
  llvm-svn: 213776
* [SKX] Fix lowercase "error:" in rev 213757 (Robert Khasanov, 2014-07-23, 1 file changed, -22/+22)
  llvm-svn: 213774
* [NVPTX] Make sure we do not generate MULWIDE ISD nodes when optimizations are disabled (Justin Holewinski, 2014-07-23, 2 files changed, -11/+19)
  With optimizations disabled, we disable the isel patterns for mul.wide, but we were still generating MULWIDE ISD nodes. Now, we only try to generate MULWIDE ISD nodes in DAGCombine if the optimization level is not zero.
  llvm-svn: 213773
* In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full". (Mark Heffernan, 2014-07-23, 4 files changed, -103/+65)
  llvm-svn: 213772
* test commit: remove trailing space (Alex Lorenz, 2014-07-23, 1 file changed, -5/+5)
  llvm-svn: 213770
* [AArch64] Lower sdiv x, pow2 using add + select + shift. (Chad Rosier, 2014-07-23, 5 files changed, -3/+140)
  The target-independent DAGCombiner will generate:
    asr w1, X, #31          ; w1 = splat sign bit.
    add X, X, w1, lsr #28   ; X = X + 0 or pow2-1
    asr w0, X, asr #4       ; w0 = X/pow2
  However, the add + shifts is expensive, so generate:
    add w0, X, 15           ; w0 = X + pow2-1
    cmp X, wzr              ; X - 0
    csel X, w0, X, lt       ; X = (X < 0) ? X + pow2-1 : X;
    asr w0, X, asr 4        ; w0 = X/pow2
  llvm-svn: 213758
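The trick in both sequences is the same: signed division by a power of two must truncate toward zero, so negative inputs get biased by pow2-1 before the arithmetic shift. A quick Python check of that identity (`sdiv_pow2` is a hypothetical name; Python ints stand in for fixed-width registers):

```python
def sdiv_pow2(x, log2):
    """Signed divide by 2**log2 with round-toward-zero, via the
    bias-then-arithmetic-shift trick from the commit above."""
    bias = (1 << log2) - 1      # pow2 - 1
    if x < 0:                   # the csel: only negative inputs get the bias
        x += bias
    return x >> log2            # arithmetic shift right

for x in (-37, -16, -1, 0, 1, 15, 37):
    want = int(x / 16)          # C-style truncating division
    assert sdiv_pow2(x, 4) == want, (x, sdiv_pow2(x, 4), want)
```

Without the bias, `x >> log2` would round toward negative infinity (e.g. -1 >> 4 is -1, not 0), which is why the sign-dependent add is needed at all.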
* [SKX] Enabling mask instructions: encoding, lowering (Robert Khasanov, 2014-07-23, 6 files changed, -28/+199)
  KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ
  Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
  llvm-svn: 213757
* ARM: spot SBFX-compatible code expressed with sign_extend_inreg (Tim Northover, 2014-07-23, 3 files changed, -2/+39)
  We were assuming all SBFX-like operations would have the shl/asr form, but often when the field being extracted is an i8 or i16, we end up with a SIGN_EXTEND_INREG acting on a shift instead. Simple enough to check for, though.
  llvm-svn: 213754
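The two forms extract the same signed bitfield. A Python sketch of the equivalence (`sbfx` and `via_shl_asr` are hypothetical helpers written for illustration; a SIGN_EXTEND_INREG of a shifted value corresponds to the lsb > 0 cases):

```python
def sbfx(x, lsb, width):
    """Model a signed bitfield extract (like ARM's SBFX): pull `width`
    bits starting at `lsb`, then sign-extend the field's top bit."""
    field = (x >> lsb) & ((1 << width) - 1)
    sign = 1 << (width - 1)
    return (field ^ sign) - sign

def via_shl_asr(x, lsb, width, bits=32):
    # The canonical shl/asr form: shift the field to the top of a
    # `bits`-wide register, then arithmetic-shift it back down.
    v = (x << (bits - lsb - width)) & ((1 << bits) - 1)
    if v & (1 << (bits - 1)):
        v -= 1 << bits          # reinterpret as two's complement
    return v >> (bits - width)

for x in (0x12345678, 0xFFFF8000, 0x80000001):
    for lsb, width in ((0, 8), (8, 16), (4, 5)):
        assert sbfx(x, lsb, width) == via_shl_asr(x, lsb, width)
```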
* ARM: add patterns for [su]xta[bh] from just a shift. (Tim Northover, 2014-07-23, 4 files changed, -6/+58)
  Although the final shifter operand is a rotate, this actually only matters for the half-word extends when the amount == 24. Otherwise folding a shift in is just as good.
  llvm-svn: 213753
* Enable partial libcall inlining for all targets by default. (James Molloy, 2014-07-23, 3 files changed, -2/+5)
  This pass attempts to speculatively use a sqrt instruction if one exists on the target, falling back to a libcall if the target instruction returned NaN. This was enabled for MIPS and System-Z, but it is well guarded and is good for most targets - GCC does this for (that I've checked) X86, ARM and AArch64.
  llvm-svn: 213752
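The shape of the partially-inlined call can be sketched as follows; `hardware_sqrt` and `libcall_sqrt` are hypothetical stand-ins for the target instruction and the library routine (which in C would also set errno):

```python
import math

calls = []  # records when the slow path is taken

def libcall_sqrt(v):
    """Stand-in for the library sqrt call (the slow, errno-setting path)."""
    calls.append(v)
    return float("nan")

def hardware_sqrt(v):
    """Stand-in for the target's sqrt instruction: NaN for negative input."""
    return math.sqrt(v) if v >= 0 else float("nan")

def sqrt_partially_inlined(x):
    # Fast path first; fall back to the libcall only when it yields NaN.
    r = hardware_sqrt(x)
    if r != r:                  # NaN compares unequal to itself
        r = libcall_sqrt(x)
    return r

assert sqrt_partially_inlined(4.0) == 2.0 and calls == []
sqrt_partially_inlined(-1.0)
assert calls == [-1.0]          # libcall reached only on the NaN path
```

The win is that the common case (non-negative input) never pays for the call; only inputs that produce NaN take the branch to the library.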
* [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STRB instructions. (Tilmann Scheller, 2014-07-23, 2 files changed, -1/+21)
  The ARM ARM prohibits STRB instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STRB instructions with unpredictable behavior.
  llvm-svn: 213750
* Added release notes for MIPS. (Daniel Sanders, 2014-07-23, 1 file changed, -1/+69)
  llvm-svn: 213749
* AArch64: remove "arm64_be" support in favour of "aarch64_be".Tim Northover2014-07-2319-63/+31
| | | | | | | | | There really is no arm64_be: it was a useful fiction to test big-endian support while both backends existed in parallel, but now the only platform that uses the name (iOS) doesn't have a big-endian variant, let alone one called "arm64_be". llvm-svn: 213748
* [ARM] Make the assembler reject unpredictable pre/post-indexed ARM STR instructions. (Tilmann Scheller, 2014-07-23, 2 files changed, -0/+30)
  The ARM ARM prohibits STR instructions with writeback into the source register. With this commit this constraint is now enforced and we stop assembling STR instructions with unpredictable behavior.
  llvm-svn: 213745
* AArch64: remove arm64 triple enumerator. (Tim Northover, 2014-07-23, 12 files changed, -47/+24)
  Having both Triple::arm64 and Triple::aarch64 is extremely confusing, and invites bugs where only one is checked. In reality, the only legitimate difference between the two (arm64 usually means iOS) is also present in the OS part of the triple, and that's what should be checked. We still parse the "arm64" triple, just canonicalise it to Triple::aarch64, so there aren't any LLVM-side test changes.
  llvm-svn: 213743
* Revert r211771. It was: "[X86] Improve the selection of SSE3/AVX addsub instructions". (Andrea Di Biagio, 2014-07-23, 2 files changed, -196/+0)
  This change fully reverts r211771. That revision added a canonicalization rule which has the potential to cause a combine-cycle in the target-independent canonicalizing DAG combine. The plan is to move the logic that forms target-specific addsub nodes into the lowering of shuffles.
  llvm-svn: 213736
* [x86] Clean up a test case to use check labels and spell out the exact instruction sequences with CHECK-NEXT for these test cases. (Chandler Carruth, 2014-07-23, 1 file changed, -74/+116)
  This notably exposes how absolutely horrible the generated code is for several of these test cases, and will make any future updates to the test visible as our vector instruction selection gets better.
  llvm-svn: 213732
* [ARM] Add regression test for the earlyclobber constraint of ARM STRB. (Tilmann Scheller, 2014-07-23, 1 file changed, -0/+10)
  The constraint was added in r213369.
  llvm-svn: 213730
* [ARM] Add earlyclobber constraint to pre/post-indexed ARM STRH instructions. (Tilmann Scheller, 2014-07-23, 2 files changed, -3/+15)
  The post-indexed instructions were missing the constraint, causing unpredictable STRH instructions to be emitted. The earlyclobber constraint on the pre-indexed STR instructions is not strictly necessary, as the instruction selection for pre-indexed STR instructions goes through an additional layer of pseudo instructions which have the constraint defined. However, it doesn't hurt to specify the constraint directly on the pre-indexed instructions as well, since at some point someone might create instances of them programmatically, and then the constraint is definitely needed.
  llvm-svn: 213729
* [SDAG] Make the DAGCombine worklist not grow endlessly due to duplicate insertions. (Chandler Carruth, 2014-07-23, 16 files changed, -273/+93)
  The old behavior could cause arbitrarily bad memory usage in the DAG combiner if there was heavy traffic of adding nodes already on the worklist to it. This commit switches the DAG combine worklist to work the same way as the instcombine worklist, where we null out removed entries and only add new entries to the worklist. My measurements of codegen time show slight improvement. The memory utilization is unsurprisingly dominated by other factors (the IR and DAG itself, I suspect).
  This change results in subtle, frustrating churn in the particular order in which DAG combines are applied, which causes a number of minor regressions where we fail to match a pattern previously matched by accident. AFAICT, all of these should be using AddToWorklist directly or should be written in a less brittle way. None of the changes seem drastically bad, and a few of the changes seem distinctly better.
  A major change required to make this work is to significantly harden the way in which the DAG combiner handles nodes which become dead (zero uses). Previously, we relied on the ability to "priority-bump" them on the combine worklist to achieve recursive deletion of these nodes and ensure that the frontier of remaining live nodes all were added to the worklist. Instead, I've introduced a routine to just implement that precise logic with no indirection. It is a significantly simpler operation than that of the combiner worklist proper. I suspect this will also fix some other problems with the combiner.
  I think the x86 changes are really minor and uninteresting, but the avx512 change at least is hiding a "regression" (despite the test case being just noise, not testing some performance invariant) that might be looked into. Not sure if any of the others impact specific "important" code paths, but they didn't look terribly interesting to me, or the changes were really minor.
  The consensus in review is to fix any regressions that show up after the fact here. Thanks to the other reviewers for checking the output on other architectures. There is a specific regression on ARM that Tim already has a fix prepped to commit.
  Differential Revision: http://reviews.llvm.org/D4616
  llvm-svn: 213727
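An instcombine-style worklist of the kind described above can be sketched in a few lines; this Python model (`CombineWorklist` is a hypothetical name, not LLVM's class) shows the two key properties: duplicate inserts are O(1) no-ops, and removal nulls out the slot rather than shifting entries:

```python
class CombineWorklist:
    """Deduplicating worklist: each node appears at most once, and removal
    nulls out the slot so the vector never shifts and never re-grows from
    re-adding nodes that are already pending."""
    def __init__(self):
        self.items = []     # vector of nodes; removed slots become None
        self.index = {}     # node -> position in items
    def add(self, node):
        if node not in self.index:          # duplicate insert: do nothing
            self.index[node] = len(self.items)
            self.items.append(node)
    def remove(self, node):
        pos = self.index.pop(node, None)
        if pos is not None:
            self.items[pos] = None          # null out; no shifting
    def pop(self):
        while self.items:
            node = self.items.pop()
            if node is not None:
                del self.index[node]
                return node
        return None                         # worklist drained

wl = CombineWorklist()
for n in ("a", "b", "a", "c", "a"):         # heavy duplicate traffic
    wl.add(n)
assert len(wl.items) == 3                   # did not grow per duplicate
wl.remove("b")
assert [wl.pop(), wl.pop(), wl.pop()] == ["c", "a", None]
```

Contrast with a priority-bumping scheme, where re-adding an already-pending node appends another entry and the vector can grow without bound under the traffic pattern the commit describes.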
* We may visit a call that uses an alloca multiple times in callUsesLocalStack, sometimes with IsNocapture true and sometimes with IsNocapture false. (Nick Lewycky, 2014-07-23, 2 files changed, -5/+17)
  We accidentally skipped work we needed to do in the IsNocapture=false case if we were called with IsNocapture=true the first time. Fixes PR20405!
  llvm-svn: 213726