bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like	Chandler Carruth	2014-06-28	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \|	lowering for v16i8. ASan and some bots caught this bug with existing test cases. Fixing it even fixed a miscompile with one of the test cases. I'm still a bit suspicious of this test case as I've not taken a proper amount of time to think about it, but the fix here is strict goodness. llvm-svn: 211976
*	[x86] Add handling for splat-like widenings of v16i8 shuffles.	Chandler Carruth	2014-06-28	1	-5/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These show up really frequently, not the least with actual splats. =] We lowered these quite badly before. The new code path tries to widen i8 shuffles to i16 shuffles in a splat-like way. There are still some inefficiencies in our i16 splat logic though, so we aren't really done here. Also, for certain patterns (bit of a gather-and-splat) we still generate pretty silly code, and I've left a fixme for addressing it. However, I'm not actually worried about this code pattern as much. The old shuffle lowering generates a 29 instruction monstrosity for it that should execute much more slowly. llvm-svn: 211974
*	This file wasn't supposed to be checked in	David Majnemer	2014-06-28	1	-156/+0
\| \| \| \| \| \| \| \| \|	This was generated while trying to debug a test, it shouldn't have been checked in. Thanks to Alexander Kornienko for spotting this. llvm-svn: 211973
*	Revert "Temporary hack to try cleaning extra .s file from bots."	Matt Arsenault	2014-06-27	1	-1/+0
\| \| \| \|	llvm-svn: 211967
*	Temporary hack to try cleaning extra .s file from bots.	Matt Arsenault	2014-06-27	1	-0/+1
\| \| \| \|	llvm-svn: 211963
*	[AArch64] Fix memset ICE when memset value is f128.	Chad Rosier	2014-06-27	1	-0/+19
\| \| \| \|	llvm-svn: 211960
*	[x86] Fix another bug hit when bootstrapping with the new shuffle	Chandler Carruth	2014-06-27	1	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	lowering. For maximum irony, I had already discovered this bug, diagnosed it, and left FIXMEs about it in the test cases. =[ I just failed to go back over those until after i had reduced a bootstrap miscompile down to a single TU, stared at the assembly for an hour, and figured out the bug. Again. Oh well. llvm-svn: 211955
*	[NVPTX] Add reflect intrinsic (better than matching by function name)	Justin Holewinski	2014-06-27	1	-0/+14
\| \| \| \| \| \|	Also clean up some of the logic in NVVMReflect.cpp while we're messing around in there. llvm-svn: 211948
*	[NVPTX] Add 'b' asm constraint	Justin Holewinski	2014-06-27	1	-0/+7
\| \| \| \|	llvm-svn: 211946
*	[NVPTX] Error out if initializer is given for variable in an address space ↵	Justin Holewinski	2014-06-27	1	-0/+5
\| \| \| \| \| \|	that does not support initialization llvm-svn: 211943
*	[NVPTX] Add support for .managed variables for UVM	Justin Holewinski	2014-06-27	1	-0/+11
\| \| \| \|	llvm-svn: 211942
*	[NVPTX] Emit .weak linkage for link_once, weak, available_externally, and ↵	Justin Holewinski	2014-06-27	1	-0/+9
\| \| \| \| \| \|	common linkage llvm-svn: 211941
*	[NVPTX] Fix handling of ldg/ldu intrinsics.	Justin Holewinski	2014-06-27	3	-5/+47
\| \| \| \| \| \| \| \| \| \|	The address space of the pointer must be global (1) for these intrinsics. There must also be alignment metadata attached to the intrinsic calls, e.g. %val = tail call i32 @llvm.nvvm.ldu.i.global.i32.p1i32(i32 addrspace(1)* %ptr), !align !0 !0 = metadata !{i32 4} llvm-svn: 211939
*	[NVPTX] Clean up argument lowering code and properly handle alignment for ↵	Justin Holewinski	2014-06-27	1	-0/+13
\| \| \| \| \| \|	structs and vectors llvm-svn: 211938
*	[NVPTX] Add support for [SHL,SRA,SRL]_PARTS	Justin Holewinski	2014-06-27	1	-0/+38
\| \| \| \|	llvm-svn: 211936
*	[NVPTX] Implement fma and imad contraction as target DAGCombiner patterns	Justin Holewinski	2014-06-27	2	-0/+46
\| \| \| \| \| \|	This also introduces DAGCombiner patterns for mul.wide to multiply two smaller integers and produce a larger integer llvm-svn: 211935
*	[NVPTX] Add support for efficient rotate instructions on SM 3.2+	Justin Holewinski	2014-06-27	1	-0/+58
\| \| \| \|	llvm-svn: 211934
*	[NVPTX] Add missing isel patterns for 64-bit atomics	Justin Holewinski	2014-06-27	1	-0/+141
\| \| \| \|	llvm-svn: 211933
*	[NVPTX] Add isel patterns for bit-field extract (bfe)	Justin Holewinski	2014-06-27	1	-0/+32
\| \| \| \|	llvm-svn: 211932
*	[NVPTX] Add support for isspacep instruction	Justin Holewinski	2014-06-27	1	-0/+35
\| \| \| \|	llvm-svn: 211931
*	[NVPTX] Add support for envreg reads	Justin Holewinski	2014-06-27	1	-0/+139
\| \| \| \|	llvm-svn: 211930
*	[NVPTX] Emit .weak when linkage is not external, internal, or private	Justin Holewinski	2014-06-27	1	-0/+12
\| \| \| \|	llvm-svn: 211926
*	[x86] Fix a miscompile in the new shuffle lowering uncovered by	Chandler Carruth	2014-06-27	1	-19/+23
\| \| \| \| \| \| \| \| \|	a bootstrap. I managed to mis-remember how PACKUS worked on x86, and was using undef for the high bytes instead of zero. The fix is fairly obvious. llvm-svn: 211922
*	IR: Add COMDATs to the IR	David Majnemer	2014-06-27	7	-0/+294
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This new IR facility allows us to represent the object-file semantic of a COMDAT group. COMDATs allow us to tie together sections and make the inclusion of one dependent on another. This is required to implement features like MS ABI VFTables and optimizing away certain kinds of initialization in C++. This functionality is only representable in COFF and ELF, Mach-O has no similar mechanism. Differential Revision: http://reviews.llvm.org/D4178 llvm-svn: 211920
*	Fix test so it doesn't try to write out temporary files into the test tree.	David Blaikie	2014-06-27	1	-1/+1
\| \| \| \|	llvm-svn: 211916
*	R600: Don't crash on unhandled instruction in promote alloca	Matt Arsenault	2014-06-27	3	-0/+65
\| \| \| \|	llvm-svn: 211906
*	[PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex	Ulrich Weigand	2014-06-27	1	-0/+362
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've run into a bug where current LLVM at -O0 (with fast-isel) generated invalid code like: ld 0, 20936(1) # 8-byte Folded Reload stw 12, 10348(0) stw 12, 10344(0) The underlying vreg had been introduced as base register by the Local Stack Slot Allocation pass. That register was constrained to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match the ADDI instruction used to set it, but it was not constrained to G8RC_NOX0 to fit the use of the register in an address. That should have happened in PPCRegisterInfo::resolveFrameIndex. This patch adds an appropriate constrainRegClass call. Reviewed by Hal Finkel. llvm-svn: 211897
*	[x86] Teach the target combine step to aggressively fold pshufd insturcions.	Chandler Carruth	2014-06-27	2	-2/+61
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This allows it to fold pshufd instructions across intervening half-shuffles and other noise. This pattern actually shows up in the generic lowering tests, but I've also added direct tests using intrinsics to make sure that the specific desired functionality is working even if the lowering stuff changes in the future. Differential Revision: http://reviews.llvm.org/D4292 llvm-svn: 211892
*	[x86] Teach the target-specific combining how to aggressively fold	Chandler Carruth	2014-06-27	1	-0/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	half-shuffles, even looking through intervening instructions in a chain. Summary: This doesn't happen to show up with any test cases I've found for the current shuffle lowering, but previous attempts would benefit from this and it seems generally useful. I've tested it directly using intrinsics, which also shows that it will work with hand vectorized code as well. Note that even though pshufd isn't directly used in these tests, it gets exercised because we combine some of the half shuffles into a pshufd first, and then merge them. Differential Revision: http://reviews.llvm.org/D4291 llvm-svn: 211890
*	[x86] Teach the X86 backend to DAG-combine SSE2 shuffles that are	Chandler Carruth	2014-06-27	2	-42/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	trivially redundant. This fixes several cases in the new vector shuffle lowering algorithm which would generate redundant shuffle instructions for the sake of simplicity. I'm also deleting a testcase which was somewhat ridiculous. It was checking for a bug in 2007 about incorrectly transforming shuffles by looking for the string "-86" in the output of a pretty substantial function. This test case doesn't seem to have any value at this point. Differential Revision: http://reviews.llvm.org/D4240 llvm-svn: 211889
*	[x86] Begin a significant overhaul of how vector lowering is done in the	Chandler Carruth	2014-06-27	4	-0/+977
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	x86 backend. This sketches out a new code path for vector lowering, hidden behind an off-by-default flag while it is under development. The fundamental idea behind the new code path is to aggressively break down the problem space in ways that ease selecting the odd set of instructions available on x86, and carefully avoid scalarizing code even when forced to use older ISAs. Notably, this starts off restricting itself to SSE2 and implements the complete vector shuffle and blend space for 128-bit vectors in SSE2 without scalarizing. The plan is to layer on top of this ISA extensions where we can bail out of the complex SSE2 lowering and opt for a cheaper, specialized instruction (or set of instructions). It also needs to be generalized to AVX and AVX512 vector widths. Currently, this does a decent but not perfect job for SSE2. There are some specific shortcomings that I plan to address: - We need a peephole combine to fold together shuffles where possible. There are cases where a previous shuffle could be modified slightly to arrange for elements to be in the correct position and a later shuffle eliminated. Doing this eagerly added quite a bit of complexity, and so my plan is to combine away these redundancies afterward. - There are a lot more clever ways to use unpck and pack that need to be added. This is essential for real world shuffles as it turns out... Once SSE2 is polished a bit I should be able to get interesting numbers on performance improvements on benchmarks conducive to vectorization. All of this will be off by default until it is functionally equivalent of course. Differential Revision: http://reviews.llvm.org/D4225 llvm-svn: 211888
*	MachineScheduler: add some book-keeping to fix an assert.	Andrew Trick	2014-06-27	1	-0/+50
\| \| \| \| \| \| \| \|	Fixe for Bug 20057 - Assertion failied in llvm::SUnit* llvm::SchedBoundary::pickOnlyChoice(): Assertion `i <= (HazardRec->getMaxLookAhead() + MaxObservedStall) && "permanent hazard"' Thanks to Chad for the test case. llvm-svn: 211865
*	R600: Add some testcases for promote alloca pass.	Matt Arsenault	2014-06-27	1	-1/+68
\| \| \| \| \| \| \|	More complicated GEPs are skipped. Add some tests to actually stress this skipping. llvm-svn: 211859
*	[StackMaps] Enable patchpoint liveness analysis per default.	Juergen Ributzka	2014-06-26	1	-2/+2
\| \| \| \|	llvm-svn: 211817
*	[Stackmaps] Remove the liveness calculation for stackmap intrinsics.	Juergen Ributzka	2014-06-26	1	-110/+41
\| \| \| \| \| \| \| \| \| \|	There is no need to calculate the liveness information for stackmaps. The liveness information is still available for the patchpoint intrinsic and that is also the intended usage model. Related to <rdar://problem/17473725> llvm-svn: 211816
*	R600/SI: Add FP mode bits to binary.	Matt Arsenault	2014-06-26	1	-0/+10
\| \| \| \| \| \| \| \|	The default rounding mode to initialize the mode register needs to be reported to the runtime. Fill in other bits a kernel may be interested in setting for future use. llvm-svn: 211791
*	[X86] Improve the selection of SSE3/AVX addsub instructions.	Andrea Di Biagio	2014-06-26	1	-2/+155
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch teaches the backend how to canonicalize a shuffle vectors according to the rule: - (shuffle (FADD A, B), (FSUB A, B), Mask) -> (shuffle (FSUB A, -B), (FADD A, -B), Mask) Where 'Mask' is: <0,5,2,7> ;; for v4f32 and v4f64 shuffles. <0,3> ;; for v2f64 shuffles. <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles. In general, ISel only knows how to pattern-match a canonical 'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction. This new rule allows to convert a non-canonical dag sequence into a canonical one that will be matched by a single ADDSUB at ISel stage. The idea of converting a non-canonical ADDSUB into a canonical one by swapping the first two operands of the shuffle, and then negating the second operand of the FADD and FSUB, was originally proposed by Hal Finkel. llvm-svn: 211771
*	R600: Fix vector FMA	Matt Arsenault	2014-06-26	1	-7/+65
\| \| \| \|	llvm-svn: 211757
*	[FastISel][X86] Only fold the cmp into the select when both instructions are ↵	Juergen Ributzka	2014-06-25	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \|	in the same basic block. If the cmp is in a different basic block, then it is possible that not all operands of that compare have defined registers. This can happen when one of the operands to the cmp is a load and the load gets folded into the cmp. In this case FastISel will skip the load instruction and the vreg is never defined. llvm-svn: 211730
*	[X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP ↵	Andrea Di Biagio	2014-06-25	5	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(or VPERM2X128). This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to the check for 'isBlendMask'; the idea is that, when possible, we should firstly check if a shuffle performs a blend, and in case, try to lower it into a BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128. In general: - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128; - BLENDPS/D instructions tend to always have better 'reciprocal throughput' than the equivalent SHUFPS/D; - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of m-ops; however, a m-op obtained from a BLENDPS/D can be scheduled to more than one execution port. This patch: - Moves the check for 'isBlendMask' immediately before the check for 'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE'; - Updates existing tests for sse/avx shuffle/blend instructions to verify that we select (v)blendps/d when possible (instead of (v)shufps/d or vperm2f128). llvm-svn: 211720
*	Rename loop unrolling and loop vectorizer metadata to have a common prefix.	Eli Bendersky	2014-06-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[LLVM part] These patches rename the loop unrolling and loop vectorizer metadata such that they have a common 'llvm.loop.' prefix. Metadata name changes: llvm.vectorizer.* => llvm.loop.vectorizer.* llvm.loopunroll.* => llvm.loop.unroll.* This was a suggestion from an earlier review (http://reviews.llvm.org/D4090) which added the loop unrolling metadata. Patch by Mark Heffernan. llvm-svn: 211710
*	[x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions.	Chandler Carruth	2014-06-25	1	-0/+27
\| \| \| \|	llvm-svn: 211694
*	Re-apply r211399, "Generate native unwind info on Win64" with a fix to ↵	NAKAMURA Takumi	2014-06-25	5	-38/+214
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ignore SEH pseudo ops in X86 JIT emitter. -- This patch enables LLVM to emit Win64-native unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment. Because the unwind info is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer. Patch by Vadim Chugunov! Reviewed By: rnk Differential Revision: http://reviews.llvm.org/D4081 llvm-svn: 211691
*	[X86] Add target combine rule to select ADDSUB instructions from a build_vector	Andrea Di Biagio	2014-06-25	1	-0/+318
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch teaches the backend how to combine a build_vector that implements an 'addsub' between packed float vectors into a sequence of vector add and vector sub followed by a VSELECT. The new VSELECT is expected to be lowered into a BLENDI. At ISel stage, the sequence 'vector add + vector sub + BLENDI' is pattern-matched against ISel patterns added at r211427 to select 'addsub' instructions. Added three more ISel patterns for ADDSUB. Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub' instructions. llvm-svn: 211679
*	Fix a regression from r211653.	Rafael Espindola	2014-06-25	1	-0/+18
\| \| \| \| \| \| \|	The method was empty in the null streamer but I mistakenly replaced it with the aborting one in MCStreamer. llvm-svn: 211666
*	CodeGen/X86/pr20088.ll: Add -march=x86-64, or llc fails due to non-x86 ↵	NAKAMURA Takumi	2014-06-25	1	-1/+1
\| \| \| \| \| \|	default target. llvm-svn: 211659
*	[FastISel][X86] Fold XALU condition into branch and compare.	Juergen Ributzka	2014-06-24	1	-2/+348
\| \| \| \| \| \| \|	Optimize the codegen of select and branch instructions to directly use the EFLAGS from the {s\|u}{add\|sub\|mul}.with.overflow intrinsics. llvm-svn: 211645
*	R600: Promote i64 stores to v2i32	Tom Stellard	2014-06-24	1	-2/+1
\| \| \| \| \| \|	Now we need only one 64-bit pattern for stores. llvm-svn: 211643
*	Print a=b as an assignment.	Rafael Espindola	2014-06-24	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	In assembly the expression a=b is parsed as an assignment, so it should be printed as one. This remove a truly horrible hack for producing a label with "a=.". It would be used by codegen but would never be reached by the asm parser. Sorry I missed this when it was first committed. llvm-svn: 211639
*	R600: Fix inconsistency in rsq instructions.	Matt Arsenault	2014-06-24	4	-0/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	R600 was using a clamped version of rsq, but SI was not. Add a new rsq_clamped intrinsic and use them consistently. It's unclear to me from the documentation what behavior the R600 instructions have, so I assume they have the legacy behavior described by the SI documents. For R600, use RECIPSQRT_IEEE for both llvm.AMDGPU.rsq.legacy and llvm.AMDGPU.rsq. R600 also has RECIPSQRT_FF, which I'm not sure how it fits in here. llvm-svn: 211637