bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type ↵	Chen Li	2015-12-26	58	-440/+433
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	instead of i32 type Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types prevents LLVM from merging different gc.statepoint nodes into PHI nodes, which would cause further problems with gc relocations. The patch also changes how gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443
*	Add test case for r256433. "[X86] Fix shuffle decoding for variable VPERMIL ↵	Craig Topper	2015-12-26	1	-1/+9
\| \| \| \| \| \|	to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct." llvm-svn: 256435
*	Revert r256432 "Test"	Craig Topper	2015-12-26	1	-9/+1
\| \| \| \| \| \|	This is the test case for r256433, but it got committed incorrectly in my local repo. llvm-svn: 256434
*	Test	Craig Topper	2015-12-26	1	-1/+9
\| \| \| \|	llvm-svn: 256432
*	[WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify.	Dan Gohman	2015-12-25	2	-9/+47
\| \| \| \| \| \| \| \| \| \| \| \| \|	Move RegStackify after coalescing and teach it to use LiveIntervals instead of depending on SSA form. This avoids a problem where a register in a COPY instruction is stackified and then subsequently coalesced with a register that is not stackified. This also puts it after the scheduler, which allows us to simplify the EXPR_STACK constraint, as we no longer have instructions being reordered after stackification and before coloring. llvm-svn: 256402
*	[InstCombine] transform more extract/insert pairs into shuffles (PR2109)	Sanjay Patel	2015-12-24	1	-16/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229 The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in. The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109 For that example, the IR becomes: %1 = bitcast <2 x i32>* %P to <2 x float>* %ld1 = load <2 x float>, <2 x float>* %1, align 8 %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5> ret <4 x float> %i2 And x86 SSE output improves from: movq (%rdi), %xmm1 ## xmm1 = mem[0],zero movdqa %xmm1, %xmm2 shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3] shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0] shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2] shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0] shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0] retq To the almost optimal: movhpd (%rdi), %xmm0 Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples. Differential Revision: http://reviews.llvm.org/D15096 llvm-svn: 256394
*	[X86][PKU] Add {RD,WR}PKRU encoding	Asaf Badouh	2015-12-24	1	-0/+8
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D15711 llvm-svn: 256366
*	AVX-512: Kreg set 0/1 optimization	Elena Demikhovsky	2015-12-24	3	-60/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patterns that set a mask register to 0/1 KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365
*	AVX512: VPMOVM2B/W/D/Q intrinsic implementation.	Igor Breger	2015-12-24	4	-35/+193
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org//D15747 llvm-svn: 256364
*	AMDGPU/SI: Fix encoding of flat instructions on VI	Tom Stellard	2015-12-24	1	-209/+307
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15735 llvm-svn: 256360
*	WebAssembly: remove 'external' from test	JF Bastien	2015-12-23	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Linker testing was sad at seeing an unresolved external symbol. For now don't do that: it's valid but we're not playing with multi-file linking yet, and the LLVM tests are used as hacky sanity tests for single-file linking (the GCC torture tests are much better for this purpose). Another solution would be to use '.extern' to make the intent explicit (don't simple-file link this, there's an unresolved symbol), some assemblers use '.extern' while others ignore it, so we wouldn't really be inventing anything new. Reviewers: sunfish, kripken Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D15753 llvm-svn: 256353
*	[Statepoints] Use Indirect operands for spill slots	Philip Reames	2015-12-23	1	-33/+31
\| \| \| \| \| \| \| \| \| \|	Teach the statepoint lowering code to emit Indirect stackmap entries for spills inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot. To implement this, I introduced a new flag on the StackObject class used to maintain information about stack slots. I originally considered (and prototyped in http://reviews.llvm.org/D15632) the idea of using the existing isSpillSlot flag, but ended up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from (deopt, gc, regalloc). Differential Revision: http://reviews.llvm.org/D15759 llvm-svn: 256352
*	llvm-dwarfdump: Add support for dumping .dSYM bundles.	Adrian Prantl	2015-12-23	1	-0/+6
\| \| \| \| \| \| \| \|	This replicates the logic of Darwin dwarfdump for manually opening up .dSYM bundles without introducing any new dependencies. <rdar://problem/20491670> llvm-svn: 256350
*	[X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined	Simon Pilgrim	2015-12-23	6	-71/+179
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151. As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops. Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well. Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions). Differential Revision: http://reviews.llvm.org/D15477 llvm-svn: 256332
*	[OperandBundles] Have GlobalsModRef play nice with operand bundles	David Majnemer	2015-12-23	1	-1/+11
\| \| \| \| \| \| \|	A call site's use of a Value might not correspond to an argument operand but to a bundle operand. llvm-svn: 256329
*	[OperandBundles] Have TailCallElim play nice with operand bundles	David Majnemer	2015-12-23	1	-0/+10
\| \| \| \| \| \| \| \| \|	A call site's use of a Value might not correspond to an argument operand but to a bundle operand. This fixes PR25928. llvm-svn: 256328
*	[OperandBundles] Have InstCombine play nice with operand bundles	David Majnemer	2015-12-23	1	-0/+11
\| \| \| \| \| \| \|	Don't assume a call's use corresponds to an argument operand, it might correspond to a bundle operand. llvm-svn: 256327
*	[OperandBundles] Have DeadArgElim play nice with operand bundles	David Majnemer	2015-12-23	1	-0/+12
\| \| \| \| \| \| \|	A call site's use of a Value might not correspond to an argument operand but to a bundle operand. llvm-svn: 256326
*	AVX512BW: Enable packed word shift for 512bit vector. Enable lowering scalar ↵	Igor Breger	2015-12-23	10	-492/+1554
\| \| \| \| \| \| \| \|	immediate shift v64i8. Fix predicate for AVX1/2 shifts. Differential Revision: http://reviews.llvm.org/D15713 llvm-svn: 256324
*	[WinEH] Don't visit the same catchswitch twice	David Majnemer	2015-12-23	1	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \|	We visited the same catchswitch twice because it was both the child of another funclet and the predecessor of a cleanuppad. Instead, change the numbering algorithm to only recurse if the unwind destination of the inner funclet agrees with the unwind destination of the catchswitch. This fixes PR25926. llvm-svn: 256317
*	Form reform for MCDwarf.	Paul Robinson	2015-12-23	5	-15/+20
\| \| \| \| \| \| \| \| \|	MCDwarf emits a canned abbreviation table, but was not emitting proper forms for DWARF version 4, which is the default after r249655. Differential Revision: http://reviews.llvm.org/D15732 llvm-svn: 256313
*	[RS4GC] Fix base pair printing for constants.	Manuel Jacob	2015-12-23	2	-2/+2
\| \| \| \| \| \| \|	Previously, "%" + name of the value was printed for each derived and base pointer. This is correct for instructions, but wrong for e.g. globals. llvm-svn: 256305
*	AMDGPU/SI: Use flat for global load/store when targeting HSA	Changpeng Fang	2015-12-22	7	-15/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For some reason executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256282
*	Also add unnamed_addr to functions.	Rafael Espindola	2015-12-22	1	-0/+6
\| \| \| \|	llvm-svn: 256281
*	Delete dead GlobalAliases.	Rafael Espindola	2015-12-22	3	-3/+6
\| \| \| \|	llvm-svn: 256276
*	Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"	Rafael Espindola	2015-12-22	7	-39/+15
\| \| \| \| \| \| \| \|	This reverts commit r256273. It broke CodeGen/AMDGPU/llvm.dbg.value.ll llvm-svn: 256275
*	AMDGPU/SI: Use flat for global load/store when targeting HSA	Changpeng Fang	2015-12-22	7	-15/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For some reason executing an MUBUF instruction with the addr64 bit set and a zero base pointer in the resource descriptor causes the memory operation to be dropped when the shader is executed using the HSA runtime. This kind of MUBUF instruction is commonly used when the pointer is stored in VGPRs. The base pointer field in the resource descriptor is set to zero and the pointer is stored in the vaddr field. This patch resolves the issue by only using flat instructions for global memory operations when targeting HSA. This is an overly conservative fix as all other configurations of MUBUF instructions appear to work. Reviewers: tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15543 llvm-svn: 256273
*	[BPI] Replace weights by probabilities in BPI.	Cong Hou	2015-12-22	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	This patch removes all weight-related interfaces from BPI and replaces them with probability versions. With this patch, we won't use edge weight anymore in either IR or MC passes. Edge probability is a better representation in terms of CFG update and validation. Differential revision: http://reviews.llvm.org/D15519 llvm-svn: 256263
*	Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics.	Manuel Jacob	2015-12-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: These were deprecated 11 months ago when a generic llvm.experimental.gc.result intrinsic, which works for all types, was added. Reviewers: sanjoy, reames Subscribers: sanjoy, chenli, llvm-commits Differential Revision: http://reviews.llvm.org/D15719 llvm-svn: 256262
*	[RS4GC] Fix crash in the case that a live variable has a constant base.	Manuel Jacob	2015-12-22	2	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously, RS4GC crashed in CreateGCRelocates() because it assumed that every base is also in the array of live variables, which isn't true if a live variable has a constant base. This change fixes the crash by making sure CreateGCRelocates() won't try to relocate a live variable with a constant base. This would be unnecessary anyway because anything with a constant base won't move. Reviewers: reames Subscribers: llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15556 llvm-svn: 256252
*	[AArch64] Promote loads from stored	Jun Bum Lim	2015-12-22	3	-5/+671
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a recommit of r256004 which was reverted in r256160. The issue was the incorrect promotion for half and byte loads transformed into mov instructions. This fix will replace half and byte type loads only with bit field extracts. Original commit message: This change promotes load instructions which directly read from stored by replacing them with mov instructions. If the store is wider than the load, the load will be replaced with a bitfield extract. For example : STRWui %W1, %X0, 1 %W0 = LDRHHui %X0, 3 becomes STRWui %W1, %X0, 1 %W0 = UBFMWri %W1, 16, 31 llvm-svn: 256249
*	[X86][AVX512] Add rcp14 and rsqrt14 intrinsics	Asaf Badouh	2015-12-22	1	-0/+181
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D15414 llvm-svn: 256237
*	[ASMPrinter] Fix missing handling of DW_OP_bit_piece	Keno Fischer	2015-12-22	1	-0/+64
\| \| \| \| \| \| \| \|	In r256077, I added printing for DIExpressions in DEBUG_VALUE comments, but neglected to handle DW_OP_bit_piece operands. Thanks to Mikael Holmen and Joerg Sonnenberger for spotting this. llvm-svn: 256236
*	[MC] Don't use the architecture to govern which object file format to use	David Majnemer	2015-12-22	3	-11/+11
\| \| \| \| \| \| \| \| \| \| \|	InitMCObjectFileInfo was trying to override the triple in awkward ways. For example, a triple specifying COFF but not Windows was forced as ELF. This makes it easy for internal invariants to get violated, such as those which triggered PR25912. This fixes PR25912. llvm-svn: 256226
*	Partial fix for PR25912, see comment 13. Should fix the sanitizer bootstrap bot	Kostya Serebryany	2015-12-22	1	-1/+1
\| \| \| \|	llvm-svn: 256225
*	Handle empty Subprogram list when linking metadata.	Teresa Johnson	2015-12-22	1	-0/+17
\| \| \| \| \| \| \| \|	Use an iterator that handles an empty subprogram list. Fixes PR25915. llvm-svn: 256224
*	Determine callee's hotness and adjust threshold based on that. NFC.	Easwaran Raman	2015-12-22	2	-0/+78
\| \| \| \| \| \| \| \| \| \|	This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold callees and uses values of inlinehint-threshold and inlinecold-threshold respectively as the thresholds for such callees. Differential Revision: http://reviews.llvm.org/D15245 llvm-svn: 256222
*	[safestack] Add option for non-TLS unsafe stack pointer.	Evgeniy Stepanov	2015-12-22	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds an option, -safe-stack-no-tls, for using normal storage instead of thread-local storage for the unsafe stack pointer. This can be useful when SafeStack is applied to an operating system kernel. http://reviews.llvm.org/D15673 Patch by Michael LeMay. llvm-svn: 256221
*	[PGO] Fix another comdat related issue for COFF	Xinliang David Li	2015-12-22	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	The linker requires that a comdat section must be associated with another comdat section that precedes it. This means the comdat section's name needs to use the profile name var's name. Patch tested by Johan Engelen. llvm-svn: 256220
*	Fix test case comment (NFC)	Xinliang David Li	2015-12-21	1	-2/+2
\| \| \| \|	llvm-svn: 256206
*	[cfi] Fix LowerBitSets on 32-bit targets.	Evgeniy Stepanov	2015-12-21	1	-0/+21
\| \| \| \| \| \| \|	This code attempts to truncate IntPtrTy to i32, which may be the same type. llvm-svn: 256205
*	[MC, COFF] Support link /incremental conditionally	David Majnemer	2015-12-21	3	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Today, we always take into account the possibility that object files produced by MC may be consumed by an incremental linker. This results in us initializing fields which vary with time (TimeDateStamp), which harms hermetic builds (e.g. verifying a self-host went well) and produces sub-optimal code because we cannot assume anything about the relative position of functions within a section (call sites can get redirected through incremental linker thunks). Let's provide an MCTargetOption which controls this behavior so that we can disable this functionality if we know a priori that the build will not rely on /incremental. llvm-svn: 256203
*	Enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst	Jun Bum Lim	2015-12-21	4	-4/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a recommit of r256028 with minor fixes in unittests: CodeGen/Mips/eh.ll CodeGen/Mips/insn-zero-size-bb.ll Original commit message: When identifying blocks post-dominated by an unreachable-terminated block in BranchProbabilityInfo, consider only the edge to the normal destination block if the terminator is InvokeInst and let calcInvokeHeuristics() decide edge weights for the InvokeInst. llvm-svn: 256202
*	Resubmit r256193 with test fix: assertion failure analyzed	Xinliang David Li	2015-12-21	1	-0/+7
\| \| \| \|	llvm-svn: 256201
*	Revert r256193: build bot failure triggered	Xinliang David Li	2015-12-21	1	-8/+0
\| \| \| \|	llvm-svn: 256198
*	[X86][SSE] Transform truncations between vectors of integers into ↵	Cong Hou	2015-12-21	2	-154/+193
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	X86ISD::PACKUS/PACKSS operations during DAG combine. This patch transforms truncation between vectors of integers into X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in lowering phase because after type legalization, the original truncation will be turned into a BUILD_VECTOR with each element that is extracted from a vector and then truncated, and from them it is difficult to do this optimization. This greatly improves the performance of truncations on some specific types. Cost table is updated accordingly. Differential revision: http://reviews.llvm.org/D14588 llvm-svn: 256194
*	[PGO] Fix profile var comdat generation problem with COFF	Xinliang David Li	2015-12-21	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \|	When targeting COFF, a comdat section is required to have a global object with the same name as the comdat (except for comdats whose selection kind is associative). This fix makes sure that the comdat is keyed on the data variable for COFF. Also improved test coverage for this. llvm-svn: 256193
*	[ValueTracking] Properly handle non-sized types in isAligned function.	Michael Zolotukhin	2015-12-21	1	-0/+20
\| \| \| \| \| \| \| \| \| \|	Reviewers: apilipenko, reames, sanjoy, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15597 llvm-svn: 256192
*	Fix PR24563 (LiveDebugVariables unconditionally propagates all DBG_VALUEs)	Adrian Prantl	2015-12-21	2	-2/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LiveDebugVariables unconditionally propagates all DBG_VALUEs down the dominator tree, which happens to work fine if there already is another DBG_VALUE or the DBG_VALUE happens to describe a single-assignment vreg, but is otherwise wrong if the DBG_VALUE is coming from only one of the predecessors. In r255759 we introduced a proper data flow analysis scheduled after LiveDebugVariables that correctly propagates DBG_VALUEs across basic block boundaries. With the new pass in place, the incorrect propagation in LiveDebugVariables can be retired without losing any of the benefits where LiveDebugVariables happened to do the right thing. llvm-svn: 256188
*	Convert the CodeGen/ARM/sched-it-debug-nodes.ll testcase from IR -> MIR.	Adrian Prantl	2015-12-21	2	-88/+160
\| \| \| \| \| \| \|	NFC PR24563 llvm-svn: 256187