bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] Add intrinsics for reading and writing to the flags register	David Majnemer	2016-01-01	8	-30/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	LLVM's targets need to know if stack pointer adjustments occur after the prologue. This is needed to correctly determine if the red-zone is appropriate to use or if a frame pointer is required. Normally, LLVM can figure this out very precisely by reasoning about the contents of the MachineFunction. There is an interesting corner case: inline assembly. The vast majority of inline assembly that performs a push or pop does so to pair up with pushf or popf as appropriate. Unfortunately, this inline assembly doesn't mark the stack pointer as clobbered because, well, it isn't. The stack pointer is decremented and then immediately incremented. Because of this, LLVM was changed in r256456 to conservatively assume that inline assembly contains a sequence of stack operations. This is unfortunate because the vast majority of inline assembly will not end up manipulating the stack pointer in any way at all. Instead, let's provide a more principled solution: an intrinsic. FWIW, other compilers (MSVC and GCC among them) also provide this functionality as an intrinsic. llvm-svn: 256685
*	[LibCallSimplifier] propagate FMF when shrinking binary calls	Sanjay Patel	2015-12-31	1	-0/+16
\| \| \| \|	llvm-svn: 256682
*	[LibCallSimplifier] propagate FMF when shrinking unary calls	Sanjay Patel	2015-12-31	1	-19/+17
\| \| \| \|	llvm-svn: 256679
*	change function names to avoid accidentally matching the substring	Sanjay Patel	2015-12-31	1	-40/+53
\| \| \| \|	llvm-svn: 256678
*	add 'fast' attribute to calls to show that the flag isn't being propagated	Sanjay Patel	2015-12-31	1	-55/+57
\| \| \| \|	llvm-svn: 256677
*	[AVX512] add PSRLQ and PSRLD Intrinsic	Michael Zuckerman	2015-12-31	2	-0/+227
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D15770 llvm-svn: 256673
*	[X86] Avoid folding scalar loads into unary sse intrinsics	Michael Kuperstein	2015-12-31	1	-10/+84
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Not folding these cases tends to avoid partial register updates: sqrtss (%eax), %xmm0 Has a partial update of %xmm0, while movss (%eax), %xmm0 sqrtss %xmm0, %xmm0 Has a clobber of the high lanes immediately before the partial update, avoiding a potential stall. Given this, we only want to fold when optimizing for size. This is consistent with the patterns we already have for some of the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl() Differential Revision: http://reviews.llvm.org/D15741 llvm-svn: 256671
*	[X86][PKU] Add {RD,WR}PKRU intrinsics	Asaf Badouh	2015-12-31	1	-0/+25
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D15808 llvm-svn: 256670
*	[ValueTracking] fix bug computing isKnownToBeAPowerOfTwo() with arithmetic shift right (PR25900)	Sanjay Patel	2015-12-30	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a fix for: https://llvm.org/bugs/show_bug.cgi?id=25900 If we think that an arithmetic right shift of a power of two is always a power of two, an sdiv gets wrongly converted to udiv. Differential Revision: http://reviews.llvm.org/D15827 llvm-svn: 256655
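The faulty assumption described above can be illustrated with a minimal sketch that models 32-bit values in Python. The helper names are hypothetical and are not LLVM APIs; the example only assumes that "power of two" means "exactly one bit set". The value 0x80000000 has a single bit set, but an arithmetic shift right copies the sign bit, so the result has two bits set and signed and unsigned division by it disagree.

```python
BITS = 32
MASK = (1 << BITS) - 1


def is_power_of_two(x):
    """True if the 32-bit pattern x has exactly one bit set."""
    x &= MASK
    return x != 0 and (x & (x - 1)) == 0


def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def ashr(x, n):
    """Arithmetic shift right: the sign bit is replicated."""
    return (to_signed(x) >> n) & MASK


def lshr(x, n):
    """Logical shift right: zeros are shifted in."""
    return (x & MASK) >> n


def sdiv(a, b):
    """Signed division that truncates toward zero."""
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)
    return (-q if (sa < 0) != (sb < 0) else q) & MASK


def udiv(a, b):
    """Unsigned division on the raw bit patterns."""
    return (a & MASK) // (b & MASK)


sign_bit = 0x80000000          # exactly one bit set
shifted = ashr(sign_bit, 1)    # sign bit is copied into bit 30

print(hex(shifted), is_power_of_two(shifted))
print(sdiv(sign_bit, shifted), udiv(sign_bit, shifted))
```

A logical shift right of the same value stays a power of two, which is why only the arithmetic shift needs the extra care.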
*	[JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost()	Geoff Berry	2015-12-29	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \|	The code that was meant to adjust the duplication cost based on the terminator opcode was not being executed in cases where the initial threshold was hit inside the loop. Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D15536 llvm-svn: 256568
*	[AVX512] add PSRLW Intrinsic	Michael Zuckerman	2015-12-29	2	-0/+122
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D15751 llvm-svn: 256558
*	Fix gold test after r256465.	James Y Knight	2015-12-29	1	-1/+1
\| \| \| \| \| \| \|	That commit added a new pass, and this test is sensitive to what the first pass after verify is called. llvm-svn: 256532
*	Accept dwarf version 5 for CIE versions.	Eric Christopher	2015-12-28	1	-0/+1
\| \| \| \|	llvm-svn: 256527
*	[Thumb] Fix assembler error 'cannot honor width suffix pop {lr}'	Artyom Skrobov	2015-12-28	1	-4/+106
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: * avoid generating POP {LR} in Thumb1 epilogues * combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass * combine POP {LR} + B + BX LR -> POP {PC} on v5T+ Test cases by Ana Pazos Differential Revision: http://reviews.llvm.org/D15707 llvm-svn: 256523
*	[x86] lower calls to fmin and llvm.minnum.* using minss/minsd/minps/minpd (PR24475)	Sanjay Patel	2015-12-28	1	-26/+149
\| \| \| \| \| \| \| \| \| \| \|	This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 http://reviews.llvm.org/rL256510 llvm-svn: 256522
*	[RS4GC] Fix rematerialization of bitcast of bitcast.	Manuel Jacob	2015-12-28	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Previously, only the outer (last) bitcast was rematerialized, resulting in a use of the unrelocated inner (first) bitcast after the statepoint. See the test case for an example. Reviewers: igor-laevsky, reames Subscribers: reames, alex, llvm-commits, sanjoy Differential Revision: http://reviews.llvm.org/D15789 llvm-svn: 256520
*	Implemented cost model for masked gather and scatter operations	Elena Demikhovsky	2015-12-28	1	-1/+214
\| \| \| \| \| \| \| \| \|	The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 llvm-svn: 256519
*	[x86] lower calls to fmax and llvm.maxnum.* using maxps/maxpd (PR24475)	Sanjay Patel	2015-12-28	1	-24/+93
\| \| \| \| \| \| \| \|	This is a follow-on to: http://reviews.llvm.org/rL255700 http://reviews.llvm.org/rL256454 llvm-svn: 256510
*	Specify triple so 'make check' passes on darwin x86-64	Sanjay Patel	2015-12-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The check lines were added with: http://reviews.llvm.org/rL256458 http://reviews.llvm.org/rL256460 but on a darwin target, the output looks like: ## InlineAsm Start rorq %rdi ## InlineAsm End ## InlineAsm Start rorq %rsi ## InlineAsm End leaq (%rsi,%rdi), %rax retq llvm-svn: 256507
*	Support clrex instruction on ARMv6k. Patch by Andrew Turner.	Roman Divacky	2015-12-28	1	-5/+18
\| \| \| \|	llvm-svn: 256505
*	[X86] Better support for the MCU psABI (LLVM part)	Michael Kuperstein	2015-12-28	1	-1/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for the MCU psABI in a way different from r251223 and r251224, basically reverting most of these two patches. The problem with the approach taken in r251223/4 is that it only handled libcalls that originated from the backend. However, the mid-end also inserts quite a few libcalls and assumes these use the platform's default calling convention. The previous patch tried to insert inregs when necessary both in the FE and, somewhat hackily, in the CG. Instead, we now define a new default calling convention for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does. Differential Revision: http://reviews.llvm.org/D15054 llvm-svn: 256494
*	[X86][AVX512] Lower broadcast sub vector to vector intrinsics	Asaf Badouh	2015-12-28	4	-0/+227
\| \| \| \| \| \| \| \| \| \| \| \| \|	Lower broadcast<type>x<vector> to shuffles. There are two cases: 1. src is 128 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 0. 2. src is 256 bits and dest is 512 bits: in this case we lower it to a shuffle with imm = 01000100b (0x44), which broadcasts the 256-bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3]. The result is then masked with the passthru value (if it is a masked op). Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256490
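The two immediates above can be checked with a small model. This is a sketch under assumptions, not the lowering code: it treats a 512-bit vector as four 128-bit lanes and assumes the immediate is laid out as in the AVX-512 VSHUFF64X2-style lane shuffles (four 2-bit selectors, the low two picking lanes from the first source and the high two from the second). The commit message does not name the instruction, and the function names are hypothetical.

```python
def shuffle_128bit_lanes(src1, src2, imm):
    """Pick four 128-bit lanes using four 2-bit selectors packed in imm."""
    sel = [(imm >> (2 * i)) & 0b11 for i in range(4)]
    # Low two result lanes come from src1, high two from src2.
    return [src1[sel[0]], src1[sel[1]], src2[sel[2]], src2[sel[3]]]


def broadcast_128(lane):
    """Case 1: 128-bit source, imm = 0 selects lane 0 everywhere."""
    wide = [lane, None, None, None]   # upper lanes are undefined
    return shuffle_128bit_lanes(wide, wide, 0x00)


def broadcast_256(lanes):
    """Case 2: 256-bit source, imm = 0x44 selects lanes 0,1,0,1."""
    wide = [lanes[0], lanes[1], None, None]
    return shuffle_128bit_lanes(wide, wide, 0x44)


def flatten(lanes):
    """Turn a list of lanes back into a flat list of elements."""
    return [elem for lane in lanes for elem in lane]


# A ymm register with 64-bit elements 0..3 is two lanes of two elements.
ymm = [(0, 1), (2, 3)]
print(flatten(broadcast_256(ymm)))
```

With 64-bit elements this reproduces the ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] pattern from the message.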
*	[X86][AVX512] add fp scalar broadcast intrinsics	Asaf Badouh	2015-12-28	2	-14/+81
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D15790 llvm-svn: 256489
*	[AVX512] Bring vmovq instruction names into alignment with the AVX and SSE names.	Craig Topper	2015-12-28	1	-12/+12
\| \| \| \| \| \| \| \|	Add a missing encoding to disassembler and assembler. I believe this also fixes a case where a 64-bit memory form that is documented as being unsupported in 32-bit mode was able to be selected there. llvm-svn: 256483
*	AVX512: Change VPMOVB2M DAG lowering, use CVT2MASK node instead of TRUNCATE.	Igor Breger	2015-12-27	13	-860/+3962
\| \| \| \| \| \| \| \| \|	Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB. Implement VPMOVB/W/D/Q2M intrinsic. Differential Revision: http://reviews.llvm.org/D15675 llvm-svn: 256470
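The distinction between the two operations can be sketched as follows. This is an illustration only, with hypothetical helper names: it assumes that truncating an integer element to i1 keeps its least significant bit, while a VPMOVB2M-style conversion builds the mask from the most significant bit of each byte.

```python
def truncate_to_i1(elems):
    """Truncation to i1 keeps the least significant bit of each element."""
    return [e & 1 for e in elems]


def msb_to_mask(elems, bits=8):
    """Mask conversion takes the most significant bit of each element."""
    return [(e >> (bits - 1)) & 1 for e in elems]


v = [0x01, 0x80, 0xFF, 0x00]
print(truncate_to_i1(v))
print(msb_to_mask(v))
```

The two results differ for 0x01 and 0x80, so one node cannot stand in for the other.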
*	[attrs] Extract the pure inference of function attributes into	Chandler Carruth	2015-12-27	3	-24/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a standalone pass. There is no call graph or even interesting analysis for this part of function attributes -- it is literally inferring attributes based on the target library identification. As such, we can do it using a much simpler module pass that just walks the declarations. This can also happen much earlier in the pass pipeline which has benefits for any number of other passes. In the process, I've cleaned up one particular aspect of the logic which was necessary in order to separate the two passes cleanly. It now counts inferred attributes independently rather than just counting all the inferred attributes as one, and the counts are more clearly explained. The two test cases we had for this code path are both ... woefully inadequate and copies of each other. I've kept the superset test and updated it. We need more testing here, but I had to pick somewhere to stop fixing everything broken I saw here. Differential Revision: http://reviews.llvm.org/D15676 llvm-svn: 256466
*	[attrs] Split off the forced attributes utility into its own pass that	Chandler Carruth	2015-12-27	2	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	is (by default) run much earlier than FunctionAttrs proper. This allows forcing optnone or other widely impactful attributes. It is also a bit simpler as the force attribute behavior needs no specific iteration order. I've added the pass into the default module pass pipeline and LTO pass pipeline which mirrors where function attrs itself was being run. Differential Revision: http://reviews.llvm.org/D15668 llvm-svn: 256465
*	Make the test properly constrained	David Majnemer	2015-12-27	1	-1/+1
\| \| \| \|	llvm-svn: 256460
*	Try to pacify buildbot	David Majnemer	2015-12-27	1	-1/+10
\| \| \| \|	llvm-svn: 256458
*	Prune the feature "tls". No one is using it since TLS is enabled for Cygwin.	NAKAMURA Takumi	2015-12-27	1	-4/+0
\| \| \| \|	llvm-svn: 256457
*	[X86, Win64] Use a frame pointer if pushf is emitted	David Majnemer	2015-12-27	6	-15/+96
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A frame pointer must be used if the stack pointer is modified after the prologue. LLVM will emit pushf/popf if we need to save/restore the FLAGS register, requiring us to have a frame pointer for the function. There is a small twist: this sequence might exist in user code via inline-assembly. For now, conservatively assume that such functions require a frame pointer. For real world justification, please see clang's implementation of __readeflags. This fixes PR25945. llvm-svn: 256456
*	[WinEH] Add comments explaining the EH tables	David Majnemer	2015-12-27	3	-16/+16
\| \| \| \| \| \| \|	This aids in debugging WinEH; similar functionality is present for DWARF EH. llvm-svn: 256455
*	[x86] lower calls to llvm.maxnum.v4f32 using maxps	Sanjay Patel	2015-12-26	1	-131/+16
\| \| \| \| \| \| \|	This is a follow-on to: http://reviews.llvm.org/rL255700 llvm-svn: 256454
*	Fix safepoint intrinsic signatures in test.	Benjamin Kramer	2015-12-26	2	-4/+4
\| \| \| \| \| \|	Should bring back the bots after r256443. llvm-svn: 256450
*	[gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type	Chen Li	2015-12-26	58	-440/+433
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types prevents LLVM from merging different gc.statepoint nodes into PHI nodes, which would cause further problems with gc relocations. The patch also changes how gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob Subscribers: reames, mjacob, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D15662 llvm-svn: 256443
*	Add test case for r256433. "[X86] Fix shuffle decoding for variable VPERMIL to be tolerant of the Constant type not matching due to folding in the constant pool and to get VPERMILPD correct."	Craig Topper	2015-12-26	1	-1/+9
\| \| \| \| \| \|	llvm-svn: 256435
*	Revert r256432 "Test"	Craig Topper	2015-12-26	1	-9/+1
\| \| \| \| \| \|	This is the test case for r256433, but it got committed incorrectly in my local repo. llvm-svn: 256434
*	Test	Craig Topper	2015-12-26	1	-1/+9
\| \| \| \|	llvm-svn: 256432
*	[WebAssembly] Fix handling of COPY instructions in WebAssemblyRegStackify.	Dan Gohman	2015-12-25	2	-9/+47
\| \| \| \| \| \| \| \| \| \| \| \| \|	Move RegStackify after coalescing and teach it to use LiveIntervals instead of depending on SSA form. This avoids a problem where a register in a COPY instruction is stackified and then subsequently coalesced with a register that is not stackified. This also puts it after the scheduler, which allows us to simplify the EXPR_STACK constraint, as we no longer have instructions being reordered after stackification and before coloring. llvm-svn: 256402
*	[InstCombine] transform more extract/insert pairs into shuffles (PR2109)	Sanjay Patel	2015-12-24	1	-16/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is an extension of the shuffle combining from r203229: http://reviews.llvm.org/rL203229 The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in. The motivation is to finally solve PR2109: https://llvm.org/bugs/show_bug.cgi?id=2109 For that example, the IR becomes: %1 = bitcast <2 x i32>* %P to <2 x float>* %ld1 = load <2 x float>, <2 x float>* %1, align 8 %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef> %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5> ret <4 x float> %i2 And x86 SSE output improves from: movq (%rdi), %xmm1 ## xmm1 = mem[0],zero movdqa %xmm1, %xmm2 shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3] shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0] shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2] shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0] shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0] retq To the almost optimal: movhpd (%rdi), %xmm0 Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples. Differential Revision: http://reviews.llvm.org/D15096 llvm-svn: 256394
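The widening step described above can be checked with a small model of `shufflevector`. This is a sketch with hypothetical helper names and made-up element values; the extract/insert chain shown for comparison is inferred from the shuffle mask `<0, 1, 4, 5>` in the message rather than quoted from the original test case. Undef elements are modelled as `None`.

```python
UNDEF = None


def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask indexes into v1 followed by v2."""
    if v2 is UNDEF:
        v2 = [UNDEF] * len(v1)
    pool = list(v1) + list(v2)
    return [UNDEF if i is UNDEF else pool[i] for i in mask]


def insertelement(vec, value, index):
    """Model of insertelement: returns a copy with one lane replaced."""
    out = list(vec)
    out[index] = value
    return out


A = [10.0, 11.0, 12.0, 13.0]   # the <4 x float> %A
ld1 = [1.0, 2.0]               # the loaded <2 x float> %ld1

# Scalar form: extract both loaded elements, insert them into lanes 2 and 3.
via_inserts = insertelement(insertelement(A, ld1[0], 2), ld1[1], 3)

# Shuffle form: widen the short vector with undef lanes, then blend with A.
widened = shufflevector(ld1, UNDEF, [0, 1, UNDEF, UNDEF])
via_shuffles = shufflevector(A, widened, [0, 1, 4, 5])

print(via_inserts, via_shuffles)
```

Both forms produce the same four lanes, and the undef lanes introduced by widening are never selected by the second mask.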
*	[X86][PKU] Add {RD,WR}PKRU encoding	Asaf Badouh	2015-12-24	1	-0/+8
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D15711 llvm-svn: 256366
*	AVX-512: Kreg set 0/1 optimization	Elena Demikhovsky	2015-12-24	3	-60/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patterns that set a mask register to 0/1 KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365
*	AVX512: VPMOVM2B/W/D/Q intrinsic implementation.	Igor Breger	2015-12-24	4	-35/+193
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org//D15747 llvm-svn: 256364
*	AMDGPU/SI: Fix encoding of flat instructions on VI	Tom Stellard	2015-12-24	1	-209/+307
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15735 llvm-svn: 256360
*	WebAssembly: remove 'external' from test	JF Bastien	2015-12-23	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Linker testing was sad at seeing an unresolved external symbol. For now don't do that: it's valid but we're not playing with multi-file linking yet, and the LLVM tests are used as hacky sanity tests for single-file linking (the GCC torture tests are much better for this purpose). Another solution would be to use '.extern' to make the intent explicit (don't single-file link this, there's an unresolved symbol); some assemblers use '.extern' while others ignore it, so we wouldn't really be inventing anything new. Reviewers: sunfish, kripken Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D15753 llvm-svn: 256353
*	[Statepoints] Use Indirect operands for spill slots	Philip Reames	2015-12-23	1	-33/+31
\| \| \| \| \| \| \| \| \| \|	Teach the statepoint lowering code to emit Indirect stackmap entries for spills inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot. To implement this, I introduced a new flag on the StackObject class used to maintain information about stack slots. I originally considered (and prototyped in http://reviews.llvm.org/D15632) the idea of using the existing isSpillSlot flag, but ended up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from (deopt, gc, regalloc). Differential Revision: http://reviews.llvm.org/D15759 llvm-svn: 256352
*	llvm-dwarfdump: Add support for dumping .dSYM bundles.	Adrian Prantl	2015-12-23	1	-0/+6
\| \| \| \| \| \| \| \|	This replicates the logic of Darwin dwarfdump for manually opening up .dSYM bundles without introducing any new dependencies. <rdar://problem/20491670> llvm-svn: 256350
*	[X86][AVX] Only shuffle the lower half of vectors if the upper half is undefined	Simon Pilgrim	2015-12-23	6	-71/+179
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151. As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops. Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well. Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions). Differential Revision: http://reviews.llvm.org/D15477 llvm-svn: 256332
*	[OperandBundles] Have GlobalsModRef play nice with operand bundles	David Majnemer	2015-12-23	1	-1/+11
\| \| \| \| \| \| \|	A call site's use of a Value might not correspond to an argument operand but to a bundle operand. llvm-svn: 256329
*	[OperandBundles] Have TailCallElim play nice with operand bundles	David Majnemer	2015-12-23	1	-0/+10
\| \| \| \| \| \| \| \| \|	A call site's use of a Value might not correspond to an argument operand but to a bundle operand. This fixes PR25928. llvm-svn: 256328