bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[WinEH] Let cleanups post-dominated by unreachable get executed	David Majnemer	2016-01-22	4	-4/+76
	Cleanups in C++ are a little weird. They are guaranteed to be reliably executed if, and only if, there is a viable catch handler which can handle the exception. This means that reachability of a cleanup is lexically determined by it being nested within a try-block which unwinds to a catch. It cannot be reasoned about by examining the control flow edges leaving a cleanup. Usually this is not a problem. It becomes a problem when there are no edges out of a cleanup because we believed that code post-dominated by the cleanup is dead. In LLVM's case, this code is what informs the personality routine about the presence of a suitable catch handler. However, the lack of edges to that catch handler makes the handler become unreachable, which causes us to remove it. By removing the handler, the cleanup becomes unreachable. Instead, inject a catch-all handler with every cleanup that has no unwind edges. This will allow us to properly unwind the stack. This fixes PR25997. llvm-svn: 258580
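The source-level behavior described above can be illustrated with a minimal sketch. The `Guard` type and both function names are hypothetical, introduced only for this example. It shows the guaranteed half of the rule: when a matching handler exists, the destructor (the cleanup) runs during unwinding before the handler is entered. Without any handler, `std::terminate` is called and the C++ standard leaves it implementation-defined whether the stack is unwound at all.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: records in a flag that its destructor (the cleanup) ran.
struct Guard {
    bool &ran;
    explicit Guard(bool &flag) : ran(flag) {}
    ~Guard() { ran = true; }
};

// Throws while a Guard is live, so the destructor sits on the unwind path.
inline void throw_with_cleanup(bool &ran) {
    Guard g(ran);
    throw std::runtime_error("boom");
}

// Returns true if the cleanup had already run when the handler was entered.
inline bool cleanup_runs_when_caught() {
    bool ran = false;
    try {
        throw_with_cleanup(ran);
    } catch (const std::runtime_error &) {
        return ran;
    }
    return false;  // not reached: throw_with_cleanup always throws
}
```

Calling `cleanup_runs_when_caught()` returns `true` because a viable handler exists; this is the guarantee the commit preserves by injecting a catch-all handler.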
*	fixed to test features, not CPU models	Sanjay Patel	2016-01-22	1	-73/+73
	llvm-svn: 258568
*	AMDGPU: Add new name for barrier intrinsic	Matt Arsenault	2016-01-22	1	-0/+28
	llvm-svn: 258558
*	AMDGPU: Rename intrinsics to use amdgcn prefix	Matt Arsenault	2016-01-22	19	-212/+308
	The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatibility for now. llvm-svn: 258557
*	[AArch64] Cleanup ccmp test check labels. NFC.	Ahmed Bougacha	2016-01-22	1	-10/+10
	llvm-svn: 258541
*	AMDGPU: Fix crash with invariant markers	Matt Arsenault	2016-01-22	1	-0/+25
	The promote alloca pass didn't handle these intrinsics and crashed. These intrinsics should accept any address space, but for now just erase them to avoid breaking. llvm-svn: 258537
*	[NVPTX] expand mul_lohi to mul_lo and mul_hi	Jingyue Wu	2016-01-22	1	-0/+24
	Summary: Fixes PR26186. Reviewers: grosser, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D16479 llvm-svn: 258536
*	[AArch64] Lower 2-CC FCCMPs (one/ueq) using AND'ed CCs.	Ahmed Bougacha	2016-01-22	1	-18/+160
	The current behavior is incorrect, as the two CCs returned by changeFPCCToAArch64CC, intended to be OR'ed, are instead used in an AND ccmp chain.

	Consider:
	  define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
	    %cc1 = fcmp one float %a, %b
	    %cc2 = fcmp olt float %c, %d
	    %and = and i1 %cc1, %cc2
	    %r = select i1 %and, i32 %e, i32 %f
	    ret i32 %r
	  }

	Assuming (%a < %b) and (%c < %d); we used to do:
	  fcmp  s0, s1            # nzcv <- 1000
	  orr   w8, wzr, #0x1     # w8 <- 1
	  csel  w9, w8, wzr, mi   # w9 <- 1
	  csel  w8, w8, w9, gt    # w8 <- 1
	  fcmp  s2, s3            # nzcv <- 1000
	  cset  w9, mi            # w9 <- 1
	  tst   w8, w9            # (w8 & w9) == 1, so: nzcv <- 0000
	  csel  w0, w0, w1, ne    # w0 <- w0

	We now do:
	  fcmp  s2, s3            # nzcv <- 1000
	  fccmp s0, s1, #0, mi    # mi, so: nzcv <- 1000
	  fccmp s0, s1, #8, le    # !le, so: nzcv <- 1000
	  csel  w0, w0, w1, pl    # !pl, so: w0 <- w1

	In other words, we transformed:
	  (c < d) && ((a < b) || (a > b))
	into:
	  (c < d) && (a u>= b) && (a u<= b)
	whereas, per De Morgan's, we wanted:
	  (c < d) && !((a u>= b) && (a u<= b))

	Note that this problem doesn't occur in the test-suite.

	changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt. We can't represent that in the fccmp chain; it can't express arbitrary OR sequences, as one comment explains:
	  In general we can create code for arbitrary "... (and (and A B) C)" sequences. We can also implement some "or" expressions, because "(or A B)" is equivalent to "not (and (not A) (not B))" and we can implement some negation operations. [...] However there is no way to negate the result of a partial sequence.

	Instead, introduce changeFPCCToANDAArch64CC, which produces the conjunct cond codes:
	- (a one b)
	    == ((a olt b) || (a ogt b))
	    == ((a ord b) && (a une b))
	- (a ueq b)
	    == ((a uno b) || (a oeq b))
	    == ((a ule b) && (a uge b))

	Note that, at first, one might think that, when PushNegate is true, we should use the disjunct CCs, in effect doing:
	  (a || b)
	  = !(!a && !(b))
	  = !(!a && !(b1 || b2))  <- changeFPCCToAArch64CC(b, b1, b2)
	  = !(!a && !b1 && !b2)

	However, we can take advantage of the fact that the CC is already negated, which lets us avoid special-casing PushNegate and doing the simpler to reason about:
	  (a || b)
	  = !(!a && (!b))
	  = !(!a && (b1 && b2))   <- changeFPCCToANDAArch64CC(!b, b1, b2)
	  = !(!a && b1 && b2)

	This makes both emitConditionalCompare cases behave identically, and produces correct ccmp sequences for the 2-CC fcmps.

	llvm-svn: 258533
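The predicate identities in this message can be checked exhaustively over ordered and unordered inputs. The following is only a sketch: the helper functions model the LLVM `fcmp` predicates on plain `float` values and are not LLVM APIs, and `identities_hold` is a hypothetical name. It assumes IEEE comparison semantics (no fast-math).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Models of the fcmp predicates on plain floats (illustrative, not LLVM code).
inline bool uno(float a, float b) { return std::isunordered(a, b); }
inline bool ord(float a, float b) { return !uno(a, b); }
inline bool oeq(float a, float b) { return a == b; }
inline bool une(float a, float b) { return a != b; }    // true when unordered
inline bool olt(float a, float b) { return a < b; }
inline bool ogt(float a, float b) { return a > b; }
inline bool ule(float a, float b) { return !(a > b); }  // true when unordered
inline bool uge(float a, float b) { return !(a < b); }  // true when unordered
inline bool one(float a, float b) { return ord(a, b) && a != b; }
inline bool ueq(float a, float b) { return uno(a, b) || a == b; }

// Checks the disjunct and conjunct forms for every pair of sample values.
inline bool identities_hold() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float vals[] = {-1.0f, 0.0f, 1.0f, nan};
    for (float a : vals) {
        for (float b : vals) {
            if (one(a, b) != (olt(a, b) || ogt(a, b))) return false;
            if (one(a, b) != (ord(a, b) && une(a, b))) return false;
            if (ueq(a, b) != (uno(a, b) || oeq(a, b))) return false;
            if (ueq(a, b) != (ule(a, b) && uge(a, b))) return false;
            // The miscompile: (a u>= b) && (a u<= b) is ueq, the negation of one.
            if ((uge(a, b) && ule(a, b)) != !one(a, b)) return false;
        }
    }
    return true;
}
```

The last check inside the loop is the De Morgan point made in the message: ANDing the two negated disjunct codes yields the negation of `one`, so the result still has to be negated, which a partial ccmp sequence cannot express.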
*	[Hexagon] Use general purpose registers to spill pred/mod registers into	Krzysztof Parzyszek	2016-01-22	1	-0/+42
	Patch by Tobias Edler Von Koch. llvm-svn: 258527
*	AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix	Matt Arsenault	2016-01-22	2	-9/+9
	These aren't emitted directly by mesa; they are inserted by a pass. llvm-svn: 258523
*	AMDGPU: Remove AMDGPU.fract intrinsic	Matt Arsenault	2016-01-22	4	-71/+87
	Mesa doesn't use this, and this is pattern matched already from fsub x, (ffloor x). llvm-svn: 258513
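As a small illustration of the pattern named above, the fractional part can be written as a subtraction of the floor. The function name `fract_via_floor` is hypothetical and this is plain C++, not the backend's pattern-matching code.

```cpp
#include <cassert>
#include <cmath>

// fract(x) expressed as fsub x, (ffloor x).
inline float fract_via_floor(float x) {
    return x - std::floor(x);
}
```

For example, `fract_via_floor(1.25f)` is `0.25f` and `fract_via_floor(-0.25f)` is `0.75f`, since the floor of a negative value rounds toward negative infinity.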
*	[SelectionDAG] Fold more offsets into GlobalAddresses	Dan Gohman	2016-01-22	5	-11/+725
	This reapplies r258296 and r258366, and also fixes an existing bug in SelectionDAG.cpp's isMemSrcFromString, which neglected to account for the offset in a GlobalAddressSDNode; the bug was uncovered by those patches. llvm-svn: 258482
*	Do not lower VSETCC if operand is an f16 vector	Pirama Arumuga Nainar	2016-01-22	2	-0/+358
	Summary: SETCC with f16 vectors has OperationAction set to Expand but still gets lowered to FCM* intrinsics based on its result type. This patch skips lowering of VSETCC if the operand is an f16 vector. v4 and v8 tests included. Reviewers: ab, jmolloy Subscribers: srhines, llvm-commits Differential Revision: http://reviews.llvm.org/D15361 llvm-svn: 258471
*	Revert "[SelectionDAG] Fold more offsets into GlobalAddresses"	Reid Kleckner	2016-01-22	4	-701/+11
	This reverts r258296 and the follow up r258366. With this change, we miscompiled the following program on Windows:
	  #include <string>
	  #include <iostream>
	  static const char kData[] = "asdf jkl;";
	  int main() {
	    std::string s(kData + 3, sizeof(kData) - 3);
	    std::cout << s << '\n';
	  }
	llvm-svn: 258465
*	Avoid unnecessary stack realignment in musttail thunks with SSE2 enabled	Reid Kleckner	2016-01-21	1	-0/+3
	The X86 musttail implementation finds register parameters to forward by running the calling convention algorithm until a non-register location is returned. However, assigning a vector memory location has the side effect of increasing the function's stack alignment. We shouldn't increase the stack alignment when we are only looking for register parameters, so this change conditionalizes it. llvm-svn: 258442
*	[X86][SSE] Improve i16 splatting shuffles	Simon Pilgrim	2016-01-21	9	-212/+169
	Better handling of the annoying pshuflw/pshufhw ops which only shuffle lower/upper halves of a vector. Added vXi16 unary shuffle support for cases where i16 elements (from the same half of the source) are being splatted to the whole of one of the halves. This avoids the general lowering case which must shuffle the 32-bit elements first - meaning that we used to end up with unnecessary duplicate pshuflw/pshufhw shuffles. Note this has the side effect of a lot of SSSE3 test cases no longer needing to use PSHUFB, as it falls below the 3 op combine threshold for when PSHUFB is typically worth it. I've raised PR26183 to discuss if the threshold should be changed and whether we need to make it more specific to the target CPU. Differential Revision: http://reviews.llvm.org/D14901 llvm-svn: 258440
*	AVX512: Masked move intrinsic implementation.	Igor Breger	2016-01-21	10	-23/+324
	Implemented intrinsics for the following instructions (register move): VMOVDQU8/16, VMOVDQA32/64, VMOVAPS/PD. Differential Revision: http://reviews.llvm.org/D16316 llvm-svn: 258398
*	[AVX512] Adding VPERMT2B and VPERMI2B Intrinsics	Michael Zuckerman	2016-01-21	2	-0/+170
	Differential Revision: http://reviews.llvm.org/D16398 llvm-svn: 258397
*	[SelectionDAG] Fix constant offset folding to avoid commuting non-commutative operators.	Dan Gohman	2016-01-20	1	-0/+18
	This fixes a miscompile in MultiSource/Benchmarks/MiBench/consumer-lame introduced in r258296. llvm-svn: 258366
*	AMDGPU/SI: Promote i1 SETCC operations	Tom Stellard	2016-01-20	1	-0/+20
	Summary: While working on uniform branching, I've hit a few cases where we emit i1 SETCC operations. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D16233 llvm-svn: 258352
*	AMDGPU: Remove AMDGPU.trunc intrinsic	Matt Arsenault	2016-01-20	1	-17/+0
	llvm-svn: 258348
*	AMDGPU: Remove AMDIL.fraction intrinsic	Matt Arsenault	2016-01-20	2	-19/+3
	llvm-svn: 258347
*	AMDGPU: Remove AMDIL.round.nearest intrinsic	Matt Arsenault	2016-01-20	1	-12/+0
	llvm-svn: 258346
*	AMDGPU: Remove abs intrinsic	Matt Arsenault	2016-01-20	1	-47/+0
	llvm-svn: 258343
*	AMDGPU: Remove min/max intrinsics	Matt Arsenault	2016-01-20	5	-163/+2
	This removes support for mesa 11.0.x. llvm-svn: 258342
*	[AVX512] Adding VPERMB Intrinsics	Michael Zuckerman	2016-01-20	2	-0/+63
	Differential Revision: http://reviews.llvm.org/D16296 llvm-svn: 258316
*	Proper handling of diamond-like cases in if-conversion	Krzysztof Parzyszek	2016-01-20	1	-0/+43
	The if-converter was somewhat careless about "diamond" cases where there was no join block, or in other words, where the true/false blocks did not have analyzable branches. In such cases, it was possible for it to remove (needed) branches, resulting in a loss of entire basic blocks. Differential Revision: http://reviews.llvm.org/D16156 llvm-svn: 258310
*	AVX512: Store (MOVNTPD, MOVNTPS, MOVNTDQ) using non-temporal hint intrinsic implementation.	Igor Breger	2016-01-20	1	-0/+32
	Differential Revision: http://reviews.llvm.org/D16350 llvm-svn: 258309
*	[SelectionDAG] Fold more offsets into GlobalAddresses	Dan Gohman	2016-01-20	3	-11/+683
	SelectionDAG previously missed opportunities to fold constants into GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to create `(add GA+c, y)`. This isn't often visible on targets such as X86, which effectively reassociate adds in their complex address-mode folding logic; however, it is currently visible on WebAssembly, since it currently has very simple address mode folding code that doesn't reassociate anything. This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses at the same times that it folds constants together, so that it doesn't miss any opportunities to perform such folding. Differential Revision: http://reviews.llvm.org/D16090 llvm-svn: 258296
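The fold described above can be modeled with a toy data structure. This is a sketch only: `GlobalAddr`, `AddExpr`, `fold_offset`, and `evaluate` are hypothetical names for illustration and bear no relation to the real SelectionDAG node classes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy stand-in for a GlobalAddress node: a symbol plus a constant byte offset.
struct GlobalAddr {
    std::string symbol;
    std::int64_t offset;
};

// Toy stand-in for (add (add GA, c1), y).
struct AddExpr {
    GlobalAddr base;        // GA
    std::int64_t constant;  // c1, known at compile time
    std::int64_t dynamic;   // y, a runtime value
};

// Fold the constant into the global, yielding (add GA+c1, y).
inline AddExpr fold_offset(AddExpr e) {
    e.base.offset += e.constant;
    e.constant = 0;
    return e;
}

// Evaluate the expression once the symbol's address is known.
inline std::int64_t evaluate(const AddExpr &e, std::int64_t symbol_address) {
    return symbol_address + e.base.offset + e.constant + e.dynamic;
}
```

Folding leaves the computed address unchanged but moves the constant into the relocatable operand, so only one runtime add remains.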
*	[WebAssembly] Tighten up some regexes in some tests.	Dan Gohman	2016-01-20	4	-80/+80
	llvm-svn: 258295
*	[WebAssembly] Don't stackify stores across instructions with side effects.	Dan Gohman	2016-01-20	2	-12/+36
	llvm-svn: 258285
*	AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs	Tom Stellard	2016-01-20	3	-2/+69
	Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D15035 llvm-svn: 258256
*	[MachineSink] Don't break ImplicitNulls	Sanjoy Das	2016-01-20	1	-0/+49
	Summary: This teaches MachineSink to not sink instructions that might break the implicit null check optimization that runs later. This should not affect frontends that do not use implicit null checks. Reviewers: aadg, reames, hfinkel, atrick Subscribers: majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D14632 llvm-svn: 258254
*	[X86] Do not run shrink-wrapping on function with split-stack attribute or HiPE calling convention.	Quentin Colombet	2016-01-19	1	-6/+77
	The implementations of the related callbacks in the x86 backend for such functions are not ready to deal with a prologue block that is not the entry block of the function. This fixes PR26107, but the longer term solution would be to fix those callbacks. llvm-svn: 258221
*	[X86][SSE] Add VZEXT_MOVL target shuffle decoding.	Simon Pilgrim	2016-01-19	2	-12/+4
	Add support for decoding VZEXT_MOVL target shuffle masks, allowing it to be used as a source in target shuffle combines. llvm-svn: 258215
*	[X86][SSE] Add INSERTPS target shuffle combines.	Simon Pilgrim	2016-01-19	3	-28/+8
	As vector shuffles can only reference two inputs, many (V)INSERTPS patterns end up being split over two target shuffles. This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements. Differential Revision: http://reviews.llvm.org/D16072 llvm-svn: 258205
*	[WebAssembly] Rematerialize constants rather than hold them live in registers.	Dan Gohman	2016-01-19	8	-60/+81
	Teach the register stackifier to rematerialize constants that have multiple uses instead of leaving them in registers. In the WebAssembly encoding, it's the same code size to materialize most constants as it is to read a value from a register. llvm-svn: 258142
*	[WebAssembly] Change a FIXME to a TODO in a comment.	Dan Gohman	2016-01-19	1	-1/+1
	llvm-svn: 258139
*	[WebAssembly] Re-enable this test, now that interactions with the coalescer are resolved.	Dan Gohman	2016-01-19	1	-3/+8
	llvm-svn: 258138
*	[WebAssembly] Re-enable loop idiom recognition for memcpy et al.	Dan Gohman	2016-01-19	1	-53/+0
	llvm-svn: 258125
*	[X86][AVX512] fix dag & add intrinsics for fixupimm	Asaf Badouh	2016-01-19	2	-0/+346
	Cover all widths and types (pd/ps/sd/ss) of the fixupimm instructions and intrinsics. Differential Revision: http://reviews.llvm.org/D16313 llvm-svn: 258124
*	AMDGPU: Reduce 64-bit SRAs	Matt Arsenault	2016-01-18	3	-20/+30
	llvm-svn: 258096
*	AMDGPU: Split 64-bit and of constant up	Matt Arsenault	2016-01-18	3	-51/+399
	This breaks the tests that were meant for testing 64-bit inline immediates, so move those to shl where they won't be broken up. This should be repeated for the other related bit ops. llvm-svn: 258095
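The idea in the subject line, splitting a 64-bit AND with a constant into two independent 32-bit ANDs, can be sketched in plain C++. The function name `and64_split` is hypothetical, and this models only the arithmetic, not the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit AND with a constant k, computed as two 32-bit ANDs on the halves.
inline std::uint64_t and64_split(std::uint64_t x, std::uint64_t k) {
    std::uint32_t lo = static_cast<std::uint32_t>(x) & static_cast<std::uint32_t>(k);
    std::uint32_t hi = static_cast<std::uint32_t>(x >> 32) &
                       static_cast<std::uint32_t>(k >> 32);
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Each half can then be simplified on its own, for instance when one half of the constant is all zeros or all ones.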
*	[X86][AVX2] Ensure integer execution domain for integer blend tests	Simon Pilgrim	2016-01-18	1	-2/+6
	llvm-svn: 258094
*	AMDGPU: Generalize shl combine	Matt Arsenault	2016-01-18	1	-0/+47
	Reduce 64-bit shl with constant > 32. We already special cased this for the == 32 case, but this also works for any >= 32 constant. llvm-svn: 258092
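The reduction works because a left shift by 32 or more discards the original high half entirely. A minimal sketch of the arithmetic follows, with the hypothetical name `shl64_ge32` and the assumption that the shift amount satisfies 32 <= c < 64.

```cpp
#include <cassert>
#include <cstdint>

// 64-bit shl by a constant c in [32, 63], done with a single 32-bit shift.
// The low half of the result is zero; the high half is lo << (c - 32).
inline std::uint64_t shl64_ge32(std::uint64_t x, unsigned c) {
    std::uint32_t lo = static_cast<std::uint32_t>(x);
    std::uint32_t hi = lo << (c - 32);  // shift amount is in [0, 31]
    return static_cast<std::uint64_t>(hi) << 32;
}
```

With c == 32 this degenerates to moving the low half into the high half, which matches the case the message says was already handled.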
*	[X86][SSE] Regenerate vector blend commutation tests	Simon Pilgrim	2016-01-18	2	-46/+48
	llvm-svn: 258091
*	AMDGPU: Reduce 64-bit lshr by constant to 32-bit	Matt Arsenault	2016-01-18	2	-2/+64
	64-bit shifts are very slow on some subtargets. llvm-svn: 258090
*	AMDGPU: Cleanup sra test	Matt Arsenault	2016-01-18	1	-163/+206
	llvm-svn: 258086
*	[X86][AVX2] Broadcast subvectors	Simon Pilgrim	2016-01-18	7	-18/+137
	AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector it's advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception is when the source type is 4f64 or 4i64, which can use the immediate shuffle VPERMPD/VPERMQ directly. Differential Revision: http://reviews.llvm.org/D16050 llvm-svn: 258081
*	AVX512: Masked store intrinsic implementation.	Igor Breger	2016-01-18	4	-12/+400
	Implemented intrinsics for the following instructions (store): VMOVDQU8/16/32/64, VMOVDQA32/64, VMOVAPS/PD, VMOVUPS/PD. Differential Revision: http://reviews.llvm.org/D16271 llvm-svn: 258047