bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] test/testn intrinsics lowering to IR. llvm part.	Uriel Korach	2017-11-13	1	-24/+0
	Remove builtins from llvm and add AutoUpgrade support. Also add fast-isel tests for the TEST and TESTN instructions. Differential Revision: https://reviews.llvm.org/D38736 llvm-svn: 318036
*	[x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR	Jina Nahias	2017-11-13	1	-16/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38671 Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661 llvm-svn: 318026
*	[X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round ↵	Craig Topper	2017-11-13	1	-0/+7
	intrinsics. The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4 bits of the immediate are 0. This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space. We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used. I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches. llvm-svn: 318008
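The immediate handling described in this commit can be sketched as follows. This is only an illustration, assuming an 8-bit immediate; the function name is hypothetical and is not taken from the LLVM sources.

```python
def legacy_round_imm_to_vrndscale(imm8: int) -> int:
    """Keep the low 4 bits of an 8-bit immediate and clear the upper 4.

    Per the commit message, VRNDSCALE behaves like (V)ROUND when the
    upper 4 bits of the immediate are 0. (Hypothetical helper name.)
    """
    return imm8 & 0x0F
```

For example, an immediate of `0xFB` would be passed on as `0x0B`, while `0x04` is unchanged.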
*	[X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with ↵	Craig Topper	2017-11-13	1	-39/+39
	and without the rounding operand. NFCI. I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations. llvm-svn: 318007
*	[X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics.	Craig Topper	2017-11-12	1	-2/+2
	This fixes a bug where we selected packed instructions for scalar intrinsics. llvm-svn: 317999
*	[X86] Remove some no longer needed intrinsic lowering code.	Craig Topper	2017-11-12	1	-1/+1
	llvm-svn: 317997
*	[X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. ↵	Craig Topper	2017-11-08	1	-0/+2
	Rely on EVEX->VEX to convert back. Missed store folding opportunities will be fixed in a subsequent commit. llvm-svn: 317661
*	[X86] Add support for using EVEX instructions for the legacy vcvtph2ps ↵	Craig Topper	2017-11-07	1	-4/+6
	intrinsics. Looks like there are some missed load folding opportunities for i64 loads. llvm-svn: 317544
*	[x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR	Jina Nahias	2017-11-06	1	-6/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458
*	[X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.	Craig Topper	2017-11-06	1	-0/+8
	llvm-svn: 317454
*	[X86] Add scalar FMA ISD nodes without rounding mode. NFC	Craig Topper	2017-11-06	1	-10/+10
	Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453
*	[X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy ↵	Craig Topper	2017-11-04	1	-16/+16
	SSE rcp/rsqrt intrinsics when AVX512 features are enabled. Summary: AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton-Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use them for fast math optimization of division and reciprocal sqrt. I think switching the legacy intrinsics may be surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed. As for the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc, icc and godbolt suggests they don't change which instructions they use here. This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions. Going forward I think our focus should be on: supporting 512-bit vectors, which will have to use RCP14/RSQRT14; using RSQRT28/RCP28 to remove the Newton-Raphson step on processors with AVX512ER; and supporting double precision. Reviewers: zvi, DavidKreitzer, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39583 llvm-svn: 317413
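To illustrate why a roughly 14-bit estimate still calls for a refinement step, here is a small numeric sketch of one Newton-Raphson iteration for a reciprocal. It does not model the hardware instructions: the starting estimate is simulated by assuming a relative error of 2^-14, and all names are illustrative.

```python
def refine_reciprocal(a: float, x0: float) -> float:
    """One Newton-Raphson step for 1/a: x1 = x0 * (2 - a * x0)."""
    return x0 * (2.0 - a * x0)

a = 3.0
# Simulated estimate of 1/a with an assumed relative error of 2^-14.
estimate = (1.0 / a) * (1.0 + 2.0 ** -14)
refined = refine_reciprocal(a, estimate)

rel_err_before = abs(estimate * a - 1.0)   # about 2^-14
rel_err_after = abs(refined * a - 1.0)     # about 2^-28 (the error is squared)
```

Each step roughly squares the relative error, so a single iteration takes the simulated estimate from about 14 bits to about 28 bits of accuracy.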
*	[X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI.	Simon Pilgrim	2017-09-26	1	-8/+8
	The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations. Differential Revision: https://reviews.llvm.org/D37949 llvm-svn: 314207
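The equivalence this commit relies on can be checked with a short sketch. It assumes 8-bit elements purely for illustration; the helper names are hypothetical and the code is not derived from the LLVM implementation.

```python
WIDTH = 8                    # element width in bits (illustrative choice)
MASK = (1 << WIDTH) - 1

def rotl(x: int, n: int) -> int:
    """Rotate x left by n bits within WIDTH bits."""
    n %= WIDTH
    return ((x << n) | (x >> (WIDTH - n))) & MASK

def rotr(x: int, n: int) -> int:
    """Rotate x right by n bits within WIDTH bits."""
    n %= WIDTH
    return ((x >> n) | (x << (WIDTH - n))) & MASK

def signed_rotate(x: int, amt: int) -> int:
    """Rotate left for positive amounts and right for negative ones."""
    return rotl(x, amt) if amt >= 0 else rotr(x, -amt)

def as_rotl(x: int, amt: int) -> int:
    """The same rotation written as a left rotate by amt modulo WIDTH."""
    return rotl(x, amt % WIDTH)
```

Because a right rotate by `k` equals a left rotate by `WIDTH - k`, reducing the signed amount modulo the element width gives the same result for every input.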
*	[X86] Finishing broadcastf32x2 and broadcasti32x2 intrinsics lowering to IR. ↵	Uriel Korach	2017-09-26	1	-10/+0
	llvm side. Removing X86 broadcast(f/i)32x2 intrinsics from llvm. Adding autoUpgrade support. Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll. Differential Revision: https://reviews.llvm.org/D38220 llvm-svn: 314195
*	[X86] Make IFMA instructions during isel so we can fold broadcast loads.	Craig Topper	2017-09-24	1	-12/+13
	This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us. llvm-svn: 314083
*	[x86] Lowering Mask Set1 intrinsics to LLVM IR	Jina Nahias	2017-09-19	1	-24/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37669 llvm-svn: 313625
*	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native ↵	Craig Topper	2017-09-16	1	-4/+0
	shuffles. I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates. llvm-svn: 313450
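As a rough illustration of how an immediate maps to a shuffle mask here, the sketch below decodes the control byte for a two-source 128-bit lane permute. The bit layout (bits 1:0 and 5:4 select a lane, bits 3 and 7 zero the destination lane) is an assumption taken from the instruction's documented encoding, not from this commit, and the function name is hypothetical.

```python
def vperm2x128_lanes(imm8: int, elts_per_lane: int = 2):
    """Return one entry per destination element: an index into the
    concatenation of both sources, or None where the lane is zeroed."""
    result = []
    for half in range(2):                  # 0 = low lane, 1 = high lane
        ctrl = (imm8 >> (4 * half)) & 0xF
        if ctrl & 0x8:                     # bit 3 (or bit 7): zero this lane
            result += [None] * elts_per_lane
        else:
            lane = ctrl & 0x3              # 0,1 = first source; 2,3 = second source
            base = lane * elts_per_lane
            result += list(range(base, base + elts_per_lane))
    return result
```

With four 64-bit elements per vector, an immediate of `0x20` yields `[0, 1, 4, 5]`, meaning the low lane of the first source followed by the low lane of the second.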
*	[X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm)	Uriel Korach	2017-09-13	1	-18/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37693 llvm-svn: 313134
*	[X86] Lower _mm[256\|512]_[mask[z]]_avg_epu[8\|16] intrinsics to native llvm IR	Yael Tsafrir	2017-09-12	1	-10/+0
	Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013
*	Revert "adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics"	Uriel Korach	2017-09-10	1	-0/+10
	This reverts commit r312879 - An accidental partial commit. llvm-svn: 312880
*	adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics	Uriel Korach	2017-09-10	1	-10/+0
	llvm-svn: 312879
*	[X86] Remove X86ISD::FMADD in favor of ISD::FMA	Craig Topper	2017-08-23	1	-22/+22
	There's no reason to have a target specific node with the same semantics as a target independent opcode. This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp. Differential Revision: https://reviews.llvm.org/D36983 llvm-svn: 311568
*	[AVX512] Remove and autoupgrade many of the broadcast intrinsics	Craig Topper	2017-08-11	1	-25/+1
	Summary: This autoupgrades most of the broadcast intrinsics. They've been unused in clang for some time. This leaves the 32x2 intrinsics because they are still used in clang. Reviewers: RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36606 llvm-svn: 310725
*	[X86] Add addsub intrinsics to the intrinsic lowering table so we have a ↵	Craig Topper	2017-07-30	1	-0/+4
	single set of isel patterns. llvm-svn: 309502
*	[AVX-512] Remove and autoupgrade the masked integer compare intrinsics	Craig Topper	2017-06-22	1	-24/+0
	Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today. Reviewers: spatel, RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34389 llvm-svn: 306047
*	[X86] Remove unused value from IntrinsicType enum. NFC	Craig Topper	2017-05-14	1	-1/+1
	llvm-svn: 303018
*	[X86][LLVM] Converting __mm{\|256\|512}_movm_epi{8\|16\|32\|64} LLVMIR call into ↵	Michael Zuckerman	2017-04-04	1	-12/+0
	generic intrinsics. This patch is part one of two reviews, one for clang and the other for LLVM. The patch deletes the back-end intrinsics and adds support for them in the auto upgrade. Differential Revision: https://reviews.llvm.org/D31393 llvm-svn: 299432
*	[AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time ↵	Craig Topper	2017-03-19	1	-2/+4
	instead of isel. Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coalescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine, which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain, so the kor-sequence test breaks. But this patch should dodge the problem entirely. Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228
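The observation that these mask intrinsics "just map to normal logic operations" can be sketched on 16-bit masks as below. The function names mirror the instruction mnemonics for readability only; the operand order shown for `kandn` (NOT of the first operand, ANDed with the second) is an assumption based on the instruction's documented behavior.

```python
M16 = 0xFFFF  # 16-bit mask register width

def kand(a: int, b: int) -> int:
    return a & b & M16

def kandn(a: int, b: int) -> int:
    # Assumed semantics: (NOT a) AND b
    return ~a & b & M16

def kor(a: int, b: int) -> int:
    return (a | b) & M16

def kxor(a: int, b: int) -> int:
    return (a ^ b) & M16

def kxnor(a: int, b: int) -> int:
    return ~(a ^ b) & M16

def knot(a: int) -> int:
    return ~a & M16
```

Seen this way, each operation is an ordinary bitwise AND, OR, XOR or NOT on an integer, which is what makes it a candidate for generic combines.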
*	[SelectionDAG] Add a signed integer absolute ISD node	Simon Pilgrim	2017-03-14	1	-18/+18
	Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization as we only create an ABS node if it's legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780
*	[X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input ↵	Craig Topper	2017-03-13	1	-1/+18
	source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652
*	[X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.	Craig Topper	2017-03-12	1	-0/+4
	This allows us to remove a duplicate set of patterns. llvm-svn: 297593
*	[AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD ↵	Craig Topper	2017-02-24	1	-12/+12
	opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC llvm-svn: 296094
*	[AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz ↵	Craig Topper	2017-02-24	1	-12/+0
	intrinsics with select. Clang has been emitting ctlz intrinsics for a while now. llvm-svn: 296091
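The "generic ctlz plus select" shape that the masked intrinsics are upgraded to can be sketched element by element. This is an illustration with hypothetical names, assuming 32-bit elements and a count of 32 for a zero input.

```python
def ctlz32(x: int) -> int:
    """Count leading zeros of a 32-bit value; returns 32 for zero."""
    return 32 - (x & 0xFFFFFFFF).bit_length()

def masked_lzcnt(src, passthru, mask: int):
    """Per element: take ctlz(src[i]) where mask bit i is set,
    otherwise keep passthru[i] (the 'select')."""
    return [
        ctlz32(s) if (mask >> i) & 1 else p
        for i, (s, p) in enumerate(zip(src, passthru))
    ]
```

Splitting the operation this way leaves the count as a target-independent operation and expresses the masking as an ordinary select between two vectors.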
*	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions ↵	Craig Topper	2017-02-22	1	-8/+12
	when available. This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential Revision: https://reviews.llvm.org/D30186 llvm-svn: 295810
*	[AVX-512] Remove 128/256-bit masked fp max/min intrinsics. Upgrade them to ↵	Craig Topper	2017-02-18	1	-8/+0
	legacy unmasked intrinsics and select instructions. llvm-svn: 295543
*	[AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked ↵	Craig Topper	2017-02-16	1	-12/+4
	intrinsics with select instructions. For 512-bit add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO. llvm-svn: 295290
*	[AVX-512] Remove vinsert intrinsics and autoupgrade to native ↵	Craig Topper	2017-01-03	1	-25/+1
	shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864
*	[AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them ↵	Craig Topper	2016-12-27	1	-12/+0
	to unmasked intrinsics plus a select. llvm-svn: 290583
*	[AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can ↵	Craig Topper	2016-12-27	1	-0/+2
	add them to InstCombine with the 128 and 256 bit versions. The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will be removed and autoupgraded. llvm-svn: 290573
*	Added a template for building target specific memory node in DAG.	Elena Demikhovsky	2016-12-21	1	-0/+73
	I added an API for creating a target specific memory node in the DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store. In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern. Differential Revision: https://reviews.llvm.org/D27899 llvm-svn: 290250
*	[X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for ↵	Craig Topper	2016-12-11	1	-4/+2
	being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350
*	[X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being ↵	Craig Topper	2016-12-10	1	-2/+1
	able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344
*	[AVX-512] Remove 128/256 masked vpermil intrinsics and autoupgrade to a ↵	Craig Topper	2016-12-10	1	-8/+0
	select around the unmasked avx1 intrinsics. llvm-svn: 289340
*	[X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit ↵	Craig Topper	2016-12-10	1	-2/+2
	cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements. Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements. Similar things have already been done for other convert intrinsics. llvm-svn: 289316
*	[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics	Craig Topper	2016-12-09	1	-10/+10
	Summary: Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection. This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass thru its upper bits. We use this information to select the 213/132 form for the operand 1 version and the 231 form for the operand 3 version. We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that. This fixes PR30913. Reviewers: delena, zvi, v_klochkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27144 llvm-svn: 289190
*	[X86] Generalize CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes. NFCI	Simon Pilgrim	2016-11-24	1	-5/+5
	Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions. This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types. Differential Revision: https://reviews.llvm.org/D27072 llvm-svn: 287870
*	[AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native ↵	Craig Topper	2016-11-23	1	-12/+0
	shuffles. llvm-svn: 287744
*	[AVX-512] Replace masked 16-bit element variable shift intrinsics with new ↵	Craig Topper	2016-11-18	1	-9/+9
	unmasked versions and selects. The same thing was done to 32-bit and 64-bit element sizes previously. This will allow us to support these shifts in InstCombineCalls along with the other variable shift intrinsics. llvm-svn: 287312
*	[X86][AVX512] Autoupgrade lossless i32/u32 to f64 conversion intrinsics with ↵	Simon Pilgrim	2016-11-16	1	-12/+0
	generic IR. Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen. LLVM counterpart to D26686. Differential Revision: https://reviews.llvm.org/D26736 llvm-svn: 287108
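The "lossless" claim for 32-bit integer to f64 conversions can be spot-checked with a short sketch. It assumes Python's `float` is an IEEE 754 double (53-bit significand), which holds on mainstream platforms; the helper name is illustrative.

```python
def roundtrips_exactly(v: int) -> bool:
    """True if converting v to a double and back yields the same integer."""
    return int(float(v)) == v

# Extremes of the signed and unsigned 32-bit ranges.
I32_MIN, I32_MAX, U32_MAX = -2**31, 2**31 - 1, 2**32 - 1
```

Every 32-bit integer fits within the 53-bit significand of a double, so no rounding occurs; by contrast, a value such as `2**53 + 1` does not survive the round trip.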
*	[X86][AVX512] Removing llvm x86 intrinsics for _mm_mask_move_{ss\|sd} intrinsics.	Ayman Musa	2016-11-16	1	-4/+0
	Differential Revision: https://reviews.llvm.org/D26128 llvm-svn: 287087