bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[x86][AVX512] Lowering kunpack intrinsics to LLVM IR	Jina Nahias	2017-12-05	1	-3/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D39719), implements the lowering of X86 kunpack intrinsics to IR. Differential Revision: https://reviews.llvm.org/D39720 Change-Id: I4088d9428478f9457f6afddc90bd3d66b3daf0a1 llvm-svn: 319778
*	[x86][icelake]GFNI	Coby Tayree	2017-11-26	1	-1/+21
	Galois field arithmetic (GF(2^8)) instructions: gf2p8affineinvqb, gf2p8affineqb, gf2p8mulb. Differential Revision: https://reviews.llvm.org/D40373 llvm-svn: 318993
*	[X86] Add separate intrinsics for scalar FMA4 instructions.	Craig Topper	2017-11-25	1	-0/+2
	Summary: These instructions zero the non-scalar part of the lower 128 bits, which makes them different from the FMA3 instructions, which pass through the non-scalar part of the lower 128 bits. I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header, as we do for AVX512. I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference, which I hadn't noticed before. I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests, split the RUN lines, and changed out the scalar intrinsics. fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly, though there are a couple of TODOs in it.
	Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39851 llvm-svn: 318984
*	[X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal ↵	Simon Pilgrim	2017-11-23	1	-0/+1
	reductions (PR32841). (V)PHMINPOSUW determines the UMIN element in a v8i16 input; with suitable bit flipping it can be used for the SMAX/SMIN/UMAX cases as well. This patch matches vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions and reduces the input down to a v8i16 vector before calling (V)PHMINPOSUW. A later patch will use this for v16i8 reductions as well (PR32841). Differential Revision: https://reviews.llvm.org/D39729 llvm-svn: 318917
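The bit-flipping idea in the message above can be sketched in Python. This is an illustrative model only, not LLVM's lowering code: `phminposuw`, `FLIP_MASK`, and `reduce_v8i16` are hypothetical names, lanes are modeled as unsigned 16-bit integers, and the XOR masks are the ones that follow from that modeling assumption.

```python
def phminposuw(lanes):
    """Model only the value result of (V)PHMINPOSUW: the unsigned
    minimum of eight 16-bit lanes (the index result is ignored here)."""
    assert len(lanes) == 8 and all(0 <= x <= 0xFFFF for x in lanes)
    return min(lanes)


# XOR masks that map each ordering onto "unsigned ascending" (assumed model):
#   umax: inverting all bits reverses unsigned order
#   smin: flipping the sign bit turns signed order into unsigned order
#   smax: flipping every bit except the sign bit reverses signed order
FLIP_MASK = {"umin": 0x0000, "umax": 0xFFFF, "smin": 0x8000, "smax": 0x7FFF}


def reduce_v8i16(lanes, kind):
    """Horizontal reduction of eight 16-bit lanes using one unsigned-min step."""
    mask = FLIP_MASK[kind]
    flipped = [x ^ mask for x in lanes]   # flip into unsigned-min space
    return phminposuw(flipped) ^ mask     # flip the winner back
```

XOR with a constant is its own inverse, so applying the same mask before and after the unsigned minimum recovers the original lane value.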
*	[x86][icelake]BITALG	Coby Tayree	2017-11-23	1	-0/+7
	2/3 vpshufbitqmb encoding; 3/3 vpshufbitqmb intrinsics. Differential Revision: https://reviews.llvm.org/D40222 llvm-svn: 318904
*	[x86][icelake]VNNI	Coby Tayree	2017-11-21	1	-0/+26
	Introducing Vector Neural Network Instructions, consisting of: vpdpbusd{s}, vpdpwssd{s}. Differential Revision: https://reviews.llvm.org/D40208 llvm-svn: 318746
*	[x86][icelake]vbmi2	Coby Tayree	2017-11-21	1	-0/+107
	Introducing vbmi2, consisting of: vpcompress{b,w}, vpexpand{b,w}, vpsh{l,r}d{w,d,q}, vpsh{l,r}dv{w,d,q}. Differential Revision: https://reviews.llvm.org/D40206 llvm-svn: 318745
*	[X86] test/testn intrinsics lowering to IR. llvm part.	Uriel Korach	2017-11-13	1	-24/+0
	Remove builtins from llvm and add AutoUpgrade support. Also add fast-isel tests for the TEST and TESTN instructions. Differential Revision: https://reviews.llvm.org/D38736 llvm-svn: 318036
*	[x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR	Jina Nahias	2017-11-13	1	-16/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38671 Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661 llvm-svn: 318026
*	[X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round ↵	Craig Topper	2017-11-13	1	-0/+7
	intrinsics. The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4 bits of the immediate are 0. This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space. We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used. I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches. llvm-svn: 318008
*	[X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with ↵	Craig Topper	2017-11-13	1	-39/+39
	and without the rounding operand. NFCI. I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses, so I split them all so that they remain similar in their implementations. llvm-svn: 318007
*	[X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics.	Craig Topper	2017-11-12	1	-2/+2
	This fixes a bug where we selected packed instructions for scalar intrinsics. llvm-svn: 317999
*	[X86] Remove some no longer needed intrinsic lowering code.	Craig Topper	2017-11-12	1	-1/+1
	llvm-svn: 317997
*	[X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. ↵	Craig Topper	2017-11-08	1	-0/+2
	Rely on EVEX->VEX to convert back. Missed store folding opportunities will be fixed in a subsequent commit. llvm-svn: 317661
*	[X86] Add support for using EVEX instructions for the legacy vcvtph2ps ↵	Craig Topper	2017-11-07	1	-4/+6
	intrinsics. It looks like there are some missed load folding opportunities for i64 loads. llvm-svn: 317544
*	[x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR	Jina Nahias	2017-11-06	1	-6/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458
*	[X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.	Craig Topper	2017-11-06	1	-0/+8
	llvm-svn: 317454
*	[X86] Add scalar FMA ISD nodes without rounding mode. NFC	Craig Topper	2017-11-06	1	-10/+10
	Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453
*	[X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy ↵	Craig Topper	2017-11-04	1	-16/+16
	SSE rcp/rsqrt intrinsics when AVX512 features are enabled. Summary: AVX512 added RCP14 and RSQRT14 instructions, which improve accuracy over the legacy RCP and RSQRT instructions, but not by enough to remove the need for a Newton-Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use them for fast-math optimization of division and reciprocal sqrt. I think switching the legacy intrinsics may be surprising to the user, since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed. As for the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc, icc, and godbolt suggests they don't change which instructions they use here. This patch adds new X86ISD nodes for RCP14/RSQRT14 and uses those for the new intrinsics, leaving the old intrinsics to use the old instructions. Going forward I think our focus should be on:
	- Supporting 512-bit vectors, which will have to use RCP14/RSQRT14.
	- Using RSQRT28/RCP28 to remove the Newton-Raphson step on processors with AVX512ER.
	- Supporting double precision.
	Reviewers: zvi, DavidKreitzer, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39583 llvm-svn: 317413
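As a rough illustration of why an estimate with about 14 bits of accuracy still needs a refinement step, here is a minimal Python sketch of one Newton-Raphson iteration for a reciprocal. It is not the code LLVM emits: the starting estimate is simulated by perturbing the exact reciprocal by a relative error of 2^-14 (an assumption standing in for a hardware estimate), and `refine_reciprocal` and `relative_error` are hypothetical helper names.

```python
def refine_reciprocal(a, x0):
    """One Newton-Raphson step for 1/a: x1 = x0 * (2 - a * x0)."""
    return x0 * (2.0 - a * x0)


def relative_error(a, approx):
    """Relative error of approx as an estimate of 1/a."""
    return abs(a * approx - 1.0)


a = 3.0
# Simulated estimate with relative error 2**-14 (assumption, not real hardware output).
x0 = (1.0 / a) * (1.0 + 2.0 ** -14)
x1 = refine_reciprocal(a, x0)
```

If x0 = (1 + e) / a, the step gives x1 = (1 - e^2) / a, so each iteration roughly doubles the number of correct bits.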
*	[X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI.	Simon Pilgrim	2017-09-26	1	-8/+8
	The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations. Differential Revision: https://reviews.llvm.org/D37949 llvm-svn: 314207
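The equivalence claimed above (a signed rotate count behaves like a left rotate by the count taken as an unsigned value modulo the element width) can be checked with a small Python model. This is a sketch under stated assumptions, not the XOP specification or LLVM code: counts are modeled as signed 8-bit values, element widths are powers of two up to 64, and all function names are hypothetical.

```python
def rotl(x, n, bits):
    """Rotate the low `bits` bits of x left by n (n taken modulo bits)."""
    mask = (1 << bits) - 1
    x &= mask
    n %= bits
    return ((x << n) | (x >> (bits - n))) & mask


def rotr(x, n, bits):
    """Rotate right by n, expressed as a left rotate by (bits - n)."""
    return rotl(x, bits - (n % bits), bits)


def xop_style_rotate(x, count, bits):
    """Assumed model: non-negative counts rotate left, negative counts rotate right."""
    return rotl(x, count, bits) if count >= 0 else rotr(x, -count, bits)


def as_unsigned_rotl(x, count, bits):
    """Same operation as a single ROTL: reinterpret the signed 8-bit count
    as unsigned (two's complement), then reduce modulo the element width."""
    return rotl(x, (count & 0xFF) % bits, bits)
```

The two forms agree because 256 is a multiple of every element width used here, so reinterpreting a negative count as unsigned shifts it by a whole number of full rotations.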
*	[X86] Finishing broadcastf32x2 and broadcasti32x2 intrinsics lowering to IR. ↵	Uriel Korach	2017-09-26	1	-10/+0
	llvm side. Removing X86 broadcast(f/i)32x2 intrinsics from llvm. Adding autoUpgrade support. Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll. Differential Revision: https://reviews.llvm.org/D38220 llvm-svn: 314195
*	[X86] Make IFMA instructions during isel so we can fold broadcast loads.	Craig Topper	2017-09-24	1	-12/+13
	This required changing the ISD opcode for these instructions to have the commutable operands first and the addend last. This way tablegen can autogenerate the additional patterns for us. llvm-svn: 314083
*	[x86] Lowering Mask Set1 intrinsics to LLVM IR	Jina Nahias	2017-09-19	1	-24/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D37668), implements the lowering of X86 mask set1 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37669 llvm-svn: 313625
*	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native ↵	Craig Topper	2017-09-16	1	-4/+0
	shuffles. I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate, so I've preserved the resulting shuffle mask for each of those immediates. llvm-svn: 313450
*	[X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm)	Uriel Korach	2017-09-13	1	-18/+0
	This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR. Differential Revision: https://reviews.llvm.org/D37693 llvm-svn: 313134
*	[X86] Lower _mm[256\|512]_[mask[z]]_avg_epu[8\|16] intrinsics to native llvm IR	Yael Tsafrir	2017-09-12	1	-10/+0
	Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013
*	Revert "adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics"	Uriel Korach	2017-09-10	1	-0/+10
	This reverts commit r312879, an accidental partial commit. llvm-svn: 312880
*	adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics	Uriel Korach	2017-09-10	1	-10/+0
	llvm-svn: 312879
*	[X86] Remove X86ISD::FMADD in favor ISD::FMA	Craig Topper	2017-08-23	1	-22/+22
	There's no reason to have a target specific node with the same semantics as a target independent opcode. This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp. Differential Revision: https://reviews.llvm.org/D36983 llvm-svn: 311568
*	[AVX512] Remove and autoupgrade many of the broadcast intrinsics	Craig Topper	2017-08-11	1	-25/+1
	Summary: This autoupgrades most of the broadcast intrinsics. They've been unused in clang for some time. This leaves the 32x2 intrinsics because they are still used in clang. Reviewers: RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36606 llvm-svn: 310725
*	[X86] Add addsub intrinsics to the intrinsic lowering table so we have a ↵	Craig Topper	2017-07-30	1	-0/+4
	single set of isel patterns. llvm-svn: 309502
*	[AVX-512] Remove and autoupgrade the masked integer compare intrinsics	Craig Topper	2017-06-22	1	-24/+0
	Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said, these intrinsics weren't used by clang even before this patch, so this codegen reflects our clang behavior today. Reviewers: spatel, RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34389 llvm-svn: 306047
*	[X86] Remove unused value from IntrinsicType enum. NFC	Craig Topper	2017-05-14	1	-1/+1
	llvm-svn: 303018
*	[X86][LLVM] Converting __mm{\|256\|512}_movm_epi{8\|16\|32\|64} LLVMIR call into ↵	Michael Zuckerman	2017-04-04	1	-12/+0
	generic intrinsics. This patch is part one of two reviews, one for clang and the other for LLVM. The patch deletes the back-end intrinsics and adds support for them in the auto upgrade. Differential Revision: https://reviews.llvm.org/D31393 llvm-svn: 299432
*	[AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time ↵	Craig Topper	2017-03-19	1	-2/+4
	instead of isel. Summary: Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coalescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine, which seems more robust. This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain, so the kor-sequence test breaks. But this patch should dodge the problem entirely.
	Reviewers: zvi, delena, RKSimon, igorb Reviewed By: igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31056 llvm-svn: 298228
*	[SelectionDAG] Add a signed integer absolute ISD node	Simon Pilgrim	2017-03-14	1	-18/+18
	Reduced version of D26357. Based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS, I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering. ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions, allowing us to make this a generic opcode and move away from the hard coded tablegen patterns, which make it tricky to match more complex patterns. At the moment this patch doesn't attempt legalization, as we only create an ABS node if it's legal/custom. Differential Revision: https://reviews.llvm.org/D29639 llvm-svn: 297780
*	[X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input ↵	Craig Topper	2017-03-13	1	-1/+18
	source optimizations to break execution dependencies. For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2. llvm-svn: 297652
*	[X86] Lower SSE/AVX cmpps/pd intrinsics directly to X86ISD::CMPP SDNodes.	Craig Topper	2017-03-12	1	-0/+4
	This allows us to remove a duplicate set of patterns. llvm-svn: 297593
*	[AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD ↵	Craig Topper	2017-02-24	1	-12/+12
	opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC llvm-svn: 296094
*	[AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz ↵	Craig Topper	2017-02-24	1	-12/+0
	intrinsics with select. Clang has been emitting ctlz intrinsics for a while now. llvm-svn: 296091
*	[AVX-512] Allow legacy scalar min/max intrinsics to select EVEX instructions ↵	Craig Topper	2017-02-22	1	-8/+12
	when available. This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node, as do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION. I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others. Differential Revision: https://reviews.llvm.org/D30186 llvm-svn: 295810
*	[AVX-512] Remove 128/256-bit masked fp max/min intrinsics. Upgrade them to ↵	Craig Topper	2017-02-18	1	-8/+0
	legacy unmasked intrinsics and select instructions. llvm-svn: 295543
*	[AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked ↵	Craig Topper	2017-02-16	1	-12/+4
	intrinsics with select instructions. For 512-bit, add new unmasked intrinsics. The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine, where we currently have a TODO. llvm-svn: 295290
*	[AVX-512] Remove vinsert intrinsics and autoupgrade to native ↵	Craig Topper	2017-01-03	1	-25/+1
	shufflevectors. There are some codegen problems here that I'll try to fix in future commits. llvm-svn: 290864
*	[AVX-512] Remove masked pmuldq and pmuludq intrinsics and autoupgrade them ↵	Craig Topper	2016-12-27	1	-12/+0
	to unmasked intrinsics plus a select. llvm-svn: 290583
*	[AVX-512] Add 512-bit unmasked intrinsics for pmuldq and pmuludq so we can ↵	Craig Topper	2016-12-27	1	-0/+2
	add them to InstCombine with the 128 and 256 bit versions. The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will be removed and autoupgraded. llvm-svn: 290573
*	Added a template for building target specific memory node in DAG.	Elena Demikhovsky	2016-12-21	1	-0/+73
	I added an API for creating a target specific memory node in the DAG. Today, all memory nodes are common to all targets and their constructors are located in SelectionDAG.cpp. There are some cases in X86 where we need to create a special node: truncation-with-saturation store, float-to-half store. In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern. Differential Revision: https://reviews.llvm.org/D27899 llvm-svn: 290250
*	[X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for ↵	Craig Topper	2016-12-11	1	-4/+2
	being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350
*	[X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being ↵	Craig Topper	2016-12-10	1	-2/+1
	able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344
*	[AVX-512] Remove 128/256 masked vpermil intrinsics and autoupgrade to a ↵	Craig Topper	2016-12-10	1	-8/+0
	select around the unmasked avx1 intrinsics. llvm-svn: 289340