| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
property will be inferred for the instruction by the tablegen backend.
Also use X86any_vfpround instead of X86vfpround in some instruction
definitions so the strict version can be used to infer the chain
property.
Without these changes we don't propagate the strict FP chain through
isel for some instructions.
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D72026
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71892
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71871
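As background, a minimal standalone illustration of why the strict conversion nodes exist (the file name, build flag, and values are illustrative assumptions, not part of this change): under constrained FP semantics an integer-to-FP conversion must observe the dynamic rounding mode, so it has to stay a chained STRICT_SINT_TO_FP node instead of being folded or reordered.

    // strict_itofp.cpp -- build with e.g. clang++ -ffp-model=strict strict_itofp.cpp
    #include <cfenv>
    #include <cstdio>

    #pragma STDC FENV_ACCESS ON

    int main() {
      std::fesetround(FE_UPWARD);
      volatile long long big = (1LL << 53) + 1;   // not exactly representable as double
      double d = static_cast<double>(big);        // result depends on the rounding mode
      std::printf("%.0f\n", d);                   // 9007199254740994 when rounding up
      return 0;
    }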
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D71850
|
|
|
|
|
|
| |
This patch is mainly about custom lowering of the vector operation.
Differential Revision: https://reviews.llvm.org/D71592
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Add strict fma support
Reviewers: craig.topper, RKSimon, LiuChen3
Subscribers: hiraditya, llvm-commits, LuoYuanke
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71604
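For context, a minimal standalone example of the operation ISD::STRICT_FMA models, a fused multiply-add with a single rounding (the file name and the -ffp-contract=off flag are illustrative assumptions):

    // fma_demo.cpp -- build with e.g. clang++ -ffp-contract=off fma_demo.cpp
    #include <cmath>
    #include <cstdio>

    int main() {
      double a = 1.0 + 0x1p-30, b = 1.0 - 0x1p-30, c = -1.0;
      // a*b is exactly 1 - 2^-60, which rounds to 1.0 as a double, so the
      // separately rounded multiply and add lose the tiny term entirely.
      std::printf("mul+add: %a\n", a * b + c);          // 0x0p+0
      // std::fma rounds only once, so the exact -2^-60 survives.
      std::printf("fma:     %a\n", std::fma(a, b, c));  // -0x1p-60
      return 0;
    }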
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is a follow-up of D69281; it enables X86 backend support for the FP comparison.
Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70582
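As background, a standalone sketch of the distinction constrained FP comparisons have to preserve, quiet versus signaling compares on NaN (file name and build flag are illustrative assumptions):

    // strict_cmp.cpp -- build with e.g. clang++ -ffp-model=strict strict_cmp.cpp
    #include <cfenv>
    #include <cmath>
    #include <cstdio>

    #pragma STDC FENV_ACCESS ON

    int main() {
      volatile double qnan = std::nan("");
      std::feclearexcept(FE_ALL_EXCEPT);
      volatile bool lt = (qnan < 1.0);            // ordered '<' signals invalid on a quiet NaN
      std::printf("invalid after <     : %d\n", !!std::fetestexcept(FE_INVALID));
      std::feclearexcept(FE_ALL_EXCEPT);
      volatile bool ql = std::isless(qnan, 1.0);  // quiet compare must not signal
      std::printf("invalid after isless: %d\n", !!std::fetestexcept(FE_INVALID));
      (void)lt; (void)ql;
      return 0;
    }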
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MVE has a basic symmetry between its normal load/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.
To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.
This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
legality of masked operations as well as normal ones. This array is
fairly small, so doubling the size still won't make it very large.
Offset masked loads can then be controlled with
setIndexedMaskedLoadAction, similar to standard loads (see the sketch below).
- The same methods that combine to indexed loads, such as
CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
the same way.
- The ARM backend is then adjusted to make use of these indexed masked
loads/stores.
- The X86 backend is adjusted, with hopefully no functional changes.
Differential Revision: https://reviews.llvm.org/D70176
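A minimal sketch of how a target opts in to the new hook mentioned above (the chosen types are illustrative, and the matching setIndexedMaskedStoreAction call is assumed to follow the same pattern; this is not an excerpt from the ARM changes):

    // In a target's TargetLowering constructor: declare post-incrementing
    // masked loads/stores legal for some vector types, mirroring what
    // setIndexedLoadAction/setIndexedStoreAction already do for normal accesses.
    for (MVT VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32}) {
      setIndexedMaskedLoadAction(ISD::POST_INC, VT, Legal);
      setIndexedMaskedStoreAction(ISD::POST_INC, VT, Legal);
    }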
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
broadcasted to a vector.
Summary:
This adds the ISD opcode and a DAG combine to create it. There are
probably some places where we can directly create it, but I'll
leave that for future work.
This updates all of the isel patterns to look for this new node.
I had to add a few additional isel patterns for aligned extloads
which we should probably fix with a DAG combine or something. This
does mean that the broadcast load folding for avx512 can no
longer match a broadcasted aligned extload.
There's still some work to do here for combining a broadcast of
a broadcast_load. We also need to improve extractelement or
demanded vector elements of a broadcast_load. I'll try to get
those done before I submit this patch.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68198
llvm-svn: 373349
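A hedged sketch of the kind of combine described above (the helper name, operand order, and the use of a memory-intrinsic node are assumptions for illustration, not a verbatim excerpt):

    // Fold "load scalar, then broadcast" into one node that both broadcasts and
    // carries the memory operand, so the load can be folded during isel.
    static SDValue makeBroadcastLoad(SelectionDAG &DAG, LoadSDNode *Ld, MVT VT,
                                     const SDLoc &DL) {
      SDVTList VTs = DAG.getVTList(VT, MVT::Other);
      SDValue Ops[] = {Ld->getChain(), Ld->getBasePtr()};
      return DAG.getMemIntrinsicNode(X86ISD::VBROADCAST_LOAD, DL, VTs, Ops,
                                     Ld->getMemoryVT(), Ld->getMemOperand());
    }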
|
|
|
|
|
|
| |
This matches what we do for f32/f64. gcc also does this for fp128.
llvm-svn: 371357
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
size of the result type. Use them to improve the codegen of v2f32 loads/stores with SSE1 only.
Summary:
SSE1 only supports v4f32. But does have instructions like movlps/movhps that load/store 64-bits of memory.
This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT, enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size, not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense.
I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/stores nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.
If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64528
llvm-svn: 366034
|
|
|
|
|
|
|
|
|
|
|
|
| |
only 32-bits are loaded.
v2i64 vzload defines a 64-bit memory access. It doesn't look like
we have any coverage for this either way.
Also remove some vzload usages where the instruction loads only
16-bits.
llvm-svn: 364851
|
|
|
|
|
|
|
|
| |
I believe these all get canonicalized to vzext_movl. The only case where that wasn't true was when the load was loadi32 and the load was an extload aligned to 32 bits. But that was fixed in r364207.
Differential Revision: https://reviews.llvm.org/D63701
llvm-svn: 364337
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
tablegen. Use more precise PatFrags for scalar masked load/store.
Rename masked_load/masked_store to masked_ld/masked_st to discourage
their direct use. We need to check truncating/extending and
compressing/expanding before using them. This revealed that
our scalar masked load/store patterns were misusing these.
With those out of the way, renamed masked_load_unaligned and
masked_store_unaligned to remove the "_unaligned". We didn't
check the alignment anyway so the name was somewhat misleading.
Make the aligned versions inherit from masked_load/store instead of
from a separate, identical version. Merge the 3 different alignment
PatFrags into a single version that uses the VT from the SDNode to
determine the size that the alignment needs to match.
llvm-svn: 364150
|
|
|
|
|
|
|
|
| |
X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns.
Something similar was done for vectors in r362535. Removes about 1200 bytes from the isel table.
llvm-svn: 362894
|
|
|
|
|
|
|
|
|
|
| |
Support Intel AVX512 VP2INTERSECT instructions in llvm
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D62366
llvm-svn: 362188
|
|
|
|
|
|
|
|
|
|
| |
extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC
The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags.
The users of these either have their own type constraint or rely on the type constraint system to realize the only legal extend would be to f64.
llvm-svn: 362175
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove a bunch of isel patterns that become unnecessary.
We effectively had a second set of isel patterns that tried to use a
regular store instruction and an extract_subreg instruction. Or a masked move
and an extract_subreg. These patterns were intended to override the
matching of VEXTRACT instructions by taking advantage of the priority
of the explicit immediate 0 for the index.
This patch instead just disables the immediate 0 matching in the VEXTRACT
patterns. That way each of the component pieces of the larger patterns will
match by themselves.
This found a bug of sorts where we didn't use a 128-bit store for a 512->128
extract on KNL. It's unclear what the right thing here should be.
Using the vextract avoids constraining the register allocator to use
xmm0-15. But it always results in a longer encoding if the register
allocator ends up choosing xmm0-15 anyway.
llvm-svn: 361431
|
|
|
|
|
|
|
|
|
|
|
|
| |
don't have any flexibility. NFC
These particular instructions only operate on 128-bit vectors and have no wider equivalents. And the
element size is always known.
One could argue that MOVSS/MOVSD could be merged, but that's probably disruptive to code in
X86ISelLowering and probably low value.
llvm-svn: 360815
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
1. Enable the infrastructure for AVX512_BF16, which supports BFLOAT16 in Cooper Lake;
2. Enable VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
VCVTNE2PS2BF16: Convert Two Packed Single Data to One Packed BF16 Data.
VCVTNEPS2BF16: Convert Packed Single Data to Packed BF16 Data.
VDPBF16PS: Dot Product of BF16 Pairs Accumulated into Packed Single Precision.
For more details about the BF16 ISA, please refer to the latest Intel Architecture Instruction Set Extensions Programming Reference: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
Author: LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, RKSimon, spatel
Reviewed By: craig.topper
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60550
llvm-svn: 360017
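For reference, a small standalone use of these instructions through the C/C++ intrinsics (the file name and -march flag are illustrative assumptions):

    // bf16_demo.cpp -- build with e.g. clang++ -march=cooperlake -c bf16_demo.cpp
    #include <immintrin.h>

    // One step of a dot product of BF16 pairs accumulated into packed single
    // precision: acc += a . b, with the BF16 inputs converted from fp32.
    __m512 bf16_dot_step(__m512 acc, __m512 a_lo, __m512 a_hi,
                         __m512 b_lo, __m512 b_hi) {
      __m512bh a = _mm512_cvtne2ps_pbh(a_hi, a_lo);  // VCVTNE2PS2BF16
      __m512bh b = _mm512_cvtne2ps_pbh(b_hi, b_lo);  // VCVTNE2PS2BF16
      return _mm512_dpbf16_ps(acc, a, b);            // VDPBF16PS
    }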
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: If we have SSE2 we can use a MOVQ to store 64 bits and avoid falling back to a cmpxchg8b loop. If it's a seq_cst store we need to insert an mfence after the store.
Reviewers: spatel, RKSimon, reames, jfb, efriedma
Reviewed By: RKSimon
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60546
llvm-svn: 359368
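A standalone illustration of the case this improves (file name and flags are illustrative): a 64-bit seq_cst atomic store in 32-bit code, which with SSE2 can now be a MOVQ followed by an MFENCE instead of a cmpxchg8b loop.

    // atomic64_store.cpp -- build with e.g. clang++ -m32 -msse2 -O2 -c atomic64_store.cpp
    #include <atomic>
    #include <cstdint>

    std::atomic<uint64_t> counter{0};

    void publish(uint64_t value) {
      // On i386 with SSE2, this sequentially consistent 64-bit store can lower
      // to a single MOVQ to memory plus an MFENCE rather than a cmpxchg8b loop.
      counter.store(value, std::memory_order_seq_cst);
    }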
|
|
|
|
|
|
| |
This changes the operand type from v4f32/v2f64 to iPTR which seems more correct. But that doesn't seem to do anything other than change the comments in X86GenDAGISel.inc. Probably because we use a ComplexPattern to do the matching so there's no autogenerated code to change.
llvm-svn: 357959
|
|
|
|
| |
llvm-svn: 356867
|
|
|
|
|
|
|
|
|
|
|
|
| |
for all other printing functions.
The only thing the print methods currently need to know is the string to print for the memory size in Intel syntax.
This patch merges the functions based on this string. If we ever need something else in the future, it's easy to split them back out.
This reduces the number of cases in the assembly printers. It shrinks the Intel printer to only use 7 bytes per instruction instead of 8.
llvm-svn: 356352
|
|
|
|
|
|
| |
After this we no longer need to match FROUND_CURRENT or FROUND_NO_EXC during isel so I remove those.
llvm-svn: 355807
|
|
|
|
| |
llvm-svn: 355806
|
|
|
|
|
|
| |
direction nodes. Remove rounding mode operand.
llvm-svn: 355805
|
|
|
|
|
|
|
|
| |
_SAE. Remove SAE operand.
No need to explicitly store it and match it during isel.
llvm-svn: 355804
|
|
|
|
| |
llvm-svn: 355803
|
|
|
|
|
|
|
|
| |
Split VFPROUNDS_RND/VFPEXT(S)_RND into versions without a rounding operand.
For VFPEXT(S) we only need a current-rounding-mode version and an SAE version. Neither needs an extra operand.
llvm-svn: 355802
|
|
|
|
|
|
|
|
| |
_RND. Remove rounding operand.
The operand could only be the SAE encoding so no need to include it.
llvm-svn: 355801
|
|
|
|
|
|
|
|
| |
SAE ISD opcode.
Remove matching of FROUND_CURRENT and FROUND_NO_EXC for these nodes from isel table.
llvm-svn: 355800
|
|
|
|
|
|
|
|
|
|
| |
tables.
Instead I plan to have dedicated nodes for FROUND_CURRENT and FROUND_NO_EXC.
This patch starts with FADDS/FSUBS/FMULS/FDIVS/FMAXS/FMINS/FSQRTS.
llvm-svn: 355799
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cvtpd2ps intrinsics.
Summary:
Use X86ISD::VFPROUND in the instruction isel patterns. Add new patterns for ISD::FP_ROUND to maintain support for fptrunc in IR.
In the process I found a couple duplicate isel patterns which I also deleted in this patch.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56991
llvm-svn: 351762
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
instead of expand/compress+select.
Summary:
For compress, a select node doesn't semantically reflect the behavior of the instruction. The mask would have holes in it, but the resulting write is to contiguous elements at the bottom of the vector.
Furthermore, as far as the compressing and expanding are concerned, the behavior depends on the mask. You can't just have an expand/compress node that only reads the input vector. That node would have no meaning by itself.
This all only works because we pattern match the compress/expand+select back to the instruction. But conceivably an optimization of the select could break the pattern and leave something meaningless.
This patch modifies the expand and compress node to take the mask and passthru as additional inputs and gets rid of the select all together.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D57002
llvm-svn: 351761
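A brief sketch of what building the reworked node looks like (the exact operand order is an assumption for illustration):

    // The node itself now carries the mask and passthru, so its semantics no
    // longer depend on a separate, independently optimizable vselect.
    static SDValue buildCompress(SelectionDAG &DAG, const SDLoc &DL, EVT VT,
                                 SDValue Data, SDValue PassThru, SDValue Mask) {
      return DAG.getNode(X86ISD::COMPRESS, DL, VT, Data, PassThru, Mask);
    }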
|
|
|
|
|
|
|
|
| |
cvtuqq2ps nodes that produce less than 128-bits of results.
These nodes zero the upper half of the result and can't be represented with vselect.
llvm-svn: 351666
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef. While the intrinsics are defined to return 0 for out of range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits.
This was previously believed safe because undefs frequently get turned into 0, either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, it's ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in-bounds shift amounts are the same, we might fold it to a single-element broadcast from the constant pool. This would not put 0s in the elements with out-of-bounds shift amounts.
We do have an existing InstCombine optimization to use shl/lshr when the shift amounts are all constant and in bounds. That should prevent some loss of constant folding from this change.
Patch by zhutianyang and Craig Topper
Differential Revision: https://reviews.llvm.org/D56695
llvm-svn: 351381
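A standalone example of the semantic difference being protected here (file name and values are illustrative): the variable-shift intrinsics define out-of-range counts to produce zero, which a plain ISD::SRL cannot model.

    // srlv_demo.cpp -- build with e.g. clang++ -mavx2 srlv_demo.cpp
    #include <immintrin.h>
    #include <cstdio>

    int main() {
      __m128i v   = _mm_set1_epi32((int)0x80000000u);
      __m128i cnt = _mm_setr_epi32(1, 31, 32, 100);  // last two counts are out of range
      __m128i r   = _mm_srlv_epi32(v, cnt);          // lanes 2 and 3 are defined to be 0
      unsigned out[4];
      _mm_storeu_si128(reinterpret_cast<__m128i *>(out), r);
      std::printf("%#x %#x %#x %#x\n", out[0], out[1], out[2], out[3]);
      return 0;
    }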
|
|
|
|
|
|
|
|
|
|
|
|
| |
just before isel table lookup. Remove vselect isel patterns.
This cleans up the duplication we have with both intrinsic isel patterns and vselect isel patterns. This should also allow the intrinsics to get SimplifyDemandedBits support for the condition.
I've switched the canonical pattern in isel to use the X86ISD::BLENDV node instead of VSELECT, since it always seemed weird to move from BLENDV, with its relaxed rules on condition bits, to VSELECT, which has strict rules about all bits of the condition element being the same. It's more correct to go from VSELECT to BLENDV.
Differential Revision: https://reviews.llvm.org/D56771
llvm-svn: 351380
|
|
|
|
|
|
|
|
| |
not just any int.
Removes some type checks from X86GenDAGISel.inc
llvm-svn: 351033
|
|
|
|
|
|
|
|
|
|
| |
VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zeroes the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
Fixes another case from PR34877
llvm-svn: 351018
|
|
|
|
|
|
|
|
|
|
| |
only produces 2 result elements and zeroes the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.
Fixes another case from PR34877.
llvm-svn: 351017
|
|
|
|
|
|
|
|
|
|
| |
The 128-bit input produces 64-bits of output and fills the upper 64-bits with 0. The mask only applies to the lower elements. But we can't represent this with a vselect like we normally do.
This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.
Fixes another instruction for PR34877.
llvm-svn: 350994
|
|
|
|
|
|
|
|
|
|
|
|
| |
the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.
This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.
This fixes some of PR34877
llvm-svn: 350985
|
|
|
|
|
|
| |
We don't need to require the first operand to be an integer because we already said it was the same type as the result which we also constrained to an integer.
llvm-svn: 350455
|
|
|
|
|
|
|
|
|
|
|
|
| |
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves codegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.
This is a followup to D55787 and part of PR40056.
Differential Revision: https://reviews.llvm.org/D55833
llvm-svn: 349520
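For reference, the scalar semantics that the generic node (and @llvm.sadd.sat) implements, written out in plain C++:

    #include <cstdint>
    #include <limits>

    // Equivalent of @llvm.sadd.sat.i16 / ISD::SADDSAT on i16: the mathematically
    // exact sum, clamped to the representable range instead of wrapping.
    int16_t sadd_sat(int16_t a, int16_t b) {
      int32_t s = int32_t(a) + int32_t(b);
      if (s > std::numeric_limits<int16_t>::max()) return std::numeric_limits<int16_t>::max();
      if (s < std::numeric_limits<int16_t>::min()) return std::numeric_limits<int16_t>::min();
      return static_cast<int16_t>(s);
    }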
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.
This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.
Differential Revision: https://reviews.llvm.org/D55787
llvm-svn: 349481
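And the unsigned counterparts for comparison, plain C++ reference semantics of ISD::UADDSAT / ISD::USUBSAT:

    #include <cstdint>

    // @llvm.uadd.sat.i8: clamp to 255 on overflow instead of wrapping.
    uint8_t uadd_sat(uint8_t a, uint8_t b) {
      uint8_t s = uint8_t(a + b);
      return s < a ? uint8_t(0xFF) : s;
    }

    // @llvm.usub.sat.i8: clamp to 0 on underflow.
    uint8_t usub_sat(uint8_t a, uint8_t b) {
      return a > b ? uint8_t(a - b) : uint8_t(0);
    }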
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
*_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes
Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.
This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.
X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.
I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.
The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.
Differential Revision: https://reviews.llvm.org/D54346
llvm-svn: 346784
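A short sketch of the relaxed form described above (the helper name is illustrative): the input vector may now have more elements than the result, as long as it covers them.

    // Zero-extend the low 4 bytes of a v16i8 into a v4i32. Previously the node
    // required input and output to have the same total width; now the wider
    // 128-bit input is accepted directly.
    static SDValue lowZExtToV4i32(SelectionDAG &DAG, SDValue V16i8,
                                  const SDLoc &DL) {
      assert(V16i8.getValueType() == MVT::v16i8 && "expected a v16i8 input");
      return DAG.getNode(ISD::ZERO_EXTEND_VECTOR_INREG, DL, MVT::v4i32, V16i8);
    }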
|
|
|
|
|
|
|
|
|
|
|
|
| |
These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.
This patch removes the promotion and adds isel patterns instead.
The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.
Differential Revision: https://reviews.llvm.org/D53268
llvm-svn: 345408
|