bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] Add floating point execution domain to ↵	Craig Topper	2019-11-30	1	-24/+24
\| \| \| \|	comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions.
*	Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ↵	Craig Topper	2019-10-01	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ops." This seems to be causing some performance regressions that I'm trying to investigate. One thing that stands out is that this transform can increase the live range of the operands of the earlier logic op. This can be bad for register allocation. If there are two logic op inputs we should really combine the one that is closest, but SelectionDAG doesn't have a good way to do that. Maybe we need to do this as a basic block transform in Machine IR. llvm-svn: 373401
*	[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops.	Craig Topper	2019-09-29	1	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's room for improvement here, but this is a decent starting point. There are a few minor regressions in the vector-rotate tests, where we are now forming a vpternlog from an and before we get a chance to form it for a bitselect that we were matching previously. This results in an AND and an ANDN feeding the vpternlog where previously we just had an AND after the vpternlog. I think we can probably DAG combine the AND with the bitselect to get back to similar codegen. llvm-svn: 373172
*	Recommit r367901 "[X86] Enable ↵	Craig Topper	2019-08-07	1	-35/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183
*	Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."	Mitch Phillips	2019-08-06	1	-60/+35
\| \| \| \| \| \| \| \| \|	This reverts commit 3de33245d2c992c9e0af60372043540b60f3a810. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107
*	[X86] Enable -x86-experimental-vector-widening-legalization by default.	Craig Topper	2019-08-05	1	-35/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 367901
*	[X86] Fix the pattern for merge masked vcvtps2pd.	Craig Topper	2019-06-03	1	-9/+34
\| \| \| \| \| \| \| \|	r362199 fixed it for zero masking, but not merge masking. The load folding in the peephole pass hid the bug. This patch turns off the peephole pass on the relevant test to ensure coverage. llvm-svn: 362440
*	[SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205)	Simon Pilgrim	2019-06-03	1	-10/+18
\| \| \| \| \| \| \| \|	We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction. Differential Revision: https://reviews.llvm.org/D62807 llvm-svn: 362397
*	[X86] Remove patterns for X86VSintToFP/X86VUintToFP+loadv4f32 to v2f64.	Craig Topper	2019-05-31	1	-8/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	These patterns can incorrectly narrow a volatile load from 128-bits to 64-bits. Similar to PR42079. Switch to using (v4i32 (bitcast (v2i64 (scalar_to_vector (loadi64))))) as the load pattern used in the instructions. This probably still has issues in 32-bit mode where loadi64 isn't legal. Maybe we should use VZMOVL for widened loads even when we don't need the upper bits as zeroes? llvm-svn: 362203
*	[X86] Add test cases for failure to use 128-bit masked vcvtdq2pd when load ↵	Craig Topper	2019-05-31	1	-0/+106
\| \| \| \| \| \|	starts as v2i32. llvm-svn: 362202
*	[X86] Remove avx512 isel patterns for fpextend+load. Prefer to only match fp ↵	Craig Topper	2019-05-31	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	extloads instead. DAG combine will usually fold fpextend+load to an fp extload anyway. So the 256 and 512 patterns were probably unnecessary. The 128 bit pattern was special in that it looked for a v4f32 load, but then used it in an instruction that only loads 64-bits. This is bad if the load happens to be volatile. We could probably make the patterns volatile aware, but that's more work for something that's probably rare. The peephole pass might kick in and save us anyway. We might also be able to fix this with some additional DAG combines. This also adds patterns for vselect+extload to enable masked vcvtps2pd to be used. Previously we looked for the unlikely vselect+fpextend+load. llvm-svn: 362199
*	[X86] Add test to show missed opportunity to use masked vcvtps2pd for ↵	Craig Topper	2019-05-31	1	-0/+24
\| \| \| \| \| \|	vselect+extload. llvm-svn: 362198
*	[X86] Remove the suffix on vcvt[u]si2ss/sd register variants in assembly ↵	Craig Topper	2019-05-06	1	-103/+103
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	printing. We require d/q suffixes on the memory form of these instructions to disambiguate the memory size. We don't require it on the register forms, but need to support parsing both with and without it. Previously we always printed the d/q suffix on the register forms, but it's redundant and inconsistent with gcc and objdump. After this patch we should support the d/q for parsing, but not print it when it's unneeded. llvm-svn: 360085
*	[X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign mask	Simon Pilgrim	2019-02-04	1	-4/+0
\| \| \| \| \| \| \| \|	For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK) then we can use X directly. Differential Revision: https://reviews.llvm.org/D57667 llvm-svn: 353051
*	[X86] Add a few more fptosi test cases to demonstrate ↵	Craig Topper	2018-12-12	1	-0/+38
\| \| \| \| \| \| \| \|	-x86-experimental-vector-widening legalization not combining vpacksswb+vpmovdw. We are able to combine vpackuswb+vpmovdw, but we didn't have packsswb+vpmovdw at the time that combine was added. llvm-svn: 348909
*	[X86] Add more tests for -x86-experimental-vector-widening-legalization	Craig Topper	2018-11-13	1	-1124/+0
\| \| \| \| \| \| \| \| \| \|	I'm looking into whether we can make this the default legalization strategy. Adding these tests to help cover the changes that will be necessary. This patch adds copies of some tests with the command line switch enabled. By making copies it's easier to compare the two legalization strategies. I've also removed RUN lines from some of these tests that already had -x86-experimental-vector-widening-legalization. llvm-svn: 346745
*	[x86] allow vector load narrowing with multi-use values	Sanjay Patel	2018-11-10	1	-20/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595
*	[TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226)	Simon Pilgrim	2018-10-25	1	-156/+44
\| \| \| \| \| \| \| \| \| \| \| \|	As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well. Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets. The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it. Differential Revision: https://reviews.llvm.org/D53649 llvm-svn: 345256
*	[X86] Add -x86-experimental-vector-widening-legalization run line to ↵	Craig Topper	2018-08-31	1	-0/+1181
\| \| \| \| \| \| \| \|	avx512-cvt.ll This will cover the (v2i32 (setcc v2f32)) case in replaceNodeResults. That code shouldn't be needed at all in this mode. A future patch will skip it. llvm-svn: 341171
*	[X86] Add custom execution domain fixing for 128/256-bit integer logic ↵	Craig Topper	2018-07-15	1	-108/+38
\| \| \| \| \| \| \| \| \| \| \| \|	operations with AVX512F, but not AVX512DQ. AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions. Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences. This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used. llvm-svn: 337137
*	[X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.	Craig Topper	2018-02-19	1	-208/+459
\| \| \| \| \| \|	Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway. llvm-svn: 325534
*	[X86] Custom legalize (v2i32 (setcc (v2f32))) so that we don't end up with a ↵	Craig Topper	2018-02-10	1	-7/+2
\| \| \| \| \| \| \| \| \| \|	(v4i1 (setcc (v4f32))) Under VLX, getSetCCResultType returns v2i1/v4i1 for v2f32/v4f32 so default type legalization will end up changing the setcc result type back to vXi1 if it had been extended. The resulting extend gets messed up further by type legalization and is difficult to recombine back to (v4i32 (setcc (v4f32))) after legalization. I went ahead and enabled this for SSE2 and later since it's always the result we want and this helps type legalization get there in fewer steps. llvm-svn: 324822
*	[X86] Extend inputs with elements smaller than i32 to sint_to_fp/uint_to_fp ↵	Craig Topper	2018-02-10	1	-392/+138
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	before type legalization. This prevents extends of masks being introduced during lowering where it becomes difficult to combine them out. There are a few oddities in here. We sometimes concatenate two k-registers produced by two compares, sign_extend the combined pair, then extract two halves. This worked better previously because the sign_extend wasn't created until after the fp_to_sint was split which led to a split sign_extend being created. We probably also need to custom type legalize (v2i32 (sext v2i1)) via widening. llvm-svn: 324820
*	[X86] Remove some check-prefixes from avx512-cvt.ll to prepare for an ↵	Craig Topper	2018-02-10	1	-259/+325
\| \| \| \| \| \| \| \| \| \|	upcoming patch. The update script sometimes has trouble when there are check-prefixes representing every possible combination of feature flags. I have a patch where the update script was generating something that didn't pass lit. This patch just removes some check-prefixes and expands out some of the checks to work around this. llvm-svn: 324819
*	[X86] Custom legalize (v2i1 (fp_to_uint/fp_to_sint v2f64)) without AVX512VL.	Craig Topper	2018-02-10	1	-96/+21
\| \| \| \| \| \|	Strangely the code was already present, just the setOperationAction wasn't being called without VLX. llvm-svn: 324806
*	[X86] Legalize zero extends from vXi1 to vXi16/vXi32/vXi64 using a sign ↵	Craig Topper	2018-02-10	1	-78/+154
\| \| \| \| \| \| \| \| \| \|	extend and a shift. This avoids a constant pool load to create 1. The int->float are showing converts to mask and back. We probably need to widen inputs to sint_to_fp/uint_to_fp before type legalization. llvm-svn: 324805
*	[X86] Modify a few tests to not use icmps that are provably false.	Craig Topper	2018-02-06	1	-6/+6
\| \| \| \| \| \| \| \|	These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero. I plan to teach DAG combine to optimize these so need to stop using them. llvm-svn: 324315
*	Followup on Proposal to move MIR physical register namespace to '$' sigil.	Puyan Lotfi	2018-01-31	1	-76/+76
\| \| \| \| \| \| \| \| \| \| \| \|	Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes between named physical registers and named vregs. llvm-svn: 323922
*	[X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and ↵	Craig Topper	2018-01-23	1	-64/+30
\| \| \| \| \| \| \| \| \| \|	inserting into a vXi1 vector. The existing code was already doing something very similar to subvector insertion so this allows us to remove the nearly duplicate code. This patch is a little larger than it should be due to differences between the DQI handling between the two today. llvm-svn: 323212
*	[X86] Replace CVT2MASK ISD opcode with PCMPGTM compared to zero.	Craig Topper	2018-01-08	1	-92/+181
\| \| \| \| \| \|	CVT2MASK is just checking the sign bit which can be represented with a comparison with zero. llvm-svn: 321985
*	[X86] Make v2i1 and v4i1 legal types without VLX	Craig Topper	2018-01-07	1	-133/+253
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There are a few oddities that occur due to v1i1, v8i1, v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX. Particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask we have to fill with 0s at the promoted type. It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway. Everything is done on a larger register anyway. This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly. We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added. I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1. Then extract that and don't sign extend it at all. There's definitely room for improvement with some follow up patches. Reviewers: RKSimon, zvi, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41560 llvm-svn: 321967
*	[SelectionDAG] Teach WidenVecOp_Convert to widen the operation if a widened ↵	Craig Topper	2018-01-02	1	-95/+52
\| \| \| \| \| \|	result type would still be legal. llvm-svn: 321638
*	[X86] Promote vXi1 fp_to_uint/fp_to_sint to vXi32 to avoid scalarization.	Craig Topper	2018-01-01	1	-3416/+105
\| \| \| \|	llvm-svn: 321632
*	[X86] Add test cases for vXi1 fptosi/fptoui.	Craig Topper	2018-01-01	1	-0/+3680
\| \| \| \| \| \|	Currently we do a lot of scalarization in these test cases. llvm-svn: 321631
*	[X86] Fix (v2f64 (s/uint_to_fp (v2i1))) to avoid scalarization without AVX512DQ.	Craig Topper	2017-12-24	1	-30/+18
\| \| \| \| \| \|	Previously we extended v2i1 to v2f64 and then tried to use cvtuqq2pd/cvtqq2pd, but that only works with avx512dq. So we ended up scalarizing it. Now we widen to v4i1 first and extend to v4i32. llvm-svn: 321420
*	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.	Francis Visoiu Mistrih	2017-12-07	1	-33/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/<def>//g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 \| xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022
*	[X86] Use vector widening to support sign extend from i1 when the dest type ↵	Craig Topper	2017-12-05	1	-8/+4
\| \| \| \| \| \| \| \| \| \|	is not 512-bits and vlx is not enabled. Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements. If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate. llvm-svn: 319740
*	[X86] Use vector widening to support zero extend from i1 when the dest type ↵	Craig Topper	2017-12-05	1	-14/+9
\| \| \| \| \| \| \| \| \| \|	is not 512-bits and vlx is not enabled. Previously we used a wider element type and truncated. But it's more efficient to keep the element type and drop unused elements. If BWI isn't supported and we have an i16 or i8 type, we'll extend it to be i32 and still use a truncate. llvm-svn: 319728
*	[CodeGen] Unify MBB reference format in both MIR and debug output	Francis Visoiu Mistrih	2017-12-04	1	-174/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 \| xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665
*	[X86] Promote fp_to_sint v16f32->v16i16/v16i8 to avoid scalarization.	Craig Topper	2017-11-29	1	-95/+4
\| \| \| \|	llvm-svn: 319266
*	[X86] Add test cases for fptosi v16f32->v16i8/v16i16 to show scalarization.	Craig Topper	2017-11-29	1	-0/+112
\| \| \| \|	llvm-svn: 319261
*	[X86] Mark ISD::FP_TO_UINT v16i8/v16i16 as Promote under AVX512 instead of ↵	Craig Topper	2017-11-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	legal. Fix infinite loop in op legalization when promotion requires 2 steps. Previously we had an isel pattern to add the truncate. Instead use Promote to add the truncate to the DAG before isel. The Promote legalization code had to be updated to prevent an infinite loop if promotion took multiple steps because it wasn't remembering the previously tried value. llvm-svn: 319259
*	[CodeGen] Print register names in lowercase in both MIR and debug output	Francis Visoiu Mistrih	2017-11-28	1	-34/+34
\| \| \| \| \| \| \| \| \| \| \|	As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187
*	[X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is ↵	Craig Topper	2017-10-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	enabled. Use 128-bit VLX instruction when VLX is enabled. Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior. Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used? llvm-svn: 316978
*	[SelectionDAG] Add VSELECT support to computeKnownBits	Simon Pilgrim	2017-10-30	1	-11/+11
\| \| \| \|	llvm-svn: 316944
*	[SelectionDAG] Add VSELECT support to ComputeNumSignBits	Simon Pilgrim	2017-10-24	1	-2/+2
\| \| \| \|	llvm-svn: 316457
*	[X86][SKX][KNL] Updated regression tests to use -mattr instead of -mcpu ↵	Gadi Haber	2017-09-27	1	-872/+291
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	flag. NFC. NFC. Updated 8 regression tests to use -mattr instead of -mcpu flag as follows: -mcpu=knl --> -mattr=+avx512f -mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq The updates are part of the preparation of a large commit to add all instruction scheduling for the SKX target. Reviewers: delena, zvi, RKSimon Differential Revision: https://reviews.llvm.org/D38222 Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14 llvm-svn: 314306
*	[X86] Teach the execution domain fixing tables to use movlhps in place of ↵	Craig Topper	2017-09-18	1	-65/+65
\| \| \| \| \| \| \| \|	unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509
*	X86 Tests: More AVX512 conversions tests. NFC	Zvi Rackover	2017-09-11	1	-0/+885
\| \| \| \| \| \|	Adding more tests for AVX512 fp<->int conversions that were missing. llvm-svn: 312921
*	X86: Improve AVX512 fptoui lowering	Zvi Rackover	2017-09-07	1	-142/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add patterns for fptoui <16 x float> to <16 x i8> and fptoui <16 x float> to <16 x i16>. Reviewers: igorb, delena, craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37505 llvm-svn: 312704