bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[x86] preserve test intent by removing undef	Sanjay Patel	2018-05-17	1	-26/+20
\| \| \| \| \| \| \| \| \| \| \| \|	We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in https://reviews.llvm.org/D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332648
*	[x86] preserve test intent by removing undef	Sanjay Patel	2018-05-17	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \|	We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in https://reviews.llvm.org/D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332640
*	[X86][BtVer2] ADC/SBB take 2cy on an ALU pipe, not 1cy like ADD/SUB	Simon Pilgrim	2018-05-17	2	-34/+34
\| \| \| \|	llvm-svn: 332616
*	[X86] Split WriteADC/WriteADCRMW scheduler classes	Simon Pilgrim	2018-05-17	1	-16/+16
\| \| \| \| \| \|	For integer ALU instructions taking eflags as an input (ADC/SBB/ADCX/ADOX) llvm-svn: 332605
*	[X86] Update SNB/generic scheduler tests missed from rL332536	Simon Pilgrim	2018-05-16	2	-35/+35
\| \| \| \|	llvm-svn: 332540
*	[X86][SSE] Reduce instruction/register usages for v4i32 vector shifts (PR37441)	Simon Pilgrim	2018-05-16	7	-385/+317
\| \| \| \| \| \| \| \| \| \|	As suggested by Fabian on PR37441, use PSHUFLW to extend shift amount types for use with PSRAD/PSRLD to reduce register pressure. Some of this ideally would be done by combineTargetShuffle, but it's tricky to do as most of the shuffles are sharing inputs. Differential Revision: https://reviews.llvm.org/D46959 llvm-svn: 332524
*	[x86] preserve test intent by removing undef	Sanjay Patel	2018-05-16	1	-29/+32
\| \| \| \| \| \| \| \| \| \| \| \|	We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we make those fixes. llvm-svn: 332501
*	[x86] preserve test intent by removing undef	Sanjay Patel	2018-05-16	1	-11/+11
\| \| \| \| \| \| \| \| \| \| \| \|	We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we make those fixes. llvm-svn: 332500
*	[x86] preserve test intent by removing undef	Sanjay Patel	2018-05-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we make those fixes. llvm-svn: 332499
*	[X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets	Craig Topper	2018-05-16	3	-230/+493
\| \| \| \| \| \| \| \| \| \| \| \|	As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead. Fixes PR3163. Differential Revision: https://reviews.llvm.org/D43441 llvm-svn: 332498
*	[x86] add run with unsafe global param; NFC	Sanjay Patel	2018-05-16	1	-207/+356
\| \| \| \|	llvm-svn: 332486
*	[x86] add tests for DAG FP undef operands; NFC	Sanjay Patel	2018-05-16	1	-0/+497
\| \| \| \|	llvm-svn: 332484
*	[X86][SSE] Fix tests for vector rotates by splat variable.	Simon Pilgrim	2018-05-16	3	-493/+426
\| \| \| \| \| \|	We weren't correctly splatting the offset shift llvm-svn: 332435
*	[X86][SSE] Add tests for vector rotates by splat variable.	Simon Pilgrim	2018-05-15	3	-0/+1525
\| \| \| \|	llvm-svn: 332410
*	[x86][eflags] Fix PR37431 by teaching the EFLAGS copy lowering to specially handle SETB_C* pseudo instructions.	Chandler Carruth	2018-05-15	2	-0/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: While the logic here is somewhat similar to the arithmetic lowering, it is different enough that it made sense to have its own function. I actually tried a bunch of different optimizations here and none worked well so I gave up and just always do the arithmetic based lowering. Looking at code from the PR test case, we actually pessimize a bunch of code when generating these. Because SETB_C* pseudo instructions clobber EFLAGS, we end up creating a bunch of copies of EFLAGS to feed multiple SETB_C* pseudos from a single set of EFLAGS. This in turn causes the lowering code to ruin all the clever code generation that SETB_C* was hoping to achieve. None of this is needed. Whenever we're generating multiple SETB_C* instructions from a single set of EFLAGS we should instead generate a single maximally wide one and extract subregs for all the different desired widths. That would result in substantially better code generation. But this patch doesn't attempt to address that. The test case from the PR is included as well as more directed testing of the specific lowering pattern used for these pseudos. Reviewers: craig.topper Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D46799 llvm-svn: 332389
*	[X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes	Simon Pilgrim	2018-05-15	2	-11/+11
\| \| \| \| \| \| \| \|	BtVer2 - Fixes schedules for (V)CVTPS2PD instructions. A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first. llvm-svn: 332376
*	[DAG] propagate FMF for all FPMathOperators	Sanjay Patel	2018-05-15	3	-50/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. It gets us most of the FMF functionality that we want without adding any state bits to the flags. It also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch. It should provide a superset of the functionality from D46563 - the extra tests show propagation and codegen diffs for fcmp, vecreduce, and FP libcalls. The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, so that shouldn't make any difference. Differential Revision: https://reviews.llvm.org/D46854 llvm-svn: 332358
*	[X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes	Simon Pilgrim	2018-05-15	1	-2/+2
\| \| \| \| \| \| \| \| \|	Btver2 - VCVTPH2PSYrm needs to double pump the AGU. Broadwell - missing VCVTPS2PH*mr stores extra latency. Allows us to remove the WriteCvtF2FSt conversion store class. llvm-svn: 332357
*	[X86] Improve unsigned saturation downconvert detection.	Artur Gainullin	2018-05-15	2	-58/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: New unsigned saturation downconvert patterns detection was implemented in X86 Codegen: (truncate (smin (smax (x, C1), C2)) to dest_type), where C1 >= 0 and C2 is unsigned max of destination type. (truncate (smax (smin (x, C2), C1)) to dest_type) where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2. These two patterns are equivalent to: (truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type) Reviewers: RKSimon Subscribers: llvm-commits, a.elovikov Differential Revision: https://reviews.llvm.org/D45315 llvm-svn: 332336
*	[X86] Add fast isel tests for some of the avx512 truncate intrinsics to match current clang codegen.	Craig Topper	2018-05-15	4	-0/+451
\| \| \| \| \| \|	llvm-svn: 332326
*	[X86] Remove and autoupgrade avx512.vbroadcast.ss/avx512.vbroadcast.sd intrinsics.	Craig Topper	2018-05-14	2	-20/+20
\| \| \| \| \| \|	llvm-svn: 332271
*	[X86][BtVer2] Fix MMX/YMM integer vector nt store schedules	Simon Pilgrim	2018-05-14	2	-2/+2
\| \| \| \| \| \|	MMX was missing and YMM was tagged as a fp nt store llvm-svn: 332269
*	[X86] Add fast isel test cases for the clang output for 512-bit cvtps2pd related intrinsics.	Craig Topper	2018-05-14	1	-0/+92
\| \| \| \| \| \|	llvm-svn: 332214
*	[X86] Remove and autoupgrade the cvtusi2sd intrinsic. Use uitofp+insertelement instead.	Craig Topper	2018-05-14	2	-11/+11
\| \| \| \| \| \|	llvm-svn: 332206
*	[X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512.	Craig Topper	2018-05-13	1	-12/+6
\| \| \| \| \| \| \| \|	This matches what we do for sint_to_fp. llvm-svn: 332205
*	[X86] Add fast-isel test cases for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss.	Craig Topper	2018-05-13	1	-0/+98
\| \| \| \| \| \|	llvm-svn: 332204
*	[X86] Remove and autoupgrade masked vpermd/vpermps intrinsics.	Craig Topper	2018-05-13	2	-40/+40
\| \| \| \|	llvm-svn: 332198
*	Follow-up to rL332176 by adding a test case for PR37264.	Dimitry Andric	2018-05-13	1	-0/+12
\| \| \| \| \| \|	Noticed by Simon Pilgrim. llvm-svn: 332197
*	[X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions.	Craig Topper	2018-05-13	1	-1/+1
\| \| \| \| \| \|	llvm-svn: 332189
*	[X86] Remove some unused CHECK lines from tests.	Craig Topper	2018-05-13	2	-12/+0
\| \| \| \|	llvm-svn: 332188
*	[X86] Remove and autoupgrade legacy cvtss2sd intrinsics.	Craig Topper	2018-05-13	3	-85/+55
\| \| \| \|	llvm-svn: 332187
*	[X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time.	Craig Topper	2018-05-12	10	-127/+118
\| \| \| \| \| \|	llvm-svn: 332186
*	[X86] Remove some unused masked conversion intrinsics that can be replaced with an older intrinsic and a select.	Craig Topper	2018-05-12	2	-144/+144
\| \| \| \| \| \| \| \|	This is what clang already uses. llvm-svn: 332170
*	[X86] Remove and autoupgrade a bunch of FMA intrinsics that are no longer used by clang.	Craig Topper	2018-05-11	2	-24/+12
\| \| \| \| \| \|	llvm-svn: 332146
*	[DAGCombiner] Set the right SDLoc on extended SETCC uses (7/N)	Vedant Kumar	2018-05-11	2	-11/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to reflect a simplification to the load (ExtLoad). Based on my reading, ExtendSetCCUses may create new nodes to extend a constant attached to a SETCC. It also creates fresh SETCC nodes which refer to any updated operands. ISTM that the location applied to the new constant and SETCC nodes should be the same as the location of the ExtLoad. This was suggested by Adrian in https://reviews.llvm.org/D45995. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46216 llvm-svn: 332119
*	[DAGCombiner] Set the right SDLoc on a newly-created sextload (6/N)	Vedant Kumar	2018-05-11	8	-293/+390
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This teaches tryToFoldExtOfLoad to set the right location on a newly-created extload. With that in place, the logic for performing a certain ([s\|z]ext (load ...)) combine becomes identical for sexts and zexts, and we can get rid of one copy of the logic. The test case churn is due to dependencies on IROrders inherited from the wrong SDLoc. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46158 llvm-svn: 332118
*	[X86][BtVer2] Model ymm move as double pumped instructions	Simon Pilgrim	2018-05-11	1	-2/+2
\| \| \| \| \| \|	We still need to handle mmx/xmm moves as 'decode-only' no-pipe instructions llvm-svn: 332109
*	Use iteration instead of recursion in CFIInserter	Sanjoy Das	2018-05-11	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This recursive step can overflow the stack. Reviewers: djokov, petarj Subscribers: mcrosier, jlebar, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D46671 llvm-svn: 332101
*	[X86] Split WriteF/WriteVec Move/Load/Store scheduler classes by vector width	Simon Pilgrim	2018-05-11	4	-145/+145
\| \| \| \| \| \|	Fixes a SNB issue that was missing vlddqu/vmovntdqa ymm instructions llvm-svn: 332094
*	[X86] Add new patterns for masked scalar load/store to match clang's codegen from r331958.	Craig Topper	2018-05-10	1	-0/+153
\| \| \| \| \| \| \| \| \| \| \| \|	Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets. So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened. We may be able to drop some of the old patterns, but I leave that for a future patch. llvm-svn: 332049
*	[X86] Split WriteVecALU/WriteVecLogic/WriteShuffle/WriteVarShuffle/WritePSADBW/WritePHAdd scheduler classes	Simon Pilgrim	2018-05-10	4	-65/+65
\| \| \| \| \| \| \| \|	Split off XMM classes from the default (MMX) classes. llvm-svn: 331999
*	[x86] fix fmaxnum/fminnum with nnan	Sanjay Patel	2018-05-10	2	-46/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With nnan, there's no need for the masked merge / blend sequence (that probably costs much more than the min/max instruction). Somewhere between clang 5.0 and 6.0, we started producing these intrinsics for fmax()/fmin() in C source instead of libcalls or fcmp/select. The backend wasn't prepared for that, so we regressed perf in those cases. Note: it's possible that other targets have similar problems as seen here. Noticed while investigating PR37403 and related bugs: https://bugs.llvm.org/show_bug.cgi?id=37403 The IR FMF propagation cases still don't work. There's a proposal that might fix those cases in D46563. llvm-svn: 331992
*	[x86] fix test names; NFC	Sanjay Patel	2018-05-10	2	-7/+7
\| \| \| \|	llvm-svn: 331989
*	[x86] add tests for maxnum/minnum intrinsics with nnan; NFC	Sanjay Patel	2018-05-10	2	-0/+197
\| \| \| \| \| \| \| \| \| \| \| \|	Clang 6.0 was updated to create these intrinsics rather than libcalls or fcmp/select, but the backend wasn't prepared to handle that optimally. This bug is not the primary reason for PR37403: https://bugs.llvm.org/show_bug.cgi?id=37403 ...but it's probably more important for x86 perf. llvm-svn: 331988
*	[DAG] Avoid using deleted node in rebuildSetCC	Nirav Dave	2018-05-10	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The combine in rebuildSetCC may be combined to another node leaving our references stale. Keep a handle on it to avoid stale references. Fixes PR36602. Reviewers: dbabokin, RKSimon, eli.friedman, davide Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D46404 llvm-svn: 331985
*	[X86] ptwrite intrinsic	Gabor Buella	2018-05-10	2	-0/+81
\| \| \| \| \| \| \| \| \| \|	Reviewers: craig.topper, RKSimon Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D46539 llvm-svn: 331961
*	[GlobalISel][Legalizer] Widening the second src op of shifts bug fix	Roman Tereshin	2018-05-09	3	-4/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its value as a (small) unsigned integer, therefore it's incorrect to widen it in any way but by zero extending it. G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their destination and first source operands, but not the "number of bits to shift" operand). Generally, shifts aren't as similar to regular binary operations as it might seem: for instance, they are neither commutative nor associative, and the second source operand usually requires special treatment. Reviewers: bogner, javed.absar, aivchenk, rovka Reviewed By: bogner Subscribers: igorb, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D46413 llvm-svn: 331926
*	[DAGCombiner] In visitBITCAST when trying to constant fold the bitcast, only call getBitcast if it's an fp->int or int->fp conversion even when before legalize ops.	Craig Topper	2018-05-09	1	-4/+2
\| \| \| \| \| \| \| \| \| \|	Previously if !LegalOperations we would blindly call getBitcast and hope that getNode would constant fold it. But if the conversion is between a vector and a scalar, getNode has no simplification. This means we would just get back the original N. We would then return that N which would make the caller of visitBITCAST think that we used CombineTo and did our own worklist management. This prevents target specific optimizations from being called for vector/scalar bitcasts until after legal operations. llvm-svn: 331896
*	[X86] Cleanup WriteFStore/WriteVecStore schedules	Simon Pilgrim	2018-05-09	1	-2/+2
\| \| \| \| \| \| \| \|	MOVNTPD/MOVNTPS should be WriteFStore. Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default; its cost gets increased but is still nowhere near the real cost of that nasty instruction. llvm-svn: 331864
*	[X86] Combine (vXi1 (bitcast (-1))) and (vXi1 (bitcast (0))) to all ones or all zeros vXi1 vector.	Craig Topper	2018-05-09	1	-0/+38
\| \| \| \| \| \|	llvm-svn: 331847
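
Note on r332336 ([X86] Improve unsigned saturation downconvert detection): the commit message states that two signed clamp patterns are equivalent to a umin-based form when C1 >= 0, C1 <= C2, and C2 is the unsigned max of the destination type. The sketch below is not LLVM code. It models the three DAG patterns with plain integer arithmetic, assuming an i16-range source truncated to i8 (so C2 = 255), and checks the equivalence exhaustively. All function names are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Pattern 1 from the commit message: (truncate (smin (smax (x, C1), C2)))
inline uint8_t sat_smin_smax(int32_t x, int32_t c1, int32_t c2) {
  return static_cast<uint8_t>(std::min(std::max(x, c1), c2));
}

// Pattern 2 from the commit message: (truncate (smax (smin (x, C2), C1)))
inline uint8_t sat_smax_smin(int32_t x, int32_t c1, int32_t c2) {
  return static_cast<uint8_t>(std::max(std::min(x, c2), c1));
}

// Target form: (truncate (umin (smax (x, C1), unsigned_max_of_dest_type)))
inline uint8_t sat_umin_smax(int32_t x, int32_t c1) {
  // With C1 >= 0 the smax result is non-negative, so reinterpreting it as
  // unsigned does not change its value.
  uint32_t lower_clamped = static_cast<uint32_t>(std::max(x, c1));
  return static_cast<uint8_t>(std::min<uint32_t>(lower_clamped, 255u));
}

// Returns true if all three forms agree for every i16-range input.
// Assumes 0 <= c1 <= 255, matching the preconditions in the commit message.
bool patterns_agree(int32_t c1) {
  const int32_t c2 = 255; // unsigned max of the 8-bit destination type
  for (int32_t x = INT16_MIN; x <= INT16_MAX; ++x) {
    const uint8_t expected = sat_umin_smax(x, c1);
    if (sat_smin_smax(x, c1, c2) != expected) return false;
    if (sat_smax_smin(x, c1, c2) != expected) return false;
  }
  return true;
}
```

Calling patterns_agree(0) covers the common clamp-to-[0, 255] case. Other values of C1 within [0, 255] exercise the C1 <= C2 condition that the second pattern requires.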