bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[DAGCombiner] [NFC] Improve X div/rem 1 fold	David Bolvansky	2018-09-28	1	-8/+5
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52661 llvm-svn: 343349
*	llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)	Fangrui Song	2018-09-27	4	-12/+9
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: The convenience wrapper in STLExtras is available since rL342102. Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D52573 llvm-svn: 343163
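The change above swaps iterator-pair calls for a range-based call. A minimal sketch of the idea in plain C++ follows; `sortRange`, `sortedDescending`, and `checkSortRange` are hypothetical names made up for this illustration and are not LLVM's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical range-based wrapper (illustration only, not LLVM's code):
// forwards the whole container to std::sort.
template <typename Container, typename Compare>
void sortRange(Container &&C, Compare Comp) {
  std::sort(std::begin(C), std::end(C), Comp);
}

// Returns a copy of V sorted in descending order.
inline std::vector<int> sortedDescending(std::vector<int> V) {
  // Iterator-pair form would be: std::sort(V.begin(), V.end(), cmp);
  // Range form:
  sortRange(V, [](int A, int B) { return A > B; });
  return V;
}

// Small usage check.
inline void checkSortRange() {
  std::vector<int> Expected{3, 2, 1};
  assert(sortedDescending({3, 1, 2}) == Expected);
}
```

The range form removes the chance of passing mismatched `begin`/`end` iterators from two different containers.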
*	[DAG] SelectionDAGLegalize::ExpandLegalINT_TO_FP - use getFPExtendOrRound ↵	Simon Pilgrim	2018-09-26	1	-11/+1
\| \| \| \| \| \| \| \|	helper. NFCI. Handles SrcVT == DstVT as well. llvm-svn: 343121
*	[DAG] ExpandLegalINT_TO_FP - pull out repeated getValueType() call. NFCI.	Simon Pilgrim	2018-09-26	1	-9/+9
\| \| \| \|	llvm-svn: 343101
*	[CodeGen] Enable tail calls for functions with NonNull attributes.	David Green	2018-09-26	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	Adding NonNull as attributes to returned pointers has the unfortunate side effect of disabling tail calls. This patch ignores the NonNull attribute when we decide whether to tail merge, in the same way that we ignore the NoAlias attribute, as it has no effect on the call sequence. Differential Revision: https://reviews.llvm.org/D52238 llvm-svn: 343091
*	Run VerifyDAGDiverence in debug only	Mikael Nilsson	2018-09-26	2	-0/+16
\| \| \| \| \| \| \| \| \|	VerifyDAGDiverence costs compilation time, so avoid running it in non-debug builds. Differential Revision: https://reviews.llvm.org/D52454 llvm-svn: 343086
*	[DAGCombiner] Remove unnecessary check for visitSDIVLike/visitUDIVLike ↵	Craig Topper	2018-09-25	1	-2/+1
\| \| \| \| \| \| \| \|	returning a UDIVREM or SDIVREM node. This shouldn't be possible and is a leftover from when we used to recursively call combine here. llvm-svn: 343049
*	Unify landing pad information adding routines (NFC)	Heejin Ahn	2018-09-25	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We have `llvm::addLandingPadInfo` and `MachineFunction::addLandingPad`, both of which add landing pad information to populate `LandingPadInfo` but are called from different locations, which was confusing. This patch unifies them into one `MachineFunction::addLandingPad` function, which now has the functionality of both. Reviewers: rnk Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52428 llvm-svn: 343018
*	[x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)	Sanjay Patel	2018-09-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the final (I hope!) problem pattern mentioned in PR37749: https://bugs.llvm.org/show_bug.cgi?id=37749 We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. I.e., splitting to 128-bit does not look like a win in most cases with >1 256-bit op. The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence:
	  t29: v8i32 = concat_vectors t27, t28
	  t30: v4i64 = bitcast t29
	  t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
	  t31: v4i64 = bitcast t18
	  t32: v4i64 = xor t30, t31
	  t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
	  t34: v4i64 = bitcast t9
	  t35: v4i64 = and t32, t34
	  t36: v8i32 = bitcast t35
	  t37: v4i32 = extract_subvector t36, Constant:i64<0>
	  t38: v4i32 = extract_subvector t36, Constant:i64<4>
	Differential Revision: https://reviews.llvm.org/D52318 llvm-svn: 343008
*	[LegalizeDAG] Prune Predecessor check in ↵	Nirav Dave	2018-09-25	1	-0/+1
\| \| \| \| \| \|	ExpandExtractFromVectorThroughStack. NFCI. llvm-svn: 342985
*	[DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI.	Nirav Dave	2018-09-25	1	-4/+36
\| \| \| \| \| \| \|	Reuse search space bookkeeping across multiple predecessor checks to avoid redundancy. This should cut search cost by ~4x. llvm-svn: 342984
*	[DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. ↵	Nirav Dave	2018-09-25	1	-2/+9
\| \| \| \| \| \|	NFCI. llvm-svn: 342983
*	[DAGCombine] Don't fold dependent loads across SELECT_CC.	Nirav Dave	2018-09-25	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC after the select, resulting in a select of an address and a single load after. If either of the loads depends on the other, this is not legal as it could introduce cycles. However, it only checked this if the opcode was a SELECT, and not for a SELECT_CC. Unfortunately, the only reproducer I have for this is for our downstream target. I've tried getting it to trigger on an upstream one but haven't been successful. Patch thanks to Bevin Hansson. llvm-svn: 342980
*	[DAGCombiner] use UADDO to optimize saturated unsigned add	Sanjay Patel	2018-09-24	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar to the justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886
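The saturated unsigned add described above can be written in several equivalent source-level forms. The sketch below shows three of them for 32-bit values. The function names are hypothetical, and the mapping of the first two forms to the "using_cmp_sum" and "using_cmp_notval" test names is an assumption based on the wording of the message, not taken from the test files.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Form 1 (assumed to correspond to "using_cmp_sum"): the wrapped sum is
// smaller than an operand exactly when the add overflowed.
inline uint32_t saturatingAddCmpSum(uint32_t X, uint32_t Y) {
  uint32_t Sum = X + Y; // wraps modulo 2^32
  return Sum < X ? std::numeric_limits<uint32_t>::max() : Sum;
}

// Form 2 (assumed to correspond to "using_cmp_notval"): ~X == MAX - X,
// so Y > ~X exactly when X + Y exceeds MAX.
inline uint32_t saturatingAddCmpNotVal(uint32_t X, uint32_t Y) {
  return Y > ~X ? std::numeric_limits<uint32_t>::max() : X + Y;
}

// Form 3: an add that also reports overflow, in the spirit of a UADDO node.
struct AddWithOverflow {
  uint32_t Sum;
  bool Overflow;
};

inline AddWithOverflow uaddo(uint32_t X, uint32_t Y) {
  uint32_t Sum = X + Y;
  return {Sum, Sum < X};
}

inline uint32_t saturatingAddUaddo(uint32_t X, uint32_t Y) {
  AddWithOverflow R = uaddo(X, Y);
  // Sanity check: the overflow bit agrees with the "notval" comparison.
  assert(R.Overflow == (Y > ~X));
  return R.Overflow ? std::numeric_limits<uint32_t>::max() : R.Sum;
}
```

On a target whose add instruction sets a carry flag, the third form needs no separate compare, which is the saving the message is after.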
*	Remove debug printf leftover from r342397	Hans Wennborg	2018-09-24	1	-2/+0
\| \| \| \|	llvm-svn: 342863
*	[DAGCombiner] Remove some dead code from ConstantFoldBITCASTofBUILD_VECTOR	Craig Topper	2018-09-24	1	-9/+2
\| \| \| \| \| \|	This code handled SCALAR_TO_VECTOR being returned by the recursion, but the code that used to return SCALAR_TO_VECTOR was removed in 2015. llvm-svn: 342856
*	[DAGCombiner] Clarify a comment. NFC	Craig Topper	2018-09-23	1	-2/+4
\| \| \| \| \| \|	This comment was misleading about why we were restricting to before legalize types. The reason given would only apply to before legalize ops. But there is a before legalize types reason that should also be listed. llvm-svn: 342851
*	[LegalizeTypes] Fix bad indentation. NFC	Craig Topper	2018-09-23	1	-1/+1
\| \| \| \|	llvm-svn: 342850
*	[DAGCombiner][x86] extend decompose of integer multiply into shift/add with ↵	Sanjay Patel	2018-09-23	1	-6/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	negation This is an alternative to https://reviews.llvm.org/D37896. We can't decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some existing code that overlaps with this transform. This extends D52195 and may resolve PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 (still an open question about transforming legal vector multiplies, but we could open another bug report for those) llvm-svn: 342844
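As a rough illustration of what decomposing a multiply into shift/add means, the sketch below spells out the arithmetic identities for constants of the form 2^N + 1 and 2^N - 1 and their negations. Which exact constants the patch handles is an assumption here, and the function names are hypothetical. Unsigned arithmetic is used so that wraparound is well defined.

```cpp
#include <cassert>
#include <cstdint>

// x * (2^N + 1) == (x << N) + x
inline uint32_t mulPow2Plus1(uint32_t X, unsigned N) {
  assert(N < 32 && "shift amount must be smaller than the bit width");
  return (X << N) + X;
}

// x * (2^N - 1) == (x << N) - x
inline uint32_t mulPow2Minus1(uint32_t X, unsigned N) {
  assert(N < 32 && "shift amount must be smaller than the bit width");
  return (X << N) - X;
}

// x * -(2^N - 1) == x - (x << N)
inline uint32_t mulNegPow2Minus1(uint32_t X, unsigned N) {
  assert(N < 32 && "shift amount must be smaller than the bit width");
  return X - (X << N);
}

// x * -(2^N + 1) == -((x << N) + x)
inline uint32_t mulNegPow2Plus1(uint32_t X, unsigned N) {
  assert(N < 32 && "shift amount must be smaller than the bit width");
  return 0u - ((X << N) + X);
}

// Compares each decomposition against a real multiply (modulo 2^32).
inline bool decompositionsMatch(uint32_t X, unsigned N) {
  uint32_t P = uint32_t(1) << N;
  return mulPow2Plus1(X, N) == X * (P + 1u) &&
         mulPow2Minus1(X, N) == X * (P - 1u) &&
         mulNegPow2Minus1(X, N) == X * (0u - (P - 1u)) &&
         mulNegPow2Plus1(X, N) == X * (0u - (P + 1u));
}
```

Whether the shift/add sequence is actually cheaper than a multiply depends on the target, which is why the message insists on a target hook.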
*	[DAGCombiner] Simplify some code in visitBITCAST. NFCI	Craig Topper	2018-09-22	1	-9/+3
\| \| \| \|	llvm-svn: 342826
*	[DAGCombiner] Rewrite r331896 in a different way to address a FIXME. NFCI	Craig Topper	2018-09-22	1	-11/+14
\| \| \| \|	llvm-svn: 342809
*	[SelectionDAG] replace duplicated peekThroughBitcast helper functions; NFCI	Sanjay Patel	2018-09-20	2	-38/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant. Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code. No functional change intended. I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps. Differential Revision: https://reviews.llvm.org/D52285 llvm-svn: 342669
*	[SelectionDAG] allow vector types with isBitwiseNot()	Sanjay Patel	2018-09-19	2	-4/+5
\| \| \| \| \| \| \|	The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits, and the test diff in add.ll is from a DAGCombiner transform. llvm-svn: 342594
*	Copy utilities updated and added for MI flags	Michael Berg	2018-09-19	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds a GlobalIsel copy utility into MI for flags and updates the instruction emitter for the SDAG path. Some tests show new behavior and I added one for GlobalIsel which mirrors an SDAG test for handling nsw/nuw. Reviewers: spatel, wristow, arsenm Reviewed By: arsenm Subscribers: wdng Differential Revision: https://reviews.llvm.org/D52006 llvm-svn: 342576
*	[DAGCombiner][x86] add transform/hook to decompose integer multiply into ↵	Sanjay Patel	2018-09-19	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	shift/add This is an alternative to D37896. I don't see a way to decompose multiplies generically without a target hook to tell us when it's profitable. ARM and AArch64 may be able to remove some duplicate code that overlaps with this transform. As a first step, we're only getting the most clear wins on the vector examples requested in PR34474: https://bugs.llvm.org/show_bug.cgi?id=34474 As noted in the code comment, it's likely that the x86 constraints are tighter than necessary, but it may not always be a win to replace a pmullw/pmulld. Differential Revision: https://reviews.llvm.org/D52195 llvm-svn: 342554
*	ScheduleDAG: Cleanup dumping code; NFC	Matthias Braun	2018-09-19	5	-20/+32
\| \| \| \| \| \| \| \| \| \| \| \|	- Instead of having both `SUnit::dump(ScheduleDAG)` and `ScheduleDAG::dumpNode(ScheduleDAG)`, just keep the latter around.
	- Add `ScheduleDAG::dump()` and avoid code duplication in several places. Implement it for different ScheduleDAG variants.
	- Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()` functions. They were only ever used for debug dumping and putting the function into ScheduleDAG is consistent with the `dumpNode()` change.
	llvm-svn: 342520
*	Revert "Revert r342183 "[DAGCombine] Fix crash when store merging created an ↵	Amara Emerson	2018-09-17	1	-1/+10
\| \| \| \| \| \| \| \|	extract_subvector with invalid index."" Fixed the assertion failure. llvm-svn: 342397
*	[DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)	Sanjay Patel	2018-09-16	3	-1/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040. Copying the motivational statement by @evandro from that patch: "This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp, 447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks. Otherwise, no regressions on x86-64 or A64." I'm proposing to add only the minimum support for a DAG node here. Since we don't have an LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I don't think we need to worry about DAG builder, legalization, a strict variant, etc. We should be able to expand as needed when adding more functionality/transforms. For reference, these are transform suggestions currently listed in SimplifyLibCalls.cpp:
	  // * cbrt(expN(X)) -> expN(x/3)
	  // * cbrt(sqrt(x)) -> pow(x,1/6)
	  // * cbrt(cbrt(x)) -> pow(x,1/9)
	Also, given that we bail out on long double for now, there should not be any logical differences between platforms (unless there's some platform out there that has pow() but not cbrt()). Differential Revision: https://reviews.llvm.org/D51753 llvm-svn: 342348
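To make the pow(x, 1/3) versus cbrt(x) relationship concrete, here is a small standard-library sketch. The helper names are hypothetical. The note about negative inputs is an observation about the C math functions, not a statement of the conditions the patch checks.

```cpp
#include <cassert>
#include <cmath>

// The form before the rewrite: a general power function call.
inline double cubeRootViaPow(double X) { return std::pow(X, 1.0 / 3.0); }

// The form after the rewrite: a dedicated cube-root call.
inline double cubeRootViaCbrt(double X) { return std::cbrt(X); }

// Relative comparison; the two forms may differ in the last few bits.
inline bool closeEnough(double A, double B, double RelTol = 1e-12) {
  return std::fabs(A - B) <=
         RelTol * std::fmax(std::fabs(A), std::fabs(B));
}

// Usage check for a positive input, where both forms agree closely.
inline void checkCubeRoot() {
  assert(closeEnough(cubeRootViaPow(27.0), cubeRootViaCbrt(27.0)));
}
```

The two forms are not interchangeable for every input: `std::pow(-8.0, 1.0 / 3.0)` is NaN, while `std::cbrt(-8.0)` is -2, so a compiler can presumably make this substitution only under relaxed floating-point rules.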
*	Revert r342183 "[DAGCombine] Fix crash when store merging created an ↵	Reid Kleckner	2018-09-14	1	-8/+1
\| \| \| \| \| \| \| \| \|	extract_subvector with invalid index." Causes 'isVector() && "Invalid vector type!"' assertion when building Skia in Chrome. llvm-svn: 342265
*	Fix debug info for SelectionDAG legalization of DAG nodes with two results.	Adrian Prantl	2018-09-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the debug info handling for SelectionDAG legalization of DAG nodes with two results. When a replaced SDNode has more than one result, transferDbgValues was always copying the SDDbgValue from the first result and attaching them to all members. In reality SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes (though the type signature doesn't make this obvious; cf. the call site code in ReplaceNode()). rdar://problem/44162227 Differential Revision: https://reviews.llvm.org/D52112 llvm-svn: 342264
*	fix noasserts build	Adrian Prantl	2018-09-14	1	-0/+2
\| \| \| \|	llvm-svn: 342247
*	SelectionDAG: Add compact SDDbgValue representation to -dag-dump-verbose output	Adrian Prantl	2018-09-14	2	-0/+19
\| \| \| \|	llvm-svn: 342245
*	fix typos	Adrian Prantl	2018-09-14	1	-1/+1
\| \| \| \|	llvm-svn: 342241
*	[DAGCombine] Fix crash when store merging created an extract_subvector with ↵	Amara Emerson	2018-09-13	1	-1/+8
\| \| \| \| \| \| \| \|	invalid index. Differential Revision: https://reviews.llvm.org/D51831 llvm-svn: 342183
*	DAG: Fix expansion of unaligned FP loads and stores	Matt Arsenault	2018-09-13	1	-4/+6
\| \| \| \| \| \| \| \| \|	This was trying to scalarize a scalar FP type, resulting in an assert. Fixes unaligned f64 stack stores for AMDGPU. llvm-svn: 342132
*	[DAGCombiner] improve formatting for select+setcc code; NFC	Sanjay Patel	2018-09-12	1	-17/+15
\| \| \| \|	llvm-svn: 342095
*	[SelectionDAG] Remove some code from PromoteIntOp_MGATHER that handles ↵	Craig Topper	2018-09-12	1	-8/+1
\| \| \| \| \| \| \| \|	UpdateNodeOperands returning an existing node instead of updating. I suspect this became unnecessary when the CSE of mgather was fixed in r338080. It may still be possible to hit this if we widen the element type of a gather outside of type legalization and then promote the mask of a separate gather node so they become the same. But that seems pretty unlikely. llvm-svn: 342022
*	DAG: Handle odd vector sizes in calling conv splitting	Matt Arsenault	2018-09-10	1	-12/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This already worked if only one register piece was used, but didn't if a type was split into multiple, unequal sized pieces. Fixes not splitting v3i16/v3f16 into two registers for AMDGPU. This will also allow fixing the ABI for 16-bit vectors in a future commit so that it's the same for all subtargets. llvm-svn: 341801
*	[SelectionDAG] enhance vector demanded elements to look at a vector select ↵	Sanjay Patel	2018-09-09	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	condition operand This is the DAG equivalent of D51433. If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition. The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit vectors because we don't need those anyway. I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed to be running there? Differential Revision: https://reviews.llvm.org/D51696 llvm-svn: 341762
*	[DAGCombiner] foldBitcastedFPLogic - Add basic vector support	Simon Pilgrim	2018-09-07	1	-8/+8
\| \| \| \| \| \| \| \|	Add support for bitcasts from float type to an integer type of the same element bitwidth. There may be cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet. llvm-svn: 341652
*	[DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))	Sanjay Patel	2018-09-05	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code. This is the basic case. Some potential enhancements are in the TODO comments:
	1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
	2. If we have fewer fast-math-flags, generate code to avoid -0.0 and/or INF.
	3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).
	Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests). Differential Revision: https://reviews.llvm.org/D51630 llvm-svn: 341481
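The -0.0 and INF caveat in the TODO list can be seen directly with the C math functions. The sketch below uses hypothetical helper names and only compares the two source-level forms. It does not model the DAG transform or its fast-math-flag checks.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The form before the rewrite.
inline double quarterPowerViaPow(double X) { return std::pow(X, 0.25); }

// The form after the rewrite: two nested square roots.
inline double quarterPowerViaSqrt(double X) {
  return std::sqrt(std::sqrt(X));
}

// Usage check: both forms give 2 for an input of 16.
inline void checkQuarterPower() {
  assert(std::fabs(quarterPowerViaPow(16.0) - 2.0) < 1e-12);
  assert(std::fabs(quarterPowerViaSqrt(16.0) - 2.0) < 1e-12);
}
```

The edge cases differ: for -0.0 the `pow` form returns +0.0 while the nested `sqrt` form returns -0.0, and for negative infinity the `pow` form returns +infinity while the nested `sqrt` form returns NaN.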
*	DAG: Factor out helper function for odd vector sizes	Matt Arsenault	2018-09-04	1	-22/+28
\| \| \| \|	llvm-svn: 341392
*	[CodeGen] Fix remaining zext() assertions in SelectionDAG	Scott Linder	2018-09-04	2	-16/+14
\| \| \| \| \| \| \| \|	Fix remaining cases not committed in https://reviews.llvm.org/D49574 Differential Revision: https://reviews.llvm.org/D50659 llvm-svn: 341380
*	DAG: Handle extract_vector_elt in isKnownNeverNaN	Matt Arsenault	2018-09-03	1	-0/+3
\| \| \| \|	llvm-svn: 341317
*	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle inverted pattern	Roman Lebedev	2018-09-02	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A follow-up for D49266 / rL337166 + D49497 / rL338044. This is still the same pattern to check for the [lack of] signed truncation, but in this case the constants and the predicate are negated. https://rise4fun.com/Alive/BDV https://rise4fun.com/Alive/n7Z Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51532 llvm-svn: 341287
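A signed truncation check asks whether a wider signed value survives truncation to a narrower type. The sketch below shows one direct and one inverted formulation for i16 to i8. The specific constants (128, 256, and their negations) and all function names are assumptions chosen for illustration; the commit message does not spell out the constants.

```cpp
#include <cassert>
#include <cstdint>

// Reference definition: does X fit in a signed 8-bit value?
inline bool fitsInI8(int16_t X) { return X >= -128 && X <= 127; }

// Direct form: (x + 128) u< 256, evaluated in 16-bit unsigned arithmetic.
inline bool fitsInI8Direct(int16_t X) {
  uint16_t U = static_cast<uint16_t>(X);
  return static_cast<uint16_t>(U + 128u) < 256u;
}

// Inverted form: both constants negated, and the same unsigned-less-than
// predicate now answers the opposite question ("does NOT fit").
inline bool doesNotFitInI8Inverted(int16_t X) {
  uint16_t U = static_cast<uint16_t>(X);
  uint16_t NegHalf = static_cast<uint16_t>(0u - 128u);  // 65408
  uint16_t NegRange = static_cast<uint16_t>(0u - 256u); // 65280
  return static_cast<uint16_t>(U + NegHalf) < NegRange;
}

// Exhaustively compares both forms with the reference over all i16 values.
inline bool formsAgreeForAllI16() {
  for (int V = -32768; V <= 32767; ++V) {
    int16_t X = static_cast<int16_t>(V);
    if (fitsInI8Direct(X) != fitsInI8(X))
      return false;
    if (doesNotFitInI8Inverted(X) == fitsInI8(X))
      return false;
  }
  return true;
}

// Usage check on the boundary values.
inline void checkTruncationForms() {
  assert(fitsInI8Direct(127) && !fitsInI8Direct(128));
  assert(!doesNotFitInI8Inverted(-128) && doesNotFitInI8Inverted(-129));
}
```

Because the two forms are exact complements, recognizing the inverted one only requires negating the constants and inverting the predicate before reusing the existing match.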
*	[DAGCombiner] Fix bad indentation. NFC	Craig Topper	2018-08-30	1	-1/+1
\| \| \| \|	llvm-svn: 341103
*	[NFC] Rename the DivergenceAnalysis to LegacyDivergenceAnalysis	Nicolai Haehnle	2018-08-30	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433). The purpose of this patch is to free up the name DivergenceAnalysis for the new generic implementation. The generic implementation class will be shared by specialized divergence analysis classes. Patch by: Simon Moll Reviewed By: nhaehnle Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D50434 Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb llvm-svn: 341071
*	DAG: Don't use ABI copies in some contexts	Matt Arsenault	2018-08-30	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \|	If an ABI-like value is used in a different block, the type split used is not necessarily the same as the call's ABI. The value is used through an intermediate copy virtual register from the other block. This resulted in copies with inconsistent sizes later. Fixes regressions since r338197 when AMDGPU started splitting vector types for calls. llvm-svn: 341018
*	[DAGCombiner] Add X / X -> 1 & X % X -> 0 folds	Simon Pilgrim	2018-08-29	1	-1/+18
\| \| \| \| \| \| \| \|	Adds more divrem folds to try and get in sync with InstructionSimplify Differential Revision: https://reviews.llvm.org/D50636 llvm-svn: 340919
*	[X86] Support v2i32 gather/scatter indices with ↵	Craig Topper	2018-08-29	3	-21/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-x86-experimental-vector-widening-legalization Summary: This is split out from D41062 to cover the code in LegalVectorTypes.cpp Reviewers: RKSimon, spatel, efriedma Reviewed By: efriedma Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D51337 llvm-svn: 340891