bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[LegalizeDAG][X86] Add support for turning STRICT_FADD/SUB/MUL/DIV into ↵	Craig Topper	2019-11-21	1	-5/+16
\| \| \| \| \| \| \|	libcalls. Use it for fp128 on x86-64. This requires a minor hack for f32/f64 strict fadd/fsub to avoid turning those into libcalls.
*	[X86] Mark vector STRICT_FADD/STRICT_FSUB as Legal and add mutation to ↵	Craig Topper	2019-11-21	1	-3/+15
\| \| \| \| \| \| \|	X86ISelDAGToDAG The prevents LegalizeVectorOps from scalarizing them. We'll need to remove the X86 mutation code when we add isel patterns.
*	[PGO][PGSO] DAG.shouldOptForSize part.	Hiroshi Yamauchi	2019-11-21	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: (Split of off D67120) SelectionDAG::shouldOptForSize changes for profile guided size optimization. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70095
*	[X86] Change legalization action for f128 fadd/fsub/fmul/fdiv from Custom to ↵	Craig Topper	2019-11-21	1	-12/+4
\| \| \| \| \| \| \| \| \| \| \|	LibCall. The custom code just emits a libcall, but we can do the same with generic code. The only difference is that the generic code can form tail calls where the custom code couldn't. This is responsible for the test changes. This avoids needing to modify the Custom handling for strict fp.
*	[X86] Fix i16->f128 sitofp to promote the i16 to i32 before trying to form a ↵	Craig Topper	2019-11-20	1	-8/+9
\| \| \| \| \| \|	libcall. Previously one of the test cases added here gave an error.
*	[X86] Fix f128->i16 fptosi to promote the i16 to i32 before trying to form a ↵	Craig Topper	2019-11-20	1	-15/+16
\| \| \| \| \| \|	libcall. Previously one of the test cases added here gave an error.
*	[X86] Mark vector STRICT_FP_ROUND as Legal instead of Custom.	Craig Topper	2019-11-20	1	-3/+9
\| \| \| \| \| \| \| \| \|	The Custom handler doesn't do anything for these nodes anyway. SelectionDAGISel won't mutate them if they are Legal or Custom. X86 has custom code for mutating them due to missing isel patterns. When the isel patterns are added Legal will be the right answer. So go ahead a change it now since that's where we'll end up.
*	[musttail] Don't forward AL on Win64	Reid Kleckner	2019-11-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AL is only used for varargs on SysV platforms. Don't forward it on Windows. This allows control flow guard to set up an extra hidden parameter in RAX, as described in PR44049. This also has the effect of freeing up RAX for use in virtual member pointer thunks, which may also be a nice little code size improvement on Win64. Fixes PR44049 Reviewers: ajpaverd, efriedma, hans Differential Revision: https://reviews.llvm.org/D70413
*	[LegalizeDAG][X86] Enable STRICT_FP_TO_SINT/UINT to be promoted	Craig Topper	2019-11-19	1	-4/+7
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D70220
*	[X86] Add custom type legalization and lowering for scalar ↵	Craig Topper	2019-11-19	1	-29/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	STRICT_FP_TO_SINT/UINT This is a first pass at Custom lowering for these operations. I also updated some of the vector code where it was obviously easy and straightforward. More work needed in follow up. This enables these operations to be handled with X87 where special rounding control adjustments are needed to perform a truncate. Still need to fix Promotion in the target independent code in LegalizeDAG. llrint/llround split into separate test file because we can't make a strict libcall properly yet either and we need to do that when i64 isn't a legal type. This does not include any isel support. So we still rely on the mutation in SelectionDAGIsel to remove the strict from this stuff later. Except for the X87 stuff which goes through custom nodes that already had chains. Differential Revision: https://reviews.llvm.org/D70214
*	DAG: Add function context to isFMAFasterThanFMulAndFAdd	Matt Arsenault	2019-11-19	1	-2/+2
\| \| \| \| \| \| \| \|	AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.
*	[X86][SSE] Remove XFormVExtractWithShuffleIntoLoad to prevent legalization ↵	Simon Pilgrim	2019-11-19	1	-122/+2
\| \| \| \| \| \| \| \|	infinite loops (PR43971) As detailed in PR43971/D70267, the use of XFormVExtractWithShuffleIntoLoad causes issues where we end up in infinite loops of extract(targetshuffle(vecload)) -> extract(shuffle(vecload)) -> extract(vecload) -> extract(targetshuffle(vecload)), there are just too many legalization checks at every stage that we can't guarantee that extract(shuffle(vecload)) -> scalarload can occur. At the moment we see a number of minor regressions as we don't fold extract(shuffle(vecload)) -> scalarload before legal ops, these can be addressed in future patches and extension of X86ISelLowering's combineExtractWithShuffle.
*	[SVE][CodeGen] Scalable vector MVT size queries	Graham Hunter	2019-11-18	1	-4/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871
*	[X86] Don't set the operation action for i16 SINT_TO_FP to Promote just ↵	Craig Topper	2019-11-13	1	-3/+9
\| \| \| \| \| \| \|	because SSE1 is enabled. Instead do custom promotion in the handler so that we can still allow i16 to be used with fp80. And f64 without sse2.
*	[X86] Fix typo in comment. NFC	Craig Topper	2019-11-13	1	-1/+1
\|
*	[X86] Move all the FP_TO_XINT/XINT_TO_FP setOperationActions into the same ↵	Craig Topper	2019-11-13	1	-41/+28
\| \| \| \| \| \| \| \| \|	!useSoftFloat block. Qualify all of the Promote actions for these with !useSoftFloat too. NFCI The Promote action doesn't apply until LegalizeDAG. By the time we get there, we would have already softened all the FP operations if useSoftFloat was true. So there wouldn't be any operation left to Promote.
*	[X86] Remove setOperationAction for FP_TO_SINT v8i16.	Craig Topper	2019-11-12	1	-8/+0
\| \| \| \| \| \| \| \|	This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.
*	[X86] Don't consider v64i1 as a legal type unless v64i8 is also a legal type.	Craig Topper	2019-11-12	1	-25/+47
\| \| \| \| \|	This avoids some nasty issues with argument passing and lowering of arbitrary v64i8 shuffles.
*	[X86] Only pass v64i8/v32i16 as v16i32 on non-avx512bw targets if the v16i32 ↵	Craig Topper	2019-11-12	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	type won't be split by prefer-vector-width=256 Otherwise just let the v64i8/v32i16 types be split to v32i8/v16i16. In reality this shouldn't happen because it means we have a 512-bit vector argument, but min-legal-vector-width says a value less than 512. But a 512-bit argument should have been factored into the preferred vector width.
*	[X86] Update stale comment. NFC	Craig Topper	2019-11-11	1	-2/+2
\|
*	[X86] Remove setOperationAction lines that say to promote MVT::i1	Craig Topper	2019-11-11	1	-6/+0
\| \| \| \| \| \| \| \|	MVT::i1 should be removed by type legalization before we reach any code that would act on the promote action. Mainly to avoid replicating this for strict FP versions of these operations.
*	[X86] Remove some else branches after checking for !useSoftFloat() that set ↵	Craig Topper	2019-11-11	1	-9/+0
\| \| \| \| \| \| \| \| \|	operations to Expand. If we're using soft floats, then these operations shoudl be softened during type legalization. They'll never get to LegalizeVectorOps or LegalizeDAG so they don't need to be Expanded there.
*	[AArch64][X86] Don't assume __powidf2 is available on Windows.	Eli Friedman	2019-11-08	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013
*	[X86] Remove unused variable. NFC	Craig Topper	2019-11-06	1	-1/+0
\|
*	[X86] Remove dead code from combineStore.	Craig Topper	2019-11-06	1	-44/+10
\| \| \| \| \| \|	Leftovers from before we switched to widening legalization. Fixes PR43919.
*	[X86] Clamp large constant shift amounts for MMX shift intrinsics to 8-bits.	Craig Topper	2019-11-06	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MMX intrinsics for shift by immediate take a 32-bit shift amount but the hardware for shifting by immediate only encodes 8-bits. For the intrinsic we don't require the shift amount to fit in 8-bits in the frontend because we don't check that its an immediate in the frontend. If its is not an immediate we move it to an MMX register and use the shift by register. But if it is an immediate we'll use the shift by immediate instruction. But we need to change the shift amount to 8-bits. We were previously doing this accidentally by masking it in the encoder. But this can make a large shift amount into a small in bounds shift amount. Instead we should clamp larger shift amounts to 255 so that the they don't become in bounds. Fixes PR43922
*	[X86ISelLowering] Fixed typo in assert. NFCI.	Dávid Bolvanský	2019-11-06	1	-1/+1
\|
*	[x86] avoid crashing when splitting AVX stores with non-simple type (PR43916)	Sanjay Patel	2019-11-06	1	-3/+5
\| \| \| \| \|	The store splitting transform was assuming a simple type (MVT), but that's not necessarily the case as shown in the test.
*	[X86] LowerAVXExtend - fix dodgy self-comparison assert.	Simon Pilgrim	2019-11-06	1	-1/+1
\| \| \| \|	PVS Studio noticed that we were asserting "VT.getVectorNumElements() == VT.getVectorNumElements()" instead of "VT.getVectorNumElements() == InVT.getVectorNumElements()".
*	[X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMath	Benjamin Kramer	2019-11-05	1	-8/+7
\|
*	[X86/Atomics] (Semantically) revert G246098, switch back to the old atomic ↵	Philip Reames	2019-11-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	example When writing an email for a follow up proposal, I realized one of the diffs in the committed change was incorrect. Digging into it revealed that the fix is complicated enough to require some thought, so reverting in the meantime. The problem is visible in this diff (from the revert): ; X64-SSE-LABEL: store_fp128: ; X64-SSE: # %bb.0: -; X64-SSE-NEXT: movaps %xmm0, (%rdi) +; X64-SSE-NEXT: subq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 32 +; X64-SSE-NEXT: movaps %xmm0, (%rsp) +; X64-SSE-NEXT: movq (%rsp), %rsi +; X64-SSE-NEXT: movq {{[0-9]+}}(%rsp), %rdx +; X64-SSE-NEXT: callq __sync_lock_test_and_set_16 +; X64-SSE-NEXT: addq $24, %rsp +; X64-SSE-NEXT: .cfi_def_cfa_offset 8 ; X64-SSE-NEXT: retq store atomic fp128 %v, fp128* %fptr unordered, align 16 ret void The problem here is three fold: 1) x86-64 doesn't guarantee atomicity of anything larger than 8 bytes. Some platforms observably break this guarantee, others don't, but the codegen isn't considering this, so it's wrong on at least some platforms. 2) When I started to track down the problem, I discovered that DAGCombiner had stripped the atomicity off the store entirely. This comes down to idiomatic usage of DAG.getStore passing all MMO components separately as opposed to just passing the MMO. 3) On x86 (not -64), there are cases where 8 byte atomiciy is supported, but only for floating point operations. This would seem to imply that operation typing matters for correctness, and DAGCombine happily folds away bitcasts. I'm not 100% sure there's a problem here, but I'm not entirely sure there isn't either. I plan on returning to each issue in turn; sorry for the churn here.
*	[X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZeros	Benjamin Kramer	2019-11-05	1	-2/+3
\| \| \| \| \|	The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.
*	[X86] Convert ShrinkMode to scoped enum class. NFCI.	Simon Pilgrim	2019-11-04	1	-11/+15
\|
*	[X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using ↵	Simon Pilgrim	2019-11-04	1	-0/+17
\| \| \| \| \| \| \| \| \| \|	DemandedElts mask (REAPPLIED) If we don't demand all elements, then attempt to combine to a simpler shuffle. At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts (see D66004). This reapplies rL368307 (reverted at rL369167) after the fix for the infinite loop reported at PR43024 was applied at rG3f087e38a2e7b87a5adaaac1c1b61e51220e7ff3
*	[X86][SSE] combineX86ShufflesRecursively - at Depth==0, only resolve ↵	Simon Pilgrim	2019-11-03	1	-6/+31
\| \| \| \| \| \| \| \|	KnownZero if it removes an input. This stops infinite loops where KnownUndef elements are converted to Zeroable, resulting in KnownZero elements which are then simplified (via SimplifyDemandedElts etc.) back to KnownUndef elements........ Prep fix for PR43024 which will allow rL368307 to be re-applied.
*	[X86][SSE] combineX86ShufflesRecursively - don't bother merging shuffles ↵	Simon Pilgrim	2019-11-03	1	-92/+105
\| \| \| \| \| \|	with empty roots. NFCI. This doesn't affect actual codegen, but is a minor refactor toward fixing PR43024 where we need to avoid excess changes (folding zeroables etc.) to the shuffle mask at Depth == 0.
*	Fix uninitialized variable warning. NFCI.	Simon Pilgrim	2019-11-03	1	-1/+1
\|
*	[X86] Move computeZeroableShuffleElements before ↵	Simon Pilgrim	2019-11-02	1	-87/+87
\| \| \| \| \| \|	getTargetShuffleAndZeroables. NFCI. Prep work toward merging some of the functionality.
*	[X86] Change the behavior of canWidenShuffleElements used by ↵	Craig Topper	2019-11-01	1	-19/+14
\| \| \| \| \| \| \| \| \| \| \|	lowerV2X128Shuffle to match the behavior in lowerVectorShuffle with regards to zeroable elements. Previously we marked zeroable elements in a way that prevented the widening check from recognizing that it could widen. Now we only mark them zeroable if V2 is an all zeros vector. This matches what we do for widening elements in lowerVectorShuffle. Fixes PR43866.
*	[X86][AVX] Add support for and/or scalar bool reduction with AVX512 mask ↵	Simon Pilgrim	2019-11-01	1	-0/+6
\| \| \| \| \| \|	registers combineBitcastvxi1 only handles bitcast->MOVMSK combines, with mask registers we use BITCAST directly.
*	[X86] isFNEG - use switch() instead of if-else tree. NFCI.	Simon Pilgrim	2019-11-01	1	-33/+36
\| \| \| \|	In a future patch this will avoid some checks which don't need to be done for some opcodes.
*	[X86][SSE] Convert computeZeroableShuffleElements to emit KnownUndef and ↵	Simon Pilgrim	2019-10-31	1	-23/+35
\| \| \| \|	KnownZero
*	[X86] Add FIXME comment to merge more of computeZeroableShuffleElements and ↵	Simon Pilgrim	2019-10-30	1	-0/+1
\| \| \| \|	getTargetShuffleAndZeroables
*	[X86][SSE] combineX86ShuffleChain - use resolveZeroablesFromTargetShuffle ↵	Simon Pilgrim	2019-10-30	1	-4/+3
\| \| \| \|	helper. NFCI.
*	[X86] combineOrShiftToFunnelShift - use isOperationLegalOrCustom to check ↵	Simon Pilgrim	2019-10-30	1	-1/+2
\| \| \| \| \| \|	FSHL/FSHR support Remove hard wired legality check.
*	[X86] combineOrShiftToFunnelShift - use getShiftAmountTy instead of ↵	Simon Pilgrim	2019-10-30	1	-5/+8
\| \| \| \|	hardwiring to MVT::i8
*	[Alignment] Use Align for TFI.getStackAlignment() in X86ISelLowering	Guillaume Chatelet	2019-10-30	1	-26/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, craig.topper, rnk Reviewed By: rnk Subscribers: rnk, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69034
*	[X86] Make memcmp vector lowering handle arbitrary expansions	David Zarzycki	2019-10-30	1	-23/+43
\| \| \| \| \| \| \| \| \| \|	Teach combineVectorSizedSetCCEquality() to handle arbitrary memcmp expansions but do not change any default policy for now. This also fixes a bug in the memcmp expansion itself when large displacements are needed. https://reviews.llvm.org/D69507
*	[SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and ↵	Philip Reames	2019-10-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	stores w/StoreSDNode) by default Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default. As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/a atomic marker on the MMO. The later parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other. This has been fairly heavily fuzzed, and I examined diffs in a reasonable large corpus of assembly by hand, so I'm reasonable sure this is correct for the common case. Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations. This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics. If there are problems with this patch, the most likely area will be legalization. Differential Revision: https://reviews.llvm.org/D69219
*	[X86] Narrow i64 compares with constant to i32 when the upper 32-bits are ↵	Craig Topper	2019-10-29	1	-5/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	known zero. This catches some cases. There are probably ways to improve this. I tried doing it as a combine on the setcc, but that broke some cases involving flag reuse in place of test. I renamed the isX86CCUnsigned to isX86CCSigned and flipped its polarity to make it consistent with the similar functions for ISD::SETCC. This avoids calling EQ/NE as being signed or unsigned. Fixes PR43823. Differential Revision: https://reviews.llvm.org/D69499