bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating ↵	Craig Topper	2019-05-07	1	-8/+9
	it as an fsub. Summary: If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd. I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that. Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma Reviewed By: cameron.mcinally Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61622 llvm-svn: 360111
*	[DebugInfo] Delete TypedDINodeRef	Fangrui Song	2019-05-07	8	-105/+72
	TypedDINodeRef<T> is a redundant wrapper of Metadata * that is actually a T . Accordingly, change DI{Node,Scope,Type}Ref uses to DI{Node,Scope,Type} or their const variants. This allows us to delete many resolve() calls that clutter the code. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D61369 llvm-svn: 360108
*	Fix bug in getCompleteTypeIndex in codeview debug info	Amy Huang	2019-05-06	1	-4/+7
	Summary: When there are multiple instances of a forward decl record type, only the first one is emitted with a type index, because the type is added to a map with a null type index. Avoid this by reordering so that forward decl types aren't added to the map. Reviewers: rnk Subscribers: aprantl, hiraditya, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61460 llvm-svn: 360101
*	[FastISel] Pass the fneg input operand to hasTrivialKill in ↵	Craig Topper	2019-05-06	1	-1/+1
	FastISel::selectFNeg. We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here. llvm-svn: 360097
*	Fix pr33010, a 2 year old crashing regression	Philip Reames	2019-05-06	1	-0/+4
	The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent its creation. Arguably, we shouldn't be using TargetFrameIndices in StatepointLowering at all, but that's a much deeper change. llvm-svn: 360090
*	[SelectionDAG][X86] Support inline assembly returning an mmx register into a ↵	Craig Topper	2019-05-06	1	-0/+8
	type with fewer than 64 bits. It's possible to use the 'y' mmx constraint with a type narrower than 64-bits. This patch supports this by bitcasting the mmx type to 64-bits and then truncating to the desired type. There are probably other missing type combinations we need to support, but this is the case we have a bug report for. Fixes PR41748. Differential Revision: https://reviews.llvm.org/D61582 llvm-svn: 360069
*	Revert r359392 and r358887	Craig Topper	2019-05-06	1	-25/+1
	Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead" Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling" Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list. Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work there. I'll file a separate PR for that and add test cases. Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised if that bug can still be hit independent of that. This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again. llvm-svn: 360066
*	[SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)	Nikita Popov	2019-05-06	1	-0/+12
	This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635 by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is legal but former is not) for zero-or-all-ones boolean reductions (which are detected based on sign bits). Differential Revision: https://reviews.llvm.org/D61398 llvm-svn: 360054
*	[SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with ↵	Craig Topper	2019-05-06	1	-1/+1
	a report_fatal_error. Based on PR41748, not all cases are handled in this function. llvm_unreachable is treated as an optimization hint that can prune code paths in a release build. This causes weird behavior when PR41748 is encountered on a release build. It appears to generate an fp_round instruction from the floating point code. Making this a report_fatal_error prevents incorrect optimization of the code and will instead generate a message to file a bug report. llvm-svn: 360008
*	[NFC] BasicBlock: refactor changePhiUses() out of replacePhiUsesWith(), use it	Roman Lebedev	2019-05-05	1	-2/+1
	Summary: It is a common thing to loop over every `PHINode` in some `BasicBlock` and change old `BasicBlock` incoming block to a new `BasicBlock` incoming block. `replaceSuccessorsPhiUsesWith()` already had code to do that, it just wasn't a function. So outline it into a new function, and use it. Reviewers: chandlerc, craig.topper, spatel, danielcdh Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61013 llvm-svn: 359996
*	[NFC] PHINode: introduce replaceIncomingBlockWith() function, use it	Roman Lebedev	2019-05-05	1	-5/+2
	Summary: There is `PHINode::getBasicBlockIndex()`, `PHINode::setIncomingBlock()` and `PHINode::getNumOperands()`, but no function to replace every specified `BasicBlock` predecessor with some other specified `BasicBlock`. Clearly, there are a lot of places that could use that functionality. Reviewers: chandlerc, craig.topper, spatel, danielcdh Reviewed By: craig.topper Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61011 llvm-svn: 359995
*	[SelectionDAG] Use any_of/all_of where possible. NFCI.	Simon Pilgrim	2019-05-05	1	-14/+4
	llvm-svn: 359974
*	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block ↵	Sanjay Patel	2019-05-04	1	-28/+21
	(2nd try) This is a subset of the original commit from rL359879 which was reverted because it could crash when using the 'RemovedInstructions' structure that enables delayed deletion of dead instructions. The motivating compile-time win does not require that change though. We should get most of that win from this change alone. Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359969
*	Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out ↵	Matt Arsenault	2019-05-03	1	-4/+41
	of a block" This reverts commit r359912. This should pass now, since the clang test was made less fragile in r359918. llvm-svn: 359919
*	[DAGCombine] Remove repeated variables. NFCI.	Simon Pilgrim	2019-05-03	1	-8/+3
	llvm-svn: 359915
*	Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out ↵	Nico Weber	2019-05-03	1	-41/+4
	of a block" Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail. llvm-svn: 359912
*	[TargetLowering] SimplifySetCC - remove repeated variable. NFCI.	Simon Pilgrim	2019-05-03	1	-2/+1
	Also reduce scope of Temp variable. llvm-svn: 359911
*	Revert "[CodeGenPrepare] limit overflow intrinsic matching to a single basic ↵	Evgeniy Stepanov	2019-05-03	1	-42/+47
	block" This reverts commit r359879, which introduced a compiler crash. llvm-svn: 359908
*	RegAllocFast: Add heuristic to detect values not live-out of a block	Matt Arsenault	2019-05-03	1	-4/+41
	Add an improved/new heuristic to catch more cases when values are not live out of a basic block. Patch by Matthias Braun llvm-svn: 359906
*	[SelectionDAG] CreateTopologicalOrder - don't use iterator	Simon Pilgrim	2019-05-03	1	-10/+6
	We shouldn't use an iterator to loop across a std::vector when the same loop is adding elements to that std::vector. Found by cppcheck. llvm-svn: 359900
*	[TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. ↵	Simon Pilgrim	2019-05-03	1	-3/+2
	NFCI. Only ever used in one block. llvm-svn: 359890
*	[CodeGenPrepare] limit overflow intrinsic matching to a single basic block	Sanjay Patel	2019-05-03	1	-47/+42
	Using/updating a dominator tree to match math overflow patterns may be very expensive in compile-time (because of the way CGP uses a DT), so just handle the single-block case. Also, we were restarting the iterator loops when doing the overflow intrinsic transforms by marking the dominator tree for update. That was done to prevent iterating over a removed instruction. But we can postpone the deletion using the existing "RemovedInsts" structure, and that means we don't need to update the DT. See post-commit thread for rL354298 for more details: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190422/646276.html Differential Revision: https://reviews.llvm.org/D61075 llvm-svn: 359879
*	[TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI.	Simon Pilgrim	2019-05-03	1	-23/+18
	Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings. llvm-svn: 359877
*	Revert "[MIR] Add simple PRE pass to MachineCSE"	Anton Afanasyev	2019-05-03	1	-117/+9
	This reverts commit 9c20156de39b377190d7a91783d61877b303fe35. It breaks stage 2 of clang-ppc64be-linux-multistage. llvm-svn: 359875
*	[SelectionDAG] Use INT_MIN as (1 << 31) is UB for signed integers. NFCI.	Simon Pilgrim	2019-05-03	1	-2/+2
	llvm-svn: 359873
*	[SelectionDAG] computeKnownBits - remove some duplicate/shadow variables. NFCI.	Simon Pilgrim	2019-05-03	1	-6/+4
	llvm-svn: 359872
*	[MIR] Add simple PRE pass to MachineCSE	Anton Afanasyev	2019-05-03	1	-9/+117
	This is the second part of the commit fixing PR38917 (hoisting partially redundant machine instruction). Most of PRE (partial redundancy elimination) and CSE work is done on LLVM IR, but some redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it runs before CSE, looking for partial redundancy and transforming it to full redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, then the created instruction will remain dead and be eliminated later by the Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need not rely on a further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 Reviewers: RKSimon Subscribers: hfinkel, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D56772 llvm-svn: 359870
*	[IRTranslator] Use the alloc size instead of the store size when translating ↵	Quentin Colombet	2019-05-03	1	-1/+1
	allocas We used to incorrectly use the store size instead of the alloc size when creating the stack slot for allocas. On aarch64 this can be demonstrated by allocating weirdly sized types. For instance, in the added test case, we use an alloca for i19. We used to allocate a slot of size 24-bit (19 rounded up to the next byte), whereas we really want to use a full 32-bit slot for this type. llvm-svn: 359856
*	[AArch64][Windows] Compute function length correctly in unwind tables.	Eli Friedman	2019-05-03	2	-3/+19
	The primary fix here is to WinException.cpp: we need to exclude jump tables when computing the length of a function, or else we fail to correctly compute the length. (We can only compute the number of bytes consumed by certain assembler directives after the entire file is parsed. ".p2align" is one of those directives, and is used by jump table generation.) The secondary fix, to MCWin64EH, is to make sure we don't silently miscompile if we hit a similar situation in the future. It's possible we could extend ARM64EmitUnwindInfo so it allows function bodies that contain assembler directives, but that's a lot more complicated; see the FIXME in MCWin64EH.cpp. Fixes https://bugs.llvm.org/show_bug.cgi?id=41581 . Differential Revision: https://reviews.llvm.org/D61095 llvm-svn: 359849
*	[SelectionDAG] Add asserts to verify the vectorness of input and output ↵	Craig Topper	2019-05-02	1	-0/+12
	types of TRUNCATE/ZERO_EXTEND/ANY_EXTEND/SIGN_EXTEND agree As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR. Differential Revision: https://reviews.llvm.org/D61463 llvm-svn: 359836
*	[DAGCombiner] try repeated fdiv divisor transform before building estimate ↵	Sanjay Patel	2019-05-02	1	-4/+4
	(2nd try) The original patch was committed at rL359398 and reverted at rL359695 because of infinite looping. This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop. Original commit message: This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359793
*	[SelectionDAG] remove constant folding limitations based on FP exceptions	Sanjay Patel	2019-05-02	2	-27/+16
	We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops), so it does not make sense to have them here in the DAG either. Nothing else in the backend tries to preserve exceptions (again outside of strict ops), so I don't see how this could have ever worked for real code that cares about FP exceptions. There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least partially) to preserve exceptions without even asking if the target supports FP exceptions. Those should be corrected in subsequent patches. Real support for FP exceptions requires several changes to handle the constrained/strict FP ops. Differential Revision: https://reviews.llvm.org/D61331 llvm-svn: 359791
*	Revert "[DAGCombiner] try repeated fdiv divisor transform before building ↵	Sanjay Patel	2019-05-01	1	-3/+3
	estimate" This reverts commit fb9a5307a94e6f1f850e4d89f79103b123f16279 (rL359398) because it can cause an infinite loop due to opposing combines. llvm-svn: 359695
*	DAG: allow DAG pointer size different from memory representation.	Tim Northover	2019-05-01	4	-47/+134
	In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG builder code so that pointers are allowed to have a larger type when "live" in the DAG compared to memory. Pointers get zero-extended whenever they are loaded, and truncated prior to stores. In addition, a few not quite so obvious locations need updating: * A GEP that has not been marked inbounds needs to enforce the IR-documented 2s-complement wrapping at the memory pointer size. Inbounds GEPs are undefined if they overflow the address space, so no additional operations are needed. * Signed comparisons would give incorrect results if performed on the zero-extended values. This shouldn't affect CodeGen for now, but will become active when the AArch64 ILP32 support is committed. llvm-svn: 359676
*	[SelectionDAG] remove div-by-zero constant folding restriction	Sanjay Patel	2019-04-30	1	-7/+3
	We don't have this restriction in IR, so it should not be here either simply out of consistency. Code that wants to handle FP exceptions is expected to use the 'strict' variants of these nodes. We don't get the frem case because frem by 0.0 produces NaN (invalid), and that's the remaining check here (so the removed check for frem was dead code AFAIK). This is the only place in SDAG that uses "HasFPExceptions", so I think we should remove that entirely as a follow-up patch. llvm-svn: 359566
*	[TargetLowering] findOptimalMemOpLowering. NFCI.	Sjoerd Meijer	2019-04-30	2	-123/+119
	This was a local static function in SelectionDAG, which I've promoted to TargetLowering so that I can reuse it to estimate the cost of a memory operation in D59787. Differential Revision: https://reviews.llvm.org/D59766 llvm-svn: 359543
*	[AsmPrinter] Make AsmPrinter::HandlerInfo::Handler a unique_ptr	Fangrui Song	2019-04-30	1	-13/+13
	Handlers.clear() in AsmPrinter::doFinalization() will destroy these handlers. A unique_ptr makes the ownership clearer. llvm-svn: 359541
*	[TargetLowering] Change getOptimalMemOpType to take a function attribute list	Sjoerd Meijer	2019-04-30	1	-1/+2
	The MachineFunction wasn't used in getOptimalMemOpType, but more importantly, this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType. This is the groundwork for the changes in D59766 and D59787, that allows implementation of TTI::getMemcpyCost. Differential Revision: https://reviews.llvm.org/D59785 llvm-svn: 359537
*	[DebugInfo] DW_OP_deref_size in PrologEpilogInserter.	Markus Lavin	2019-04-30	6	-3/+38
	The PrologEpilogInserter needs to insert a DW_OP_deref_size before prepending a memory location expression to an already implicit expression to avoid having the existing expression act on the memory address instead of the value behind it. The reason for using DW_OP_deref_size and not plain DW_OP_deref is that big-endian targets need to read the right size as simply truncating a larger read would yield the wrong result (LSB bytes are not at the lower address). This re-commit fixes issues reported in the first one. Namely deref was inserted under wrong conditions and additionally the deref_size argument was incorrectly encoded. Differential Revision: https://reviews.llvm.org/D59687 llvm-svn: 359535
*	[DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the ↵	Zi Xuan Wu	2019-04-30	1	-1/+3
	target when combine ISD::TRUNC node Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry) if adde is not legal for the target, even at the type-legalize phase, because adde is special and will not be legalized at the operation-legalize phase later. This fixes: PR40922 https://bugs.llvm.org/show_bug.cgi?id=40922 Differential Revision: https://reviews.llvm.org//D60854 llvm-svn: 359532
*	computePolynomialFromPointer - add missing early-out return for non-pointer ↵	Simon Pilgrim	2019-04-29	1	-0/+1
	types. Reported in https://www.viva64.com/en/b/0629/ llvm-svn: 359486
*	[globalisel] Improve Legalizer debug output	Daniel Sanders	2019-04-29	2	-6/+62
	* LegalizeAction should be printed by name rather than number * Newly created instructions are incomplete at the point the observer first sees them. They are therefore recorded in a small vector and printed just before the legalizer moves on to another instruction. By this point, the instruction must be complete. llvm-svn: 359481
*	[DAG] Refactor DAGCombiner::ReassociateOps	Bjorn Pettersson	2019-04-29	1	-45/+44
	Summary: Extract the logic for doing reassociations from DAGCombiner::reassociateOps into a helper function DAGCombiner::reassociateOpsCommutative, and use that helper to trigger reassociation on the original operand order, or the commuted operand order. Codegen is not identical since the operand order will be different when doing the reassociations for the commuted case. That causes some unfortunate churn in some test cases. Apart from that this should be NFC. Reviewers: spatel, craig.topper, tstellar Reviewed By: spatel Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61199 llvm-svn: 359476
*	[DebugInfo] Terminate more location-list ranges at the end of blocks	Jeremy Morse	2019-04-29	2	-20/+82
	This patch fixes PR40795, where constant-valued variable locations can "leak" into blocks placed at higher addresses. The root of this is that DbgEntityHistoryCalculator terminates all register variable locations at the end of each block, but not constant-value variable locations. Fixing this requires constant-valued DBG_VALUE instructions to be broadcast into all blocks where the variable location remains valid, as documented in the LiveDebugValues section of SourceLevelDebugging.rst, and correct termination in DbgEntityHistoryCalculator. Differential Revision: https://reviews.llvm.org/D59431 llvm-svn: 359426
*	[DAGCombiner] try repeated fdiv divisor transform before building estimate	Sanjay Patel	2019-04-28	1	-3/+3
	This was originally part of D61028, but it's an independent diff. If we try the repeated divisor reciprocal transform before producing an estimate sequence, then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5 vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the full-precision division is only 3 cycle throughput, so that's probably the better perf default option and avoids problems from x86's inaccurate estimates. The last 2 tests show that users still have the option to override the defaults by using the function attributes for reciprocal estimates, but those patterns are potentially made faster by converting the vector ops (including ymm ops) to scalar math. Differential Revision: https://reviews.llvm.org/D61149 llvm-svn: 359398
*	[AsmPrinter] refactor to support %c w/ GlobalAddress'	Nick Desaulniers	2019-04-26	1	-4/+15
	Summary: Targets like ARM, MSP430, PPC, and SystemZ have complex behavior when printing the address of a MachineOperand::MO_GlobalAddress. Move that handling into a new overridden method in each base class. A virtual method was added to the base class for handling the generic case. Refactors a few subclasses to support the target independent %a, %c, and %n. The patch also contains small cleanups for AVRAsmPrinter and SystemZAsmPrinter. It seems that NVPTXTargetLowering is possibly missing some logic to transform GlobalAddressSDNodes for TargetLowering::LowerAsmOperandForConstraint to handle with "i" extended inline assembly asm constraints. Fixes: - https://bugs.llvm.org/show_bug.cgi?id=41402 - https://github.com/ClangBuiltLinux/linux/issues/449 Reviewers: echristo, void Reviewed By: void Subscribers: void, craig.topper, jholewinski, dschuff, jyknight, dylanmckay, sdardis, nemanjai, javed.absar, sbc100, jgravelle-google, eraman, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, jrtc27, atanasyan, jsji, llvm-commits, kees, tpimh, nathanchance, peter.smith, srhines Tags: #llvm Differential Revision: https://reviews.llvm.org/D60887 llvm-svn: 359337
*	[DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.	Simon Pilgrim	2019-04-26	1	-10/+11
	Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format. llvm-svn: 359326
*	[X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 ↵	Simon Pilgrim	2019-04-26	1	-0/+3
	targets (PR40758) As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs). Differential Revision: https://reviews.llvm.org/D61068 llvm-svn: 359293
*	[GlobalISel] Fix inserting copies in the right position for reg definitions	Marcello Maggioni	2019-04-26	2	-12/+38
	When constrainRegClass is called and the constraining happens on a use, the COPY needs to be inserted before the instruction that contains the MachineOperand, but if we are constraining a definition it actually needs to be added after the instruction. In addition, the COPY needs to have its operands flipped (in the use case we are copying from the old unconstrained register to the new constrained register, while in the definition case we are copying from the new constrained register that the instruction defines to the old unconstrained register). llvm-svn: 359282
*	[SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the ↵	Craig Topper	2019-04-25	1	-15/+18
	input needs to be split and the output type is a vector. We had special case handling here, but it uses a scalar any_extend for the promotion then bitcasts to the final type. This won't split up the input data into multiple promoted elements like we need. This patch falls back to doing the conversion through memory. Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll changes. The changes to vector-half-conversions.ll are fixing a previously unknown miscompile from this issue. Differential Revision: https://reviews.llvm.org/D61114 llvm-svn: 359219