bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[MC] Fix bad indentation and 80 column violations. Use StringRef::front instead of dereferencing StringRef::begin. NFC	Craig Topper	2018-09-25	1	-29/+34
	llvm-svn: 343010
*	[x86] avoid 256-bit andnp that requires insert/extract with AVX1 (PR37449)	Sanjay Patel	2018-09-25	2	-1/+32
	This is the final (I hope!) problem pattern mentioned in PR37749: https://bugs.llvm.org/show_bug.cgi?id=37749 We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops. We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches that can over-reach. I.e., splitting to 128-bit does not look like a win in most cases with >1 256-bit op. The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test, we have this vector-type-legalized sequence:
	  t29: v8i32 = concat_vectors t27, t28
	  t30: v4i64 = bitcast t29
	  t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
	  t31: v4i64 = bitcast t18
	  t32: v4i64 = xor t30, t31
	  t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
	  t34: v4i64 = bitcast t9
	  t35: v4i64 = and t32, t34
	  t36: v8i32 = bitcast t35
	  t37: v4i32 = extract_subvector t36, Constant:i64<0>
	  t38: v4i32 = extract_subvector t36, Constant:i64<4>
	Differential Revision: https://reviews.llvm.org/D52318 llvm-svn: 343008
*	[WebAssembly] Move/clone DBG_VALUE during WebAssemblyRegStackify pass	Yury Delendik	2018-09-25	1	-0/+59
	Summary: The MoveForSingleUse or MoveAndTeeForMultiUse functions move wasm instructions, however the DBG_VALUE instructions stay unchanged. This patch moves or clones them as well. Reviewers: dschuff Reviewed By: dschuff Subscribers: mattd, MatzeB, dschuff, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits, aardappel Tags: #debug-info Differential Revision: https://reviews.llvm.org/D49034 llvm-svn: 343007
*	Revert "[ConstHoist] Do not rebase single (or few) dependent constant"	Jessica Paquette	2018-09-25	1	-46/+22
	This caused a couple test failures on a bot: CodeGen/X86/constant-hoisting-bfi.ll Transforms/ConstantHoisting/X86/ehpad.ll Example: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/53575/ llvm-svn: 343005
*	[RegAllocGreedy] avoid using physreg candidates that cannot be correctly spilled	Daniil Fukalov	2018-09-25	2	-9/+49
	For the AMDGPU target, if an MBB contains an exec mask restore preamble, SplitEditor may reach a state in which it cannot insert a spill instruction. E.g. for the MIR
	  bb.100:
	    %1 = S_OR_SAVEEXEC_B64 %2, implicit-def $exec, implicit-def $scc, implicit $exec
	and if the regalloc tries to allocate a virtreg to the physreg already assigned to virtreg %1, it should insert a spill instruction before the S_OR_SAVEEXEC_B64 instruction. But that is not possible, since it can generate incorrect code in terms of the exec mask. The change makes regalloc ignore such physreg candidates. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D52052 llvm-svn: 343004
*	[MC] Replace NULL constant in code with nullptr.	Craig Topper	2018-09-25	1	-1/+1
	llvm-svn: 343003
*	[ConstHoist] Do not rebase single (or few) dependent constant	Zhaoshi Zheng	2018-09-25	1	-22/+46
	If an instance (InsertionPoint or IP) of Base constant A has only one or few rebased constants depending on it, do NOT rebase. One extra ADD instruction is required to materialize each rebased constant, assuming A and the rebased have the same materialization cost. Differential Revision: https://reviews.llvm.org/D52243 llvm-svn: 342994
*	Revert "[DebugInfo] Do not generate address info for removed debug labels."	Justin Bogner	2018-09-25	1	-3/+4
	The added test is failing on macOS: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/53550/ This reverts r342943. llvm-svn: 342993
*	[X86] Add AVX512 support to combineVectorSizedSetCCEquality.	Craig Topper	2018-09-25	1	-7/+14
	Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52424 llvm-svn: 342989
*	[InstCombine] narrow binops on concatenated vectors (PR33026)	Sanjay Patel	2018-09-25	1	-6/+28
	The motivating case from: https://bugs.llvm.org/show_bug.cgi?id=33026 ...has no shuffles now. This kind of pattern may occur during vectorization when targets have lumpy ISAs like SSE/AVX. llvm-svn: 342988
*	[ARM] Share predecessor bookkeeping in CombineBaseUpdate. NFCI.	Nirav Dave	2018-09-25	1	-2/+9
	llvm-svn: 342987
*	[AArch64] Share search bookkeeping in combines. NFCI.	Nirav Dave	2018-09-25	1	-15/+17
	Share predecessor search bookkeeping in both performPostLD1Combine and performNEONPostLDSTCombine. This should be approximately a 4x and 2x performance improvement. llvm-svn: 342986
*	[LegalizeDAG] Prune Predecessor check in ExpandExtractFromVectorThroughStack. NFCI.	Nirav Dave	2018-09-25	1	-0/+1
	llvm-svn: 342985
*	[DAGCombine] Improve Predecessor check in SimplifySelectOps. NFCI.	Nirav Dave	2018-09-25	1	-4/+36
	Reuse search space bookkeeping across multiple predecessor checks done to avoid redundancy. This should cut search cost by ~4x. llvm-svn: 342984
*	[DAGCombine] Share predecessor bookkeeping in CombineToPostIndexedLoadStore. NFCI.	Nirav Dave	2018-09-25	1	-2/+9
	llvm-svn: 342983
*	[DAGCombine] Don't fold dependent loads across SELECT_CC.	Nirav Dave	2018-09-25	1	-4/+5
	DAGCombine will try to fold two loads that feed a SELECT or SELECT_CC after the select, resulting in a select of an address and a single load after. If either of the loads depends on the other, this is not legal as it could introduce cycles. However, it only checked this if the opcode was a SELECT, and not for a SELECT_CC. Unfortunately, the only reproducer I have for this is for our downstream target. I've tried getting it to trigger on an upstream one but haven't been successful. Patch thanks to Bevin Hansson. llvm-svn: 342980
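The fold described in the commit above can be pictured at source level. This is only a rough C++ analogy under stated assumptions, not the DAGCombine code: the function names are hypothetical, and both pointers are assumed valid to dereference.

```cpp
#include <cassert>

// Before the fold: two independent loads feed the select.
inline int selectOfLoads(bool c, const int *p, const int *q) {
  int a = *p; // load 1
  int b = *q; // load 2
  return c ? a : b;
}

// After the fold: the select picks an address and a single load follows.
inline int loadOfSelect(bool c, const int *p, const int *q) {
  return *(c ? p : q);
}

// Hypothetical helper: checks that both forms agree when the loads are independent.
inline bool foldPreservesResult(bool c, int a, int b) {
  return selectOfLoads(c, &a, &b) == loadOfSelect(c, &a, &b);
}

inline void selfCheck() {
  assert(foldPreservesResult(true, 7, 9));
  assert(foldPreservesResult(false, 7, 9));
}
```

When the address of one load is computed from the value of the other (for example, q derived from *p), the merged load would have to feed its own address computation. That is the cycle the commit message refers to, and the reason the dependence check has to cover SELECT_CC as well as SELECT.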
*	Revert rL342916: [X86] Remove shift/rotate by CL memory (RMW) overrides	Simon Pilgrim	2018-09-25	5	-27/+81
	As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead. The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342969
*	[LoopUnroll] Add check to Latch's terminator in UnrollRuntimeLoopRemainder	David Green	2018-09-25	1	-5/+19
	In this patch, I'm adding an extra check to the Latch's terminator in llvm::UnrollRuntimeLoopRemainder, similar to how it is already done in the llvm::UnrollLoop. The compiler would crash if this function is called with a malformed loop. Patch by Rodrigo Caetano Rocha! Differential Revision: https://reviews.llvm.org/D51486 llvm-svn: 342958
*	[AMDGPU] restore r342722 which was reverted with r342743	Sameer Sahasrabuddhe	2018-09-25	1	-0/+2
	[AMDGPU] lower-switch in preISel as a workaround for legacy DA Summary: The default target of the switch instruction may sometimes be an "unreachable" block, when it is guaranteed that one of the cases is always taken. The dominator tree concludes that such a switch instruction does not have an immediate post dominator. This confuses divergence analysis, which is unable to propagate sync dependence to the targets of the switch instruction. As a workaround, the AMDGPU target now invokes lower-switch as a preISel pass. LowerSwitch is designed to handle the unreachable default target correctly, allowing the divergence analysis to locate the correct immediate dominator of the now-lowered switch. llvm-svn: 342956
*	[mips] Correct MUL pattern for mips64	Stefan Maksimovic	2018-09-25	2	-1/+4
	Guard existing pattern with a predicate, introduce a new one for revision 6. Differential Revision: https://reviews.llvm.org/D51684 llvm-svn: 342946
*	Use unique_ptr to hold AsmInfo,MRI,MII,STI	Fangrui Song	2018-09-25	3	-12/+8
	Reviewers: pcc, dblaikie Reviewed By: dblaikie Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52389 llvm-svn: 342945
*	Use TRI->regsOverlap() in MachineBasicBlock::computeRegisterLiveness	Mikael Holmen	2018-09-25	1	-6/+4
	Summary: For the loop that used MCRegAliasIterator this should be NFC. For the loop that previously used MCSubRegIterator we should now detect more cases where the register is actually live out that we previously missed. Reviewers: MatzeB, arsenm Reviewed By: MatzeB Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D52410 llvm-svn: 342944
*	[DebugInfo] Do not generate address info for removed debug labels.	Hsiangkai Wang	2018-09-25	1	-4/+3
	In some scenarios, LLVM will remove llvm.dbg.labels in IR. For example, when the labels are in unreachable blocks, these labels will not be generated in LLVM IR. In that case, these debug labels will have address zero as their address. That is not a legal address for a debugger to set breakpoints or query sources. So, the patch inhibits the address info (DW_AT_low_pc) of removed labels. Differential Revision: https://reviews.llvm.org/D51908 llvm-svn: 342943
*	[MachineCopyPropagation] Reimplement CopyTracker in terms of register units	Justin Bogner	2018-09-25	1	-54/+58
	Change the copy tracker to keep a single map of register units instead of 3 maps of registers. This gives a very significant compile time performance improvement to the pass. I measured a 30-40% decrease in time spent in MCP on x86 and AArch64 and much more significant improvements on out of tree targets with more registers. Differential Revision: https://reviews.llvm.org/D52374 llvm-svn: 342942
*	Revert "[ORC] Switch to asynchronous resolution in JITSymbolResolver."	Lang Hames	2018-09-25	7	-145/+89
	This reverts commit r342939. MSVC's promise/future implementation does not like types that are not default constructible. Reverting while I figure out a solution. llvm-svn: 342941
*	[MachineCopyPropagation] Rework how we manage RegMask clobbers	Justin Bogner	2018-09-25	1	-35/+23
	Instead of updating the CopyTracker's maps each time we come across a RegMask, defer checking for this kind of interference until we're actually trying to propagate a copy. This avoids the need to repeatedly iterate over maps in the cases where we don't end up doing any work. This is a slight compile time improvement for MachineCopyPropagation as is, but it also enables a much bigger improvement that I'll follow up with soon. Differential Revision: https://reviews.llvm.org/D52370 llvm-svn: 342940
*	[ORC] Switch to asynchronous resolution in JITSymbolResolver.	Lang Hames	2018-09-25	7	-89/+145
	Asynchronous resolution (where the caller receives a callback once the requested set of symbols are resolved) is a core part of the new concurrent ORC APIs. This change extends the asynchronous resolution model down to RuntimeDyld, which is necessary to prevent deadlocks when compiling/linking on a fixed number of threads: If RuntimeDyld's linking process were a blocking operation, then any complete K-graph in a program will require at least K threads to link in the worst case, as each thread would block waiting for all the others to complete. Using callbacks instead allows the work to be passed between dependent threads until it is complete. For backwards compatibility, all existing RuntimeDyld functions will continue to operate in blocking mode as before. This change will enable the introduction of a new async finalization process in a subsequent patch to enable asynchronous JIT linking. llvm-svn: 342939
*	[WebAssembly] SIMD sqrt	Thomas Lively	2018-09-25	1	-0/+9
	Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D52387 llvm-svn: 342937
*	[X86] Don't create FILD ISD nodes when X87 is disabled.	Craig Topper	2018-09-25	1	-1/+2
	The included test case previously asserted because the type legalizer tried to soften the FILD ISD node. Fixes PR38819. llvm-svn: 342934
*	[X86] Remove superfluous curly braces. NFC	Craig Topper	2018-09-25	1	-2/+1
	llvm-svn: 342933
*	[X86] Update comment. Use 'glued' instead of 'flagged' NFC	Craig Topper	2018-09-25	1	-1/+1
	llvm-svn: 342932
*	[CUDA] Added basic support for compiling with CUDA-10.0	Artem Belevich	2018-09-24	1	-0/+5
	llvm-svn: 342924
*	[hwasan] Record and display stack history in stack-based reports.	Evgeniy Stepanov	2018-09-24	1	-35/+147
	Summary: Display a list of recent stack frames (not a stack trace!) when tag-mismatch is detected on a stack address. The implementation uses alignment tricks to get both the address of the history buffer, and the base address of the shadow with a single 8-byte load. See the comment in hwasan_thread_list.h for more details. Developed in collaboration with Kostya Serebryany. Reviewers: kcc Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52249 llvm-svn: 342923
*	Revert "[hwasan] Record and display stack history in stack-based reports."	Evgeniy Stepanov	2018-09-24	1	-147/+35
	This reverts commit r342921: test failures on clang-cmake-arm* bots. llvm-svn: 342922
*	[hwasan] Record and display stack history in stack-based reports.	Evgeniy Stepanov	2018-09-24	1	-35/+147
	Summary: Display a list of recent stack frames (not a stack trace!) when tag-mismatch is detected on a stack address. The implementation uses alignment tricks to get both the address of the history buffer, and the base address of the shadow with a single 8-byte load. See the comment in hwasan_thread_list.h for more details. Developed in collaboration with Kostya Serebryany. Reviewers: kcc Subscribers: srhines, kubamracek, mgorny, hiraditya, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52249 llvm-svn: 342921
*	Re-submitting changes in D51550 because it failed to patch.	Christy Lee	2018-09-24	1	-28/+57
	Reviewers: javed.absar, trentxintong, courbet Reviewed By: trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52433 llvm-svn: 342919
*	[InstCombine] add bitcast+extelt helper function; NFC	Sanjay Patel	2018-09-24	1	-14/+26
	We can handle patterns where the elements have different sizes, so refactoring ahead of trying to add another blob within these clauses. llvm-svn: 342918
*	[X86] Remove shift/rotate by CL memory (RMW) overrides	Simon Pilgrim	2018-09-24	5	-81/+27
	The uops are slightly different to the register variant, so requires a +1uop tweak llvm-svn: 342916
*	[X86] Infer 64bit feature support from the CPUID results in getHostCPUFeatures.	Craig Topper	2018-09-24	1	-0/+2
	After r341022, we more strictly check the 64bit feature in X86Subtargets constructor when a 64-bit triple is used. If we don't infer this feature for autodetected CPUs we might incorrectly report an error if the CPU name wasn't autodetected to a CPU that supports 64-bit. llvm-svn: 342914
*	[Power9] [LLVM] Add __float128 exponent GET and SET builtins	Stefan Pintilie	2018-09-24	1	-0/+7
	Added __builtin_vsx_scalar_extract_expq and __builtin_vsx_scalar_insert_exp_qp. Builtins should behave the same way as in GCC. Differential Revision: https://reviews.llvm.org/D48185 llvm-svn: 342910
*	[Analysis] add comment to generalize finding a scalar op from vector; NFC	Sanjay Patel	2018-09-24	1	-3/+4
	llvm-svn: 342906
*	[X86] Remove WriteDiv/WriteIDiv schedule overrides - use classes directly. NFCI.	Simon Pilgrim	2018-09-24	4	-125/+70
	We're missing quite a bit of data for these instructions, removing the overrides makes this obvious - inconsistent reg/mem variants are a concern as well. Also, we have Divider resources (HWDivider etc.) but they aren't actually used consistently. llvm-svn: 342904
*	[InstCombine] improve variable name and use 'match'; NFC	Sanjay Patel	2018-09-24	1	-13/+15
	'width' of a vector usually refers to the bit-width. https://bugs.llvm.org/show_bug.cgi?id=39016 shows a case where we could extend this fold to handle a case where the number of elements in the bitcasted vector is not equal to the resulting value. llvm-svn: 342902
*	[ARM] Adjust the cost model for Exynos	Evandro Menezes	2018-09-24	1	-2/+2
	Tune `MaxInterleaveFactor` and `LdStMultipleTiming` and remove `PartialUpdateClearance` for the Exynos processors. llvm-svn: 342900
*	[ARM] Adjust the feature set for Exynos	Evandro Menezes	2018-09-24	1	-0/+2
	Enable crypto and literals fusion for the Exynos processors. llvm-svn: 342899
*	[Thumb1] Any imm8 should have cost of 1	Zhaoshi Zheng	2018-09-24	1	-2/+2
	A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or [0, 255] in unsigned i8 type on Thumb1. Differential Revision: https://reviews.llvm.org/D52257 llvm-svn: 342898
*	[New PM][PassInstrumentation] IR printing support for New Pass Manager	Fedor Sergeev	2018-09-24	6	-12/+135
	Implementing -print-before-all/-print-after-all/-filter-print-func support through PassInstrumentation callbacks.
	- PrintIR routines implement printing callbacks.
	- StandardInstrumentations class provides a central place to manage all the "standard" in-tree pass instrumentations. Currently it registers PrintIR callbacks.
	Reviewers: chandlerc, paquette, philip.pfaffe Differential Revision: https://reviews.llvm.org/D50923 llvm-svn: 342896
*	[X86] Split WriteIMul into 8/16/32/64 implementations (PR36931)	Simon Pilgrim	2018-09-24	11	-373/+183
	Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases. This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions. llvm-svn: 342892
*	[Arm][AsmParser] Restrict register list size for VSTM/VLDM	Luke Cheeseman	2018-09-24	1	-0/+9
	- The assembler accepts VSTM/VLDM with register lists (specifically double registers lists) with more than 16 registers specified
	- The Arm architecture reference manual says this instruction must not contain more than 16 registers when the registers are doubleword registers
	- This addresses one of the concerns in https://bugs.llvm.org/show_bug.cgi?id=38389
	Differential Revision: https://reviews.llvm.org/D52082 llvm-svn: 342891
*	[DAGCombiner] use UADDO to optimize saturated unsigned add	Sanjay Patel	2018-09-24	1	-0/+29
	This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place.
	Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886
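For readers unfamiliar with the pattern in the commit above, here is a minimal C++ sketch of three equivalent ways to write an unsigned saturating add. The function names are hypothetical, and linking the first two forms to the "using_cmp_sum" and "using_cmp_notval" test names is an assumption based on the commit text. The third form assumes a GCC- or Clang-compatible compiler, since it relies on the __builtin_add_overflow builtin to stand in for an add that also produces an overflow flag.

```cpp
#include <cassert>
#include <cstdint>

// Form 1 (compare against the sum): the wrapped sum is smaller than an
// operand exactly when the addition overflowed.
inline uint32_t satAddCmpSum(uint32_t x, uint32_t y) {
  uint32_t sum = x + y; // wraps modulo 2^32 on overflow
  return sum < x ? UINT32_MAX : sum;
}

// Form 2 (compare against ~y): x > ~y is the same as x > UINT32_MAX - y,
// which is the overflow condition checked before adding.
inline uint32_t satAddCmpNotVal(uint32_t x, uint32_t y) {
  return x > ~y ? UINT32_MAX : x + y;
}

// Form 3 (add with overflow flag): one operation yields both the sum and the
// overflow bit, so no separate compare is needed.
inline uint32_t satAddOverflowFlag(uint32_t x, uint32_t y) {
  uint32_t sum;
  bool overflowed = __builtin_add_overflow(x, y, &sum);
  return overflowed ? UINT32_MAX : sum;
}

inline void selfCheck() {
  assert(satAddCmpSum(1u, 2u) == 3u);
  assert(satAddCmpNotVal(1u, 2u) == 3u);
  assert(satAddOverflowFlag(1u, 2u) == 3u);
}
```

All three return the same value for every input. The commit is about letting the backend reach the third shape from the second when the target marks UADDO as legal or custom for the type.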