path: root/llvm/lib
Commit message · Author · Age · Files · Lines
* AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.Elena Demikhovsky2017-01-023-9/+282
| | | | | | | | | | | | The X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the real cost. In this patch I calculate the cost on AVX-512. It will allow comparing the interleave pattern with gather/scatter and choosing a better solution (PR31426). * Shuffle-broadcast cost will be changed in Simon's upcoming patch. Differential Revision: https://reviews.llvm.org/D28118 llvm-svn: 290810
* Reapply "[CodeGen] Fix invalid DWARF info on Win64"Keno Fischer2017-01-029-16/+42
| | | | | | | This reapplies rL289013 (reverted in rL289014) with the fixes identified in D21731. Should hopefully pass the buildbots this time. llvm-svn: 290809
* [selectiondag] Check PromotedFloats map during expensive checks.Florian Hahn2017-01-011-0/+4
| | | | | | | | | | | | | | | Summary: `PromotedFloats` needs to be checked in `DAGTypeLegalizer::PerformExpensiveChecks`. This patch fixes a few type legalization failures with expensive checks for ARM fp16 tests. Reviewers: baldrick, bogner, arsenm Subscribers: arsenm, aemerson, llvm-commits Differential Revision: https://reviews.llvm.org/D28187 llvm-svn: 290796
* Fix an issue with isGuaranteedToTransferExecutionToSuccessorSanjoy Das2016-12-311-6/+20
| | | | | | | | | | | | | | I'm not sure if this was intentional, but today isGuaranteedToTransferExecutionToSuccessor returns true for readonly and argmemonly calls that may throw. This commit changes the function to not implicitly infer nounwind this way. Even if we eventually specify readonly calls as not throwing, isGuaranteedToTransferExecutionToSuccessor is not the best place to infer that. We should instead teach FunctionAttrs or some other such pass to tag readonly functions / calls as nounwind instead. llvm-svn: 290794
* Avoid const_cast; NFCSanjoy Das2016-12-311-2/+3
| | | | llvm-svn: 290793
* [Inliner] remove unnecessary null checks from AddAlignmentAssumptions(); NFCISanjay Patel2016-12-311-5/+3
| | | | | | | We bail out on the 1st line if the assumption cache is not set, so there's no need to check it after that. llvm-svn: 290787
* [ValueTracking] make dominator tree requirement explicit for ↵Sanjay Patel2016-12-311-1/+6
| | | | | | | | | | | | | | | | | isKnownNonNullFromDominatingCondition(); NFCI I don't think this hole is currently exposed, but I crashed regression tests for jump-threading and loop-vectorize after I added calls to isKnownNonNullAt() in InstSimplify as part of trying to solve PR28430: https://llvm.org/bugs/show_bug.cgi?id=28430 That's because they call into value tracking with a context instruction, but no other parts of the query structure filled in. For more background, see the discussion in: https://reviews.llvm.org/D27855 llvm-svn: 290786
* [SmallPtrSet] Introduce a find primitive and rewrite count/erase in terms of itPhilip Reames2016-12-311-25/+0
| | | | | | | | This was originally motivated by a compile time problem I've since figured out how to solve differently, but the cleanup seemed useful. We had the same logic - which essentially implemented find - in several places. By commoning them out, I can implement find and allow erase to be inlined at the call sites if profitable. Differential Revision: https://reviews.llvm.org/D28183 llvm-svn: 290779
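The refactoring described above — one shared `find` primitive backing both `count` and `erase` — can be sketched in Python. This is a toy open-addressed set for illustration only, not LLVM's actual `SmallPtrSet` implementation; every name here is hypothetical:

```python
class TinyPtrSet:
    """Toy open-addressed set: count() and erase() share one probing loop."""

    def __init__(self):
        self._slots = [None] * 8  # fixed-size table, enough for a sketch

    def _find(self, ptr):
        # The single probing loop that every lookup-style operation funnels
        # through; returns the slot index, or None if not present.
        for i in range(len(self._slots)):
            idx = (hash(ptr) + i) % len(self._slots)
            if self._slots[idx] is None:
                return None          # hit an empty slot: not present
            if self._slots[idx] == ptr:
                return idx           # found
        return None

    def insert(self, ptr):
        if self._find(ptr) is not None:
            return False             # already present
        for i in range(len(self._slots)):
            idx = (hash(ptr) + i) % len(self._slots)
            if self._slots[idx] is None:
                self._slots[idx] = ptr
                return True
        raise RuntimeError("table full")

    def count(self, ptr):
        # count is now a one-liner over find
        return 1 if self._find(ptr) is not None else 0

    def erase(self, ptr):
        # erase is find plus a tombstone write
        idx = self._find(ptr)
        if idx is None:
            return False
        self._slots[idx] = "<tombstone>"
        return True
```

The payoff the commit describes is exactly this shape: the duplicated probing logic lives in one place, and callers like `erase` become small enough to inline.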
* [AVR] Optimize 16-bit ANDs with '1'Dylan McKay2016-12-311-0/+4
| | | | | | | | | | | | Summary: Fixes PR 31345 Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28186 llvm-svn: 290778
* [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and ↵Craig Topper2016-12-311-0/+2
| | | | | | | | llvm.x86.avx512.vcomi.ss don't use the upper elements of their input. This was already done for the SSE/SSE2 version of the intrinsics. llvm-svn: 290776
* [InstCombine][AVX-512] When turning intrinsics with masking into native IR, ↵Craig Topper2016-12-301-9/+20
| | | | | | | | don't emit a select if the mask is known to be all ones. This saves InstCombine the burden of having to optimize the select later. llvm-svn: 290774
* Add a comment for a todo in LoopUnroll post cleanupPhilip Reames2016-12-301-0/+5
| | | | llvm-svn: 290769
* [LVI] Remove count/erase idiom in favor of checking result value of erasePhilip Reames2016-12-301-6/+2
| | | | | | Minor compile time win. Avoids an additional O(N) scan in the case where we are removing an element and costs nothing when we aren't. llvm-svn: 290768
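The same idea in miniature, using plain Python dicts (illustrative only — LVI operates on its own map types): checking the result of the removal operation itself avoids a second lookup of the same key.

```python
def remove_and_report_two_scans(d, key):
    # count/erase idiom: two lookups of the same key
    if key in d:          # lookup 1
        del d[key]        # lookup 2
        return True
    return False

def remove_and_report_one_scan(d, key):
    # single lookup: pop() both removes the entry and reports presence
    _MISSING = object()
    return d.pop(key, _MISSING) is not _MISSING
```

Both functions are observably identical; the second just does half the hashing work on the hit path, which is the "minor compile time win" the commit claims.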
* [MemDep] Handle gep with zeros for invariant.groupPiotr Padlewski2016-12-301-20/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: gep 0, 0 is equivalent to bitcast. LLVM canonicalizes it to getelementptr because SROA can then handle it. A simple case like void g(A &a) { z(a); if (glob) a.foo(); } void testG() { A a; g(a); } was not devirtualized with -fstrict-vtable-pointers because of the lack of handling for gep 0 in Memory Dependence Analysis. Reviewers: dberlin, nlewycky, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28126 llvm-svn: 290763
* [CVP] Adjust iteration order to reduce the amount of work requiredPhilip Reames2016-12-301-3/+8
| | | | | | | | CVP doesn't care about the order of blocks visited, but by using a pre-order traversal over the graph we can a) avoid visiting unreachable blocks and b) optimize as we go, so that analysis of later blocks produces slightly more precise results. I noticed this via inspection and don't have a concrete example which points to the issue. llvm-svn: 290760
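A minimal Python sketch of the traversal property being exploited: a pre-order walk from the entry block never touches unreachable blocks, and visits earlier blocks before the blocks they feed. The CFG shape and block names below are made up for illustration, not taken from LLVM:

```python
def preorder_reachable(cfg, entry):
    """Return blocks in DFS pre-order from `entry`; blocks not reachable
    from the entry never appear in the result at all."""
    seen, order, stack = set(), [], [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        order.append(block)           # visit before successors: pre-order
        # reversed() so the first successor is popped (visited) first
        stack.extend(reversed(cfg.get(block, [])))
    return order
```

A pass iterating `order` gets both benefits for free: `"dead"`-style blocks are skipped, and a block's results are usually computed before its successors consult them.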
* [LVI] Manually hoist computation from loopPhilip Reames2016-12-301-7/+12
| | | | | | Minor compile time win. Not known to be a hot spot, just something I noticed while reading. llvm-svn: 290759
* Caught a simple typo. I do not know of a way to test this, but it seems like ↵Aaron Ballman2016-12-301-1/+1
| | | | | | an unlikely thing to regress in the future. llvm-svn: 290757
* [NewGVN] Remove unneeded newline from assertion message.Davide Italiano2016-12-301-1/+1
| | | | llvm-svn: 290755
* [InstCombine] Address post-commit feedbackDavid Majnemer2016-12-302-2/+4
| | | | llvm-svn: 290741
* [libFuzzer] cleaner implementation of -print_pcs=1Kostya Serebryany2016-12-303-7/+14
| | | | llvm-svn: 290739
* [LICM] When promoting scalars, allow inserting stores to thread-local allocas.Michael Kuperstein2016-12-301-1/+2
| | | | | | | | | This is similar to the allocfn case - if an alloca is not captured, then it's necessarily thread-local. Differential Revision: https://reviews.llvm.org/D28170 llvm-svn: 290738
* Use continuous boosting factor for complete unroll.Dehao Chen2016-12-301-75/+32
| | | | | | | | | | | | | | | | | | | | Summary: The current loop complete unroll algorithm checks whether complete unrolling will reduce the runtime by a certain percentage. If yes, it applies a fixed boosting factor to the threshold (by discounting cost). The problem with this approach is that the threshold changes abruptly. This patch makes the boosting factor a function of the runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously. The patch also simplifies the code by removing one parameter from UP. The patch only affects code-gen of two speccpu2006 benchmarks: 445.gobmk binary size decreases 0.08%, no performance change. 464.h264ref binary size increases 0.24%, no performance change. Reviewers: mzolotukhin, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26989 llvm-svn: 290737
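The shape of the change can be sketched as follows. This is an illustrative formula only — the constants and the exact interpolation LLVM uses differ — but it shows the point: the boost grows continuously with the estimated runtime reduction instead of jumping from 1x to a fixed factor at a cutoff.

```python
def unroll_boost_factor(percent_reduction, max_boost=10.0):
    """Boost factor as a continuous function of estimated runtime
    reduction, capped at max_boost (hypothetical linear ramp)."""
    assert 0 <= percent_reduction <= 100
    # linear interpolation between no boost (1x) and the cap
    return 1.0 + (max_boost - 1.0) * percent_reduction / 100.0

def boosted_threshold(base_threshold, percent_reduction):
    """Discounted-cost view: a larger expected win raises the size
    threshold we are willing to pay for complete unrolling."""
    return int(base_threshold * unroll_boost_factor(percent_reduction))
```

Compare with the old scheme, which was effectively `max_boost if percent_reduction >= cutoff else 1.0` — a step function whose discontinuity at the cutoff is what the commit removes.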
* [LICM] Remove unneeded tracking of whether changes were made. NFC.Michael Kuperstein2016-12-301-9/+7
| | | | | | | | "Changed" doesn't actually change within the loop, so there's no reason to keep track of it - we always return false during analysis and true after the transformation is made. llvm-svn: 290735
* [LICM] Make logic in promoteLoopAccessesToScalars easier to follow. NFC.Michael Kuperstein2016-12-301-40/+47
| | | | llvm-svn: 290734
* [InstCombine] More thoroughly canonicalize the position of zextsDavid Majnemer2016-12-302-9/+120
| | | | | | | | We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y)) where possible. However, we didn't perform the same canonicalization for zexts or for muls. llvm-svn: 290733
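Why the canonicalization needs the "where possible" qualifier can be seen numerically. The Python model below (the bit widths and helper names are made up for illustration) shows that (add (zext x), (zext y)) equals (zext (add x, y)) exactly when the narrow add does not wrap:

```python
def zext(value, from_bits, to_bits):
    # zero-extension: an unsigned from_bits value is numerically
    # unchanged in the wider type
    assert 0 <= value < (1 << from_bits) and to_bits >= from_bits
    return value

def add_mod(a, b, bits):
    # fixed-width unsigned addition (wraps modulo 2**bits)
    return (a + b) & ((1 << bits) - 1)

def zext_then_add(x, y):
    # (add (zext x to i16), (zext y to i16))
    return add_mod(zext(x, 8, 16), zext(y, 8, 16), 16)

def add_then_zext(x, y):
    # (zext (add i8 x, y) to i16) -- the add happens in the narrow type
    return zext(add_mod(x, y, 8), 8, 16)
```

With 3 + 5 both forms give 8; with 200 + 100 the narrow add wraps to 44 while the wide add gives 300, so the rewrite is only sound when the narrow add is known not to overflow.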
* [AVR] Optimize 16-bit ORs with '0'Dylan McKay2016-12-301-12/+27
| | | | | | | | | | | | | | Summary: Fixes PR 31344 Authored by Anmol P. Paralkar Reviewers: dylanmckay Subscribers: fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D28121 llvm-svn: 290732
* Simplify FunctionLoweringInfo.cpp with range for loopsReid Kleckner2016-12-301-40/+31
| | | | | | | I'm preparing to add some pattern matching code here, so simplify the code before I do. NFC llvm-svn: 290731
* Include <algorithm> for std::max etcReid Kleckner2016-12-301-0/+1
| | | | llvm-svn: 290730
* [LICM] Compute exit blocks for promotion eagerly. NFC.Michael Kuperstein2016-12-291-35/+36
| | | | | | | | | | | This moves the exit block and insertion point computation to be eager, instead of after seeing the first scalar we can promote. The cost is relatively small (the computation happens anyway, see discussion on D28147), and the code is easier to follow, and can bail out earlier if there's a catchswitch present. llvm-svn: 290729
* [LICM] Don't try to promote in loops where we have no chance to promote. NFC.Michael Kuperstein2016-12-291-10/+6
| | | | | | | | We would check whether we have a preheader *or* dedicated exit blocks, and go into the promotion loop. Then, for each alias set we'd check if we have a preheader *and* dedicated exit blocks, and bail if not. Instead, bail immediately if we don't have both. llvm-svn: 290728
* [LICM] Only recompute LCSSA when we actually promoted something.Michael Kuperstein2016-12-291-3/+6
| | | | | | | | | | | | We want to recompute LCSSA only when we actually promoted a value. This means we only need to look at changes made by promotion when deciding whether to recompute it or not, not at regular sinking/hoisting. (This was what the code was documented as doing, just not what it did) Hopefully NFC. llvm-svn: 290726
* NewGVN: Fix PR 31491 by ensuring that we touch the right instructions. ↵Daniel Berlin2016-12-291-11/+21
| | | | | | | Change to one-based numbering so we can assert we don't cause the same bug again. llvm-svn: 290724
* [ADT] Delete RefCountedBaseVPTR.Justin Lebar2016-12-292-15/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This class is unnecessary. Its comment indicated that it was a compile error to allocate an instance of a class that inherits from RefCountedBaseVPTR on the stack. This may have been true at one point, but it's not today. Moreover you really do not want to allocate *any* refcounted object on the stack, vptrs or not, so if we did have a way to prevent these objects from being stack-allocated, we'd want to apply it to regular RefCountedBase too, obviating the need for a separate RefCountedBaseVPTR class. It seems that the main way RefCountedBaseVPTR provides safety is by making its subclass's destructor virtual. This may have been helpful at one point, but these days clang will emit an error if you define a class with virtual functions that inherits from RefCountedBase but doesn't have a virtual destructor. Reviewers: compnerd, dblaikie Subscribers: cfe-commits, klimek, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D28162 llvm-svn: 290717
* Revert "[COFF] Use 32-bit jump table entries in .rdata for Win64"Reid Kleckner2016-12-293-33/+0
| | | | | | | | This reverts commit r290694. It broke sanitizer tests on Win64. I'll probably bring this back, but the jump tables will just live in .text like they do for MSVC. llvm-svn: 290714
* [TBAAVerifier] Be stricter around verifying scalar nodesSanjoy Das2016-12-291-24/+21
| | | | | | | | | | | This fixes the issue exposed in PR31393, where we weren't trying sufficiently hard to diagnose bad TBAA metadata. This does reduce the variety in the error messages we print out, but I think the tradeoff of verifying more, simply and quickly, overrules the need for more helpful error messages here. llvm-svn: 290713
* [TBAAVerifier] Make things const-consistent; NFCSanjoy Das2016-12-291-6/+6
| | | | llvm-svn: 290712
* [TBAAVerifier] Memoize validity of scalar tbaa nodes; NFCISanjoy Das2016-12-291-5/+14
| | | | llvm-svn: 290711
* [AMDGPU][mc] Enable absolute expressions in .hsa_code_object_isa directiveArtem Tamazov2016-12-291-12/+17
| | | | | | | | | | Among other things, this allows the use of the predefined .option.machine_version_major/minor/stepping symbols in the directive. The relevant test is expanded at the same time (and the file renamed for clarity). Differential Revision: https://reviews.llvm.org/D28140 llvm-svn: 290710
* Introduce element-wise atomic memcpy intrinsicIgor Laevsky2016-12-293-0/+94
| | | | | | | | | | This change adds a new intrinsic which is intended to provide memcpy functionality with additional atomicity guarantees. Please refer to the review thread or language reference for further details. Differential Revision: https://reviews.llvm.org/D27133 llvm-svn: 290708
* [InstCombine] Use getVectorNumElements instead of explicitly casting to ↵Craig Topper2016-12-291-8/+7
| | | | | | VectorType and calling getNumElements. NFC llvm-svn: 290707
* [InstCombine] Fix typo in comment. NFCCraig Topper2016-12-291-1/+1
| | | | llvm-svn: 290706
* [InstCombine] Use 32 bits instead of 64 bits for storing the number of ↵Craig Topper2016-12-291-2/+2
| | | | | | elements in VectorType for a ShuffleVector. While there, use getVectorNumElements to avoid an explicit cast. NFC llvm-svn: 290705
* [InstCombine][X86] If the lowest element of a scalar intrinsic isn't used ↵Craig Topper2016-12-291-6/+18
| | | | | We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since it's now dead. This can allow DCE to find it sooner and remove it. A similar change was made for InsertElement when the inserted element isn't demanded. llvm-svn: 290704
* [libFuzzer] make __sanitizer_cov_trace_switch more predictableKostya Serebryany2016-12-292-24/+19
| | | | llvm-svn: 290703
* NewGVN: Sort Dominator Tree in RPO order, and use that for generating order.Daniel Berlin2016-12-291-4/+24
| Summary: The optimal iteration order for this problem is RPO order. We want to process as many preds of a backedge as we can before we process the backedge itself. At the same time, as we add predicate handling, we want to be able to touch instructions that are dominated by a given block in ranges (because a change in the value numbering of a predicate possibly affects all users we dominate that are using that predicate). If we don't do it this way, we can't do value inference over backedges (the paper covers this in depth). The newgvn branch currently overshoots the last part, and guarantees that it will touch *at least* the right set of instructions, but it does touch more. This is because the bitvector instruction ranges are currently generated in RPO order (we take the max and the min of the ranges of dominated blocks, which means there are some in the middle we didn't have to touch that we did). We can do better by sorting the dominator tree and then just using dominator tree order.
| As a preliminary, the dominator tree has some RPO guarantees, but not enough. It guarantees that for a given node, your idom must come before you in the RPO ordering, but it guarantees no relative RPO ordering for siblings; we add siblings in whatever order they appear in the module. So that is what we fix: we sort the children array of the domtree into RPO order, and then use the dominator tree for ordering instead of RPO, since the dominator tree is now a valid RPO ordering. Note: this would help any other pass that iterates a forward problem in dominator tree order. Most of them are single pass; it will still maximize whatever result they compute. We could also build the dominator tree in this order, but our incremental updates would still put it out of sort order, and recomputing the sort order is almost as hard as general incremental updates of the domtree. Also note that the sorting does not affect any tests, etc. Nothing depends on domtree order, including the verifier and the equals functions for domtree nodes.
| How much could this matter, you ask? Here are the current numbers, generated by running NewGVN over all files in LLVM (counts of runs by the number of iterations needed). Note that once we propagate equalities, the differences go up by an order of magnitude or two (i.e. instead of 29, the max ends up in the thousands, since in the worst case we add a factor of N, where N is the number of branch predicates). So while it doesn't look that stark for the default ordering, it gets *much much* worse. There are also programs in the wild where the difference is already pretty stark (2 iterations vs. hundreds).
| RPO ordering: 759040 took 1 iteration, 112908 took 2.
| Default dominator tree ordering: 755081 took 1 iteration, 116234 took 2, 603 took 3, 27 took 4, 2 took 5, 1 took 7.
| Dominator tree sorted: 759040 took 1 iteration, 112908 took 2. <yay!>
| Really bad ordering (domtree siblings sorted in postorder; not quite the worst possible): 754008 took 1 iteration, 96642 took 2, 17266 took 3, 2598 took 4, 798 took 5, 273 took 6, 186 took 7, 80 took 8, 42 took 9, 21 took 10, 8 took 11, 6 took 12, 5 took 13, 2 took 14, 2 took 15, 3 took 16, 1 took 17, 2 took 18, 1 took 20, 2 took 21, 1 took 22, 1 took 29.
| Reviewers: chandlerc, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28129 llvm-svn: 290699
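The core trick — number blocks in RPO, then sort each dominator-tree node's children array by that number — can be sketched in Python. The toy CFG and dominator tree below are made up for illustration and are not LLVM's data structures:

```python
def rpo_numbers(cfg, entry):
    """Reverse post-order numbering of the blocks reachable from entry."""
    post, seen = [], set()

    def dfs(block):
        seen.add(block)
        for succ in cfg.get(block, []):
            if succ not in seen:
                dfs(succ)
        post.append(block)           # post-order: after all successors

    dfs(entry)
    # reversing the post-order gives RPO; earlier blocks get smaller numbers
    return {block: num for num, block in enumerate(reversed(post))}

def sort_domtree_children(domtree, rpo):
    """Sort each node's children so that a walk of the dominator tree
    visits siblings in RPO order (the property the commit establishes)."""
    return {node: sorted(kids, key=rpo.__getitem__)
            for node, kids in domtree.items()}
```

After this, a plain dominator-tree walk is itself a valid RPO traversal, which is why the sorted ordering matches the pure-RPO iteration counts in the numbers above.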
* Add a static_assert about the sizeof(GlobalValue)Reid Kleckner2016-12-291-0/+7
| | | | | | | | I added one for Value back in r262045, and I'm starting to think we should have these for any class with bitfields whose memory efficiency really matters. llvm-svn: 290698
* Update equalsStoreHelper for the fact that only one branch can be trueDaniel Berlin2016-12-291-4/+5
| | | | llvm-svn: 290697
* [COFF] Use 32-bit jump table entries in .rdata for Win64Reid Kleckner2016-12-293-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: We were already using 32-bit jump table entries, but this was a consequence of the default PIC model on Win64, and not an intentional design decision. This patch ensures that we always use 32-bit label difference jump table entries on Win64 regardless of the PIC model. This is a good idea because it saves executable size and object file size. Moving the jump tables to .rdata cleans up the disassembled object code and reduces the available ROP targets, but it requires adding one more RIP-relative lea to the code. COFF doesn't have relocations to express the difference between two arbitrary symbols, so we can't use the jump table label in the label difference like we do elsewhere. Fixes PR31488 Reviewers: majnemer, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28141 llvm-svn: 290694
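The label-difference scheme can be modeled in a few lines of Python (the addresses are made-up integers; in a real object file each entry is a 32-bit relocation): each table entry is the target address minus the table base, so entries fit in 32 bits regardless of where the image is loaded, and dispatch adds the base back with the one extra RIP-relative lea the commit mentions.

```python
def build_jump_table(table_base, targets):
    """Each entry is a 32-bit signed offset from the table base,
    instead of a full 64-bit absolute target address."""
    entries = [t - table_base for t in targets]
    assert all(-2**31 <= e < 2**31 for e in entries)
    return entries

def dispatch(table_base, entries, index):
    """Runtime lowering: lea the table base, load the 32-bit entry,
    add them, and jump to the result."""
    return table_base + entries[index]
```

Halving the entry size is where the executable-size saving comes from, and because entries are relative, the table can live in read-only .rdata.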
* Change Metadata Index emission in the bitcode to use 2x32 bits for the ↵Mehdi Amini2016-12-281-2/+3
| | | | | | | placeholder The Bitstream reader and writer are limited to handling a "size_t" at most, which means that we can't backpatch and read back a 64-bit value on 32-bit platforms. llvm-svn: 290693
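The workaround — reserve the 64-bit slot as two 32-bit words so every individual write and read fits in a 32-bit size_t — can be illustrated with Python's struct module (buffer layout and function names here are hypothetical, not the actual Bitstream API):

```python
import struct

def emit_placeholder_2x32(buf):
    """Reserve space for a 64-bit value as two little-endian 32-bit words."""
    pos = len(buf)
    buf.extend(struct.pack("<II", 0, 0))  # low word, then high word
    return pos

def backpatch_2x32(buf, pos, value):
    """Patch the reserved slot with two 32-bit writes, so no single
    operation ever handles a 64-bit quantity."""
    lo, hi = value & 0xFFFFFFFF, value >> 32
    buf[pos:pos + 4] = struct.pack("<I", lo)
    buf[pos + 4:pos + 8] = struct.pack("<I", hi)

def read_2x32(buf, pos):
    """Reassemble the 64-bit value from the two 32-bit words."""
    lo, hi = struct.unpack_from("<II", buf, pos)
    return (hi << 32) | lo
```

The full 64-bit value only ever exists as a Python/host integer; the serialized form is always touched 32 bits at a time.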
* Revert "[NewGVN] replace emplace_back with push_back"Piotr Padlewski2016-12-281-7/+7
| | | | llvm-svn: 290692