bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"	Petr Hosek	2019-09-10	5	-208/+8
	This reverts commit r371484: this broke sanitizer-x86_64-linux-fast bot. llvm-svn: 371488
*	clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM	Petr Hosek	2019-09-10	5	-8/+208
	This patch contains the basic functionality for reporting potentially incorrect usage of __builtin_expect() by comparing the developer's annotation against a collected PGO profile. A more detailed proposal and discussion appears on the CFE-dev mailing list (http://lists.llvm.org/pipermail/cfe-dev/2019-July/062971.html), and a prototype of the initial frontend changes appears in D65300.
	We revised the work in D65300 by moving the misexpect check into the LLVM backend, and adding support for IR and sampling based profiles, in addition to frontend instrumentation. We add new misexpect metadata tags to those instructions directly influenced by the llvm.expect intrinsic (branch, switch, and select) when lowering the intrinsics. The misexpect metadata contains information about the expected target of the intrinsic so that we can check against the correct PGO counter when emitting diagnostics, and the compiler's values for the LikelyBranchWeight and UnlikelyBranchWeight. We use these branch weight values to determine when to emit the diagnostic to the user.
	A future patch should address the comment at the top of LowerExpectIntrinsic.cpp to hoist the LikelyBranchWeight and UnlikelyBranchWeight values into a shared space that can be accessed outside of the LowerExpectIntrinsic pass. Once that is done, the misexpect metadata can be updated to be smaller.
	In the long term, it is possible to reconstruct portions of the misexpect metadata from the existing profile data. However, we have avoided this to keep the code simple, and because some kind of metadata tag will be required to identify which branch/switch/select instructions are influenced by the use of llvm.expect.
	Patch By: paulkirth
	Differential Revision: https://reviews.llvm.org/D66324
	llvm-svn: 371484
*	[LoopVectorize] Leverage speculation safety to avoid masked.loads	Philip Reames	2019-09-09	1	-4/+85
	If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated. This allows us to generate a normal vector load instead of a masked.load. To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned. This is equivalent to proving that hoisting the load into the loop header in the original scalar loop is safe.
	Note: There are a couple of code motion todos in the code. My intention is to wait about a day, to be sure this sticks, and then perform the NFC motion without further review.
	Differential Revision: https://reviews.llvm.org/D66688
	llvm-svn: 371452
*	[InstCombine] fold extract+insert into identity shuffle	Sanjay Patel	2019-09-08	1	-0/+52
	This is similar to the existing fold for splats added with rL365379. If we can adjust the shuffle mask to include another element in an identity mask (if it changes vector length, that's an extract/insert subvector operation in the backend), then that can eliminate extractelement/insertelement pairs in IR. All targets are expected to lower shuffles with identity masks efficiently. llvm-svn: 371340
*	Fix typo. NFCI	Simon Pilgrim	2019-09-07	1	-1/+1
	llvm-svn: 371317
*	[SimplifyCFG] SpeculativelyExecuteBB(): It's SpeculatedInstructions, not SpeculationCost	Roman Lebedev	2019-09-07	1	-7/+7
	It counts the number of instructions we are ok speculating (at most 1 there), not their cost, so rename accordingly. llvm-svn: 371294
*	[Attributor] ValueSimplify Abstract Attribute	Hideto Ueno	2019-09-07	1	-4/+269
	Summary: This patch introduces initial `AAValueSimplify` which simplifies a value in a context.
	Examples:
	- (for function returned) If all the return values are the same and constant, then we can replace callsite returned with the constant.
	- If an internal function takes the same value (constant) as an argument in the callsite, then we can replace the argument with that constant.
	Reviewers: jdoerfert, sstefan1
	Reviewed By: jdoerfert
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D66967
	llvm-svn: 371291
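The two cases in the list can be pictured at the source level. This is only an illustrative sketch: the Attributor works on LLVM IR, and the functions `pickWidth`, `scale`, `before`, and `after` are hypothetical names invented here, not code from the patch.

```cpp
#include <cassert>

// Hypothetical internal function: both return paths yield the constant 4.
static int pickWidth(bool Wide) {
  if (Wide)
    return 4;
  return 4;
}

// Hypothetical internal function whose second argument is 3 at every call site.
static int scale(int X, int Factor) { return X * Factor; }

// Before: the caller uses the returned value and passes the constant explicitly.
int before(int X, bool Wide) { return scale(X, 3) + pickWidth(Wide); }

// After (conceptually): the call-site return value is replaced by 4 and the
// Factor argument by 3, so the expression reduces to plain arithmetic.
int after(int X, bool /*Wide*/) { return X * 3 + 4; }
```

Both versions compute the same result for every input, which is the property the simplification relies on.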
*	Change TargetLibraryInfo analysis passes to always require Function	Teresa Johnson	2019-09-07	39	-141/+209
	Summary: This is the first change to enable the TLI to be built per-function so that -fno-builtin* handling can be migrated to use function attributes. See discussion on D61634 for background. This is an enabler for fixing handling of these options for LTO, for example.
	This change should not affect behavior, as the provided function is not yet used to build a specifically per-function TLI, but rather enables that migration.
	Most of the changes were very mechanical, e.g. passing a Function to the legacy analysis pass's getTLI interface, or in Module level cases, adding a callback. This is similar to the way the per-function TTI analysis works.
	There was one place where we were looking for builtins but not in the context of a specific function. See FindCXAAtExit in lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround could provide the wrong behavior in some corner cases. Suggestions welcome.
	Reviewers: chandlerc, hfinkel
	Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D66428
	llvm-svn: 371284
*	[InstCombine] Refactor substitution of instruction in the parent BB (NFC)	Evandro Menezes	2019-09-06	1	-14/+9
	Add the new method `LibCallSimplifier::substituteInParent()` that calls `LibCallSimplifier::replaceAllUsesWith()` and `LibCallSimplifier::eraseFromParent()` back to back, simplifying the resulting code. llvm-svn: 371264
*	[SimplifyLibCalls] handle pow(x,-0.0) before it can assert (PR43233)	Sanjay Patel	2019-09-06	1	-2/+2
	https://bugs.llvm.org/show_bug.cgi?id=43233 llvm-svn: 371221
*	InstCombine: Fix crash on icmp of gep with addrspacecasted null	Matt Arsenault	2019-09-05	1	-2/+2
	llvm-svn: 371146
*	[SimplifyCFG] Don't SimplifyBranchOnICmpChain with ExtraCase	Vitaly Buka	2019-09-05	1	-1/+6
	Summary: Here we try to avoid issues with "explicit branch" with SimplifyBranchOnICmpChain which can check on undef. Msan by design reports branches on uninitialized memory and undefs, so we have a false report here.
	In general msan does not like when we convert
	```
	// If at least one of them is true, MSAN is ok if the other is undef
	if (a || b)
	  return;
	```
	into
	```
	// If 'a' is undef MSAN will complain even if 'b' is true
	if (a)
	  return;
	if (b)
	  return;
	```
	Example
	Before optimization we had something like this:
	```
	while (true) {
	  bool maybe_undef = doStuff();
	  while (true) {
	    char c = getChar();
	    if (c != 10 && c != 13)
	      continue;
	    break;
	  }
	  // we know that c == 10 || c == 13 if we get here,
	  // so msan knows that the branch is not affected by maybe_undef
	  if (maybe_undef || c == 10 || c == 13)
	    continue;
	  return;
	}
	```
	SimplifyBranchOnICmpChain will convert that into
	```
	while (true) {
	  bool maybe_undef = doStuff();
	  while (true) {
	    char c = getChar();
	    if (c != 10 && c != 13)
	      continue;
	    break;
	  }
	  // however msan will complain here:
	  if (maybe_undef)
	    continue;
	  // we know that c == 10 || c == 13, so either way we will get continue
	  switch (c) {
	  case 10: continue;
	  case 13: continue;
	  }
	  return;
	}
	```
	Reviewers: eugenis, efriedma
	Reviewed By: eugenis, efriedma
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D67205
	llvm-svn: 371138
*	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned sub overflow' check	Roman Lebedev	2019-09-05	1	-6/+7
	A follow-up for r329011. This may be changed to produce @llvm.sub.with.overflow in a later patch, but for now just make things more consistent overall.
	A few observations stem from this:
	* There does not seem to be a similar one-instruction fold for uadd-overflow
	* I'm not sure we'll want to canonicalize `B u> A` as `usub.with.overflow`, so since the `icmp` here no longer refers to `sub`, reconstructing `usub.with.overflow` will be problematic, and will likely require standalone pass (similar to DivRemPairs).
	https://rise4fun.com/Alive/Zqs
	Name: (A - B) u> A --> B u> A
	  %t0 = sub i8 %A, %B
	  %r = icmp ugt i8 %t0, %A
	=>
	  %r = icmp ugt i8 %B, %A
	Name: (A - B) u<= A --> B u<= A
	  %t0 = sub i8 %A, %B
	  %r = icmp ule i8 %t0, %A
	=>
	  %r = icmp ule i8 %B, %A
	Name: C u< (C - D) --> C u< D
	  %t0 = sub i8 %C, %D
	  %r = icmp ult i8 %C, %t0
	=>
	  %r = icmp ult i8 %C, %D
	Name: C u>= (C - D) --> C u>= D
	  %t0 = sub i8 %C, %D
	  %r = icmp uge i8 %C, %t0
	=>
	  %r = icmp uge i8 %C, %D
	llvm-svn: 371101
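The four folds in this message are stated for `i8`, so they can be sanity-checked by brute force over all 65,536 operand pairs. The following is a minimal sketch written for illustration, not part of the patch; the function name `subFoldsHold` is hypothetical, and `uint8_t` with wrapping subtraction stands in for `i8`.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks the four i8 folds from the commit message.
// Returns false as soon as any operand pair disagrees.
bool subFoldsHold() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      const uint8_t A = static_cast<uint8_t>(a);
      const uint8_t B = static_cast<uint8_t>(b);
      const uint8_t Diff = static_cast<uint8_t>(A - B); // sub i8 %A, %B
      if ((Diff > A) != (B > A))   return false; // (A - B) u>  A --> B u>  A
      if ((Diff <= A) != (B <= A)) return false; // (A - B) u<= A --> B u<= A
      if ((A < Diff) != (A < B))   return false; // C u<  (C - D) --> C u<  D
      if ((A >= Diff) != (A >= B)) return false; // C u>= (C - D) --> C u>= D
    }
  }
  return true;
}
```

The intuition is that the 8-bit subtraction wraps exactly when `B` is greater than `A`, and a wrapped difference is always greater than `A`.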
*	[InstCombine] foldICmpBinOp(): consider inverted check in 'unsigned add overflow' check	Roman Lebedev	2019-09-05	1	-4/+4
	A follow-up for r342004. This will be changed to produce @llvm.add.with.overflow in a later patch, but for now just make things more consistent overall.
	https://rise4fun.com/Alive/qxE
	Name: (Op1 + X) u< Op1 --> ~Op1 u< X
	  %t0 = add i8 %Op1, %X
	  %r = icmp ult i8 %t0, %Op1
	=>
	  %n = xor i8 %Op1, -1
	  %r = icmp ult i8 %n, %X
	Name: (Op1 + X) u>= Op1 --> ~Op1 u>= X
	  %t0 = add i8 %Op1, %X
	  %r = icmp uge i8 %t0, %Op1
	=>
	  %n = xor i8 %Op1, -1
	  %r = icmp uge i8 %n, %X
	;-------------------------------------------------------------------------------
	Name: Op0 u> (Op0 + X) --> X u> ~Op0
	  %t0 = add i8 %Op0, %X
	  %r = icmp ugt i8 %Op0, %t0
	=>
	  %n = xor i8 %Op0, -1
	  %r = icmp ugt i8 %X, %n
	Name: Op0 u<= (Op0 + X) --> X u<= ~Op0
	  %t0 = add i8 %Op0, %X
	  %r = icmp ule i8 %Op0, %t0
	=>
	  %n = xor i8 %Op0, -1
	  %r = icmp ule i8 %X, %n
	llvm-svn: 371100
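As with any `i8` transform, these four add-overflow folds can be checked over every operand pair. The sketch below is illustrative only and not taken from the patch; `addFoldsHold` is a hypothetical name, and `uint8_t` models `i8` with wrapping addition.

```cpp
#include <cassert>
#include <cstdint>

// Exhaustively checks the four i8 folds from the commit message.
bool addFoldsHold() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned x = 0; x < 256; ++x) {
      const uint8_t Op = static_cast<uint8_t>(a);
      const uint8_t X = static_cast<uint8_t>(x);
      const uint8_t Sum = static_cast<uint8_t>(Op + X); // add i8 %Op, %X
      const uint8_t NotOp = static_cast<uint8_t>(~Op);  // xor i8 %Op, -1
      if ((Sum < Op) != (NotOp < X))   return false; // (Op1 + X) u<  Op1 --> ~Op1 u<  X
      if ((Sum >= Op) != (NotOp >= X)) return false; // (Op1 + X) u>= Op1 --> ~Op1 u>= X
      if ((Op > Sum) != (X > NotOp))   return false; // Op0 u>  (Op0 + X) --> X u>  ~Op0
      if ((Op <= Sum) != (X <= NotOp)) return false; // Op0 u<= (Op0 + X) --> X u<= ~Op0
    }
  }
  return true;
}
```

The folds work because `~Op` equals `255 - Op`, the largest value that can be added to `Op` without wrapping.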
*	[MergedLoadStoreMotion] Sink stores to BB with more than 2 predecessors	Denis Bakhvalov	2019-09-05	1	-69/+98
	If we have:
	bb5:
	  br i1 %arg3, label %bb6, label %bb7
	bb6:
	  %tmp = getelementptr inbounds i32, i32* %arg1, i64 2
	  store i32 3, i32* %tmp, align 4
	  br label %bb9
	bb7:
	  %tmp8 = getelementptr inbounds i32, i32* %arg1, i64 2
	  store i32 3, i32* %tmp8, align 4
	  br label %bb9
	bb9: ; preds = %bb4, %bb6, %bb7
	  ...
	We can't sink stores directly into bb9. This patch creates a new BB that is a successor of %bb6 and %bb7 and sinks stores into that block.
	SplitFooterBB is the parameter to the pass that controls that behavior.
	Change-Id: I7fdf50a772b84633e4b1b860e905bf7e3e29940f
	Differential: https://reviews.llvm.org/D66234
	llvm-svn: 371089
*	[MemorySSA] Verify MSSAUpdater exists.	Alina Sbirlea	2019-09-05	1	-1/+2
	llvm-svn: 371087
*	[PGO][CHR] Speed up following long, interlinked use-def chains.	Hiroshi Yamauchi	2019-09-05	1	-5/+14
	Summary: Avoid visiting an instruction more than once by using a map. This is similar to https://reviews.llvm.org/rL361416. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67198 llvm-svn: 371086
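The effect of caching results in a map can be sketched on a toy use-def graph. This is a generic memoization example under stated assumptions, not the CHR pass code: `Node`, `checkOperands`, and `evaluationsForLadder` are hypothetical names, and the "ladder" graph (each node using its predecessor twice) is chosen because a naive walk would revisit nodes exponentially often.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an instruction: a flag plus its operands.
struct Node {
  bool LeafOk = true;
  std::vector<const Node *> Operands;
};

// Returns whether N and everything reachable through its operands is ok.
// Memo guarantees each node is evaluated at most once.
bool checkOperands(const Node *N, std::unordered_map<const Node *, bool> &Memo,
                   unsigned &Evaluations) {
  auto It = Memo.find(N);
  if (It != Memo.end())
    return It->second; // already visited: reuse the cached answer
  ++Evaluations;
  bool Ok = N->LeafOk;
  for (const Node *Op : N->Operands)
    Ok = checkOperands(Op, Memo, Evaluations) && Ok;
  Memo[N] = Ok;
  return Ok;
}

// Builds a ladder where node I uses node I-1 twice, walks it from the top,
// and reports how many nodes were actually evaluated.
unsigned evaluationsForLadder(unsigned Length) {
  if (Length == 0)
    return 0;
  std::vector<Node> Nodes(Length);
  for (unsigned I = 1; I < Length; ++I)
    Nodes[I].Operands = {&Nodes[I - 1], &Nodes[I - 1]};
  std::unordered_map<const Node *, bool> Memo;
  unsigned Evaluations = 0;
  checkOperands(&Nodes[Length - 1], Memo, Evaluations);
  return Evaluations;
}
```

With the map, a ladder of 40 nodes costs 40 evaluations; without it, the same walk would take on the order of 2^40 steps.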
*	[MemorySSA] Update MemorySSA when removing debug.value calls.	Alina Sbirlea	2019-09-05	1	-1/+3
	llvm-svn: 371084
*	[LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align	Guillaume Chatelet	2019-09-05	1	-2/+4
	Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67223 llvm-svn: 371063
*	[Attributor][Stats] Use the right statistics macro	Johannes Doerfert	2019-09-04	1	-2/+2
	llvm-svn: 370976
*	[Attributor][Fix] Make sure we do not delete live code	Johannes Doerfert	2019-09-04	1	-2/+14
	Summary: Liveness needs to mark edges, not blocks as dead. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67191 llvm-svn: 370975
*	[NewPM][Sancov] Make Sancov a Module Pass instead of 2 Passes	Leonard Chan	2019-09-04	2	-228/+130
	This patch merges the sancov module and function passes into one module pass.
	The reason for this is because we ran into an out of memory error when attempting to run asan fuzzer on some protobufs (pc.cc files). I traced the OOM error to the destructor of SanitizerCoverage where we only call appendTo[Compiler]Used which calls appendToUsedList. I'm not sure where precisely in appendToUsedList causes the OOM, but I am able to confirm that it's calling this function repeatedly that causes the OOM. (I hacked sancov a bit such that I can still create and destroy a new sancov on every function run, but only call appendToUsedList after all functions in the module have finished. This passes, but when I make it such that appendToUsedList is called on every sancov destruction, we hit OOM.)
	I don't think the OOM is from just adding to the SmallSet and SmallVector inside appendToUsedList since in either case for a given module, they'll have the same max size. I suspect that when the existing llvm.compiler.used global is erased, the memory behind it isn't freed. I could be wrong on this though.
	This patch works around the OOM issue by just calling appendToUsedList at the end of every module run instead of function run. The same amount of constants still get added to llvm.compiler.used, and we make the pass usage and logic simpler by not having any inter-pass dependencies.
	Differential Revision: https://reviews.llvm.org/D66988
	llvm-svn: 370971
*	[MemorySSA] Re-enable MemorySSA use.	Alina Sbirlea	2019-09-04	1	-0/+4
	Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370957
*	[Attributor][Fix] Ensure the attribute names are created properly	Johannes Doerfert	2019-09-04	1	-1/+3
	The names of the attributes were not always created properly which caused problems with the yaml output. llvm-svn: 370956
*	[NFC] Switch last couple of invariant_load checks to use hasMetadata	Philip Reames	2019-09-04	3	-3/+3
	llvm-svn: 370948
*	[InstCombine] sub(xor(x, y), or(x, y)) -> neg(and(x, y))	David Bolvansky	2019-09-04	1	-0/+9
	Summary:
	```
	Name: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
	  %or = or i32 %y, %x
	  %xor = xor i32 %x, %y
	  %sub = sub i32 %xor, %or
	=>
	  %sub1 = and i32 %x, %y
	  %sub = sub i32 0, %sub1

	Optimization: sub(xor(x, y), or(x, y)) -> neg(and(x, y))
	Done: 1
	Optimization is correct!
	```
	https://rise4fun.com/Alive/8OI
	Reviewers: lebedev.ri
	Reviewed By: lebedev.ri
	Subscribers: llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D67188
	llvm-svn: 370945
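The Alive proof covers `i32`; as a quick independent sanity check, the same identity can be brute-forced at 8 bits. This sketch is illustrative and not part of the patch: `xorMinusOrIsNegAnd` is a hypothetical name, and checking 8-bit values is an assumption made here to keep the loop small (it is not a proof for wider types).

```cpp
#include <cassert>
#include <cstdint>

// Checks (x ^ y) - (x | y) == 0 - (x & y) for every pair of 8-bit values,
// with all arithmetic wrapping modulo 256.
bool xorMinusOrIsNegAnd() {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned y = 0; y < 256; ++y) {
      const uint8_t Lhs = static_cast<uint8_t>((x ^ y) - (x | y));
      const uint8_t Rhs = static_cast<uint8_t>(0u - (x & y));
      if (Lhs != Rhs)
        return false;
    }
  }
  return true;
}
```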
*	[InstCombine] Fold sub (and A, B) (or A, B)) to neg (xor A, B)	David Bolvansky	2019-09-04	1	-0/+9
	Summary:
	```
	Name: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
	  %or = or i32 %y, %x
	  %and = and i32 %x, %y
	  %sub = sub i32 %and, %or
	=>
	  %sub1 = xor i32 %x, %y
	  %sub = sub i32 0, %sub1

	Optimization: sub(and(x, y), or(x, y)) -> neg(xor(x, y))
	Done: 1
	Optimization is correct!
	```
	https://rise4fun.com/Alive/VI6
	Found by @lebedev.ri. Also author of the proof.
	Reviewers: lebedev.ri, spatel
	Reviewed By: lebedev.ri
	Subscribers: llvm-commits, lebedev.ri
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D67155
	llvm-svn: 370934
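One way to see why this fold is valid (a derivation sketch added for illustration, not taken from the patch or the Alive proof): for n-bit values, the bits set in `A xor B` and in `A and B` are disjoint and together make up `A or B`, so adding them produces no carries. All arithmetic below is modulo 2^n.

```latex
\begin{align*}
(A \mathbin{|} B) &= (A \oplus B) + (A \mathbin{\&} B) \pmod{2^n} \\
(A \mathbin{\&} B) - (A \mathbin{|} B)
  &= (A \mathbin{\&} B) - (A \oplus B) - (A \mathbin{\&} B) \\
  &= -(A \oplus B) \pmod{2^n}
\end{align*}
```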
*	[Instruction] Add hasMetadata(Kind) helper [NFC]	Philip Reames	2019-09-04	5	-7/+7
	It's a common idiom, so let's add the obvious wrapper for metadata kinds which are basically booleans. llvm-svn: 370933
*	[Attributor] Look at internal functions only on-demand	Johannes Doerfert	2019-09-04	1	-55/+92
	Summary: Instead of building attributes for internal functions which we do not update as long as we assume they are dead, we now do not create attributes until we assume the internal function to be live. This improves the number of required iterations, as well as the number of required updates, in real code. On our tests, the results are mixed. Reviewers: sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66914 llvm-svn: 370924
*	[Attributor] Use the white list for attributes consistently	Johannes Doerfert	2019-09-04	1	-60/+32
	Summary: We create attributes on-demand so we need to check the white list on-demand. This also unifies the location at which we create, initialize, and eventually invalidate new abstract attributes. The tests show mixed results, a few more call site attributes are determined which can cause more iterations. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66913 llvm-svn: 370922
*	[Attributor] Deal more explicit with non-exact definitions	Johannes Doerfert	2019-09-04	1	-74/+31
	Summary: Before, we tried to rule out non-exact definitions early, but that led to on-demand attributes being created for them anyway. As a consequence we needed to look at the definition in the initialize of each attribute again. This patch centralizes this lookup and tightens the condition under which we give up on non-exact definitions. Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67115 llvm-svn: 370917
*	[InstSimplify] guard against unreachable code (PR43218)	Sanjay Patel	2019-09-04	1	-1/+6
	This would crash: https://bugs.llvm.org/show_bug.cgi?id=43218 llvm-svn: 370911
*	[Debuginfo][SROA] Need to handle dbg.value in SROA pass.	Alexey Lapshin	2019-09-04	1	-2/+7
	SROA pass processes debug info incorrectly if applied twice. Specifically, after SROA works the first time, instcombine converts dbg.declare intrinsics into dbg.value. Inlining creates new opportunities for SROA, so it is called again. This time it does not correctly handle the previously inserted dbg.value intrinsics. Differential Revision: https://reviews.llvm.org/D64595 llvm-svn: 370906
*	[InstCombine] Fold sub (or A, B) (and A, B) to (xor A, B)	David Bolvansky	2019-09-04	1	-0/+8
	Summary:
	```
	Name: sub or and to xor
	  %or = or i32 %y, %x
	  %and = and i32 %x, %y
	  %sub = sub i32 %or, %and
	=>
	  %sub = xor i32 %x, %y

	Optimization: sub or and to xor
	Done: 1
	Optimization is correct!
	```
	https://rise4fun.com/Alive/eJu
	Reviewers: spatel, lebedev.ri
	Reviewed By: lebedev.ri
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D67153
	llvm-svn: 370883
*	[GVN] Remove a todo introduced w/rL370791	Philip Reames	2019-09-03	1	-3/+0
	When I dug into this, it turns out to be much more involved than I'd realized and doesn't actually simplify anything. The general purpose of the leader table is that we want to find the most-dominating definition quickly. The problem for equivalence folding is slightly different; we want to find the most dominating value whose definition block dominates our use quickly. To make this change, we'd end up having to restructure the leader table (either the sorting thereof, or maybe even introducing multiple leader tables per value) and that complexity is just not worth it. llvm-svn: 370824
*	[MemorySSA] Disable MemorySSA use.	Alina Sbirlea	2019-09-03	1	-4/+0
	Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370821
*	[Attributor] Use the delete API for liveness	Johannes Doerfert	2019-09-03	1	-5/+16
	Reviewers: uenoku, sstefan1 Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66833 llvm-svn: 370818
*	[Attributor] Deduce "no-capture" argument attribute	Johannes Doerfert	2019-09-03	1	-10/+355
	Add the no-capture argument attribute deduction to the Attributor fixpoint framework. The new string attribute "no-capture-maybe-returned" is introduced to allow deduction of no-capture through functions that "capture" an argument but only by "returning" it. It is only used by the Attributor for testing. Differential Revision: https://reviews.llvm.org/D59922 llvm-svn: 370817
*	[MemorySSA] Re-enable MemorySSA use.	Alina Sbirlea	2019-09-03	1	-0/+4
	Differential Revision: https://reviews.llvm.org/D58311 llvm-svn: 370811
*	[GVN] Propagate simple equalities from assumes within the tail of the block	Philip Reames	2019-09-03	1	-19/+74
	This extends the existing logic for propagating constant expressions in an analogous manner for what we do across basic blocks. The core point is that we chose some order of operands, and canonicalize uses towards that one. The heuristic used is inspired by the one used across blocks; in a follow up change, I'd plan to common them so that the cross block version uses the slightly stronger ordering herein. As noted by the TODOs in the code, there's a good amount of room for improving the existing code and making it more powerful. Some follow up work planned. Differential Revision: https://reviews.llvm.org/D66977 llvm-svn: 370791
*	Revert r370454 "[LoopIdiomRecognize] BCmp loop idiom recognition"	Roman Lebedev	2019-09-03	1	-869/+8
	https://bugs.llvm.org/show_bug.cgi?id=43206 was filed, claiming that there is a miscompilation. Reverting until I investigate. This reverts commit r370454. llvm-svn: 370788
*	[LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit	Bjorn Pettersson	2019-09-03	1	-0/+1
	Summary: Fold-tail currently supports reduction last-vector-value live-out's, but has yet to support last-scalar-value live-outs, including non-header phi's. As it relies on AllowedExit in order to detect them and bail out we need to add the non-header PHI nodes to AllowedExit, otherwise we end up with miscompiles. Solves https://bugs.llvm.org/show_bug.cgi?id=43166 Reviewers: fhahn, Ayal Reviewed By: fhahn, Ayal Subscribers: anna, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67074 llvm-svn: 370721
*	[LV] Tail-folding, runtime scev checks	Sjoerd Meijer	2019-09-03	1	-2/+2
	Now that we allow tail-folding not only when we optimise for size, make sure we do not run into this assert. Differential revision: https://reviews.llvm.org/D66932 llvm-svn: 370711
*	[LV] Tail-folding with runtime memory checks	Sjoerd Meijer	2019-09-03	1	-1/+4
	The loop vectorizer was running into an assert when it tried to fold the tail and had to emit runtime memory disambiguation checks. Differential revision: https://reviews.llvm.org/D66803 llvm-svn: 370707
*	[InstCombine] recognize bswap disguised as shufflevector	Sanjay Patel	2019-09-02	1	-0/+16
	bitcast <N x i8> (shuf X, undef, <N, N-1,...0>) to i{N8} --> bswap (bitcast X to i{N8})
	In PR43146: https://bugs.llvm.org/show_bug.cgi?id=43146 ...we have a more complicated case where SLP is making a mess of bswap. This patch won't do anything for that currently, but we need to improve bswap recognition in instcombine, SLP, and/or a standalone pass to avoid that problem.
	This is limited using the data-layout so we don't try to do this transform with actual vector types. The backend does not appear to have folds to convert in either direction, so we don't want to mess up something that is actually better lowered as a shuffle.
	On x86, we're trading something like this:
	  vmovd %edi, %xmm0
	  vpshufb LCPI0_0(%rip), %xmm0, %xmm0 ## xmm0 = xmm0[3,2,1,0,u,u,u,u,u,u,u,u,u,u,u,u]
	  vmovd %xmm0, %eax
	For:
	  movl %edi, %eax
	  bswapl %eax
	Differential Revision: https://reviews.llvm.org/D66965
	llvm-svn: 370659
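The equivalence behind this fold can be modelled in plain C++ for the 4-byte case. The sketch below is illustrative and not code from the patch: `reverseLanesThenBitcast` and `byteSwap32` are hypothetical names, `std::memcpy` plays the role of `bitcast`, and `std::reverse` plays the role of the reversing shuffle mask.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Models: bitcast <4 x i8> (shuffle X, undef, <3, 2, 1, 0>) to i32,
// where X is the <4 x i8> view of the input integer.
uint32_t reverseLanesThenBitcast(uint32_t Value) {
  std::array<uint8_t, 4> Lanes;
  std::memcpy(Lanes.data(), &Value, sizeof Value);   // i32 -> <4 x i8>
  std::reverse(Lanes.begin(), Lanes.end());          // mask <3, 2, 1, 0>
  uint32_t Result;
  std::memcpy(&Result, Lanes.data(), sizeof Result); // <4 x i8> -> i32
  return Result;
}

// Portable byte swap written with shifts, standing in for bswap on i32.
uint32_t byteSwap32(uint32_t Value) {
  return (Value >> 24) | ((Value >> 8) & 0x0000FF00u) |
         ((Value << 8) & 0x00FF0000u) | (Value << 24);
}
```

Reversing the bytes in memory and reinterpreting them gives the same integer as a byte swap regardless of host endianness, which is why the two forms are interchangeable.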
*	[InstCombine] mempcpy(d,s,n) to memcpy(d,s,n) + n	David Bolvansky	2019-08-31	1	-0/+11
	Summary: Back-end currently expands mempcpy, but middle-end should work with memcpy instead of mempcpy to enable more memcpy-optimization. GCC backend emits mempcpy, so LLVM backend could form it too, if we know mempcpy libcall is better than memcpy + n. https://godbolt.org/z/dOCG96 Reviewers: efriedma, spatel, craig.topper, RKSimon, jdoerfert Reviewed By: efriedma Subscribers: hjl.tools, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65737 llvm-svn: 370593
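At the source level, the rewritten form looks like the helper below. This is a minimal sketch for illustration, not the InstCombine code: `mempcpyViaMemcpy` is a hypothetical name, and the GNU `mempcpy` extension itself is deliberately not called so the example stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Equivalent of mempcpy(Dst, Src, N) expressed as memcpy(Dst, Src, N) + N:
// copy N bytes, then return a pointer just past the last byte written.
void *mempcpyViaMemcpy(void *Dst, const void *Src, std::size_t N) {
  return static_cast<char *>(std::memcpy(Dst, Src, N)) + N;
}
```

Because `memcpy` returns its destination pointer, the only extra work is a pointer addition, which leaves the copy itself visible to existing memcpy optimizations.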
*	Fix cppcheck shadow variable and variable scope warnings. NFCI.	Simon Pilgrim	2019-08-31	1	-6/+5
	llvm-svn: 370580
*	[CVP] Generate simpler code for elided with.overflow intrinsics	Nikita Popov	2019-08-31	1	-2/+6
	Use a { iN undef, i1 false } struct as the base, and only insert the first operand, instead of using { iN undef, i1 undef } as the base and inserting both. This is the same as what we do in InstCombine. Differential Revision: https://reviews.llvm.org/D67034 llvm-svn: 370573
*	[SampleFDO] Add profile symbol list section to discriminate function being cold versus function being newly added.	Wei Mi	2019-08-31	1	-3/+12
	This is the second half of https://reviews.llvm.org/D66374. Profile symbol list is the collection of function symbols showing up in the binary which generates the current profile. It is used to discriminate function being cold versus function being newly added. Profile symbol list is only added for profile with ExtBinary format. During profile use compilation, when profile-sample-accurate is enabled, a function without profile will be regarded as cold only when it is contained in that list. Differential Revision: https://reviews.llvm.org/D66766 llvm-svn: 370563
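The decision rule described above can be sketched as a small classifier. This is a simplified model under stated assumptions, not the SampleFDO implementation: the `Hotness` enum, the `classify` function, and the use of string sets for the profile and the symbol list are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

enum class Hotness { HasProfile, Cold, Unknown };

// A function without a profile counts as cold only when accurate-profile mode
// is on AND the function was present in the binary that produced the profile.
// Otherwise it may simply be newly added code, so nothing is assumed.
Hotness classify(const std::string &Name,
                 const std::unordered_set<std::string> &Profiled,
                 const std::unordered_set<std::string> &SymbolList,
                 bool ProfileSampleAccurate) {
  if (Profiled.count(Name))
    return Hotness::HasProfile;
  if (ProfileSampleAccurate && SymbolList.count(Name))
    return Hotness::Cold;
  return Hotness::Unknown;
}
```

The symbol list is what separates "seen but never sampled" from "did not exist when the profile was collected".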
*	[GVN] Verify value equality before doing phi translation for call instruction	Wei Mi	2019-08-30	1	-1/+39
	This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605.
	Basically, current phi translation translates an old value number to a new value number for a call instruction based on the literal equality of the call expression, without verifying there is no clobber in between. This is incorrect.
	To get a fine-grained check, use MemoryDependence analysis to do the job. However, this is still not ideal. Given a call instruction, `MemoryDependenceResults::getCallDependencyFrom` returns identical call instructions without a clobber in between, using a MemDepResult whose DepType is `Def`. However, identical is too strict here, and we want it to be relaxed a little to consider phi-translation: the callee is the same, but param operands can be different. That means changing the semantic of `MemDepResult::Def` and I don't know the potential impact.
	So currently the patch is still conservative and only handles MemDepResult::NonFuncLocal, which means the current call has no function local clobber. If there is a clobber, even if the clobber doesn't stand in between the current call and the call with the new value, we won't do phi-translate.
	Differential Revision: https://reviews.llvm.org/D67013
	llvm-svn: 370547