bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	MemorySSA: Move updater to its own file	Daniel Berlin	2017-01-28	3	-339/+373
\| \| \| \|	llvm-svn: 293357
*	Introduce a basic MemorySSA updater, that supports insertDef,	Daniel Berlin	2017-01-28	1	-28/+368
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	insertUse, moveBefore and moveAfter operations. Summary: This creates a basic MemorySSA updater that handles arbitrary insertion of uses and defs into MemorySSA, as well as arbitrary movement around the CFG. It replaces the current splice API. It can be made to handle arbitrary control flow changes. Currently, it uses the same updater algorithm from D28934. The main difference is because MemorySSA is single variable, we have the complete def and use list, and don't need anyone to give it to us as part of the API. We also have to rename stores below us in some cases. If we go that direction in that patch, i will merge all the updater implementations (using an updater_traits or something to provide the get* functions we use, called read/write in that patch). Sadly, the current SSAUpdater algorithm is way too slow to use for what we are doing here. I have updated the tests we have to basically build memoryssa incrementally using the updater api, and make sure it still comes out the same. Reviewers: george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29047 llvm-svn: 293356
*	[RegisterCoalescing] Recommit the patch "Remove partial redundent copy".	Quentin Colombet	2017-01-28	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In r292621, the recommit fixes a bug related with live interval update after the partial redundent copy is moved. This recommit solves an additional bug related to the lack of update of subranges. The original patch is to solve the performance problem described in PR27827. Register coalescing sometimes cannot remove a copy because of interference. But if we can find a reverse copy in one of the predecessor block of the copy, the copy is partially redundent and we may remove the copy partially by moving it to the predecessor block without the reverse copy. Differential Revision: https://reviews.llvm.org/D28585 Re-apply r292621 Revert "Revert rL292621. Caused some internal build bot failures in apple." This reverts commit r292984. Original patch: Wei Mi <wmi@google.com> Subrange fix: Mostly Matthias Braun <matze@braunis.de> llvm-svn: 293353
*	[InstCombine] move icmp transforms that might be recognized as min/max and ↵	Sanjay Patel	2017-01-27	1	-10/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	inf-loop (PR31751) This is a minimal patch to avoid the infinite loop in: https://llvm.org/bugs/show_bug.cgi?id=31751 But the general problem is bigger: we're not canonicalizing all of the min/max forms reported by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own inline definitions which may be subtly different from any of the above. The reason that the test cases in this patch need a cast op to trigger is because we don't (yet) canonicalize all min/max forms based on matchSelectPattern() in canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on matchSelectPattern() in visitSelectInst(). The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so I'm moving those behind the min/max fence in visitICmpInst() as the quick fix. llvm-svn: 293345
*	Global DCE performance improvement	Mehdi Amini	2017-01-27	1	-60/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change the original algorithm so that it scales better when meeting very large bitcode where every instruction does not implies a global. The target query is "how to you get all the globals referenced by another global"? Before this patch, it was doing this by walking the body (or the initializer) and collecting the references. What this patch is doing, it precomputing the answer to this query for the whole module by walking the use-list of every global instead. Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu> Differential Revision: https://reviews.llvm.org/D28549 llvm-svn: 293328
*	[PGO] add debug option to view raw count after prof use annotation	Xinliang David Li	2017-01-27	1	-1/+59
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D29045 llvm-svn: 293325
*	NFC: Add debug tracing for more cases where loop unrolling fails.	Anna Thomas	2017-01-27	1	-2/+8
\| \| \| \|	llvm-svn: 293313
*	[SLP] Refactoring of horizontal reduction analysis, NFC.	Alexey Bataev	2017-01-27	1	-24/+25
\| \| \| \| \| \| \| \| \| \| \|	Some checks in SLP horizontal reduction analysis function are performed several times, though it is enough to perform these checks only once during an initial attempt at adding candidate for the reduction instruction/reduced value. Differential Revision: https://reviews.llvm.org/D29175 llvm-svn: 293274
*	[LICM] When we are recomputing the alias sets for a subloop, we cannot	Chandler Carruth	2017-01-27	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	skip sub-subloops. The logic to skip subloops dated from when this code was shared with the cached case. Once it was factored out to only run in the case of recomputed subloops it became a dangerous bug. If a subsubloop contained an interfering instruction it would be silently skipped from the alias sets for LICM. With the old pass manager this was extremely hard to trigger as it would require failing to visit these subloops with the LICM pass but then visiting the outer loop somehow. I've not yet contrived any test case that actually manages to trigger this. But with the new pass manager we don't do the cross-loop caching hack that the old PM does and so we recompute alias set information from first principles. While this seems much cleaner and simpler it exposed this bug and would subtly miscompile code due to failing to correctly model the aliasing constraints of deeply nested loops. llvm-svn: 293273
*	Fix unused variable warning.	Richard Trieu	2017-01-27	1	-0/+1
\| \| \| \|	llvm-svn: 293260
*	NewGVN: Add basic dead and redundant store elimination	Daniel Berlin	2017-01-27	1	-29/+114
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This adds basic dead and redundant store elimination to NewGVN. Unlike our current DSE, it will happily do cross-block DSE if it meets our requirements. We get a bunch of DSE's simple.ll cases, and some stuff it doesn't. Unlike DSE, however, we only try to eliminate stores of the same value to the same memory location, not just general stores to the same memory location. Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29149 llvm-svn: 293258
*	[NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC.	Justin Lebar	2017-01-27	1	-0/+1
\| \| \| \|	llvm-svn: 293253
*	[NVPTX] Fix use-after-stack-free bug in InstCombineCalls.	Justin Lebar	2017-01-27	1	-1/+1
\| \| \| \| \| \|	Introduced in r293244. llvm-svn: 293251
*	Constant fold switch inst when looking for trivial conditions to unswitch on.	Xin Tong	2017-01-27	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Constant fold switch inst when looking for trivial conditions to unswitch on. Reviewers: sanjoy, chenli, hfinkel, efriedma Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D29037 llvm-svn: 293250
*	[PM] Port LoopLoadElimination to the new pass manager and wire it into	Chandler Carruth	2017-01-27	1	-27/+60
\| \| \| \| \| \| \| \| \| \| \|	the main pipeline. This is a very straight forward port. Nothing weird or surprising. This brings the number of missing passes from the new PM's pipeline down to three. llvm-svn: 293249
*	[NVPTX] Upgrade NVVM intrinsics in InstCombineCalls.	Justin Lebar	2017-01-27	1	-0/+250
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There are many NVVM intrinsics that we can't entirely get rid of, but that nonetheless often correspond to target-generic LLVM intrinsics. For example, if flush denormals to zero (ftz) is enabled, we can convert @llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a non-ftz PTX instruction. In this case, we can, however, simplify the non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32. These transformations are particularly useful because they let us constant fold instructions that appear in libdevice, the bitcode library that ships with CUDA and essentially functions as its libm. Reviewers: tra Subscribers: hfinkel, majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D28794 llvm-svn: 293244
*	[LangRef] Make @llvm.sqrt(x) return undef, rather than have UB, for negative x.	Justin Lebar	2017-01-27	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Some frontends emit a speculate-and-select idiom for sqrt, wherein they compute sqrt(x), check if x is negative, and select NaN if it is: %cmp = fcmp olt double %a, -0.000000e+00 %sqrt = call double @llvm.sqrt.f64(double %a) %ret = select i1 %cmp, double 0x7FF8000000000000, double %sqrt This is technically UB as the LangRef is written today if %a is ever less than -0. But emitting code that's compliant with the current definition of sqrt would require a branch, which would then prevent us from matching this idiom in SelectionDAG (which we do today -- ISD::FSQRT has defined behavior on negative inputs), because SelectionDAG looks at one BB at a time. Nothing in LLVM takes advantage of this undefined behavior, as far as we can tell, and the fact that llvm.sqrt has UB dates from its initial addition to the LangRef. Reviewers: arsenm, mehdi_amini, hfinkel Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28797 llvm-svn: 293242
*	Revert a couple of InstCombine/Guard checkins	Sanjoy Das	2017-01-26	1	-29/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change reverts: r293061: "[InstCombine] Canonicalize guards for NOT OR condition" r293058: "[InstCombine] Canonicalize guards for AND condition" They miscompile cases like: ``` declare void @llvm.experimental.guard(i1, ...) define void @test_guard_not_or(i1 %A, i1 %B) { %C = or i1 %A, %B %D = xor i1 %C, true call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ] ret void } ``` because they do transfer the `i32 20, i32 30` parameters to newly created guard instructions. llvm-svn: 293227
*	NewGVN: Fix bug exposed by PR31761	Daniel Berlin	2017-01-26	1	-43/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This does not actually fix the testcase in PR31761 (discussion is ongoing on the testcase), but does fix a bug it exposes, where stores were not properly clobbering loads. We accomplish this by unifying the memory equivalence infratructure back into the normal congruence infrastructure, and then properly destroying congruence classes when memory state leaders disappear. Reviewers: davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29195 llvm-svn: 293216
*	[InstCombine] fold (X >>u C) << C --> X & (-1 << C)	Sanjay Patel	2017-01-26	1	-18/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We already have this fold when the lshr has one use, but it doesn't need that restriction. We may be able to remove some code from foldShiftedShift(). Also, move the similar: (X << C) >>u C --> X & (-1 >>u C) ...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst(). That whole function seems questionable since it is called by commonShiftTransforms(), but there's really not much in common if we're checking the shift opcodes for every fold. llvm-svn: 293215
*	NewGVN: Add algorithm overview	Daniel Berlin	2017-01-26	1	-0/+21
\| \| \| \|	llvm-svn: 293212
*	[InstCombine] use m_APInt to allow (X << C) >>u C --> X & (-1 >>u C) with ↵	Sanjay Patel	2017-01-26	1	-16/+24
\| \| \| \| \| \|	splat vectors llvm-svn: 293208
*	NewGVN: Make unreachable blocks be marked with unreachable	Daniel Berlin	2017-01-26	1	-18/+13
\| \| \| \|	llvm-svn: 293196
*	[LV] Fix an issue where forming LCSSA in the place that we did would	Chandler Carruth	2017-01-26	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	change the set of uniform instructions in the loop causing an assert failure. The problem is that the legalization checking also builds data structures mapping various facts about the loop body. The immediate cause was the set of uniform instructions. If these then change when LCSSA is formed, the data structures would already have been built and become stale. The included test case triggered an assert in loop vectorize that was reduced out of the new PM's pipeline. The solution is to form LCSSA early enough that no information is cached across the changes made. The only really obvious position is outside of the main logic to vectorize the loop. This also has the advantage of removing one case where forming LCSSA could mutate the loop but we wouldn't track that as a "Changed" state. If it is significantly advantageous to do some legalization checking prior to this, we can do a more careful positioning but it seemed best to just back off to a safe position first. llvm-svn: 293168
*	[TargetTransformInfo] Refactor and improve getScalarizationOverhead()	Jonas Paulsson	2017-01-26	1	-33/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155
*	[X86] Add demanded elts support for the inputs to pclmul intrinsic	Craig Topper	2017-01-26	1	-0/+38
\| \| \| \| \| \| \| \|	This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors. Differential Revision: https://reviews.llvm.org/D28979 llvm-svn: 293151
*	Revert test commit	Taewook Oh	2017-01-26	1	-1/+0
\| \| \| \|	llvm-svn: 293150
*	test commit	Taewook Oh	2017-01-26	1	-3/+4
\| \| \| \|	llvm-svn: 293148
*	[PM] Simplify the new PM interface to the loop unroller and expose two	Chandler Carruth	2017-01-26	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	factory functions for the two modes the loop unroller is actually used in in-tree: simplified full-unrolling and the entire thing including partial unrolling. I've also wired these up to nice names so you can express both of these being in a pipeline easily. This is a precursor to actually enabling these parts of the O2 pipeline. Differential Revision: https://reviews.llvm.org/D28897 llvm-svn: 293136
*	[LoopUnroll] Properly update loopinfo for runtime unrolling by 2	Michael Kuperstein	2017-01-26	3	-10/+19
\| \| \| \| \| \| \| \| \| \| \|	Even when we don't create a remainder loop (that is, when we unroll by 2), we may duplicate nested loops into the remainder. This is complicated by the fact the remainder may itself be either inserted into an outer loop, or at the top level. In the latter case, we may need to create new top-level loops. Differential Revision: https://reviews.llvm.org/D29156 llvm-svn: 293124
*	[NewGVN] Skip uses in unreachable blocks.	Davide Italiano	2017-01-26	1	-0/+6
\| \| \| \| \| \| \| \|	Otherwise we ask for a domtree node that's not there, and we crash. Differential Revision: https://reviews.llvm.org/D29145 llvm-svn: 293122
*	LowerTypeTests: Ignore external globals with type metadata.	Peter Collingbourne	2017-01-26	1	-3/+6
\| \| \| \| \| \|	Thanks to Davide Italiano for finding the problem and providing a test case. llvm-svn: 293119
*	[NewGVN] Simplify folding a lambda used only once. NFCI.	Davide Italiano	2017-01-25	1	-5/+3
\| \| \| \|	llvm-svn: 293112
*	MemorySSA: Link all defs together into an intrusive defslist, to make ↵	Daniel Berlin	2017-01-25	1	-36/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	updater easier Summary: This is the first in a series of patches to add a simple, generalized updater to MemorySSA. For MemorySSA, every def is may-def, instead of the normal must-def. (the best way to think of memoryssa is "everything is really one variable, with different versions of that variable at different points in the program). This means when updating, we end up having to do a bunch of work to touch defs below and above us. In order to support this quickly, i have ilist'd all the defs for each block. ilist supports tags, so this is quite easy. the only slightly messy part is that you can't have two iplists for the same type that differ only whether they have the ownership part enabled or not, because the traits are for the value type. The verifiers have been updated to test that the def order is correct. Reviewers: george.burgess.iv Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29046 llvm-svn: 293085
*	Add loop pass insertion point EP_LateLoopOptimizations	Krzysztof Parzyszek	2017-01-25	1	-0/+2
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D28694 llvm-svn: 293067
*	[Guards] Introduce loop-predication pass	Artur Pilipenko	2017-01-25	3	-0/+282
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch introduces guard based loop predication optimization. The new LoopPredication pass tries to convert loop variant range checks to loop invariant by widening checks across loop iterations. For example, it will convert for (i = 0; i < n; i++) { guard(i < len); ... } to for (i = 0; i < n; i++) { guard(n - 1 < len); ... } After this transformation the condition of the guard is loop invariant, so loop-unswitch can later unswitch the loop by this condition which basically predicates the loop by the widened condition: if (n - 1 < len) for (i = 0; i < n; i++) { ... } else deoptimize This patch relies on an NFC change to make ScalarEvolution::isMonotonicPredicate public (revision 293062). Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D29034 llvm-svn: 293064
*	[InstCombine] Canonicalize guards for NOT OR condition	Artur Pilipenko	2017-01-25	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \|	This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29075 Patch by Maxim Kazantsev. llvm-svn: 293061
*	[InstCombine][SSE] Add support for PACKSS/PACKUS constant folding	Simon Pilgrim	2017-01-25	1	-0/+94
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D28949 llvm-svn: 293060
*	[InstCombine] Canonicalize guards for AND condition	Artur Pilipenko	2017-01-25	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \|	This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29074 Patch by Maxim Kazantsev. llvm-svn: 293058
*	[InstCombine] Allow InstrCombine to remove one of adjacent guards if they ↵	Artur Pilipenko	2017-01-25	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	are equivalent This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: majnemer, apilipenko Differential Revision: https://reviews.llvm.org/D29071 Patch by Maxim Kazantsev. llvm-svn: 293056
*	[SLP] Improve horizontal vectorization for non-power-of-2 number of	Alexey Bataev	2017-01-25	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	instructions. If number of instructions in horizontal reduction list is not power of 2 then only PowerOf2Floor(NumberOfInstructions) last elements are actually vectorized, other instructions remain scalar. Patch tries to vectorize the remaining elements either. Differential Revision: https://reviews.llvm.org/D28959 llvm-svn: 293042
*	[SimplifyCFG] Do not sink and merge inline-asm instructions.	Akira Hatanaka	2017-01-25	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conservatively disable sinking and merging inline-asm instructions as doing so can potentially create arguments that cannot satisfy the inline-asm constraints. For example, SimplifyCFG used to do the following transformation: (before) if.then: %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8) br label %if.end if.else: %1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6) br label %if.end (after) %.sink = select i1 %tobool, i32 6, i32 8 %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink) This would result in a crash in the backend since only immediate integer operands are permitted for constraint "n". rdar://problem/30110806 Differential Revision: https://reviews.llvm.org/D29111 llvm-svn: 293025
*	[PM] Teach LoopUnroll to update the LPM infrastructure as it unrolls	Chandler Carruth	2017-01-25	2	-1/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	loops. We do this by reconstructing the newly added loops after the unroll completes to avoid threading pass manager details through all the mess of the unrolling infrastructure. I've enabled some extra assertions in the LPM to try and catch issues here and enabled a bunch of unroller tests to try and make sure this is sane. Currently, I'm manually running loop-simplify when needed. That should go away once it is folded into the LPM infrastructure. Differential Revision: https://reviews.llvm.org/D28848 llvm-svn: 293011
*	[coroutines] Spill the result of the invoke instruction correctly	Gor Nishanov	2017-01-25	1	-9/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When we decide that the result of the invoke instruction need to be spilled, we need to insert the spill into a block that is on the normal edge coming out of the invoke instruction. (Prior to this change the code would insert the spill immediately after the invoke instruction, which breaks the IR, since invoke is a terminator instruction). In the following example, we will split the edge going into %cont and insert the spill there. ``` %r = invoke double @print(double 0.0) to label %cont unwind label %pad cont: %0 = call i8 @llvm.coro.suspend(token none, i1 false) switch i8 %0, label %suspend [i8 0, label %resume i8 1, label %cleanup] resume: call double @print(double %r) ``` Reviewers: majnemer Reviewed By: majnemer Subscribers: mehdi_amini, llvm-commits, EricWF Differential Revision: https://reviews.llvm.org/D29102 llvm-svn: 293006
*	Explicitly promote indirect calls before sample profile annotation.	Dehao Chen	2017-01-24	1	-5/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In iterative sample pgo where profile is collected from PGOed binary, we may see indirect call targets promoted and inlined in the profile. Before profile annotation, we need to make this happen in order to annotate correctly on IR. This patch explicitly promotes these indirect calls and inlines them before profile annotation. Reviewers: xur, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29040 llvm-svn: 292979
*	Remove the load hoisting code of MLSM, it is completely subsumed by GVNHoist	Daniel Berlin	2017-01-24	1	-174/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: GVNHoist performs all the optimizations that MLSM does to loads, in a more general way, and in a faster time bound (MLSM is N^3 in most cases, N^4 in a few edge cases). This disables the load portion. Note that the way ld_hoist_st_sink.ll is written makes one think that the loads should be moved to the while.preheader block, but 1. Neither MLSM nor GVNHoist do it (they both move them to identical places). 2. MLSM couldn't possibly do it anyway, as the while.preheader block is not the head of the diamond, while.body is. (GVNHoist could do it if it was legal). 3. At a glance, it's not legal anyway because the in-loop load conflict with the in-loop store, so the loads must stay in-loop. I am happy to update the test to use update_test_checks so that checking is tighter, just was going to do it as a followup. Note that i can find no particular benefit to the store portion on any real testcase/benchmark i have (even size-wise). If we really still want it, i am happy to commit to writing a targeted store sinker, just taking the code from the MemorySSA port of MergedLoadStoreMotion (which is N^2 worst case, and N most of the time). We can do what it does in a much better time bound. We also should be both hoisting and sinking stores, not just sinking them, anyway, since whether we should hoist or sink to merge depends basically on luck of the draw of where the blockers are placed. Nonetheless, i have left it alone for now. Reviewers: chandlerc, davide Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29079 llvm-svn: 292971
*	Use InstCombine's builder in foldSelectCttzCtlz instead of creating a new one.	Amaury Sechet	2017-01-24	1	-3/+2
\| \| \| \| \| \| \| \| \| \|	Summary: As per title. This will add the instructiions we are interested in in the worklist. Reviewers: mehdi_amini, majnemer, andreadb Differential Revision: https://reviews.llvm.org/D29081 llvm-svn: 292957
*	Fix formating in foldSelectCttzCtlz. NFC	Amaury Sechet	2017-01-24	1	-1/+1
\| \| \| \|	llvm-svn: 292934
*	[PH] Replace uses of AssertingVH from members of analysis results with	Chandler Carruth	2017-01-24	6	-48/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a lazy-asserting PoisoningVH. AssertVH is fundamentally incompatible with cache-invalidation of analysis results. The invaliadtion happens after the AssertingVH has already fired. Instead, use a PoisoningVH that will assert if the dangling handle is ever used rather than merely be assigned or destroyed. This patch also removes all of the (numerous) doomed attempts to work around this fundamental incompatibility. It is a pretty significant simplification IMO. The most interesting change is in the Inliner where we still do some clearing because we don't want to rely on the coarse grained invalidation strategy of the containing pass manager. However, I prefer the approach that contains this logic to the cleanup phase of the Inliner, and I think we could enhance the CGSCC analysis management layer to make this even better in the future if desired. The rest is straight cleanup. I've also added a test for one of the harder cases to work around: when a module analysis contains many AssertingVHes pointing at functions. Differential Revision: https://reviews.llvm.org/D29006 llvm-svn: 292928
*	[InstCombine][X86] MULDQ/MULUDQ undef -> zero	Simon Pilgrim	2017-01-24	2	-7/+1
\| \| \| \| \| \| \| \|	Added early out for single undef input - we were already supporting (and testing) this in the constant folding code, we just do it quicker now Drop undef handling from demanded elts code now that we handle it fully in InstCombiner::visitCallInst llvm-svn: 292913