bcm5719-llvm - Project Ortega BCM5719 LLVM

Commit message	Author	Age	Files	Lines
*	[InstCombine] use m_APInt to allow icmp (and (sh X, Y), C2), C1 folds for splat constant vectors	Sanjay Patel	2016-09-07	2	-52/+22
	llvm-svn: 280873
*	[SimplifyCFG] Don't try to create metadata-valued PHIs	Hal Finkel	2016-09-07	1	-0/+4
	We can't create metadata-valued PHIs; don't try to do so when sinking. I created a test case for this using the @llvm.type.test intrinsic, because it takes a metadata parameter and does not have severe side effects (thus SimplifyCFG is willing to otherwise sink it). Previously, running the test case would crash with:
	    Invalid use of metadata!
	    %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
	    LLVM ERROR: Broken function found, compilation aborted!
	llvm-svn: 280866
*	[LoopUnroll] Correct a debug message. NFC.	Haicheng Wu	2016-09-07	1	-1/+1
	Differential Revision: https://reviews.llvm.org/D24299 llvm-svn: 280865
*	[InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors	Sanjay Patel	2016-09-07	1	-43/+33
	This is a revert of r280676, which was a revert of r280637; i.e., this is r280637 again. It was speculatively reverted to help debug buildbot failures. llvm-svn: 280861
*	Typo. NFC.	Chad Rosier	2016-09-07	1	-1/+1
	llvm-svn: 280834
*	[LoopInterchange] Improve debug output. NFC.	Chad Rosier	2016-09-07	1	-6/+6
	llvm-svn: 280820
*	[LoopInterchange] Improve debug output. NFC.	Chad Rosier	2016-09-07	1	-4/+6
	llvm-svn: 280819
*	[LSV] Use the original loads' names for the extractelement instructions.	Justin Lebar	2016-09-07	1	-2/+4
	Summary: LSV replaces multiple adjacent loads with one vectorized load and a bunch of extractelement instructions. This patch makes the extractelement instructions' names match those of the original loads, for (hopefully) improved readability. Reviewers: asbirlea, tstellarAMD Subscribers: arsenm, mzolotukhin Differential Revision: https://reviews.llvm.org/D23748 llvm-svn: 280818
*	[InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic.	Andrea Di Biagio	2016-09-07	1	-3/+3
	This fixes a similar issue to the one already fixed by r280804 (reviewed in D24256). Revision 280804 fixed the problem with unsafe dyn_casts in the extrq/extrqi combining logic. However, it turns out that the insertq/insertqi logic was affected by the same problem as well. llvm-svn: 280807
*	[InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls.	Andrea Di Biagio	2016-09-07	1	-3/+3
	This patch fixes an assertion failure caused by unsafe dynamic casts on the constant operands of sse4a intrinsic calls to extrq/extrqi. The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently checks if the input operands are constants. Internally, that logic relies on dyn_casts of values returned by calls to method Constant::getAggregateElement. However, method getAggregateElement may return nullptr if the constant element cannot be retrieved, so all the dyn_casts can potentially fail. This happens, for example, if a constexpr value is passed as input to an extrq/extrqi intrinsic call. This patch fixes the problem by using a dyn_cast_or_null (instead of a simple dyn_cast) on the result of each call to Constant::getAggregateElement. Added reproducible test cases to x86-sse4a.ll. Differential Revision: https://reviews.llvm.org/D24256 llvm-svn: 280804
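To illustrate the difference the fix relies on, here is a sketch of LLVM-style casting on a toy class hierarchy. `Constant`, `ConstantInt`, `getAggregateElement` and `getConstantElement` below are hypothetical stand-ins written for this example, not LLVM's real classes or signatures. The point is only that `dyn_cast` requires a non-null argument, while `dyn_cast_or_null` lets a null result from the element lookup fall through to the "not a known constant" path.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, stripped-down model of LLVM-style RTTI (not LLVM's Casting.h).
struct Constant {
  enum Kind { CK_Int, CK_Expr } kind;
  explicit Constant(Kind k) : kind(k) {}
  virtual ~Constant() = default;
};

struct ConstantInt : Constant {
  long value;
  explicit ConstantInt(long v) : Constant(CK_Int), value(v) {}
  static bool classof(const Constant *c) { return c->kind == CK_Int; }
};

// Like llvm::dyn_cast: the argument must be non-null.
template <typename To> To *dyn_cast(Constant *c) {
  assert(c && "dyn_cast on a null pointer");
  return To::classof(c) ? static_cast<To *>(c) : nullptr;
}

// Like llvm::dyn_cast_or_null: a null argument yields null instead of asserting.
template <typename To> To *dyn_cast_or_null(Constant *c) {
  return c ? dyn_cast<To>(c) : nullptr;
}

// Hypothetical stand-in for an element lookup that may return nullptr
// when the element cannot be retrieved.
Constant *getAggregateElement(const std::vector<Constant *> &elts, unsigned i) {
  return i < elts.size() ? elts[i] : nullptr;
}

// Returns true and sets `out` only if element `i` is a known integer constant.
bool getConstantElement(const std::vector<Constant *> &elts, unsigned i,
                        long &out) {
  if (auto *ci = dyn_cast_or_null<ConstantInt>(getAggregateElement(elts, i))) {
    out = ci->value;
    return true;
  }
  return false;
}
```

Using plain `dyn_cast` in `getConstantElement` would hit the assertion for the null entries, which mirrors the failure the commit describes.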
*	Revert "[EfficiencySanitizer] Adds shadow memory parameters for 40-bit virtual memory address."	Renato Golin	2016-09-07	1	-34/+9
	This reverts commit r280796, as it broke the AArch64 bots for no reason. The tests were passing and we should try to keep them passing, so a proper review should make that happen. llvm-svn: 280802
*	[EfficiencySanitizer] Adds shadow memory parameters for 40-bit virtual memory address.	Sagar Thakur	2016-09-07	1	-9/+34
	Adding 40-bit shadow memory parameters because MIPS64 uses 40-bit virtual memory addresses. Reviewed by bruening Differential: D23801 llvm-svn: 280796
*	[SimplifyCFG] Followup fix to r280790	James Molloy	2016-09-07	1	-1/+3
	In failure cases it's not guaranteed that the PHI we're inspecting is actually in the successor block! In this case we need to bail out early, and never query getIncomingValueForBlock() as that will cause an assert. llvm-svn: 280794
*	[SimplifyCFG] Update workaround for PR30188 to also include loads	James Molloy	2016-09-07	1	-2/+7
	I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores. Fixes PR30188 (again). llvm-svn: 280792
*	[SimplifyCFG] Check PHI uses more accurately	James Molloy	2016-09-07	1	-1/+3
	PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG. Fixes PR30292. llvm-svn: 280790
*	Fix typo in comment, NFC	Nick Lewycky	2016-09-07	1	-1/+1
	llvm-svn: 280774
*	Explicitly require DominatorTreeAnalysis pass for instsimplify pass.	Dehao Chen	2016-09-06	1	-5/+6
	Summary: DominatorTreeAnalysis is always required by instsimplify. Reviewers: danielcdh, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24173 llvm-svn: 280760
*	fix formatting; NFC	Sanjay Patel	2016-09-06	1	-19/+14
	llvm-svn: 280727
*	[JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info	Adam Nemet	2016-09-06	1	-1/+52
	Currently the pass updates branch weights in the IR if the function has any PGO info (entry frequency is set). However, we could still have regions of the CFG that do not have branch weights collected (e.g. a cold region). In this case we'd use static estimates. Since static estimates for branches are determined independently, they are inconsistent. Updating them can "randomly" inflate block frequencies.
	I've run into this in a completely cold loop of h264ref from SPEC. -Rpass-with-hotness showed the loop to be completely cold during inlining (before JT) but completely hot during vectorization (after JT).
	The new testcase demonstrates the problem. We check array elements against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting check. The block names should be self-explanatory.
	In this example, jump threading incorrectly updates the weight of the loop-exiting branch to 0, drastically inflating the frequency of the loop (in the range of billions). There is no run-time profile info for edges inside the loop, so branch probabilities are estimated. These are the resulting branch and block frequencies for the loop body (block frequency after each block name, edge frequency after each successor):
	    check_1 (16)  -> eq_1 (8), check_2 (8)
	    eq_1          -> check_2
	    check_2 (16)  -> eq_2 (8), check_3 (8)
	    eq_2          -> check_3
	    check_3 (16)  -> loop exit (1), back edge (15)
	First we thread eq_1 -> check_2 to check_3. Frequencies are updated to remove the frequency of eq_1 from check_2 and then from the false edge leaving check_2. Changed frequencies are highlighted with *:
	    check_1 (16)  -> eq_1 (8), check_2 (8)
	    eq_1          -> check_3
	    check_2 (8*)  -> eq_2 (8), check_3 (0*)
	    eq_2          -> check_3
	    check_3 (16)  -> loop exit (1), back edge (15)
	Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new back edges. Frequencies are updated to remove the frequency of eq_1 and eq_2 from check_3 and then the false edge leaving check_3 (changed frequencies are highlighted with *):
	    check_1 (16)  -> eq_1 (8), check_2 (8)
	    eq_1          -> check_1 (back edge)
	    check_2 (8)   -> eq_2 (8), check_3 (0)
	    eq_2          -> check_1 (back edge)
	    check_3 (0*)  -> loop exit (0), back edge (0*)
	As a result, the loop exit edge ends up with 0 frequency, which in turn makes the loop header have maximum frequency.
	There are a few potential problems here:
	1. The profile data seems odd. There is a single profile sample of the loop being entered. On the other hand, there are no weights inside the loop.
	2. Based on static estimation we shouldn't set edges to "extreme" values, i.e. extremely likely or unlikely.
	3. We shouldn't create profile metadata that is calculated from static estimation. I am not sure what the policy is, but it seems to make sense to treat profile metadata as something that is known to originate from profiling. Estimated probabilities should only be reflected in BPI/BFI.
	Any one of these would probably fix the immediate problem. I went for 3 because I think it's a good policy to have and added a FIXME about 2.
	Differential Revision: https://reviews.llvm.org/D24118 llvm-svn: 280713
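Why a 0-frequency exit edge makes the loop header maximally hot can be sketched with a simplified model. This is an assumption for illustration, not how BlockFrequencyInfo actually computes frequencies: treat each trip as leaving the loop with probability exit / (exit + back edge), so the header runs about 1 / P(exit) times per entry. With the static estimates of the first listing (exit 1, back edge 15) that is 16 header executions per entry; with an exit weight of 0 the estimate is unbounded, which the hypothetical `headerFrequency` helper saturates to the maximum value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simplified model (an assumption, not BFI's algorithm): the header runs
// entryFreq / P(exit) times, where P(exit) = exitWeight / (exitWeight + backEdgeWeight).
std::uint64_t headerFrequency(std::uint64_t entryFreq, std::uint64_t exitWeight,
                              std::uint64_t backEdgeWeight) {
  const std::uint64_t kMax = std::numeric_limits<std::uint64_t>::max();
  if (exitWeight == 0)
    return kMax;  // A never-taken exit makes the loop look infinitely hot.
  const std::uint64_t total = exitWeight + backEdgeWeight;
  if (entryFreq > kMax / total)
    return kMax;  // Saturate instead of overflowing.
  return entryFreq * total / exitWeight;
}
```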
*	[Coroutines] Part12: Handle alloca address-taken	Gor Nishanov	2016-09-05	1	-1/+46
	Summary: Move early uses of spilled variables after CoroBegin. For example, if a parameter had its address taken, we may end up with code like:
	    define @f(i32 %n) {
	      %n.addr = alloca i32
	      store %n, %n.addr
	      ...
	      call @coro.begin
	This patch fixes the problem by moving uses of spilled variables after CoroBegin. Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24234 llvm-svn: 280678
*	[InstCombine] don't assert that division-by-constant has been folded (PR30281)	Sanjay Patel	2016-09-05	1	-7/+6
	This is effectively a revert of https://reviews.llvm.org/rL280115, and it should fix https://llvm.org/bugs/show_bug.cgi?id=30281. llvm-svn: 280677
*	[InstCombine] revert r280637 because it causes test failures on an ARM bot	Sanjay Patel	2016-09-05	1	-33/+43
	http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll llvm-svn: 280676
*	[Coroutines] Part11: Add final suspend handling.	Gor Nishanov	2016-09-05	3	-17/+93
	Summary: A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:
	  - it is possible to check whether a suspended coroutine is at the final suspend point via the coro.done intrinsic;
	  - a resumption of a coroutine stopped at the final suspend point leads to undefined behavior. The only possible action for a coroutine at a final suspend point is destroying it via the coro.destroy intrinsic.
	This patch adds final suspend handling logic to the CoroEarly and CoroSplit passes. Now, the final suspend point example from docs/Coroutines.rst compiles and produces the expected result (see test/Transforms/Coroutines/ex5.ll). Reviewers: majnemer Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D24068 llvm-svn: 280646
*	[InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors	Sanjay Patel	2016-09-04	1	-43/+33
	The code to calculate 'UsesRemoved' could be simplified. As-is, that code is a victim of PR30273: https://llvm.org/bugs/show_bug.cgi?id=30273 llvm-svn: 280637
*	[InstCombine] recode icmp fold in a vector-friendly way; NFC	Sanjay Patel	2016-09-04	1	-22/+30
	The transform in question:
	    icmp (and (trunc W), C2), C1 -> icmp (and W, C2'), C1'
	...is still not enabled for vectors, thus no functional change intended. It's not clear to me if this is a good transform for vectors or even scalars in general. Changing that behavior may be a follow-on patch. llvm-svn: 280627
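A scalar model of that transform, restricted to equality compares and assuming C2' and C1' are the zero-extensions of C2 and C1 (the commit message does not spell out how they are computed). Masking the wide value with a zero-extended C2 keeps only bits that survive the truncation, so both forms compare the same low bits. The 32-bit and 8-bit widths and the `beforeFold`/`afterFold` names are arbitrary choices for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// icmp eq (and (trunc W to i8), C2), C1
bool beforeFold(std::uint32_t w, std::uint8_t c2, std::uint8_t c1) {
  std::uint8_t t = static_cast<std::uint8_t>(w);  // trunc W
  return static_cast<std::uint8_t>(t & c2) == c1;
}

// icmp eq (and W, C2'), C1' with C2' = zext C2 and C1' = zext C1 (assumed).
bool afterFold(std::uint32_t w, std::uint8_t c2, std::uint8_t c1) {
  std::uint32_t c2Wide = c2;  // only the low 8 bits can be set
  std::uint32_t c1Wide = c1;
  return (w & c2Wide) == c1Wide;
}
```

The fold removes the `trunc`, which is what makes it attractive; the commit itself leaves open whether it is profitable for vectors.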
*	[InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing memcpy with ld/st.	Dorit Nuzman	2016-09-04	1	-0/+6
	When InstCombine replaces a memcpy with loads+stores, it does not copy over the llvm.mem.parallel_loop_access metadata from the memcpy instruction. This patch fixes that. Differential Revision: https://reviews.llvm.org/D23499 llvm-svn: 280617
*	Test commit.	Dorit Nuzman	2016-09-04	1	-0/+1
	llvm-svn: 280615
*	Fix inliner funclet unwind memoization	Joseph Tremoulet	2016-09-04	1	-7/+79
	Summary: The inliner may need to determine where a given funclet unwinds to, and this determination may depend on other funclets throughout the funclet tree. The code that performs this walk in getUnwindDestToken memoizes results to avoid redundant computations. In the case that a funclet's unwind destination is derived from its ancestor, there's code to walk back down the tree from the ancestor updating the memo map of its descendants to record the unwind destination. This change fixes that code to account for the case that some descendant has a different unwind destination, which can happen if that unwind dest is a descendant of the EHPad being queried and thus didn't determine its unwind destination. Also update test inline-funclets.ll, which is supposed to cover such scenarios, to include a case that fails an assertion without this fix but passes with it. Fixes PR29151. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24117 llvm-svn: 280610
*	Cleanup : Use metadata preserving API for branch creation	Xinliang David Li	2016-09-03	1	-9/+4
	Use the wrapper API in IRBuilder that copies metadata to create the new branch in LoopUnswitch. llvm-svn: 280602
*	AMDGPU: Do basic folding of class intrinsic	Matt Arsenault	2016-09-03	1	-0/+79
	This allows more of the OCML builtin library to be constant folded. llvm-svn: 280586
*	ADT: Do not inherit from std::iterator in ilist_iterator	Duncan P. N. Exon Smith	2016-09-03	1	-1/+1
	Inheriting from std::iterator uses more boilerplate than manual typedefs. Avoid that in both ilist_iterator and MachineInstrBundleIterator. This has the side effect of removing ilist_iterator from certain ADL lookups in namespace std; calls to std::next that previously worked without qualification now need an explicit "std::". The one case of this in-tree was operating on a temporary, so I used the more compact operator++. llvm-svn: 280570
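A sketch of the pattern on a hypothetical linked-list iterator (`Node` and `NodeIterator` are illustrative, not the ilist classes): the five iterator typedefs are written out by hand instead of being inherited from std::iterator. Because the class then has no base class in namespace std, argument-dependent lookup no longer searches std, so a call such as next(it) has to be spelled std::next(it).

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical singly linked node, used only for this illustration.
struct Node {
  int value;
  Node *next;
};

// Forward iterator with hand-written typedefs (no std::iterator base).
class NodeIterator {
public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = int *;
  using reference = int &;

  explicit NodeIterator(Node *n = nullptr) : cur(n) {}

  reference operator*() const { return cur->value; }
  pointer operator->() const { return &cur->value; }
  NodeIterator &operator++() {
    cur = cur->next;
    return *this;
  }
  NodeIterator operator++(int) {
    NodeIterator tmp = *this;
    ++*this;
    return tmp;
  }
  friend bool operator==(NodeIterator a, NodeIterator b) { return a.cur == b.cur; }
  friend bool operator!=(NodeIterator a, NodeIterator b) { return a.cur != b.cur; }

private:
  Node *cur;
};
```

std::next still works because it only needs std::iterator_traits, which picks up the hand-written typedefs.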
*	[Profile] handle select instruction in 'expect' lowering	Xinliang David Li	2016-09-02	1	-11/+25
	Builtin expect lowering currently ignores select. This patch fixes the issue. Differential Revision: http://reviews.llvm.org/D24166 llvm-svn: 280547
*	[SLP] Don't pass a global CL option as an argument. NFC.	Chad Rosier	2016-09-02	1	-8/+7
	Differential Revision: https://reviews.llvm.org/D24199 llvm-svn: 280527
*	[InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126)	Sanjay Patel	2016-09-02	1	-0/+76
	The motivating case occurs with SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector. Note that the transform is intentionally limited to shuffles that are equivalent to vector selects to avoid creating arbitrary shuffle masks that may not lower well. This should solve PR29126: https://llvm.org/bugs/show_bug.cgi?id=29126 Differential Revision: https://reviews.llvm.org/D23886 llvm-svn: 280504
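A hypothetical scalar model of the idea (the names `isSelectLikeMask`, `ShuffleWithConst`, `evalShuffle` and `foldInsertOfConstant` are illustrative, not InstCombine's code). A mask is treated as select-like when result lane i reads lane i of either operand; inserting a constant K at lane idx then becomes a shuffle whose constant operand holds K at idx and whose mask points lane idx at that constant. Undefined mask lanes are not modeled.

```cpp
#include <cassert>
#include <vector>

// True if every result lane i reads lane i of operand 0 (mask[i] == i)
// or lane i of operand 1 (mask[i] == i + n): a per-lane vector select.
bool isSelectLikeMask(const std::vector<int> &mask) {
  const int n = static_cast<int>(mask.size());
  for (int i = 0; i < n; ++i)
    if (mask[i] != i && mask[i] != i + n)
      return false;
  return true;
}

struct ShuffleWithConst {
  std::vector<int> mask;       // indices into concat(X, C)
  std::vector<long> constant;  // the constant operand C
};

// Evaluates shufflevector X, C, mask on concrete values.
std::vector<long> evalShuffle(const std::vector<long> &x,
                              const ShuffleWithConst &s) {
  const int n = static_cast<int>(x.size());
  std::vector<long> out;
  for (int m : s.mask)
    out.push_back(m < n ? x[m] : s.constant[m - n]);
  return out;
}

// insertelement (shuffle X, C, mask), k, idx -> shuffle X, C', mask'
// with C'[idx] = k and mask'[idx] = idx + n.
ShuffleWithConst foldInsertOfConstant(ShuffleWithConst s, long k, int idx) {
  assert(isSelectLikeMask(s.mask) && "sketch only handles select-like masks");
  const int n = static_cast<int>(s.mask.size());
  s.constant[idx] = k;
  s.mask[idx] = idx + n;
  return s;
}
```

Restricting to select-like masks also keeps this rewrite simple: only lane idx ever reads C[idx], so overwriting that constant element cannot change any other lane, and the result is still select-like.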
*	[LV] Ensure reverse interleaved group GEPs remain uniform	Matthew Simpson	2016-09-02	1	-1/+11
	For uniform instructions, we're only required to generate a scalar value for the first vector lane of each unroll iteration. Thus, if we have a reverse interleaved group, computing the member index off the scalar GEP corresponding to the last vector lane of its pointer operand technically makes the GEP non-uniform. We should compute the member index off the first scalar GEP instead. I've added the updated member index computation to the existing reverse interleaved group test. llvm-svn: 280497
*	[SimplifyCFG] Add a workaround to fix PR30188	James Molloy	2016-09-02	1	-0/+10
	We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions. The real fix is in SROA, which I'll be looking into. llvm-svn: 280470
*	revert r280429 and r280425:	Dehao Chen	2016-09-02	1	-22/+24
	r280425 | dehao | 2016-09-01 16:15:50 -0700 (Thu, 01 Sep 2016) | 9 lines
	Refactor LICM pass in preparation for LoopSink pass. Summary: LoopSink pass uses some common function in LICM. This patch refactor the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778).
	r280429 | dehao | 2016-09-01 16:31:25 -0700 (Thu, 01 Sep 2016) | 9 lines
	Refactor LICM to expose canSinkOrHoistInst to LoopSink pass. Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation of https://reviews.llvm.org/D22778
	llvm-svn: 280453
*	revert r280432:	Dehao Chen	2016-09-02	1	-6/+5
	r280432 | dehao | 2016-09-01 16:51:37 -0700 (Thu, 01 Sep 2016) | 9 lines
	Explicitly require DominatorTreeAnalysis pass for instsimplify pass. Summary: DominatorTreeAnalysis is always required by instsimplify.
	llvm-svn: 280452
*	Explicitly require DominatorTreeAnalysis pass for instsimplify pass.	Dehao Chen	2016-09-01	1	-5/+6
	Summary: DominatorTreeAnalysis is always required by instsimplify. Reviewers: davidxl, danielcdh Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24173 llvm-svn: 280432
*	Refactor LICM to expose canSinkOrHoistInst to LoopSink pass.	Dehao Chen	2016-09-01	1	-7/+3
	Summary: LoopSink pass shares the same canSinkOrHoistInst functionality with LICM pass. This patch exposes this function in preparation for https://reviews.llvm.org/D22778 Reviewers: chandlerc, davidxl, danielcdh Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24171 llvm-svn: 280429
*	Refactor replaceDominatedUsesWith to have a flag to control whether to replace uses in BB itself.	Dehao Chen	2016-09-01	2	-4/+6
	Summary: This is in preparation for LoopSink pass which calls replaceDominatedUsesWith to update after sinking. Reviewers: chandlerc, davidxl, danielcdh Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24170 llvm-svn: 280427
*	Refactor LICM pass in preparation for LoopSink pass.	Dehao Chen	2016-09-01	1	-21/+23
	Summary: LoopSink pass uses some common functions in LICM. This patch refactors the LICM code to make it usable by LoopSink pass (https://reviews.llvm.org/D22778). Reviewers: chandlerc, davidxl, danielcdh Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D24168 llvm-svn: 280425
*	[LV] Use ScalarParts for ad-hoc pointer IV scalarization (NFCI)	Matthew Simpson	2016-09-01	1	-22/+9
	We can now maintain scalar values in VectorLoopValueMap. Thus, we no longer have to create temporary vectors with insertelement instructions when handling pointer induction variables. This case was mistakenly missed from r279649 when refactoring the other scalarization code. llvm-svn: 280405
*	[LV] Move VectorParts allocation and mapping into PHI widening (NFC)	Matthew Simpson	2016-09-01	1	-29/+38
	This patch moves the allocation of VectorParts for PHI nodes into the actual PHI widening code. Previously, we allocated these VectorParts in vectorizeBlockInLoop, and passed them by reference to widenPHIInstruction. Upon returning, we would then map the VectorParts in VectorLoopValueMap. This behavior is problematic for the cases where we only want to generate a scalar version of a PHI node. For example, if in the future we only generate a scalar version of an induction variable, we would end up inserting an empty vector entry into the map once we return to vectorizeBlockInLoop. We now no longer need to pass VectorParts to the various PHI widening functions, and we can keep VectorParts allocation as close as possible to the point at which they are actually mapped in VectorLoopValueMap. llvm-svn: 280390
*	[EarlyCSE] Change C API pass interface for EarlyCSE w/ MemorySSA	Geoff Berry	2016-09-01	1	-2/+6
	Previous change broke the C API for creating an EarlyCSE pass w/ MemorySSA by adding a bool parameter to control whether MemorySSA was used or not. This broke the OCaml bindings. Instead, change the old C API entry point back and add a new one to request an EarlyCSE pass with MemorySSA. llvm-svn: 280379
*	[InstCombine] remove fold of an icmp pattern that should never happen	Sanjay Patel	2016-09-01	1	-15/+0
	While removing a scalar shackle from an icmp fold, I noticed that I couldn't find any tests to trigger this code path. The 'and' shrinking transform should be handled by InstCombiner::foldCastedBitwiseLogic() or eliminated with InstSimplify. The icmp narrowing is part of InstCombiner::foldICmpWithCastAndCast(). Differential Revision: https://reviews.llvm.org/D24031 llvm-svn: 280370
*	[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches	James Molloy	2016-09-01	1	-28/+90
	This was a real restriction in the original version of SinkIfThenCodeToEnd. Now that it's been rewritten, the restriction can be lifted. As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:
	    if (a)
	      x(1);
	    else if (b)
	      x(2);
	This produces the following CFG:
	           [if]
	          /    \
	     [x(1)]    [if]
	        |      |  \
	        |      |   \
	        |   [x(2)]  |
	         \     |   /
	          [  end  ]
	[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch. We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has side effects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless, but this cannot happen in the general case and causes many extra selects). We are now able to detect this case and split off the unconditional arcs to a common successor:
	           [if]
	          /    \
	     [x(1)]    [if]
	        |      |  \
	        |      |   \
	        |   [x(2)]  |
	         \    /     |
	      [sink.split]  |
	             \     /
	            [ end ]
	Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases. llvm-svn: 280364
*	[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd	James Molloy	2016-09-01	1	-90/+149
	r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything. This time we change the logic for determining if an instruction should be sunk. Previously we used a single-pass greedy algorithm: sink instructions until one requires more than one PHI node or we run out of instructions to sink. This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:
	    %a = load i32* %b                %d = load i32* %b
	    %c = gep i32* %a, i32 0          %e = gep i32* %d, i32 1
	Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor). This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However, it's just not clever enough. Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm. In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging. In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive. This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) into two linear scans. llvm-svn: 280351
*	[SimplifyCFG] Fix nondeterministic iteration order	James Molloy	2016-09-01	1	-2/+2
	We iterate over the result from SafeToMergeTerminators, so make it a SmallSetVector instead of a SmallPtrSet. Should fix stage3 convergence builds. llvm-svn: 280342
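The underlying issue is that a SmallPtrSet's iteration order depends on pointer values, which are not stable from one run or build to the next, whereas a SetVector iterates in insertion order. A minimal standard-C++ sketch of that insertion-ordered set idea follows: a hash set for membership plus a vector for order. `InsertionOrderedSet` is a hypothetical name, and this is not llvm::SetVector's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Set with O(1) membership tests whose iteration order is insertion order,
// so it never depends on hashing or pointer values.
template <typename T> class InsertionOrderedSet {
public:
  bool insert(const T &v) {
    if (!seen.insert(v).second)
      return false;  // already present; order is unchanged
    order.push_back(v);
    return true;
  }
  bool contains(const T &v) const { return seen.count(v) != 0; }
  typename std::vector<T>::const_iterator begin() const { return order.begin(); }
  typename std::vector<T>::const_iterator end() const { return order.end(); }
  std::size_t size() const { return order.size(); }

private:
  std::unordered_set<T> seen;
  std::vector<T> order;
};
```

Any loop over this set visits elements in the order they were first inserted, so code that creates instructions while iterating produces the same output on every run.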
*	[SimplifyCFG] Improve FoldValueComparisonIntoPredecessors to handle more cases	James Molloy	2016-09-01	1	-6/+21
	A very important case is not handled here: multiple arcs to a single block with a PHI. Consider:
	    a:
	      %1 = icmp %b, 1
	      br %1, label %c, label %e
	    c:
	      %2 = icmp %b, 2
	      br %2, label %d, label %e
	    d:
	      br %e
	    e:
	      phi [0, %a], [1, %c], [2, %d]
	FoldValueComparisonIntoPredecessors will refuse to fold this, as it doesn't know how to deal with two arcs to a common destination with different PHI values. The answer is obvious - just split all conflicting arcs. llvm-svn: 280338