bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Reland: Dead Virtual Function Elimination	Oliver Stannard	2019-10-17	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
*	[Alignment][NFC] Value::getPointerAlignment returns MaybeAlign	Guillaume Chatelet	2019-10-15	2	-16/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68398 llvm-svn: 374889
*	Revert "Dead Virtual Function Elimination"	Jorge Gorbe Moya	2019-10-14	1	-32/+0
\| \| \| \| \| \|	This reverts commit 9f6a873268e1ad9855873d9d8007086c0d01cf4f. llvm-svn: 374844
*	[NFC][TTI] Add Alignment for isLegalMasked[Load/Store]	Sam Parker	2019-10-14	1	-4/+6
\| \| \| \| \| \| \| \| \|	Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763
*	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure ↵	Zi Xuan Wu	2019-10-12	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634
*	Dead Virtual Function Elimination	Oliver Stannard	2019-10-11	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 374539
*	[SCEV] Add stricter verification option.	Florian Hahn	2019-10-11	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently -verify-scev only fails if there is a constant difference between two BE counts. This misses a lot of cases. This patch adds a -verify-scev-strict options, which fails for any non-zero differences, if used together with -verify-scev. With the stricter checking, some unit tests fail because of mis-matches, especially around IndVarSimplify. If there is no reason I am missing for just checking constant deltas, I am planning on looking into the various failures. Reviewers: efriedma, sanjoy.google, reames, atrick Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D68592 llvm-svn: 374535
*	[MemorySSA] Update Phi simplification.	Alina Sbirlea	2019-10-10	1	-5/+12
\| \| \| \| \| \| \| \| \| \|	When simplifying a Phi to the unique value found incoming, check that there wasn't a Phi already created to break a cycle. If so, remove it. Resolves PR43541. Some additional nits included. llvm-svn: 374471
*	[ValueTracking] Improve pointer offset computation for cases of same base	Rong Xu	2019-10-10	1	-9/+39
\| \| \| \| \| \| \| \| \| \| \| \|	This patch improves the handling of pointer offset in GEP expressions where one argument is the base pointer. isPointerOffset() is being used by memcpyopt where current code synthesizes consecutive 32 bytes stores to one store and two memset intrinsic calls. With this patch, we convert the stores to one memset intrinsic. Differential Revision: https://reviews.llvm.org/D67989 llvm-svn: 374454
*	[MemorySSA] Additional handling of unreachable blocks.	Alina Sbirlea	2019-10-10	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Whenever we get the previous definition, the assumption is that the recursion starts ina reachable block. If the recursion starts in an unreachable block, we may recurse indefinitely. Handle this case by returning LoE if the block is unreachable. Resolves PR43426. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68809 llvm-svn: 374447
*	[System Model] [TTI] Move default cache/prefetch implementations	David Greene	2019-10-10	1	-28/+0
\| \| \| \| \| \| \| \| \| \|	Move the default implementations of cache and prefetch queries to TargetTransformInfoImplBase and delete them from NoTIIImpl. This brings these interfaces in line with how other TTI interfaces work. Differential Revision: https://reviews.llvm.org/D68804 llvm-svn: 374446
*	[Alignment][NFC] Make VectorUtils uas llvm::Align	Guillaume Chatelet	2019-10-10	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, rogfer01, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68784 llvm-svn: 374330
*	[System Model] [TTI] Update cache and prefetch TTI interfaces	David Greene	2019-10-09	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo. Rework the TTI cache and software prefetching APIs to prepare for the introduction of a general system model. Changes include: - Marking existing interfaces const and/or override as appropriate - Adding comments - Adding BasicTTIImpl interfaces that delegate to a subtarget implementation - Moving the default TargetTransformInfoImplBase implementation to a default MCSubtarget implementation Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget implementation, so its custom TTI implementation is migrated to use the new facilities in BasicTTIImpl to invoke its custom subtarget implementation. The custom TTI implementations continue to exist for the other targets with this change. They are not moved over to subtarget-based implementations. The end goal is to have the default subtarget implementation defer to the system model defined by the target. With this change, the default MCSubtargetInfo implementation essentially returns the defaults TargetTransformInfoImplBase used to return. Existing users of TTI defaults will hit the defaults now in MCSubtargetInfo. Targets that define their own custom TTI implementations won't use the BasicTTIImpl implementations that route to the subtarget. Once system models are in place for the targets that use these interfaces, their custom TTI implementations can be removed. Differential Revision: https://reviews.llvm.org/D63614 llvm-svn: 374205
*	[MemorySSA] Make the use of moveAllAfterMergeBlocks consistent.	Alina Sbirlea	2019-10-09	1	-15/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The rule for the moveAllAfterMergeBlocks API si for all instructions from `From` to have been moved to `To`, while keeping the CFG edges (and block terminators) unchanged. Update all the callsites for moveAllAfterMergeBlocks to follow this. Pending follow-up: since the same behavior is needed everytime, merge all callsites into one. The common denominator may be the call to `MergeBlockIntoPredecessor`. Resolves PR43569. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68659 llvm-svn: 374177
*	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure ↵	Jinsong Ji	2019-10-08	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a. This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091
*	[SVE][IR] Scalable Vector size queries and IR instruction support	Graham Hunter	2019-10-08	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Adds a TypeSize struct to represent the known minimum size of a type along with a flag to indicate that the runtime size is a integer multiple of that size * Converts existing size query functions from Type.h and DataLayout.h to return a TypeSize result * Adds convenience methods (including a transparent conversion operator to uint64_t) so that most existing code 'just works' as if the return values were still scalars. * Uses the new size queries along with ElementCount to ensure that all supported instructions used with scalable vectors can be constructed in IR. Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen Reviewed By: rovka, sdesmalen Differential Revision: https://reviews.llvm.org/D53137 llvm-svn: 374042
*	[LoopVectorize][PowerPC] Estimate int and float register pressure separately ↵	Zi Xuan Wu	2019-10-08	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017
*	Second attempt to add iterator_range::empty()	Jordan Rose	2019-10-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Doing this makes MSVC complain that `empty(someRange)` could refer to either C++17's std::empty or LLVM's llvm::empty, which previously we avoided via SFINAE because std::empty is defined in terms of an empty member rather than begin and end. So, switch callers over to the new method as it is added. https://reviews.llvm.org/D68439 llvm-svn: 373935
*	Revert "[SLP] avoid reduction transform on patterns that the backend can ↵	Martin Storsjo	2019-10-07	1	-53/+0
\| \| \| \| \| \| \| \| \| \|	load-combine" This reverts SVN r373833, as it caused a failed assert "Non-zero loop cost expected" on building numerous projects, see PR43582 for details and reproduction samples. llvm-svn: 373882
*	[LOOPGUARD] Remove asserts in getLoopGuardBranch	Whitney Tsang	2019-10-06	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The assertion in getLoopGuardBranch can be a 'return nullptr' under if condition. Authored By: DTharun Reviewer: Whitney, fhahn Reviewed By: Whitney, fhahn Subscribers: fhahn, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D66084 llvm-svn: 373857
*	[SLP] avoid reduction transform on patterns that the backend can load-combine	Sanjay Patel	2019-10-05	1	-0/+53
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I don't see an ideal solution to these 2 related, potentially large, perf regressions: https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognizes patterns that could be combined later, but not do the optimization itself ( it's not a vector combine anyway, so it's probably out-of-scope for SLP). Here, we add a scalar cost model adjustment with a conservative pattern match and cost summation for a multi-instruction sequence that can probably be reduced later. This should prevent SLP from creating a vector reduction unless that sequence is extremely cheap. In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like: movbe rax, qword ptr [rdi] or: mov rax, qword ptr [rdi] Not some (half) vector monstrosity as we currently do using SLP: vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,.. vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] movzx eax, byte ptr [rdi] movzx ecx, byte ptr [rdi + 5] shl rcx, 40 movzx edx, byte ptr [rdi + 6] shl rdx, 48 or rdx, rcx movzx ecx, byte ptr [rdi + 7] shl rcx, 56 or rcx, rdx or rcx, rax vextracti128 xmm1, ymm0, 1 vpor xmm0, xmm0, xmm1 vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1] vpor xmm0, xmm0, xmm1 vmovq rax, xmm0 or rax, rcx vzeroupper ret Differential Revision: https://reviews.llvm.org/D67841 llvm-svn: 373833
*	[MemorySSA] Update Phi creation when inserting a Def.	Alina Sbirlea	2019-10-02	1	-37/+40
\| \| \| \| \| \| \| \| \|	MemoryPhis should be added in the IDF of the blocks newly gaining Defs. This includes the blocks that gained a Phi and the block gaining a Def, if the block did not have one before. Resolves PR43427. llvm-svn: 373505
*	Fix: Actually erase remove the elements from AssumeHandles	Aditya Kumar	2019-10-02	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: sdmitriev, tejohnson Reviewed by: tejohnson Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68318 llvm-svn: 373494
*	MemorySSAUpdater::applyInsertUpdates - silence static analyzer ↵	Simon Pilgrim	2019-10-02	1	-1/+1
\| \| \| \| \| \| \| \|	dyn_cast<MemoryAccess> null dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<MemoryAccess> directly and if not assert will fire for us. llvm-svn: 373467
*	MemorySSA tryOptimizePhi - assert that we've found a DefChainEnd. NFCI.	Simon Pilgrim	2019-10-02	1	-0/+1
\| \| \| \| \| \|	Silences static analyzer null dereference warning. llvm-svn: 373466
*	LoopAccessAnalysis isConsecutiveAccess() - silence static analyzer ↵	Simon Pilgrim	2019-10-02	1	-2/+2
\| \| \| \| \| \| \| \|	dyn_cast<SCEVConstant> null dereference warning. NFCI. The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<SCEVConstant> directly and if not assert will fire for us. llvm-svn: 373465
*	[InstCombine] Simplify fma multiplication to nan for undef or nan operands.	Florian Hahn	2019-10-02	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In similar fashion to D67721, we can simplify FMA multiplications if any of the operands is NaN or undef. In instcombine, we will simplify the FMA to an fadd with a NaN operand, which in turn gets folded to NaN. Note that this just changes SimplifyFMAFMul, so we still not catch the case where only the Add part of the FMA is Nan/Undef. Reviewers: cameron.mcinally, mcberg2017, spatel, arsenm Reviewed By: cameron.mcinally Differential Revision: https://reviews.llvm.org/D68265 llvm-svn: 373459
*	[InstSimplify] fold fma/fmuladd with a NaN or undef operand	Sanjay Patel	2019-10-02	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \|	This is intended to be similar to the constant folding results from D67446 and earlier, but not all operands are constant in these tests, so the responsibility for folding is left to InstSimplify. Differential Revision: https://reviews.llvm.org/D67721 llvm-svn: 373455
*	Remove an unnecessary cast. NFC.	Jay Foad	2019-10-02	1	-3/+2
\| \| \| \|	llvm-svn: 373434
*	[DDG] Data Dependence Graph - Root Node	Bardia Mahjour	2019-10-01	2	-1/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (eg. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first-search that keeps track of visited nodes to try to avoid creating unnecessary edges. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tag: #llvm Differential Revision: https://reviews.llvm.org/D67970 llvm-svn: 373386
*	[MemorySSA] Check for unreachable blocks when getting last definition.	Alina Sbirlea	2019-10-01	1	-0/+3
\| \| \| \| \| \| \| \|	If a single predecessor is found, still check if the block is unreachable. The test that found this had a self loop unreachable block. Resolves PR43493. llvm-svn: 373383
*	[MemorySSA] Update last_access_in_block check.	Alina Sbirlea	2019-10-01	1	-2/+7
\| \| \| \| \| \| \| \| \| \|	The check for "was there an access in this block" should be: is the last access in this block and is it not a newly inserted phi. Resolves new test in PR43438. Also fix a typo when simplifying trivial Phis to match the comment. llvm-svn: 373380
*	[NFC][HardwareLoops] Update some iterators	Sam Parker	2019-10-01	1	-11/+6
\| \| \| \|	llvm-svn: 373309
*	[ConstantFolding] Fold constant calls to log2()	Evandro Menezes	2019-09-30	1	-0/+9
\| \| \| \| \| \| \| \|	Somehow, folding calls to `log2()` with a constant was missing. Differential revision: https://reviews.llvm.org/D67300 llvm-svn: 373262
*	Revert "[SCEV] add no wrap flag for SCEVAddExpr."	Tim Northover	2019-09-30	1	-1/+1
\| \| \| \| \| \| \| \|	This reverts r366419 because the analysis performed is within the context of the loop and it's only valid to add wrapping flags to "global" expressions if they're always correct. llvm-svn: 373184
*	[InstSimplify] generalize FP folds with undef/NaN; NFC	Sanjay Patel	2019-09-27	1	-12/+14
\| \| \| \| \| \|	We can reuse this logic for things like fma. llvm-svn: 373119
*	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types	Guillaume Chatelet	2019-09-27	2	-6/+5
\| \| \| \|	llvm-svn: 373081
*	Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated	Wei Mi	2019-09-27	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \|	exits" Get a better approach in https://reviews.llvm.org/D68107 to solve the problem. Revert the initial patch and will commit the new one soon. This reverts commit rL372990. llvm-svn: 373044
*	[LOOPGUARD] Disable loop with multiple loop exiting blocks.	Whitney Tsang	2019-09-26	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed in the loop group meeting. With the current definition of loop guard, we should not allow multiple loop exiting blocks. For loops that has multiple loop exiting blocks, we can simply unable to find the loop guard. When getUniqueExitBlock() obtains a vector size not equals to one, that means there is either no exit blocks or there exists more than one unique block the loop exit to. If we don't disallow loop with multiple loop exit blocks, then with our current implementation, there can exist exit blocks don't post dominated by the non pre-header successor of the guard block. Reviewer: reames, Meinersbur, kbarton, etiotto, bmahjour Reviewed By: Meinersbur, kbarton Subscribers: fhahn, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D66529 llvm-svn: 373011
*	ConstantFold - silence static analyzer dyn_cast<ExtractValueInst> null ↵	Simon Pilgrim	2019-09-26	1	-1/+1
\| \| \| \| \| \| \| \|	dereference warning. NFCI. The static analyzer is warning about a potential null dereference, but we should be able to use cast<ExtractValueInst> directly and if not assert will fire for us. llvm-svn: 372993
*	[LoopInfo] Limit the iterations to check whether a loop has dedicated exits	Wei Mi	2019-09-26	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	for extreme large case. We had a case that a single loop which has 4000 exits and the average number of predecessors of each exit is > 1000, and we found compiling the case spent a significant amount of time on checking whether a loop has dedicated exits. This patch adds a limit for the iterations to the check. With the patch, the time to compile our testcase reduced from 1000s to 200s (clang release build). Differential Revision: https://reviews.llvm.org/D67359 llvm-svn: 372990
*	Remove local shadow constant. NFCI.	Simon Pilgrim	2019-09-26	1	-2/+0
\| \| \| \| \| \|	ValueTracking.cpp already has a local static MaxDepth = 6 constant - this one seems to have been missed when rL124183 landed. llvm-svn: 372964
*	[ValueTracking] Silence static analyzer dyn_cast<Operator> null dereference ↵	Simon Pilgrim	2019-09-26	1	-225/+228
\| \| \| \| \| \| \| \|	warnings. NFCI. The static analyzer is warning about a potential null dereferences, but since the pointer is only used in a switch statement for Operator::getOpcode() (with an empty default) then its easiest just to wrap this in a null test as the dyn_cast might return null here. llvm-svn: 372962
*	[ConstantFolding] Use FoldBitCast correctly	Keno Fischer	2019-09-26	1	-2/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	Previously we might attempt to use a BitCast to turn bits into vectors of pointers, but that requires an inttoptr cast to be legal. Add an assertion to detect the formation of illegal bitcast attempts early (in the tests, we often constant-fold away the result before getting to this assertion check), while being careful to still handle the early-return conditions without adding extra complexity in the result. Patch by Jameson Nash <jameson@juliacomputing.com>. Differential Revision: https://reviews.llvm.org/D65057 llvm-svn: 372940
*	[MemorySSA] Avoid adding Phis in the presence of unreachable blocks.	Alina Sbirlea	2019-09-25	1	-45/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If a block has all incoming values with the same MemoryAccess (ignoring incoming values from unreachable blocks), then use that incoming MemoryAccess and do not create a Phi in the first place. Revert IDF work-around added in rL372673; it should not be required unless the Def inserted is the first in its block. The patch also cleans up a series of tests, added during the many iterations on insertDef. The patch also fixes PR43438. The same issue that occurs in insertDef with "adding phis, hence the IDF of Phis is needed", can also occur in fixupDefs: the `getPreviousRecursive` call only adds Phis walking on the predecessor edges, which means there may be the case of a Phi added walking the CFG "backwards" which triggers the needs for an additional Phi in successor blocks. Such Phis are added during fixupDefs only in the presence of unreachable blocks. Hence this highlights the need to avoid adding Phis in blocks with unreachable predecessors in the first place. Reviewers: george.burgess.iv Subscribers: Prazek, sanjoy.google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67995 llvm-svn: 372932
*	[InstSimplify] Handle more 'A </>/>=/<= B &&/\|\| (A - B) !=/== 0' patterns ↵	Roman Lebedev	2019-09-25	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \|	(PR43251) https://rise4fun.com/Alive/sl9s https://rise4fun.com/Alive/2plN https://bugs.llvm.org/show_bug.cgi?id=43251 llvm-svn: 372928
*	[InstSimplify] Match 1.0 and 0.0 for both operands in SimplifyFMAMul	Florian Hahn	2019-09-25	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because we do not constant fold multiplications in SimplifyFMAMul, we match 1.0 and 0.0 for both operands, as multiplying by them is guaranteed to produce an exact result (if it is allowed to do so). Note that it is not enough to just swap the operands to ensure a constant is on the RHS, as we want to also cover the case with 2 constants. Reviewers: lebedev.ri, spatel, reames, scanon Reviewed By: lebedev.ri, reames Differential Revision: https://reviews.llvm.org/D67553 llvm-svn: 372915
*	[InstCombine] Limit FMul constant folding for fma simplifications.	Florian Hahn	2019-09-25	1	-9/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As @reames pointed out post-commit, rL371518 adds additional rounding in some cases, when doing constant folding of the multiplication. This breaks a guarantee llvm.fma makes and must be avoided. This patch reapplies rL371518, but splits off the simplifications not requiring rounding from SimplifFMulInst as SimplifyFMAFMul. Reviewers: spatel, lebedev.ri, reames, scanon Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67434 llvm-svn: 372899
*	[PatternMatch] Make m_Br more flexible, add matchers for BB values.	Florian Hahn	2019-09-25	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently m_Br only takes references to BasicBlock*, which limits its flexibility. For example, you have to declare a variable, even if you ignore the result or you have to have additional checks to make sure the matched BB matches an expected one. This patch adds m_BasicBlock and m_SpecificBB matchers, which can be used like the existing matchers for constants or values. I also had a look at the existing uses and updated a few. IMO it makes the code a bit more explicit. Reviewers: spatel, craig.topper, RKSimon, majnemer, lebedev.ri Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D68013 llvm-svn: 372885
*	[IR] allow fast-math-flags on phi of FP values (2nd try)	Sanjay Patel	2019-09-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The changes here are based on the corresponding diffs for allowing FMF on 'select': D61917 <https://reviews.llvm.org/D61917> As discussed there, we want to have fast-math-flags be a property of an FP value because the alternative (having them on things like fcmp) leads to logical inconsistency such as: https://bugs.llvm.org/show_bug.cgi?id=38086 The earlier patch for select made almost no practical difference because most unoptimized conditional code begins life as a phi (based on what I see in clang). Similarly, I don't expect this patch to do much on its own either because SimplifyCFG promptly drops the flags when converting to select on a minimal example like: https://bugs.llvm.org/show_bug.cgi?id=39535 But once we have this plumbing in place, we should be able to wire up the FMF propagation and start solving cases like that. The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a regression in a LoopVectorize test. We are intersecting the FMF of any FPMathOperator there, so if a phi is not properly annotated, new math instructions may not be either. Once we fix the propagation in SimplifyCFG, it may be safe to remove that hack. Differential Revision: https://reviews.llvm.org/D67564 llvm-svn: 372878