bcm5719-llvm - Project Ortega BCM5719 LLVM

Commit message	Author	Age	Files	Lines
*	[SCCP] Teach the pass about `mul %x 0` even if %x is overdefined.	Davide Italiano	2016-12-09	1	-2/+5
|	The motivating example is: `extern int patatino; int goo() { int x = 0; for (int i = 0; i < 1000000; ++i) { x *= patatino; } return x; }` Currently SCCP does not realize that this function always returns zero, so the optimizer will try to unroll and vectorize the loop at -O3, producing an awful lot of (useless) code. With this change, it will just produce: `0000000000000000 <g>: xor %eax,%eax retq` llvm-svn: 289175
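As an illustration only, the new rule can be sketched as a transfer function over a toy three-level lattice (unknown, constant, overdefined). The names `State`, `LatticeVal`, and `visitMul` are hypothetical, and this is not LLVM's SCCP code. It shows why `x *= patatino` can stay at zero: `x` starts as the constant 0, the load of `patatino` is overdefined, and a constant-zero operand now forces the product to 0 instead of making it overdefined.

```cpp
#include <cassert>
#include <cstdint>

// Toy SCCP lattice: Unknown (no information yet), Constant (one known value),
// Overdefined (may be anything). Hypothetical model, not LLVM's LatticeVal.
enum class State { Unknown, Constant, Overdefined };

struct LatticeVal {
  State state = State::Unknown;
  uint64_t value = 0;  // meaningful only when state == State::Constant

  static LatticeVal unknown() { return {}; }
  static LatticeVal constant(uint64_t v) { return {State::Constant, v}; }
  static LatticeVal overdefined() { return {State::Overdefined, 0}; }

  bool isZero() const { return state == State::Constant && value == 0; }
};

// Transfer function for `mul a, b` (wrapping unsigned arithmetic).
LatticeVal visitMul(const LatticeVal &a, const LatticeVal &b) {
  // A constant-zero operand makes the product zero,
  // even if the other operand is overdefined.
  if (a.isZero() || b.isZero())
    return LatticeVal::constant(0);
  if (a.state == State::Overdefined || b.state == State::Overdefined)
    return LatticeVal::overdefined();
  if (a.state == State::Constant && b.state == State::Constant)
    return LatticeVal::constant(a.value * b.value);
  return LatticeVal::unknown();  // not enough information yet
}
```

Checking for zero before checking for overdefined is the whole point: with the opposite order, `0 * overdefined` would fall into the overdefined case.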
*	WholeProgramDevirt: Teach the pass to handle structs of arrays.	Peter Collingbourne	2016-12-09	1	-23/+22
|	This will become necessary in some cases once D22296 lands. llvm-svn: 289165
*	Make WholeProgramDevirt understand ConstStruct vtables.	Peter Collingbourne	2016-12-09	1	-13/+37
|	Based on a patch by LemonBoy! Differential Revision: https://reviews.llvm.org/D26581 llvm-svn: 289162
*	[SCCP] Make sure SCCP and ConstantFolding agree on undef >> a.	Davide Italiano	2016-12-08	1	-2/+2
|	Currently SCCP folds the value to -1, while ConstantProp folds it to 0. This changes SCCP to do what ConstantFolding does. llvm-svn: 289147
*	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements.	Alexey Bataev	2016-12-08	1	-66/+80
|	When trying to vectorize trees that start at insertelement instructions, the function tryToVectorizeList() uses a vectorization factor calculated as MinVecRegSize/ScalarTypeSize. Sometimes this does not work, because the tree cost for this fixed vectorization factor is too high. This patch tries to improve the situation: it tries different vectorization factors, from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and chooses the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043
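A minimal sketch of that search, assuming the candidate factors are the powers of two obtained by halving from the upper bound down to MinVecRegSize/ScalarTypeSize, and that a negative cost means vectorizing is profitable. `treeCost` is a hypothetical placeholder for the SLP cost model, and the function names are illustrative, not SLPVectorizer's code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Largest power of two <= n (n > 0), i.e. what PowerOf2Floor computes.
unsigned powerOf2Floor(unsigned n) {
  assert(n > 0);
  unsigned p = 1;
  while (p <= n / 2)
    p *= 2;
  return p;
}

// Try VF = max(PowerOf2Floor(numValues), minVF), halving down to minVF, where
// minVF = MinVecRegSize / ScalarTypeSize. `treeCost` is a hypothetical stand-in
// for the SLP cost model (negative = profitable). Returns the VF with the
// lowest profitable cost, or 0 if none is profitable.
unsigned pickVectorizationFactor(unsigned numValues, unsigned minVecRegSize,
                                 unsigned scalarTypeSize,
                                 const std::function<int(unsigned)> &treeCost) {
  const unsigned minVF = minVecRegSize / scalarTypeSize;
  const unsigned maxVF = std::max(powerOf2Floor(numValues), minVF);
  unsigned bestVF = 0;
  int bestCost = 0;  // only strictly negative costs are accepted
  for (unsigned vf = maxVF; vf >= minVF && vf > 0; vf /= 2) {
    const int cost = treeCost(vf);
    if (cost < bestCost) {
      bestCost = cost;
      bestVF = vf;
    }
  }
  return bestVF;
}
```

For example, 12 values of a 32-bit type with a 128-bit minimum register size give candidates 8 and 4; if VF 8 is too expensive, VF 4 can still be chosen instead of giving up.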
*	CFI-icall on Thumb	Evgeniy Stepanov	2016-12-08	1	-4/+14
|	Replace @progbits in the section directive with %progbits, because "@" starts a comment on arm/thumb. Use the b.w branch instruction. Use .thumb_function and .thumb_set for proper arm/thumb interworking. This way, jumptable entry addresses on thumb have bit 0 set (correctly). This does not affect the CFI check math, because the address of the jumptable start also has that bit set. This does not work on thumbv5, because it does not support b.w, and the linker would not insert a veneer (trampoline?) to extend the range of b.n. We may need to do full-range plt-style jumptables on thumbv54, which are 12 bytes per entry. Another option is "push lr; bl; pop pc" (4 bytes), but that needs unwinding instructions, etc. Differential Revision: https://reviews.llvm.org/D27499 llvm-svn: 289008
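To see why the Thumb bit does not disturb the check, here is a simplified jumptable membership test: subtract the jumptable base from the target address, rotate right by log2 of the entry size, and compare against the number of entries. This is a sketch under stated assumptions (4-byte entries, as for a single `b.w`; hypothetical function names), not the exact sequence LLVM emits. Because both the target and the base carry bit 0, it cancels in the subtraction, while a misaligned or out-of-range address ends up with high bits set after the rotate and fails.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value right by s bits.
uint64_t rotr64(uint64_t x, unsigned s) {
  s &= 63;
  return s == 0 ? x : (x >> s) | (x << (64 - s));
}

// Simplified CFI-style check: is `addr` the start of one of `count` entries,
// each (1 << log2EntrySize) bytes long, in the jumptable starting at `base`?
bool isValidJumpTableTarget(uint64_t addr, uint64_t base,
                            unsigned log2EntrySize, uint64_t count) {
  uint64_t offset = addr - base;  // wraps around if addr < base
  // Low (misalignment) bits rotate into the top of the word, making the
  // value huge; a wrapped negative offset is huge as well.
  return rotr64(offset, log2EntrySize) < count;
}
```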
*	[BDCE] Skip metadata while replacing uses.	Davide Italiano	2016-12-07	1	-2/+3
|	The fix committed in r288851 doesn't cover all the cases. In particular, if we have an instruction with side effects that has a non-dbg use not depending on its bits, we still perform RAUW, destroying the dbg.value's first argument. Prevent metadata from being replaced here to avoid the issue. Differential Revision: https://reviews.llvm.org/D27534 llvm-svn: 288987
*	[GVNHoist] Invalidate MemDep when an instruction is moved.	Eli Friedman	2016-12-07	1	-0/+1
|	See also r279907. Fixes https://llvm.org/bugs/show_bug.cgi?id=30991 . Differential Revision: https://reviews.llvm.org/D27493 llvm-svn: 288968
*	[LV] Scalarize operands of predicated instructions	Matthew Simpson	2016-12-07	1	-7/+210
|	This patch attempts to scalarize the operand expressions of predicated instructions if they were conditionally executed in the original loop. After scalarization, the expressions will be sunk inside the blocks created for the predicated instructions. The transformation essentially performs un-if-conversion on the operands. The cost model has been updated to determine if scalarization is profitable. It compares the cost of a vectorized instruction, assuming it will be if-converted, to the cost of the scalarized instruction, assuming that the instructions corresponding to each vector lane will be sunk inside a predicated block, possibly avoiding execution. If it's more profitable to scalarize the entire expression tree feeding the predicated instruction, the expression will be scalarized; otherwise, it will be vectorized. We only consider the cost of the entire expression to accurately estimate the cost of the required insertelement and extractelement instructions. Differential Revision: https://reviews.llvm.org/D26083 llvm-svn: 288909
*	Try unbreaking the MSVC build.	Benjamin Kramer	2016-12-07	1	-1/+1
|	llvm-svn: 288907
*	[LowerTypeTests] Use the TrailingObjects infrastructure for trailing objects.	Benjamin Kramer	2016-12-07	1	-6/+10
|	Also avoid allocating ~3x as much memory as needed. llvm-svn: 288904
*	When GVN removes a redundant load, it should not modify the debug location of the dominating load.	Andrea Di Biagio	2016-12-07	1	-1/+4
|	In the case of a fully redundant load LI dominated by an equivalent load V, GVN should always preserve the original debug location of V. Otherwise, we risk introducing incorrect stepping. If V has debug info, then clearly it should not be modified. If V has a null debugloc, then it is still potentially incorrect to propagate LI's debugloc, because LI may not post-dominate V. Differential Revision: https://reviews.llvm.org/D27468 llvm-svn: 288903
*	[InlineFunction] Refactor code in function `fixupLineNumbers` as suggested by David in D27462. NFC	Andrea Di Biagio	2016-12-07	1	-16/+18
|	llvm-svn: 288901
*	[InlineFunction] Do not propagate the callsite debug location to instructions inlined from functions with debug info.	Andrea Di Biagio	2016-12-07	1	-3/+8
|	When a function F is inlined, InlineFunction extends the debug location of every instruction inlined from F by adding an InlinedAt. However, if an instruction has a 'null' debug location, InlineFunction would propagate the callsite debug location to it. This behavior has existed since revision 210459, which was originally committed specifically to work around the lack of debug information for instructions inlined from intrinsic functions (which are usually declared with attributes `__always_inline__, __nodebug__`). The problem with revision 210459 is that it doesn't make any distinction between instructions inlined from a 'nodebug' function and instructions inlined from a function built with debug info. This may lead to incorrect stepping in the debugger. This patch works under the assumption that a nodebug function does not have a DISubprogram. When a function F is inlined into another function G, InlineFunction checks whether F has debug info associated with it. For nodebug functions, the InlineFunction logic is unchanged (i.e. it still propagates the callsite debugloc to the inlined instructions). Otherwise, InlineFunction no longer propagates the callsite debug location. Differential Revision: https://reviews.llvm.org/D27462 llvm-svn: 288895
*	LowerTypeTests: Improve performance by optimising type metadata queries.	Peter Collingbourne	2016-12-06	1	-88/+129
|	Requesting metadata for a global is a relatively expensive operation as it involves a map lookup, but it's one that we need to do relatively frequently in this pass to collect the list of type metadata nodes associated with a global. This change improves the performance of type metadata queries by prebuilding data structures that keep the global together with its list of type metadata, and changing the pass to use that data structure wherever we were previously passing global references around. This change also eliminates some O(N^2) behavior by collecting the list of globals associated with each type identifier during the first pass over the list of globals rather than visiting each global to compute that list every time we add a new type identifier. Reduces pass runtime on a module containing Chrome's vtables from over 60s to 0.9s. Differential Revision: https://reviews.llvm.org/D27484 llvm-svn: 288859
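The second improvement, building the type-identifier-to-globals mapping in a single pass, can be sketched as follows. `GlobalInfo` and `groupGlobalsByTypeId` are hypothetical names; real LLVM code deals with globals and type metadata nodes rather than plain strings. The idea is that each global's type list is fetched once and every (global, type id) pair is visited once, instead of rescanning all globals whenever a new type identifier shows up.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in: a global together with its prefetched type ids.
struct GlobalInfo {
  std::string name;
  std::vector<std::string> typeIds;  // collected once, not re-queried
};

// One pass over the globals: append each global to the member list of
// every type identifier it carries.
std::map<std::string, std::vector<const GlobalInfo *>>
groupGlobalsByTypeId(const std::vector<GlobalInfo> &globals) {
  std::map<std::string, std::vector<const GlobalInfo *>> members;
  for (const GlobalInfo &g : globals)
    for (const std::string &id : g.typeIds)
      members[id].push_back(&g);
  return members;
}
```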
*	[BDCE/DebugInfo] Preserve llvm.dbg.value's argument.	Davide Italiano	2016-12-06	1	-0/+5
|	BDCE has two phases: 1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so, replaces all its uses with the constant zero. 2. Then, it asks SimplifyDemandedBits again if the instruction is really dead (no side effects, etc.), and if so, eliminates it. Now, in 1), if all the bits of an instruction are dead, we may end up replacing a dbg use: `%call = tail call i32 (...) @g() #4, !dbg !15` `tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17` -> `%call = tail call i32 (...) @g() #4, !dbg !15` `tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17` but not eliminating the call, because it may have arbitrary side effects. In other words, we lose some debug information. This patch fixes the problem by making sure that BDCE does nothing with the instruction if it has side effects and no non-dbg uses. Differential Revision: https://reviews.llvm.org/D27471 llvm-svn: 288851
*	Revert "[SCCP] Remove manual folding of terminator instructions."	Davide Italiano	2016-12-06	1	-2/+27
|	This reverts commit r288725 as it broke a bot. llvm-svn: 288759
*	[SCCP] Remove manual folding of terminator instructions.	Davide Italiano	2016-12-05	1	-27/+2
|	There are two cases handled here: 1) a branch on undef, 2) a switch with an undef condition. Both cases are currently handled by ResolvedUndefsIn. If we have a branch on undef, we force its value to false (which is trivially foldable). If we have a switch on undef, we force it to the first constant (which is also foldable). llvm-svn: 288725
*	[DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation	Adrian Prantl	2016-12-05	2	-29/+31
|	so we can stop using DW_OP_bit_piece with the wrong semantics. The entire back story can be found here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's offset field to mean the offset into the source variable rather than the offset into the location at the top of the DWARF expression stack. In order to be able to fix this in a subsequent patch, this patch introduces a dedicated DW_OP_LLVM_fragment operation with the semantics that we used to apply to DW_OP_bit_piece, which is what we actually need while inside of LLVM. This patch is complete with a bitcode upgrade for expressions using the old format. It does not yet fix the DWARF backend to use DW_OP_bit_piece correctly. Implementation note: We discussed several options for implementing this, including reserving a dedicated field in DIExpression for the fragment size and offset, but using a custom operator at the end of the expression works just fine and is more efficient because we then only pay for it when we need it. Differential Revision: https://reviews.llvm.org/D27361 rdar://problem/29335809 llvm-svn: 288683
*	[InstCombine] change select type to eliminate bitcasts	Sanjay Patel	2016-12-03	1	-0/+47
|	This solves a secondary problem seen in PR6137: https://llvm.org/bugs/show_bug.cgi?id=6137#c6 This is similar to the bitwise logic op fold added with: https://reviews.llvm.org/rL287707 And like that patch, I'm artificially restricting the transform from vector <-> scalar types until we're sure that the backend can handle that. llvm-svn: 288584
*	Remove stale comment. NFC.	Michael Kuperstein	2016-12-03	1	-3/+0
|	llvm-svn: 288572
*	[sanitizer-coverage] use IRB.SetCurrentDebugLocation after IRB.SetInsertPoint	Kostya Serebryany	2016-12-03	1	-1/+1
|	llvm-svn: 288568
*	[PGO] Fix PGO use ICE when there are unreachable BBs	Rong Xu	2016-12-02	2	-21/+53
|	For -O0 there might be unreachable BBs, which breaks the assumption that all the BBs have an auxiliary data structure. In this patch, we add another interface called findBBInfo() so that a nullptr can be returned for the unreachable BBs (and the callers can ignore those BBs). This fixes the bug reported in https://llvm.org/bugs/show_bug.cgi?id=31209 Differential Revision: https://reviews.llvm.org/D27280 llvm-svn: 288528
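The shape of the fix can be sketched as a pair of accessors: an asserting `getBBInfo`-style lookup for blocks that must have data, and a `findBBInfo` that returns `nullptr` so callers can skip unreachable blocks. Only the name `findBBInfo` comes from the commit message; `BBInfoMap`, `BBInfo`, `sumCounts`, and the stand-in `BasicBlock` struct are hypothetical.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct BasicBlock { int id; };      // hypothetical stand-in for a real block
struct BBInfo { unsigned count; };  // hypothetical per-block profile data

class BBInfoMap {
 public:
  void add(const BasicBlock *bb, unsigned count) { infos_[bb] = BBInfo{count}; }

  // Asserting accessor: the caller guarantees the block has an entry.
  BBInfo &getBBInfo(const BasicBlock *bb) {
    BBInfo *info = findBBInfo(bb);
    assert(info && "block has no BBInfo");
    return *info;
  }

  // Non-asserting lookup: nullptr for blocks without info (e.g. unreachable).
  BBInfo *findBBInfo(const BasicBlock *bb) {
    auto it = infos_.find(bb);
    return it == infos_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<const BasicBlock *, BBInfo> infos_;
};

// A caller that tolerates unreachable blocks by skipping them.
unsigned sumCounts(BBInfoMap &map, const std::vector<const BasicBlock *> &blocks) {
  unsigned total = 0;
  for (const BasicBlock *bb : blocks)
    if (const BBInfo *info = map.findBBInfo(bb))
      total += info->count;
  return total;
}
```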
*	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."	Renato Golin	2016-12-02	1	-74/+66
|	This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508
*	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements.	Alexey Bataev	2016-12-02	1	-66/+74
|	When trying to vectorize trees that start at insertelement instructions, the function tryToVectorizeList() uses a vectorization factor calculated as MinVecRegSize/ScalarTypeSize. Sometimes this does not work, because the tree cost for this fixed vectorization factor is too high. This patch tries to improve the situation: it tries different vectorization factors, from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and chooses the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497
*	IR: Move NumElements field from {Array,Vector}Type to SequentialType.	Peter Collingbourne	2016-12-02	3	-31/+11
|	Now that PointerType is no longer a SequentialType, all SequentialTypes have an associated number of elements, so we can move that information to the base class, allowing for a number of simplifications. Differential Revision: https://reviews.llvm.org/D27122 llvm-svn: 288464
*	Change LoopUnrollPass cost from int to unsigned to make it consistent. (NFC)	Dehao Chen	2016-12-02	1	-5/+5
|	llvm-svn: 288463
*	IR: Change PointerType to derive from Type rather than SequentialType.	Peter Collingbourne	2016-12-02	2	-7/+5
|	As proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html This is for a couple of reasons: - Values of type PointerType are unlike the other SequentialTypes (arrays and vectors) in that they do not hold values of the element type. By moving PointerType we can unify certain aspects of how the other SequentialTypes are handled. - PointerType will have no place in the SequentialType hierarchy once pointee types are removed, so this is a necessary step towards removing pointee types. Differential Revision: https://reviews.llvm.org/D26595 llvm-svn: 288462
*	IR: Change the gep_type_iterator API to avoid always exposing the "current" type.	Peter Collingbourne	2016-12-02	9	-32/+27
|	Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458
*	[ThinLTO] Stop importing constant global vars as copies in the backend	Teresa Johnson	2016-12-02	2	-23/+0
|	Summary: We were doing an optimization in the ThinLTO backends of importing constant unnamed_addr globals unconditionally as a local copy (regardless of whether the thin link decided to import them). This should be done in the thin link instead, so that resulting exported references are marked and promoted appropriately, but will need a summary enhancement to mark these variables as constant unnamed_addr. The function import logic during the thin link was trying to handle this proactively, by conservatively marking all values referenced in the initializer lists of exported global variables as also exported. However, this only handled values referenced directly from the initializer list of an exported global variable. If the value is itself a constant unnamed_addr variable, we could end up exporting its references as well. This caused multiple issues. The first is that the transitively exported references weren't promoted. Secondly, some could not be promoted/renamed (e.g. they had a section or other constraint). recursively, instead of just adding the first level of initializer list references to the ExportList directly. Remove this optimization and the associated handling in the function import backend. SPEC measurements indicate we weren't getting much from it in any case. Fixes PR31052. Reviewers: mehdi_amini Subscribers: krasin, llvm-commits Differential Revision: https://reviews.llvm.org/D26880 llvm-svn: 288446
*	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."	Artem Belevich	2016-12-01	1	-72/+66
|	This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431
*	[PR29121] Don't fold if it would produce atomic vector loads or stores	Philip Reames	2016-12-01	1	-14/+28
|	The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal. Differential Revision: https://reviews.llvm.org/D24365 llvm-svn: 288415
*	Factor out common parts of LVI and Float2Int into ConstantRange [NFCI]	Philip Reames	2016-12-01	1	-30/+15
|	This just extracts out the transfer rules for constant ranges into a single shared point. As it happens, neither bit of code actually overlaps in terms of the handled operators, but with this change that could easily be tweaked in the future. I also want to have this separated out to make it easier to experiment with an eager value info implementation and possibly a ValueTracking-like fixed-depth recursion peephole version. There's no reason all four of these can't share a common implementation, which reduces the chances of bugs. Differential Revision: https://reviews.llvm.org/D27294 llvm-svn: 288413
*	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements.	Alexey Bataev	2016-12-01	1	-66/+72
|	When trying to vectorize trees that start at insertelement instructions, the function tryToVectorizeList() uses a vectorization factor calculated as MinVecRegSize/ScalarTypeSize. Sometimes this does not work, because the tree cost for this fixed vectorization factor is too high. This patch tries to improve the situation: it tries different vectorization factors, from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize, and chooses the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412
*	[SLP] Fixed cost model for horizontal reduction.	Alexey Bataev	2016-12-01	1	-1/+2
|	Currently, when the cost of scalar operations is evaluated, the vector type is used for the scalar operations. This patch fixes this issue and fixes the evaluation of the vector operations cost. Several tests showed that the vector cost model is too optimistic: it allowed vectorization of 8 or fewer add/fadd operations, even though the scalar code is faster. Vector code only provides better performance for 16 or more operations. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398
*	[GVN, OptDiag] Print the interesting instructions involved in missed load-elimination	Adam Nemet	2016-12-01	1	-0/+36
|	[recommitting after the fix in r288307] This includes the intervening store and the load/store that we're trying to forward from in the optimization remark for the missed load elimination. This is hooked up under a new mode in ORE that allows for compile-time budget for a bit more analysis to print more insightful messages. This mode is currently enabled for -fsave-optimization-record (-Rpass is trickier since it is controlled in the front-end). With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 Differential Revision: https://reviews.llvm.org/D26490 llvm-svn: 288381
*	[GVN, OptDiag] Include the value that is forwarded in load elimination	Adam Nemet	2016-12-01	1	-5/+8
|	[recommitting after the fix in r288307] This requires some changes to the opt-diag API. Hal and I have discussed this at the Dev Meeting and came up with a streaming delimiter (setExtraArgs) to solve this. Arguments after this delimiter are only included in the optimization records and not in the remarks printed in the compiler output. (Note how in the test the content of the YAML file changes but the remarks on the compiler output don't.) This implements the green GVN message with a bug fix at line http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 The fix is that now we properly include the constant value in the message: "load of type i32 eliminated in favor of 7" Differential Revision: https://reviews.llvm.org/D26489 llvm-svn: 288380
*	[GVN] Basic optimization remark support	Adam Nemet	2016-12-01	1	-3/+20
|	[recommitting after the fix in r288307] Follow-on patches will add more interesting cases. The goal of this patch-set is to get the GVN messages printed in opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This is the optimization view for the function (the last remark in the function has a bug which is fixed in this series): http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430 Differential Revision: https://reviews.llvm.org/D26488 llvm-svn: 288370
*	[SCCP] Switch over to DEBUG() and drop an #ifdef.	Davide Italiano	2016-12-01	1	-6/+2
|	llvm-svn: 288325
*	[SCCP] Prefer `auto` when the type is obvious. NFCI.	Davide Italiano	2016-12-01	1	-27/+27
|	llvm-svn: 288324
*	Object: Extract a ModuleSymbolTable class from IRObjectFile.	Peter Collingbourne	2016-12-01	1	-1/+1
|	This class represents a symbol table built from in-memory IR. It provides access to GlobalValues and should only be used if such access is required (e.g. in the LTO implementation). We will eventually change IRObjectFile to read from a bitcode symbol table rather than using ModuleSymbolTable, so it would not be able to expose the module. Differential Revision: https://reviews.llvm.org/D27073 llvm-svn: 288319
*	[GVN] When merging blocks update LoopInfo if it's available	Adam Nemet	2016-12-01	1	-6/+10
|	If LoopInfo is available during GVN, BasicAA will use it. However, MergeBlockIntoPredecessor does not update LI as it merges blocks. This didn't cause problems before, because LI was freed before GVN/BasicAA. Now, with OptimizationRemarkEmitter, the lifetime of LI is extended, so LI needs to be kept up-to-date during GVN. Differential Revision: https://reviews.llvm.org/D27288 llvm-svn: 288307
*	Fix LSR best register search algorithm.	Evgeny Stupachenko	2016-11-30	1	-2/+3
|	Summary: Fix a case when the first register in a search has the maximum RegUses.getUsedByIndices(Reg).count() Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D26877 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 288278
*	[LoopUnroll] Implement profile-based loop peeling	Michael Kuperstein	2016-11-30	5	-31/+474
|	This implements PGO-driven loop peeling. The basic idea is that when the average dynamic trip-count of a loop is known, based on PGO, to be low, we can expect a performance win by peeling off the first several iterations of that loop. Unlike unrolling based on a known trip count, or a trip count multiple, this doesn't save us the conditional check and branch on each iteration. However, it does allow us to simplify the straight-line code we get (constant-folding, etc.). This is important given that we know that we will usually only hit this code, and not the actual loop. This is currently disabled by default. Differential Revision: https://reviews.llvm.org/D25963 llvm-svn: 288274
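At source level, peeling looks roughly like the hand-written example below, which assumes (hypothetically) that the profile says the loop usually runs at most two iterations, so two are peeled. Each peeled copy still checks the trip count, as the commit message notes, but the `i == 0` test folds to a constant in each copy. This only illustrates the transformation; it is not the pass's actual output, and the function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original loop: the first element is weighted differently.
long sumOriginal(const std::vector<long> &a) {
  long sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += (i == 0) ? a[i] * 2 : a[i];
  return sum;
}

// The same loop with two iterations peeled off by hand.
long sumPeeled(const std::vector<long> &a) {
  const std::size_t n = a.size();
  long sum = 0;
  if (n == 0) return sum;  // trip-count check is still needed
  sum += a[0] * 2;         // peeled i == 0: `i == 0` folded to true
  if (n == 1) return sum;
  sum += a[1];             // peeled i == 1: `i == 0` folded to false
  for (std::size_t i = 2; i < n; ++i)
    sum += a[i];           // remaining iterations (i >= 2, test dropped by hand)
  return sum;
}
```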
*	[InstCombine] allow more narrowing transforms for logic ops	Sanjay Patel	2016-11-30	2	-9/+24
|	We had a limited version of this for scalar 'and'; this expands the transform to 'or' and 'xor' and allows vector types too. llvm-svn: 288273
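A sketch of the kind of narrowing being generalized, assuming it has the form `op (zext X), C` -> `zext (op X, trunc C)` and is only done when `zext(trunc C) == C`. The sketch models an `i8` to `i32` zero extension with C++ integer types; `LogicOp`, `wideForm`, `narrowForm`, and `canNarrow` are hypothetical names, and vector types are not modeled.

```cpp
#include <cassert>
#include <cstdint>

enum class LogicOp { And, Or, Xor };

// Apply a bitwise logic op at width T.
template <typename T>
T applyLogic(LogicOp op, T a, T b) {
  switch (op) {
  case LogicOp::And: return static_cast<T>(a & b);
  case LogicOp::Or:  return static_cast<T>(a | b);
  case LogicOp::Xor: return static_cast<T>(a ^ b);
  }
  return 0;
}

// Wide form: op (zext i8 x to i32), C
uint32_t wideForm(LogicOp op, uint8_t x, uint32_t c) {
  return applyLogic<uint32_t>(op, x, c);
}

// Narrow form: zext (op x, trunc C) to i32
uint32_t narrowForm(LogicOp op, uint8_t x, uint32_t c) {
  return applyLogic<uint8_t>(op, x, static_cast<uint8_t>(c));
}

// Precondition for the rewrite: truncating C to i8 loses no bits.
bool canNarrow(uint32_t c) {
  return static_cast<uint32_t>(static_cast<uint8_t>(c)) == c;
}
```

Without the precondition the rewrite is wrong for 'or' and 'xor': with `C = 0x100`, `or (zext 0), C` is `0x100`, but the narrowed form yields 0.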
*	Fix some Clang-tidy and Include What You Use warnings; other minor fixes (NFC).	Eugene Zelenko	2016-11-30	3	-41/+133
|	This is preparation to remove the SetVector.h dependency on SmallSet.h. llvm-svn: 288256
*	Revert "[GVN] Basic optimization remark support"	Adam Nemet	2016-11-30	1	-20/+3
|	This reverts commit r288210. The failure on the stage2 LTO build is back. llvm-svn: 288226
*	[GVN] Basic optimization remark support	Adam Nemet	2016-11-29	1	-3/+20
|	[recommitting patches one-by-one to see which breaks the stage2 LTO bot] Follow-on patches will add more interesting cases. The goal of this patch-set is to get the GVN messages printed in opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This is the optimization view for the function (the last remark in the function has a bug which is fixed in this series): http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430 Differential Revision: https://reviews.llvm.org/D26488 llvm-svn: 288210
*	[StructurizeCFG] Fix infinite loop in rebuildSSA.	Justin Lebar	2016-11-29	1	-1/+4
|	Michel Dänzer reported that r288051, "[StructurizeCFG] Use range-based for loops", introduced a bug into rebuildSSA, wherein we were iterating over an instruction's use list while modifying it, without taking care to do this correctly. llvm-svn: 288200
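The bug class is general: rewriting a use moves it from one use list to another, so an iterator advanced after the rewrite walks the wrong list. A standard-library analogue follows, with `std::list::splice` standing in for moving a use between lists; `moveMatching` is a hypothetical helper, not StructurizeCFG code. The fix pattern is to capture the next iterator before touching the current element.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Move every element that satisfies `pred` from `from` to the end of `to`.
// The successor is captured before the splice: afterwards `it` belongs to
// `to`, so advancing it would walk the wrong list.
template <typename T, typename Pred>
void moveMatching(std::list<T> &from, std::list<T> &to, Pred pred) {
  for (auto it = from.begin(); it != from.end();) {
    auto next = std::next(it);
    if (pred(*it))
      to.splice(to.end(), from, it);
    it = next;
  }
}
```

A range-based for loop hides the iterator, so it always advances the element that was just moved; that is why an explicit iterator loop is used here.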
*	Use CallSite to simplify code	David Blaikie	2016-11-29	1	-5/+3
|	llvm-svn: 288192