bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Make ICP use PSI to check for hotness.	Dehao Chen	2017-08-08	1	-10/+1
	Summary: Currently, ICP checks the count against a fixed value to see if it is hot enough to be promoted. This does not work for SamplePGO because sampled counts may be much smaller. This patch uses PSI to check if the count is hot enough to be promoted. Reviewers: davidxl, tejohnson, eraman Reviewed By: davidxl Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36341 llvm-svn: 310416
*	[KnownBits][ValueTracking] Move the math for calculating known bits for add/sub into a static method in KnownBits object	Craig Topper	2017-08-08	1	-41/+1
	I want to reuse this code in SimplifyDemandedBits handling of Add/Sub. This will make that easier. Wonder if we should use it in SelectionDAG's computeKnownBits too. Differential Revision: https://reviews.llvm.org/D36433 llvm-svn: 310378
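The known-bits math for an add can be sketched on plain 32-bit integers. This is a minimal illustration of one common carry-based formulation, not the code from this commit; the `Bits` struct and `knownBitsOfAdd` are hypothetical names, and the real `KnownBits` class works on `APInt` values of arbitrary width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a known-bits record over 32-bit values:
// a set bit in Zero means "this bit is known to be 0",
// a set bit in One means "this bit is known to be 1".
struct Bits {
  uint32_t Zero;
  uint32_t One;
};

// Known bits of L + R with no carry-in, derived from the largest and
// smallest values each operand could take.
Bits knownBitsOfAdd(Bits L, Bits R) {
  uint32_t MaxSum = ~L.Zero + ~R.Zero; // every unknown bit assumed to be 1
  uint32_t MinSum = L.One + R.One;     // every unknown bit assumed to be 0

  // The carry into each bit position is sum ^ lhs ^ rhs. A carry is known
  // to be 0 if it is 0 even in the maximal sum, and known to be 1 if it is
  // 1 even in the minimal sum.
  uint32_t CarryKnownZero = ~(MaxSum ^ L.Zero ^ R.Zero);
  uint32_t CarryKnownOne = MinSum ^ L.One ^ R.One;

  // A result bit is known only if both operand bits and the carry are known.
  uint32_t Known = (CarryKnownZero | CarryKnownOne) &
                   (L.Zero | L.One) & (R.Zero | R.One);
  return {~MaxSum & Known, MinSum & Known};
}
```

For example, adding two values whose low two bits are known to be zero yields a result whose low two bits are also known to be zero, while the remaining bits stay unknown.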
*	BasicAA: aliasGEP shouldn't get a PartialAlias response here	Nuno Lopes	2017-08-08	1	-1/+3
	add an assert() to ensure that's the case (as I'm not convinced it won't happen) llvm-svn: 310373
*	[PM] Fix a likely more critical infloop bug in the CGSCC pass manager.	Chandler Carruth	2017-08-08	1	-3/+10
	This was just a bad oversight on my part. The code in question should never have worked without this fix. But it turns out, there are relatively few places that involve libfunctions that participate in a single SCC, and unless they do, this happens to not matter. The effect of not having this correct is that each time through this routine, the edge from write_wrapper to write was toggled between a call edge and a ref edge. First time through, it becomes a demoted call edge and is turned into a ref edge. Next time it is a promoted call edge from a ref edge. On, and on it goes forever. I've added the asserts which should have always been here to catch silly mistakes like this in the future as well as a test case that will actually infloop without the fix. The other (much scarier) infinite-inlining issue I think didn't actually occur in practice, and I simply misdiagnosed this minor issue as that much more scary issue. The other issue is still a real issue, but I'm somewhat relieved that so far it hasn't happened in real-world code yet... llvm-svn: 310342
*	[LCG] Remove yet another variable only used inside of asserts.	Chandler Carruth	2017-08-05	1	-3/+3
	llvm-svn: 310174
*	[LCG] Fold otherwise unused variable into assert.	Benjamin Kramer	2017-08-05	1	-3/+2
	No functionality change intended. llvm-svn: 310173
*	[LCG] Completely remove the parent set and leaf tracking for RefSCCs.	Chandler Carruth	2017-08-05	1	-176/+3
	After the previous series of patches, this is now trivial and deletes a pretty astonishing amount of complexity. This has been a long time coming, as the move toward a PO sequence of RefSCCs started eroding the underlying use cases for this half of the data structure. Among the biggest advantages here is that now there aren't two independent data structures that need to stay in sync. Some of my profiling has also indicated that updating the parent sets was among the most expensive parts of the lazy call graph. Eliminating it wholesale is likely to be a nice win in terms of compile time. Last but not least, I had discussed with some folks previously keeping it around for asserts and other correctness checking, but once the fundamentals of the parent and child checking were implemented without the parent sets their value in correctness checking was tiny and nowhere near worth the cost of the complexity required to keep everything up-to-date. llvm-svn: 310171
*	[LCG] Re-implement the basic isParentOf, isAncestorOf, isChildOf, and isDescendantOf methods on RefSCCs in terms of the forward edges rather than the parent sets.	Chandler Carruth	2017-08-05	1	-10/+37
	This is technically slower, but probably not interestingly slower, and all of these routines were already so expensive that they're guarded behind both !NDEBUG and EXPENSIVE_CHECKS. This removes another non-critical usage of parent sets. I've also added some comments to try and help clarify to any potential users the costs of these routines. They're mostly useful for debugging, asserts, or other queries. llvm-svn: 310170
*	[LCG] Add the concept of a "dead" node and use it to avoid a complex walk over the parent set.	Chandler Carruth	2017-08-05	1	-8/+1
	When removing a single function from the call graph, we previously would walk the entire RefSCC's parent set and then walk every outgoing edge just to find the ones to remove. In addition to this being quite high complexity in theory, it is also the last fundamental use of the parent sets. With this change, when we remove a function we transform the node containing it to be recognizably "dead" and then teach the edge iterators to recognize edges to such nodes and skip them the same way they skip null edges. We can't move fully to using "dead" nodes -- when disconnecting two live nodes we need to null out the edge. But the complexity this adds to the edge sequence isn't too bad and the simplification of lazily handling this seems like a significant win. llvm-svn: 310169
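The lazy skipping described in this commit can be sketched with a toy graph. The `Node` struct and `liveSuccessors` below are hypothetical and much simpler than the actual LazyCallGraph classes; they only show the idea that iteration filters out both null edges and edges whose target has been marked dead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of a call-graph node; not the LazyCallGraph API.
struct Node {
  bool Dead = false;         // set when the node's function is removed
  std::vector<Node *> Edges; // nullptr marks an edge that was nulled out
};

// Collect only live targets: skip null edges and edges to dead nodes,
// so removing a function never requires walking its incoming edges.
std::vector<Node *> liveSuccessors(const Node &N) {
  std::vector<Node *> Out;
  for (Node *Target : N.Edges)
    if (Target && !Target->Dead)
      Out.push_back(Target);
  return Out;
}
```

Marking a node dead is a constant-time operation in this sketch; the cost of ignoring it is paid incrementally by whoever iterates edges later.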
*	[LCG] Replace an implicit bool operator with a named function. (NFC)	Chandler Carruth	2017-08-05	1	-2/+2
	The definition of 'false' here was already pretty vague and debatable, and I'm about to add another potential 'false' that would actually make much more sense in a bool operator. Especially given how rarely this is used, a nicely named method seems better. llvm-svn: 310165
*	[LCG] When removing a dead function and clearing out the data structures, actually null out the graph pointers as well.	Chandler Carruth	2017-08-05	1	-0/+2
	We won't ever update these, and we certainly shouldn't be calling any methods on them, so it seems good to defensively nuke them. llvm-svn: 310164
*	[LCG] Rather than walking the directed graph structure to update graph pointers in node objects, just walk the map from function to node.	Chandler Carruth	2017-08-05	1	-14/+4
	It doesn't have stable ordering, but works just as well and is much simpler. We don't need ordering when just updating internal pointers. llvm-svn: 310163
*	[LCG] Remove the complex walk of the parent sets to update graph pointers.	Chandler Carruth	2017-08-05	1	-11/+2
	This is completely unnecessary as we have a trivial list of RefSCCs now that we can walk. llvm-svn: 310162
*	[LCG] Remove the use of the parent sets to compute connectivity when merging RefSCCs.	Chandler Carruth	2017-08-05	1	-16/+14
	The logic to directly use the reference edges is simpler and not substantially slower (despite the comments to the contrary) because this is not actually an especially hot part of LCG in practice. llvm-svn: 310161
*	[SCEV] Preserve NSW information for sext(subtract).	Amara Emerson	2017-08-04	1	-3/+29
	Pushes the sext onto the operands of a Sub if NSW is present. Also adds support for propagating the nowrap flags of the llvm.ssub.with.overflow intrinsic during analysis. Differential Revision: https://reviews.llvm.org/D35256 llvm-svn: 310117
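Why the NSW flag matters for pushing the sext onto the operands can be checked exhaustively at a small bit width. The sketch below uses 8-bit inputs widened to 16 bits as an assumed stand-in for arbitrary types; the function names are hypothetical and nothing here is SCEV code.

```cpp
#include <cassert>
#include <cstdint>

// For every pair of 8-bit values whose difference does not overflow 8 bits
// (the "nsw" condition), check that sext(a - b) == sext(a) - sext(b).
bool sextDistributesOverNswSub() {
  for (int A = -128; A <= 127; ++A) {
    for (int B = -128; B <= 127; ++B) {
      int Exact = A - B;
      if (Exact < -128 || Exact > 127)
        continue; // the 8-bit subtract would wrap, so nsw does not hold
      int8_t Narrow = static_cast<int8_t>(Exact); // result of the 8-bit sub
      int16_t SextOfSub = static_cast<int16_t>(Narrow);
      int16_t SubOfSext = static_cast<int16_t>(
          static_cast<int16_t>(A) - static_cast<int16_t>(B));
      if (SextOfSub != SubOfSext)
        return false;
    }
  }
  return true;
}

// Without nsw the rewrite is unsound: -128 - 1 wraps to 127 in 8 bits,
// while the widened subtraction gives -129.
bool wrapBreaksRewrite() {
  uint8_t Wrapped = static_cast<uint8_t>(-128 - 1); // low 8 bits: 127
  return static_cast<int16_t>(Wrapped) != (-128 - 1);
}
```

The first function confirms the identity whenever the narrow subtract cannot wrap; the second shows a single wrapping case where the two sides disagree.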
*	[Inliner] Fix a typo in option description. NFC.	Easwaran Raman	2017-08-04	1	-1/+1
	llvm-svn: 310073
*	[ConstantInt] Use ConstantInt::getValue instead of Constant::getUniqueInteger in a few places where we obviously have a ConstantInt. NFC	Craig Topper	2017-08-04	2	-4/+4
	getUniqueInteger will ultimately call ConstantInt::getValue, but calling ConstantInt::getValue should be inlined. llvm-svn: 310069
*	Adjust the hotness threshold from 99.9% to 99%.	Dehao Chen	2017-08-04	1	-1/+1
	Summary: We originally set the hotness threshold to 99.9% to be consistent with gcc FDO. But the inline heuristics differ between the two compilers: LLVM uses a bottom-up algorithm while gcc uses a priority-based one. The LLVM algorithm tends to inline too much early, which prevents hot callsites from being further inlined into their callers. Due to this restriction, we think it is reasonable to lower the hotness threshold to give priority to those that are really hot. Our experiments show that this change would improve performance on large applications. Note that the inline heuristic has great room for further tuning. Once the inline heuristics are refined, we could adjust this threshold to allow inlining for less hot callsites. Reviewers: davidxl, tejohnson, eraman Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D36317 llvm-svn: 310065
*	[ThinLTO] Add FunctionAttrs to ThinLTO index	Charles Saternos	2017-08-04	1	-6/+17
	Adds function attributes to index: ReadNone, ReadOnly, NoRecurse, NoAlias. These attributes will be used for future ThinLTO optimizations that will propagate function attributes across modules. llvm-svn: 310061
*	[InstCombine] Canonicalize clamp of float types to minmax in fast mode.	Nikolai Bozhenov	2017-08-04	1	-1/+68
	Summary: This commit allows matchSelectPattern to recognize clamp of float arguments in the presence of FMF the same way as already done for integers. This case is a little different though. With integers, given the min/max pattern is recognized, DAGBuilder starts selecting MIN/MAX "automatically". That is not the case for float, because for them only full FMINNAN/FMINNUM/FMAXNAN/FMAXNUM ISD nodes exist and they do care about NaNs. On the other hand, some backends (e.g. X86) have only FMIN/FMAX nodes that do not care about NaNs and the former NAN/NUM nodes are illegal, thus selection is not happening. So I decided to do this kind of transformation in IR (InstCombiner) instead of complicating the logic in the backend. Reviewers: spatel, jmolloy, majnemer, efriedma, craig.topper Reviewed By: efriedma Subscribers: hiraditya, javed.absar, n.bozhenov, llvm-commits Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D33186 llvm-svn: 310054
*	Use profile summary to disable peeling for huge working sets	Teresa Johnson	2017-08-03	1	-8/+24
	Summary: Detect when the working set size of a profiled application is huge, by comparing the number of counts required to reach the hot percentile in the profile summary to a large threshold. When the working set size is determined to be huge, disable peeling to avoid bloating the working set further. Note that the selected threshold (15K) is significantly larger than the largest working set value in SPEC cpu2006 (which is gcc at around 11K). Reviewers: davidxl Subscribers: mehdi_amini, mzolotukhin, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D36288 llvm-svn: 310005
*	[Inliner] Increase threshold for hot callsites without PGO.	Easwaran Raman	2017-08-03	1	-3/+67
	Summary: This increases the inlining threshold for hot callsites. Hotness is defined in terms of block frequency of the callsite relative to the caller's entry block's frequency. Since this requires BFI in the inliner, this only affects the new PM pipeline. This is enabled by default at -O3. This improves the performance of some internal benchmarks. Notably, an internal benchmark for Gipfeli compression (https://github.com/google/gipfeli) improves by ~7%. Povray in SPEC2006 improves by ~2.5%. I am running more experiments and will update the thread if other benchmarks show improvement/regression. In terms of text size, LLVM test-suite shows a 1.22% text size increase. Diving into the results, 13 of the benchmarks in the test-suite increase by > 10%. Most of these are small, but Adobe-C++/loop_unroll (17.6% increase) and tramp3d (20.7% increase) have >250K text size. On a large application, the text size increases by 2%. Reviewers: chandlerc, davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36199 llvm-svn: 309994
*	[LVI] Constant-propagate a zero extension of the switch condition value through case edges	Hiroshi Yamauchi	2017-08-03	1	-6/+114
	Summary: (This is a second attempt as https://reviews.llvm.org/D34822 was reverted.) LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges. But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur. This patch adds a small piece of logic to handle such a case in getEdgeValueLocal(). This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary. With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%. Reviewers: sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36247 llvm-svn: 309986
*	Do not want to use BFI to get profile count for sample pgo	Dehao Chen	2017-08-03	1	-2/+18
	Summary: For SamplePGO, we already record the callsite count in the call instruction itself. So we do not want to use BFI to get profile count as it is less accurate. Reviewers: tejohnson, davidxl, eraman Reviewed By: eraman Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36025 llvm-svn: 309964
*	[SCEV] Re-enable "Cache results of computeExitLimit"	Max Kazantsev	2017-08-03	1	-2/+37
	The patch rL309080 was reverted because it did not clean up the cache on "forgetValue" method call. This patch re-enables this change, adds the missing check and introduces two new unit tests that make sure that the cache is cleaned properly. Differential Revision: https://reviews.llvm.org/D36087 llvm-svn: 309925
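The failure mode behind the revert, a memoized result that outlives the facts it was computed from, can be sketched generically. `ExitLimitCache`, its integer keys, and its `forget` method are hypothetical simplifications; the real cache lives inside ScalarEvolution and is keyed differently.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical memoizing wrapper: caches one computed value per loop id
// and must be told explicitly when a cached value may be stale.
class ExitLimitCache {
  std::unordered_map<int, int> Cache;
  std::function<int(int)> Compute;

public:
  int Misses = 0; // number of times the expensive computation actually ran

  explicit ExitLimitCache(std::function<int(int)> F) : Compute(std::move(F)) {}

  int get(int LoopId) {
    auto It = Cache.find(LoopId);
    if (It != Cache.end())
      return It->second; // cache hit: no recomputation
    ++Misses;
    int Value = Compute(LoopId);
    Cache.emplace(LoopId, Value);
    return Value;
  }

  // Must be called from every "forget" path, otherwise get() keeps
  // returning a result derived from IR that has since changed.
  void forget(int LoopId) { Cache.erase(LoopId); }
};
```

In this sketch, a caller that changes the underlying data without calling `forget` keeps seeing the old value, which is the kind of staleness the added check and unit tests guard against.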
*	[StackColoring] Update AliasAnalysis information in stack coloring pass (part 2)	Hiroshi Inoue	2017-08-02	1	-7/+6
	This patch is an update to the first patch (https://reviews.llvm.org/rL309651) based on the post-commit comments. The stack coloring pass needs to maintain AliasAnalysis information when merging stack slots of different types. Actually, there is a FIXME comment in StackColoring.cpp // FIXME: In order to enable the use of TBAA when using AA in CodeGen, // we'll also need to update the TBAA nodes in MMOs with values // derived from the merged allocas. But TBAA has already been enabled in CodeGen without fixing this pass. The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling. Although we observed the problem on ppc64le, this is a platform neutral issue. This patch makes the stack coloring pass maintain AliasAnalysis information when merging multiple stack slots. This patch fixes PR33928. llvm-svn: 309849
*	[InlineCost] Remove redundant call. NFC.	Chad Rosier	2017-08-02	1	-2/+3
	llvm-svn: 309819
*	[InlineCost] Simplify more 'and' and 'or' operations.	Chad Rosier	2017-08-02	1	-0/+30
	Differential Revision: https://reviews.llvm.org/D35856 llvm-svn: 309817
*	[SCEV/IndVars] Always compute loop exiting values if the backedge count is 0	Sanjoy Das	2017-08-01	1	-0/+19
	If SCEV can prove that the backedge taken count for a loop is zero, it does not need to "understand" a recursive PHI to compute its exiting value. This should fix PR33885. llvm-svn: 309758
*	[Value Tracking] Default argument to true and rename accordingly. NFC.	Chad Rosier	2017-08-01	1	-11/+11
	IMHO this is a bit more readable. llvm-svn: 309739
*	[Value Tracking] Refactor and/or logic into helper. NFC.	Chad Rosier	2017-08-01	1	-40/+52
	llvm-svn: 309726
*	[PM] Add a comment clarifying what a particular predicate is doing.	Chandler Carruth	2017-08-01	1	-0/+8
	This came up as a point of confusion while working on a fundamental problem with the combination of CGSCC iteration and the inliner. llvm-svn: 309662
*	Revert r309415: "[LVI] Constant-propagate a zero extension of the switch condition value through case edges"	Daniel Jasper	2017-08-01	1	-108/+4
	This causes assertion failures in (a somewhat old version of) SpiderMonkey. I have already forwarded reproduction instructions to the patch author. llvm-svn: 309659
*	[StackColoring] Update AliasAnalysis information in stack coloring pass	Hiroshi Inoue	2017-08-01	1	-0/+64
	The stack coloring pass needs to maintain AliasAnalysis information when merging stack slots of different types. Actually, there is a FIXME comment in StackColoring.cpp // FIXME: In order to enable the use of TBAA when using AA in CodeGen, // we'll also need to update the TBAA nodes in MMOs with values // derived from the merged allocas. But TBAA has already been enabled in CodeGen without fixing this pass. The incorrect TBAA metadata results in recent failures in bootstrap test on ppc64le (PR33928) by allowing unsafe instruction scheduling. Although we observed the problem on ppc64le, this is a platform neutral issue. This patch makes the stack coloring pass maintain AliasAnalysis information when merging multiple stack slots. llvm-svn: 309651
*	Allow None as a MemoryLocation to getModRefInfo	Alina Sbirlea	2017-08-01	1	-1/+1
	Summary: Adding part of the changes in D30369 (needed to make progress): Current patch updates AliasAnalysis and MemoryLocation, but does _not_ clean up MemorySSA. Original summary from D30369, by dberlin: Currently, we have instructions which affect memory but have no memory location. If you call, for example, MemoryLocation::get on a fence, it asserts. This means things specifically have to avoid that. It also means we end up with a copy of each API, one taking a memory location, one not. This starts to fix that. We add MemoryLocation::getOrNone as a new call, and reimplement the old asserting version in terms of it. We make MemoryLocation optional in the (Instruction, MemoryLocation) version of getModRefInfo, and kill the old one argument version in favor of passing None (it had one caller). Now both can handle fences because you can just use MemoryLocation::getOrNone on an instruction and it will return a correct answer. We use all this to clean up part of MemorySSA that had to handle this difference. Note that literally every actual getModRefInfo interface we have could be made private and replaced with: getModRefInfo(Instruction, Optional<MemoryLocation>) and getModRefInfo(Instruction, Optional<MemoryLocation>, Instruction, Optional<MemoryLocation>) and delegating to the right ones, if we wanted to. I have not attempted to do this yet. Reviewers: dberlin, davide, dblaikie Subscribers: sanjoy, hfinkel, chandlerc, llvm-commits Differential Revision: https://reviews.llvm.org/D35441 llvm-svn: 309641
*	[SLP] Initial rework for min/max horizontal reduction vectorization, NFC.	Alexey Bataev	2017-07-31	1	-41/+69
	Summary: All getReductionCost() functions are renamed to getArithmeticReductionCost() + added basic infrastructure to handle non-binary reduction operations. Reviewers: spatel, mzolotukhin, Ayal, mkuper, gilr, hfinkel Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D29402 llvm-svn: 309566
*	[Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC.	Alexey Bataev	2017-07-31	2	-5/+5
	llvm-svn: 309563
*	[SCEV] Change an early exit to an assert; NFC	Sanjoy Das	2017-07-29	1	-2/+2
	llvm-svn: 309480
*	[Inliner] Do not apply any bonus for cold callsites.	Easwaran Raman	2017-07-28	1	-28/+75
	Summary: Inlining threshold is increased by application of bonuses when the callee has a single reachable basic block or is rich in vector instructions. Similarly, inlining cost is reduced by applying a large bonus when the last call to a static function is considered for inlining. This patch disables the application of these bonuses when the callsite or the callee is cold. The intention here is to prevent a large cold callsite from being inlined to a non-cold caller that could prevent the caller from being inlined. This is especially important when the cold callsite is a last call to a static since the associated bonus is very high. Reviewers: chandlerc, davidxl Subscribers: danielcdh, llvm-commits Differential Revision: https://reviews.llvm.org/D35823 llvm-svn: 309441
*	[Value Tracking] Refactor icmp comparison logic into helper. NFC.	Chad Rosier	2017-07-28	1	-41/+62
	llvm-svn: 309417
*	[LVI] Constant-propagate a zero extension of the switch condition value through case edges	Hiroshi Yamauchi	2017-07-28	1	-4/+108
	Summary: LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges. But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur. This patch adds a small piece of logic to handle such a case in getEdgeValueLocal(). This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary. With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%. Reviewers: wmi, dberlin, sanjoy Reviewed By: sanjoy Subscribers: davide, davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D34822 llvm-svn: 309415
*	[ValueTracking] Remove a number of unused arguments. NFC.	Chad Rosier	2017-07-28	1	-26/+17
	llvm-svn: 309385
*	[SCEV] Do not visit nodes twice in containsConstantSomewhere	Max Kazantsev	2017-07-28	1	-13/+20
	This patch reworks the function that searches constants in Add and Mul SCEV expression chains so that now it does not visit a node more than once, and also renames this function for better correspondence between its implementation and semantics. Differential Revision: https://reviews.llvm.org/D35931 llvm-svn: 309367
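Visiting each node at most once matters when subexpressions are shared, because a naive recursive walk can revisit the same operand many times. The sketch below shows the general worklist-plus-visited-set technique on a hypothetical `Expr` node type; it is not the SCEV implementation and does not restrict itself to Add and Mul chains as the real function does.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical expression node: a constant leaf, or an operator with operands.
struct Expr {
  bool IsConstant = false;
  std::vector<const Expr *> Ops;
};

// Worklist search that processes each node at most once.
// Visits is set to the number of distinct nodes actually processed.
bool containsConstant(const Expr *Root, int &Visits) {
  std::unordered_set<const Expr *> Seen;
  std::vector<const Expr *> Worklist{Root};
  Visits = 0;
  while (!Worklist.empty()) {
    const Expr *E = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(E).second)
      continue; // already processed through another parent
    ++Visits;
    if (E->IsConstant)
      return true;
    for (const Expr *Op : E->Ops)
      Worklist.push_back(Op);
  }
  return false;
}
```

With a root that references the same operand twice, and that operand in turn referencing the same leaf twice, a naive walk would process seven nodes while this version processes three.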
*	Revert "[SCEV] Cache results of computeExitLimit"	Sanjoy Das	2017-07-28	1	-21/+0
	This reverts commit r309080. The patch needs to clear out the ScalarEvolution::ExitLimits cache in forgetMemoizedResults. I've replied on the commit thread for the patch with more details. llvm-svn: 309357
*	Changing the default MaxNumPromotions from 2 to 3.	Dehao Chen	2017-07-28	1	-1/+1
	Summary: In performance tuning, we see performance benefits when we raise the maximum number of promotion targets to 3. This is safe as long as the total percentage threshold is properly set up (https://reviews.llvm.org/D35962) Reviewers: davidxl, tejohnson Reviewed By: tejohnson Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D35966 llvm-svn: 309346
*	Separate the ICP total threshold and remaining threshold.	Dehao Chen	2017-07-28	1	-12/+20
	Summary: In the current implementation, isPromotionProfitable only checks if the call count to a direct target is no less than a certain percentage threshold of the remaining call counts that have not been promoted. This causes code size problems when the target count is small but greater than a large portion of remaining counts. E.g. target1 takes 99.9%, while target2 takes 0.1%. Both targets will be promoted and inlined, making the function size too large, which potentially prevents it from further inlining into its callers. This patch adds another percentage threshold against the total indirect call count. The target count needs to be no less than both thresholds in order to be promoted speculatively. Reviewers: davidxl, tejohnson Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D35962 llvm-svn: 309345
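The two-threshold check can be worked through with the 99.9% / 0.1% example from the summary. This is a sketch under stated assumptions: the function signature and the default percentages (30 for the remaining count, 5 for the total count) are illustrative values chosen here, not confirmed defaults of the pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical profitability check: a target is promoted only if its count
// is a large enough share of BOTH the not-yet-promoted remainder and the
// total indirect call count. Percent parameters are illustrative.
bool isPromotionProfitable(uint64_t Count, uint64_t TotalCount,
                           uint64_t RemainingCount,
                           uint64_t RemainingPercent = 30,
                           uint64_t TotalPercent = 5) {
  return Count * 100 >= RemainingPercent * RemainingCount &&
         Count * 100 >= TotalPercent * TotalCount;
}
```

With 1000 total calls, target1 (999 calls, 1000 remaining) passes both checks. Target2 (1 call, 1 remaining) is 100% of the remainder but only 0.1% of the total, so the total threshold rejects it; setting `TotalPercent` to 0 reproduces the old behaviour in which it would have been promoted.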
*	[InlineCost, NFC] Change CallAnalyzer::isGEPFree to use TTI::getUserCost instead of TTI::getGEPCost	Evgeny Astigeevich	2017-07-27	1	-6/+5
	Currently CallAnalyzer::isGEPFree uses TTI::getGEPCost to check if GEP is free. TTI::getGEPCost cannot handle cases when GEPs participate in Def-Use dependencies (see https://reviews.llvm.org/D31186 for example). There is TTI::getUserCost which can calculate the cost more accurately by taking dependencies into account. Differential Revision: https://reviews.llvm.org/D33685 llvm-svn: 309268
*	[TTI] fixing a bug in the isLegalMaskedScatter API	Mohammed Agabaria	2017-07-27	1	-1/+1
	isLegalMaskedScatter called the Gather version, which is a bug. A test case is provided within the patch of AVX2 gathers at: https://reviews.llvm.org/D35772 Differential Revision: https://reviews.llvm.org/D35786 llvm-svn: 309260
*	[SCEV] Cache results of computeExitLimit	Max Kazantsev	2017-07-26	1	-0/+21
	This patch adds a cache for computeExitLimit to save compilation time. Many examples of tests that take extensive time to compile are attached to bug 33494. Differential Revision: https://reviews.llvm.org/D35827 llvm-svn: 309080
*	[SCEV] Remove unnecessary call to forgetMemoizedResults	Sanjoy Das	2017-07-26	1	-3/+0
	`SCEVUnknown::allUsesReplacedWith` does not need to call `forgetMemoizedResults` since RAUW does a value-equivalent replacement by assumption. If this assumption was false then the later setValPtr(New) call would be incorrect too. This is a non-trivial performance optimization for functions with a large number of loops since `forgetMemoizedResults` walks all loop backedge taken counts to see if any of them use the SCEVUnknown being RAUWed. However, this improvement is difficult to demonstrate without checking in an excessively large IR file. llvm-svn: 309072