bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[BypassSlowDivision] Improve our handling of divisions by constants	Sanjoy Das	2017-09-26	1	-0/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Don't bail out on constant divisors for divisions that can be narrowed without introducing control flow . This gives us a 32 bit multiply instead of an emulated 64 bit multiply in the generated PTX assembly. Reviewers: jlebar Subscribers: jholewinski, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38265 llvm-svn: 314253
*	[X86][LLVM]Expanding Supports lowerInterleavedStore() in ↵	Michael Zuckerman	2017-09-26	1	-4/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	X86InterleavedAccess (VF{8\|16\|32} stride 3) This patch expands the support of lowerInterleavedStore to {8\|16\|32}x8i stride 3. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=3 VF={8\|16\|32}) . This patch is part two of two patches and it covers the store (interlevaed) side. The patch goal is to optimize the following sequence: a0 a1 a2 a3 a4 a5 a6 a7 b0 b1 b2 b3 b4 b5 b6 b7 c0 c1 c2 c3 c4 c5 c6 c7 into a0 b0 c0 a1 b1 c1 a2 b2 c2 a3 b3 c3 a4 b4 c4 a5 b5 c5 a6 b6 c6 a7 b7 c7 Reviewers: zvi guyblank dorit Ayal Differential Revision: https://reviews.llvm.org/D37117 Change-Id: I56ced8bcbea809a37654060771911ade20246ccc llvm-svn: 314234
*	[InstCombine] Remove one use restriction on the shift for calls to ↵	Craig Topper	2017-09-26	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \| \|	foldICmpAndShift. If this transformation succeeds, we're going to remove our dependency on the shift by rewriting the and. So it doesn't matter how many uses the shift has. This distributes the one use check to other transforms in foldICmpAndConstConst that do need it. Differential Revision: https://reviews.llvm.org/D38206 llvm-svn: 314233
*	[DSE] Merge stores when the later store only writes to memory locations the ↵	Sanjay Patel	2017-09-26	4	-2/+395
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	early store also wrote to (2nd try) This is a 2nd attempt at: https://reviews.llvm.org/rL310055 ...which was reverted at rL310123 because of PR34074: https://bugs.llvm.org/show_bug.cgi?id=34074 In this version, we break out of the inner loop after we successfully merge and kill a pair of stores. In the earlier rev, we were continuing instead, which meant we could process the invalid info from a now dead store. Original commit message (authored by Filipe Cabecinhas): This fixes PR31777. If both stores' values are ConstantInt, we merge the two stores (shifting the smaller store appropriately) and replace the earlier (and larger) store with an updated constant. In the future we should also support vectors of integers. And maybe float/double if we can. Differential Revision: https://reviews.llvm.org/D30703 llvm-svn: 314206
*	TargetLibraryInfo: Stop guessing wchar_t size	Matthias Braun	2017-09-26	3	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Usually the frontend communicates the size of wchar_t via metadata and we can optimize wcslen (and possibly other calls in the future). In cases without the wchar_size metadata we would previously try to guess the correct size based on the target triple; however this is fragile to keep up to date and may miss users manually changing the size via flags. Better be safe and stop guessing and optimizing if the frontend didn't communicate the size. Differential Revision: https://reviews.llvm.org/D38106 llvm-svn: 314185
*	[InstCombine] remove extract-of-select vector transform (2nd try)	Sanjay Patel	2017-09-25	2	-61/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 1st attempt at this: https://reviews.llvm.org/rL314117 was reverted at: https://reviews.llvm.org/rL314118 because of bot fails for clang tests that were checking optimized IR. That should be fixed with: https://reviews.llvm.org/rL314144 ...so try again. Original commit message: The transform to convert an extract-of-a-select-of-vectors was added at: https://reviews.llvm.org/rL194013 And a question about the validity of this transform was raised in the review: https://reviews.llvm.org/D1539: ...but not answered AFAICT> Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with the original commit, but they are not regressing even after we remove the transform in this patch. The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do those transforms as canonicalizations. The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301: https://bugs.llvm.org/show_bug.cgi?id=33301 ...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see. Differential Revision: https://reviews.llvm.org/D38006 llvm-svn: 314147
*	[SLP] Add a test for PR32086, NFC.	Alexey Bataev	2017-09-25	1	-0/+32
\| \| \| \|	llvm-svn: 314137
*	[SimplifyIndvar] Replace the srem used by IV if we can prove both of its ↵	Hongbin Zheng	2017-09-25	2	-3/+112
\| \| \| \| \| \| \| \| \| \| \| \|	operands are non-negative Since now SCEV can handle 'urem', an 'urem' is a better canonical form than an 'srem' because it has well-defined behavior This is a follow up of D34598 Differential Revision: https://reviews.llvm.org/D38072 llvm-svn: 314125
*	revert r314117 because there are bogus clang tests that depend on the optimizer	Sanjay Patel	2017-09-25	2	-52/+61
\| \| \| \|	llvm-svn: 314118
*	[InstCombine] remove extract-of-select vector transform	Sanjay Patel	2017-09-25	2	-61/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The transform to convert an extract-of-a-select-of-vectors was added at: rL194013 And a question about the validity of this transform was raised in the review: https://reviews.llvm.org/D1539: ...but not answered AFAICT> Most of the motivating cases in that patch are now handled by other combines. These are the tests that were added with the original commit, but they are not regressing even after we remove the transform in this patch. The diffs we see after removing this transform cause us to avoid increasing the instruction count, so we don't want to do those transforms as canonicalizations. The motivation for not turning a vector-select-of-vectors into a scalar operation is shown in PR33301: https://bugs.llvm.org/show_bug.cgi?id=33301 ...in those cases, we'll get vector ops with this patch rather than the vector/scalar mix that we currently see. Differential Revision: https://reviews.llvm.org/D38006 llvm-svn: 314117
*	[X86][LLVM]Expanding Supports lowerInterleavedStore() in ↵	Michael Zuckerman	2017-09-25	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	X86InterleavedAccess (VF8 stride 4): This patch expands the support of lowerInterleavedStore to 8x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=8) and we plan to include more patterns in the future. The patch goal is to optimize the following sequence: At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 holding each 8 chars: c0, c1, , c7 m0, m1, , m7 y0, y1, , y7 k0, k1, ., k7 And these need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Reviewers DavidKreitzer Farhana zvi igorb guyblank RKSimon Ayal Differential Revision: https://reviews.llvm.org/D36058 Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32 llvm-svn: 314109
*	[SLP] Support for horizontal min/max reduction.	Alexey Bataev	2017-09-25	2	-1231/+1470
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Reviewers: spatel, mkuper, hfinkel, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27846 llvm-svn: 314101
*	[InstCombine] Teach foldICmpUsingKnownBits to simplify SLE/SGE/ULE/UGE to ↵	Craig Topper	2017-09-22	1	-8/+4
\| \| \| \| \| \| \| \|	equality comparisons when the min/max ranges intersect in a single value. This is the inverse of what we do for SGT/SLT/UGT/ULT. llvm-svn: 314032
*	[InstCombine] Add test cases for known bits simplifications for comparisons ↵	Craig Topper	2017-09-22	1	-0/+143
\| \| \| \| \| \| \| \|	that don't depend on constant RHS. NFC This shows some missing simplifications for sge/sle/uge/ule relative to their non-equality counterparts. llvm-svn: 314031
*	[InstCombine] Remove a FIXME from a test that was fixed in r314025.	Craig Topper	2017-09-22	1	-1/+0
\| \| \| \|	llvm-svn: 314030
*	[InstCombine] Add constant splat handling to one of the ICMP_SLT/SGT cases ↵	Craig Topper	2017-09-22	1	-2/+39
\| \| \| \| \| \|	in foldICmpUsingKnownBits. llvm-svn: 314025
*	Rework loop predication pass	Artur Pilipenko	2017-09-22	3	-146/+286
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've found a serious issue with the current implementation of loop predication. The current implementation relies on SCEV and this turned out to be problematic. To fix the problem we had to rework the pass substantially. We have had the reworked implementation in our downstream tree for a while. This is the initial patch of the series of changes to upstream the new implementation. For now the transformation is limited to the following case: * The loop has a single latch with either ult or slt icmp condition. * The step of the IV used in the latch condition is 1. * The IV of the latch condition is the same as the post increment IV of the guard condition. * The guard condition is ult. See the review or the LoopPredication.cpp header for the details about the problem and the new implementation. Reviewed By: sanjoy, mkazantsev Differential Revision: https://reviews.llvm.org/D37569 llvm-svn: 313981
*	Re-land r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of ↵	Reid Kleckner	2017-09-21	3	-0/+312
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	llvm.dbg.declare" The fix is to avoid invalidating our insertion point in replaceDbgDeclare: Builder.insertDeclare(NewAddress, DIVar, DIExpr, Loc, InsertBefore); + if (DII == InsertBefore) + InsertBefore = &std::next(InsertBefore->getIterator()); DII->eraseFromParent(); I had to write a unit tests for this instead of a lit test because the use list order matters in order to trigger the bug. The reduced C test case for this was: void useit(int); static inline void inlineme() { int x[2]; useit(x); } void f() { inlineme(); inlineme(); } llvm-svn: 313905
*	Revert r313825: "[IR] Add llvm.dbg.addr, a control-dependent version of ↵	Daniel Jasper	2017-09-21	3	-312/+0
\| \| \| \| \| \| \| \| \| \| \|	llvm.dbg.declare" .. as well as the two subsequent changes r313826 and r313875. This leads to segfaults in combination with ASAN. Will forward repro instructions to the original author (rnk). llvm-svn: 313876
*	Fixed reverted commit rL312318	Strahinja Petrovic	2017-09-21	2	-0/+103
\| \| \| \| \| \| \| \| \| \|	This patch contains fix for reverted commit rL312318 which was causing failure due to use of unchecked dyn_cast to CIInit. Patch by: Nikola Prica. llvm-svn: 313870
*	Revert "Re-enable "[IRCE] Identify loops with latch comparison against ↵	Serguei Katkov	2017-09-21	1	-245/+0
\| \| \| \| \| \| \| \| \| \|	current IV value"" Revert the patch causing the functional failures. The patch owner is notified with test cases which fail. Test case has been provided to Maxim offline. llvm-svn: 313857
*	[InstCombine] Teach getDemandedBitsLHSMask to handle constant splat vectors	Craig Topper	2017-09-20	1	-5/+3
\| \| \| \| \| \| \| \|	This replaces a ConstantInt dyn_cast with m_APInt Differential Revision: https://reviews.llvm.org/D38100 llvm-svn: 313840
*	[SimplifyCFG] don't create a no-op subtract	Sanjay Patel	2017-09-20	2	-21/+15
\| \| \| \| \| \| \| \| \| \|	I noticed this inefficiency while investigating PR34603: https://bugs.llvm.org/show_bug.cgi?id=34603 This fix will likely push another bug (we don't maintain state of 'LateSimplifyCFG') into hiding, but I'll try to clean that up with a follow-up patch anyway. llvm-svn: 313829
*	[IR] Add llvm.dbg.addr, a control-dependent version of llvm.dbg.declare	Reid Kleckner	2017-09-20	3	-0/+312
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This implements the design discussed on llvm-dev for better tracking of variables that live in memory through optimizations: http://lists.llvm.org/pipermail/llvm-dev/2017-September/117222.html This is tracked as PR34136 llvm.dbg.addr is intended to be produced and used in almost precisely the same way as llvm.dbg.declare is today, with the exception that it is control-dependent. That means that dbg.addr should always have a position in the instruction stream, and it will allow passes that optimize memory operations on local variables to insert llvm.dbg.value calls to reflect deleted stores. See SourceLevelDebugging.rst for more details. The main drawback to generating DBG_VALUE machine instrs is that they usually cause LLVM to emit a location list for DW_AT_location. The next step will be to teach DwarfDebug.cpp how to recognize more DBG_VALUE ranges as not needing a location list, and possibly start setting DW_AT_start_offset for variables whose lifetimes begin mid-scope. Reviewers: aprantl, dblaikie, probinson Subscribers: eraman, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D37768 llvm-svn: 313825
*	[SimplifyCFG] auto-generate full checks; NFC	Sanjay Patel	2017-09-20	1	-78/+204
\| \| \| \|	llvm-svn: 313821
*	[InstCombine] Handle (X & C2) < C1 --> (X & C2) == 0	Craig Topper	2017-09-20	1	-2/+2
\| \| \| \| \| \| \| \|	We already did (X & C2) > C1 --> (X & C2) != 0, if any bit set in (X & C2) will produce a result greater than C1. But there is an equivalent inverse condition with <= C1 (which will be canonicalized to < C1+1) Differential Revision: https://reviews.llvm.org/D38065 llvm-svn: 313819
*	[InstCombine] Pre-commit test cases for D38065.	Craig Topper	2017-09-20	1	-0/+22
\| \| \| \|	llvm-svn: 313818
*	Revert r313771 "[SLP] Vectorize jumbled memory loads."	Hans Wennborg	2017-09-20	4	-102/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This broke the buildbots, e.g. http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/391 > Summary: > This patch tries to vectorize loads of consecutive memory accesses, accessed > in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 > which was reverted back due to some basic issue with representing the 'use mask' > jumbled accesses. > > This patch fixes the mask representation by recording the 'use mask' in the usertree entry. > > Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df > > Subscribers: mzolotukhin > > Reviewed By: ayal > > Differential Revision: https://reviews.llvm.org/D36130 > > Review comments updated accordingly > > Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0 > > Added a TODO for sortLoadAccesses API > > Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58 > > Modified the TODO for sortLoadAccesses API > > Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565 > > Review comment update for using OpdNum to insert the mask in respective location > > Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce > > Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase > > Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b llvm-svn: 313781
*	[InstCombine] Add select simplifications	Quentin Colombet	2017-09-20	1	-0/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In these cases, two selects have constant selectable operands for both the true and false components and have the same conditional expression. We then create two arithmetic operations of the same type and feed a final select operation using the result of the true arithmetic for the true operand and the result of the false arithmetic for the false operand and reuse the original conditionl expression. The arithmetic operations are naturally folded as a consequence, leaving only the newly formed select to replace the old arithmetic operation. Patch by: Michael Berg <michael_c_berg@apple.com> Differential Revision: https://reviews.llvm.org/D37019 llvm-svn: 313774
*	[SLP] Vectorize jumbled memory loads.	Mohammad Shahid	2017-09-20	4	-52/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Subscribers: mzolotukhin Reviewed By: ayal Differential Revision: https://reviews.llvm.org/D36130 Review comments updated accordingly Change-Id: I22ab0a8a9bac9d49d74baa81a08e1e486f5e75f0 Added a TODO for sortLoadAccesses API Change-Id: I3c679bf1865422d1b45e17ea28f1992bca660b58 Modified the TODO for sortLoadAccesses API Change-Id: Ie64a66cb5f9e2a7610438abb0e750c6e090f9565 Review comment update for using OpdNum to insert the mask in respective location Change-Id: I016d0c1b29874e979efc0205bbf078991f92edce Fixes '-Wsign-compare warning' in LoopAccessAnalysis.cpp and code rebase Change-Id: I64b2ea5e68c1d7b6a028f5ef8251c5a97333f89b llvm-svn: 313771
*	[ThinLTO] Fix dead stripping analysis for SamplePGO	Teresa Johnson	2017-09-20	2	-0/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The fix for dead stripping analysis in the case of SamplePGO indirect calls to local functions (r313151) introduced the possibility of an infinite loop. Make sure we check for the value being already live after we update it for SamplePGO indirect call handling. Reviewers: danielcdh Subscribers: mehdi_amini, inglorion, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D38086 llvm-svn: 313766
*	Revert r313736: "[SLP] Vectorize jumbled memory loads."	Alexander Kornienko	2017-09-20	4	-102/+52
\| \| \| \| \| \| \|	The revision breaks buildbots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/6694/steps/test/logs/stdio llvm-svn: 313758
*	[SLP] Vectorize jumbled memory loads.	Mohammad Shahid	2017-09-20	4	-52/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch tries to vectorize loads of consecutive memory accesses, accessed in non-consecutive or jumbled way. An earlier attempt was made with patch D26905 which was reverted back due to some basic issue with representing the 'use mask' of jumbled accesses. This patch fixes the mask representation by recording the 'use mask' in the usertree entry. Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh Reviewed By: Ayal Subscribers: mzolotukhin Differential Revision: https://reviews.llvm.org/D36130 Commit after rebase for patch D36130 Change-Id: I8add1c265455669ef288d880f870a9522c8c08ab llvm-svn: 313736
*	Tighten the invariants around LoopBase::invalidate	Sanjoy Das	2017-09-20	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: With this change: - Methods in LoopBase trip an assert if the receiver has been invalidated - LoopBase::clear frees up the memory held the LoopBase instance This change also shuffles things around as necessary to work with this stricter invariant. Reviewers: chandlerc Subscribers: mehdi_amini, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D38055 llvm-svn: 313708
*	Import all inlined indirect call targets for SamplePGO.	Dehao Chen	2017-09-19	2	-6/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In the ThinLTO compilation, if a function is inlined in the profiling binary, we need to inline it before annotation. If the callee is not available in the primary module, a first step is needed to import that callee function. For the current implementation, if the call is an indirect call, which has been promoted to >1 targets and inlined, SamplePGO will only import one target with the largest sample count. This patch fixed the bug to import all targets instead. Reviewers: tejohnson, davidxl Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D36637 llvm-svn: 313678
*	Handle profile mismatch correctly for SamplePGO.	Dehao Chen	2017-09-19	2	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Fix the bug when promoted call return type mismatches with the promoted function, we should not try to inline it. Otherwise it may lead to compiler crash. Reviewers: davidxl, tejohnson, eraman Reviewed By: tejohnson Subscribers: llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D38018 llvm-svn: 313658
*	[SLP] Reduce test, NFC.	Alexey Bataev	2017-09-19	1	-134/+56
\| \| \| \|	llvm-svn: 313630
*	[InstCombine] auto-generate complete checks; NFC	Sanjay Patel	2017-09-18	1	-9/+62
\| \| \| \| \| \| \|	The code responsible for these transforms has the potential to add 2 instructions and break min/max patterns (PR33301). llvm-svn: 313575
*	[SLP] Add a test for PR34635, NFC.	Alexey Bataev	2017-09-18	1	-0/+176
\| \| \| \|	llvm-svn: 313559
*	[ARM] Implement isTruncateFree	Sam Parker	2017-09-18	1	-0/+25
\| \| \| \| \| \| \| \| \| \|	Implement the isTruncateFree hooks, lifted from AArch64, that are used by TargetTransformInfo. This allows simplifycfg to reduce the test case into a single basic block. Differential Revision: https://reviews.llvm.org/D37516 llvm-svn: 313533
*	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native ↵	Craig Topper	2017-09-16	1	-313/+0
\| \| \| \| \| \| \| \|	shuffles. I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates. llvm-svn: 313450
*	[SLP] Revert r312791 and other necessary commits, except for TTI and	Chandler Carruth	2017-09-15	2	-1823/+1182
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CostModel. The original patch added support for horizontal min/max reductions to the SLP vectorizer. This patch causes LLVM to miscompile fairly simple signed min reductions. I have attached a test progrom to http://llvm.org/PR34635 that shows the behavior change after this patch. We found this in a test for the open source Eigen library, but also in other code. Unfortunately, the revert is moderately challenging. It required reverting: r313042: [SLP] Test with multiple uses of conditional op and wrong parent. r312853: [SLP] Fix buildbots, NFC. r312793: [SLP] Fix the warning about paths not returning the value, NFC. r312791: [SLP] Support for horizontal min/max reduction. And even then, I had to completely skip reverting the changes to TTI and CostModel because r312832 rewrote so much of this code. Plus, the cost modeling changes aren implicated in the miscompile, so they should be fine and will just not be used until this gets re-introduced. llvm-svn: 313409
*	[ConstantFold] Return the correct type when folding a GEP with vector indices.	Davide Italiano	2017-09-15	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As Eli pointed out (and I got wrong in the first place), langref says: "The getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such cases, all vector arguments should have the same number of elements, and every scalar argument will be effectively broadcast into a vector during address calculation." Costantfold for gep doesn't really take in account this paragraph, returning a pointer instead of a vector of pointer which triggers an assertion in RAUW, as we're trying to replace values with mistmatching types. Differential Revision: https://reviews.llvm.org/D37928 llvm-svn: 313394
*	This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352	Vivek Pandya	2017-09-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with command line options. The diagnostic handler used to be callback now this patch adds a class DiagnosticHandler. It has virtual method to provide custom diagnostic handler and methods to control which particular remarks are enabled. However LLVM-C API users can still provide callback function for diagnostic handler. llvm-svn: 313390
*	This reverts r313381	Vivek Pandya	2017-09-15	1	-1/+1
\| \| \| \|	llvm-svn: 313387
*	This patch fixes https://bugs.llvm.org/show_bug.cgi?id=32352	Vivek Pandya	2017-09-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	It enables OptimizationRemarkEmitter::allowExtraAnalysis and MachineOptimizationRemarkEmitter::allowExtraAnalysis to return true not only for -fsave-optimization-record but when specific remarks are requested with command line options. The diagnostic handler used to be callback now this patch adds a class DiagnosticHandler. It has virtual method to provide custom diagnostic handler and methods to control which particular remarks are enabled. However LLVM-C API users can still provide callback function for diagnostic handler. llvm-svn: 313382
*	Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection ↵	Hans Wennborg	2017-09-15	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376
*	[RuntimeUnroll] Add heuristic for unrolling multi-exit loop	Anna Thomas	2017-09-15	1	-0/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a profitability heuristic to enable runtime unrolling of multi-exit loop: There can be atmost two unique exit blocks for the loop and the second exit block should be a deoptimizing block. Also, there can be one other exiting block other than the latch exiting block. The reason for the latter is so that we limit the number of branches in the unrolled code to being at most the unroll factor. Deoptimizing blocks are rarely taken so these additional number of branches created due to the unrolling are predictable, since one of their target is the deopt block. Reviewers: apilipenko, reames, evstupac, mkuper Subscribers: llvm-commits Reviewed by: reames Differential Revision: https://reviews.llvm.org/D35380 llvm-svn: 313363
*	[RuntimeUnrolling] Populate the VMap entry correctly when default generated ↵	Anna Thomas	2017-09-15	1	-0/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	through lookup During runtime unrolling on loops with multiple exits, we update the exit blocks with the correct phi values from both original and remainder loop. In this process, we lookup the VMap for the mapped incoming phi values, but did not update the VMap if a default entry was generated in the VMap during the lookup. This default value is generated when constants or values outside the current loop are looked up. This patch fixes the assertion failure when null entries are present in the VMap because of this lookup. Added a testcase that showcases the problem. llvm-svn: 313358
*	Revert "[SLPVectorizer] Failure to beneficially vectorize 'copyable' ↵	Ilya Biryukov	2017-09-15	1	-68/+132
\| \| \| \| \| \| \| \| \|	elements in integer binary ops." This reverts commit r313348. Reason: it caused buildbot failures. llvm-svn: 313352