bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[LV] Avoid redundant operations manipulating masks	Ayal Zaks	2017-07-31	2	-34/+16
	The Loop Vectorizer generates redundant operations when manipulating masks: AND with true, OR with false, compare equal to true. Instead of relying on a subsequent pass to clean them up, this patch avoids generating them. Use null (no-mask) to represent all-one full masks, instead of a constant all-one vector, following the convention of masked gathers and scatters. Preparing for a follow-up VPlan patch in which these mask manipulating operations are modeled using recipes. Differential Revision: https://reviews.llvm.org/D35725 llvm-svn: 309558
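The no-mask convention described above can be sketched outside LLVM as follows. This is only an illustration: the `Mask` alias and `maskAnd` helper are hypothetical names, not the vectorizer's API, and `std::nullopt` plays the role of the null (all-one) mask.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical mask type: std::nullopt stands for "all lanes enabled",
// mirroring the null (no-mask) convention from the commit message.
using Mask = std::optional<std::vector<bool>>;

// AND of two masks that emits no work when either side is the full mask.
Mask maskAnd(const Mask &a, const Mask &b) {
  if (!a) return b;  // true AND b == b, nothing to compute
  if (!b) return a;  // a AND true == a, nothing to compute
  assert(a->size() == b->size() && "masks must have the same lane count");
  std::vector<bool> out(a->size());
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (*a)[i] && (*b)[i];
  return out;
}
```

With this representation, combining a full mask with anything returns the other operand unchanged, so no AND-with-true is ever materialized.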
*	[LoopInterchange] Do not interchange loops with function calls.	Florian Hahn	2017-07-31	3	-3/+161
	Summary: Without any information about the called function, we cannot be sure that it is safe to interchange loops which contain function calls. For example there could be dependences that prevent interchanging between accesses in the called function and the loops. Even functions without any parameters could cause problems, as they could access memory using global pointers. For now, I think it is only safe to interchange loops with calls marked as readnone. With this patch, the LLVM test suite passes with `-O3 -mllvm -enable-loopinterchange` and LoopInterchangeProfitability::isProfitable returning true for all loops. check-llvm and check-clang also pass when bootstrapped in a similar fashion, although only 3 loops got interchanged. Reviewers: karthikthecool, blitz.opensource, hfinkel, mcrosier, mkuper Reviewed By: mcrosier Subscribers: mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D35489 llvm-svn: 309547
*	[SLP]: Add test to resurrect the jumbled load patch. This test has multiple uses	Mohammad Shahid	2017-07-31	1	-0/+48
	of memory loads by different users Change-Id: I40b5ba8b810265440f3e55efca77c4b41ca98fa4 llvm-svn: 309544
*	Expanding the test case for vf8 for stride 4 interleaved.	Michael Zuckerman	2017-07-30	1	-0/+15
	llvm-svn: 309511
*	Migrate PGOMemOptSizeOpt to use new OptimizationRemarkEmitter Pass	Sam Elliott	2017-07-30	1	-1/+35
	Summary: Fixes PR33790. This patch still needs a yaml-style test, which I shall write tomorrow Reviewers: anemet Reviewed By: anemet Subscribers: anemet, llvm-commits Differential Revision: https://reviews.llvm.org/D35981 llvm-svn: 309497
*	Fix test failure without X86 backend	Hiroshi Inoue	2017-07-29	1	-0/+0
	move test/Transforms/SimplifyCFG/disable-lookup-table.ll into test/Transforms/SimplifyCFG/X86/disable-lookup-table.ll to avoid test failure when X86 backend is not enabled llvm-svn: 309487
*	[SimplifyCFG] Make the no-jump-tables attribute also disable switch lookup tables	Sumanth Gundapaneni	2017-07-28	2	-0/+80
	Differential Revision: https://reviews.llvm.org/D35579 llvm-svn: 309444
*	[Inliner] Do not apply any bonus for cold callsites.	Easwaran Raman	2017-07-28	3	-0/+106
	Summary: Inlining threshold is increased by application of bonuses when the callee has a single reachable basic block or is rich in vector instructions. Similarly, inlining cost is reduced by applying a large bonus when the last call to a static function is considered for inlining. This patch disables the application of these bonuses when the callsite or the callee is cold. The intention here is to prevent a large cold callsite from being inlined to a non-cold caller that could prevent the caller from being inlined. This is especially important when the cold callsite is a last call to a static since the associated bonus is very high. Reviewers: chandlerc, davidxl Subscribers: danielcdh, llvm-commits Differential Revision: https://reviews.llvm.org/D35823 llvm-svn: 309441
*	Remove the obsolete offset parameter from @llvm.dbg.value	Adrian Prantl	2017-07-28	43	-162/+162
	There is no situation where this rarely-used argument cannot be substituted with a DIExpression and removing it allows us to simplify the DWARF backend. Note that this patch does not yet remove any of the newly dead code. rdar://problem/33580047 Differential Revision: https://reviews.llvm.org/D35951 llvm-svn: 309426
*	[LVI] Constant-propagate a zero extension of the switch condition value through case edges	Hiroshi Yamauchi	2017-07-28	1	-0/+114
	Summary: LazyValueInfo currently computes the constant value of the switch condition through case edges, which allows the constant value to be propagated through the case edges. But we have seen a case where a zero-extended value of the switch condition is used past case edges for which the constant propagation doesn't occur. This patch adds a small piece of logic to handle such a case in getEdgeValueLocal(). This is motivated by the Python 2.7 eval loop in PyEval_EvalFrameEx() where the lack of the constant propagation causes longer live ranges and more spill code than necessary. With this patch, we see that the code size of PyEval_EvalFrameEx() decreases by ~5.4% and a performance test improves by ~4.6%. Reviewers: wmi, dberlin, sanjoy Reviewed By: sanjoy Subscribers: davide, davidxl, llvm-commits Differential Revision: https://reviews.llvm.org/D34822 llvm-svn: 309415
*	[GVN] Recommit the patch "Add phi-translate support in scalarpre"	Wei Mi	2017-07-28	3	-4/+135
	Recommit after working around the bug PR31652.
	Three bugs were fixed in previous recommits:
	The first one is to use CurrentBlock instead of PREInstr's Parent as param of performScalarPREInsertion because the Parent of a clone instruction may be uninitialized.
	The second one is to stop PRE when CurrentBlock to its predecessor is a backedge and an operand of CurInst is defined inside of CurrentBlock. The same value defined inside of the loop in the last iteration cannot be regarded as available.
	The third one is an out-of-bound array access in a flipped if guard.
	Right now scalarpre doesn't have phi-translate support, so it will miss some simple PRE opportunities. As in the following testcase, current scalarpre cannot recognize that the last "a * b" is fully redundant because a and b used by the last "a * b" expr are both defined by phis.
	    long a[100], b[100], g1, g2, g3;
	    __attribute__((pure)) long goo();
	    void foo(long a, long b, long c, long d) {
	      g1 = a * b;
	      if (__builtin_expect(g2 > 3, 0)) {
	        a = c;
	        b = d;
	        g2 = a * b;
	      }
	      g3 = a * b;  // fully redundant.
	    }
	The patch adds phi-translate support in scalarpre. This is only a temporary solution before the newpre based on newgvn is available.
	Differential Revision: https://reviews.llvm.org/D32252 llvm-svn: 309397
*	[SCEV] Do not visit nodes twice in containsConstantSomewhere	Max Kazantsev	2017-07-28	1	-0/+75
	This patch reworks the function that searches constants in Add and Mul SCEV expression chains so that now it does not visit a node more than once, and also renames this function for better correspondence between its implementation and semantics. Differential Revision: https://reviews.llvm.org/D35931 llvm-svn: 309367
*	[JumpThreading] Stop falsely preserving LazyValueInfo.	Davide Italiano	2017-07-28	1	-0/+57
	JumpThreading claims to preserve LVI, but it doesn't preserve the analyses which LVI holds a reference to (e.g. the Dominator). In the current pass manager infrastructure, after JT runs, the PM frees these analyses (including DominatorTree) but preserves LVI. CorrelatedValuePropagation runs immediately after and queries a corrupted domtree, causing weird miscompiles. This commit disables the preservation of LVI for the time being. Eventually, we should either move LVI to a proper dependency tracking mechanism (i.e. an analysis shouldn't hold references to other analyses and should compute them on demand if needed), or we should teach all the passes preserving LVI to preserve the analyses LVI depends on. The new pass manager has a mechanism to invalidate LVI in case one of the analyses it depends on becomes invalid, so this problem shouldn't exist (at least not in this immediate form), but handling of analyses holding references is still a very delicate subject. Fixes PR33917 (and rustc). llvm-svn: 309355
*	Separate the ICP total threshold and remaining threshold.	Dehao Chen	2017-07-28	1	-2/+9
	Summary: In the current implementation, isPromotionProfitable only checks if the call count to a direct target is no less than a certain percentage threshold of the remaining call counts that have not been promoted. This causes code size problems when the target count is small but greater than a large portion of remaining counts. E.g. target1 takes 99.9%, while target2 takes 0.1%. Both targets will be promoted and inlined, making the function too large, which potentially prevents it from being inlined further into its callers. This patch adds another percentage threshold against the total indirect call count. The target count needs to be no less than both thresholds in order to be promoted speculatively. Reviewers: davidxl, tejohnson Reviewed By: tejohnson Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D35962 llvm-svn: 309345
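A minimal sketch of the two-threshold test described above, assuming integer call counts. The function name and both percentage values are hypothetical placeholders; the commit message does not state the cutoffs LLVM actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cutoffs chosen only for illustration.
constexpr std::uint64_t kRemainingPercent = 30;
constexpr std::uint64_t kTotalPercent = 5;

// A target is promoted only if its count clears both the threshold against
// the not-yet-promoted (remaining) count and the threshold against the total
// indirect call count.
bool isPromotionProfitableSketch(std::uint64_t count,
                                 std::uint64_t totalCount,
                                 std::uint64_t remainingCount) {
  assert(remainingCount <= totalCount && "remaining is a part of total");
  return count * 100 >= kRemainingPercent * remainingCount &&
         count * 100 >= kTotalPercent * totalCount;
}
```

For the 99.9% / 0.1% case from the message, with 1000 calls in total: the first target (999 calls) passes both tests, while the second (1 call, 1 remaining) passes the remaining-count test but fails the total-count test, so it is no longer promoted.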
*	[ConstantFolder] Don't try to fold gep when the idx is a vector.	Davide Italiano	2017-07-27	1	-0/+29
	The code in ConstantFoldGetElementPtr() assumes integers, and therefore it crashes trying to get the integer bitwidth of a vector type (in this case <4 x i32>). I just changed the code to prevent the folding in case of vectors and I didn't bother to generalize as this doesn't seem to me something that really happens in practice, but I'm willing to change the patch if you think it's worth it. This is hard to trigger from -instsimplify or -instcombine only as the second instruction is dead, so the test uses loop-unroll. Differential Revision: https://reviews.llvm.org/D35956 llvm-svn: 309330
*	[InstCombine] Simplify pointer difference subtractions (GEP-GEP) where GEPs have other uses and one non-constant index	Hiroshi Yamauchi	2017-07-27	1	-0/+81
	Summary: Pointer difference simplifications currently happen only if input GEPs don't have other uses or their indexes are all constants, to avoid duplicating indexing arithmetic. This patch enables cases with exactly one non-constant index among input GEPs to happen where there is no duplicated arithmetic or code size increase even if input GEPs have other uses. For example, this patch allows "(&A[42][i]-&A[42][0])" --> "i", which didn't happen previously, if the input GEP(s) have other uses. Reviewers: sanjoy, bkramer Reviewed By: sanjoy Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D35499 llvm-svn: 309304
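At the source level, the pattern from the example looks roughly like the sketch below. The array shape and function name are assumptions for illustration; the point is that the address `&A[42][i]` has a second use (the store), yet the difference still equals `i`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 64;   // hypothetical dimensions
constexpr std::size_t kCols = 128;
int A[kRows][kCols];

// The pointer p is used for a store as well as for the subtraction, which is
// the "GEP has other uses" situation the commit message describes.
std::ptrdiff_t rowOffsetWithOtherUse(std::size_t i) {
  assert(i < kCols && "index must stay inside the row");
  int *p = &A[42][i];
  int *base = &A[42][0];
  *p = 1;           // other use of the indexed address
  return p - base;  // mathematically just i
}
```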
*	[ICP] Migrate to OptimizationRemarkEmitter	Adam Nemet	2017-07-27	1	-1/+1
	This is a module pass so for the old PM, we can't use ORE, the function analysis pass. Instead ORE is created on the fly. A few notes: - isPromotionLegal is folded in the caller since we want to emit the Function in the remark but we can only do that if the symbol table look-up succeeded. - There was good test coverage for remarks in this pass. - promoteIndirectCall uses ORE conditionally since it's also used from SampleProfile which does not use ORE yet. Fixes PR33792. Differential Revision: https://reviews.llvm.org/D35929 llvm-svn: 309294
*	All libcalls should be considered to be GC-leaf functions.	Daniel Neilson	2017-07-27	2	-0/+51
	Summary: It is possible for some passes to materialize a call to a libcall (ex: ldexp, exp2, etc), but these passes will not mark the call as a gc-leaf-function. All libcalls are actually gc-leaf-functions, so we change llvm::callsGCLeafFunction() to tell us that available libcalls are equivalent to gc-leaf-function calls. Reviewers: sanjoy, anna, reames Reviewed By: anna Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35840 llvm-svn: 309291
*	ThinLTO: Don't import aliases of any kind (even linkonce_odr)	David Blaikie	2017-07-27	1	-6/+3
	Summary: Until a more advanced version of importing can be implemented for aliases (one that imports an alias as an available_externally definition of the aliasee), skip the narrow subset of cases that was possible but came at a cost: aliases of linkonce_odr functions could be imported because the linkonce_odr function could be safely duplicated from the source module. This came/comes at the cost of not being able to 'home' imported linkonce functions (they had to be emitted linkonce_odr in all the destination modules (even if they weren't used by an alias) rather than as available_externally - causing extra object size). Tangentially, this also was the only reason ThinLTO would emit multiple CUs in to the resulting DWARF - which happens to be a problem for Fission (there's a fix for this in GDB but not released yet, etc). (actually it's not the only reason - but I'm sending a patch to fix the other reason shortly) There's no reason to believe this particularly narrow alias importing was especially/meaningfully important, only that it was /possible/ to implement in this way. When a more general solution is done, it should still satisfy the DWARF concerns above, since the import will still be available_externally, and thus not create extra CUs. Since now all aliases are treated the same, I removed/simplified some test cases since they were testing corner cases where there are no longer any corners. Reviewers: tejohnson, mehdi_amini Differential Revision: https://reviews.llvm.org/D35875 llvm-svn: 309278
*	Migrate SimplifyLibCalls to new OptimizationRemarkEmitter	Adam Nemet	2017-07-26	1	-0/+56
	Summary: This changes SimplifyLibCalls to use the new OptimizationRemarkEmitter API. In fact, as SimplifyLibCalls is only ever called via InstCombine, (as far as I can tell) the OptimizationRemarkEmitter is added there, and then passed through to SimplifyLibCalls later. I have avoided changing any remark text. This closes PR33787 Patch by Sam Elliott! Reviewers: anemet, davide Reviewed By: anemet Subscribers: davide, mehdi_amini, eraman, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D35608 llvm-svn: 309158
*	[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.	Michael Zuckerman	2017-07-26	1	-3/+21
	This patch expands the support of lowerInterleavedStore to 32x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns", we include two mask creators. The first function creates a shuffle mask equivalent to the unpacklo/unpackhi instructions. The other creates a mask equivalent to a concat of two half vectors (high/low). The goal of the patch is to optimize the following sequence: at the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 each holding 32 chars: c0, c1, ..., c31; m0, m1, ..., m31; y0, y1, ..., y31; k0, k1, ..., k31. These need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Reviewers: dorit Farhana RKSimon guyblank DavidKreitzer Differential Revision: https://reviews.llvm.org/D34601 llvm-svn: 309086
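The store pattern being lowered is equivalent to the scalar loop below. This is a reference sketch of the data layout only, not the shuffle sequence the patch generates; the function name and the use of `std::vector` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for a stride-4 interleaved store: four planes of equal
// length are written out as c0 m0 y0 k0 c1 m1 y1 k1 ...
std::vector<std::uint8_t> interleave4(const std::vector<std::uint8_t> &c,
                                      const std::vector<std::uint8_t> &m,
                                      const std::vector<std::uint8_t> &y,
                                      const std::vector<std::uint8_t> &k) {
  assert(c.size() == m.size() && m.size() == y.size() && y.size() == k.size());
  std::vector<std::uint8_t> out(4 * c.size());
  for (std::size_t i = 0; i < c.size(); ++i) {
    out[4 * i + 0] = c[i];
    out[4 * i + 1] = m[i];
    out[4 * i + 2] = y[i];
    out[4 * i + 3] = k[i];
  }
  return out;
}
```

With 32-element planes this produces the 128-byte layout described in the message; the patch's contribution is producing that layout with better AVX2 shuffles than the generic lowering.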
*	Add "REQUIRES: asserts" for test unswitch-equality-undef.ll.	Wei Mi	2017-07-26	1	-0/+1
	llvm-svn: 309073
*	Disable loop unswitching for some patterns containing equality comparison with undef.	Wei Mi	2017-07-25	1	-0/+121
	This is a workaround for the bug described in PR31652 and http://lists.llvm.org/pipermail/llvm-dev/2017-July/115497.html. The temporary solution is to add a function EqualityPropUnSafe. In EqualityPropUnSafe, for some simple patterns we can know the equality comparison may contain undef, so we regard such comparisons as unsafe and will not do loop-unswitching for them. We also need to disable the select simplification when one of the select operands is undef and its result feeds into an equality comparison. The patch cannot clear the safety issue caused by the bug, but it can suppress the issue from happening to some extent. Differential Revision: https://reviews.llvm.org/D35811 llvm-svn: 309059
*	[X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)	Simon Pilgrim	2017-07-25	1	-1021/+53
	D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914). Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os). This patch should be considered for the 5.0.0 release branch as well Differential Revision: https://reviews.llvm.org/D35830 llvm-svn: 308986
*	[LIR] Teach LIR to avoid extending the BE count prior to adding one to	Chandler Carruth	2017-07-25	1	-0/+69
	it when safe. Very often the BE count is the trip count minus one, and the plus one here should fold with that minus one. But because the BE count might in theory be UINT_MAX or some such, adding one before we extend could in some cases wrap to zero and break when we scale things. This patch checks to see if it would be safe to add one because the specific case that would cause this is guarded for prior to entering the preheader. This should handle essentially all of the common loop idioms coming out of C/C++ code once canonicalized by LLVM. Before this patch, both forms of loop in the added test cases ended up subtracting one from the size, extending it, scaling it up by 8 and then adding 8 back onto it. This is really silly, and it turns out made it all the way into generated code very often, so this is a surprisingly important cleanup to do. Many thanks to Sanjoy for showing me how to do this with SCEV. Differential Revision: https://reviews.llvm.org/D35758 llvm-svn: 308968
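The wrap hazard described above can be shown with plain unsigned arithmetic. The sketch assumes a 32-bit backedge-taken count widened to 64 bits; the function names are hypothetical and the `assert` merely stands in for the guard that the patch proves from the loop's entry condition.

```cpp
#include <cassert>
#include <cstdint>

// Always exact, but the +1 happens in the wide type, so it cannot fold with
// a narrow "n - 1" that produced the backedge-taken count.
std::uint64_t extendThenAdd(std::uint32_t beCount) {
  return static_cast<std::uint64_t>(beCount) + 1;
}

// Adds in the narrow type first. Unsigned arithmetic wraps, so the result is
// 0 when beCount == UINT32_MAX.
std::uint64_t addThenExtend(std::uint32_t beCount) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(beCount + 1u));
}

// Models the guarded case: something before the loop has already ruled out
// the wrapping value, so the narrow add is safe.
std::uint64_t addThenExtendGuarded(std::uint32_t beCount) {
  assert(beCount != UINT32_MAX && "guard must exclude the wrapping value");
  return addThenExtend(beCount);
}
```

Both orders agree for every count except the maximum value, which is why the narrow add is only used when that case is known to be excluded.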
*	[tests] Cleanup vect.omp.persistence.ll test.	Michael Zolotukhin	2017-07-25	2	-67/+50
	llvm-svn: 308964
*	[ARM] Enable partial and runtime unrolling	Sam Parker	2017-07-25	2	-0/+213
	Enable runtime and partial loop unrolling of simple loops without calls on M-class cores. The thresholds are calculated based on whether the target is Thumb or Thumb-2. Differential Revision: https://reviews.llvm.org/D34619 llvm-svn: 308956
*	Adding base test for interleave store VF16 and expand the test for AVX512	Michael Zuckerman	2017-07-24	1	-0/+15
	This patch doesn't modify any non-test file. llvm-svn: 308909
*	X86InterleaveAccess: A fix for bug33826	Farhana Aleen	2017-07-21	1	-0/+17
	Reviewers: DavidKreitzer Differential Revision: https://reviews.llvm.org/D35638 llvm-svn: 308784
*	ThinLTO Minimized Bitcode File Size Reduction	Haojie Wang	2017-07-21	2	-19/+0
	Summary: Currently the ThinLTO minimized bitcode file only strips the debug info, but there is still a lot of information in the minimized bitcode file that will not be used by the thin linker. In this patch, most of the extra information is stripped to reduce the size of the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. The minimized bitcode file size is now reduced to 15%-30% of the debug-info-stripped bitcode file size. Reviewers: danielcdh, tejohnson, pcc Reviewed By: pcc Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D35334 llvm-svn: 308760
*	[PGO] Move the PGOInstrumentation pass to new OptRemark API.	Davide Italiano	2017-07-20	1	-2/+2
	This fixes PR33791. llvm-svn: 308668
*	LowerTypeTests: Drop function type metadata only if we're going to replace it.	Peter Collingbourne	2017-07-20	1	-3/+16
	Previously we were (mis)handling jump table members with a prevailing definition in a full LTO module and a non-prevailing definition in a ThinLTO module by dropping type metadata on those functions entirely, which would cause type tests involving such functions to fail. This patch causes us to drop metadata only if we are about to replace it with metadata from cfi.functions. We also want to replace metadata for available_externally functions, which can arise in the opposite scenario (prevailing ThinLTO definition, non-prevailing full LTO definition). The simplest way to handle that is to remove the definition; there's little value in keeping it around at this point (i.e. after most optimization passes have already run) and later code will try to use the function's linkage to create an alias, which would result in invalid IR if the function is available_externally. Fixes PR33832. Differential Revision: https://reviews.llvm.org/D35604 llvm-svn: 308642
*	[TRE] Add another test for OptRemark.	Davide Italiano	2017-07-19	1	-0/+39
	This shows we emit a remark for tail recursion -> loop. llvm-svn: 308525
*	[TRE] Move to the new OptRemark API.	Davide Italiano	2017-07-19	1	-0/+25
	Fixes PR33788. Differential Revision: https://reviews.llvm.org/D35570 llvm-svn: 308524
*	ThinLTOBitcodeWriter: Do not rewrite intrinsic functions when splitting modules.	Peter Collingbourne	2017-07-19	1	-0/+4
	Changing the type of an intrinsic may invalidate the IR. Differential Revision: https://reviews.llvm.org/D35593 llvm-svn: 308500
*	[SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.	Balaram Makam	2017-07-19	11	-16/+148
	Summary: When simplifying unconditional branches from empty blocks, we pre-test if the BB belongs to a set of loop headers and keep the block to prevent passes from destroying canonical loop structure. However, the current algorithm fails if the destination of the branch is a loop header. Especially when such a loop's latch block is folded into the loop header, it results in additional backedges and LoopSimplify turns it into a nested loop, which prevents later optimizations from being applied (e.g., loop unrolling and loop interleaving). This patch augments the existing algorithm by further checking if the destination of the branch belongs to a set of loop headers and, if so, deferring its elimination to LateSimplifyCFG. Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605 Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl Reviewed By: efriedma Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35411 llvm-svn: 308422
*	[LV] Test once if vector trip count is zero, instead of twice	Ayal Zaks	2017-07-19	13	-46/+42
	Generate a single test to decide if there are enough iterations to jump to the vectorized loop, or else go to the scalar remainder loop. This test compares the Scalar Trip Count: if STC < VF * UF go to the scalar loop. If requiresScalarEpilogue() holds, at least one iteration must remain scalar; the rest can be used to form vector iterations. So in this case the test checks instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the scalar loop if so. Otherwise the vector loop is entered for at least one vector iteration. This test covers the case where incrementing the backedge-taken count will overflow leading to an incorrect trip count of zero. In this (rare) case we will also avoid the vector loop and jump to the scalar loop. This patch simplifies the existing tests and effectively removes the basic-block originally named "min.iters.checked", leaving the single test in block "vector.ph". Original observation and initial patch by Evgeny Stupachenko. Differential Revision: https://reviews.llvm.org/D34150 llvm-svn: 308421
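The single test described above can be written as one predicate. This is a sketch of the decision only, with a hypothetical function name and plain integers in place of the IR values the vectorizer emits.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when control should bypass the vector loop and run the scalar
// loop instead. stc is the scalar trip count, vf the vectorization factor and
// uf the unroll factor.
bool skipVectorLoop(std::uint64_t stc, std::uint64_t vf, std::uint64_t uf,
                    bool requiresScalarEpilogue) {
  assert(vf > 0 && uf > 0 && "factors must be positive");
  const std::uint64_t step = vf * uf;
  // With a required scalar epilogue one iteration is reserved, so the test
  // (stc - 1) < step is expressed as stc <= step.
  return requiresScalarEpilogue ? stc <= step : stc < step;
}
```

A trip count that overflowed to zero also satisfies `stc < step`, so the same comparison sends that rare case to the scalar loop without a second check.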
*	[CGP] Allow cycles during Phi traversal in OptimizaMemoryInst	Serguei Katkov	2017-07-19	1	-1/+34
	Allowing cycles in Phi traversal widens the scope of the memory-instruction optimization when we are in a loop. The added test shows an example of enabling optimization inside a loop. Reviewers: loladiro, spatel, efriedma Reviewed By: efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35294 llvm-svn: 308419
*	Fix DebugLoc propagation for unreachable LoadInst	Weiming Zhao	2017-07-19	2	-2/+81
	Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost. Reviewers: dberlin, davide, aprantl Reviewed By: aprantl Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D34639 llvm-svn: 308404
*	[x86, CGP] increase memcmp() expansion up to 4 load pairs	Simon Pilgrim	2017-07-18	1	-88/+1547
	It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads. Notes: Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz. There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare. We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329#c12 TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion. Committed on behalf of Sanjay Patel Differential Revision: https://reviews.llvm.org/D35067 llvm-svn: 308322
*	[PSCEV] Create AddRec for Phis in cases of possible integer overflow,	Dorit Nuzman	2017-07-18	1	-0/+240
	using runtime checks Extend the SCEVPredicateRewriter to work a bit harder when it encounters an UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes whose update chain involves casts that can be ignored under the proper runtime overflow test. This is one step towards addressing PR30654. Differential revision: http://reviews.llvm.org/D30041 llvm-svn: 308299
*	[LoopInterchange] Split up interchange.ll test case (NFC).	Florian Hahn	2017-07-18	10	-749/+795
	Summary: Currently most tests for the loop interchange pass are in test/Transforms/LoopInterchange/interchange.ll. This patch splits up the large test file into smaller pieces, which makes debugging test failures easier. Reviewers: karthikthecool, blitz.opensource, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, mcrosier, mkuper, mzolotukhin, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D35488 llvm-svn: 308284
*	[IRCE] Recognize loops with ne/eq latch conditions	Max Kazantsev	2017-07-18	1	-0/+257
	In some particular cases eq/ne conditions can be turned into equivalent slt/sgt conditions. This patch teaches parseLoopStructure to handle some of these cases. Differential Revision: https://reviews.llvm.org/D35010 llvm-svn: 308264
*	Add element-atomic mem intrinsic canary tests for InstCombine.	Daniel Neilson	2017-07-18	1	-0/+98
	Summary: Add canary tests to verify that InstCombine currently does nothing with the element atomic memory intrinsics for memmove and memset. Placeholder tests that will fail once element atomic @llvm.mem[move|set] intrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that inst combine handles these intrinsics properly once they have been added to that class hierarchy. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35502 llvm-svn: 308247
*	Revert "Restore with fix "[ThinLTO] Ensure we always select the same function copy to import""	Teresa Johnson	2017-07-17	3	-88/+0
	This reverts commit r308114 (and follow on fixes to test). There is a linking failure in a ThinLTO bot: http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/ (and undefined reference). It seems like it must be a second order effect of the heuristic change I made, and may take some time to try to reproduce locally and track down. Therefore, reverting for now. llvm-svn: 308206
*	[InstCombine] Don't violate dominance when replacing instructions.	Davide Italiano	2017-07-16	2	-1/+55
	Differential Revision: https://reviews.llvm.org/D35376 llvm-svn: 308144
*	Fix bot failures from r308114	Teresa Johnson	2017-07-16	1	-10/+10
	Finally figured out that some bots were failing from r308114 with the message: llvm-lto2: LTO::run failed: No available targets are compatible with this triple. after adding in some other checking that finally caused this to show up in the FileCheck output. Added "REQUIRES: x86-registered-target" which should fix it. llvm-svn: 308119
*	Attempt 2 to debug bot failures	Teresa Johnson	2017-07-16	1	-8/+10
	Modify checks from r308114 even more, to see if I can narrow down why some bots are still failing. llvm-svn: 308116
*	Attempt to debug bot failures	Teresa Johnson	2017-07-15	1	-8/+8
	Simplifying checks from r308114, to see if I can narrow down why some bots are still failing. llvm-svn: 308115
*	Restore with fix "[ThinLTO] Ensure we always select the same function copy to import"	Teresa Johnson	2017-07-15	3	-0/+86
	This restores r308078/r308079 with a fix for bot non-determinism (make sure we run llvm-lto in single threaded mode so the debug output doesn't get interleaved). llvm-svn: 308114