bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Revert "MergeFuncs: Transfer the function parameter attributes to the call site"	Arnold Schwaighofer	2015-07-19	3	-24/+2
	It is okay to not transfer parameter attributes. This reverts commit r242558. llvm-svn: 242646
*	MergeFuncs: Transfer the function parameter attributes to the call site	Arnold Schwaighofer	2015-07-17	3	-2/+24
	rdar://21516488 llvm-svn: 242558
*	[NVPTX] enable SpeculativeExecution in NVPTX	Jingyue Wu	2015-07-16	1	-0/+71
	Summary: SpeculativeExecution enables a series of straight-line optimizations (such as SLSR and NaryReassociate) on conditional code. For example,
		if (...)
		  ... b * s ...
		if (...)
		  ... (b + 1) * s ...
	speculative execution can hoist b * s and (b + 1) * s from then-blocks, so that we have
		... b * s ...
		if (...)
		  ...
		... (b + 1) * s ...
		if (...)
		  ...
	Then, SLSR can rewrite (b + 1) * s to (b * s + s) because after speculative execution b * s dominates (b + 1) * s.
	The performance impact of this change is significant. It speeds up the benchmarks running EigenFloatContractionKernelInternal16x16 (https://bitbucket.org/eigen/eigen/src/ba68f42fa69e4f43417fe1e52669d4dd5d2b3bee/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h?at=default#cl-526) by roughly 2%. Some internal benchmarks that have the above code pattern are improved by up to 40%. No significant slowdowns are observed on Eigen CUDA microbenchmarks.
	Reviewers: jholewinski, broune, eliben
	Subscribers: llvm-commits, jholewinski
	Differential Revision: http://reviews.llvm.org/D11201
	llvm-svn: 242437
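The rewrite described in this message rests on the identity (b + 1) * s = b * s + s. The following is a toy Python model of the before and after code shapes, not LLVM code; the function names `before` and `after` and the flags `c1`, `c2` are hypothetical and stand in for the two elided conditions.

```python
def before(b, s, c1, c2):
    """Each product is computed inside its own conditional block."""
    out = []
    if c1:
        out.append(b * s)
    if c2:
        out.append((b + 1) * s)
    return out


def after(b, s, c1, c2):
    """Products are hoisted, then the second is strength-reduced."""
    t0 = b * s   # hoisted: now computed before (dominates) the second product
    t1 = t0 + s  # (b + 1) * s rewritten as b * s + s, one add instead of a multiply
    out = []
    if c1:
        out.append(t0)
    if c2:
        out.append(t1)
    return out
```

Both functions return the same values for every input; the second form trades a multiplication for an addition at the cost of computing the products unconditionally.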
*	Internalize: internalize comdat members as a group, and drop comdat on such members.	Peter Collingbourne	2015-07-16	1	-0/+52
	Internalizing an individual comdat group member without also internalizing the other members of the comdat can break comdat semantics. For example, if a module contains a reference to an internalized comdat member, and the linker chooses a comdat group from a different object file, this will break the reference to the internalized member. This change causes the internalizer to only internalize comdat members if all other members of the comdat are not externally visible. Once a comdat group has been fully internalized, there is no need to apply comdat rules to its members; later optimization passes (e.g. globaldce) can legally drop individual members of the comdat. So we drop the comdat attribute from all comdat members. Differential Revision: http://reviews.llvm.org/D10679 llvm-svn: 242423
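The all-or-nothing rule described in this message can be sketched as follows. This is a toy Python model under stated assumptions, not the Internalize pass: the function name `internalize` and the dictionary representation of globals are hypothetical.

```python
def internalize(comdat_of, must_stay_external):
    """Return (internalized globals, comdats whose attribute can be dropped).

    comdat_of: dict mapping each global's name to its comdat name, or None.
    must_stay_external: set of global names that have to remain visible.
    """
    # A comdat is blocked as soon as one of its members must stay external.
    blocked = {
        comdat
        for name, comdat in comdat_of.items()
        if comdat is not None and name in must_stay_external
    }
    internalized = {
        name
        for name, comdat in comdat_of.items()
        if name not in must_stay_external and comdat not in blocked
    }
    # Fully internalized groups no longer need comdat rules.
    dropped = {c for c in comdat_of.values() if c is not None} - blocked
    return internalized, dropped
```

For example, if `f` and `g` share comdat `C` and only `g` must stay external, neither is internalized, because internalizing `f` alone could leave it referring to a group the linker discards.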
*	Fix mergefunc infinite loop	JF Bastien	2015-07-15	1	-0/+40
	Self-referential constants containing references to a merged function no longer cause the MergeFunctions pass to loop infinitely. Also adds a reproduction IR which would otherwise fail, which was isolated from a similar issue in Chromium. Author: jrkoenig Reviewers: nlewycky, jfb Subscribers: llvm-commits, nlewycky, jfb Differential Revision: http://reviews.llvm.org/D11208 llvm-svn: 242337
*	Tidy-up test case from r242257.	Michael Zolotukhin	2015-07-15	1	-5/+8
	llvm-svn: 242268
*	[LoopUnrolling] Handle cast instructions.	Michael Zolotukhin	2015-07-15	1	-0/+94
	During estimation of unrolling effect we should be able to propagate constants through casts. Differential Revision: http://reviews.llvm.org/D10207 llvm-svn: 242257
*	[InstCombine] Generalize sub of selects optimization to all BinaryOperators	David Majnemer	2015-07-14	1	-0/+10
	This exposes further optimization opportunities if the selects are correlated. llvm-svn: 242235
*	GVN: tolerate an instruction being replaced without existing in the leaderboard	Tim Northover	2015-07-14	1	-0/+29
	Sometimes an incidentally created instruction can duplicate a Value used elsewhere. It then often doesn't end up in the leader table. If it's later removed, we attempt to remove it from the leader table and segfault. Instead we should just ignore the removal request, which won't cause any problems. The reverse situation, where the original instruction is replaced by the new one (which you might think could leave the leader table empty) cannot occur, because the incidental instruction will never be found in the first place. llvm-svn: 242199
*	[SROA] Don't de-atomic volatile loads and stores	David Majnemer	2015-07-14	1	-0/+11
	Volatile loads and stores are made visible in global state regardless of what memory is involved. It is not correct to disregard the ordering and synchronization scope because it is possible to synchronize with memory operations performed by hardware. This partially addresses PR23737. llvm-svn: 242126
*	Update enforceKnownAlignment after the isWeakForLinker semantic change	Reid Kleckner	2015-07-14	1	-7/+19
	Previously we would refrain from attempting to increase the linkage of available_externally globals because they were considered weak for the linker. Now they are treated more like a declaration instead of a weak definition. This was causing SSE alignment faults in Chromium, when some code assumed it could increase the alignment of a dllimported global that it didn't control. http://crbug.com/509256 llvm-svn: 242091
*	Remove unnecessary lines from the test in r242068.	Pete Cooper	2015-07-13	1	-3/+0
	This test case was breaking the hexagon elf bot. The failing lines were actually unnecessary as checking that the store still reads the correct value demonstrates that everything is working fine now. llvm-svn: 242073
*	Loop idiom recognizer was replacing too many uses of popcount.	Pete Cooper	2015-07-13	1	-0/+37
	When spotting that a loop can use ctpop, we were incorrectly replacing all uses of a value with a value derived from ctpop. The bug here was exposed because we were replacing a use prior to the ctpop with the ctpop value and so we have a use before def, i.e., we changed
		%tobool.5 = icmp ne i32 %num, 0
		store i1 %tobool.5, i1* %ptr
		br i1 %tobool.5, label %for.body.lr.ph, label %for.end
	to
		store i1 %1, i1* %ptr
		%0 = call i32 @llvm.ctpop.i32(i32 %num)
		%1 = icmp ne i32 %0, 0
		br i1 %1, label %for.body.lr.ph, label %for.end
	Even if we inserted the ctpop so that it dominates the store here, that would still be incorrect. The store doesn't want the result of ctpop. The fix is very simple, and involves replacing only the branch condition with the ctpop instead of all uses.
	Reviewed by Hal Finkel.
	llvm-svn: 242068
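For background, the idiom being recognized is the bit-clearing loop that counts set bits. The sketch below is a toy Python model, not the LoopIdiomRecognize code; `loop_popcount` and `ctpop` are hypothetical names, and inputs are assumed to be non-negative integers standing in for unsigned values. It also shows why the loop guard `num != 0` can be expressed through the ctpop result.

```python
def loop_popcount(num):
    """Count set bits by clearing the lowest set bit each iteration."""
    count = 0
    while num != 0:
        num &= num - 1  # clears the lowest set bit
        count += 1
    return count


def ctpop(num):
    """Stand-in for the llvm.ctpop intrinsic on a non-negative integer."""
    return bin(num).count("1")


def guard_via_ctpop(num):
    """The loop-entry test written in terms of the ctpop result."""
    return ctpop(num) != 0
```

The guard rewrite is valid because a value is nonzero exactly when it has at least one set bit; other users of the original comparison, such as the store in the message, are a separate matter.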
*	Enable runtime unrolling with unroll pragma metadata	Mark Heffernan	2015-07-13	1	-14/+13
	Enable runtime unrolling for loops with unroll count metadata ("#pragma unroll N") and a runtime trip count. Also, do not unroll loops with unroll full metadata if the loop has a runtime loop count. Previously, such loops would be unrolled with a very large threshold (pragma-unroll-threshold) if runtime unrolling happened to be enabled, resulting in a very large (and likely unwise) unroll factor. llvm-svn: 242047
*	Don't change the visibility when converting a definition to a declaration.	Rafael Espindola	2015-07-13	1	-0/+11
	llvm-svn: 242030
*	[LSR] don't attempt to promote ephemeral values to indvars	Jingyue Wu	2015-07-13	1	-0/+41
	Summary: This at least saves compile time. I also encountered a case where ephemeral values affect whether other variables are promoted, causing performance issues. It may be a bug in LSR, but I didn't manage to reduce it yet. Anyhow, I believe it's in general not worth considering ephemeral values in LSR. Reviewers: atrick, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11115 llvm-svn: 242011
*	[InstSimplify] Teach InstSimplify how to simplify extractelement	David Majnemer	2015-07-13	1	-0/+14
	llvm-svn: 242008
*	[InstSimplify] Teach InstSimplify how to simplify extractvalue	David Majnemer	2015-07-13	1	-0/+9
	llvm-svn: 242007
*	[LICM] Don't try to sink values out of loops without any exits	David Majnemer	2015-07-12	1	-0/+19
	There is no suitable basic block to sink instructions in loops without exits. The only way an instruction in a loop without exits can be used is as an incoming value to a PHI. In such cases, the incoming block for the corresponding value is unreachable. This fixes PR24013. Differential Revision: http://reviews.llvm.org/D10903 llvm-svn: 241987
*	Renamed some uses of unroll to interleave in the vectorizer.	Tyler Nowicki	2015-07-11	1	-2/+2
	llvm-svn: 241971
*	[InstCombine] Actually combine AA metadata when replacing one load with another	Bjorn Steinbrink	2015-07-10	1	-4/+2
	Fixes PR24083 llvm-svn: 241955
*	[InstSimplify] Fold away ord/uno fcmps when nnan is present.	Benjamin Kramer	2015-07-10	1	-0/+15
	This is important to fold away the slow case of complex multiplies emitted by clang. llvm-svn: 241911
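As a reminder of why this fold is sound: an `ord` comparison is true when neither operand is NaN, and `uno` is true when at least one is. If NaNs are assumed absent, as the `nnan` flag promises, both become constants. A small Python sketch of that reasoning (the helper names are hypothetical, and this models only the predicates, not InstSimplify):

```python
import math


def fcmp_ord(a, b):
    """True when neither operand is NaN."""
    return not (math.isnan(a) or math.isnan(b))


def fcmp_uno(a, b):
    """True when at least one operand is NaN."""
    return math.isnan(a) or math.isnan(b)


def fold_with_nnan(predicate):
    """Constant result of the predicate once NaN operands are ruled out."""
    return {"ord": True, "uno": False}[predicate]
```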
*	Disable loop re-rotation for -Oz (patch by Andrey Turetsky)	Alexey Bataev	2015-07-10	1	-0/+30
	After changes in rL231820 loop re-rotation is performed even in -Oz mode. Since loop rotation is disabled for -Oz, it seems loop re-rotation should be disabled too. Differential Revision: http://reviews.llvm.org/D10961 llvm-svn: 241897
*	[InstCombine] Employ AliasAnalysis in FindAvailableLoadedValue	Bjorn Steinbrink	2015-07-10	1	-0/+15
	llvm-svn: 241887
*	[InstCombine] Properly combine metadata when replacing a load with another	Bjorn Steinbrink	2015-07-10	1	-0/+31
	Not doing this can lead to misoptimizations down the line, e.g. because of range metadata on the replacing load excluding values that are valid for the load that is being replaced. llvm-svn: 241886
*	[IndVars] Try to use existing values in RewriteLoopExitValues.	Sanjoy Das	2015-07-09	1	-0/+36
	Summary: In RewriteLoopExitValues, before expanding out an SCEV expression using SCEVExpander, try to see if an existing LLVM IR expression already computes the value we're interested in. If so, use that existing expression.
	Apart from reducing IndVars' reliance on the rest of the compilation pipeline, this also prevents IndVars from classifying some expressions as "high cost" when they're not. For instance, `InductiveRangeCheckElimination` often emits code of the following form:
	```
	len = umin(len_A, len_B)
	loop:
	  ...
	  if (i++ < len)
	    goto loop
	outside_loop:
	  use(i)
	```
	`SCEVExpander` refuses to rewrite the use of `i` in `outside_loop`, since it thinks the value of `i` on loop exit, `len`, is a high cost expansion since it contains an `umax` in it. With this change, `IndVars` can see that it can re-use `len` instead of creating a new expression to compute `umin(len_A, len_B)`.
	I considered putting this cleverness in `SCEVExpander`, but I was worried that it may then have a detrimental effect on other passes that use it. So I decided it was better to just do this in the one place where it seems like an obviously good idea, with the intent of generalizing later if needed.
	Reviewers: atrick, reames
	Subscribers: llvm-commits
	Differential Revision: http://reviews.llvm.org/D10782
	llvm-svn: 241838
*	[SLPVectorizer] Try different vectorization factors for store chains	Sanjay Patel	2015-07-08	5	-24/+39
	...and set max vector register size based on target This patch is based on discussion on the llvmdev mailing list: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/087405.html and also solves: https://llvm.org/bugs/show_bug.cgi?id=17170 Several FIXME/TODO items are noted in comments as potential improvements. Differential Revision: http://reviews.llvm.org/D10950 llvm-svn: 241760
*	[LAA] Merge memchecks for accesses separated by a constant offset	Silviu Baranga	2015-07-08	1	-5/+3
	Summary: Often filter-like loops will do memory accesses that are separated by constant offsets. In these cases it is common that we will exceed the threshold for the allowable number of checks. However, it should be possible to merge such checks, since a check of any interval against two other intervals separated by a constant offset (a,b), (a+c, b+c) will be equivalent to a check against (a, b+c), as long as (a,b) and (a+c, b+c) overlap. Assuming the loop will be executed for a sufficient number of iterations, this will be true. If not true, checking against (a, b+c) is still safe (although not equivalent). As long as there are no dependencies between two accesses, we can merge their checks into a single one. We use this technique to construct groups of accesses, and then check the intervals associated with the groups instead of checking the accesses directly. Reviewers: anemet Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10386 llvm-svn: 241673
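The interval argument in this message can be checked with a small model. The sketch below is a simplified illustration in Python, not the LoopAccessAnalysis implementation: intervals are assumed to be non-empty, half-open `(lo, hi)` pairs of integers, and all function names are hypothetical.

```python
def overlaps(x, y):
    """True when two non-empty half-open intervals share a point."""
    return x[0] < y[1] and y[0] < x[1]


def merged(group):
    """Single interval covering every interval in the group."""
    return (min(lo for lo, _ in group), max(hi for _, hi in group))


def conflicts_individually(x, group):
    """One overlap check per member of the group."""
    return any(overlaps(x, g) for g in group)


def conflicts_merged(x, group):
    """One overlap check against the covering interval."""
    return overlaps(x, merged(group))
```

When `(a, b)` and `(a + c, b + c)` overlap, their union is exactly `(a, b + c)`, so the two checks agree. When they do not overlap, the merged check can only report extra conflicts, which matches the "safe although not equivalent" remark.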
*	Allow constfolding of llvm.sin.* and llvm.cos.* intrinsics	Karthik Bhat	2015-07-08	1	-0/+22
	This patch const folds llvm.sin.* and llvm.cos.* intrinsics whenever feasible. Differential Revision: http://reviews.llvm.org/D10836 llvm-svn: 241665
*	Rename llvm.frameescape and llvm.framerecover to localescape and localrecover	Reid Kleckner	2015-07-07	2	-10/+10
	Summary: Initially, these intrinsics seemed like part of a family of "frame" related intrinsics, but now I think that's more confusing than helpful. Initially, the LangRef specified that this would create a new kind of allocation that would be allocated at a fixed offset from the frame pointer (EBP/RBP). We ended up dropping that design, and leaving the stack frame layout alone. These intrinsics are really about sharing local stack allocations, not frame pointers. I intend to go further and add an `llvm.localaddress()` intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being used to address locals, which should not be confused with the frame pointer. Naming suggestions at this point are welcome, I'm happy to re-run sed. Reviewers: majnemer, nicholas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11011 llvm-svn: 241633
*	change CHECK to CHECK-LABEL for more precision	Sanjay Patel	2015-07-05	1	-1/+1
	llvm-svn: 241422
*	remove unnecessary test specifications	Sanjay Patel	2015-07-05	1	-5/+4
	llvm-svn: 241419
*	minimize test case and remove unnecessary opt passes	Sanjay Patel	2015-07-05	1	-65/+24
	llvm-svn: 241418
*	Make an X86 specific directory and put the recent X86 tti specific inlining test into it.	Eric Christopher	2015-07-02	2	-1/+4
	llvm-svn: 241223
*	Implement TargetTransformInfo::hasCompatibleFunctionAttributes for X86.	Eric Christopher	2015-07-02	1	-0/+35
	This checks subtarget feature compatibility for inlining by verifying that the callee's features are a strict subset of the caller's features. This includes the cpu as part of the subtarget we can get via the incoming functions as the backend takes CPUs as feature sets. This allows us to inline things like:
		int foo() { return baz(); }
		int __attribute__((target("sse4.2"))) bar() { return foo(); }
	so that generic code can be inlined into specialized functions.
	llvm-svn: 241221
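The compatibility rule can be pictured as a set comparison. The sketch below is a toy Python model, not the X86 TargetTransformInfo code: it uses a plain (non-strict) subset test, which is an assumption about the intent, and the baseline feature name `sse2` is invented for illustration.

```python
def can_inline(caller_features, callee_features):
    """Allow inlining only when the caller supports everything the callee needs."""
    return set(callee_features) <= set(caller_features)


# Feature sets for the example in the message; "sse2" is a hypothetical baseline.
GENERIC = {"sse2"}
SPECIALIZED = {"sse2", "sse4.2"}
```

With these sets, `can_inline(SPECIALIZED, GENERIC)` allows inlining the generic `foo` into the `sse4.2` function `bar`, while the reverse direction is rejected because the generic caller cannot be assumed to run `sse4.2` instructions.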
*	[TwoAddressInstructionPass] Try 3 Addr Conversion After Commuting.	Quentin Colombet	2015-07-01	1	-1/+1
	TwoAddressInstructionPass stops after a successful commuting but 3 Addr conversion might be good for some cases. Consider:
		int foo(int a, int b) { return a + b; }
	Before this commit, we emit:
		addl %esi, %edi
		movl %edi, %eax
		ret
	After this commit, we try 3 Addr conversion:
		leal (%rsi,%rdi), %eax
		ret
	Patch by Volkan Keles <vkeles@apple.com>!
	Differential Revision: http://reviews.llvm.org/D10851
	llvm-svn: 241206
*	[LoopVectorize] Use ReplaceInstWithInst() helper where appropriate.	Alexey Samsonov	2015-07-01	1	-15/+30
	This is mostly an NFC, which increases code readability (instead of saving old terminator, generating new one in front of old, and deleting old, we just call a function). However, it would additionally copy the debug location from old instruction to replacement, which would help PR23837. llvm-svn: 241197
*	[LoopUnroll] Use undef for phis with no value live	David Majnemer	2015-07-01	1	-0/+24
	We would create a phi node with a zero initialized operand instead of undef in the case where no value was originally available. This was problematic for x86_mmx which has no null value. llvm-svn: 241143
*	[SCCP] Turn loads of null into undef instead of zero initialized values	David Majnemer	2015-07-01	1	-0/+5
	Surprisingly, this is a correctness issue: the mmx type exists for calling convention purposes, and LLVM doesn't have a zero representation for it. This partially fixes PR23999. llvm-svn: 241142
*	[NaryReassociate] enhances nsw by leveraging @llvm.assume	Jingyue Wu	2015-07-01	1	-0/+36
	Summary: nsw flags are flaky and can often be removed by optimizations. This patch enhances nsw by leveraging @llvm.assume in the IR. Specifically, NaryReassociate now understands that
		assume(a + b >= 0) && assume(a >= 0) ==> a +nsw b
	As a result, it can split more sext(a + b) into sext(a) + sext(b) for CSE.
	Test Plan: nary-gep.ll
	Reviewers: broune, meheff
	Subscribers: jholewinski, llvm-commits
	Differential Revision: http://reviews.llvm.org/D10822
	llvm-svn: 241139
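The implication in this message can be checked exhaustively at a small bit width. The sketch below is a Python model of two's-complement addition, not NaryReassociate itself; the 8-bit width and the function names are assumptions made for illustration.

```python
def wrap(x, bits=8):
    """Reduce x to a signed two's-complement value of the given width."""
    half = 1 << (bits - 1)
    return ((x + half) % (1 << bits)) - half


def assume_implies_nsw(bits=8):
    """Check: a >= 0 and wrapped (a + b) >= 0 imply a + b did not overflow."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            wrapped = wrap(a + b, bits)
            if a >= 0 and wrapped >= 0 and wrapped != a + b:
                return False  # found a signed overflow despite both assumptions
    return True
```

Both assumptions are needed: with `a = b = -128`, the wrapped sum is 0, which is non-negative even though the addition overflowed.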
*	Correct a typo for a LoopVectorize test	David Majnemer	2015-06-30	1	-1/+1
	I forgot to specify the correct pass. llvm-svn: 241054
*	[LoopSimplify] Set proper debug location in loop backedge blocks.	Alexey Samsonov	2015-06-29	1	-11/+27
	Set debug location for terminator instruction in loop backedge block (which is an unconditional jump to loop header). We can't copy debug location from original backedges, as there can be several of them, with different debug info locations. So, we follow the approach of SplitBlockPredecessors, and copy the debug info from first non-PHI instruction in the header (i.e. destination block). This is yet another change for PR23837. llvm-svn: 240999
*	[SLSR] S's basis must have the same type as S	Jingyue Wu	2015-06-28	1	-0/+20
	llvm-svn: 240910
*	[LoopVectorize] Pointer indices may be wider than the pointer	David Majnemer	2015-06-27	1	-0/+20
	If we are dealing with a pointer induction variable, isInductionPHI gives back a step value of Stride / size of pointer. However, we might be indexing with a legal type wider than the pointer width. Handle this by inserting casts where appropriate instead of crashing. This fixes PR23954. llvm-svn: 240877
*	[PruneEH] A naked, noinline function can return via InlineAsm	David Majnemer	2015-06-27	1	-0/+21
	The PruneEH pass tries to annotate functions as 'noreturn' if it doesn't see a ReturnInst. However, a naked function containing inline assembly can contain control flow leaving the function. This fixes PR23971. llvm-svn: 240876
*	LowerBitSets: Ignore bitset entries that do not directly refer to a global.	Peter Collingbourne	2015-06-27	1	-0/+19
	It is possible for a global to be substituted with another global of a different type or a different kind (i.e. an alias) at IR link time. One example of this scenario is when a Microsoft ABI vtable is substituted with an alias referring to a larger vtable containing an RTTI reference. This will cause the global to be RAUW'd with a possibly bitcasted reference to the other global. This will of course also affect any references to the global in bitset metadata. The right way to handle such metadata is simply to ignore it. This is sound because the linked module should contain another copy of the bitset entries as applied to the new global. llvm-svn: 240866
*	[RewriteStatepointsForGC] Generalized vector phi/select handling for base pointers	Philip Reames	2015-06-26	1	-2/+42
	This change extends the detection of base pointers for vector constructs to handle arbitrary phi and select nodes. The existing non-vector code already handles those, so this is basically just extending the vector special case to be less special cased. It still isn't generalized vector handling since we can't handle arbitrary vector instructions (e.g. shufflevectors), but it's a lot closer.
	The general structure of the change is as follows:
	* Extend the base defining value relation over a subset of vector instructions and vector typed phi & select instructions.
	* Move scalarization from before base pointer rewriting to after base pointer rewriting. The extension of the BDV relation is sufficient to find vector base phis for vector inputs.
	* Preserve the existing special case logic for when the base of a vector element is locally obvious.
	This general idea could be extended to the scalar case as well.
	Differential Revision: http://reviews.llvm.org/D10461#inline-84275
	llvm-svn: 240850
*	Teach InlineCost to account for a null check which can be folded away	Philip Reames	2015-06-26	1	-0/+45
	If we have a caller that knows a particular argument can never be null, we can exploit this fact while simplifying values in the inline cost analysis. This has the effect of reducing the cost for inlining when a null check is present in the callee, but the value is known non null in the caller. In particular, any dependent control flow can be discounted from the cost estimate. Note that we use the parameter attributes at the call site to memoize the analysis within the caller's code. The setting of this attribute is done in InstCombine, the inline cost analysis just consumes it. This is intentional and important because we want the inline cost analysis results to be easily cachable themselves. We're not currently doing so, but initial results on LTO indicate this will quickly become important. Differential Revision: http://reviews.llvm.org/D9129 llvm-svn: 240828
*	[InstCombine] call SimplifyICmpInst with correct context	Jingyue Wu	2015-06-25	1	-0/+22
	Summary: Fixes PR23809. Without passing the context to SimplifyICmpInst, we would use the assume to prove that the condition feeding the assume is trivially true (see isValidAssumeForContext in ValueTracking.cpp), causing the removal of the assume which may be useful for later optimizations. Test Plan: pr23800.ll Reviewers: hfinkel, majnemer Reviewed By: hfinkel Subscribers: henryhu, llvm-commits, wengxt, broune, meheff, eliben Differential Revision: http://reviews.llvm.org/D10695 llvm-svn: 240683
*	GVN: If a branch has two identical successors, we cannot declare either dead.	Peter Collingbourne	2015-06-25	1	-0/+38
	This previously caused miscompilations as a result of phi nodes receiving undef incoming values from blocks dominated by such successors. Differential Revision: http://reviews.llvm.org/D10726 llvm-svn: 240670