bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Enable unroll for constant bound loops when TripCount is not modulo of ↵	Zia Ansari	2016-04-04	1	-0/+29
\| \| \| \| \| \| \| \| \| \|	unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit. Commit for Evgeny Stupachenko (evstupac@gmail.com) Differential Revision: http://reviews.llvm.org/D18290 llvm-svn: 265337
*	Fix bot errors from r265327, exact GUID which depends on path	Teresa Johnson	2016-04-04	1	-34/+33
\| \| \| \| \| \| \| \| \|	E.g. http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/21919 The source file path name will affect exact GUID, don't try to match exact value. llvm-svn: 265334
*	[PGO] Avoid instrumenting direct callee's at value sites.	Betul Buyukkurt	2016-04-04	1	-0/+10
\| \| \| \| \| \| \| \| \| \|	Direct callees' that are cast to other function prototypes, show up in the Call/Invoke instructions as ConstantExpr's. Currently llvm::CallSite's getCalledFunction() fails to return the callees in such expressions as direct calls. Value profiling should avoid instrumenting such cases. Mostly NFC. llvm-svn: 265330
*	[ThinLTO] Add option to dump value name to GUID mapping	Teresa Johnson	2016-04-04	1	-1/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Useful for debugging since we lose this correlation after the permodule summary/VST is read and until we later materialize source modules in the function importer. Reviewers: joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18555 llvm-svn: 265327
*	[CodeGenPrepare] Avoid sinking soft-FP comparisons	Peter Zotov	2016-04-03	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sinking comparisons in CGP can undo the job of hoisting them done earlier by LICM, and soft-FP makes this an expensive mistake. A common pattern that produces floating point comparisons uniform over a loop is an explicit check for division by zero. If the divisor is hoisted out of the loop, the comparison can also be, but hoisting the function that unwinds is never legal, since it may cause side effects in the loop body prior to the unwinding to not be executed. Differential Revision: http://reviews.llvm.org/D18744 llvm-svn: 265264
*	Mark some FP intrinsics as safe to speculatively execute	Peter Zotov	2016-04-03	1	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Floating point intrinsics in LLVM are generally not speculatively executed, since most of them are defined to behave the same as libm functions, which set errno. However, the only error that can happen when executing ceil, floor, nearbyint, rint and round libm functions per POSIX.1-2001 is -ERANGE, and that requires the maximum value of the exponent to be smaller than the number of mantissa bits, which is not the case with any of the floating point types supported by LLVM. The trunc and copysign functions never set errno per per POSIX.1-2001. Differential Revision: http://reviews.llvm.org/D18643 llvm-svn: 265262
*	[InstCombine] Don't sink an instr after a catchswitch	David Majnemer	2016-04-01	1	-0/+45
\| \| \| \| \| \|	A catchswitch is a terminator, instructions cannot be inserted after it. llvm-svn: 265158
*	[SLPVectorizer] Don't insert an extractelement before a catchswitch	David Majnemer	2016-04-01	1	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \|	A catchswitch cannot be preceded by another instruction in the same basic block (other than a PHI node). Instead, insert the extract element right after the materialization of the vectorized value. This isn't optimal but is a reasonable compromise given the constraints of WinEH. This fixes PR27163. llvm-svn: 265157
*	[PGOProfile] Rename a test to make it more reusable, NFC	Vedant Kumar	2016-04-01	1	-1/+2
\| \| \| \|	llvm-svn: 265144
*	Don't insert stackrestore on deoptimizing returns	Sanjoy Das	2016-04-01	1	-0/+16
\| \| \| \| \| \| \| \|	They're not necessary (since the stack pointer is trivially restored on return), and the way LLVM inserts the stackrestore calls breaks the IR (we get a stackrestore between the deoptimize call and the return). llvm-svn: 265101
*	Don't insert lifetime end markers on deoptimizing returns	Sanjoy Das	2016-04-01	1	-0/+16
\| \| \| \| \| \| \| \| \|	They're not necessary (since the lifetime of the alloca is trivially over due to the return), and the way LLVM inserts the lifetime.end markers breaks the IR (we get a lifetime end marker between the deoptimize call and the return). llvm-svn: 265100
*	testcase gardening: update the emissionKind enum to the new syntax. (NFC)	Adrian Prantl	2016-04-01	72	-72/+72
\| \| \| \|	llvm-svn: 265081
*	Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.	Adrian Prantl	2016-03-31	30	-30/+30
\| \| \| \| \| \| \| \| \| \| \| \| \|	This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder into DICompileUnit. DIBuilder is not the right place for this enum to live in — a metadata consumer should not have to include DIBuilder.h. I also added a Verifier check that checks that the emission kind of a DICompileUnit is actually legal. http://reviews.llvm.org/D18612 <rdar://problem/25427165> llvm-svn: 265077
*	[NVPTX] Infer __nvvm_reflect as nounwind, readnone	David Majnemer	2016-03-31	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	This patch simply mirrors the attributes we give to @llvm.nvvm.reflect to the __nvvm_reflect libdevice call. This shaves about 30% of the code in libdevice away because of CSE opportunities. It's also helps us figure out that libdevice implementations of transcendental functions don't have side-effects. llvm-svn: 265060
*	[InstCombine] Fix incorrect rule from rL236202	Sanjoy Das	2016-03-31	1	-0/+18
\| \| \| \| \| \| \|	The rule for SMIN introduced in rL236202 doesn't work as advertised: the check for Pred == ICmpInst::ICMP_SGT was missing. llvm-svn: 264996
*	[DebugInfo] Subprograms should belong to a CU.	Davide Italiano	2016-03-31	5	-5/+5
\| \| \| \| \| \| \| \|	Start fixing tests accordingly. There are still about 35 failures before we can enable this check in the IR verifier. llvm-svn: 264990
*	Introduce a @llvm.experimental.guard intrinsic	Sanjoy Das	2016-03-31	2	-0/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed on llvm-dev[1]. This change adds the basic boilerplate code around having this intrinsic in LLVM: - Changes in Intrinsics.td, and the IR Verifier - A lowering pass to lower @llvm.experimental.guard to normal control flow - Inliner support [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18527 llvm-svn: 264976
*	AMDGPU: Add frexp_exp intrinsic	Matt Arsenault	2016-03-30	1	-0/+162
\| \| \| \|	llvm-svn: 264944
*	AMDGPU: Constant folding for frexp_mant	Matt Arsenault	2016-03-30	1	-0/+155
\| \| \| \|	llvm-svn: 264943
*	[IndVarSimplify] Don't insert after a catchswitch	David Majnemer	2016-03-30	1	-0/+38
\| \| \| \| \| \| \| \| \| \|	Widening a PHI requires us to insert a trunc. The logical place for this trunc is in the same BB as the PHI. This is not possible if the BB is terminated by a catchswitch. This fixes PR27133. llvm-svn: 264926
*	test: Remove a test for a transform that hasn't existed in 5 years.	Justin Bogner	2016-03-30	3	-33/+0
\| \| \| \| \| \| \| \| \| \|	The TailDup transform was removed in r138841 in 2011, along with most of the tests for it. This test, however, was missed. Probably because it had already been XFAIL'd for 3 years at that point (since r52243!) and continued to fail when the opt flag for -tailduplicate stopped being valid. llvm-svn: 264916
*	[LoopVectorize] Don't vectorize loops when everything will be scalarized	Hal Finkel	2016-03-30	2	-0/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There's aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904
*	[VectorUtils] Don't try and truncate PHIs to a smaller bitwidth	James Molloy	2016-03-30	1	-0/+58
\| \| \| \| \| \| \| \| \| \|	We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means. However, we weren't bailing out in all the places we should have, and we ended up by returning a PHI to be truncated, which has caused PR27018. This fixes PR17018. llvm-svn: 264852
*	[MemorySSA] Make the visitor more careful with calls.	George Burgess IV	2016-03-30	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to this patch, the MemorySSA caching visitor would cache all calls that it visited. When paired with phi optimization, this can be problematic. Consider: define void @foo() { ; 1 = MemoryDef(liveOnEntry) call void @clobberFunction() br i1 undef, label %if.end, label %if.then if.then: ; MemoryUse(??) call void @readOnlyFunction() ; 2 = MemoryDef(1) call void @clobberFunction() br label %if.end if.end: ; 3 = MemoryPhi(...) ; MemoryUse(?) call void @readOnlyFunction() ret void } When optimizing MemoryUse(?), we visit defs 1 and 2, so we note to cache them later. We ultimately end up not being able to optimize passed the Phi, so we set MemoryUse(?) to point to the Phi. We then cache the clobbering call for def 1 to be the Phi. This commit changes this behavior so that we wipe out any calls added to VisistedCalls while visiting the defs of a phi we couldn't optimize. Aside: With this patch, we now can bootstrap clang/LLVM without a single MemorySSA verifier failure. Woohoo. :) llvm-svn: 264820
*	[PGO] Handle invoke inst in IR based icall instrumentation	Xinliang David Li	2016-03-30	1	-0/+42
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D18580 llvm-svn: 264818
*	[MemorySSA] Change how the walker views/walks visited phis.	George Burgess IV	2016-03-30	2	-5/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch teaches the caching MemorySSA walker a few things: 1. Not to walk Phis we've walked before. It seems that we tried to do this before, but it didn't work so well in cases like: define void @foo() { %1 = alloca i8 %2 = alloca i8 br label %begin begin: ; 3 = MemoryPhi({%0,liveOnEntry},{%end,2}) ; 1 = MemoryDef(3) store i8 0, i8* %2 br label %end end: ; MemoryUse(?) load i8, i8* %1 ; 2 = MemoryDef(1) store i8 0, i8* %2 br label %begin } Because we wouldn't put Phis in Q.Visited until we tried to visit them. So, when trying to optimize MemoryUse(?): - We would visit 3 above - ...Which would make us put {%0,liveOnEntry} in Q.Visited - ...Which would make us visit {%0,liveOnEntry} - ...Which would make us put {%end,2} in Q.Visited - ...Which would make us visit {%end,2} - ...Which would make us visit 3 - ...Which would realize we've already visited everything in 3 - ...Which would make us conservatively return 3. In the added test-case, (@looped_visitedonlyonce) this behavior would cause us to give incorrect results. Specifically, we'd visit 4 twice in the same query, but on the second visit, we'd skip while.cond because it had been visited, visit if.then/if.then2, and cache "1" as the clobbering def on the way back. 2. If we try to walk the defs of a {Phi,MemLoc} and see it has been visited before, just hand back the Phi we're trying to optimize. I promise this isn't as terrible as it seems. :) We now insert {Phi,MemLoc} pairs just before walking the Phi's upward defs. So, we check the cache for the {Phi,MemLoc} pair before checking if we've already walked the Phi. The {Phi,MemLoc} pair is (almost?) always guaranteed to have a cache entry if we've already fully walked it, because we cache as we go. So, if the {Phi,MemLoc} pair isn't in cache, either: (a) we must be in the process of visiting it (in which case, we can't give a better answer in a cache-as-we-go DFS walker) (b) we visited it, but didn't cache it on the way back (...which seems to require `ModifyingAccess` to not dominate `StartingAccess`, so I'm 99% sure that would be an error. If it's not an error, I haven't been able to get it to happen locally, so I suspect it's rare.) - - - - - As a consequence of this change, we no longer skip upward defs of phis, so we can kill the `VisitedOnlyOne` check. This gives us better accuracy than we had before, at the cost of potentially doing a bit more work when we have a loop. llvm-svn: 264814
*	[LoopDataPrefetch] Centralize the tuning cl::opts under the pass	Adam Nemet	2016-03-29	1	-1/+1
\| \| \| \| \| \| \| \| \|	This is effectively NFC, minus the renaming of the options (-cyclone-prefetch-distance -> -prefetch-distance). The change was requested by Tim in D17943. llvm-svn: 264806
*	ADCE: Remove debug info intrinsics in dead scopes	Duncan P. N. Exon Smith	2016-03-29	1	-0/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During ADCE, track which debug info scopes still have live references from the code, and delete debug info intrinsics for the dead ones. These intrinsics describe the locations of variables (in registers or stack slots). If there's no code left corresponding to a variable's scope, then there's no way to reference the variable in the debugger and it doesn't matter what its value is. I add a DEBUG printout when the described location in an SSA register, in case it helps some trying to track down why locations get lost. However, we still delete these; the scope itself isn't attached to any real code, so the ship has already sailed. llvm-svn: 264800
*	Upgrade some wildly anachronistic debug info in testcases.	Adrian Prantl	2016-03-29	7	-18/+11
\| \| \| \|	llvm-svn: 264797
*	Add support for no-jump-tables	Nirav Dave	2016-03-29	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add function soft attribute to the generation of Jump Tables in CodeGen as initial step towards clang support of gcc's no-jump-table support Reviewers: hans, echristo Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18321 llvm-svn: 264756
*	fixed typo - CHECK-LABEL	Elena Demikhovsky	2016-03-29	1	-1/+1
\| \| \| \|	llvm-svn: 264702
*	[SimlifyCFG] Prevent passes from destroying canonical loop structure, ↵	Hyojin Sung	2016-03-29	5	-42/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	especially for nested loops When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes is currently used to recognize potential loops of which the block is the header and keep the block. However, the current algorithm fails if the loops' exit condition is evaluated only with volatile values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified forms and loop vectorization). The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. llvm-svn: 264697
*	regenerate checks	Sanjay Patel	2016-03-28	3	-74/+107
\| \| \| \|	llvm-svn: 264677
*	Add an IR Verifier check for orphaned DICompileUnits.	Adrian Prantl	2016-03-28	1	-2/+2
\| \| \| \| \| \| \| \| \|	A DICompileUnit that is not listed in llvm.dbg.cu will cause assertion failures and/or crashes in the backend. The Verifier should reject this. rdar://problem/25369499 llvm-svn: 264657
*	[LVers] Change CHECK_LABEL to CHECK-LABEL (underscore->dash)	Adam Nemet	2016-03-28	1	-1/+1
\| \| \| \| \| \|	Thanks to Sanjoy for catching this. llvm-svn: 264656
*	Revert "[SimlifyCFG] Prevent passes from destroying canonical loop ↵	Reid Kleckner	2016-03-28	4	-40/+40
\| \| \| \| \| \| \| \| \| \|	structure, especially for nested loops" This reverts commit r264596. It does not compile. llvm-svn: 264604
*	[SimlifyCFG] Prevent passes from destroying canonical loop structure, ↵	Hyojin Sung	2016-03-28	4	-40/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	especially for nested loops When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes is currently used to recognize potential loops of which the block is the header and keep the block. However, the current algorithm fails if the loops' exit condition is evaluated only with volatile values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified forms and loop vectorization). The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. llvm-svn: 264596
*	[SimplifyLibCalls] Transform printf("%s", "a") -> putchar('a').	Davide Italiano	2016-03-28	1	-0/+12
\| \| \| \|	llvm-svn: 264588
*	llvm/test/Transforms/FunctionImport/funcimport.ll: -stats REQUIRES +Asserts.	NAKAMURA Takumi	2016-03-28	1	-0/+2
\| \| \| \|	llvm-svn: 264561
*	Use DAG check to try to appease bot	Teresa Johnson	2016-03-27	1	-1/+1
\| \| \| \| \| \| \| \|	Try to appease http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/34772. This was the only check that didn't use DAG and it wasn't found. llvm-svn: 264538
*	[ThinLTO] Add optional import message and statistics	Teresa Johnson	2016-03-27	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add a statistic to count the number of imported functions. Also, add a new -print-imports option to emit a trace of imported functions, that works even for an NDEBUG build. Note that emitOptimizationRemark does not work for the above printing as it expects a Function object and DebugLoc, neither of which we have with summary-based importing. This is part 2 of D18487, the first part was committed separately as r264536. Reviewers: joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18487 llvm-svn: 264537
*	[Verifier] Reject PHIs using defs from own block.	Michael Kruse	2016-03-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reject the following IR as malformed (assuming that %entry, %next are not in a loop): next: %y = phi i32 [ 0, %entry ] %x = phi i32 [ %y, %entry ] Such PHI nodes came up in PR26718. While there was no consensus on whether or not this is valid IR, most opinions on that bug and in a discussion on the llvm-dev mailing list tended towards a "strict interpretation" (term by Joseph Tremoulet) of PHI node uses. Also, the language reference explicitly states that "the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block" and `DominatorTree::dominates(Instruction*, Use&)` uses this definition as well. For the code mentioned in PR15384, clang does not compile to such PHIs (anymore?). The test case still hangs when replacing `%tmp6` with `%tmp` in revisions before r176366 (where PR15384 has been fixed). The occurrence of %tmp6 therefore was probably unintentional. Its value is not used except in other PHIs. Reviewers: majnemer, reames, JosephTremoulet, bkramer, grosser, jdoerfert, kparzysz, sanjoy Differential Revision: http://reviews.llvm.org/D18443 llvm-svn: 264528
*	[SimplifyCFG] propagate branch metadata when creating select (PR26636)	Sanjay Patel	2016-03-26	1	-2/+5
\| \| \| \|	llvm-svn: 264527
*	minimize test cases	Sanjay Patel	2016-03-26	1	-59/+38
\| \| \| \| \| \| \|	These are tests for store transforms. The loads, adds, and geps were irrelevant. llvm-svn: 264526
*	ThinLTO: use the callgraph from the combined index to drive the FunctionImporter	Mehdi Amini	2016-03-26	4	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Now that the summary contains the full reference/call graph, we can replace the existing function importer that loads and inspect the IR to iteratively walk the call graph by a traversal based purely on the summary information. Decouple the actual importing decision from any IR manipulation. Reviewers: tejohnson Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18343 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 264503
*	Allow value forwarding past release fences in GVN	Philip Reames	2016-03-25	2	-0/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-load forwarding across a release fence. We choose to be much more conservative about stores. In theory, nothing prevents us from shifting a store from after a release fence to before it, and then eliminating the preceeding (previously fenced) store. Doing this without actually moving the second store is likely also legal, but we chose to be conservative at this time. The LangRef indicates only atomic loads and stores are effected by fences. This patch chooses to be far more conservative then that. This is the GVN companion to http://reviews.llvm.org/D11434 which applied the same logic in EarlyCSE and has been baking in tree for a while now. Differential Revision: http://reviews.llvm.org/D11436 llvm-svn: 264472
*	[InstSimplify] regenerate checks using a script	Sanjay Patel	2016-03-25	20	-380/+670
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	I didn't notice any significant changes in the actual checks here; all of these tests already used FileCheck, so a script can batch update them in one shot. This commit is just to show the value of automating this process: We have uniform formatting as opposed to a mish-mash of check structure that changes based on individual prefs and the current fashion. This makes it simpler to update when we find a bug or make an enhancement. llvm-svn: 264457
*	[RS4GC] Lower calls to @llvm.experimental.deoptimize	Sanjoy Das	2016-03-25	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This changes RS4GC to lower calls to ``@llvm.experimental.deoptimize`` to gc.statepoints wrapping ``__llvm_deoptimize``, and changes ``callsGCLeafFunction`` to recognize ``@llvm.experimental.deoptimize`` as a non GC leaf function. I've had to hard code the ``"__llvm_deoptimize"`` name in RewriteStatepointsForGC; since ``TargetLibraryInfo`` is available only during codegen. This isn't without precedent in the codebase, so I'm not overtly concerned. llvm-svn: 264456
*	[InstCombine] use FileCheck for better checking	Sanjay Patel	2016-03-25	1	-5/+10
\| \| \| \| \| \|	(testing script for autogeneration of check lines) llvm-svn: 264438
*	[InstCombine] use FileCheck for better checking	Sanjay Patel	2016-03-25	1	-23/+42
\| \| \| \| \| \|	(testing script for autogeneration of check lines) llvm-svn: 264437