bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Minor code cleanup /NFC	Xinliang David Li	2016-03-31	1	-4/+6
\| \| \| \|	llvm-svn: 265025
*	Introduce a @llvm.experimental.guard intrinsic	Sanjoy Das	2016-03-31	4	-5/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: As discussed on llvm-dev[1]. This change adds the basic boilerplate code around having this intrinsic in LLVM: - Changes in Intrinsics.td, and the IR Verifier - A lowering pass to lower @llvm.experimental.guard to normal control flow - Inliner support [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18527 llvm-svn: 264976
*	AMDGPU: Add frexp_exp intrinsic	Matt Arsenault	2016-03-30	1	-5/+16
\| \| \| \|	llvm-svn: 264944
*	AMDGPU: Constant folding for frexp_mant	Matt Arsenault	2016-03-30	1	-0/+14
\| \| \| \|	llvm-svn: 264943
*	Cloning: Reduce complexity of debug info cloning and fix correctness issue.	Peter Collingbourne	2016-03-30	2	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit r260791 contained an error in that it would introduce a cross-module reference in the old module. It also introduced O(N^2) complexity in the module cloner by requiring the entire module to be visited for each function. Fix both of these problems by avoiding use of the CloneDebugInfoMetadata function (which is only designed to do intra-module cloning) and cloning function-attached metadata in the same way that we clone all other metadata. Differential Revision: http://reviews.llvm.org/D18583 llvm-svn: 264935
*	Silencing warnings from MSVC 2015 Update 2. All of these changes silence ↵	Aaron Ballman	2016-03-30	2	-5/+5
\| \| \| \| \| \|	"C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929
*	[IndVarSimplify] Don't insert after a catchswitch	David Majnemer	2016-03-30	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Widening a PHI requires us to insert a trunc. The logical place for this trunc is in the same BB as the PHI. This is not possible if the BB is terminated by a catchswitch. This fixes PR27133. llvm-svn: 264926
*	[PassManager] Make PassManagerBuilder::addExtension take an std::function, ↵	Justin Lebar	2016-03-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rather than a function pointer. Summary: This gives callers flexibility to pass lambdas with captures, which lets callers avoid the C-style void*-ptr closure style. (Currently, callers in clang store state in the PassManagerBuilderBase arg.) No functional change, and the new API is backwards-compatible. Reviewers: chandlerc Subscribers: joker.eph, cfe-commits Differential Revision: http://reviews.llvm.org/D18613 llvm-svn: 264918
*	[LoopVectorize] Don't vectorize loops when everything will be scalarized	Hal Finkel	2016-03-30	1	-18/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There's aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904
*	[PGO] PGOFuncName in LTO optimizations	Rong Xu	2016-03-30	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PGOFuncNames are used as the key to retrieve the Function definition from the MD5 stored in the profile. For internal linkage function, we prefix the source file name to the PGOFuncNames. LTO's internalization privatizes many global linkage symbols. This happens after value profile annotation, but those internal linkage functions should not have a source prefix. To differentiate compiler generated internal symbols from original ones, PGOFuncName meta data are created and attached to the original internal symbols in the value profile annotation step. If a symbol does not have the meta data, its original linkage must be non-internal. Also add a new map that maps PGOFuncName's MD5 value to the function definition. Differential Revision: http://reviews.llvm.org/D17895 llvm-svn: 264902
*	Remove HasFnAttribute guards to getFnAttribute calls	Nirav Dave	2016-03-30	2	-7/+4
\| \| \| \| \| \| \| \| \| \| \| \|	These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872
*	[MemorySSA] Make the visitor more careful with calls.	George Burgess IV	2016-03-30	1	-5/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prior to this patch, the MemorySSA caching visitor would cache all calls that it visited. When paired with phi optimization, this can be problematic. Consider: define void @foo() { ; 1 = MemoryDef(liveOnEntry) call void @clobberFunction() br i1 undef, label %if.end, label %if.then if.then: ; MemoryUse(??) call void @readOnlyFunction() ; 2 = MemoryDef(1) call void @clobberFunction() br label %if.end if.end: ; 3 = MemoryPhi(...) ; MemoryUse(?) call void @readOnlyFunction() ret void } When optimizing MemoryUse(?), we visit defs 1 and 2, so we note to cache them later. We ultimately end up not being able to optimize passed the Phi, so we set MemoryUse(?) to point to the Phi. We then cache the clobbering call for def 1 to be the Phi. This commit changes this behavior so that we wipe out any calls added to VisistedCalls while visiting the defs of a phi we couldn't optimize. Aside: With this patch, we now can bootstrap clang/LLVM without a single MemorySSA verifier failure. Woohoo. :) llvm-svn: 264820
*	[PGO] Handle invoke inst in IR based icall instrumentation	Xinliang David Li	2016-03-30	1	-5/+7
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D18580 llvm-svn: 264818
*	[MemorySSA] Change how the walker views/walks visited phis.	George Burgess IV	2016-03-30	1	-22/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch teaches the caching MemorySSA walker a few things: 1. Not to walk Phis we've walked before. It seems that we tried to do this before, but it didn't work so well in cases like: define void @foo() { %1 = alloca i8 %2 = alloca i8 br label %begin begin: ; 3 = MemoryPhi({%0,liveOnEntry},{%end,2}) ; 1 = MemoryDef(3) store i8 0, i8* %2 br label %end end: ; MemoryUse(?) load i8, i8* %1 ; 2 = MemoryDef(1) store i8 0, i8* %2 br label %begin } Because we wouldn't put Phis in Q.Visited until we tried to visit them. So, when trying to optimize MemoryUse(?): - We would visit 3 above - ...Which would make us put {%0,liveOnEntry} in Q.Visited - ...Which would make us visit {%0,liveOnEntry} - ...Which would make us put {%end,2} in Q.Visited - ...Which would make us visit {%end,2} - ...Which would make us visit 3 - ...Which would realize we've already visited everything in 3 - ...Which would make us conservatively return 3. In the added test-case, (@looped_visitedonlyonce) this behavior would cause us to give incorrect results. Specifically, we'd visit 4 twice in the same query, but on the second visit, we'd skip while.cond because it had been visited, visit if.then/if.then2, and cache "1" as the clobbering def on the way back. 2. If we try to walk the defs of a {Phi,MemLoc} and see it has been visited before, just hand back the Phi we're trying to optimize. I promise this isn't as terrible as it seems. :) We now insert {Phi,MemLoc} pairs just before walking the Phi's upward defs. So, we check the cache for the {Phi,MemLoc} pair before checking if we've already walked the Phi. The {Phi,MemLoc} pair is (almost?) always guaranteed to have a cache entry if we've already fully walked it, because we cache as we go. So, if the {Phi,MemLoc} pair isn't in cache, either: (a) we must be in the process of visiting it (in which case, we can't give a better answer in a cache-as-we-go DFS walker) (b) we visited it, but didn't cache it on the way back (...which seems to require `ModifyingAccess` to not dominate `StartingAccess`, so I'm 99% sure that would be an error. If it's not an error, I haven't been able to get it to happen locally, so I suspect it's rare.) - - - - - As a consequence of this change, we no longer skip upward defs of phis, so we can kill the `VisitedOnlyOne` check. This gives us better accuracy than we had before, at the cost of potentially doing a bit more work when we have a loop. llvm-svn: 264814
*	[LoopDataPrefetch] Centralize the tuning cl::opts under the pass	Adam Nemet	2016-03-29	1	-4/+35
\| \| \| \| \| \| \| \| \|	This is effectively NFC, minus the renaming of the options (-cyclone-prefetch-distance -> -prefetch-distance). The change was requested by Tim in D17943. llvm-svn: 264806
*	[tsan] Do not instrument reads/writes to instruction profile counters.	Anna Zaks	2016-03-29	1	-1/+25
\| \| \| \| \| \| \| \| \|	We have known races on profile counters, which can be reproduced by enabling -fsanitize=thread and -fprofile-instr-generate simultaneously on a multi-threaded program. This patch avoids reporting those races by not instrumenting the reads and writes coming from the instruction profiler. llvm-svn: 264805
*	ADCE: Remove debug info intrinsics in dead scopes	Duncan P. N. Exon Smith	2016-03-29	1	-6/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During ADCE, track which debug info scopes still have live references from the code, and delete debug info intrinsics for the dead ones. These intrinsics describe the locations of variables (in registers or stack slots). If there's no code left corresponding to a variable's scope, then there's no way to reference the variable in the debugger and it doesn't matter what its value is. I add a DEBUG printout when the described location in an SSA register, in case it helps some trying to track down why locations get lost. However, we still delete these; the scope itself isn't attached to any real code, so the ship has already sailed. llvm-svn: 264800
*	[LoopDataPrefetch] Make more member functions private, NFC.	Adam Nemet	2016-03-29	1	-1/+2
\| \| \| \|	llvm-svn: 264798
*	[ThinLTO] Remove post-pass metadata linking support	Teresa Johnson	2016-03-29	1	-64/+21
\| \| \| \| \| \| \| \| \| \| \|	Since we have moved to a model where functions are imported in bulk from each source module after making summary-based importing decisions, there is no longer a need to link metadata as a postpass, and all users have been removed. This essentially reverts r255909 and follow-on fixes. llvm-svn: 264763
*	Make InlineSimple's one-arg constructor explicit. NFC	Justin Lebar	2016-03-29	1	-1/+2
\| \| \| \|	llvm-svn: 264744
*	Reformat a comment in InlineSimple.cpp. NFC	Justin Lebar	2016-03-29	1	-3/+3
\| \| \| \|	llvm-svn: 264743
*	[ThinLTO] Use new GlobalValue::getGUID helper (NFC)	Teresa Johnson	2016-03-29	1	-2/+1
\| \| \| \| \| \| \|	This was already being used for functions and aliases, was missed when handling global variables. llvm-svn: 264734
*	[SimlifyCFG] Prevent passes from destroying canonical loop structure, ↵	Hyojin Sung	2016-03-29	3	-7/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	especially for nested loops When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes is currently used to recognize potential loops of which the block is the header and keep the block. However, the current algorithm fails if the loops' exit condition is evaluated only with volatile values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified forms and loop vectorization). The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. llvm-svn: 264697
*	Remove personality for declarations in CloneModule.	Evgeniy Stepanov	2016-03-28	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	Personality is copied as part of copyFunctionAttributes, but it is invalid on a declaration. Remove the personality attribute it the function body is not cloned. Also add a verifier run over output modules in the llvm-split tool. llvm-svn: 264667
*	[asan] Support dead code stripping on Mach-O platforms	Ryan Govostes	2016-03-28	1	-12/+108
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On OS X El Capitan and iOS 9, the linker supports a new section attribute, live_support, which allows dead stripping to remove dead globals along with the ASAN metadata about them. With this change __asan_global structures are emitted in a new __DATA,__asan_globals section on Darwin. Additionally, there is a __DATA,__asan_liveness section with the live_support attribute. Each entry in this section is simply a tuple that binds together the liveness of a global variable and its ASAN metadata structure. Thus the metadata structure will be alive if and only if the global it references is also alive. Review: http://reviews.llvm.org/D16737 llvm-svn: 264645
*	Revert "[SimlifyCFG] Prevent passes from destroying canonical loop ↵	Reid Kleckner	2016-03-28	3	-34/+8
\| \| \| \| \| \| \| \| \| \|	structure, especially for nested loops" This reverts commit r264596. It does not compile. llvm-svn: 264604
*	[SimlifyCFG] Prevent passes from destroying canonical loop structure, ↵	Hyojin Sung	2016-03-28	3	-8/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	especially for nested loops When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes is currently used to recognize potential loops of which the block is the header and keep the block. However, the current algorithm fails if the loops' exit condition is evaluated only with volatile values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested loop, the loop is collapsed into a single loop which prevent later optimizations from being applied (e.g., transforming nested loops into simplified forms and loop vectorization). The patch augments the existing PHI node-based check by adding a pre-test if the BB actually belongs to a set of loop headers and not eliminating it if yes. llvm-svn: 264596
*	[PGO] Don't set the function hotness attribute when populating counters	Rong Xu	2016-03-28	1	-21/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Don't set the function hotness attribute on the fly. This changes the CFG branch probability of the caller function, which leads to inconsistent BB ordering. This patch moves the attribute setting to a separated loop after the counts in all functions are populated. Fixes PR27024 - PGO instrumentation profile data is not reflected in correct basic blocks. Differential Revision: http://reviews.llvm.org/D18491 llvm-svn: 264594
*	[SimplifyLibCalls] Transform printf("%s", "a") -> putchar('a').	Davide Italiano	2016-03-28	1	-0/+18
\| \| \| \|	llvm-svn: 264588
*	[SROA] Fix typo in comment	Hal Finkel	2016-03-28	1	-1/+1
\| \| \| \|	llvm-svn: 264573
*	C++11 is required, remove some preprocessor checks for it	Hal Finkel	2016-03-28	1	-3/+3
\| \| \| \| \| \| \|	We require C++11 to build, so remove a few remaining preprocessor checks for '__cplusplus >= 201103L'. This should always be true. llvm-svn: 264572
*	[ThinLTO] Add optional import message and statistics	Teresa Johnson	2016-03-27	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add a statistic to count the number of imported functions. Also, add a new -print-imports option to emit a trace of imported functions, that works even for an NDEBUG build. Note that emitOptimizationRemark does not work for the above printing as it expects a Function object and DebugLoc, neither of which we have with summary-based importing. This is part 2 of D18487, the first part was committed separately as r264536. Reviewers: joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18487 llvm-svn: 264537
*	[ThinLTO] Don't try to import alias unless aliasee can be imported	Teresa Johnson	2016-03-27	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With r264503, aliases are now being added to the GlobalsToImport set even when their aliasees can't be imported due to their linkage type. While the importing worked correctly (the aliases imported as declarations) due to the logic in doImportAsDefinition, there is no point to adding them to the GlobalsToImport set. Additionally, with D18487 it was resulting in incorrectly printing a message indicating that the alias was imported. To avoid this, delay adding aliases to the GlobalsToImport set until after the linkage type of the aliasee is checked. This patch is part of D18487. llvm-svn: 264536
*	[SimplifyCFG] propagate branch metadata when creating select (PR26636)	Sanjay Patel	2016-03-26	1	-2/+2
\| \| \| \|	llvm-svn: 264527
*	ThinLTO: use the callgraph from the combined index to drive the FunctionImporter	Mehdi Amini	2016-03-26	1	-249/+254
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Now that the summary contains the full reference/call graph, we can replace the existing function importer that loads and inspect the IR to iteratively walk the call graph by a traversal based purely on the summary information. Decouple the actual importing decision from any IR manipulation. Reviewers: tejohnson Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18343 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 264503
*	[RS4GC] Lower calls to @llvm.experimental.deoptimize	Sanjoy Das	2016-03-25	2	-10/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This changes RS4GC to lower calls to ``@llvm.experimental.deoptimize`` to gc.statepoints wrapping ``__llvm_deoptimize``, and changes ``callsGCLeafFunction`` to recognize ``@llvm.experimental.deoptimize`` as a non GC leaf function. I've had to hard code the ``"__llvm_deoptimize"`` name in RewriteStatepointsForGC; since ``TargetLibraryInfo`` is available only during codegen. This isn't without precedent in the codebase, so I'm not overtly concerned. llvm-svn: 264456
*	Enable non-power-of-2 #pragma unroll counts.	David L Kreitzer	2016-03-25	2	-26/+35
\| \| \| \| \| \| \| \|	Patch by Evgeny Stupachenko. Differential Revision: http://reviews.llvm.org/D18202 llvm-svn: 264407
*	[LoopStrengthReduce] Don't hoist into a catchswitch	David Majnemer	2016-03-24	1	-1/+6
\| \| \| \| \| \| \| \|	We try to hoist the insertion point as high as possible to encourage sharing. However, we must be careful not to hoist into a catchswitch as it is both an EHPad and a terminator. llvm-svn: 264344
*	[LLE] Check for mismatching types between the store and the load earlier	Adam Nemet	2016-03-24	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	isDependenceDistanceOfOne asserts that the store and the load access through the same type. This function is also used by removeDependencesFromMultipleStores so we need to make sure we filter out mismatching types before reaching this point. Now we do this when the initial candidates are gathered. This is a refinement of the fix made in r262267. Fixes PR27048. llvm-svn: 264313
*	[sancov] code readability improvement.	Mike Aizatsky	2016-03-23	1	-11/+26
\| \| \| \| \| \| \| \|	Summary: Reply to http://reviews.llvm.org/D18341 Differential Revision: http://reviews.llvm.org/D18406 llvm-svn: 264213
*	Fix bugs in the MemorySSA walker.	George Burgess IV	2016-03-23	1	-17/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are a few bugs in the walker that this patch addresses. Primarily: - Caching can break when we have multiple BBs without phis - We weren't optimizing some phis properly - Because of how the DFS iterator works, there were times where we wouldn't cache any results of our DFS I left the test cases with FIXMEs in, because I'm not sure how much effort it will take to get those to work (read: We'll probably ultimately have to end up redoing the walker, or we'll have to come up with some creative caching tricks), and more test coverage = better. Differential Revision: http://reviews.llvm.org/D18065 llvm-svn: 264180
*	Minor code cleanup. NFC.	Junmo Park	2016-03-23	1	-1/+1
\| \| \| \|	llvm-svn: 264124
*	[ModuleUtils] Use range-based loop. NFC.	Davide Italiano	2016-03-23	1	-2/+1
\| \| \| \|	llvm-svn: 264122
*	[LoopVersioning] Relax an assert for LCSSA PHIs	Adam Nemet	2016-03-22	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \|	When you have multiple LCSSA (single-operand) PHIs that are converted into two-operand PHIs due to versioning, only assert that the PHI currently being converted has a single operand. I.e. we don't want to check PHIs that were converted earlier in the loop. Fixes PR27023. Thanks to Karl-Johan Karlsson for the minimized testcase! llvm-svn: 264081
*	[PATCH] Force LoopReroll to reset the loop trip count value after reroll.	Zinovy Nis	2016-03-22	1	-5/+8
\| \| \| \| \| \| \| \| \| \| \|	It's a bug fix. For rerolled loops SE trip count remains unchanged. It leads to incorrect work of the next passes. My patch just resets SE info for rerolled loop forcing SE to re-evaluate it next time it requested. I also added a verifier call in the exisitng test to be sure no invalid SE data remain. Without my fix this test would fail with -verify-scev. Differential Revision: http://reviews.llvm.org/D18316 llvm-svn: 264051
*	[sancov] do not instrument nodes that are full pre-dominators	Mike Aizatsky	2016-03-21	1	-10/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Without tree pruning clang has 2,667,552 points. Wiht only dominators pruning: 1,515,586. With both dominators & predominators pruning: 1,340,534. Resubmit of r262103. Differential Revision: http://reviews.llvm.org/D18341 llvm-svn: 264003
*	[MemorySSA] Consider def-only BBs for live-in calculations.	George Burgess IV	2016-03-21	1	-6/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we have a BB with only MemoryDefs, live-in calculations will ignore it. This means we get results like this: define void @foo(i8* %p) { ; 1 = MemoryDef(liveOnEntry) store i8 0, i8* %p br i1 undef, label %if.then, label %if.end if.then: ; 2 = MemoryDef(1) store i8 1, i8* %p br label %if.end if.end: ; 3 = MemoryDef(1) store i8 2, i8* %p ret void } ...When there should be a MemoryPhi in the `if.end` BB. This patch fixes that behavior. llvm-svn: 263991
*	[SLP] Remove unnecessary member variables by using container APIs.	Chad Rosier	2016-03-21	1	-13/+6
\| \| \| \| \| \| \|	This changes the debug output, but still retains its usefulness. Differential Revision: http://reviews.llvm.org/D18324 llvm-svn: 263975
*	[SimplifyLibCalls] Only consider sinpi/cospi functions within the same function	David Majnemer	2016-03-19	1	-7/+11
\| \| \| \| \| \| \| \| \| \|	The sinpi/cospi can be replaced with sincospi to remove unnecessary computations. However, we need to make sure that the calls are within the same function! This fixes PR26993. llvm-svn: 263875
*	[InstCombine] Don't insert instructions before a catch switch	David Majnemer	2016-03-19	1	-0/+3
\| \| \| \| \| \| \| \| \|	CatchSwitches are not splittable, we cannot insert casts, etc. before them. This fixes PR26992. llvm-svn: 263874