bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Transforms: sort source files in build	Saleem Abdulrasool	2014-11-08	1	-4/+4
\| \| \| \| \| \|	Sort target sources. NFC. llvm-svn: 221563
*	Transforms: use typedef rather than using aliases	Saleem Abdulrasool	2014-11-07	1	-26/+25
\| \| \| \| \| \| \| \|	Visual Studio 2012 apparently does not support using alias declarations. Use the more traditional typedef approach. This should let the Windows buildbots pass. NFC. llvm-svn: 221554
*	Transform: add SymbolRewriter pass	Saleem Abdulrasool	2014-11-07	2	-0/+544
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This introduces the symbol rewriter. This is an IR->IR transformation that is implemented as a CodeGenPrepare pass. This allows for the transparent adjustment of the symbols during compilation. It provides a clean, simple, elegant solution for symbol interpositioning. This technique is widely used, for example in the various sanitizers and in performance analysis. It is controlled by a map file in a custom YAML syntax that indicates the source to destination mapping, so that the compiler does not need to know the exact details of the source to destination transformations. llvm-svn: 221548
*	Fix heap-use-after-free bug in expandSDiv when the operands are	Michael Ilseman	2014-11-05	1	-6/+10
\| \| \| \| \| \| \| \|	constants, as discovered by ASAN. Patch by Mehdi Amini! llvm-svn: 221401
*	Revert earlier change removing setPreservesCFG from instcombine (r221223) and	Mark Heffernan	2014-11-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	change LoopSimplifyPass to be !isCFGOnly. The motivation for the earlier patch (r221223) was that LoopSimplify is not preserved by instcombine though setPreservesCFG indicates that it is. This change fixes the issue by making setPreservesCFG no longer imply LoopSimplifyPass, and is therefore less invasive. llvm-svn: 221311
*	Revert "Transforms: reapply SVN r219899"	Reid Kleckner	2014-11-04	1	-16/+3
\| \| \| \| \| \| \|	This reverts commit r220811 and r220839. It made an incorrect change to musttail handling. llvm-svn: 221226
*	IR: MDNode => Value: Instruction::getAllMetadataOtherThanDebugLoc()	Duncan P. N. Exon Smith	2014-11-03	1	-2/+2
\| \| \| \| \| \| \|	Change `Instruction::getAllMetadataOtherThanDebugLoc()` from a vector of `MDNode` to one of `Value`. Part of PR21433. llvm-svn: 221167
*	IR: MDNode => Value: Instruction::getAllMetadata()	Duncan P. N. Exon Smith	2014-11-01	1	-5/+4
\| \| \| \| \| \| \|	Change `Instruction::getAllMetadata()` to modify a vector of `Value` instead of `MDNode` and update call sites. This is part of PR21433. llvm-svn: 221027
*	IR: MDNode => Value: Instruction::getMetadata()	Duncan P. N. Exon Smith	2014-11-01	3	-23/+23
\| \| \| \| \| \| \| \| \| \|	Change `Instruction::getMetadata()` to return `Value` as part of PR21433. Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`. llvm-svn: 221024
*	Transforms: reapply SVN r219899	Saleem Abdulrasool	2014-10-28	1	-3/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This restores the commit from SVN r219899 with an additional change to ensure that the CodeGen is correct for the case that was identified as being incorrect (originally PR7272). In the case that during inlining we need to synthesize a value on the stack (i.e. for passing a value byval), then any function involving that alloca must be stripped of its tailness as the restriction that it does not access the parent's stack no longer holds. Unfortunately, a single alloca can cause a rippling effect throughout the inlining as the value may be aliased or may be mutated through an escaped external call. As such, we simply track if an alloca has been introduced in the frame during inlining, and strip any tail calls. llvm-svn: 220811
*	Untabify and whitespace cleanups.	NAKAMURA Takumi	2014-10-28	1	-4/+4
\| \| \| \|	llvm-svn: 220771
*	Handle sqrt() shrinking in SimplifyLibCalls like any other call	Sanjay Patel	2014-10-23	1	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes a chunk of special case logic for folding (float)sqrt((double)x) -> sqrtf(x) in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls. No functional change intended, but I loosened the restriction on the existing sqrt testcases to allow for this optimization even without unsafe-fp-math because that's the existing behavior. I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic in case the result is used as a double. Differential Revision: http://reviews.llvm.org/D5919 llvm-svn: 220514
*	Preserving 'nonnull' metadata in SimplifyCFG	Philip Reames	2014-10-22	1	-1/+4
\| \| \| \| \| \| \| \| \| \|	When we hoist two loads above an if, we can preserve the nonnull metadata. We could also do the same for sinking them, but we appear to not handle metadata at all in that case. Thanks to Hal for the review. Differential Revision: http://reviews.llvm.org/D5910 llvm-svn: 220392
*	Shrinkify libcalls: use float versions of double libm functions with ↵	Sanjay Patel	2014-10-22	1	-3/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fast-math (bug 17850) When a call to a double-precision libm function has fast-math semantics (via function attribute for now because there is no IR-level FMF on calls), we can avoid fpext/fptrunc operations and use the float version of the call if the input and output are both float. We already do this optimization using a command-line option; this patch just adds the ability for fast-math to use the existing functionality. I moved the cl::opt from InstructionCombining into SimplifyLibCalls because it's only ever used internally to that class. Modified the existing test cases to use the unsafe-fp-math attribute rather than repeating all tests. This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850 Differential Revision: http://reviews.llvm.org/D5893 llvm-svn: 220390
*	Teach combineMetadata how to merge 'nonnull' metadata.	Philip Reames	2014-10-21	1	-0/+4
\| \| \| \| \| \|	combineMetadata is used when merging two instructions into one. This change teaches it how to merge 'nonnull' - i.e. only preserve it on the new instruction if it's set on both sources. This isn't actually used yet since I haven't adjusted any of the call sites to pass in nonnull as a 'known metadata'. llvm-svn: 220325
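The merge rule described above (keep 'nonnull' only if both source instructions carry it) can be modelled in a few lines. This is a minimal sketch, not LLVM's implementation: the function name `combine_nonnull` and the use of plain dictionaries for instruction metadata are assumptions made for illustration.

```python
from typing import Dict


def combine_nonnull(kept_md: Dict[str, object],
                    dropped_md: Dict[str, object]) -> Dict[str, object]:
    """Hypothetical model of merging metadata when two instructions become one.

    `kept_md` is the metadata of the surviving instruction and `dropped_md`
    that of the instruction being replaced. 'nonnull' survives only when it
    is present on both sources.
    """
    merged = dict(kept_md)
    if "nonnull" in merged and "nonnull" not in dropped_md:
        # Only one source promised a non-null result, so the merged
        # instruction can no longer make that promise.
        del merged["nonnull"]
    return merged
```

Dropping the flag when either side lacks it is the conservative choice: the merged instruction stands in for both originals, so it may only claim what both claimed.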
*	Do not attribute static allocas to the call site's DebugLoc.	Paul Robinson	2014-10-21	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	When functions are inlined, instructions without debug information are attributed to the call site's DebugLoc. After inlining, inlined static allocas are moved to the caller's entry block, adjacent to the caller's original static alloca instructions. By retaining the call site's DebugLoc, these instructions could cause instructions that were subsequently inserted at the entry block to pick up the same DebugLoc. Patch by Wolfgang Pieb! llvm-svn: 220255
*	fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)	Sanjay Patel	2014-10-16	1	-1/+87
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a square root call has an FP multiplication argument that can be reassociated, then we can hoist a repeated factor out of the square root call and into a fabs(). In the simplest case, this: y = sqrt(x * x); becomes this: y = fabs(x); This patch relies on an earlier optimization in instcombine or reassociate to put the multiplication tree into a canonical form, so we don't have to search over every permutation of the multiplication tree. Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to use function-level attributes to do this optimization. This needs to be fixed for both the intrinsics and in the backend. Differential Revision: http://reviews.llvm.org/D5787 llvm-svn: 219944
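A small numeric model may help show why this fold needs fast-math style permission. The two helper names below are made up for illustration; they simply evaluate the expression before and after the rewrite using Python floats (IEEE doubles).

```python
import math


def sqrt_naive(x: float, y: float) -> float:
    """The original form: sqrt(x * x * y)."""
    return math.sqrt(x * x * y)


def sqrt_hoisted(x: float, y: float) -> float:
    """The rewritten form: fabs(x) * sqrt(y)."""
    return abs(x) * math.sqrt(y)


# On ordinary inputs the two forms agree up to rounding.
for x, y in [(3.0, 4.0), (-2.5, 9.0), (0.0, 7.0)]:
    assert math.isclose(sqrt_naive(x, y), sqrt_hoisted(x, y))

# They can differ when the intermediate product overflows: x * x becomes
# infinity in the original form, while the rewritten form stays finite.
big, tiny = 1e200, 1e-300
overflowed = sqrt_naive(big, tiny)
finite = sqrt_hoisted(big, tiny)
```

Because the results are not bit-identical for every input, the rewrite is only valid when reassociation of the FP multiplication is allowed, which matches the commit's reliance on function-level attributes.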
*	Preserve non-byval pointer alignment attributes using @llvm.assume when inlining	Hal Finkel	2014-10-15	1	-0/+45
\| \| \| \| \| \| \| \| \|	For pointer-typed function arguments, enhanced alignment can be asserted using the 'align' attribute. When inlining, if this enhanced alignment information is not otherwise available, preserve it using @llvm.assume-based alignment assumptions. llvm-svn: 219876
*	Optimize away fabs() calls when input is squared (known positive).	Sanjay Patel	2014-10-14	1	-1/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Eliminate library calls and intrinsic calls to fabs when the input is a squared value. Note that no unsafe-math / fast-math assumptions are needed for this optimization. Differential Revision: http://reviews.llvm.org/D5777 llvm-svn: 219717
*	Switch to select optimization for two-case switches	Marcello Maggioni	2014-10-14	1	-0/+170
\| \| \| \| \| \| \|	This is the same optimization as r219223, with modifications to support PHIs with multiple incoming edges from the same block and a test to check that this condition is handled. llvm-svn: 219656
*	Revert r219223, it creates invalid PHI nodes.	Joerg Sonnenberger	2014-10-12	1	-170/+0
\| \| \| \|	llvm-svn: 219587
*	SimplifyCFG: Don't convert phis into selects if we could remove undef behavior	Arnold Schwaighofer	2014-10-10	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	instead
	We used to transform this:
	  define void @test6(i1 %cond, i8* %ptr) {
	  entry:
	    br i1 %cond, label %bb1, label %bb2
	  bb1:
	    br label %bb2
	  bb2:
	    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
	    store i8 2, i8* %ptr.2, align 8
	    ret void
	  }
	into this:
	  define void @test6(i1 %cond, i8* %ptr) {
	    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
	    store i8 2, i8* %ptr.2, align 8
	    ret void
	  }
	because the simplifycfg transformation into selects happened to run before the simplifycfg transformation that removes unreachable control flow (we have "unreachable control flow" because the store to null is undefined behavior). The existing transformation that removes unreachable control flow in simplifycfg is:
	  /// If BB has an incoming value that will always trigger undefined behavior
	  /// (eg. null pointer dereference), remove the branch leading here.
	  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)
	Now we generate:
	  define void @test6(i1 %cond, i8* %ptr) {
	    store i8 2, i8* %ptr, align 8
	    ret void
	  }
	I did not see any impact on the test-suite + externals. rdar://18596215 llvm-svn: 219462
*	LoopUnroll: Create sub-loops in LoopInfo	Duncan P. N. Exon Smith	2014-10-07	1	-1/+29
\| \| \| \| \| \| \| \| \| \| \| \| \|	`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so. In particular, tell `LoopInfo` about copies of inner loops when unrolling the outer loop. Conservatively, also tell `ScalarEvolution` to forget about the original versions of these loops, since their inputs may have changed. Fixes PR20987. llvm-svn: 219241
*	LoopUnroll: Only check for ScalarEvolution analysis once, NFC	Duncan P. N. Exon Smith	2014-10-07	1	-7/+4
\| \| \| \| \| \| \|	A follow-up commit will add use to a tight loop. We might as well just find it once anyway. llvm-svn: 219239
*	Two case switch to select optimization	Marcello Maggioni	2014-10-07	1	-0/+170
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block to a select or a couple of selects (depending on whether the default block is reachable). The typical case this optimization wants to be able to optimize is this one:
	Example:
	  switch (a) {
	  case 10:                %0 = icmp eq i32 %a, 10
	    return 10;            %1 = select i1 %0, i32 10, i32 4
	  case 20:        ---->   %2 = icmp eq i32 %a, 20
	    return 2;             %3 = select i1 %2, i32 2, i32 %1
	  default:
	    return 4;
	  }
	It also sets the base for further optimizations that are planned and being reviewed. llvm-svn: 219223
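The example in this commit message can be checked with a small model. The sketch below mirrors the switch on one side and the two chained selects on the other; the helper `select` and both function names are illustrative stand-ins, not LLVM APIs.

```python
def switch_form(a: int) -> int:
    """The source-level switch: two cases plus a default."""
    if a == 10:
        return 10
    if a == 20:
        return 2
    return 4


def select(cond: bool, if_true: int, if_false: int) -> int:
    """Stand-in for the IR `select` instruction: no control flow involved."""
    return if_true if cond else if_false


def select_form(a: int) -> int:
    """The branch-free form: one compare and one select per case."""
    c0 = a == 10                 # %0 = icmp eq i32 %a, 10
    r1 = select(c0, 10, 4)       # %1 = select i1 %0, i32 10, i32 4
    c2 = a == 20                 # %2 = icmp eq i32 %a, 20
    return select(c2, 2, r1)     # %3 = select i1 %2, i32 2, i32 %1
```

The default value is threaded through as the "false" operand of the first select, so the chain produces it whenever neither comparison holds.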
*	LoopUnroll: Change code order of changes to new basic blocks	Duncan P. N. Exon Smith	2014-10-06	1	-2/+2
\| \| \| \| \| \| \|	Add new basic blocks to `LoopInfo` earlier. No functionality change intended (simplifies upcoming bugfix patch). llvm-svn: 219150
*	Sink comment, NFC	Duncan P. N. Exon Smith	2014-10-06	1	-2/+2
\| \| \| \|	llvm-svn: 219149
*	DIBuilder: Encapsulate DIExpression's element type	Duncan P. N. Exon Smith	2014-10-01	1	-4/+3
\| \| \| \| \| \| \| \|	`DIExpression`'s elements are 64-bit integers that are stored as `ConstantInt`. The accessors already encapsulate the storage. This commit updates the `DIBuilder` API to also encapsulate that. llvm-svn: 218797
*	Move the complex address expression out of DIVariable and into an extra	Adrian Prantl	2014-10-01	1	-20/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
*	Revert r218778 while investigating buildbot breakage.	Adrian Prantl	2014-10-01	1	-19/+20
\| \| \| \| \| \|	"Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
*	Move the complex address expression out of DIVariable and into an extra	Adrian Prantl	2014-10-01	1	-20/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
*	C API: Add LLVMCloneModule()	Tom Stellard	2014-10-01	1	-0/+9
\| \| \| \|	llvm-svn: 218775
*	[SimplifyCFG] threshold for folding branches with common destination	Jingyue Wu	2014-09-30	1	-66/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold. The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example:
	  __global__ void foo(int a, int b, int c, int d, int e, int n, const int *input, int *output) {
	    int sum = 0;
	    for (int i = 0; i < n; ++i)
	      sum += (((i ^ a) > b) && (((i | c) ^ d) > e)) ? 0 : input[i];
	    *output = sum;
	  }
	The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i | c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like:
	  sum += (((i ^ a) > b) & (((i | c) ^ d) > e)) ? 0 : input[i];
	Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%. We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks! Test Plan: Added one test case to check the threshold is in effect Reviewers: nadav, eliben, meheff, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D5529 llvm-svn: 218711
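The fold described in this commit message trades a short-circuit (two branches) for eager evaluation of both conditions (one branch). A minimal host-side model, with made-up function names, shows that the two forms compute the same sum; it says nothing about GPU performance, which is the commit's actual motivation.

```python
from typing import Sequence


def sum_two_branches(a: int, b: int, c: int, d: int, e: int,
                     inputs: Sequence[int]) -> int:
    """Short-circuit form: the second comparison runs only if the first holds."""
    total = 0
    for i, value in enumerate(inputs):
        total += 0 if ((i ^ a) > b and ((i | c) ^ d) > e) else value
    return total


def sum_one_branch(a: int, b: int, c: int, d: int, e: int,
                   inputs: Sequence[int]) -> int:
    """Folded form: both comparisons are always computed, then combined with `&`."""
    total = 0
    for i, value in enumerate(inputs):
        folded = ((i ^ a) > b) & (((i | c) ^ d) > e)
        total += 0 if folded else value
    return total
```

The second comparison here has no side effects and cannot trap, which is what makes evaluating it unconditionally safe; the "bonus instructions" counted by the threshold are the extra operations (`i | c` and `^ d`) that must be hoisted to do so.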
*	Use a loop to simplify the runtime unrolling prologue.	Kevin Qin	2014-09-29	1	-118/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Runtime unrolling creates a prologue to execute the extra iterations that are left over when the trip count is not divisible by the unroll factor. It generates an if-then-else sequence to jump into a loop body unrolled factor-1 times, like
	    extraiters = tripcount % loopfactor
	    if (extraiters == 0) jump Loop:
	    if (extraiters == loopfactor) jump L1
	    if (extraiters == loopfactor-1) jump L2
	    ...
	    L1: LoopBody;
	    L2: LoopBody;
	    ...
	    if tripcount < loopfactor jump End
	    Loop:
	    ...
	    End:
	This means that if the unroll factor is 4, the loop body is unrolled 7 times: 3 copies in the loop prologue and 4 in the loop. This commit uses a loop to execute the extra iterations in the prologue, like
	          extraiters = tripcount % loopfactor
	          if (extraiters == 0) jump Loop:
	          else jump Prol
	    Prol: LoopBody;
	          extraiters -= 1                 // Omitted if unroll factor is 2.
	          if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
	          if (tripcount < loopfactor) jump End
	    Loop:
	    ...
	    End:
	Then, when the unroll factor is 4, the loop body is copied only 5 times: 1 in the prologue loop and 4 in the original loop. If the unroll factor is 2, no new loop is created, just as in the original solution. llvm-svn: 218604
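The prologue-loop scheme from this commit message can be sketched as an executable model. The function names below are hypothetical, and the copy-count formulas are a generalization inferred from the factor-4 figures quoted in the message (7 copies before, 5 after), not something the commit states in general.

```python
from typing import Callable, List


def run_with_prologue_loop(tripcount: int, factor: int,
                           body: Callable[[int], None]) -> None:
    """Model of the new scheme: leftover iterations run in a small loop."""
    extraiters = tripcount % factor
    i = 0
    # Prol: a single copy of the body, repeated `extraiters` times.
    while extraiters != 0:
        body(i)
        i += 1
        extraiters -= 1
    # Loop: the main loop, unrolled by `factor`. The inner `for` stands in
    # for `factor` textual copies of the body.
    while i < tripcount:
        for _ in range(factor):
            body(i)
            i += 1


def trace(tripcount: int, factor: int) -> List[int]:
    """Record which iteration indices the model executes, in order."""
    seen: List[int] = []
    run_with_prologue_loop(tripcount, factor, seen.append)
    return seen


def body_copies_old(factor: int) -> int:
    # Inferred: factor - 1 copies in the if-then-else prologue, plus `factor`.
    return (factor - 1) + factor


def body_copies_new(factor: int) -> int:
    # Inferred: one copy in the prologue loop, plus `factor` in the main loop.
    return 1 + factor
```

After the prologue, the remaining iteration count is a multiple of the unroll factor, so the main loop never overshoots the trip count.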
*	GlobalOpt: Preserve comdats of unoptimized initializers	Reid Kleckner	2014-09-23	1	-45/+26
\| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than slurping in and splatting out the whole ctor list, preserve the existing array entries without trying to understand them. Only remove the entries that we know we can optimize away. This way we don't need to wire through priority and comdats or anything else we might add. Fixes a linker issue where the .init_array or .ctors entry would point to discarded initialization code if the comdat group from the TU with the faulty global_ctors entry was dropped. llvm-svn: 218337
*	Fixing a build error.	Chris Bieneman	2014-09-17	1	-1/+1
\| \| \| \|	llvm-svn: 217983
*	Refactoring SimplifyLibCalls to remove static initializers and generally ↵	Chris Bieneman	2014-09-17	1	-1878/+1643
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cleaning up the code. Summary: This eliminates ~200 lines of code mostly file scoped struct definitions that were unnecessary. Reviewers: chandlerc, resistor Reviewed By: resistor Subscribers: morisset, resistor, llvm-commits Differential Revision: http://reviews.llvm.org/D5364 llvm-svn: 217982
*	Remove dead code in SimplifyCFG	Jingyue Wu	2014-09-15	1	-43/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: UsedByBranch is always true according to how BonusInst is defined. Test Plan: Passes check-all, and also verified if (BonusInst && !UsedByBranch) { ... } is never entered during check-all. Reviewers: resistor, nadav, jingyue Reviewed By: jingyue Subscribers: llvm-commits, eliben, meheff Differential Revision: http://reviews.llvm.org/D5324 llvm-svn: 217824
*	Simplify code. No functionality change.	Benjamin Kramer	2014-09-13	1	-15/+3
\| \| \| \|	llvm-svn: 217726
*	Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)	Hal Finkel	2014-09-07	8	-51/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change, which allows @llvm.assume to be used from within computeKnownBits (and other associated functions in ValueTracking), adds some (optional) parameters to computeKnownBits and friends. These functions now (optionally) take a "context" instruction pointer, an AssumptionTracker pointer, and also a DomTree pointer, and most of the changes are just to pass this new information when it is easily available from InstSimplify, InstCombine, etc. As explained below, the significant conceptual change is that known properties of a value might depend on the control-flow location of the use (because we care that the @llvm.assume dominates the use because assumptions have control-flow dependencies). This means that, when we ask if bits are known in a value, we might get different answers for different uses. The significant changes are all in ValueTracking. Two main changes: First, as with the rest of the code, new parameters need to be passed around. To make this easier, I grouped them into a structure, and I made internal static versions of the relevant functions that take this structure as a parameter. The new code does as you might expect, it looks for @llvm.assume calls that make use of the value we're trying to learn something about (often indirectly), attempts to pattern match that expression, and uses the result if successful. By making use of the AssumptionTracker, the process of finding @llvm.assume calls is not expensive. Part of the structure being passed around inside ValueTracking is a set of already-considered @llvm.assume calls. This is to prevent a query using, for example, the assume(a == b), to recurse on itself. The context and DT params are used to find applicable assumptions. An assumption needs to dominate the context instruction, or come after it deterministically. 
In this latter case we only handle the specific case where both the assumption and the context instruction are in the same block, and we need to exclude assumptions from being used to simplify their own ephemeral values (those which contribute only to the assumption) because otherwise the assumption would prove its feeding comparison trivial and would be removed. This commit adds the plumbing and the logic for a simple masked-bit propagation (just enough to write a regression test). Future commits add more patterns (and, correspondingly, more regression tests). llvm-svn: 217342
*	Add an Assumption-Tracking Pass	Hal Finkel	2014-09-07	2	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds an immutable pass, AssumptionTracker, which keeps a cache of @llvm.assume call instructions within a module. It uses callback value handles to keep stale functions and intrinsics out of the map, and it relies on any code that creates new @llvm.assume calls to notify it of the new instructions. The benefit is that code needing to find @llvm.assume intrinsics can do so directly, without scanning the function, thus allowing the cost of @llvm.assume handling to be negligible when none are present. The current design is intended to be lightweight. We don't keep track of anything until we need a list of assumptions in some function. The first time this happens, we scan the function. After that, we add/remove @llvm.assume calls from the cache in response to registration calls and ValueHandle callbacks. There are no new direct test cases for this pass, but because it calls its validation function upon module finalization, we'll pick up detectable inconsistencies from the other tests that touch @llvm.assume calls. This pass will be used by follow-up commits that make use of @llvm.assume. llvm-svn: 217334
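The lazy caching strategy described above (track nothing until first asked, scan once, then rely on registration) can be illustrated with a toy model. Everything here is an assumption for illustration: the class name `AssumptionCache`, its methods, and the representation of instructions as strings are not the pass's real interface, and the ValueHandle-based removal is left out.

```python
from typing import Dict, List


class AssumptionCache:
    """Toy model of a lazily populated per-function cache of assume calls."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[str]] = {}
        self.scans = 0  # how many full function scans were performed

    def assumptions(self, function: str, instructions: List[str]) -> List[str]:
        """Return cached assume calls, scanning the function only on first use."""
        if function not in self._cache:
            self.scans += 1
            self._cache[function] = [
                inst for inst in instructions
                if inst.startswith("call void @llvm.assume")
            ]
        return self._cache[function]

    def register(self, function: str, inst: str) -> None:
        """Notify the cache of a newly created assume call."""
        # If the function has not been scanned yet, nothing needs recording:
        # the first scan will find the new call.
        if function in self._cache:
            self._cache[function].append(inst)
```

The point of the design is that a function with no pending queries costs nothing, and repeated queries cost one dictionary lookup rather than a walk over every instruction.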
*	Enable noalias metadata by default and swap the order of the SLP and Loop ↵	James Molloy	2014-09-04	1	-1/+1
\| \| \| \| \| \| \| \|	vectorizers by default. After some time maturing, hopefully the flags themselves will be removed. llvm-svn: 217144
*	Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata	Hal Finkel	2014-09-01	1	-11/+17
\| \| \| \| \| \| \| \| \| \| \| \|	This feeds AA through the IFI structure into the inliner so that AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which functions only access their arguments (instead of just hard-coding some knowledge of memory intrinsics). Most of the information is only available from BasicAA; this is important for preserving alias scoping information for target-specific intrinsics when doing the noalias parameter attribute to metadata conversion. llvm-svn: 216866
*	Fix AddAliasScopeMetadata again - alias.scope must be a complete description	Hal Finkel	2014-09-01	1	-15/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I thought that I had fixed this problem in r216818, but I did not do a very good job. The underlying issue is that when we add alias.scope metadata we are asserting that this metadata completely describes the aliasing relationships within the current aliasing scope domain, and so in the context of translating noalias argument attributes, the pointers must all be based on noalias arguments (as underlying objects) and have no other kind of underlying object. In r216818 excluding appropriate accesses from getting alias.scope metadata is done by looking for underlying objects that are not identified function-local objects -- but that's wrong because allocas, etc. are also function-local objects and we need to explicitly check that all underlying objects are the noalias arguments for which we're adding metadata aliasing scopes. This fixes the underlying-object check for adding alias.scope metadata, and does some refactoring of the related capture-checking eligibility logic (and adds more comments; hopefully making everything a bit clearer). Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the feature is still disabled by default). llvm-svn: 216863
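The corrected eligibility rule in this commit message (every underlying object must be one of the noalias arguments) can be written as a small predicate. This is only a sketch of the rule as stated; the function name and the use of string names for values are hypothetical, and the separate capture checks mentioned in the message are not modelled.

```python
from typing import Iterable


def may_add_alias_scope(underlying_objects: Iterable[str],
                        noalias_args: Iterable[str]) -> bool:
    """Return True only if every underlying object is a noalias argument.

    An access based on anything else (an alloca, a global, an unknown
    pointer) must not receive alias.scope metadata, because the metadata is
    meant to be a complete description of the aliasing relationships.
    """
    objects = set(underlying_objects)
    return bool(objects) and objects <= set(noalias_args)
```

Under the earlier check from r216818, an access based on an alloca would have been accepted because an alloca is itself an identified function-local object; the subset test above rejects it.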
*	Fix AddAliasScopeMetadata to not add scopes when deriving from unknown pointers	Hal Finkel	2014-08-30	1	-25/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The previous implementation of AddAliasScopeMetadata, which adds noalias metadata to preserve noalias parameter attribute information when inlining had a flaw: it would add alias.scope metadata to accesses which might have been derived from pointers other than noalias function parameters. This was incorrect because even some access known not to alias with all noalias function parameters could easily alias with an access derived from some other pointer. Instead, when deriving from some unknown pointer, we cannot add alias.scope metadata at all. This fixes a miscompile of the test-suite's tramp3d-v4. Furthermore, we cannot add alias.scope to functions unless we know they access only argument-derived pointers (currently, we know this only for memory intrinsics). Also, we fix a theoretical problem with using the NoCapture attribute to skip the capture check. This is incorrect (as explained in the comment added), but would not matter in any code generated by Clang because we get only inferred nocapture attributes in Clang-generated IR. This functionality is not yet enabled by default. llvm-svn: 216818
*	Fix a typo in AddAliasScopeMetadata	Hal Finkel	2014-08-29	1	-1/+1
\| \| \| \|	llvm-svn: 216741
*	Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or ↵	Craig Topper	2014-08-27	3	-16/+8
\| \| \| \| \| \|	just letting them be implicitly created. llvm-svn: 216525
*	Remove dangling initializers in GlobalDCE	Bruno Cardoso Lopes	2014-08-25	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	GlobalDCE deletes global vars and updates their initializers to nullptr while leaving underlying constants to be cleaned up later by their uses. The cleanup may never happen; fix this by forcing it every time it's safe to destroy constants. Final patch by Rafael Espindola http://reviews.llvm.org/D4931 <rdar://problem/17523868> llvm-svn: 216390
*	Use range based for loops to avoid needing to re-mention SmallPtrSet size.	Craig Topper	2014-08-24	3	-15/+10
\| \| \| \|	llvm-svn: 216351
*	Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator ↵	David Blaikie	2014-08-21	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks. Somewhat unnoticed in the original implementation of discriminators, but it could cause instructions to end up in new, small, DW_TAG_lexical_blocks due to the use of DILexicalBlock to track discriminator changes. Instead, use DILexicalBlockFile which we already use to track file changes without introducing new scopes, so it works well to track discriminator changes in the same way. llvm-svn: 216239