path: root/llvm/lib/Transforms
Commit message | Author | Date | Files | Lines
* Reapply r155136 after fixing PR12599. | Jakob Stoklund Olesen | 2012-04-23 | 1 | -39/+35
  Original commit message:

  Defer some shl transforms to DAGCombine.

  The shl instruction is used to represent multiplication by a constant
  power of two as well as bitwise left shifts. Some InstCombine
  transformations would turn an shl instruction into a bit mask operation,
  making it difficult for later analysis passes to recognize the constant
  multiplication.

  Disable those shl transformations, deferring them to DAGCombine time. An
  'shl X, C' instruction is now treated mostly the same way as 'mul X, C'.

  These transformations are deferred:

    (X >>? C) << C   --> X & (-1 << C)              (when X >> C has multiple uses)
    (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2)  (when C2 > C1)
    (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (when C1 > C2)

  The corresponding exact transformations are preserved, just like
  div-exact + mul:

    (X >>?,exact C) << C   --> X
    (X >>?,exact C1) << C2 --> X << (C2-C1)
    (X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)

  The disabled transformations could also prevent the instruction selector
  from recognizing rotate patterns in hash functions and cryptographic
  primitives. I have a test case for that, but it is too fragile.

  llvm-svn: 155362
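  As a reader's aid, here is a minimal C++ illustration of the second
  deferred transform at the source level. It is a hypothetical example, not
  taken from the commit; the identity holds for unsigned (logical) shifts.

    // For unsigned X with C2 > C1: (X >> C1) << C2 == (X << (C2-C1)) & (-1 << C2).
    unsigned shiftedForm(unsigned X) {
      return (X >> 3) << 5;          // an shl of an lshr; C1 = 3, C2 = 5
    }
    unsigned maskedForm(unsigned X) {
      return (X << 2) & (~0u << 5);  // X << (C2-C1), masked with -1 << C2
    }
    // InstCombine used to rewrite the first form into the second; after this
    // change the rewrite waits for DAGCombine, so the multiply-by-power-of-two
    // structure stays visible to mid-level analyses.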
* Fix issue 67 by checking that the interface functions weren't redefined in the compiled source file. | Alexander Potapenko | 2012-04-23 | 1 | -4/+18
  llvm-svn: 155346
* [tsan] use llvm/ADT/Statistic.h for tsan stats | Kostya Serebryany | 2012-04-23 | 1 | -40/+17
  llvm-svn: 155341
* Revert r155136 "Defer some shl transforms to DAGCombine." | Jakob Stoklund Olesen | 2012-04-20 | 1 | -35/+39
  While the patch was perfect and defect free, it exposed a really nasty bug in X86 SelectionDAG that caused an llc crash when compiling lencod. I'll put the patch back in after fixing the SelectionDAG problem.

  llvm-svn: 155181
* Put this expensive check below the less expensive ones. | Bill Wendling | 2012-04-19 | 1 | -9/+9
  llvm-svn: 155166
* Avoid a bug in the path count computation, preventing an infinite loop repeatedly making the same change. | Dan Gohman | 2012-04-19 | 1 | -1/+1
  This is for rdar://11256239.

  llvm-svn: 155160
* Defer some shl transforms to DAGCombine. | Jakob Stoklund Olesen | 2012-04-19 | 1 | -39/+35
  The shl instruction is used to represent multiplication by a constant
  power of two as well as bitwise left shifts. Some InstCombine
  transformations would turn an shl instruction into a bit mask operation,
  making it difficult for later analysis passes to recognize the constant
  multiplication.

  Disable those shl transformations, deferring them to DAGCombine time. An
  'shl X, C' instruction is now treated mostly the same way as 'mul X, C'.

  These transformations are deferred:

    (X >>? C) << C   --> X & (-1 << C)              (when X >> C has multiple uses)
    (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2)  (when C2 > C1)
    (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2) (when C1 > C2)

  The corresponding exact transformations are preserved, just like
  div-exact + mul:

    (X >>?,exact C) << C   --> X
    (X >>?,exact C1) << C2 --> X << (C2-C1)
    (X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)

  The disabled transformations could also prevent the instruction selector
  from recognizing rotate patterns in hash functions and cryptographic
  primitives. I have a test case for that, but it is too fragile.

  llvm-svn: 155136
* Don't crash on code where the user put __attribute__((constructor)) on a function with arguments. | Dan Gohman | 2012-04-18 | 1 | -1/+5
  This fixes rdar://11265785.

  llvm-svn: 155073
* Use a heavy hammer to fix PR12573. | Bill Wendling | 2012-04-18 | 1 | -0/+9
  If the loop contains invoke instructions whose unwind edge escapes the loop, then don't try to unswitch the loop. Doing so may cause the unwind edge to be split, which is not only non-trivial but also doesn't preserve loop-simplify information.

  Fixes PR12573

  llvm-svn: 154987
* loop-reduce: Add an early bailout to catch extremely large loops. | Andrew Trick | 2012-04-18 | 1 | -0/+17
  This introduces a threshold of 200 IV users, which is very conservative but should be sufficient to avoid a serious compile-time sink or stack overflow. The llvm test-suite with LTO never exceeds 190 users per loop. The bug doesn't relate to a specific type of loop; checking in an arbitrary giant loop as a unit test would be silly. A sketch of the bailout follows this entry.

  Fixes rdar://11262507.

  llvm-svn: 154983
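  A minimal sketch of the kind of bailout this entry describes; the names
  and the free-standing shape are illustrative, not the actual LSR code.

    #include <cstddef>

    // Bail out of loop strength reduction before building any expensive
    // state when a loop has more IV users than a conservative threshold.
    constexpr std::size_t MaxIVUsers = 200; // LTO test-suite peaks near 190

    bool shouldSkipLoop(std::size_t NumIVUsers) {
      return NumIVUsers > MaxIVUsers; // extremely large loop: give up early
    }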
* Fix PR12559: mark unavailable win32 math libcalls | Joe Groff | 2012-04-17 | 1 | -15/+10
  Also fix SimplifyLibCalls to use TLI rather than compile-time conditionals to enable optimizations on floor, ceil, round, rint, and nearbyint.

  llvm-svn: 154960
* Fix style violation in BBVectorize (pointed out by Bill Wendling) | Hal Finkel | 2012-04-16 | 1 | -3/+3
  llvm-svn: 154810
* Add a Fixme. | Bill Wendling | 2012-04-16 | 1 | -0/+2
  llvm-svn: 154793
* Simplify checking for pointer types in BBVectorize (this change was suggested by Duncan). | Hal Finkel | 2012-04-16 | 1 | -5/+2
  llvm-svn: 154787
* Fix an error in BBVectorize important for vectorizing pointer types. | Hal Finkel | 2012-04-14 | 1 | -0/+31
  When vectorizing pointer types it is important to realize that potential pairs cannot be connected via the address pointer argument of a load or store. This is because, even after vectorization, the address is still a scalar: the address of the higher half of the pair is implicit from the address of the lower half (it need not be, and should not be, explicitly computed).

  llvm-svn: 154735
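  A sketch of the rule described above, assuming modern LLVM header paths
  and an illustrative helper name: a use that is only the address operand
  of a load or store does not connect a potential pair.

    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // True if J uses I only as a memory address. Addresses stay scalar
    // after vectorization, so such a use cannot connect I and J as a pair.
    static bool isPureAddressUse(Instruction *I, Instruction *J) {
      if (auto *LI = dyn_cast<LoadInst>(J))
        return LI->getPointerOperand() == I;
      if (auto *SI = dyn_cast<StoreInst>(J))
        return SI->getPointerOperand() == I && SI->getValueOperand() != I;
      return false;
    }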
* Enhance BBVectorize to more properly handle pointer values and vectorize GEPs. | Hal Finkel | 2012-04-14 | 1 | -2/+27
  llvm-svn: 154734
* Add support to BBVectorize for vectorizing selects. | Hal Finkel | 2012-04-13 | 1 | -0/+8
  llvm-svn: 154700
* Add some comments, and fix a few places that missed setting Changed. | Dan Gohman | 2012-04-13 | 1 | -2/+24
  llvm-svn: 154687
* Consider ObjC runtime calls objc_storeWeak and others, which make a copy of their argument, as "escape" points for objc_retainBlock optimization. | Dan Gohman | 2012-04-13 | 1 | -14/+29
  This fixes rdar://11229925.

  llvm-svn: 154682
* By default, use Early-CSE instead of GVN for vectorization cleanup. | Hal Finkel | 2012-04-13 | 1 | -2/+9
  As has been suggested by Duncan and others, Early-CSE and GVN should do similar redundancy elimination, but Early-CSE is much less expensive. Most of my autovectorization benchmarks show a performance regression, but all of these are < 0.1%, and so I think that it is still worth using the less expensive pass.

  llvm-svn: 154673
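  A sketch of the pipeline choice under the legacy pass manager. The
  pass-creation functions are real LLVM APIs, but the wrapper is
  illustrative, and header locations have moved across LLVM versions.

    #include "llvm/IR/LegacyPassManager.h"
    #include "llvm/Transforms/Scalar.h"

    // After vectorization, run the cheap EarlyCSE for cleanup by default;
    // use the more thorough (and more expensive) GVN only on request.
    void addVectorizeCleanup(llvm::legacy::PassManagerBase &PM, bool UseGVN) {
      if (UseGVN)
        PM.add(llvm::createGVNPass());
      else
        PM.add(llvm::createEarlyCSEPass());
    }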
* Use the new Use-aware dominates method to apply the objc runtime library return value optimization for phi uses. | Dan Gohman | 2012-04-13 | 1 | -8/+5
  Even when the phi itself is not dominated, the specific use may be dominated.

  llvm-svn: 154647
* Code-gen may inject code into the IR before it emits the ASM. | Bill Wendling | 2012-04-13 | 1 | -0/+6
  The linker obviously cannot know that this code is present, let alone used. So prevent the internalize pass from internalizing those global values which code-gen may insert.

  llvm-svn: 154645
* Don't move objc_autorelease calls past autorelease pool boundaries when optimizing autorelease calls on phi nodes with null operands. | Dan Gohman | 2012-04-13 | 1 | -3/+43
  This fixes rdar://11207070.

  llvm-svn: 154642
* Typo. | Chad Rosier | 2012-04-11 | 1 | -1/+1
  llvm-svn: 154522
* Add two statistics to help track how we are computing the inline cost. | Chandler Carruth | 2012-04-11 | 1 | -0/+6
  Yea, 'NumCallerCallersAnalyzed' isn't a great name, suggestions welcome.

  llvm-svn: 154492
* [tsan] two more compile-time optimizations: | Kostya Serebryany | 2012-04-10 | 1 | -11/+42
  - Don't instrument reads from constant globals. Saves ~1.5% of instrumented instructions on CPU2006 (counting static instructions, not their execution).
  - Don't instrument reads from vtables (which are global constants too). Saves ~5%.

  A sketch of the first check follows this entry. I did not measure the run-time impact of this, but it is certainly non-negative.

  llvm-svn: 154444
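  A sketch of the constant-global check, with an assumed helper name and
  modern header paths:

    #include "llvm/IR/GlobalVariable.h"
    using namespace llvm;

    // A load whose address is a global marked 'constant' (vtables and
    // string literals included) can never race; skip instrumenting it.
    static bool addrPointsToConstantData(Value *Addr) {
      if (auto *GV = dyn_cast<GlobalVariable>(Addr))
        return GV->isConstant();
      return false;
    }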
* [tsan] compile-time instrumentation: do not instrument a read if a write to the same temp follows in the same BB. | Kostya Serebryany | 2012-04-10 | 1 | -5/+82
  Also add stats printing.

  On Spec CPU2006 this optimization saves roughly 4% of instrumented reads (which is 3% of all instrumented accesses):
    Writes            : 161216
    Reads             : 446458
    Reads-before-write:  18295

  llvm-svn: 154418
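  A sketch of the per-block filtering described in this entry, with assumed
  names and modern header paths: walk the block's accesses in reverse, and
  drop any read whose address is written later in the same block, on the
  grounds that a race on that read would also be reported on the write.

    #include "llvm/ADT/ArrayRef.h"
    #include "llvm/ADT/STLExtras.h"
    #include "llvm/ADT/SmallPtrSet.h"
    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    static void chooseAccessesToInstrument(ArrayRef<Instruction *> Accesses,
                                           SmallVectorImpl<Instruction *> &Out) {
      SmallPtrSet<Value *, 8> WriteTargets;
      for (Instruction *I : llvm::reverse(Accesses)) {
        if (auto *SI = dyn_cast<StoreInst>(I)) {
          WriteTargets.insert(SI->getPointerOperand());
          Out.push_back(SI);                 // writes are always instrumented
        } else if (auto *LI = dyn_cast<LoadInst>(I)) {
          if (!WriteTargets.count(LI->getPointerOperand()))
            Out.push_back(LI);               // no later write: instrument read
        }
      }
    }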
* Fix PR12513: Loop unrolling breaks with indirect branches. | Andrew Trick | 2012-04-10 | 2 | -29/+18
  Take this opportunity to generalize the indirectbr bailout logic for loop transformations. CFG transformations will never get indirectbr right, and there's no point trying.

  llvm-svn: 154386
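  A sketch of the generalized bailout, with an illustrative helper and
  modern header paths: refuse to transform any loop that contains an
  indirectbr terminator.

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // CFG-restructuring loop passes cannot split the edges leaving an
    // indirectbr, so treat any loop containing one as off-limits.
    static bool containsIndirectBr(const Loop &L) {
      for (BasicBlock *BB : L.getBlocks())
        if (isa<IndirectBrInst>(BB->getTerminator()))
          return true;
      return false;
    }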
* whitespace | Andrew Trick | 2012-04-10 | 1 | -140/+140
  llvm-svn: 154385
* Teach InstCombine to nuke a common alloca pattern -- an alloca which has GEPs, bit casts, and stores reaching it but no other instructions. | Chandler Carruth | 2012-04-08 | 1 | -1/+70
  These often show up during the iterative processing of the inliner, SROA, and DCE. Once we hit this point, we can completely remove the alloca.

  These were actually showing up in the final, fully optimized code in a bunch of inliner tests I've been working on, and notably they show up after LLVM finishes optimizing away all function calls involved in hash_combine(a, b).

  llvm-svn: 154285
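  A sketch of the pattern match with assumed names and modern APIs: the
  alloca is removable when every transitive user is a GEP, a bitcast, or a
  store into the memory, and the pointer itself is never stored anywhere.

    #include "llvm/ADT/SmallVector.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    static bool isOnlyWrittenTo(AllocaInst *AI) {
      SmallVector<Instruction *, 8> Worklist;
      Worklist.push_back(AI);
      while (!Worklist.empty()) {
        Instruction *I = Worklist.pop_back_val();
        for (User *U : I->users()) {
          if (auto *SI = dyn_cast<StoreInst>(U)) {
            if (SI->getValueOperand() == I)
              return false;              // the pointer escapes into memory
          } else if (isa<GetElementPtrInst>(U) || isa<BitCastInst>(U)) {
            Worklist.push_back(cast<Instruction>(U));
          } else {
            return false;                // some user may read or escape it
          }
        }
      }
      return true;                       // never read: the stores are dead
    }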
* Refactor: Use positive field names in VectorizeConfig. | Hongbin Zheng | 2012-04-07 | 1 | -13/+15
  llvm-svn: 154249
* Sink the collection of return instructions until after *all* simplification has been performed. | Chandler Carruth | 2012-04-06 | 1 | -7/+9
  This is a bit less efficient (requires another ilist walk of the basic blocks) but shouldn't matter in practice. More importantly, it's just too much work to keep track of all the various ways the return instructions can be mutated while simplifying them. This fixes yet another crasher, reported by Daniel Dunbar.

  llvm-svn: 154179
* Make GVN's propagateEquality non-recursive. No intended functionality change. | Duncan Sands | 2012-04-06 | 1 | -98/+105
  The modifications are a lot more trivial than they appear to be in the diff!

  llvm-svn: 154174
* Sink the return instruction collection until after we're done deleting dead code, including dead return instructions in some cases. | Chandler Carruth | 2012-04-06 | 1 | -7/+9
  Otherwise, we end up having a bogus pointer to a return instruction that blows up much further down the road.

  It turns out that this pattern is simpler to code, easier to update in the face of enhancements to the inliner cleanup, and likely cheaper given that it won't add dead instructions to the list.

  Thanks to John Regehr's numerous test cases for teasing this out.

  llvm-svn: 154157
* Fix accidentally inverted logic from r152803, and make the testcase slightly less trivial. | Dan Gohman | 2012-04-05 | 1 | -1/+1
  This fixes rdar://11171718.

  llvm-svn: 154118
* BBVectorize: Add the const modifier to the VectorizeConfig because we won't modify it. | Hongbin Zheng | 2012-04-05 | 1 | -1/+1
  llvm-svn: 154098
* Introduce the VectorizeConfig class, with which we can control the behavior of the BBVectorizePass without using command line options. | Hongbin Zheng | 2012-04-05 | 1 | -32/+60
  As pointed out by Hal, we can ask the TargetLoweringInfo for an architecture-specific VectorizeConfig to perform vectorization with architecture-specific information.

  llvm-svn: 154096
* Add the function "vectorizeBasicBlock", which allows users to vectorize a BasicBlock in other passes. | Hongbin Zheng | 2012-04-05 | 1 | -5/+19
  E.g. we can call vectorizeBasicBlock in the loop unroll pass right after the loop is unrolled.

  llvm-svn: 154089
* Pass the right sign to TLI->isLegalICmpImmediate. | Jakob Stoklund Olesen | 2012-04-05 | 1 | -2/+11
  LSR can fold three addressing modes into its ICmpZero node:

    ICmpZero BaseReg + Offset      => ICmp BaseReg, -Offset
    ICmpZero -1*ScaleReg + Offset  => ICmp ScaleReg, Offset
    ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg

  The first two cases are only used if TLI->isLegalICmpImmediate() likes the offset. Make sure the right Offset sign is passed to this method in the second case; a sketch of the fix follows this entry. The ARM version is not symmetric.

  <rdar://problem/11184260>

  llvm-svn: 154079
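  In the sketch below, isLegalICmpImmediate is the real TargetLowering
  hook, but the wrapper functions and the header path are illustrative.
  The point is to query legality with the constant that will actually
  appear in the folded compare.

    #include "llvm/CodeGen/TargetLowering.h"
    using namespace llvm;

    // ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset: check -Offset.
    bool firstFoldIsLegal(const TargetLowering &TLI, int64_t Offset) {
      return TLI.isLegalICmpImmediate(-Offset);
    }
    // ICmpZero -1*ScaleReg + Offset => ICmp ScaleReg, Offset: check Offset
    // (this is the case that was previously queried with the wrong sign).
    bool secondFoldIsLegal(const TargetLowering &TLI, int64_t Offset) {
      return TLI.isLegalICmpImmediate(Offset);
    }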
* Always compute all the bits in ComputeMaskedBits. | Rafael Espindola | 2012-04-04 | 8 | -45/+28
  This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase.

  llvm-svn: 154011
* LoopUnrollPass: Use variable "Threshold" instead of "CurrentThreshold" when reducing the unroll count; otherwise the reduced unroll count does not take the "OptimizeForSize" attribute into account. | Hongbin Zheng | 2012-04-04 | 1 | -2/+2
  llvm-svn: 154007
* Add an option to turn off the expensive GVN load PRE part of GVN. | Bill Wendling | 2012-04-02 | 1 | -4/+5
  llvm-svn: 153902
* Fast fix for PR12343: http://llvm.org/bugs/show_bug.cgi?id=12343 | Stepan Dyatkovskiy | 2012-04-02 | 1 | -4/+29
  We have no trivial way to split edges that come from an indirect branch. We can do it with some tricks, but that needs further discussion, and it is still dangerous due to the difficulty of controlling indirect branches. This fix forbids this case for unswitching.

  llvm-svn: 153879
* Belatedly address some code review from Chris. | Chandler Carruth | 2012-04-01 | 1 | -1/+1
  As a side note, I really dislike array_pod_sort... Do we really still care about any STL implementations that get this so wrong? Does libc++?

  llvm-svn: 153834
* Fix a pretty scary bug I introduced into the always inliner with a single missing character. | Chandler Carruth | 2012-04-01 | 1 | -1/+1
  Somehow, this had gone untested. I've added tests for returns-twice logic specifically with the always-inliner that would have caught this, and fixed the bug.

  Thanks to Matt for the careful review and spotting this!!! =D

  llvm-svn: 153832
* Give the always-inliner its own custom filter. | Chandler Carruth | 2012-03-31 | 1 | -20/+63
  It shouldn't have to pay the very high overhead of the complex inline cost analysis when all it wants to do is detect three patterns which must not be inlined. Comment the code, clean it up, and leave some hints about possible performance improvements if this ever shows up on a profile.

  Moving this off of the (now more expensive) inline cost analysis is particularly important because we have to run this inliner even at -O0.

  llvm-svn: 153814
* Remove a bunch of empty, dead, and no-op methods from all of these interfaces. | Chandler Carruth | 2012-03-31 | 3 | -26/+0
  These methods were used in the old inline cost system where there was a persistent cache that had to be updated, invalidated, and cleared. We're now doing more direct computations that don't require this intricate dance. Even if we resume some level of caching, it would almost certainly have a simpler and more narrow interface than this.

  llvm-svn: 153813
* Initial commit for the rewrite of the inline cost analysis to operate on a per-callsite walk of the called function's instructions. | Chandler Carruth | 2012-03-31 | 3 | -38/+25
  The walk proceeds in breadth-first order over the potentially reachable set of basic blocks. This is a major shift in how inline cost analysis works, to improve the accuracy and rationality of inlining decisions.

  A brief outline of the algorithm this moves to (a compressed sketch appears after this entry):

  - Build a simplification mapping based on the callsite arguments to the function arguments.
  - Push the entry block onto a worklist of potentially-live basic blocks.
  - Pop the first block off of the *front* of the worklist (for breadth-first ordering) and walk its instructions using a custom InstVisitor.
  - For each instruction's operands, re-map them based on the simplification mappings available for the given callsite.
  - Compute any simplification possible of the instruction after re-mapping, and store that back into the simplification mapping.
  - Compute any bonuses, costs, or other impacts of the instruction on the cost metric.
  - When the terminator is reached, replace any conditional value in the terminator with any simplifications from the mapping we have, and add any successors which are not proven to be dead from these simplifications to the worklist.
  - Pop the next block off of the front of the worklist, and repeat.
  - As soon as the cost of inlining exceeds the threshold for the callsite, stop analyzing the function in order to bound cost.

  The primary goal of this algorithm is to perfectly handle dead code paths. We do not want any code in trivially dead code paths to impact inlining decisions. The previous metric was *extremely* flawed here, and would always subtract the average cost of two successors of a conditional branch when it was proven to become an unconditional branch at the callsite. There was no handling of wildly different costs between the two successors, which would cause inlining when the path actually taken was too large, and no inlining when the path actually taken was trivially simple. There was also no handling of the code *path*, only the immediate successors. These problems vanish completely now. See the added regression tests for the shiny new features -- we skip recursive function calls, SROA-killing instructions, and high cost complex CFG structures when dead at the callsite being analyzed.

  Switching to this algorithm required refactoring the inline cost interface to accept the actual threshold rather than simply returning a single cost. The resulting interface is pretty bad, and I'm planning to do lots of interface cleanup after this patch. Several other refactorings fell out of this, but I've tried to minimize them for this patch. =/ There is still more cleanup that can be done here. Please point out anything that you see in review.

  I've worked really hard to try to mirror at least the spirit of all of the previous heuristics in the new model. It's not clear that they are all correct any more, but I wanted to minimize the change in this single patch; it's already a bit ridiculous. One heuristic that is *not* yet mirrored is to allow inlining of functions with a dynamic alloca *if* the caller has a dynamic alloca. I will add this back, but I think the most reasonable way requires changes to the inliner itself rather than just the cost metric, and so I've deferred this for a subsequent patch. The test case is XFAIL-ed until then.

  As mentioned in the review mail, this seems to make Clang run about 1% to 2% faster in -O0, but makes its binary size grow by just under 4%. I've looked into the 4% growth, and it can be fixed, but requires changes to other parts of the inliner.

  llvm-svn: 153812
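  A compressed sketch of the walk outlined above; all types and names here
  are invented for illustration, with per-instruction costs reduced to
  plain integers.

    #include <deque>
    #include <set>
    #include <vector>

    struct Block {
      std::vector<int> InstCosts;      // stand-in for remap+simplify+cost
      std::vector<Block *> LiveSuccs;  // successors not proven dead
    };

    // FIFO worklist => breadth-first order over potentially-live blocks.
    // Analysis stops as soon as the callsite's threshold is exceeded.
    int analyzeCall(Block *Entry, int Threshold) {
      std::deque<Block *> Worklist{Entry};
      std::set<Block *> Visited{Entry};
      int Cost = 0;
      while (!Worklist.empty()) {
        Block *BB = Worklist.front();
        Worklist.pop_front();
        for (int C : BB->InstCosts) {
          Cost += C;
          if (Cost > Threshold)
            return Cost;               // bound the cost of the analysis
        }
        for (Block *Succ : BB->LiveSuccs)
          if (Visited.insert(Succ).second)
            Worklist.push_back(Succ);
      }
      return Cost;
    }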
* Internalize: Remove reference of @llvm.noinline; it was replaced with the noinline attribute a long time ago. | Benjamin Kramer | 2012-03-31 | 1 | -1/+0
  llvm-svn: 153806
* Correctly vectorize powi. | Hal Finkel | 2012-03-31 | 1 | -11/+33
  The powi intrinsic requires special handling because it always takes a single integer power regardless of the result type. As a result, we can vectorize only if the powers are equal.

  Fixes PR12364.

  llvm-svn: 153797
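  A sketch of the added constraint; the intrinsic APIs are real, but the
  helper is illustrative. Two powi calls are pairable only when their
  scalar power operands are identical.

    #include "llvm/IR/IntrinsicInst.h"
    using namespace llvm;

    // powi takes a single integer power regardless of the result type, so
    // fusing two calls into one vector powi is sound only if powers match.
    static bool canPairPowi(IntrinsicInst *A, IntrinsicInst *B) {
      return A->getIntrinsicID() == Intrinsic::powi &&
             B->getIntrinsicID() == Intrinsic::powi &&
             A->getArgOperand(1) == B->getArgOperand(1);
    }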