path: root/llvm/lib/Transforms
Commit message | Author | Age | Files | Lines
* Revert "Reapply: Teach SROA how to update debug info for fragmented variables."Adrian Prantl2015-01-081-60/+8
| | | | | | | This reverts commit r225379 while investigating an assertion failure reported by Alexey. llvm-svn: 225424
* Reapply: Teach SROA how to update debug info for fragmented variables.Adrian Prantl2015-01-071-8/+60
| | | | | | | | The two buildbot failures were addressed in LLVM r225378 and CFE r225359. This reapplies commit r225272 without modifications. llvm-svn: 225379
* Analysis: Reformulate WillNotOverflowUnsignedAdd for reusabilityDavid Majnemer2015-01-073-45/+12
| | | | | | | | WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as computeOverflowForUnsignedAdd. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225329
* InstCombine: Just a small tidy-upDavid Majnemer2015-01-071-3/+2
| | | | llvm-svn: 225328
* Revert "Reapply: Teach SROA how to update debug info for fragmented variables."Adrian Prantl2015-01-061-60/+8
| | | | | | | | | because of a TSan buildbot failure. This reverts commit r225272. A fix should be coming soon. llvm-svn: 225288
* This patch teaches IndVarSimplify to add nuw and nsw to certain kindsSanjoy Das2015-01-061-0/+125
| of operations that provably don't overflow. For example, we can prove
| %civ.inc below does not sign-overflow. With this change, IndVarSimplify
| changes %civ.inc to an add nsw.
|
|   define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
|    entry:
|     %length = load i32* %length_ptr, !range !0
|     %len.sub.1 = sub i32 %length, 1
|     %upper = icmp slt i32 %init, %len.sub.1
|     br i1 %upper, label %loop, label %exit
|
|    loop:
|     %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
|     %civ.inc = add i32 %civ, 1
|     %cmp = icmp slt i32 %civ.inc, %length
|     br i1 %cmp, label %latch, label %break
|
|    latch:
|     store i32 0, i32* %array
|     %check = icmp slt i32 %civ.inc, %len.sub.1
|     br i1 %check, label %loop, label %break
|
|    break:
|     ret i32 %civ.inc
|
|    exit:
|     ret i32 42
|   }
|
| Differential Revision: http://reviews.llvm.org/D6748
|
| llvm-svn: 225282
* Reapply: Teach SROA how to update debug info for fragmented variables.Adrian Prantl2015-01-061-8/+60
| | | | | | | | | | This also rolls in the changes discussed in http://reviews.llvm.org/D6766. Defers migrating the debug info for new allocas until after all partitions are created. Thanks to Chandler for reviewing! llvm-svn: 225272
* Convert fcmp with 0.0 from casted integers to icmpMatt Arsenault2015-01-061-4/+34
| | | | | | | | | | | | | | | | | | | This is already handled in general when it is known the conversion can't lose bits with smaller integer types casted into wider floating point types. This pattern happens somewhat often in GPU programs that cast workitem intrinsics to float, which are often compared with 0. Specifically handle the special case of compares with zero which should also be known to not lose information. I had a more general version of this which allows equality compares if the casted float is exactly representable in the integer, but I'm not 100% confident that is always correct. Also fold cases that aren't integers to true / false. llvm-svn: 225265
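|
| A minimal IR sketch of the zero-compare case described above, assuming a
| 32-bit integer workitem value (hypothetical names):
|
|   %conv = sitofp i32 %id to float
|   %cmp = fcmp oeq float %conv, 0.0
|     ; can become
|   %cmp = icmp eq i32 %id, 0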
* InstCombine: Bitcast call arguments from/to pointer/integer typeDavid Majnemer2015-01-061-4/+13
| | | | | | | Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing arguments and return values when DataLayout says it is safe to do so. llvm-svn: 225254
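|
| A sketch of the kind of call this targets, assuming a 64-bit DataLayout where
| i64 and i8* have the same size (hypothetical function @f):
|
|   %r = call i64 bitcast (i8* (i8*)* @f to i64 (i64)*)(i64 %x)
|     ; can become
|   %p = inttoptr i64 %x to i8*
|   %c = call i8* @f(i8* %p)
|   %r = ptrtoint i8* %c to i64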
* SymbolRewriter: use iplist::spliceSaleem Abdulrasool2015-01-051-1/+1
| | | | | | | | The swap implementation for iplist is currently unsupported. Simply splice the old list into place, which achieves the same purpose. This is needed in order to thread the -frewrite-map-file frontend option correctly. NFC. llvm-svn: 225186
* SymbolRewriter: 80-columnSaleem Abdulrasool2015-01-051-2/+4
| | | | | | Wrap a couple of lines. NFC. llvm-svn: 225185
* Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.Craig Topper2015-01-051-1/+1
| | | | | | llvm-svn: 225160
* Fixed a bug in the memory dependence checking module of loop vectorization.Jiangning Liu2015-01-051-48/+57
| The following loop should not be vectorized with the current algorithm:
|
| {code}
|   // loop body
|      ... = a[i]          (1)
|      ... = a[i+1]        (2)
|      .......
|      a[i+1] = ....       (3)
|      a[i] = ...          (4)
| {code}
|
| The algorithm tries to collect memory access candidates from AliasSetTracker,
| and then checks memory dependences against one another. The memory accesses
| are unique in AliasSetTracker, and a single memory access in AliasSetTracker
| may map to multiple entries in AccessAnalysis, which could cover both 'read'
| and 'write'. Originally the algorithm only checked the 'write' entry in
| Accesses if only a 'write' exists. This is incorrect, and the consequence is
| that it ignored all read accesses, so some RAW and WAR dependences were
| missed.
|
| For the case given above, if we ignore the two reads, the dependence between
| (1) and (3) would not be captured, and this loop would be incorrectly
| vectorized.
|
| The fix simply inserts a new loop to find all entries in Accesses. Since it
| will skip most other memory accesses by checking the Value pointer at the
| very beginning of the loop, it should not increase compile time visibly.
|
| llvm-svn: 225159
* [SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, anChandler Carruth2015-01-051-11/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | assert out of the new pre-splitting in SROA. This fix makes the code do what was originally intended -- when we have a store of a load both dealing in the same alloca, we force them to both be pre-split with identical offsets. This is really quite hard to do because we can keep discovering problems as we go along. We have to track every load over the current alloca which for any reason becomes invalid for pre-splitting, and go back to remove all stores of those loads. I've included a couple of test cases derived from PR22093 that cover the different ways this can happen. While that PR only really triggered the first of these two, it's the same fundamental issue. The other challenge here is documented in a FIXME now. We end up being quite a bit more aggressive for pre-splitting when loads and stores don't refer to the same alloca. This aggressiveness comes at the cost of introducing potentially redundant loads. It isn't clear that this is the right balance. It might be considerably better to require that we only do pre-splitting when we can pre-split every load and store involved in the entire operation. That would give more consistent if conservative results. Unfortunately, it requires a non-trivial change to the actual pre-splitting operation in order to correctly handle cases where we end up pre-splitting stores out-of-order. And it isn't 100% clear that this is the right direction, although I'm starting to suspect that it is. llvm-svn: 225149
* [PM] Split the AssumptionTracker immutable pass into two separate APIs:Chandler Carruth2015-01-0435-298/+304
| | | | | | | | | | | | | | | | | | | | | | | | | | | | a cache of assumptions for a single function, and an immutable pass that manages those caches. The motivation for this change is twofold. Immutable analyses are really hacks around the current pass manager design and don't exist in the new design. This is usually OK, but it requires that the core logic of an immutable pass be reasonably partitioned off from the pass logic. This change does precisely that. As a consequence it also paves the way for the *many* utility functions that deal in the assumptions to live in both pass manager worlds by creating a separate non-pass object with its own independent API that they all rely on. Now, the only bits of the system that deal with the actual pass mechanics are those that actually need to deal with the pass mechanics. Once this separation is made, several simplifications become pretty obvious in the assumption cache itself. Rather than using a set and callback value handles, it can just be a vector of weak value handles. The callers can easily skip the handles that are null, and eventually we can wrap all of this up behind a filter iterator. For now, this adds boilerplate to the various passes, but this kind of boilerplate will end up making it possible to port these passes to the new pass manager, and so it will end up factored away pretty reasonably. llvm-svn: 225131
* InstCombine: match can find ConstantExprs, don't assume we have a ValueDavid Majnemer2015-01-041-2/+2
| | | | | | | | We assumed the output of a match was a Value; this would cause us to assert because we would fail a cast<>. Instead, use a helper in the Operator family to hide the distinction between Value and Constant. This fixes PR22087. llvm-svn: 225127
* [asan] simplify the tracing code, make it use the same guard variables as coverageKostya Serebryany2015-01-031-25/+12
| | | | llvm-svn: 225103
* InstCombine: Detect when llvm.umul.with.overflow always overflowsDavid Majnemer2015-01-021-0/+2
| | | | | | | We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero and LHSKnownOne * RHSKnownOne overflow. llvm-svn: 225077
* Analysis: Reformulate WillNotOverflowUnsignedMul for reusabilityDavid Majnemer2015-01-023-53/+9
| | | | | | | | WillNotOverflowUnsignedMul's smarts will live in ValueTracking as computeOverflowForUnsignedMul. It now returns a tri-state result: never overflows, always overflows and sometimes overflows. llvm-svn: 225076
* [SROA] Teach SROA to be more aggressive in splitting now that we haveChandler Carruth2015-01-021-27/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a pre-splitting pass over loads and stores. Historically, splitting could cause enough problems that I hamstrung the entire process with a requirement that splittable integer loads and stores must cover the entire alloca. All smaller loads and stores were unsplittable to prevent chaos from ensuing. With the new pre-splitting logic that does load/store pair splitting I introduced in r225061, we can now very nicely handle arbitrarily splittable loads and stores. In order to fully benefit from these smarts, we need to mark all of the integer loads and stores as splittable. However, we don't actually want to rewrite partitions with all integer loads and stores marked as splittable. This will fail to extract scalar integers from aggregates, which is kind of the point of SROA. =] In order to resolve this, what we really want to do is only do pre-splitting on the alloca slices with integer loads and stores fully splittable. This allows us to uncover all non-integer uses of the alloca that would benefit from a split in an integer load or store (and where introducing the split is safe because it is just memory transfer from a load to a store). Once done, we make all the non-whole-alloca integer loads and stores unsplittable just as they have historically been, repartition and rewrite. The result is that when there are integer loads and stores anywhere within an alloca (such as from a memcpy of a sub-object of a larger object), we can split them up if there are non-integer components to the aggregate hiding beneath. I've added the challenging test cases to demonstrate how this is able to promote to scalars even a case where we have even *partially* overlapping loads and stores. This restores the single-store behavior for small arrays of i8s which is really nice. I've restored both the little endian testing and big endian testing for these exactly as they were prior to r225061. It also forced me to be more aggressive in an alignment test to actually defeat SROA. =] Without the added volatiles there, we actually split up the weird i16 loads and produce nice double allocas with better alignment. This also uncovered a number of bugs where we failed to handle splittable load and store slices which didn't have a begininng offset of zero. Those fixes are included, and without them the existing test cases explode in glorious fireworks. =] I've kept support for leaving whole-alloca integer loads and stores as splittable even for the purpose of rewriting, but I think that's likely no longer needed. With the new pre-splitting, we might be able to remove all the splitting support for loads and stores from the rewriter. Not doing that in this patch to try to isolate any performance regressions that causes in an easy to find and revert chunk. llvm-svn: 225074
* [SROA] Make the computation of adjusted pointers not leak GEPChandler Carruth2015-01-021-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | instructions. I noticed this when working on dialing up how aggressively we can pre-split loads and stores. My test case wasn't passing because dead GEPs into the allocas persisted when they were built by this routine. This isn't terribly harmful, we still rewrote and promoted the alloca and I can't conceive of how to cause this to happen in a case where we will keep the exact same alloca but rewrite and promote the uses of it. If that ever happened, we'd get an assert out of mem2reg. So I don't have a direct test case yet, but the subsequent commit's test case wouldn't pass without this. There are other problems fixed by this patch that I spotted purely by inspection such as the fact that getAdjustedPtr could have actually deleted dead base pointers. I don't know how to get a base pointer to go into getAdjustedPtr today, so I think this bug could never have manifested (and I certainly can't write a test case for it) but, it wasn't the intent of the code. The code really just wanted to GC the new instructions built. That can be done more directly by comparing with the base pointer which is the only non-new instruction that this code can return. llvm-svn: 225073
* [SROA] Fix the loop exit placement to be prior to indexing the splitsChandler Carruth2015-01-021-4/+8
| | | | | | | | array. This prevents it from walking out of bounds on the splits array. Bug found with the existing tests by ASan and by the MSVC debug build. llvm-svn: 225069
* [SROA] Fix two total think-os in r225061 that should have been caught onChandler Carruth2015-01-011-2/+2
| | | | | | | | | | | | | | | | | | | | | a +asserts bootstrap, but my bootstrap had asserts off. Oops. Anyways, in some places it is reasonable to cast (as a sanity check) the pointer operand to a load or store to an instruction within SROA -- namely when the pointer operand is expected to be derived from an alloca, and thus always an instruction. However, the pre-splitting code also deals with loads and stores to non-alloca pointers and there we need to just use the Value*. Nothing about the code relied on the instruction cast, it was only there essentially as an invariant assertion. Remove the two that don't actually hold. This should fix the proximate issue in PR22080, but I'm also doing an asserts bootstrap myself to see if there are other issues lurking. I'll craft a reduced test case in a moment, but I wanted to get the tree healthy as quickly as possible. llvm-svn: 225068
* [SROA] Switch to using a more direct debug logging technique in one partChandler Carruth2015-01-011-4/+6
| | | | | | | | | | | | | | of my new load and store splitting, and fix a bug where it logged a totally irrelevant slice rather than the actual slice in question. The logging here previously worked because we used to place new slices onto the back of the core sequence, but that caused other problems. I updated the actual code to store new slices in their own vector but didn't update the logging. There isn't a good way to reuse the logging any more, and frankly it wasn't needed. We can directly log this bit more easily. llvm-svn: 225063
* [SROA] Fix formatting with clang-format which I managed to fail to doChandler Carruth2015-01-011-48/+48
| | | | | | prior to committing r225061. Sorry for that. llvm-svn: 225062
* [SROA] Teach SROA how to much more intelligently handle split loads andChandler Carruth2015-01-011-2/+484
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | stores. When there are accesses to an entire alloca with an integer load or store as well as accesses to small pieces of the alloca, SROA splits up the large integer accesses. In order to do that, it uses bit math to merge the small accesses into large integers. While this is effective, it produces insane IR that can cause significant problems in the rest of the optimizer: - It can cause load and store mismatches with GVN on the non-alloca side where we end up loading an i64 (or some such) rather than loading specific elements that are stored. - We can't always get rid of the integer bit math, which is why we can't always fix the loads and stores to work well with GVN. - This is especially bad when we have operations that mix poorly with integer bit math such as floating point operations. - It will block things like the vectorizer which might be able to handle the scalar stores that underly the aggregate. At the same time, we can't just directly split up these loads and stores in all cases. If there is actual integer arithmetic involved on the values, then using integer bit math is actually the perfect lowering because we can often combine it heavily with the surrounding math. The solution this patch provides is to find places where SROA is partitioning aggregates into small elements, and look for splittable loads and stores that it can split all the way to some other adjacent load and store. These are uniformly the cases where failing to split the loads and stores hurts the optimizer that I have seen, and I've looked extensively at the code produced both from more and less aggressive approaches to this problem. However, it is quite tricky to actually do this in SROA. We may have loads and stores to the same alloca, or other complex patterns that are hard to handle. This complexity leads to the somewhat subtle algorithm implemented here. We have to do this entire process as a separate pass over the partitioning of the alloca, and split up all of the loads prior to splitting the stores so that we can handle safely the cases of overlapping, including partially overlapping, loads and stores to the same alloca. We also have to reconstitute the post-split slice configuration so we can avoid iterating again over all the alloca uses (the slow part of SROA). But we also have to ensure that when we split up loads and stores to *other* allocas, we *do* re-iterate over them in SROA to adapt to the more refined partitioning now required. With this, I actually think we can fix a long-standing TODO in SROA where I avoided splitting as many loads and stores as probably should be splittable. This limitation historically mitigated the fallout of all the bad things mentioned above. Now that we have more intelligent handling, I plan to remove the FIXME and more aggressively mark integer loads and stores as splittable. I'll do that in a follow-up patch to help with bisecting any fallout. The net result of this change should be more fine-grained and accurate scalars being formed out of aggregates. 
At the very least, Clang now generates perfect code for this high-level test case using std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot inner loops of essentially every part of the Eigen math library when using std::complex<float>. Those loops would consistently and pervasively hop between the floating point unit and the integer unit due to bit math extraction and insertion of floating point values that were "stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and I have done some other testing and so far, no issues. That doesn't mean there won't be though, so I'll be prepared to help with any fallout. If you see performance swings in particular, please let me know. I'm very curious what all the impact of this change will be. Stay tuned for the follow-up to also split more integer loads and stores.

llvm-svn: 225061
* InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, XSanjay Patel2014-12-311-0/+8
| | | | | | | | | | | | | Some day the backend may handle instruction-level fast math flags and make this transform unnecessary, but it's still better practice to use the canonical representation of fneg when possible (use a -0.0). This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ). See also http://reviews.llvm.org/D6723. Differential Revision: http://reviews.llvm.org/D6731 llvm-svn: 225050
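|
| A minimal IR sketch of the canonicalization, assuming the nsz flag is present
| (hypothetical values):
|
|   %neg = fsub nsz float 0.0, %x
|     ; is canonicalized to
|   %neg = fsub nsz float -0.0, %x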
* InstCombine: try to transform A-B < 0 into A < BDavid Majnemer2014-12-311-0/+20
| | | | | | | We are allowed to move the 'B' to the right-hand side if we can prove there is no signed overflow and if the comparison itself is signed. llvm-svn: 225034
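|
| A minimal sketch of the fold, assuming the subtraction carries nsw and the
| compare is signed (hypothetical values):
|
|   %sub = sub nsw i32 %a, %b
|   %cmp = icmp slt i32 %sub, 0
|     ; can become
|   %cmp = icmp slt i32 %a, %b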
* [asan] change _sanitizer_cov_module_init to accept int* instead of int**Kostya Serebryany2014-12-301-18/+34
| | | | llvm-svn: 224999
* Some code improvements in Masked Load/Store. Elena Demikhovsky2014-12-301-21/+10
| | | | | | No functional changes. llvm-svn: 224986
* Carry facts about nullness and undef across GC relocationPhilip Reames2014-12-291-0/+36
| This change implements four basic optimizations:
|
|   - If a relocated value isn't used, it doesn't need to be relocated.
|   - If the value being relocated is null, relocation doesn't change that.
|     (Technically, this might be collector specific. I don't know of one
|     which it doesn't work for though.)
|   - If the value being relocated is undef, the relocation is meaningless.
|   - If the value being relocated was known nonnull, the relocated pointer
|     also isn't null. (Since it points to the same source language object.)
|
| I outlined other planned work in comments.
|
| Differential Revision: http://reviews.llvm.org/D6600
|
| llvm-svn: 224968
* Refine the notion of MayThrow in LICM to include a header specific versionPhilip Reames2014-12-291-8/+16
| In LICM, we have a check for an instruction which is guaranteed to execute and
| thus can't introduce any new faults if moved to the preheader. To handle a
| function which might unconditionally throw when first called, we check for any
| potentially throwing call in the loop and give up.
|
| This is unfortunate when the potentially throwing condition is down a rare
| path. It prevents essentially all LICM of potentially faulting instructions
| where the faulting condition is checked outside the loop. It also greatly
| diminishes the utility of loop unswitching since control-dependent
| instructions - which are now likely in the loop's header block - will not be
| lifted by subsequent LICM runs.
|
|   define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
|   ; CHECK-LABEL: nothrow_header
|   ; CHECK-LABEL: entry
|   ; CHECK: %div = udiv i64 %x, %y
|   ; CHECK-LABEL: loop
|   ; CHECK: call void @use(i64 %div)
|   entry:
|     br label %loop
|   loop:                      ; preds = %entry, %for.inc
|     %div = udiv i64 %x, %y
|     br i1 %cond, label %loop-if, label %exit
|   loop-if:
|     call void @use(i64 %div)
|     br label %loop
|   exit:
|     ret void
|   }
|
| The current patch really only helps with non-memory instructions (i.e. divs,
| etc.) since the maythrow call down the rare path will be considered to alias
| an otherwise hoistable load. The one exception is that it does kick in for
| loads which are known to be invariant without regard to other possible stores,
| i.e. those marked with either !invariant.load metadata or TBAA 'is constant
| memory' metadata.
|
| Differential Revision: http://reviews.llvm.org/D6725
|
| llvm-svn: 224965
* Loading from null is valid outside of addrspace 0Philip Reames2014-12-291-10/+10
| | | | | | | | | | This patch fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen. This transform is perfectly legal in address space 0, but is not necessarily legal in other address spaces. We really should introduce a hook to control this property on a per-target, per-address-space basis. We may be losing valuable optimizations in some address spaces by being too conservative. Original patch by Thomas P Raoux (submitted to llvm-commits); tests and formatting fixes by me. llvm-svn: 224961
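|
| A minimal sketch of the distinction, assuming a hypothetical target where
| address space 1 maps address zero to valid memory:
|
|   %v = load i32* null                ; addrspace 0: undefined behavior
|   %w = load i32 addrspace(1)* null   ; addrspace 1: must be preserved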
* InstCombine: Infer nuw for multipliesDavid Majnemer2014-12-262-0/+39
| | | | | | | A multiply cannot unsigned wrap if there are bitwidth, or more, leading zero bits between the two operands. llvm-svn: 224849
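|
| A minimal sketch of the leading-zeros condition, assuming both operands are
| zero-extended from i16 so at least 32 leading zero bits are known between
| them (hypothetical values):
|
|   %a = zext i16 %x to i32
|   %b = zext i16 %y to i32
|   %mul = mul i32 %a, %b
|     ; can become
|   %mul = mul nuw i32 %a, %b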
* InstCombine: Infer nsw for multipliesDavid Majnemer2014-12-264-89/+91
| | | | | | | We already utilize this logic for reducing overflow intrinsics, it makes sense to reuse it for normal multiplies as well. llvm-svn: 224847
* Masked Load/Store - Changed the order of parameters in intrinsics.Elena Demikhovsky2014-12-251-15/+5
| | | | | | | No functional changes. The documentation is coming. llvm-svn: 224829
* [SROA] Update the documentation and names for accessing the slicesChandler Carruth2014-12-241-29/+36
| | | | | | | | | | | | | | | | | | | | | within a partition of an alloca in SROA. This reflects the fact that the organization of the slices isn't really ideal for analysis, but is the naive way in which the slices are available while we're processing them in the core partitioning algorithm. It is possible we could improve matters, and I've left a FIXME with one of my ideas for how to do this, but it is a lot of work, the benefit is somewhat minor, and it isn't clear that it would be strictly better. =/ Not really satisfying, but I'm out of really good ideas. This also improves one place where the debug logging failed to mark some split partitions. Now we log in one place, slightly later, and with accurate information about whether the slice is split by the partition being rewritten. llvm-svn: 224800
* [SROA] Refactor the integer and vector promotion testing logic toChandler Carruth2014-12-241-47/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | operate in terms of the new Partition class, and generally have a clearer set of arguments. No functionality changed. The most notable improvements here are consistently using the terminology of 'partition' for a collection of slices that will be rewritten together and 'slice' for a region of an alloca that is used by a particular instruction. This also makes it more clear that the split things are actually slices as well, just ones that will be split by the proposed partition. This doesn't yet address the confusing aspects of the partition's interface where slices that will be split by the partition and start prior to the partition are accessed via Partition::splitSlices() while the core range of slices exposed by a Partition includes both unsplit slices and slices which will be split by the end, but started within the offset range of the partition. This is particularly hard to address because the algorithm which computes partitions quite literally doesn't know which slices these will end up being until too late. I'm looking at whether I can fix that or not, but I'm not optimistic. I'll update the comments and/or names to further explain this either way. I've also added one FIXME in this patch relating to this confusion so that I don't forget about it. llvm-svn: 224798
* [asan] change the coverage collection scheme so that we can easily emit coverage for the entire process as a single bit set, and if coverage_bitset=1 actually emit that bitsetKostya Serebryany2014-12-231-12/+24
| | | | llvm-svn: 224789
* [SimplifyCFG] Revise common code sinkingMichael Liao2014-12-231-30/+30
| - Fix the case where more than one common instruction derived from the same
|   operand cannot be sunk. When a pair of values has more than one derived
|   value in both branches, only one derived value could be sunk.
| - Replace the BB1 -> (BB2, PN) map with a joint value map, i.e. a map of
|   (BB1, BB2) -> PN, which tracks common ops more accurately.
|
| llvm-svn: 224757
* Remove a bad cast in CloneModule()Michael Kuperstein2014-12-231-1/+1
| | | | | | A cast that was introduced in r209007 was accidentally left in after the changes made to GlobalAlias rules in r210062. This crashes if the aliasee is a now-legal ConstantExpr. llvm-svn: 224756
* Revert r224739: Debug info: Teach SROA how to update debug info forChandler Carruth2014-12-231-30/+1
| | | | | | | | | | | fragmented variables. This caused codegen to start crashing when we built somewhat large programs with debug info and optimizations. 'check-msan' hit in, and I suspect a bootstrap would as well. I mailed a test case to the review thread. llvm-svn: 224750
* Remove dynamic allocation/indirection from GCOVBlocks owned by GCOVFunctionDavid Blaikie2014-12-221-22/+25
| | | | | | | | | Since these are all created in the DenseMap before they are referenced, there's no problem with pointer validity by the time it's required. This removes another use of DeleteContainerSeconds/manual memory management which I'm cleaning up from time to time. llvm-svn: 224744
* [SROA] Lift the logic for traversing the alloca slices one partition atChandler Carruth2014-12-221-157/+303
| | | | | | | | | | | | | | | | | | | | | | | | a time into a partition iterator and a Partition class. There is a lot of knock-on simplification that this enables, largely stemming from having a Partition object to refer to in lots of helpers. I've only done a minimal amount of that because enough stuff is changing as-is in this commit. This shouldn't change any observable behavior. I've worked hard to preserve the *exact* traversal semantics which were originally present even though some of them make no sense. I'll be changing some of this in subsequent commits now that the logic is carefully factored into a reusable place. The primary motivation for this change is to break the rewriting into phases in order to support more intelligent rewriting. For example, I'm planning to change how split loads and stores are rewritten to remove the significant overuse of integer bit packing in the resulting code and allow more effective secondary splitting of aggregates. For any of this to work, they have to share the exact traversal logic. llvm-svn: 224742
* [LCSSA] Handle PHI insertion in disjoint loopsBruno Cardoso Lopes2014-12-223-10/+44
| | | | | | | | | | | | | | | | | | | | Take two disjoint Loops L1 and L2. LoopSimplify fails to simplify some loops (e.g. when indirect branches are involved). In such situations, it can happen that an exit for L1 is the header of L2. Thus, when we create PHIs in one of such exits we are also inserting PHIs in L2 header. This could break LCSSA form for L2 because these inserted PHIs can also have uses in L2 exits, which are never handled in the current implementation. Provide a fix for this corner case and test that we don't assert/crash on that. Differential Revision: http://reviews.llvm.org/D6624 rdar://problem/19166231 llvm-svn: 224740
* Debug info: Teach SROA how to update debug info for fragmented variables.Adrian Prantl2014-12-221-1/+30
| This allows us to generate debug info for extremely advanced code such as
|
|   typedef struct { long int a; int b; } S;
|
|   int foo(S s) {
|     return s.b;
|   }
|
| which at -O1 on x86_64 is codegen'd into
|
|   define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
|     ret i32 %s.coerce1, !dbg !24
|   }
|
| With this patch we emit the following debug info for this:
|
|   TAG_formal_parameter [3]
|     AT_location( 0x00000000
|       0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
|       0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
|     AT_name( "s" )
|     AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )
|
| Thanks to chandlerc, dblaikie, and echristo for their feedback on all
| previous iterations of this patch!
|
| llvm-svn: 224739
* InstCombine: Squash an icmp+select into bitwise arithmeticDavid Majnemer2014-12-201-6/+24
| | | | | | | | | (X & INT_MIN) == 0 ? X ^ INT_MIN : X into X | INT_MIN (X & INT_MIN) != 0 ? X ^ INT_MIN : X into X & INT_MAX This fixes PR21993. llvm-svn: 224676
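|
| A minimal sketch of the first pattern for i32, where INT_MIN is -2147483648
| (hypothetical values):
|
|   %and = and i32 %x, -2147483648
|   %cmp = icmp eq i32 %and, 0
|   %xor = xor i32 %x, -2147483648
|   %sel = select i1 %cmp, i32 %xor, i32 %x
|     ; can become
|   %sel = or i32 %x, -2147483648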
* [SROA] Run clang-format over the entire SROA pass as I wrote it beforeChandler Carruth2014-12-201-157/+138
| | | | | | | | | | | | much of the glory of clang-format, and now any time I touch it I risk introducing formatting changes as part of a functional commit. Also, clang-format is *way* better at formatting my code than I am. Most of this is a huge improvement although I reverted a couple of places where I hit a clang-format bug with lambdas that has been filed but not (fully) fixed. llvm-svn: 224666
* [BBVectorize] Remove two more redundant assignments.Tilmann Scheller2014-12-191-2/+0
| | | | | | Found by the Clang static analyzer. llvm-svn: 224590
* [BBVectorize] Remove redundant assignment.Tilmann Scheller2014-12-191-1/+0
| | | | | | Found by the Clang static analyzer. llvm-svn: 224589