path: root/llvm/test/Transforms
Commits, most recent first. Each entry shows: commit message — author, date (files changed, lines removed/added).
* [SimplifyLibCalls] Turn memchr(const, C, const) into a bitfield check. — Benjamin Kramer, 2015-03-21 (2 files, -2/+86)

  strchr("123!", C) != nullptr is a common pattern to check if C is one of
  1, 2, 3 or !. If the largest element of the string is smaller than the
  target's register size, we can easily create a bitfield and just do a
  simple test for set membership.

      int foo(char C) { return strchr("123!", C) != nullptr; }

  now becomes

      cmpl $64, %edi                  ## range check
      sbbb %al, %al
      movabsq $0xE000200000001, %rcx
      btq %rdi, %rcx                  ## bit test
      sbbb %cl, %cl
      andb %al, %cl                   ## and the two conditions
      andb $1, %cl
      movzbl %cl, %eax                ## returning an int
      ret

  (IMHO the backend should expand this into a series of branches, but
  that's a different story.)

  The code is currently limited to bit fields that fit in a register, so
  usually 64 or 32 bits. Sadly, this misses anything using alpha chars or
  {}. This could be fixed by just emitting an i128 bit field, but that can
  generate really ugly code, so we have to find a better way. To some
  degree this is also recreating switch lowering logic, but we can't simply
  emit a switch instruction and thus change the CFG within instcombine.

  llvm-svn: 232902
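  For illustration, a minimal C++ sketch of where the mask constant comes
  from (invented names, not the pass's code). Note that strchr also matches
  the terminating NUL, which is why bit 0 of 0xE000200000001 is set:

      #include <cstdint>

      // Build a 64-bit membership mask for a string whose characters (and
      // the implicit NUL terminator) are all < 64.
      constexpr uint64_t charMask(const char *S) {
        uint64_t Mask = 1; // strchr matches the terminating NUL: bit 0.
        for (; *S; ++S)
          Mask |= uint64_t(1) << *S;
        return Mask;
      }

      bool isOneOf123Bang(unsigned char C) {
        // Range check plus bit test, mirroring cmpl/btq above.
        return C < 64 && (charMask("123!") >> C) & 1;
      }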
* SimplifyLibCalls: Add basic optimization of memchr calls. — Benjamin Kramer, 2015-03-21 (1 file, -0/+121)

  This is just memchr(x, y, 0) -> nullptr and constant folding.

  llvm-svn: 232896
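  At the source level the new folds look roughly like this (a hedged
  sketch; names are illustrative):

      #include <cstring>

      const void *examples() {
        static const char S[] = "hello";
        // A zero length can never match, so the call folds to nullptr:
        const void *A = memchr(S, 'e', 0);   // -> nullptr
        // With all arguments constant, the result constant-folds outright:
        const void *B = memchr(S, 'l', 5);   // -> S + 2
        return B ? B : A;
      }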
* MemoryDependenceAnalysis: Don't miscompile atomics — David Majnemer, 2015-03-21 (2 files, -85/+17)

  r216771 introduced a change to MemoryDependenceAnalysis that allowed it
  to reason about acquire/release operations. However, this change does not
  ensure that the acquire and release operations pair up. Unfortunately,
  this leads to miscompiles, as we won't see an acquire load as properly
  memory-affecting. This largely reverts r216771.

  This fixes PR22708.

  llvm-svn: 232889
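  A hedged C++11 illustration of why an acquire load must be treated as
  memory-affecting (types and names are invented for the example):

      #include <atomic>

      std::atomic<bool> Ready{false};
      int Data;

      int consumer() {
        // The acquire load orders the read of Data after the flag check.
        // If analysis treats this load as having no memory effects, a pass
        // may move the Data read across it, miscompiling the program.
        while (!Ready.load(std::memory_order_acquire)) { /* spin */ }
        return Data;
      }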
* [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles — Sanjay Patel, 2015-03-20 (1 file, -0/+236)

  vperm2* intrinsics are just shuffles. In a few special cases, they're not
  even shuffles. Optimizing intrinsics in InstCombine is better than
  handling this in the front-end for at least two reasons:

  1. Optimizing custom-written SSE intrinsic code at -O0 makes vector
     coders really angry (and so I have regrets about some patches from
     last week).
  2. Doing mask conversion logic in header files is hard to write and
     subsequently read.

  There are a couple of TODOs in this patch to complete this optimization.

  Differential Revision: http://reviews.llvm.org/D8486

  llvm-svn: 232852
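  For concreteness, one common case (a hedged sketch, not the patch
  itself): a vperm2f128 with control byte 0x20 selects the low 128-bit lane
  of each source, which is exactly the shuffle <0,1,2,3,8,9,10,11> on 8
  floats:

      #include <immintrin.h>

      // Concatenate the low halves of a and b; instcombine can now express
      // this as a plain shufflevector instead of a target intrinsic.
      __m256 concat_low_halves(__m256 a, __m256 b) {
        return _mm256_permute2f128_ps(a, b, 0x20);
      }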
* Correctly estimate SROA savings for store operands in inline cost analysis. — Wei Mi, 2015-03-20 (1 file, -0/+22)

  When estimating SROA savings, we want to see if an address is derived
  from an alloca in the caller. For store instructions, operand 1 is the
  address operand, but the current code uses operand 0. Use
  getPointerOperand for loads and stores to fix this.

  Patch by Easwaran Raman.
  http://reviews.llvm.org/D8425

  llvm-svn: 232827
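  The underlying pitfall, sketched against the LLVM C++ API (a simplified
  illustration, not the patch verbatim): for a store, operand 0 is the
  stored value and operand 1 is the address, so indexing operands directly
  is error-prone:

      #include "llvm/IR/Instructions.h"
      using namespace llvm;

      Value *addressOf(Instruction &I) {
        // Let the accessors pick the right operand index instead of
        // hard-coding getOperand(0).
        if (auto *LI = dyn_cast<LoadInst>(&I))
          return LI->getPointerOperand();  // operand 0 of a load
        if (auto *SI = dyn_cast<StoreInst>(&I))
          return SI->getPointerOperand();  // operand 1 of a store
        return nullptr;
      }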
* LowerBitSets: Avoid reusing byte set addresses. — Peter Collingbourne, 2015-03-19 (1 file, -4/+8)

  Each use of the byte array uses a different alias. This makes the backend
  less likely to reuse previously computed byte array addresses, improving
  the security of the CFI mechanism based on this pass.

  Differential Revision: http://reviews.llvm.org/D8455

  llvm-svn: 232770
* [InstCombine] Don't fold a GEP into itself through a PHI node — Daniel Jasper, 2015-03-19 (1 file, -0/+36)

  This can only occur (I think) through the back-edge of the loop. However,
  folding a GEP into itself means that the value of the previous iteration
  needs to be stored in the meantime, thus requiring an additional register
  variable to be live, but not actually achieving anything (the gep still
  needs to be executed once per loop iteration).

  The attached test case is derived from:

      typedef unsigned uint32;
      typedef unsigned char uint8;
      inline uint8 *f(uint32 value, uint8 *target) {
        while (value >= 0x80) {
          value >>= 7;
          ++target;
        }
        ++target;
        return target;
      }
      uint8 *g(uint32 b, uint8 *target) {
        target = f(b, f(42, target));
        return target;
      }

  What happens is that the GEP stored in incptr2 is folded into itself
  through the loop's back-edge and the phi-node stored in loopptr,
  effectively incrementing the ptr by "2" in each iteration instead of "1".
  In this case, it is actually increasing the number of GEPs required as
  the GEP before the loop can't be folded away anymore.

  For comparison, with this patch:

      define i8* @test4(i32 %value, i8* %buffer) {
      entry:
        %cmp = icmp ugt i32 %value, 127
        br i1 %cmp, label %loop.header, label %exit

      loop.header:                        ; preds = %entry
        br label %loop.body

      loop.body:                          ; preds = %loop.body, %loop.header
        %buffer.pn = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
        %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
        %loopptr = getelementptr inbounds i8, i8* %buffer.pn, i64 1
        %shr = lshr i32 %newval, 7
        %cmp2 = icmp ugt i32 %newval, 16383
        br i1 %cmp2, label %loop.body, label %loop.exit

      loop.exit:                          ; preds = %loop.body
        br label %exit

      exit:                               ; preds = %loop.exit, %entry
        %0 = phi i8* [ %loopptr, %loop.exit ], [ %buffer, %entry ]
        %incptr3 = getelementptr inbounds i8, i8* %0, i64 2
        ret i8* %incptr3
      }

  Without this patch:

      define i8* @test4(i32 %value, i8* %buffer) {
      entry:
        %incptr = getelementptr inbounds i8, i8* %buffer, i64 1
        %cmp = icmp ugt i32 %value, 127
        br i1 %cmp, label %loop.header, label %exit

      loop.header:                        ; preds = %entry
        br label %loop.body

      loop.body:                          ; preds = %loop.body, %loop.header
        %0 = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
        %loopptr = phi i8* [ %incptr, %loop.header ], [ %incptr2, %loop.body ]
        %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
        %shr = lshr i32 %newval, 7
        %incptr2 = getelementptr inbounds i8, i8* %0, i64 2
        %cmp2 = icmp ugt i32 %newval, 16383
        br i1 %cmp2, label %loop.body, label %loop.exit

      loop.exit:                          ; preds = %loop.body
        br label %exit

      exit:                               ; preds = %loop.exit, %entry
        %ptr2 = phi i8* [ %incptr2, %loop.exit ], [ %incptr, %entry ]
        %incptr3 = getelementptr inbounds i8, i8* %ptr2, i64 1
        ret i8* %incptr3
      }

  Review: http://reviews.llvm.org/D8245

  llvm-svn: 232718
* TLI: Add addVectorizableFunctionsFromVecLib. — Michael Zolotukhin, 2015-03-17 (1 file, -0/+182)

  Also, add several entries to the vectorizable functions table, and
  corresponding tests. The table isn't complete; it'll be populated later.

  Review: http://reviews.llvm.org/D8131

  llvm-svn: 232531
* TTI: Honour cost model for estimating cost of vector-intrinsic and calls. — Michael Zolotukhin, 2015-03-17 (1 file, -18/+10)

  Review: http://reviews.llvm.org/D8096

  llvm-svn: 232528
* [SwitchLowering] Remove incoming values in the reverse order — Michael Liao, 2015-03-17 (1 file, -0/+134)

  - To prevent invalidating *successive* indices.

  llvm-svn: 232510
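  The idiom in question, as a minimal sketch (illustrative, not the patch's
  exact code): erasing by ascending index shifts the remaining entries down
  and invalidates the indices still to be visited, so iterate from the back:

      #include "llvm/IR/Instructions.h"
      using namespace llvm;

      void removeIncomingFrom(PHINode &PN, BasicBlock *BB) {
        // Walk backwards so removing entry i never renumbers an entry we
        // have yet to visit.
        for (unsigned i = PN.getNumIncomingValues(); i-- > 0;)
          if (PN.getIncomingBlock(i) == BB)
            PN.removeIncomingValue(i, /*DeletePHIIfEmpty=*/false);
      }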
* [IRCE] Re-commit test cases. — Sanjoy Das, 2015-03-17 (2 files, -0/+74)

  Re-commit the test cases added in r232444. These now use
  -irce-print-changed-loops and -irce-print-range-checks so they run
  correctly on a no-asserts build of LLVM.

  llvm-svn: 232452
* [IRCE] Delete two tests. — Sanjoy Das, 2015-03-17 (2 files, -74/+0)

  I accidentally checked in two tests that used -debug-only -- these fail
  on a release LLVM build. Temporarily delete these from the repo to keep
  the bots green while I fix this locally.

  llvm-svn: 232446
* [IRCE] Support half-range checks. — Sanjoy Das, 2015-03-17 (2 files, -0/+74)

  This change to IRCE gets it to recognize "half" range checks: range
  checks that only check either that the index is `slt` some positive
  integer ("length") or that the index is `sge` `0`.

  The range solver does not try to be clever / aggressive about solving
  half-range checks -- it transforms "I < L" to "0 <= I < L" and "0 <= I"
  to "0 <= I < INT_SMAX". This is safe, but not always optimal.

  llvm-svn: 232444
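  In C terms, a half range check is a guard with only one of the two bounds
  present (a hedged illustration; function names are invented):

      // Full range check: 0 <= i && i < len. The "half" forms below carry
      // only one comparison; IRCE now widens them as described above.
      void upper_half(int *a, int n, int len) {
        for (int i = 0; i < n; ++i)
          if (i < len)       // "I < L"  -> treated as 0 <= I < L
            a[i] = 0;
      }
      void lower_half(int *a, int n) {
        for (int i = 0; i < n; ++i)
          if (i >= 0)        // "0 <= I" -> treated as 0 <= I < INT_SMAX
            a[i] = 0;
      }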
* GCOV: Make the exit block placement from r223193 optional — Justin Bogner, 2015-03-16 (1 file, -7/+16)

  By default we want our gcov emission to stay 4.2 compatible, which means
  we need to continue to emit the exit block last by default. We add an
  option to emit it before the body for users that need it.

  llvm-svn: 232438
* LowerBitSets: do not use private aliases at all on Darwin. — Peter Collingbourne, 2015-03-16 (1 file, -4/+14)

  LLVM currently turns these into linker-private symbols, which can be dead
  stripped by the Darwin linker.

  llvm-svn: 232435
* DebugInfo: Fix testcases that fail -verify-debug-info=true — Duncan P. N. Exon Smith, 2015-03-16 (7 files, -13/+15)

  As part of PR22777, fix testcases that fail the debug info verifier. The
  changes fall into the following categories:

  - Empty `filename:` fields in `MDFile`s. Compile units and some types
    require non-empty filenames. A number of testcases have empty
    filenames, probably due to hand-reduction of testcases.
  - Not-quite empty arrays: `!{i32 0}`. This used to be equivalent in the
    debug info schema to `!{}`. They cause problems for
    `!MDSubroutineType`'s `types:` array, since it requires all operands to
    be valid types. (Note that `!{null}` is the correct type array for
    functions that take no arguments and return `void`.)
  - Significantly bitrotted testcases. Nodes got left behind a few upgrades
    ago because of missing or invalid tags.

  llvm-svn: 232415
* [objc-arc] Make the ARC optimizer more conservative by forcing it to be non-safe in both directions, but mitigate the problem by noting that we just care if there was a further use. — Michael Gottesman, 2015-03-16 (2 files, -62/+65)

  The problem here is the infamous one-direction known safe. I was hesitant
  to turn it off before because of the potential for regressions without an
  actual bug from users hitting the problem. This is that bug ; ).

  The main performance impact of having known safe in both directions is
  that oftentimes it is very difficult to find two releases without a use
  in between them, since we are so conservative about determining potential
  uses. The one-direction known safe gets around that problem by taking
  advantage of many situations where we have two retains in a row. That
  being said, the one-direction known safe is unsafe. Consider the
  following situation:

      retain(x)
      retain(x)
      call A(x)
      call B(x)
      release(x)

  Then we know the following about the reference count of x:

      // rc(x) == N (for some N).
      retain(x)
      // rc(x) == N+1
      retain(x)
      // rc(x) == N+2
      call A(x)
      call B(x)
      // rc(x) >= 1 (since we can not release a deallocated pointer).
      release(x)
      // rc(x) >= 0

  That is all the information that we can know statically. That means that
  we know that A(x), B(x) together can release (x) at most N+1 times. Let's
  say that we remove the inner retain, release pair:

      // rc(x) == N (for some N).
      retain(x)
      // rc(x) == N+1
      call A(x)
      call B(x)
      // rc(x) >= 1
      release(x)
      // rc(x) >= 0

  We knew before that A(x), B(x) could release x up to N+1 times, meaning
  that rc(x) may be zero at the release(x). That is not safe. On the other
  hand, consider the following situation where we have a must use of
  release(x) that x must be kept alive for after the release(x). Then we
  know that:

      // rc(x) == N (for some N).
      retain(x)
      // rc(x) == N+1
      retain(x)
      // rc(x) == N+2
      call A(x)
      call B(x)
      // rc(x) >= 2 (since we know that we are going to release x and that
      // that release can not be the last use of x).
      release(x)
      // rc(x) >= 1 (since we can not deallocate the pointer since we have
      // a must use after x).
      ...
      // rc(x) >= 1
      use(x)

  Thus we know that statically the calls to A(x), B(x) can together only
  release rc(x) N times. Thus if we remove the inner retain, release pair:

      // rc(x) == N (for some N).
      retain(x)
      // rc(x) == N+1
      call A(x)
      call B(x)
      // rc(x) >= 1
      ...
      // rc(x) >= 1
      use(x)

  We are still safe, unless in the final ... there are unbalanced retains
  and releases, which would have caused the program to blow up anyway even
  before optimization occurred. The simplest form of must use is an
  additional release that has not been paired up with any retain (if we had
  paired the release with a retain and removed it, we would not have the
  additional use). This fits nicely into the ARC framework since basically
  what you do is say that given any nested releases, regardless of what is
  in between, the inner release is known safe. This enables us to get back
  the lost performance.

  <rdar://problem/19023795>

  llvm-svn: 232351
* Verifier: Check debug info intrinsic arguments — Duncan P. N. Exon Smith, 2015-03-15 (21 files, -75/+75)

  Verify that debug info intrinsic arguments are valid. (These checks will
  not recurse through the full debug info graph, so they don't need to be
  cordoned off in `DebugInfoVerifier`.)

  With those checks in place, changing the `DbgIntrinsicInst` accessors to
  downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
  specializations in `Metadata.h` to support this).

  Added tests to `test/Verifier` for the new -verify checks, and fixed the
  debug info in all the in-tree tests.

  If you have out-of-tree testcases that have started to fail -verify,
  hopefully the verify checks are helpful. The most likely problem is that
  the expression argument is `!{}` (instead of `!MDExpression()`).

  llvm-svn: 232296
* Update InstCombine to transform aggregate stores into scalar stores. — Mehdi Amini, 2015-03-14 (1 file, -0/+31)

  Summary: This is a first step toward getting proper support for aggregate
  loads and stores.

  Test Plan: Added unittests

  Reviewers: reames, chandlerc
  Reviewed By: chandlerc
  Subscribers: majnemer, joker.eph, chandlerc, llvm-commits
  Differential Revision: http://reviews.llvm.org/D7780

  Patch by Amaury Sechet
  From: Mehdi Amini <mehdi.amini@apple.com>

  llvm-svn: 232284
* Add a bunch of missing colons to CHECK lines in tests. NFC. — Ahmed Bougacha, 2015-03-14 (1 file, -1/+1)

  Some wouldn't pass; fixed most, the rest will be fixed separately.

  llvm-svn: 232239
* LowerBitSets: Do not export symbols for bit set referenced globals on Darwin. — Peter Collingbourne, 2015-03-14 (1 file, -0/+5)

  The linker on that platform may re-order symbols or strip dead symbols,
  which will break bit set checks. Avoid this by hiding the symbols from
  the linker.

  llvm-svn: 232235
* Reapply "[Reassociate] Add initial support for vector instructions."Robert Lougher2015-03-131-43/+189
| | | | | | | | | This reapplies the patch previously committed at revision 232190. This was reverted at revision 232196 as it caused test failures in tests that did not expect operands to be commuted. I have made the tests more resilient to reassociation in revision 232206. llvm-svn: 232209
* instcombine: alloca: Canonicalize scalar allocation array size — Duncan P. N. Exon Smith, 2015-03-13 (1 file, -2/+2)

  As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
  allocations to `i32 1`. Since r232200, `iX 1` (for X != 32) are only
  created by RAUWs, so this shouldn't fire too often. Nevertheless, it's a
  cheap check and a nice cleanup.

  llvm-svn: 232202
* AsmWriter: Write alloca array size explicitly (and -instcombine fixup) — Duncan P. N. Exon Smith, 2015-03-13 (1 file, -3/+16)

  Write the `alloca` array size explicitly when it's non-canonical.
  Previously, if the array size was `iX 1` (where X is not 32), the type
  would mutate to `i32` when round-tripping through assembly.

  The testcase I added fails in `verify-uselistorder` (as well as
  `FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
  (Manman Ren came across this when running `verify-uselistorder` on some
  non-trivial, optimized code as part of PR5680.)

  The type mutation started with r104911, which allowed array sizes to be
  something other than an `i32`. Starting with r204945, we "canonicalized"
  to `i64` on 64-bit platforms -- and then on every round-trip through
  assembly, mutated back to `i32`.

  I bundled a fixup for `-instcombine` to avoid r204945 on scalar
  allocations. (There wasn't a clean way to sequence this into two
  commits, since the assembly change on its own caused testcase churn, and
  the `-instcombine` change can't be tested without the assembly changes.)

  An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
  `AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
  scalar allocations -- was rejected out of hand, since this required
  teaching them each about the data layout.

  A follow-up commit will add an `-instcombine` to canonicalize the scalar
  allocation array size to `i32 1` rather than leaving `iX 1` alone.

  rdar://problem/20075773

  llvm-svn: 232200
* Revert: "[Reassociate] Add initial support for vector instructions."Robert Lougher2015-03-131-189/+43
| | | | | | | This reverts revision 232190 due to buildbot failure reported on clang-hexagon-elf for test arm64_vtst.c. To be investigated. llvm-svn: 232196
* [Reassociate] Add initial support for vector instructions. — Robert Lougher, 2015-03-13 (1 file, -43/+189)

  This patch adds initial support for vector instructions to the
  reassociation pass. It enables most parts of the pass to work with
  vectors, but to keep the size of the patch small, optimization of Xor
  trees, canonicalization of negative constants, converting shifts to
  muls, etc., have been left out. This will be handled in later patches.

  The patch is based on an initial patch by Chad Rosier.

  Differential Revision: http://reviews.llvm.org/D7566

  llvm-svn: 232190
* [opaque pointer type] Add textual IR support for explicit type parameter to gep operator — David Blaikie, 2015-03-13 (180 files, -749/+749)

  Similar to the gep (r230786) and load (r230794) changes.

  A similar migration script can be used to update test cases, which
  successfully migrated all of LLVM and Polly, but about 4 test cases
  needed manual changes in Clang.

  (This script will read the contents of stdin and massage it into stdout
  -- wrap it in the 'apply.sh' script shown in previous commits + xargs to
  apply it over a large set of test cases.)

      import fileinput
      import sys
      import re

      rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

      def conv(match):
          line = match.group(1)
          line += match.group(4)
          line += ", "
          line += match.group(2)
          return line

      line = sys.stdin.read()
      off = 0
      for match in re.finditer(rep, line):
          sys.stdout.write(line[off:match.start()])
          sys.stdout.write(conv(match))
          off = match.end()
      sys.stdout.write(line[off:])

  llvm-svn: 232184
* ConstantFold: Fix big shift constant folding — David Majnemer, 2015-03-13 (1 file, -0/+69)

  Constant folding for shift IR instructions ignores all bits above 32 of
  the second argument (the shift amount). Because of that, some undef
  results are not recognized, and APInt can raise an assert failure if the
  second argument has more than 64 bits.

  Patch by Paweł Bylica!

  Differential Revision: http://reviews.llvm.org/D7701

  llvm-svn: 232176
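  A minimal sketch of the corrected check (an assumed shape, not the patch
  verbatim): the comparison must use the full-width APInt rather than a
  value truncated to 32 bits:

      #include "llvm/IR/Constants.h"
      using namespace llvm;

      Constant *foldOverWideShift(Constant *Op, const APInt &ShAmt) {
        unsigned BitWidth = Op->getType()->getIntegerBitWidth();
        // Compare the whole APInt; silently truncating ShAmt to 32 bits
        // hides over-wide shift amounts such as shifting an i128 by 2^32.
        if (ShAmt.uge(BitWidth))
          return UndefValue::get(Op->getType()); // shift >= width is undef
        return nullptr; // fall through to the normal folding path
      }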
* Reapply 'Run LICM pass after loop unrolling pass.' — Kevin Qin, 2015-03-12 (1 file, -0/+43)

  It was first committed at r231630 and reverted at r231635. The function
  pass InstructionSimplifier is inserted as a barrier to make sure the loop
  unroll pass won't interfere with the LICM pass.

  llvm-svn: 232011
* InstCombine: Don't fold call bitcast into args if callee is byval — David Majnemer, 2015-03-11 (1 file, -0/+15)

  This fixes a bug reported here:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150309/265341.html

  llvm-svn: 231948
* Inliner should not add callgraph edges for intrinsic calls (PR22857) — Sanjay Patel, 2015-03-11 (1 file, -0/+30)

  The CallGraphNode function "addCalledFunction()" asserts that edges are
  not to intrinsics. This patch makes sure that the Inliner does not add
  such an edge to the callgraph.

  Fix for clang crash by assertion:
  https://llvm.org/bugs/show_bug.cgi?id=22857

  Differential Revision: http://reviews.llvm.org/D8231

  llvm-svn: 231927
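  The guard amounts to something like this sketch (function name invented;
  the real patch is structured differently):

      #include "llvm/IR/Function.h"
      #include "llvm/IR/Instructions.h"
      using namespace llvm;

      bool shouldAddCallGraphEdge(CallInst &CI) {
        Function *Callee = CI.getCalledFunction();
        // CallGraphNode::addCalledFunction() asserts on intrinsics, so
        // skip them when rebuilding edges for calls copied by inlining.
        return Callee && !Callee->isIntrinsic();
      }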
* Prefer pipes over temporary files in a feeble attempt to stabilize this test on Windows. — Benjamin Kramer, 2015-03-11 (1 file, -3/+2)

  llvm-svn: 231923
* If a conditional branch jumps to the same target, remove the condition — Philip Reames, 2015-03-10 (1 file, -0/+16)

  Given that large parts of instcombine are restricted to instructions
  which have one use, getting rid of a use on the condition can help the
  effectiveness of the optimizer. Also, it allows the condition to
  potentially be deleted by instcombine rather than waiting for another
  pass.

  I noticed this completely by accident in another test case. It's not
  anything that actually came from a real workload.

  p.s. We should probably do the same thing for switch instructions.

  Differential Revision: http://reviews.llvm.org/D8220

  llvm-svn: 231881
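  The transform itself is tiny; a hedged sketch against the LLVM C++ API
  (not the committed code):

      #include "llvm/IR/Instructions.h"
      using namespace llvm;

      bool foldSameTargetBranch(BranchInst *BI) {
        // br i1 %cond, label %bb, label %bb  ==>  br label %bb
        // This also drops a use of %cond, which helps the many folds that
        // are restricted to single-use values.
        if (!BI->isConditional() ||
            BI->getSuccessor(0) != BI->getSuccessor(1))
          return false;
        BranchInst::Create(BI->getSuccessor(0), BI); // insert before BI
        BI->eraseFromParent();
        return true;
      }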
* Infer known bits from dominating conditions — Philip Reames, 2015-03-10 (1 file, -0/+152)

  This patch adds limited support in ValueTracking for inferring known bits
  of a value from conditional expressions which must be true to reach the
  instruction we're trying to optimize. At this time, the feature is off by
  default. Once landed, I'm hoping for feedback from others on both
  profitability and compile time impact.

  Forms of conditional value propagation have been tried in LLVM before and
  have failed due to compile time problems. In an attempt to side-step
  that, this patch only considers conditions where the edge leaving the
  branch dominates the context instruction. It does not attempt full
  dataflow. Even with that restriction, it handles many interesting cases:

  * Early exits from functions
  * Early exits from loops (for context instructions in the loop and after
    the check)
  * Conditions which control entry into loops, including multi-version
    loops (such as those produced during vectorization, IRCE, loop
    unswitch, etc.)

  Possible applications include optimizing using information provided by
  constructs such as: preconditions, assumptions, null checks, and range
  checks.

  This patch implements two approaches to the problem that need further
  benchmarking. Approach 1 is to directly walk the dominator tree looking
  for interesting conditions. Approach 2 is to inspect other uses of the
  value being queried for interesting comparisons. From initial
  benchmarking, it appears that Approach 2 is faster than Approach 1, but
  this needs to be further validated.

  Differential Revision: http://reviews.llvm.org/D7708

  llvm-svn: 231879
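  A small C example of the kind of fact this makes available (illustrative
  only):

      int f(int x) {
        if (x < 0)
          return 0;
        // On this path the dominating branch condition guarantees x is
        // non-negative, so its sign bit is a known zero and the mask below
        // folds to 0.
        return x & 0x80000000;
      }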
* Fix a crash in InstCombine where we could try to truncate a switch comparison to zero width. — Owen Anderson, 2015-03-10 (1 file, -0/+7)

  llvm-svn: 231761
* Fix an infinite loop in InstCombine when an instruction with no users and side effects can be constant folded. — Owen Anderson, 2015-03-10 (1 file, -0/+14)

  ReplaceInstUsesWith needs to return nullptr when the input has no users,
  because in that case it does not mutate the program. Otherwise, we can
  get stuck in an infinite loop of repeatedly attempting to constant fold
  an instruction with no users.

  llvm-svn: 231755
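  The fix boils down to a guard of roughly this shape (a sketch, assuming
  InstCombine's convention that a nullptr return means "no change made"):

      #include "llvm/IR/Instruction.h"
      using namespace llvm;

      // Sketch of the helper's contract. If I has no users, RAUW is a
      // no-op, so reporting "changed" here would make the driver revisit
      // I forever.
      Instruction *replaceInstUsesWith(Instruction &I, Value *V) {
        if (I.use_empty())
          return nullptr;   // nothing was mutated; report no progress
        I.replaceAllUsesWith(V);
        return &I;          // signal that the program changed
      }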
* Revert r231630 - Run LICM pass after loop unrolling pass. — Kevin Qin, 2015-03-09 (1 file, -43/+0)

  Reverted because it broke the LLVM bootstrap.

  llvm-svn: 231635
* [AArch64] Enable partial & runtime unrolling on cortex-a57 — Kevin Qin, 2015-03-09 (3 files, -0/+112)

  The inner loop of a nest is more likely to be hot, and its runtime check
  can be promoted out (see patch 0001), so the overhead is lower; we can
  try a doubled threshold to unroll more loops.

  llvm-svn: 231632
* Introduce runtime unrolling disable metadata and use it to mark the scalar loop from vectorization. — Kevin Qin, 2015-03-09 (3 files, -2/+37)

  Runtime unrolling is an expensive optimization which can bring benefit
  only if the loop is hot and the iteration count is relatively large. For
  some loops, we know they are not worth runtime unrolling. The scalar loop
  produced by vectorization is one of those cases.

  llvm-svn: 231631
* Run LICM pass after loop unrolling pass. — Kevin Qin, 2015-03-09 (1 file, -0/+43)

  Runtime unrolling introduces a runtime check in the loop prologue. If the
  unrolled loop is an inner loop, the prologue will be inside the outer
  loop. The LICM pass can help to promote the runtime check out if the
  checked value is loop invariant.

  llvm-svn: 231630
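  A C sketch of the interaction (illustrative):

      // After runtime unrolling of the j-loop, its prologue guard tests a
      // quantity derived from m that does not change in the i-loop, so
      // LICM can hoist that check out of the outer loop.
      void scale(float *a, int n, int m) {
        for (int i = 0; i < n; ++i)       // outer loop
          for (int j = 0; j < m; ++j)     // inner loop: runtime-unrolled
            a[i * m + j] *= 2.0f;
      }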
* InstCombine: fix fold "fcmp x, undef" to account for NaN — Mehdi Amini, 2015-03-09 (1 file, -0/+35)

  Summary: See the two test cases.

      ; Can fold fcmp with undef on one side by choosing NaN for the undef
      ; Can fold fcmp with undef on both sides
      ;   fcmp u_pred undef, undef -> true
      ;   fcmp o_pred undef, undef -> false
      ; because whatever you choose for the first undef
      ; you can choose NaN for the other undef

  Reviewers: hfinkel, chandlerc, majnemer
  Reviewed By: majnemer
  Subscribers: majnemer, llvm-commits
  Differential Revision: http://reviews.llvm.org/D7617
  From: Mehdi Amini <mehdi.amini@apple.com>

  llvm-svn: 231626
* Teach DataLayout to infer a plausible alignment for things even when nothing is specified by the user. — Owen Anderson, 2015-03-08 (1 file, -0/+10)

  llvm-svn: 231613
* Do not restrict interleaved unrolling to small loops, depending on the target. — Olivier Sallenave, 2015-03-06 (1 file, -0/+73)

  llvm-svn: 231528
* Add a new pass "Loop Interchange"Karthik Bhat2015-03-063-0/+820
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This pass interchanges loops to provide a more cache-friendly memory access. For e.g. given a loop like - for(int i=0;i<N;i++) for(int j=0;j<N;j++) A[j][i] = A[j][i]+B[j][i]; is interchanged to - for(int j=0;j<N;j++) for(int i=0;i<N;i++) A[j][i] = A[j][i]+B[j][i]; This pass is currently disabled by default. To give a brief introduction it consists of 3 stages- LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix. LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time. LoopInterchangeTransform : Which does the actual transform. LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks. TODO: 1) Add support for reductions and lcssa phi. 2) Improve profitability model. 3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange. 4) Improve compile time regression found in llvm lnt due to this pass. 5) Fix issues in Dependency Analysis module. A special thanks to Hal for reviewing this code. Review: http://reviews.llvm.org/D7499 llvm-svn: 231458
* [objc-arc] Remove annotations code. — Michael Gottesman, 2015-03-06 (1 file, -83/+0)

  It will always be in the history if it is needed again. Now it is just
  dead code.

  llvm-svn: 231435
* Teach ComputeNumSignBits about signed remainder. — Nadav Rotem, 2015-03-06 (1 file, -0/+21)

  This optimization is a continuation of r231140, which reasoned about
  signed div.

  llvm-svn: 231433
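  A concrete instance (illustrative): the result of a signed remainder is
  bounded in magnitude by the divisor, so most of its high bits are copies
  of the sign bit:

      int srem_sign_bits(int x) {
        // x % 8 is in [-7, 7], so bits 3..31 are all equal: the result has
        // at least 29 sign bits, which lets later sign-extension and
        // narrowing operations on it be simplified away.
        return x % 8;
      }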
* [RewriteStatepointsForGC] Yet more test cases for relocation — Philip Reames, 2015-03-05 (1 file, -2/+174)

  At this point, we should have decent coverage of the involved code. I've
  got a few more test cases to clean up and submit, but what's here is
  already reasonable.

  I've got a collection of liveness tests which will be posted for review
  along with a decent liveness algorithm in the next few days. Once those
  are in, the code in this file should be well tested and I can start
  renaming things without risk of serious breakage.

  llvm-svn: 231414
* [RewriteStatepointsForGC] Add additional tests around relocation — Philip Reames, 2015-03-05 (1 file, -0/+123)

  These are focused around the actual relocation rewriting itself, not the
  rest of the infrastructure.

  llvm-svn: 231399
* [InstCombine] Fix an assertion when fmul has a ConstantExpr operand — Michael Kuperstein, 2015-03-05 (1 file, -0/+8)

  isNormalFp and isFiniteNonZeroFp should not assume vector operands cannot
  be constant expressions.

  Patch by Pawel Jurek <pawel.jurek@intel.com>
  Differential Revision: http://reviews.llvm.org/D8053

  llvm-svn: 231359
* Make DataLayout Non-Optional in the Module — Mehdi Amini, 2015-03-04 (44 files, -347/+231)

  Summary:
  DataLayout keeps the string used for its creation. As a side effect it is
  no longer needed in the Module. This is "almost" NFC: the string is no
  longer canonicalized, so you can't rely on two "equal" DataLayouts having
  the same string returned by getStringRepresentation().

  Get rid of DataLayoutPass: the DataLayout is in the Module.

  The DataLayout is "per-module"; let's enforce this by not duplicating it
  more than necessary. One more step toward non-optionality of the
  DataLayout in the module.

  Make DataLayout non-optional in the Module: Module->getDataLayout() will
  never return nullptr anymore.

  Reviewers: echristo
  Subscribers: resistor, llvm-commits, jholewinski
  Differential Revision: http://reviews.llvm.org/D7992
  From: Mehdi Amini <mehdi.amini@apple.com>

  llvm-svn: 231270