bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[SimplifyLibCalls] Add a new transformation: pow(exp(x), y) -> exp(x*y)	Davide Italiano	2015-11-03	3	-0/+55
	This one is enabled only under -ffast-math (due to rounding/overflows) but allows us to emit shorter code.
	Before (on FreeBSD x86-64):
	  4007f0: 50                    push   %rax
	  4007f1: f2 0f 11 0c 24        movsd  %xmm1,(%rsp)
	  4007f6: e8 75 fd ff ff        callq  400570 <exp2@plt>
	  4007fb: f2 0f 10 0c 24        movsd  (%rsp),%xmm1
	  400800: 58                    pop    %rax
	  400801: e9 7a fd ff ff        jmpq   400580 <pow@plt>
	  400806: 66 2e 0f 1f 84 00 00  nopw   %cs:0x0(%rax,%rax,1)
	  40080d: 00 00 00
	After:
	  4007b0: f2 0f 59 c1           mulsd  %xmm1,%xmm0
	  4007b4: e9 87 fd ff ff        jmpq   400540 <exp2@plt>
	  4007b9: 0f 1f 80 00 00 00 00  nopl   0x0(%rax)
	Differential Revision: http://reviews.llvm.org/D14045
	llvm-svn: 251976
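	The rewrite relies on the identity (e^x)^y = e^(x*y). This holds in exact arithmetic but not always in floating point. The minimal C++ sketch below uses illustrative function names that are not taken from SimplifyLibCalls. It shows one overflow case that explains the -ffast-math gating: exp(1000.0) overflows to infinity, so pow(exp(1000.0), 0.001) is infinity, while exp(1000.0 * 0.001) is just e. The disassembly in this entry shows the exp2 form of the rewrite, and the same identity applies to it.

```cpp
#include <cmath>

// pow(exp(x), y): the form written in the source.
double pow_of_exp(double x, double y) { return std::pow(std::exp(x), y); }

// exp(x * y): the shorter form the transform may emit under -ffast-math.
double exp_of_product(double x, double y) { return std::exp(x * y); }
```

	For ordinary inputs such as x = 1.5, y = 2.0, both functions agree to within rounding. The two forms only diverge when the intermediate exp(x) overflows or loses precision.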
*	[CodegenPrepare] Do not rematerialize gc.relocates across different basic blocks	Igor Laevsky	2015-11-03	1	-0/+39
	Differential Revision: http://reviews.llvm.org/D14258 llvm-svn: 251957
*	Fix PR25372 - teach replaceCongruentPHIs to handle cases where SE evaluates ↵	Silviu Baranga	2015-11-03	1	-0/+33
	a PHI to a SCEVConstant Summary: Since Scalar Evolution can now create non-add rec expressions for PHI nodes, it can also create SCEVConstant expressions. This will confuse replaceCongruentPHIs, which previously relied on the fact that SCEV could not produce constants in this case. We will now replace the node with a constant in these cases - or avoid processing the Phi in case of a type mismatch. Reviewers: sanjoy Subscribers: llvm-commits, majnemer Differential Revision: http://reviews.llvm.org/D14230 llvm-svn: 251938
*	LoopVectorizer - skip 'bitcast' between GEP and load.	Elena Demikhovsky	2015-11-03	2	-86/+125
	Skipping 'bitcast' in this case allows the load to be vectorized:
	  %arrayidx = getelementptr inbounds double, double* %in, i64 %indvars.iv
	  %tmp53 = bitcast double** %arrayidx to i64*
	  %tmp54 = load i64, i64* %tmp53, align 8
	Differential Revision: http://reviews.llvm.org/D14112
	llvm-svn: 251907
*	Revert "[IndVarSimplify] Rewrite loop exit values with their initial values ↵	Tobias Grosser	2015-11-03	2	-76/+1
	from loop preheader" Commit 251839 triggers miscompiles on some bots: http://lab.llvm.org:8011/builders/perf-x86_64-penryn-O3-polly-fast/builds/13723 (The commit is listed in 13722, but due to an existing failure introduced in 13721 and reverted in 13723 the failure is only visible in 13723) To verify r251839 is indeed the only change that triggered the buildbot failures and to ensure the buildbots remain green while investigating I temporarily revert this commit. At the current state it is unclear if this commit introduced some miscompile or if it only exposed code to Polly that is subsequently miscompiled by Polly. llvm-svn: 251901
*	[CGP] widen switch condition and case constants to target's register width ↵	Sanjay Patel	2015-11-02	2	-0/+190
	(2nd try) This is a redo of r251849 except the tests have been split into arch-specific folders to hopefully make the bots happy. This is a follow-up from the discussion in D12965. The block-at-a-time limitation of SelectionDAG also came up in D13297. Without the InstCombine change from D12965, I don't expect this patch to make any difference in the real world because InstCombine does not shrink cases like this in visitSwitchInst(). But we need to have this CGP safety harness in place before proceeding with any shrinkage in D12965, so we won't generate extra extends for compares. I've opted for IR regression tests in the patch because that seems like a clearer way to test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86 will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473
	Before:
	  BB#0:
	    mr 4, 3
	    extsh. 3, 4
	    ble 0, .LBB0_5
	  BB#1:
	    cmpwi 3, 99
	    bgt 0, .LBB0_9
	  BB#2:
	    rlwinm 4, 4, 0, 16, 31    <--- 32-bit mask/extend
	    li 3, 0
	    cmplwi 4, 1
	    beqlr 0
	  BB#3:
	    cmplwi 4, 10
	    bne 0, .LBB0_12
	  BB#4:
	    li 3, 1
	    blr
	  .LBB0_5:
	    rlwinm 3, 4, 0, 16, 31    <--- 32-bit mask/extend
	    cmplwi 3, 65436
	    beq 0, .LBB0_13
	  BB#6:
	    cmplwi 3, 65526
	    beq 0, .LBB0_15
	  BB#7:
	    cmplwi 3, 65535
	    bne 0, .LBB0_12
	  BB#8:
	    li 3, 4
	    blr
	  .LBB0_9:
	    rlwinm 3, 4, 0, 16, 31    <--- 32-bit mask/extend
	    cmplwi 3, 100
	    beq 0, .LBB0_14
	  ...
	After:
	  BB#0:
	    rlwinm 4, 3, 0, 16, 31    <--- mask/extend to 32-bit and then use that for comparisons
	    cmpwi 4, 999
	    ble 0, .LBB0_5
	  BB#1:
	    lis 3, 0
	    ori 3, 3, 65525
	    cmpw 4, 3
	    bgt 0, .LBB0_9
	  BB#2:
	    cmplwi 4, 1000
	    beq 0, .LBB0_14
	  BB#3:
	    cmplwi 4, 65436
	    bne 0, .LBB0_13
	  BB#4:
	    li 3, 6
	    blr
	  .LBB0_5:
	    li 3, 0
	    cmplwi 4, 1
	    beqlr 0
	  BB#6:
	    cmplwi 4, 10
	    beq 0, .LBB0_12
	  BB#7:
	    cmplwi 4, 100
	    bne 0, .LBB0_13
	  BB#8:
	    li 3, 2
	    blr
	  .LBB0_9:
	    cmplwi 4, 65526
	    beq 0, .LBB0_15
	  BB#10:
	    cmplwi 4, 65535
	    bne 0, .LBB0_13
	  ...
	Differential Revision: http://reviews.llvm.org/D13532
	llvm-svn: 251857
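	This source-level C++ sketch shows why widening is safe. Zero-extending both the i16 condition and every case constant preserves which case matches, because zero extension maps distinct 16-bit values to distinct 32-bit values. The case values and return codes are hypothetical. They were chosen to resemble the constants in the PowerPC listing, where 65535, 65526 and 65436 are -1, -10 and -100 read as unsigned 16-bit values. This is not the patch's regression test.

```cpp
#include <cstdint>

// Switch on the narrow i16 value, as the source would write it.
int classify_narrow(int16_t x) {
  switch (x) {
  case 1: return 0;
  case 10: return 1;
  case 100: return 2;
  case 1000: return 3;
  case -1: return 4;
  case -10: return 5;
  case -100: return 6;
  default: return 7;
  }
}

// The same switch after widening: the condition is zero-extended to 32 bits
// and every case constant is zero-extended the same way, so negative i16
// cases become large unsigned values (-1 -> 65535, -10 -> 65526, -100 -> 65436).
int classify_widened(int16_t x) {
  uint32_t w = static_cast<uint16_t>(x);  // zext i16 -> i32
  switch (w) {
  case 1u: return 0;
  case 10u: return 1;
  case 100u: return 2;
  case 1000u: return 3;
  case 65535u: return 4;
  case 65526u: return 5;
  case 65436u: return 6;
  default: return 7;
  }
}
```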
*	revert r251849; need to move tests to arch-specific folders	Sanjay Patel	2015-11-02	1	-107/+0
	llvm-svn: 251851
*	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using ↵	Cong Hou	2015-11-02	2	-4/+50
	larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1). Differential revision: http://reviews.llvm.org/D8943 llvm-svn: 251850
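	This is a rough sketch of the VF selection described in this entry. It assumes a hypothetical register-usage model in which each live value of width `bits` needs ceil(bits * VF / reg_bits) vector registers. It is not the LoopVectorizer's cost model. It only illustrates the idea: start from the VF implied by the smallest type, then back off until the estimated register usage fits.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model: registers needed for one live value of `bits` width at
// vectorization factor `vf`, with `reg_bits`-wide vector registers (ceiling).
unsigned regs_for(unsigned bits, unsigned vf, unsigned reg_bits) {
  return (bits * vf + reg_bits - 1) / reg_bits;
}

unsigned estimate_usage(const std::vector<unsigned>& live_bits, unsigned vf,
                        unsigned reg_bits) {
  unsigned total = 0;
  for (unsigned bits : live_bits) total += regs_for(bits, vf, reg_bits);
  return total;
}

// live_bits must be non-empty; reg_bits / width is assumed to be a power of two.
unsigned choose_vf(const std::vector<unsigned>& live_bits, unsigned reg_bits,
                   unsigned num_regs, bool maximize_bandwidth) {
  unsigned widest = *std::max_element(live_bits.begin(), live_bits.end());
  unsigned smallest = *std::min_element(live_bits.begin(), live_bits.end());
  unsigned base_vf = std::max(1u, reg_bits / widest);  // default: widest type
  if (!maximize_bandwidth) return base_vf;
  unsigned max_vf = std::max(1u, reg_bits / smallest);  // smallest type
  // Try the largest VF first and halve it until the usage estimate fits.
  for (unsigned vf = max_vf; vf > base_vf; vf /= 2)
    if (estimate_usage(live_bits, vf, reg_bits) <= num_regs) return vf;
  return base_vf;
}
```

	Consider a loop with two live i8 values and two live i32 values on 128-bit registers. The default picks VF = 4 (from i32). With maximize-bandwidth and 16 registers, the sketch picks VF = 16. With only 8 registers, it backs off to VF = 8.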
*	[CGP] widen switch condition and case constants to target's register width	Sanjay Patel	2015-11-02	1	-0/+107
	This is a follow-up from the discussion in D12965. The block-at-a-time limitation of SelectionDAG also came up in D13297. Without the InstCombine change from D12965, I don't expect this patch to make any difference in the real world because InstCombine does not shrink cases like this in visitSwitchInst(). But we need to have this CGP safety harness in place before proceeding with any shrinkage in D12965, so we won't generate extra extends for compares. I've opted for IR regression tests in the patch because that seems like a clearer way to test the transform, but PowerPC CodeGen for an i16 widening test is shown below. x86 will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473
	Before:
	  BB#0:
	    mr 4, 3
	    extsh. 3, 4
	    ble 0, .LBB0_5
	  BB#1:
	    cmpwi 3, 99
	    bgt 0, .LBB0_9
	  BB#2:
	    rlwinm 4, 4, 0, 16, 31    <--- 32-bit mask/extend
	    li 3, 0
	    cmplwi 4, 1
	    beqlr 0
	  BB#3:
	    cmplwi 4, 10
	    bne 0, .LBB0_12
	  BB#4:
	    li 3, 1
	    blr
	  .LBB0_5:
	    rlwinm 3, 4, 0, 16, 31    <--- 32-bit mask/extend
	    cmplwi 3, 65436
	    beq 0, .LBB0_13
	  BB#6:
	    cmplwi 3, 65526
	    beq 0, .LBB0_15
	  BB#7:
	    cmplwi 3, 65535
	    bne 0, .LBB0_12
	  BB#8:
	    li 3, 4
	    blr
	  .LBB0_9:
	    rlwinm 3, 4, 0, 16, 31    <--- 32-bit mask/extend
	    cmplwi 3, 100
	    beq 0, .LBB0_14
	  ...
	After:
	  BB#0:
	    rlwinm 4, 3, 0, 16, 31    <--- mask/extend to 32-bit and then use that for comparisons
	    cmpwi 4, 999
	    ble 0, .LBB0_5
	  BB#1:
	    lis 3, 0
	    ori 3, 3, 65525
	    cmpw 4, 3
	    bgt 0, .LBB0_9
	  BB#2:
	    cmplwi 4, 1000
	    beq 0, .LBB0_14
	  BB#3:
	    cmplwi 4, 65436
	    bne 0, .LBB0_13
	  BB#4:
	    li 3, 6
	    blr
	  .LBB0_5:
	    li 3, 0
	    cmplwi 4, 1
	    beqlr 0
	  BB#6:
	    cmplwi 4, 10
	    beq 0, .LBB0_12
	  BB#7:
	    cmplwi 4, 100
	    bne 0, .LBB0_13
	  BB#8:
	    li 3, 2
	    blr
	  .LBB0_9:
	    cmplwi 4, 65526
	    beq 0, .LBB0_15
	  BB#10:
	    cmplwi 4, 65535
	    bne 0, .LBB0_13
	  ...
	Differential Revision: http://reviews.llvm.org/D13532
	llvm-svn: 251849
*	[IndVarSimplify] Rewrite loop exit values with their initial values from ↵	Chen Li	2015-11-02	2	-1/+76
	loop preheader Summary: This patch adds support to check if a loop has loop-invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passed in from the loop preheader. We can therefore rewrite the exit value with its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13974 llvm-svn: 251839
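	This C++ illustration shows the reasoning at source level. The pass itself works on LCSSA phis in IR, and these functions are hypothetical. Because `flag` does not change inside the loop, the early exit can only be taken on the first iteration. The induction variable's value on that exit is therefore its initial value `start`. This landing was later reverted (r251901, near the top of this log); the sketch only shows the idea.

```cpp
// Before: the early exit returns a value computed from the induction variable.
int exit_original(int start, int n, bool flag) {
  for (int i = start; i < n; ++i) {
    if (flag) return i * 2;  // `flag` is loop-invariant
  }
  return -1;
}

// After: the early exit can only be taken on the first iteration, where i is
// still `start`, so the exit value is rewritten in terms of the initial value.
int exit_rewritten(int start, int n, bool flag) {
  for (int i = start; i < n; ++i) {
    if (flag) return start * 2;
  }
  return -1;
}
```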
*	TvOS: add missing support for some libcalls.	Tim Northover	2015-11-02	2	-32/+37
	llvm-svn: 251811
*	Preserve load alignment and dereferenceable metadata during some transformations	Artur Pilipenko	2015-11-02	10	-2/+282
	Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D13953 llvm-svn: 251809
*	[SCEV] Don't create SCEV expressions that break LCSSA	Sanjoy Das	2015-10-31	1	-0/+33
	Prevent `createNodeFromSelectLikePHI` from creating SCEV expressions that break LCSSA. A better fix for the same issue is to teach SCEVExpander to not break LCSSA by inserting PHI nodes at appropriate places. That's planned for the future. Fixes PR25360. llvm-svn: 251756
*	SamplePGO - Count sample records in embedded profiles when computing coverage.	Diego Novillo	2015-10-31	2	-0/+137
	The initial coverage checking code for sample records failed to count records inside inlined profiles. This change fixes the oversight. llvm-svn: 251752
*	[SimplifyLibCalls] Add test to ensure transform is not executed if fast-math	Davide Italiano	2015-10-31	1	-0/+25
	attribute is not present. During my refactor in r251595 I changed the behavior of optimizeSqrt(), skipping the transformation if the function wasn't marked with unsafe-fp-math attribute. This fixed a bug, as confirmed by Sanjay (before the optimization was silently executed anyway), although it wasn't my primary aim. This commit adds a test to ensure the code doesn't break again. Reported by: Marcello Maggioni Discussed with: Sanjay Patel llvm-svn: 251747
*	[PM] Port StripDeadPrototypes to the new pass manager	Justin Bogner	2015-10-30	2	-1/+13
	This is a really straightforward port. Also adds a test for the pass, since it only seemed to be tested tangentially before. llvm-svn: 251726
*	[PM] Port ADCE to the new pass manager	Justin Bogner	2015-10-30	1	-0/+1
	llvm-svn: 251725
*	Whitespace. NFC	Justin Bogner	2015-10-30	1	-2/+0
	llvm-svn: 251724
*	Recommit r251680 (also need to update clang test)	Dehao Chen	2015-10-30	1	-0/+98
	Update the discriminator assignment algorithm
	  * If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
	  * If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.
	original code:
	  ; #1 int foo(int i) {
	  ; #2   if (i == 3 || i == 5) return 100; else return 99;
	  ; #3 }
	  ; i == 3: discriminator 0
	  ; i == 5: discriminator 2
	  ; return 100: discriminator 1
	  ; return 99: discriminator 3
	llvm-svn: 251689
*	Remove oneline.ll test.	Dehao Chen	2015-10-30	1	-0/+0
	llvm-svn: 251688
*	Revert r251680:	Dehao Chen	2015-10-30	1	-98/+0
	Update the discriminator assignment algorithm
	  * If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
	  * If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.
	original code:
	  ; #1 int foo(int i) {
	  ; #2   if (i == 3 || i == 5) return 100; else return 99;
	  ; #3 }
	  ; i == 3: discriminator 0
	  ; i == 5: discriminator 2
	  ; return 100: discriminator 1
	  ; return 99: discriminator 3
	llvm-svn: 251685
*	Update the discriminator assignment algorithm	Dehao Chen	2015-10-30	1	-0/+98
	  * If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
	  * If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.
	original code:
	  ; #1 int foo(int i) {
	  ; #2   if (i == 3 || i == 5) return 100; else return 99;
	  ; #3 }
	  ; i == 3: discriminator 0
	  ; i == 5: discriminator 2
	  ; return 100: discriminator 1
	  ; return 99: discriminator 3
	llvm-svn: 251680
*	Revert r251593.	Diego Novillo	2015-10-29	1	-1/+1
	The patch in r251593 was only papering over the problem. The actual fix was committed in r251623. llvm-svn: 251635
*	Revert the revision 251592 as it fails a test on some platforms.	Cong Hou	2015-10-29	2	-50/+4
	llvm-svn: 251617
*	[LVI/CVP] Teach LVI about range metadata	Philip Reames	2015-10-29	1	-0/+24
	Somewhat shockingly for an analysis pass which is computing constant ranges, LVI did not understand the ranges provided by range metadata. As part of this change, I included a change to CVP primarily because doing so made it much easier to write small self contained test cases. CVP was previously only handling the non-local operand case, but given that LVI can sometimes figure out information about instructions standalone, I don't see any reason to restrict this. There could possibly be a compile time impact from this, but I suspect it should be minimal. If anyone has an example which substantially regresses, please let me know. I could restrict the block local handling to ICmps feeding Terminator instructions if needed. Note that this patch continues a somewhat bad practice in LVI. In many cases, we know facts about values, and separate context sensitive facts about values. LVI makes no effort to distinguish and will frequently cache the same value fact repeatedly for different contexts. I would like to change this, but that's a large enough change that I want it to go in separately with clear documentation of what's changing. Other examples of this include the non-null handling, and arguments. As a meta comment: the entire motivation of this change was being able to write smaller (aka reasonable sized) test cases for a future patch teaching LVI about select instructions. Differential Revision: http://reviews.llvm.org/D13543 llvm-svn: 251606
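	This minimal sketch shows how a range fact can fold a comparison. It assumes a single, non-wrapping !range pair [lo, hi). LLVM's !range pairs are half-open, but real metadata may list several pairs and may wrap, and this sketch ignores both cases. The struct and function names are hypothetical and are not LVI's API.

```cpp
#include <cstdint>
#include <optional>

// One !range pair [lo, hi): lo inclusive, hi exclusive, assumed lo < hi.
struct HalfOpenRange { int64_t lo, hi; };

// Try to fold "x <s c" given that x is known to lie in r.
std::optional<bool> fold_slt(HalfOpenRange r, int64_t c) {
  if (r.hi <= c) return true;   // largest possible value hi - 1 is < c
  if (r.lo >= c) return false;  // smallest possible value is already >= c
  return std::nullopt;          // range straddles c: nothing is known
}
```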
*	[InstSimplify] sgt on i1s also encodes implication	Philip Reames	2015-10-29	1	-0/+11
	Follow on to http://reviews.llvm.org/D13074, implementing something pointed out by Sanjoy. His truth table from his comment on that bug summarizes things well:
	  LHS | RHS | LHS >=s RHS   | LHS implies RHS
	   0  |  0  | 1 (0 >= 0)    | 1
	   0  |  1  | 1 (0 >= -1)   | 1
	   1  |  0  | 0 (-1 >= 0)   | 0
	   1  |  1  | 1 (-1 >= -1)  | 1
	The key point is that an "i1 1" is the value "-1", not "1".
	Differential Revision: http://reviews.llvm.org/D13756
	llvm-svn: 251597
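	The truth table can be checked mechanically. This small C++ sketch uses hypothetical helper names and treats an i1 as a signed one-bit integer, so the bit 1 means -1.

```cpp
// Interpret an i1 bit as a signed 1-bit integer: 0 -> 0, 1 -> -1.
int as_signed_i1(bool bit) { return bit ? -1 : 0; }

// "LHS >=s RHS" on i1 values.
bool sge_i1(bool lhs, bool rhs) { return as_signed_i1(lhs) >= as_signed_i1(rhs); }

// Logical implication LHS => RHS.
bool implies(bool lhs, bool rhs) { return !lhs || rhs; }
```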
*	[SimplifyCFG] Constant fold a branch implied by its incoming edge	Philip Reames	2015-10-29	1	-0/+81
	The most common use case is when eliminating redundant range checks in an example like the following: c = a[i+1] + a[i]; Note that all the smarts of the transform (the implication engine) are already in ValueTracking and are tested directly through InstructionSimplify. Differential Revision: http://reviews.llvm.org/D13040 llvm-svn: 251596
*	Tweak test check pattern to fix bot failure.	Diego Novillo	2015-10-29	1	-1/+1
	llvm-svn: 251593
*	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using ↵	Cong Hou	2015-10-29	2	-4/+50
	larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. llvm-svn: 251592
*	SamplePGO - Add flag to check sampling coverage.	Diego Novillo	2015-10-28	2	-0/+50
	This adds the flag -mllvm -sample-profile-check-coverage=N to the SampleProfile pass. N is the percent of input sample records that the user expects to apply. If the pass does not use N% (or more) of the sample records in the input, it emits a warning. This is useful to detect some forms of stale profiles. If the code has drifted enough from the original profile, there will be records that do not match the IR anymore. This will not detect cases where a sample profile record for line L is referring to some other instructions that also used to be at line L. llvm-svn: 251568
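	This sketch of the coverage check uses a hypothetical, much-simplified profile tree rather than the real sample-profile classes. It counts records recursively, including those inside inlined profiles, as the later fix listed above requires. It then reports whether fewer than N percent of the records were used, which is the condition for the warning.

```cpp
#include <vector>

// Hypothetical profile node: a function body, or a callee inlined into it.
struct ProfileNode {
  unsigned records = 0;              // sample records in this body
  unsigned used = 0;                 // records the pass matched to IR
  std::vector<ProfileNode> inlined;  // profiles of inlined callees
};

void count_records(const ProfileNode& n, unsigned& total, unsigned& used) {
  total += n.records;
  used += n.used;
  for (const ProfileNode& child : n.inlined) count_records(child, total, used);
}

// True if fewer than threshold_percent of all records were used (warn).
bool coverage_below(const ProfileNode& root, unsigned threshold_percent) {
  unsigned total = 0, used = 0;
  count_records(root, total, used);
  if (total == 0) return false;
  return used * 100 < threshold_percent * total;
}
```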
*	Revert "r251451 - [AliasSetTracker] Use mod/ref information for UnknownInstr"	Hal Finkel	2015-10-28	1	-40/+0
	It looks like this broke the stage 2 builder: http://lab.llvm.org:8080/green/job/clang-stage2-configure-Rlto/6989/ Original commit message: AliasSetTracker does not need to convert the access mode to ModRefAccess if the newly visited UnknownInst has only 'REF' modrefinfo to existing pointers in the sets. Patch by Andrew Zhogin! llvm-svn: 251562
*	[JumpThreading] Use dominating conditions to prove implications	Sanjoy Das	2015-10-28	1	-0/+98
	Summary: If P branches to Q conditional on C and Q branches to R conditional on C' and C => C' then the branch conditional on C' can be folded to an unconditional branch. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13972 llvm-svn: 251557
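	This is a toy C++ model of the implication step. It is restricted to conditions of the form x <s bound on a single value x, and its types and names are hypothetical; LLVM's implication logic handles many more forms. For such conditions, x <s a implies x <s b exactly when a <= b. In that case, Q's branch always goes to its true successor.

```cpp
#include <cstdint>

// Hypothetical condition form: "x <s bound" on one shared value x.
struct SltCond { int64_t bound; };

// (x <s a) implies (x <s b) for every x exactly when a <= b.
bool implies(SltCond c, SltCond c2) { return c.bound <= c2.bound; }

// Q is reached only along P's edge taken when `dominating` holds. If that
// condition implies Q's own condition `local`, Q's conditional branch always
// goes to its true successor and can become an unconditional branch.
bool branch_always_taken(SltCond dominating, SltCond local) {
  return implies(dominating, local);
}
```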
*	Reapply: [LIR] Add support for creating memsets from loops with a negative ↵	Chad Rosier	2015-10-28	1	-0/+80
	stride. The simple fix is to prevent forming memcpy from loops with a negative stride. llvm-svn: 251518
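	This C++ sketch shows the equivalence the memset case relies on. A loop that stores zero while counting down writes the same bytes as a single memset of the whole array. The function names are hypothetical. As this reapplied commit states, such loops are no longer turned into memcpy.

```cpp
#include <cstddef>
#include <cstring>

// Negative-stride store loop: a[n-1], a[n-2], ..., a[0] = 0.
void zero_backward(int* a, std::ptrdiff_t n) {
  for (std::ptrdiff_t i = n - 1; i >= 0; --i) a[i] = 0;
}

// The idiom the loop can be replaced with: one memset over the same bytes.
void zero_memset(int* a, std::ptrdiff_t n) {
  if (n > 0) std::memset(a, 0, static_cast<std::size_t>(n) * sizeof(int));
}
```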
*	Revert "[LIR] Add support for creating memsets from loops with a negative ↵	Chad Rosier	2015-10-28	1	-44/+0
	stride." This reverts commit r251512. This is causing LNT/chomp to fail. llvm-svn: 251513
*	[LIR] Add support for creating memsets from loops with a negative stride.	Chad Rosier	2015-10-28	1	-0/+44
	http://reviews.llvm.org/D14125 llvm-svn: 251512
*	Revert r251492 "[IndVarSimplify] Rewrite loop exit values with their	Chen Li	2015-10-28	2	-76/+1
	initial values from loop preheader", because it broke some bots. llvm-svn: 251498
*	[IndVarSimplify] Rewrite loop exit values with their initial values from ↵	Chen Li	2015-10-28	2	-1/+76
	loop preheader Summary: This patch adds support to check if a loop has loop-invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passed in from the loop preheader. We can therefore rewrite the exit value with its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch. Reviewers: sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13974 llvm-svn: 251492
*	[GVN] Make a test case more robust	Sanjoy Das	2015-10-28	1	-12/+12
	The singleton !range metadata gets simplified more aggressively after a later change, so change the !range metadata to contain more than one element. While at it, turn some `; CHECK` lines into `; CHECK-LABEL:` lines. llvm-svn: 251485
*	[SimplifyCFG] Don't DCE catchret because the successor is unreachable	David Majnemer	2015-10-27	1	-0/+20
	CatchReturnInst has side-effects: it runs a destructor. This destructor could conceivably run forever/call exit/etc. and should not be removed. llvm-svn: 251461
*	[AliasSetTracker] Use mod/ref information for UnknownInstr	Hal Finkel	2015-10-27	1	-0/+40
	AliasSetTracker does not need to convert the access mode to ModRefAccess if the newly visited UnknownInst has only 'REF' modrefinfo to existing pointers in the sets. Patch by Andrew Zhogin! llvm-svn: 251451
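	This simplified model treats access modes as bit flags. The enumerator names follow this commit's wording, but the code is illustrative and is not AliasSetTracker's implementation. The real change decides whether an unknown instruction is Ref-only by querying mod/ref information against the set's pointers; the sketch models only the merge step. Previously, merging an unknown instruction forced the set to ModRef. Afterwards, a Ref-only instruction merged into a Ref-only set leaves it Ref.

```cpp
// Access modes as bit flags (illustrative).
enum AccessMode : unsigned {
  NoAccess = 0,
  RefAccess = 1,
  ModAccess = 2,
  ModRefAccess = RefAccess | ModAccess,
};

// Old behaviour: any unknown instruction pushed the set to ModRef.
AccessMode merge_conservative(AccessMode /*set*/, AccessMode /*inst*/) {
  return ModRefAccess;
}

// New behaviour: merge only the kinds of access the instruction performs.
AccessMode merge_modref(AccessMode set, AccessMode inst) {
  return static_cast<AccessMode>(set | inst);
}
```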
*	[ScalarEvolutionExpander] PHI on a catchpad can be used on both edges	David Majnemer	2015-10-27	1	-0/+42
	A PHI on a catchpad might be used by both edges out of the catchpad, feeding back into a loop. In this case, just use the insertion point. Anything more clever would require new basic blocks or PHI placement. llvm-svn: 251442
*	Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"	NAKAMURA Takumi	2015-10-27	2	-125/+86
	It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp. See also PR25324. llvm-svn: 251436
*	[SLP] Be more aggressive about reduction width selection.	Charlie Turner	2015-10-27	1	-0/+123
	Summary: This change could be way off-piste; I'm looking for any feedback on whether it's an acceptable approach. It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on either ARM or X86) before this patch, but with it does vectorize. I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT. The more principled approach I thought of was to generate several candidate trees and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something! Reviewers: nadav, jmolloy Subscribers: mssimpso, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D14116 llvm-svn: 251428
*	[SLP] Try a bit harder to find reduction PHIs	Charlie Turner	2015-10-27	1	-0/+74
	Summary: Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phis whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads. There is no significant compile-time impact, and combined with D13949, no performance improvement measured on ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT. Reviewers: jmolloy, mcrosier, nadav Subscribers: mssimpso, nadav, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14063 llvm-svn: 251425
*	[SLP] Treat SelectInsts as reduction values.	Charlie Turner	2015-10-27	1	-0/+73
	Summary: Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values. The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here. Reviewers: jmolloy, mzolotukhin, spatel, nadav Subscribers: nadav, mssimpso, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D13949 llvm-svn: 251424
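	This C++ example has the shape of the sum-of-absdiff workload mentioned in this entry. It is an illustration only. The conditional expression is commonly lowered to a select. Once the loop is unrolled, by hand or by the compiler, those selects feed a reduction, which is the pattern this patch lets SLP accept. Whether a given build actually vectorizes it depends on the target cost model, as the commit notes for AArch64.

```cpp
#include <cstddef>
#include <cstdint>

// Sum of absolute differences; the ternary is typically lowered to a select.
int sad(const uint8_t* a, const uint8_t* b, std::size_t n) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    int d = int(a[i]) - int(b[i]);
    sum += d < 0 ? -d : d;
  }
  return sum;
}
```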
*	Fix SamplePGO segfault when debug info is missing.	Diego Novillo	2015-10-27	2	-0/+41
	When emitting a remark for a conditional branch annotation, the remark uses the line location information of the conditional branch in the message. In some cases, that information is unavailable and the optimization would segfault. I'm still not sure whether this is a bug or WAI, but the optimizer should not die because of this. llvm-svn: 251420
*	[ScalarEvolutionExpander] Properly insert no-op casts + EH Pads	David Majnemer	2015-10-27	1	-0/+148
	We want to insert no-op casts as close as possible to the def. This is tricky when the cast is of a PHI node and the BasicBlocks between the def and the use cannot hold any instructions. Iteratively walk EH pads until we hit a non-EH pad. This fixes PR25326. llvm-svn: 251393
*	[RS4GC] Strip noalias attribute after statepoint rewrite	Igor Laevsky	2015-10-26	2	-1/+65
	We should remove noalias along with dereferenceable and dereferenceable_or_null attributes because statepoint could potentially touch the entire heap including noalias objects. Differential Revision: http://reviews.llvm.org/D14032 llvm-svn: 251333
*	SamplePGO - Add optimization reports.	Diego Novillo	2015-10-26	2	-0/+192
	This adds a couple of optimization remarks to the SamplePGO transformation. It reports when it decides to inline a hot function (to mimic the inline stack and repeat useful inline decisions in the original build), and it also reports branch destinations. For instance, given the code fragment:
	  6  if (i < 1000)
	  7    sum -= i;
	  8  else
	  9    sum += -i * rand();
	If the 'else' branch is taken most of the time, building this code with -Rpass=sample-profile will produce:
	  a.cc:9:14: remark: most popular destination for conditional branches at small.cc:6:9 [-Rpass=sample-profile]
	      sum += -i * rand();
	               ^
	llvm-svn: 251330
*	[safestack] Fast access to the unsafe stack pointer on AArch64/Android.	Evgeniy Stepanov	2015-10-26	1	-1/+3
	Android libc provides a fixed TLS slot for the unsafe stack pointer, and this change implements direct access to that slot on AArch64 via __builtin_thread_pointer() + offset. This change also moves more code into TargetLowering and its target-specific subclasses to get rid of target-specific codegen in SafeStackPass. This change does not touch the ARM backend because ARM lowers __builtin_thread_pointer as __aeabi_read_tp, which is not available on Android. The previous iteration of this change was reverted in r250461. This version leaves the generic, compiler-rt based implementation in SafeStack.cpp instead of moving it to TargetLoweringBase in order to allow testing without a TargetMachine. llvm-svn: 251324