bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[SLP] Look-ahead operand reordering heuristic.	Vasileios Porpodas	2019-11-11	1	-58/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples). Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk Reviewed By: RKSimon, dtemirbulatov Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60897
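The tie-breaking idea in the summary above can be illustrated with a toy model. This is a minimal sketch, not the code from D60897: `Node`, `look_ahead_score`, and `should_swap` are hypothetical names, every two-operand node is assumed to be commutative, and the scoring (one point per matching opcode) is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for an IR value: an opcode plus its operands.
    opcode: str
    operands: Tuple["Node", ...] = ()


def look_ahead_score(a: Node, b: Node, depth: int) -> int:
    # Values with different opcodes do not match at all.
    if a.opcode != b.opcode:
        return 0
    score = 1
    # Stop at the depth limit, or when there is no operand pair to look into.
    if depth == 0 or len(a.operands) != 2 or len(b.operands) != 2:
        return score
    # Assumption: two-operand nodes are commutative, so try both pairings.
    straight = (look_ahead_score(a.operands[0], b.operands[0], depth - 1)
                + look_ahead_score(a.operands[1], b.operands[1], depth - 1))
    crossed = (look_ahead_score(a.operands[0], b.operands[1], depth - 1)
               + look_ahead_score(a.operands[1], b.operands[0], depth - 1))
    return score + max(straight, crossed)


def should_swap(lane0, lane1, depth: int) -> bool:
    # lane0 and lane1 are (lhs, rhs) operand pairs of a commutative
    # operation in two adjacent lanes; swap lane1 only if that scores higher.
    keep = (look_ahead_score(lane0[0], lane1[0], depth)
            + look_ahead_score(lane0[1], lane1[1], depth))
    swap = (look_ahead_score(lane0[0], lane1[1], depth)
            + look_ahead_score(lane0[1], lane1[0], depth))
    return swap > keep


load, const = Node("load"), Node("const")
add_loads = Node("add", (load, load))
add_consts = Node("add", (const, const))
lane0 = (add_loads, add_consts)
lane1 = (add_consts, add_loads)

# All four immediate operands are "add", so depth 0 sees a tie.
print(should_swap(lane0, lane1, depth=0))
# One level deeper the operands differ, which breaks the tie.
print(should_swap(lane0, lane1, depth=1))
```

With depth 0 the immediate operands all share one opcode and the two orderings tie; looking one level further is what separates them in this toy case.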
*	[SLP] respect target register width for GEP vectorization (PR43578)	Sanjay Patel	2019-10-09	2	-45/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We failed to account for the target register width (max vector factor) when vectorizing starting from GEPs. This causes vectorization to proceed to obviously illegal widths as in: https://bugs.llvm.org/show_bug.cgi?id=43578 For x86, this also means that SLP can produce rogue AVX or AVX512 code even when the user specifies a narrower vector width. The AArch64 test in ext-trunc.ll appears to be better using the narrower width. I'm not exactly sure what getelementptr.ll is trying to do, but it's testing with "-slp-threshold=-18", so I'm not worried about those diffs. The x86 test is an over-reduction from SPEC h264; this patch appears to restore the perf loss caused by SLP when using -march=haswell. Differential Revision: https://reviews.llvm.org/D68667 llvm-svn: 374183
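The width limit described above amounts to clamping the vector factor by the register size. The following is a rough sketch of that arithmetic only; `clamp_vector_factor` and its parameters are hypothetical names, not functions from the SLP vectorizer.

```python
def clamp_vector_factor(num_scalars: int, element_bits: int,
                        max_vec_reg_bits: int) -> int:
    # Largest number of elements that fit in one target vector register.
    max_vf = max(1, max_vec_reg_bits // element_bits)
    # Never vectorize wider than the register allows.
    return min(num_scalars, max_vf)


# Sixteen i64 address computations with 256-bit registers: at most 4 per vector.
print(clamp_vector_factor(16, 64, 256))
```

Without such a clamp, a group of sixteen 64-bit values would be treated as a single 1024-bit vector, which is the kind of illegal width the commit message refers to.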
*	[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && ↵	Alexey Bataev	2019-09-29	3	-88/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"SCEVAddRecExpr operand is not loop-invariant!") Initially, the SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. This may break ScalarEvolution and cause a crash. Reworked the SLP vectorizer so that it no longer replaces vectorized instructions with UndefValue. Instead, vectorized instructions are marked for deletion inside the BoUpSLP class and deleted upon class destruction. Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29641 llvm-svn: 373166
*	Revert [SLP] Fix for PR31847: Assertion failed: ↵	Jordan Rupprecht	2019-09-26	3	-34/+88
\| \| \| \| \| \| \| \|	(isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!") This reverts r372626 (git commit 6a278d9073bdc158d31d4f4b15bbe34238f22c18) llvm-svn: 373019
*	[SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && ↵	Alexey Bataev	2019-09-23	3	-88/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"SCEVAddRecExpr operand is not loop-invariant!") Summary: Initially, the SLP vectorizer replaced all going-to-be-vectorized instructions with Undef values. This may break ScalarEvolution and cause a crash. Reworked the SLP vectorizer so that it no longer replaces vectorized instructions with UndefValue. Instead, vectorized instructions are marked for deletion inside the BoUpSLP class and deleted upon class destruction. Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy Differential Revision: https://reviews.llvm.org/D29641 llvm-svn: 372626
*	Improve reduction intrinsics by overloading result value.	Sander de Smalen	2019-06-13	4	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240
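The naming change above can be mimicked with two small helpers. This is an illustrative sketch: `old_reduce_name` and `new_reduce_name` are hypothetical helpers that only reproduce the two spellings quoted in the commit message; they are not LLVM's name-mangling code.

```python
def vector_suffix(num_elts: int, elt_type: str) -> str:
    # Builds the vector type suffix, e.g. (4, "i32") -> "v4i32".
    return f"v{num_elts}{elt_type}"


def old_reduce_name(op: str, num_elts: int, elt_type: str) -> str:
    # Old spelling: the scalar result type is a separate, unchecked suffix.
    vec = vector_suffix(num_elts, elt_type)
    return f"llvm.experimental.vector.reduce.{op}.{elt_type}.{vec}"


def new_reduce_name(op: str, num_elts: int, elt_type: str) -> str:
    # New spelling: the result type is implied by the vector element type.
    vec = vector_suffix(num_elts, elt_type)
    return f"llvm.experimental.vector.reduce.{op}.{vec}"


print(old_reduce_name("or", 4, "i32"))
print(new_reduce_name("or", 4, "i32"))
```

Because the new spelling carries only the vector type, there is no second suffix that could disagree with the element type.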
*	Revert "Temporarily Revert "Add basic loop fusion pass.""	Eric Christopher	2019-04-17	20	-0/+2966
\| \| \| \| \| \| \| \|	The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552
*	Temporarily Revert "Add basic loop fusion pass."	Eric Christopher	2019-04-17	20	-2966/+0
\| \| \| \| \| \| \| \|	As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546
*	[SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction ↵	Simon Pilgrim	2019-03-25	3	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	canonicalization Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops. This is prep work towards: (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands (b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops. Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356913
*	[SLP] fix variable names in test; NFC	Sanjay Patel	2019-03-22	1	-168/+168
\| \| \| \| \| \| \|	'tmpXXX' conflicts with the auto-generated script regex names. That could mask a bug or cause a failure if the output changes. llvm-svn: 356790
*	Update GettingStarted guide to recommend that people use the new	James Y Knight	2019-01-14	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	official Git repository. Remove the directions for using git-svn, and demote the prominence of the svn instructions. Also, fix a few other issues while I'm in there: * Mention LLVM_ENABLE_PROJECTS more. * Getting started doesn't need to mention test-suite, but should mention clang and the other projects. * Remove mentions of "configure", since that's long gone. I've also adjusted a few other mentions of svn to point to github, but have not done so comprehensively. Differential Revision: https://reviews.llvm.org/D56654 llvm-svn: 351130
*	[SLP] Update test checks for the SLP vectorizer, NFC.	Alexey Bataev	2019-01-11	9	-92/+411
\| \| \| \|	llvm-svn: 350967
*	[SLPVectorizer] regenerate test checks; NFC	Sanjay Patel	2018-10-20	1	-30/+31
\| \| \| \|	llvm-svn: 344848
*	[SLP] Fix insert point for reused extract instructions.	Alexey Bataev	2018-08-07	1	-0/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Reworked the previously committed patch to insert shuffles for reused extract element instructions in the correct position. Previous logic was incorrect, and might lead to the crash with PHIs and EH instructions. Reviewers: efriedma, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50143 llvm-svn: 339166
*	[SLP] Fix PR38339: Instruction does not dominate all uses!	Alexey Bataev	2018-07-31	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the ExtractElement instructions can be optimized out during the vectorization and we need to reshuffle the parent vector, this ShuffleInstruction may be inserted in the wrong place causing compiler to produce incorrect code. Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49928 llvm-svn: 338380
*	[SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair	Simon Pilgrim	2018-06-22	1	-19/+15
\| \| \| \| \| \| \| \| \| \|	SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle. This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle. Differential Revision: https://reviews.llvm.org/D48477 llvm-svn: 335349
*	[CostModel][AArch64] Add some initial costs for SK_Select and ↵	Simon Pilgrim	2018-06-22	1	-83/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	SK_PermuteSingleSrc AArch64 was only setting costs for SK_Transpose, which meant that many of the simpler shuffles (e.g. SK_Select and SK_PermuteSingleSrc for larger vector elements) were being severely overestimated by the default shuffle expansion. This patch adds costs to help improve SLP performance and avoid a regression in reductions introduced by D48174. I'm not very knowledgeable about AArch64 shuffle lowering so I've kept the extra costs to a minimum - someone who knows this code can add extra costs which should improve vectorization a lot more. Differential Revision: https://reviews.llvm.org/D48172 llvm-svn: 335329
*	[SLPVectorizer] Relax "alternate" opcode vectorisation to work with any ↵	Simon Pilgrim	2018-06-20	1	-16/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	SK_Select shuffle pattern D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand. This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation. The AArch64 regression will be fixed by D48172. Differential Revision: https://reviews.llvm.org/D48174 llvm-svn: 335130
*	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	Shiva Chen	2018-05-09	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to set breakpoints on labels and list source code around labels, we need to collect debug information for labels, i.e., the label name, the function the label belongs to, the line number in the file, and the address where the label is located. To keep this information in LLVM IR and to allow the backend to generate debug information correctly, we create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep as much debug information as possible even when the code is optimized. So, we create a new kind of intrinsic for label metadata to prevent the metadata from being eliminated along with the basic block. The intrinsic persists as long as it is not optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, which is the DILabel metadata. The intrinsic immediately follows the label. The backend can get the label metadata through the intrinsic's parameter. We also create a DIBuilder API for labels to be used by the frontend. The frontend can use createLabel() to allocate DILabel objects, and use insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841
*	[SLP] Add additional test for transposable binary operations with reuse	Matthew Simpson	2018-05-01	1	-2/+44
\| \| \| \|	llvm-svn: 331274
*	[SLPVectorizer] Debug info shouldn't impact spill cost computation.	Davide Italiano	2018-04-30	1	-0/+93
\| \| \| \| \| \| \| \| \| \|	<rdar://problem/39794738> (Also, PR32761). Differential Revision: https://reviews.llvm.org/D46199 llvm-svn: 331199
*	[SLP] Add tests for transposable binary operations	Matthew Simpson	2018-04-26	1	-0/+292
\| \| \| \| \| \| \|	These test cases are vectorizable, but we are currently unable to vectorize them effectively. llvm-svn: 330945
*	[SLP] Use getExtractWithExtendCost() to compute the scalar cost of ↵	Haicheng Wu	2018-04-16	1	-16/+69
\| \| \| \| \| \| \| \| \| \| \| \| \|	extractelement/ext pair We use getExtractWithExtendCost to calculate the cost of extractelement and s\|zext together when computing the extract cost after vectorization, but we calculate the cost of extractelement and s\|zext separately when computing the scalar cost which is larger than it should be. Differential Revision: https://reviews.llvm.org/D45469 llvm-svn: 330143
*	[SLP] update a test case. NFC.	Haicheng Wu	2018-04-11	1	-15/+17
\| \| \| \|	llvm-svn: 329818
*	[SLP] Distinguish "demanded and shrinkable" from "demanded and not ↵	Haicheng Wu	2018-04-03	1	-11/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035
*	[SLP] Add more checks to a test case. NFC.	Haicheng Wu	2018-03-26	1	-0/+14
\| \| \| \|	llvm-svn: 328572
*	[SLP] Add a test case. NFC.	Haicheng Wu	2018-03-26	1	-0/+38
\| \| \| \|	llvm-svn: 328546
*	[SLP] Stop counting cost of gather sequences with multiple uses	Matthew Simpson	2018-03-23	1	-27/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316
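The double-counting problem described above can be shown with a small cost calculation. This is a simplified sketch, assuming each gather sequence is identified by the tuple of scalars it collects; `naive_gather_cost`, `deduplicated_gather_cost`, and the per-scalar cost function are hypothetical and do not mirror the cost model in the SLP vectorizer.

```python
from typing import Callable, Iterable, Tuple

Gather = Tuple[str, ...]


def naive_gather_cost(entries: Iterable[Gather],
                      cost_of: Callable[[Gather], int]) -> int:
    # Counts every tree entry, so an identical gather used twice costs twice.
    return sum(cost_of(entry) for entry in entries)


def deduplicated_gather_cost(entries: Iterable[Gather],
                             cost_of: Callable[[Gather], int]) -> int:
    # Counts each distinct gather sequence once, at its definition.
    seen = set()
    total = 0
    for entry in entries:
        if entry in seen:
            continue
        seen.add(entry)
        total += cost_of(entry)
    return total


# Two uses of the same gather (a, b) plus one gather (c, d).
entries = [("a", "b"), ("a", "b"), ("c", "d")]
cost_per_scalar = len  # assumed cost: one unit per gathered scalar
print(naive_gather_cost(entries, cost_per_scalar))
print(deduplicated_gather_cost(entries, cost_per_scalar))
```

The deduplicated total matches what remains after CSE removes the redundant sequence, which is why the commit message treats this as a cost-model issue rather than a code generation issue.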
*	[SLP] Add test case for a gather sequence with multiple uses	Matthew Simpson	2018-03-21	1	-0/+66
\| \| \| \|	llvm-svn: 328133
*	[AArch64] Implement getArithmeticReductionCost	Matthew Simpson	2018-03-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	This patch provides an implementation of getArithmeticReductionCost for AArch64. We can specialize the cost of add reductions since they are computed using the 'addv' instruction. Differential Revision: https://reviews.llvm.org/D44490 llvm-svn: 327702
*	[AArch64] add SLP test based on TSVC; NFC	Sanjay Patel	2018-02-27	1	-0/+127
\| \| \| \| \| \| \| \|	This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 326217
*	[AArch64] fix IR names to not be 'tmp' because that gives the CHECK script ↵	Sanjay Patel	2018-02-21	1	-40/+40
\| \| \| \| \| \|	problems llvm-svn: 325718
*	[AArch64] add SLP test for matmul (PR36280); NFC	Sanjay Patel	2018-02-21	1	-0/+139
\| \| \| \| \| \| \| \|	This is a slight reduction of one of the benchmarks that suffered with D43079. Cost model changes should not cause this test to remain scalarized. llvm-svn: 325717
*	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280)	Sanjay Patel	2018-02-21	1	-1/+1
\| \| \| \| \| \| \| \|	There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658
*	[TTI CostModel] change default cost of FP ops to 1 (PR36280)	Sanjay Patel	2018-02-19	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515
*	[SLP] Take user instructions cost into consideration in insertelement ↵	Alexey Bataev	2018-02-12	1	-84/+92
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vectorization. Summary: For better vectorization results, we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement` instructions as part of the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893
*	[SLP] Add/update tests for SLP vectorizer, NFC.	Alexey Bataev	2018-01-10	1	-14/+299
\| \| \| \|	llvm-svn: 322225
*	[SLP] Added more missed optimization remarks	Adam Nemet	2017-11-15	2	-7/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Added more remarks to SLP pass, in particular "missed" optimization remarks. Also proposed several tests for new functionality. Patch by Vladimir Miloserdov! For reference you may look at: https://reviews.llvm.org/rL302811 Reviewers: anemet, fhahn Reviewed By: anemet Subscribers: javed.absar, lattner, petecoup, yakush, llvm-commits Differential Revision: https://reviews.llvm.org/D38367 llvm-svn: 318307
*	Keep Optimization Remark Yaml in NewPM	Sam Elliott	2017-08-20	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds. So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33951 Reviewers: anemet, chandlerc Reviewed By: anemet, chandlerc Subscribers: javed.absar, chandlerc, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D36906 llvm-svn: 311271
*	[SLP] General improvements of SLP vectorization process.	Alexey Bataev	2017-08-07	1	-51/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch tries to improve the existing two-pass vectorization analysis in the SLP vectorizer. What it does: 1. Defines key nodes, which are the vectorization roots. Previously, vectorization started when a StoreInst or ReturnInst was found. Now, vectorization starts for all instructions with no users and void types (terminators, StoreInst) plus CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in an array. This array is processed only after vectorization of the first key node following these instructions is finished. Vectorization proceeds in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310260
*	Revert "[SLP] General improvements of SLP vectorization process."	Alexey Bataev	2017-08-07	1	-49/+51
\| \| \| \| \| \|	This reverts commit r310255. llvm-svn: 310257
*	[SLP] General improvements of SLP vectorization process.	Alexey Bataev	2017-08-07	1	-51/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch tries to improve the existing two-pass vectorization analysis in the SLP vectorizer. What it does: 1. Defines key nodes, which are the vectorization roots. Previously, vectorization started when a StoreInst or ReturnInst was found. Now, vectorization starts for all instructions with no users and void types (terminators, StoreInst) plus CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in an array. This array is processed only after vectorization of the first key node following these instructions is finished. Vectorization proceeds in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310255
*	[SLPVectorizer] Test update, NFC.	Alexey Bataev	2017-08-02	1	-30/+151
\| \| \| \|	llvm-svn: 309814
*	Re-commit r302678, fixing PR33053.	Amara Emerson	2017-05-16	1	-32/+8
\| \| \| \| \| \| \|	The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211
*	[SLP] Enable 64-bit wide vectorization on AArch64	Adam Nemet	2017-05-15	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. 
* MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116
*	Revert r302678 "[AArch64] Enable use of reduction intrinsics."	Hans Wennborg	2017-05-15	1	-8/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115
*	[SLP] Emit optimization remarks	Adam Nemet	2017-05-11	3	-4/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases, for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense of how far the tree is spanning, I've included the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811
*	[AArch64] Enable use of reduction intrinsics.	Amara Emerson	2017-05-10	1	-32/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns is replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678
*	[SLP] Fix for PR32038: extra add of PHI node when it is not required.	Alexey Bataev	2017-03-01	1	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If a horizontal reduction tree starts from a binary operation that is used in a PHI node, but this PHI is not used in the horizontal reduction, we may end up with an extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x i32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in the reduction tree. The patch treats these PHI nodes as extra arguments and includes them in the final result iff they are really used in the reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606
*	[SLP] A test for a fix of PR32038.	Alexey Bataev	2017-02-27	1	-0/+124
\| \| \| \|	llvm-svn: 296349