bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	Fix separator in header comment	Tobias Grosser	2016-06-22	1	-1/+1
\| \| \| \| \| \| \|	This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in http://reviews.llvm.org/D21488 and was split out to increase readability. llvm-svn: 273437
*	clang-tidy: apply modern-use-nullptr fixes	Tobias Grosser	2016-06-22	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null pointers. This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in http://reviews.llvm.org/D21488 and was split out to increase readability. llvm-svn: 273435
*	[NFC] Use isl_schedule_node_band_n_member to get the number of dimensions of ↵	Roman Gareev	2016-06-22	1	-9/+4
\| \| \| \| \| \|	a band node. llvm-svn: 273400
*	Apply all necessary tilings and unrollings to get a micro-kernel	Roman Gareev	2016-06-22	1	-6/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the first patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we create the BLIS micro-kernel by applying a combination of tiling and unrolling. In subsequent changes we will add the extraction of the BLIS macro-kernel and implement the packing transformation. Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D21140 llvm-svn: 273397
*	[NFC] Outline the application of register tiling.	Roman Gareev	2016-06-12	1	-7/+14
\| \| \| \|	llvm-svn: 272515
*	[NFC] "#include <ciso646>" is unnecessary, because "and", "or" were replaced	Roman Gareev	2016-06-08	1	-1/+0
\| \| \| \| \| \|	by "&&", "\|\|". llvm-svn: 272168
*	[NFC] Check that a parameter of ScheduleTreeOptimizer::isMatrMultPattern ↵	Roman Gareev	2016-06-04	1	-0/+3
\| \| \| \| \| \|	contains a correct partial schedule llvm-svn: 271780
*	[FIX] Fix potential issue related to subtraction from an unsigned 0 in ↵	Roman Gareev	2016-06-03	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \|	circularShiftOutputDims Reported-by: Mehdi Amini <mehdi.amini@apple.com> Contributed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: http://reviews.llvm.org/D20969 llvm-svn: 271705
*	[GSoC 2016] [Polly] [FIX] Determination of statements that contain matrix	Roman Gareev	2016-05-31	1	-7/+7
\| \| \| \| \| \| \| \| \| \|	multiplication Fix small issues related to characters, operators and descriptions of tests. Differential Revision: http://reviews.llvm.org/D20806 llvm-svn: 271264
*	Decouple SCoP building logic from pass	Johannes Doerfert	2016-05-31	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Created a new pass ScopInfoRegionPass. As name suggests, it is a region pass and it is there to preserve compatibility with our existing Polly passes. ScopInfoRegionPass will return a SCoP object for a valid region while the creation of the SCoP stays in the ScopInfo class. Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in> Reviewed-by: Tobias Grosser <tobias@grosser.es>, Johannes Doerfert <doerfert@cs.uni-saarland.de> Differential Revision: http://reviews.llvm.org/D20770 llvm-svn: 271259
*	MSVC compile fix: #include <ciso646>. NFC.	Michael Kruse	2016-05-30	1	-0/+1
\| \| \| \| \| \| \| \| \|	This header is required to make the ISO 646 alternative operator spellings ("and", "or" instead of "&&", "\|\|") work. Should these operators be replaced by the standard ones as already suggested by Johannes, also remove this #include again. llvm-svn: 271206
*	Determination of statements that contain matrix multiplication	Roman Gareev	2016-05-28	1	-5/+111
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add determination of statements that contain, in particular, matrix multiplications and can be optimized with [1] to try to get close-to-peak performance. It can be enabled via polly-pm-based-opts, which is false by default. Refs: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D20575 llvm-svn: 271128
*	[ScheduleOptimizer] Add -polly-opt-outer-coincidence option.	Michael Kruse	2016-05-02	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \|	Add a command line switch to set the isl_options_set_schedule_outer_coincidence option. ISL then tries to build schedules where the outer member of a band satisfies the coincidence constraints. In practice this allows loop skewing for more parallelism in inner loops. llvm-svn: 268222
*	Add __isl_give annotations to return types [NFC]	Johannes Doerfert	2016-04-09	1	-1/+1
\| \| \| \|	llvm-svn: 265882
*	Allow the client of DependenceInfo to obtain dependences at different ↵	Hongbin Zheng	2016-03-03	2	-3/+4
\| \| \| \| \| \|	granularities. llvm-svn: 262591
*	Adapt to LLVM head, again	Hongbin Zheng	2016-02-25	1	-1/+1
\| \| \| \|	llvm-svn: 261905
*	Revert "Adapt to LLVM head. NFC"	Hongbin Zheng	2016-02-25	1	-1/+1
\| \| \| \| \| \|	This reverts commit 4d3753b9646a69c00d234ccd6e91dc3d0ea5d643. llvm-svn: 261892
*	Adapt to LLVM head. NFC	Hongbin Zheng	2016-02-25	1	-1/+1
\| \| \| \|	llvm-svn: 261886
*	Annotation of SIMD loops	Roman Gareev	2016-02-23	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip parallelism checks. The buildbot shows the following compile/execution time changes: Compile time: Improvements Δ Previous Current σ …/gesummv -6.06% 0.2640 0.2480 0.0055 …/gemver -4.46% 0.4480 0.4280 0.0044 …/covariance -4.31% 0.8360 0.8000 0.0065 …/adi -3.23% 0.9920 0.9600 0.0065 …/doitgen -2.53% 0.9480 0.9240 0.0090 …/3mm -2.33% 1.0320 1.0080 0.0087 Execution time: Regressions Δ Previous Current σ …/viterbi 1.70% 5.1840 5.2720 0.0074 …/smallpt 1.06% 12.4920 12.6240 0.0040 Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D14491 llvm-svn: 261620
*	Adjust formatting to clang-format changes in 256149	Tobias Grosser	2015-12-21	1	-1/+1
\| \| \| \|	llvm-svn: 256151
*	Full/partial tile separation for vectorization	Tobias Grosser	2015-10-20	1	-0/+99
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We isolate full tiles from partial tiles to be able to, for example, vectorize loops with parametric lower and/or upper bounds. If we use -polly-vectorizer=stripmine, we can see execution-time improvements: correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to 0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from 8m5565s to 7m4285s (-13.18 %). The current full/partial tile separation increases compile-time more than necessary. As a result, we see in compile time regressions, for example, for 3mm from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected as we generate more IR and consequently more time is spent in the LLVM backends. However, a first investiagation has shown that a larger portion of compile time is unnecessarily spent inside Polly's parallelism detection and could be eliminated by propagating existing knowledge about vector loop parallelism. Before enabling -polly-vectorizer=stripmine by default, it is necessary to address this compile-time issue. Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewers: jdoerfert, grosser Subscribers: grosser, #polly Differential Revision: http://reviews.llvm.org/D13779 llvm-svn: 250809
*	Remove independent blocks pass	Johannes Doerfert	2015-10-18	1	-373/+0
\| \| \| \| \| \| \| \| \| \| \|	Polly can now be used as a analysis only tool as long as the code generation is disabled. However, we do not have an alternative to the independent blocks pass in place yet, though in the relevant cases this does not seem to impact the performance much. Nevertheless, a virtual alternative that allows the same transformations without changing the input region will follow shortly. llvm-svn: 250652
*	Sort includes using 'clang-format -sort-includes'	Tobias Grosser	2015-10-15	1	-2/+2
\| \| \| \|	llvm-svn: 250392
*	RegisterPasses: Optionally run inliner before Polly	Tobias Grosser	2015-10-12	1	-0/+13
\| \| \| \| \| \| \| \|	This will allow us to optimize C++ template code with Polly. This support is mostly for debugging purpose and individual experiments. The ultimate goal is still to run Polly later in the pass manager when inlining already happened. llvm-svn: 250092
*	[NFC] Move helper functions to ScopHelper	Johannes Doerfert	2015-10-09	2	-2/+0
\| \| \| \| \| \| \| \|	Helper functions in the BlockGenerators.h/cpp introduce dependences from the frontend to the backend of Polly. As they are used in ScopDetection, ScopInfo, etc. we move them to the ScopHelper file. llvm-svn: 249919
*	[NFC] Consistenly use commented and annotated ScopPass functions	Johannes Doerfert	2015-09-27	2	-6/+10
\| \| \| \| \| \| \| \| \| \| \|	The changes affect methods that are part of the Pass interface and include: - Comments that describe the methods purpose. - A consistent use of the keywords override and virtual. Additionally, the printScop method is now optional and removed from SCoP passes that do not implement it. llvm-svn: 248685
*	[NFC] Use releaseMemory to release internal memory	Johannes Doerfert	2015-09-27	1	-4/+2
\| \| \| \|	llvm-svn: 248684
*	[PM] Update Polly for the new AA infrastructure landed in r247167.	Chandler Carruth	2015-09-09	1	-0/+8
\| \| \| \|	llvm-svn: 247198
*	Make our data-locality schedule tree transforms externally accessible	Tobias Grosser	2015-08-24	1	-141/+48
\| \| \| \| \| \| \| \|	Other passes which perform different optimizations might be interested in also applying data-locality transformations as part of their overall transformation. llvm-svn: 245824
*	Use marker nodes to annotate the different levels of tiling	Tobias Grosser	2015-08-23	1	-7/+26
\| \| \| \| \| \| \|	Currently, marker nodes are ignored during AST generation, but visible in the -debug-only=polly-ast output. llvm-svn: 245809
*	Do really not unroll the vector loop in combination with register tiling	Tobias Grosser	2015-08-20	1	-3/+2
\| \| \| \| \| \| \|	The previous commit lacked a test case for register tiling + pre-vectorization and we obviously got it immediately wrong. llvm-svn: 245599
*	Add experimental support for trivial register tiling	Tobias Grosser	2015-08-20	1	-0/+30
\| \| \| \| \| \| \| \| \| \|	Register tiling in Polly is for now just an additional level of tiling which is fully unrolled. It is disabled by default. To make this useful for more than experiments, we still need a cost function as well as possibly further optimizations that teach LLVM to actually put some of the values we got into scalar registers. llvm-svn: 245564
*	Add support for two-level tiling	Tobias Grosser	2015-08-20	1	-14/+34
\| \| \| \| \| \| \| \| \| \|	By default we only use one level of tiling for loops, but in general tiling for multiple levels is trivial for us. Hence, we add a set of options that allow people to play with a second level of tiling. If this is profitable for some cases we can work on heuristics that allow us to identify these cases and use two-level tiling for them. llvm-svn: 245563
*	Factor out check for tileable band node.	Tobias Grosser	2015-08-20	1	-7/+27
\| \| \| \|	llvm-svn: 245559
*	Introduce tileBand function to simplify code	Tobias Grosser	2015-08-20	1	-19/+31
\| \| \| \|	llvm-svn: 245558
*	Add some forgotten isl memory annotations	Tobias Grosser	2015-08-20	1	-2/+3
\| \| \| \|	llvm-svn: 245557
*	Make prevectorization width configurable	Tobias Grosser	2015-08-19	1	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Polly uses 'prevectorization' to enable outer loop vectorization. When vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop iterations which are than interchanged to the innermost level such that LLVM's inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this loop. The number of loop iterations to strip-mine is now configurable with the option -polly-prevect-width=<number-of-prevec-dims>. This is mostly a debugging option. We should probably add a heuristic that derives the number of prevectorization dimensions from the target data and the data types used. llvm-svn: 245424
*	Do not use negative option name	Tobias Grosser	2015-08-19	1	-9/+5
\| \| \| \| \| \|	Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling. llvm-svn: 245423
*	Simplify tiling code a bit	Tobias Grosser	2015-08-19	1	-19/+15
\| \| \| \| \| \| \|	We only need to allocate the tile size vector if we actually want to perform a tiling. llvm-svn: 245422
*	Drop dead and disable code from IndependentBlocks	Tobias Grosser	2015-08-18	1	-161/+0
\| \| \| \| \| \| \|	Since Polly has now support for the code generation of scalar and PHI dependences this code was unused and is now dropped. llvm-svn: 245284
*	Fix Polly after SCEV port to new pass manager	Tobias Grosser	2015-08-17	2	-6/+6
\| \| \| \| \| \|	This fixes compilation after LLVM commit r245193. llvm-svn: 245211
*	AST Generation Paper published in TOPLAS	Tobias Grosser	2015-08-15	1	-7/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The July issue of TOPLAS contains a 50 page discussion of the AST generation techniques used in Polly. This discussion gives not only an in-depth description of how we (re)generate an imperative AST from our polyhedral based mathematical program description, but also gives interesting insights about: - Schedule trees: A tree-based mathematical program description that enables us to perform loop transformations on an abstract level, while issues like the generation of the correct loop structure and loop bounds will be taken care of by our AST generator. - Polyhedral unrolling: We discuss techniques that allow the unrolling of non-trivial loops in the context of parameteric loop bounds, complex tile shapes and conditionally executed statements. Such unrolling support enables the generation of predicated code e.g. in the context of GPGPU computing. - Isolation for full/partial tile separation: We discuss native support for handling full/partial tile separation and -- in general -- native support for isolation of boundary cases to enable smooth code generation for core computations. - AST generation with modulo constraints: We discuss how modulo mappings are lowered to efficient C/LLVM code. - User-defined constraint sets for run-time checks We discuss how arbitrary sets of constraints can be used to automatically create run-time checks that ensure a set of constrainst actually hold. This feature is very useful to verify at run-time various assumptions that have been taken program optimization. Polyhedral AST generation is more than scanning polyhedra Tobias Grosser, Sven Verdoolaege, Albert Cohen ACM Transations on Programming Languages and Systems (TOPLAS), 37(4), July 2015 llvm-svn: 245157
*	Remove leftover comment	Michael Kruse	2015-08-10	1	-1/+0
\| \| \| \| \| \| \|	The function to which this commit applies has been removed in a previous commit. llvm-svn: 244450
*	[Polly] Remove dead code in IndependentBlocks	Michael Kruse	2015-08-08	1	-29/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The splitExitBlock function is never called. Going to replace its functionality in successive patches that do not modify the IR. Reviewers: grosser Subscribers: pollydev Projects: #polly Differential Revision: http://reviews.llvm.org/D11865 llvm-svn: 244404
*	Rewrite getPrevectorMap using schedule trees operations	Tobias Grosser	2015-07-28	1	-82/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Schedule trees are a lot easier to work with, for both humans and machines. For humans the more structured schedule representation is easier to reason about. Together with the more abstract isl programming interface this can result in a lot cleaner code (see this changeset). For machines, the structured schedule and the fact that we now use explicit piecewise affine expressions instead of integer maps makes it easier to generate code from this schedule tree. As a result, we can already see a slight compile-time improvement -- for 3mm from 0m0.593s to 0m0.551s seconds (-7 %). More importantly, future optimizations such as full-partial tile separation will most likely result in more streamlined code to be generated. Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243458
*	Simplify some isl expression we use	Tobias Grosser	2015-07-26	1	-3/+1
\| \| \| \| \|	Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org> llvm-svn: 243254
*	Prevectorize the schedule of the band (or the point loop in case of tiling)	Tobias Grosser	2015-07-25	1	-10/+12
\| \| \| \| \|	Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243214
*	Use schedule trees to represent execution order of statements	Tobias Grosser	2015-07-14	1	-31/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of flat schedules, we now use so-called schedule trees to represent the execution order of the statements in a SCoP. Schedule trees make it a lot easier to analyze, understand and modify properties of a schedule, as specific nodes in the tree can be choosen and possibly replaced. This patch does not yet fully move our DependenceInfo pass to schedule trees, as some additional performance analysis is needed here. (In general schedule trees should be faster in compile-time, as the more structured representation is generally easier to analyze and work with). We also can not yet perform the reduction analysis on schedule trees. For more information regarding schedule trees, please see Section 6 of https://lirias.kuleuven.be/handle/123456789/497238 llvm-svn: 242130
*	Remove code for scalar and PHI to array translation	Tobias Grosser	2015-06-26	2	-153/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This removes old code that has been disabled since several weeks and was hidden behind the flags -disable-polly-intra-scop-scalar-to-array=false and -polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and PHI nodes to single element arrays, as this avoided the need for their special handling in Polly. With Johannes' patches adding native support for such scalar references to Polly, this code is not needed any more. After this commit both -polly-prepare and -polly-independent are now mostly no-ops. Only a couple of simple transformations still remain, but they are scheduled for removal too. Thanks again to Johannes Doerfert for his nice work in making all this code obsolete. llvm-svn: 240766
*	Update ISL to isl-0.15-3-g532568a	Michael Kruse	2015-06-18	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This version adds small integer optimization, but is not active by default. It will be enabled in a later commit. The schedule-fuse=min/max option has been replaced by the serialize-sccs option. Adapting Polly was necessary, but retaining the name polly-opt-fusion=min/max. Differential Revision: http://reviews.llvm.org/D10505 Reviewers: grosser llvm-svn: 240027