bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Fix debug info metadata for upstream change in LLVM.	Adrian Prantl	2016-12-20	1	-1/+1
	llvm-svn: 290154
*	Revert "Fix debug info metadata for upstream change in LLVM."	Adrian Prantl	2016-12-16	1	-1/+1
	llvm-svn: 289983
*	Fix debug info metadata for upstream change in LLVM.	Adrian Prantl	2016-12-16	1	-1/+1
	llvm-svn: 289953
*	Restrict ranges of extension maps	Roman Gareev	2016-12-15	1	-0/+127
	To prevent copy statements from accessing arrays out of bounds, ranges of their extension maps are restricted, according to the constraints of domains. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D25655 llvm-svn: 289815
*	The order of the loops defines the data reused in the BLIS implementation of	Roman Gareev	2016-12-15	2	-125/+134
	gemm ([1]). In particular, elements of the matrix B, the second operand of matrix multiplication, are reused between iterations of the innermost loop. To keep the reused data in cache, only elements of matrix A, the first operand of matrix multiplication, should be evicted during an iteration of the innermost loop. To provide such a cache replacement policy, elements of the matrix A can, in particular, be loaded first and, consequently, be least-recently-used. In our case matrices are stored in row-major order instead of the column-major order used in the BLIS implementation ([1]). One way to address this is to change the order of the loops of the loop nest accordingly. However, this causes elements of the matrix A to be reused in the innermost loop and, consequently, requires loading elements of the matrix B first. Since the LLVM vectorizer always generates loads from the matrix A before loads from the matrix B, we cannot provide this. Consequently, we only change the BLIS micro kernel and the computation of its parameters instead. In particular, reused elements of the matrix B are successively multiplied by specific elements of the matrix A. Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D25653 llvm-svn: 289806
*	[ScopInfo] Fold constant coefficients in array dimensions to the right	Tobias Grosser	2016-12-02	2	-1/+94
	This allows us to delinearize code such as the one below, where the array sizes are A[][2 * n] as there are n times two elements in the innermost dimension. Alternatively, we could try to generate another dimension for the struct in the innermost dimension, but as the struct has constant size, recovering this dimension is easy.
	  struct com { double Real; double Img; };
	  void foo(long n, struct com A[][n]) {
	    for (long i = 0; i < 100; i++)
	      for (long j = 0; j < 1000; j++)
	        A[i][j].Real += A[i][j].Img;
	  }
	  int main() {
	    struct com A[100][1000];
	    foo(1000, A);
	llvm-svn: 288489
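The index arithmetic behind the A[][2 * n] view can be sketched in Python. This is only an illustration, not Polly code: it assumes `struct com` is laid out as two consecutive doubles (Real at offset 0, Img at offset 1) and that offsets are counted in doubles. The function names are hypothetical.

```python
def flat_offset(i, j, field, n):
    """Offset (in doubles) of A[i][j].<field> for 'struct com A[][n]'.

    field is 0 for Real and 1 for Img (assumed layout).
    """
    return (i * n + j) * 2 + field


def delinearized_offset(i, j, field, n):
    """The same access seen as a 2D array of doubles with inner size 2 * n."""
    row = i
    col = 2 * j + field  # the constant coefficient 2 is folded into the inner dimension
    return row * (2 * n) + col


def views_agree(n, rows):
    """Check that both views address the same element for every access."""
    return all(
        flat_offset(i, j, f, n) == delinearized_offset(i, j, f, n)
        for i in range(rows)
        for j in range(n)
        for f in (0, 1)
    )
```

Under these assumptions both views agree for every (i, j, field), which is why the innermost dimension can be treated as having 2 * n elements.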
*	[FIX] Do not try to hoist obviously overwritten loads	Johannes Doerfert	2016-12-01	1	-4/+5
	llvm-svn: 288328
*	[DeLICM] Add pass boilerplate code.	Michael Kruse	2016-11-29	1	-0/+36
	Add an empty DeLICM pass, without any functional parts. Extracting the boilerplate from the functional part reduces the size of the code to review (https://reviews.llvm.org/D24716) Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 288160
*	[ScopDetect] Expand statistics of the detected scops	Tobias Grosser	2016-11-26	5	-5/+5
	We now collect:
	  Number of total loops
	  Number of loops in scops
	  Number of scops
	  Number of scops with maximal loop depth 1
	  Number of scops with maximal loop depth 2
	  Number of scops with maximal loop depth 3
	  Number of scops with maximal loop depth 4
	  Number of scops with maximal loop depth 5
	  Number of scops with maximal loop depth 6 and larger
	  Number of loops in scops (profitable scops only)
	  Number of scops (profitable scops only)
	  Number of scops with maximal loop depth 1 (profitable scops only)
	  Number of scops with maximal loop depth 2 (profitable scops only)
	  Number of scops with maximal loop depth 3 (profitable scops only)
	  Number of scops with maximal loop depth 4 (profitable scops only)
	  Number of scops with maximal loop depth 5 (profitable scops only)
	  Number of scops with maximal loop depth 6 and larger (profitable scops only)
	These statistics are certainly not completely accurate, as we might drop scops when building up their polyhedral representation, but they should give a good indication of the number of scops we detect. llvm-svn: 287973
*	[ScopDetectionDiagnostic] Collect statistics for each diagnostic type	Tobias Grosser	2016-11-26	5	-5/+5
	Our original statistics were added before we introduced a more fine-grained diagnostic system, but the granularity of our statistics has never been increased accordingly. This change now introduces one statistic counter per diagnostic to enable us to collect fine-grained statistics about why certain scops are not detected. In case coarser grained statistics are needed, the user is expected to combine counters manually. llvm-svn: 287968
*	[CodeGen] Add flag to code-generate most memory access expressions	Tobias Grosser	2016-11-22	1	-0/+58
	Introduce the new flag -polly-codegen-generate-expressions which forces Polly to code generate AST expressions instead of using our SCEV based access expression generation even for cases where the original memory access relation was not changed and the SCEV based access expression could be code generated without any issue. This is an experimental option for better testing the isl ast expression generation. The default behavior of Polly remains unchanged. We also exclude a couple of cases for which the AST expression is not yet working. llvm-svn: 287694
*	[test] Simplify test case by removing unreferenced instructions [NFC]	Tobias Grosser	2016-11-22	1	-4/+0
	Drop instructions that do not influence the memory impact of a basic block. They are not needed to reproduce the original bug (verified) and would cause random test noise if we decided to only model the instructions that have visible side-effects. llvm-svn: 287626
*	[test] Ensure important basic blocks in test case have side effects	Tobias Grosser	2016-11-22	1	-4/+36
	Add two store instructions at the end of basic blocks that are required to reproduce the original bug to ensure we always process and model these basic blocks. This makes this test case stable even if we decide to bail out early of basic blocks which do not modify the global state. Also add additional check lines to verify how we model the basic block. llvm-svn: 287625
*	test: add more details to non-affine test case	Tobias Grosser	2016-11-22	1	-6/+35
	We add CHECK lines to this test case to make it easier to see the difference between affine and non-affine memory accesses. We also change the test case to use a parametric index expression as otherwise our range analysis will understand that the non-affine memory access can only access input[1], which makes it difficult to see that the memory access is in fact modeled as a non-affine access. llvm-svn: 287623
*	Probably overwritten loads should not be considered hoistable	Johannes Doerfert	2016-11-17	1	-1/+1
	Do not assume a load to be hoistable/invariant if the pointer is used by another instruction in the SCoP that might write to memory and that is always executed. llvm-svn: 287272
*	[tests] Adjust test output to recently changed SCEV canonicalization [NFC]	Tobias Grosser	2016-11-13	1	-1/+1
	LLVM recently changed the SCEV canonicalization which changed the output of one of our GPGPU test cases. llvm-svn: 286770
*	[ScopDetect] Evaluate and verify branches at branch condition, not icmp	Tobias Grosser	2016-11-13	3	-0/+187
	The validity of a branch condition must be verified at the location of the branch (the branch instruction), not the location of the icmp that is used in the branch instruction. When verifying at the wrong location, we may accept an icmp that is defined within a loop which itself dominates, but does not contain the branch instruction. Such loops cannot be modeled as we only introduce domain dimensions for surrounding loops. To address this problem we change the scop detection to evaluate and verify SCEV expressions at the right location. This issue has been around since at least r179148 "scop detection: properly instantiate SCEVs to the place where they are used", where we explicitly set the scope to the wrong location. Before this commit the scope was not explicitly set, which probably also resulted in the scope around the ICmp being chosen. This resolves http://llvm.org/PR30989 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286769
*	SCEVAffinator: pass parameter-only set to addRestriction if BB=nullptr	Tobias Grosser	2016-11-10	1	-0/+26
	Assumptions can be added for a given basic block, in which case the set describing the assumptions is expected to match the dimensions of its domain. In case no basic block is provided, a parameter-only set is expected to describe the assumption. The piecewise expressions that are generated by the SCEVAffinator sometimes have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }), which looks similar to a parameter-only domain, but is still a set domain. This change adds an assert that checks that we always pass parameter domains to addAssumptions if BB is empty to make mismatches here fail early. We also change visitTruncExpr to always convert to parameter sets, if BB is null. This change resolves http://llvm.org/PR30941 Another alternative to this change would have been to inspect all code to make sure we directly generate in the SCEV affinator parameter sets in case of empty domains. However, this would likely complicate the code which combines parameter and non-parameter domains when constructing a statement domain. We might still consider doing this at some point, but as this likely requires several non-local changes this should probably be done as a separate refactoring. Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286444
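The example set { [] : p <= -129 or p >= 128 } has the shape one would expect for a truncation to a signed 8-bit value. The commit message does not state the bit width, so this is an assumption. A small Python sketch, with hypothetical helper names, checks that the set describes exactly those parameter values that such a truncation would change:

```python
def trunc_i8(p):
    """Truncate an arbitrary-precision integer to a signed 8-bit value."""
    p &= 0xFF
    return p - 256 if p >= 128 else p


def in_example_set(p):
    """Membership test for { [] : p <= -129 or p >= 128 }."""
    return p <= -129 or p >= 128


def set_matches_truncation(lo, hi):
    """True if, for every p in [lo, hi), p is in the set exactly when truncation alters it."""
    return all((trunc_i8(p) != p) == in_example_set(p) for p in range(lo, hi))
```

The condition mentions only the parameter p and no loop dimension, which is why it can equally well be expressed as a parameter-only set.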
*	IslAst: always use the context during ast generation	Tobias Grosser	2016-11-10	4	-18/+56
	Providing the context to the ast generator allows for additional simplifications and -- more importantly -- allows generating loops with only partially bounded domains, assuming the domains are bounded for all parameter configurations that are valid as defined by the context. This change fixes the crash reported in http://llvm.org/PR30956 The original reason why we did not include the context when generating an AST was that CLooG and later isl used to sometimes transfer some of the constraints that bound the size of parameters from the context into the generated AST. This resulted in operations with very large constants, which sometimes introduced problematic integer overflows. The latest versions of the isl AST generator are careful to not introduce such constants. Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286442
*	SCEVValidator: add new parameters resulting from constant extraction	Tobias Grosser	2016-11-10	1	-0/+41
	When extracting constant expressions out of SCEVs, new parameters may be introduced, which have not been registered before. This change scans SCEV expressions after constant extraction again to make sure newly introduced parameters are registered. We may for example extract the constant '8' from the expression '((8 * ((%a * %b) + %c)) + (-8 * %a))' and obtain the expression '(((-1 + %b) * %a) + %c)'. The new expression has a new parameter '((-1 + %b) * %a)', which was not registered before, but must be registered to not crash. This closes http://llvm.org/PR30953 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286430
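The extraction in the example can be checked numerically. The Python sketch below is not part of Polly; it treats %a, %b and %c as plain integers and confirms that the original expression equals 8 times the extracted one:

```python
def original(a, b, c):
    """((8 * ((%a * %b) + %c)) + (-8 * %a))"""
    return (8 * ((a * b) + c)) + (-8 * a)


def extracted(a, b, c):
    """(((-1 + %b) * %a) + %c), the expression left after factoring out 8."""
    return ((-1 + b) * a) + c


def extraction_holds(values):
    """True if original == 8 * extracted for every combination of the given values."""
    return all(
        original(a, b, c) == 8 * extracted(a, b, c)
        for a in values
        for b in values
        for c in values
    )
```

The product (-1 + %b) * %a appears only in the extracted form, which matches the commit's point that a new parameter shows up after extraction.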
*	Do not allow switch statements in loop latches	Tobias Grosser	2016-11-10	1	-0/+21
	In r248701 "Allow switch instructions in SCoPs" support for switch statements has been introduced, but support for switch statements in loop latches was incomplete. This change completely disables switch statements in loop latches. The original commit changed addLoopBoundsToHeaderDomain to support non-branch terminator instructions, but this change was incorrect: it added a check for BI != null to the if-branch of a condition, but BI was used in the else branch as well. As a result, when a non-branch terminator instruction is encountered, a nullptr dereference is triggered. Due to missing test coverage, this bug was overlooked. r249273 "[FIX] Approximate non-affine loops correctly" added code to disallow switch statements for non-affine loops, if they appear in either a loop latch or a loop exit. We adapt this code to now prohibit switch statements in loop latches even if the control condition is affine. We could possibly add support for switch statements in loop latches, but such support should be evaluated and tested separately. This fixes llvm.org/PR30952 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286426
*	[ScopInfo] Make memset etc. affine where possible.	Eli Friedman	2016-11-01	1	-13/+61
	We don't actually check whether a MemoryAccess is affine in very many places, but one important one is in checks for aliasing. Differential Revision: https://reviews.llvm.org/D25706 llvm-svn: 285746
*	Add missing test from r284848.	Eli Friedman	2016-11-01	1	-0/+49
	Original commit title: [SCEVAffinator] Make precise modular math more correct. llvm-svn: 285745
*	[ScopInfo] Fix: use raw source pointer.	Michael Kruse	2016-10-25	1	-0/+61
	When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw source and target pointers which preserve bitcasts. MemAccInst::getPointerOperand() also returns the raw target pointers, but Scop::buildAliasGroups() did not for the source pointer. This led to mismatches between AliasSetTracker and ScopInfo on which pointer to use. Fixed by also using raw pointers in Scop::buildAliasGroups(). llvm-svn: 285071
*	[SCEVAffinator] Make precise modular math more correct.	Eli Friedman	2016-10-21	5	-15/+57
	Integer math in LLVM IR is modular. Integer math in isl is arbitrary-precision. Modeling LLVM IR math correctly in isl requires either adding assumptions that math doesn't actually overflow, or explicitly wrapping the math. However, expressions with the "nsw" flag are special; we can pretend they're arbitrary-precision because it's undefined behavior if the result wraps. SCEV expressions based on IR instructions with an nsw flag also carry an nsw flag (roughly; actually, the real rule is a bit more complicated, but the details don't matter here). Before this patch, SCEV flags were also overloaded with an additional function: the ZExt code was mutating SCEV expressions as a hack to indicate to checkForWrapping that we don't need to add assumptions to the operand of a ZExt; it'll add explicit wrapping itself. This kind of works... the problem is that if anything else ever touches that SCEV expression, it'll get confused by the incorrect flags. Instead, with this patch, we make the decision about whether to explicitly wrap the math a bit earlier, basing the decision purely on the SCEV expression itself, and not its users. Differential Revision: https://reviews.llvm.org/D25287 llvm-svn: 284848
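The two modeling options named above, adding a no-overflow assumption or wrapping explicitly, can be illustrated for a single 8-bit addition. This Python sketch uses hypothetical names and an assumed bit width; it does not reflect how SCEVAffinator is implemented.

```python
BITS = 8  # assumed width for the illustration


def wrap_signed(value, bits=BITS):
    """Explicit wrapping: reduce an arbitrary-precision value to a signed 'bits'-wide one."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def modular_add(x, y):
    """What a fixed-width add without nsw computes."""
    return wrap_signed(x + y)


def no_overflow_assumption(x, y, bits=BITS):
    """Assumption under which the arbitrary-precision sum x + y is already exact."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return lo <= x + y <= hi
```

When `no_overflow_assumption` holds, the plain sum and `modular_add` agree, so no wrapping is needed. For an add carrying nsw, a wrapping result would be undefined behavior, so the model may treat the assumption as given.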
*	[test] Fix buildbot after SCEV change.	Michael Kruse	2016-10-18	1	-1/+1
	Update test after commit r284501: [SCEV] Make CompareValueComplexity a little bit smarter Contributed-by: Sanjoy Das <sanjoy@playingwithpointers.com> llvm-svn: 284543
*	Handle multi-dimensional invariant load.	Eli Friedman	2016-10-17	1	-0/+54
	If the address of a load depends on another load, make sure to emit the loads in the right order. llvm-svn: 284426
*	[ScopDetect] Depend transitively on ScalarEvolution.	Michael Kruse	2016-10-17	1	-0/+47
	ScopDetection might be queried by -dot-scops or -view-scops passes for which it accesses ScalarEvolution. llvm-svn: 284385
*	[test] Add missing colon.	Michael Kruse	2016-10-16	1	-1/+1
	llvm-svn: 284349
*	[cmake] Add polly-isl-test dependency to lit tests.	Michael Kruse	2016-10-16	1	-1/+1
	Also handle the in-llvm-tree case forgotten in r284339. llvm-svn: 284347
*	[cmake] Add polly-isl-test dependency to lit tests.	Michael Kruse	2016-10-16	1	-1/+1
	lit recursively iterates through the test subdirectories and finds the ISL unittest. For this test to work, the polly-isl-test executable needs to be compiled. Add the polly-isl-test dependency to POLLY_TEST_DEPS. This makes check-polly and check-polly-tests work from a fresh build directory. llvm-svn: 284339
*	[test] Add -polly-unprofitable-scalar-accs to test that needs it.	Michael Kruse	2016-10-16	1	-0/+1
	The test non_affine_loop_used_later.ll also tests the profitability heuristic. Add the option -polly-unprofitable-scalar-accs explicitly to ensure that the test succeeds if the default value is changed. llvm-svn: 284338
*	[ScopInfo/CodeGen] ExitPHI reads are implicit.	Michael Kruse	2016-10-12	2	-2/+60
	Under some conditions MK_Value read accesses were converted to MK_ExitPHI read accesses. This is unexpected because MK_ExitPHI read accesses are implicit after the scop execution. This behaviour was introduced in r265261, which fixed a failed assertion/crash in CodeGen. Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(), despite its name, also handles accesses of kind MK_Value, only to skip them because they access values that are usually not PHI nodes in the SCoP region's exit block, except in the situation observed in r265261. Do not convert value accesses to ExitPHI accesses and do not handle value accesses like ExitPHI accesses in CodeGen anymore. llvm-svn: 284023
*	[cmake] Move isl_test artifacts to Polly folder.	Michael Kruse	2016-10-07	1	-0/+1
	Folders in Visual Studio solutions help organize the build artifacts from all LLVM projects. There is a folder to keep Polly-built files in. llvm-svn: 283546
*	Build and run isl_test as part of check-polly	Tobias Grosser	2016-10-04	4	-0/+79
	Running isl tests is important to gain confidence that the isl build we created works as expected. Besides the actual isl tests, there are also isl AST generation tests shipped with isl. This change only adds support for the isl unit tests. AST generation test support is left for a later commit. There is a choice to run tests directly through the build system or in the context of lit. We choose to run tests as part of lit, as this allows us to easily set environment variables, print output only on error and generally run the tests directly from the lit command. Reviewers: brad.king, Meinersbur Subscribers: modocache, brad.king, pollydev, beanz, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D25155 llvm-svn: 283245
*	[ScopInfo] Add -polly-unprofitable-scalar-accs option.	Michael Kruse	2016-10-04	1	-0/+97
	With this option one can disable the heuristic that assumes that statements with a scalar write access cannot be profitably optimized. Instances of such a statement necessarily have WAW dependences on each other. With DeLICM, scalar accesses can be changed to array accesses, which can avoid these WAW dependences. llvm-svn: 283233
*	[ScopInfo] Scalar accesses do not have indirect base pointers.	Michael Kruse	2016-10-04	2	-4/+4
	ScopArrayInfo used to determine base pointer origins by looking up whether the base pointer is a load. The "base pointer" for scalar accesses is the llvm::Value being accessed. This is only a symbolic base pointer; it represents the alloca variable (.s2a or .phiops) generated for it at code generation. This patch disables determining base pointer origin for scalars. A test case where this caused a crash will be added in the next commit. In that test SAI tried to get the origin base pointer that was only declared later and therefore did not exist yet. This is probably only possible for scalars used in PHINode incoming blocks. llvm-svn: 283232
*	[ScopDetection] Remove redundant checks for endless loops	Tobias Grosser	2016-09-20	1	-15/+4
	Summary: Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks to reject endless loops which are now removed and replaced by a single check in `isValidLoop()`. For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is renamed to `ReportLoopHasNoExit`. The test case `ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well. The schedule generation in `buildSchedule()` is based on the following assumption: Given some block B that is contained in a loop L and a SESE region R, we assume that L is contained in R or the other way around. However, this assumption is broken in the presence of endless loops that are nested inside other loops. Therefore, in order to prevent erroneous behavior in `buildSchedule()`, r265280 introduced a corresponding check in `canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible to bypass this check with -polly-allow-nonaffine-loops which was fixed by adding another check to reject endless loops in `allowOverApproximatedRegion()` in r273905. Hence there existed two separate locations that handled this case. Thank you Johannes Doerfert for helping to provide the above background information. Reviewers: Meinersbur, grosser Subscribers: _jdoerfert, pollydev Differential Revision: https://reviews.llvm.org/D24560 Contributed-by: Matthias Reisinger <d412vv1n@gmail.com> llvm-svn: 281987
*	Fix spelling in CMakeLists	Tobias Grosser	2016-09-19	1	-1/+1
	llvm-svn: 281897
*	GPGPU: add missing REQUIRES line to test case	Tobias Grosser	2016-09-18	1	-2/+3
	llvm-svn: 281850
*	GPGPU: Do not run mostly sequential kernels in GPU	Tobias Grosser	2016-09-18	1	-0/+112
	In case sequential kernels are found deeper in the loop tree than any parallel kernel, the overall scop is probably mostly sequential. Hence, run it on the CPU. llvm-svn: 281849
*	GPGPU: Dynamically ensure 'sufficient compute'	Tobias Grosser	2016-09-18	1	-2/+22
	Offloading to a GPU is only beneficial if there is a sufficient amount of compute that can be accelerated. Many kernels have only a very small amount of dynamic compute, which means GPU acceleration is not beneficial. We compute at run-time an approximation of how many dynamic instructions will be executed and fall back to CPU code in case this number is not sufficiently large. To keep the run-time checking code simple, we over-approximate the number of instructions executed in each statement by computing the volume of the rectangular hull of its iteration space. llvm-svn: 281848
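The rectangular-hull over-approximation can be sketched in Python on explicitly enumerated iteration points. This is an illustration only: Polly derives the bound symbolically and emits a run-time check, whereas here the points are listed explicitly. The function names and the `min_compute` threshold are hypothetical.

```python
def rectangular_hull_volume(points):
    """Volume of the smallest axis-aligned box containing all iteration points."""
    points = list(points)
    if not points:
        return 0
    volume = 1
    for coords in zip(*points):  # one tuple of coordinates per dimension
        volume *= max(coords) - min(coords) + 1
    return volume


def has_sufficient_compute(statement_domains, min_compute):
    """Offload only if the over-approximated instance count reaches the threshold."""
    total = sum(rectangular_hull_volume(domain) for domain in statement_domains)
    return total >= min_compute


# A triangular iteration space: 10 real instances, but a 4 x 4 hull.
triangle = [(i, j) for i in range(4) for j in range(i + 1)]
```

For the triangular domain the hull counts 16 instances instead of 10. The estimate therefore errs on the side of offloading, in exchange for a check that needs only the per-dimension bounds.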
*	GPGPU: Make test cases independent of register numbering [NFC]	Tobias Grosser	2016-09-18	3	-13/+13
	llvm-svn: 281847
*	GPGPU: Store back non-read-only scalars	Tobias Grosser	2016-09-17	1	-0/+176
	We may generate GPU kernels that store into scalars in case we run some sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important to not keep the scalar values in thread-local registers, but to store them back to the corresponding device memory objects that back them. We currently only store scalars back at the end of a kernel. This is only correct if precisely one thread is executed. In case more than one thread may be run, we currently invalidate the scop. To support such cases correctly, we would need to always load and store back from a corresponding global memory slot instead of a thread-local alloca slot. llvm-svn: 281838
*	GPGPU: Detect read-only scalar arrays ...	Tobias Grosser	2016-09-17	6	-81/+60
	and pass these by value rather than by reference. llvm-svn: 281837
*	GPGPU: Do not assume arrays start at 0	Tobias Grosser	2016-09-15	2	-6/+130
	Our alias checks precisely check that the minimal and maximal accessed elements do not overlap in a kernel. Hence, we must ensure that our host <-> device transfers do not touch additional memory locations that are not covered in the alias check. To ensure this, we make sure that the data we copy for a given array is only the data from the smallest element accessed to the largest element accessed. We also adjust the size of the array according to the offset at which the array is actually accessed. An interesting result of this is: In case arrays are accessed with negative subscripts, e.g., A[-100], we automatically allocate and transfer _more_ data to cover the full array. This is important as such code indeed exists in the wild. llvm-svn: 281611
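A minimal Python sketch of the transfer-range computation described above, assuming a one-dimensional array and a known set of accessed subscripts. The function name and the byte-based return value are hypothetical and do not mirror Polly's GPU code generation.

```python
def transfer_range(accessed_subscripts, element_size):
    """Return (offset_bytes, size_bytes) covering only the accessed elements.

    The range spans from the smallest to the largest accessed subscript,
    so nothing outside the region covered by the alias check is copied.
    """
    lowest = min(accessed_subscripts)
    highest = max(accessed_subscripts)
    offset_bytes = lowest * element_size
    size_bytes = (highest - lowest + 1) * element_size
    return offset_bytes, size_bytes
```

With subscripts between 10 and 20 the copy starts 10 elements into the array. With a negative subscript such as -100 the offset becomes negative and the copied region grows, which corresponds to the "more data" case in the commit message.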
*	Perform copying to created arrays according to the packing transformation	Roman Gareev	2016-09-14	1	-9/+32
	This is the fourth patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform copying to created arrays, which is the last step to implement the packing transformation. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D23260 llvm-svn: 281441
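The micro-kernel structure, a loop around rank-1 updates, can be sketched in plain Python. This is a simplified illustration and not the BLIS or Polly implementation: it omits the packing routines and collapses the outer loop nest into three blocking loops. The block sizes `mr`, `nr` and `kc` are arbitrary assumptions.

```python
def micro_kernel(A, B, C, i0, j0, k0, mr, nr, kc):
    """Loop over p; each iteration is a rank-1 (outer product) update of an mr x nr tile of C."""
    for p in range(k0, k0 + kc):
        for i in range(i0, i0 + mr):
            for j in range(j0, j0 + nr):
                C[i][j] += A[i][p] * B[p][j]


def gemm_blocked(A, B, mr=2, nr=2, kc=2):
    """Blocked matrix multiplication C = A * B built from micro-kernel calls."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for k0 in range(0, k, kc):
        for i0 in range(0, m, mr):
            for j0 in range(0, n, nr):
                # Clamp tile sizes at the matrix borders.
                micro_kernel(A, B, C, i0, j0, k0,
                             min(mr, m - i0), min(nr, n - j0), min(kc, k - k0))
    return C
```

In the real scheme the tiles of A and B read by the micro-kernel come from packed copies, and creating those copies is the step this patch adds.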
*	GPGPU: Allow region statements	Tobias Grosser	2016-09-13	1	-0/+86
	llvm-svn: 281305
*	GPGPU: Extend types when array sizes have smaller types	Tobias Grosser	2016-09-13	1	-0/+65
	This prevents a compiler crash. llvm-svn: 281303
*	Adapt test case to recent change in Global Variable Definition	Tobias Grosser	2016-09-13	1	-1/+1
	llvm-svn: 281295