diff options
author | Roman Gareev <gareevroman@gmail.com> | 2016-12-15 11:47:38 +0000 |
---|---|---|
committer | Roman Gareev <gareevroman@gmail.com> | 2016-12-15 11:47:38 +0000 |
commit | 8babe1a21609933d149a61bd5ff43c26251d8590 (patch) | |
tree | e322993fbbed4670a26ce7e127f2eccd598dfabf | |
parent | 552c8e960e9e88d3c96a467df2e37efa27eff7d2 (diff) | |
download | bcm5719-llvm-8babe1a21609933d149a61bd5ff43c26251d8590.tar.gz bcm5719-llvm-8babe1a21609933d149a61bd5ff43c26251d8590.zip |
The order of the loops defines the data reused in the BLIS implementation of
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.
In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to accordingly change the order of the loops of the loop nest. However, it
makes elements of the matrix A to be reused in the innermost loop and,
consequently, requires to load elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B and we can not provide it. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A .
Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D25653
llvm-svn: 289806
-rw-r--r-- | polly/include/polly/ScheduleOptimizer.h | 25 | ||||
-rw-r--r-- | polly/lib/Transform/ScheduleOptimizer.cpp | 60 | ||||
-rw-r--r-- | polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll | 50 | ||||
-rw-r--r-- | polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll | 209 |
4 files changed, 184 insertions, 160 deletions
diff --git a/polly/include/polly/ScheduleOptimizer.h b/polly/include/polly/ScheduleOptimizer.h index 9ce0739f010..2307288590a 100644 --- a/polly/include/polly/ScheduleOptimizer.h +++ b/polly/include/polly/ScheduleOptimizer.h @@ -134,16 +134,29 @@ private: /// a micro-kernel. The micro-kernel is a loop around a rank-1 /// (i.e., outer product) update. /// - /// For a detailed description please see: - /// Analytical Modeling is Enough for High Performance BLIS + /// For a detailed description please see [1]. + /// + /// The order of the loops defines the data reused in the BLIS implementation + /// of gemm ([1]). In particular, elements of the matrix B, the second + /// operand of matrix multiplication, are reused between iterations of the + /// innermost loop. To keep the reused data in cache, only elements of matrix + /// A, the first operand of matrix multiplication, should be evicted during + /// an iteration of the innermost loop. To provide such a cache replacement + /// policy, elements of the matrix A can, in particular, be loaded first and, + /// consequently, be least-recently-used. + /// + /// In our case matrices are stored in row-major order instead of + /// column-major order used in the BLIS implementation ([1]). It affects only + /// on the form of the BLIS micro kernel and the computation of its + /// parameters. In particular, reused elements of the matrix B are + /// successively multiplied by specific elements of the matrix A. + /// + /// Refs.: + /// [1] - Analytical Modeling is Enough for High Performance BLIS /// Tze Meng Low, Francisco D Igual, Tyler M Smith, Enrique S Quintana-Orti /// Technical Report, 2014 /// http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf /// - /// In our case matrices are stored in row-major order, which is taken into - /// account during the creation of the BLIS kernels and the computation - /// of their parameters. - /// /// @see ScheduleTreeOptimizer::createMicroKernel /// @see ScheduleTreeOptimizer::createMacroKernel /// @see getMicroKernelParams diff --git a/polly/lib/Transform/ScheduleOptimizer.cpp b/polly/lib/Transform/ScheduleOptimizer.cpp index 25ab166f4aa..b5145e8cb4b 100644 --- a/polly/lib/Transform/ScheduleOptimizer.cpp +++ b/polly/lib/Transform/ScheduleOptimizer.cpp @@ -538,8 +538,10 @@ permuteBandNodeDimensions(__isl_take isl_schedule_node *Node, unsigned FirstDim, __isl_give isl_schedule_node *ScheduleTreeOptimizer::createMicroKernel( __isl_take isl_schedule_node *Node, MicroKernelParamsTy MicroKernelParams) { - return applyRegisterTiling(Node, {MicroKernelParams.Mr, MicroKernelParams.Nr}, - 1); + applyRegisterTiling(Node, {MicroKernelParams.Mr, MicroKernelParams.Nr}, 1); + Node = isl_schedule_node_parent(isl_schedule_node_parent(Node)); + Node = permuteBandNodeDimensions(Node, 0, 1); + return isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0); } __isl_give isl_schedule_node *ScheduleTreeOptimizer::createMacroKernel( @@ -553,6 +555,7 @@ __isl_give isl_schedule_node *ScheduleTreeOptimizer::createMacroKernel( {MacroKernelParams.Mc, MacroKernelParams.Nc, MacroKernelParams.Kc}, 1); Node = isl_schedule_node_parent(isl_schedule_node_parent(Node)); Node = permuteBandNodeDimensions(Node, 1, 2); + Node = permuteBandNodeDimensions(Node, 0, 2); return isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0); } @@ -609,18 +612,15 @@ getMacroKernelParams(const MicroKernelParamsTy &MicroKernelParams) { return {1, 1, 1}; int Cbr = floor( (CacheLevelAssociativity[0] - 1) / - (1 + static_cast<double>(MicroKernelParams.Mr) / MicroKernelParams.Nr)); + (1 + static_cast<double>(MicroKernelParams.Nr) / MicroKernelParams.Mr)); int Kc = (Cbr * CacheLevelSizes[0]) / - (MicroKernelParams.Nr * CacheLevelAssociativity[0] * 8); - double Cac = static_cast<double>(MicroKernelParams.Mr * Kc * 8 * - CacheLevelAssociativity[1]) / + (MicroKernelParams.Mr * CacheLevelAssociativity[0] * 8); + double Cac = static_cast<double>(Kc * 8 * CacheLevelAssociativity[1]) / CacheLevelSizes[1]; - double Cbc = static_cast<double>(MicroKernelParams.Nr * Kc * 8 * - CacheLevelAssociativity[1]) / + double Cbc = static_cast<double>(Kc * 8 * CacheLevelAssociativity[1]) / CacheLevelSizes[1]; - int Mc = floor(MicroKernelParams.Mr / Cac); - int Nc = - floor((MicroKernelParams.Nr * (CacheLevelAssociativity[1] - 2)) / Cbc); + int Mc = floor((CacheLevelAssociativity[1] - 2) / Cac); + int Nc = floor(1 / Cbc); return {Mc, Nc, Kc}; } @@ -867,36 +867,38 @@ static __isl_give isl_schedule_node *optimizeDataLayoutMatrMulPattern( Node = isl_schedule_node_parent(Node); Node = isl_schedule_node_child(isl_schedule_node_band_split(Node, 2), 0); auto *AccRel = - getMatMulAccRel(isl_map_copy(MapOldIndVar), MacroParams.Kc, 3, 6); - unsigned FirstDimSize = MacroParams.Mc * MacroParams.Kc / MicroParams.Mr; - unsigned SecondDimSize = MicroParams.Mr; + getMatMulAccRel(isl_map_copy(MapOldIndVar), MacroParams.Kc, 3, 7); + unsigned FirstDimSize = MacroParams.Nc * MacroParams.Kc / MicroParams.Nr; + unsigned SecondDimSize = MicroParams.Nr; auto *SAI = Stmt->getParent()->createScopArrayInfo( - MemAccessA->getElementType(), "Packed_A", {FirstDimSize, SecondDimSize}); + MemAccessB->getElementType(), "Packed_B", {FirstDimSize, SecondDimSize}); AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId()); - auto *OldAcc = MemAccessA->getAccessRelation(); - MemAccessA->setNewAccessRelation(AccRel); + auto *OldAcc = MemAccessB->getAccessRelation(); + MemAccessB->setNewAccessRelation(AccRel); auto *ExtMap = - getMatMulExt(Stmt->getIslCtx(), MacroParams.Mc, 0, MacroParams.Kc); - ExtMap = isl_map_project_out(ExtMap, isl_dim_in, 1, 1); + getMatMulExt(Stmt->getIslCtx(), 0, MacroParams.Nc, MacroParams.Kc); + isl_map_move_dims(ExtMap, isl_dim_out, 0, isl_dim_in, 0, 1); + isl_map_move_dims(ExtMap, isl_dim_in, 2, isl_dim_out, 0, 1); + ExtMap = isl_map_project_out(ExtMap, isl_dim_in, 2, 1); auto *Domain = Stmt->getDomain(); auto *NewStmt = Stmt->getParent()->addScopStmt( - OldAcc, MemAccessA->getAccessRelation(), isl_set_copy(Domain)); + OldAcc, MemAccessB->getAccessRelation(), isl_set_copy(Domain)); ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId()); Node = createExtensionNode(Node, ExtMap); Node = isl_schedule_node_child(Node, 0); - AccRel = getMatMulAccRel(MapOldIndVar, MacroParams.Kc, 4, 7); - FirstDimSize = MacroParams.Nc * MacroParams.Kc / MicroParams.Nr; - SecondDimSize = MicroParams.Nr; + AccRel = getMatMulAccRel(MapOldIndVar, MacroParams.Kc, 4, 6); + FirstDimSize = MacroParams.Mc * MacroParams.Kc / MicroParams.Mr; + SecondDimSize = MicroParams.Mr; SAI = Stmt->getParent()->createScopArrayInfo( - MemAccessB->getElementType(), "Packed_B", {FirstDimSize, SecondDimSize}); + MemAccessA->getElementType(), "Packed_A", {FirstDimSize, SecondDimSize}); AccRel = isl_map_set_tuple_id(AccRel, isl_dim_out, SAI->getBasePtrId()); - OldAcc = MemAccessB->getAccessRelation(); - MemAccessB->setNewAccessRelation(AccRel); - ExtMap = getMatMulExt(Stmt->getIslCtx(), 0, MacroParams.Nc, MacroParams.Kc); - isl_map_move_dims(ExtMap, isl_dim_out, 0, isl_dim_in, 1, 1); + OldAcc = MemAccessA->getAccessRelation(); + MemAccessA->setNewAccessRelation(AccRel); + ExtMap = getMatMulExt(Stmt->getIslCtx(), MacroParams.Mc, 0, MacroParams.Kc); + isl_map_move_dims(ExtMap, isl_dim_out, 0, isl_dim_in, 0, 1); isl_map_move_dims(ExtMap, isl_dim_in, 2, isl_dim_out, 0, 1); NewStmt = Stmt->getParent()->addScopStmt( - OldAcc, MemAccessB->getAccessRelation(), Domain); + OldAcc, MemAccessA->getAccessRelation(), Domain); ExtMap = isl_map_set_tuple_id(ExtMap, isl_dim_out, NewStmt->getDomainId()); Node = createExtensionNode(Node, ExtMap); Node = isl_schedule_node_child(isl_schedule_node_child(Node, 0), 0); diff --git a/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll b/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll index aa68d298c15..069b9bffc71 100644 --- a/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll +++ b/polly/test/ScheduleOptimizer/mat_mul_pattern_data_layout.ll @@ -9,37 +9,37 @@ ; C[i][j] += alpha * A[i][k] * B[k][j]; ; } ; -; CHECK: double Packed_A[ { [] -> [(1024)] } ][ { [] -> [(4)] } ]; // Element size 8 -; CHECK: double Packed_B[ { [] -> [(3072)] } ][ { [] -> [(8)] } ]; // Element size 8 +; CHECK: double Packed_B[ { [] -> [(512)] } ][ { [] -> [(8)] } ]; // Element size 8 +; CHECK-NEXT: double Packed_A[ { [] -> [(6144)] } ][ { [] -> [(4)] } ]; // Element size 8 ; ; CHECK: { Stmt_Copy_0[i0, i1, i2] -> MemRef_arg6[i0, i2] }; -; CHECK: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 16*floor((i0)/16) <= 4*floor((o0)/256) <= i0 - 16*floor((i0)/16) }; +; CHECK-NEXT: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 96*floor((i0)/96) <= 4*floor((o0)/256) <= i0 - 96*floor((i0)/96) }; ; ; CHECK: { Stmt_Copy_0[i0, i1, i2] -> MemRef_arg7[i2, i1] }; -; CHECK: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 96*floor((i1)/96) <= 8*floor((o0)/256) <= i1 - 96*floor((i1)/96) }; +; CHECK-NEXT: new: { Stmt_Copy_0[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 16*floor((i1)/16) <= 8*floor((o0)/256) <= i1 - 16*floor((i1)/16) }; ; ; CHECK: CopyStmt_0 -; CHECK: Domain := -; CHECK: { CopyStmt_0[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 }; -; CHECK: Schedule := -; CHECK: ; -; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] -; CHECK: null; -; CHECK: new: { CopyStmt_0[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 16*floor((i0)/16) <= 4*floor((o0)/256) <= i0 - 16*floor((i0)/16) }; -; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0] -; CHECK: null; -; CHECK: new: { CopyStmt_0[i0, i1, i2] -> MemRef_arg6[i0, i2] }; -; CHECK: CopyStmt_1 -; CHECK: Domain := -; CHECK: { CopyStmt_1[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 }; -; CHECK: Schedule := -; CHECK: ; -; CHECK: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] -; CHECK: null; -; CHECK: new: { CopyStmt_1[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 96*floor((i1)/96) <= 8*floor((o0)/256) <= i1 - 96*floor((i1)/96) }; -; CHECK: ReadAccess := [Reduction Type: NONE] [Scalar: 0] -; CHECK: null; -; CHECK: new: { CopyStmt_1[i0, i1, i2] -> MemRef_arg7[i2, i1] }; +; CHECK-NEXT: Domain := +; CHECK-NEXT: { CopyStmt_0[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 }; +; CHECK-NEXT: Schedule := +; CHECK-NEXT: ; +; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] +; CHECK-NEXT: null; +; CHECK-NEXT: new: { CopyStmt_0[i0, i1, i2] -> Packed_B[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 8*floor((-i1 + o1)/8) = -i1 + o1 and 0 <= o1 <= 7 and -7 + i1 - 16*floor((i1)/16) <= 8*floor((o0)/256) <= i1 - 16*floor((i1)/16) }; +; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0] +; CHECK-NEXT: null; +; CHECK-NEXT: new: { CopyStmt_0[i0, i1, i2] -> MemRef_arg7[i2, i1] }; +; CHECK-NEXT: CopyStmt_1 +; CHECK-NEXT: Domain := +; CHECK-NEXT: { CopyStmt_1[i0, i1, i2] : 0 <= i0 <= 1055 and 0 <= i1 <= 1055 and 0 <= i2 <= 1023 }; +; CHECK-NEXT: Schedule := +; CHECK-NEXT: ; +; CHECK-NEXT: MustWriteAccess := [Reduction Type: NONE] [Scalar: 0] +; CHECK-NEXT: null; +; CHECK-NEXT: new: { CopyStmt_1[i0, i1, i2] -> Packed_A[o0, o1] : 256*floor((-i2 + o0)/256) = -i2 + o0 and 4*floor((-i0 + o1)/4) = -i0 + o1 and 0 <= o1 <= 3 and -3 + i0 - 96*floor((i0)/96) <= 4*floor((o0)/256) <= i0 - 96*floor((i0)/96) }; +; CHECK-NEXT: ReadAccess := [Reduction Type: NONE] [Scalar: 0] +; CHECK-NEXT: null; +; CHECK-NEXT: new: { CopyStmt_1[i0, i1, i2] -> MemRef_arg6[i0, i2] }; ; target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-unknown" diff --git a/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll b/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll index 45a352c8d37..f2117eba4d1 100644 --- a/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll +++ b/polly/test/ScheduleOptimizer/pattern-matching-based-opts_3.ll @@ -11,107 +11,116 @@ ; } ; ; CHECK: { -; CHECK: // 1st level tiling - Tiles -; CHECK: for (int c0 = 0; c0 <= 32; c0 += 1) -; CHECK: for (int c1 = 0; c1 <= 32; c1 += 1) { -; CHECK: // 1st level tiling - Points -; CHECK: for (int c2 = 0; c2 <= 31; c2 += 1) -; CHECK: for (int c3 = 0; c3 <= 31; c3 += 1) -; CHECK: Stmt_bb14(32 * c0 + c2, 32 * c1 + c3); -; CHECK: } -; CHECK: // Register tiling - Tiles -; CHECK: for (int c0 = 0; c0 <= 263; c0 += 1) -; CHECK: for (int c1 = 0; c1 <= 131; c1 += 1) -; CHECK: for (int c2 = 0; c2 <= 1023; c2 += 1) { -; CHECK: // Register tiling - Points -; CHECK: // 1st level tiling - Tiles -; CHECK: // 1st level tiling - Points -; CHECK: { -; CHECK: Stmt_bb24(4 * c0, 8 * c1, c2); -; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 1, c2); -; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 2, c2); -; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 3, c2); -; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 4, c2); -; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 5, c2); -; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 6, c2); -; CHECK: Stmt_bb24(4 * c0, 8 * c1 + 7, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 1, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 2, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 3, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 4, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 5, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 6, c2); -; CHECK: Stmt_bb24(4 * c0 + 1, 8 * c1 + 7, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 1, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 2, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 3, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 4, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 5, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 6, c2); -; CHECK: Stmt_bb24(4 * c0 + 2, 8 * c1 + 7, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 1, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 2, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 3, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 4, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 5, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 6, c2); -; CHECK: Stmt_bb24(4 * c0 + 3, 8 * c1 + 7, c2); -; CHECK: } -; CHECK: } -; CHECK: } +; CHECK-NEXT: // 1st level tiling - Tiles +; CHECK-NEXT: for (int c0 = 0; c0 <= 32; c0 += 1) +; CHECK-NEXT: for (int c1 = 0; c1 <= 32; c1 += 1) { +; CHECK-NEXT: // 1st level tiling - Points +; CHECK-NEXT: for (int c2 = 0; c2 <= 31; c2 += 1) +; CHECK-NEXT: for (int c3 = 0; c3 <= 31; c3 += 1) +; CHECK-NEXT: Stmt_bb14(32 * c0 + c2, 32 * c1 + c3); +; CHECK-NEXT: } +; CHECK-NEXT: // Register tiling - Tiles +; CHECK-NEXT: for (int c0 = 0; c0 <= 131; c0 += 1) +; CHECK-NEXT: for (int c1 = 0; c1 <= 263; c1 += 1) +; CHECK-NEXT: for (int c2 = 0; c2 <= 1023; c2 += 1) { +; CHECK-NEXT: // Register tiling - Points +; CHECK-NEXT: // 1st level tiling - Tiles +; CHECK-NEXT: // 1st level tiling - Points +; CHECK-NEXT: { +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 1, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 2, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 3, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 4, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 5, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 6, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1, 8 * c0 + 7, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 1, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 2, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 3, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 4, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 5, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 6, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 1, 8 * c0 + 7, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 1, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 2, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 3, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 4, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 5, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 6, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 2, 8 * c0 + 7, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 1, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 2, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 3, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 4, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 5, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 6, c2); +; CHECK-NEXT: Stmt_bb24(4 * c1 + 3, 8 * c0 + 7, c2); +; CHECK-NEXT: } +; CHECK-NEXT: } +; CHECK-NEXT: } ; -; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Tiles -; EXTRACTION-OF-MACRO-KERNEL: for (int c0 = 0; c0 <= 65; c0 += 1) -; EXTRACTION-OF-MACRO-KERNEL: for (int c1 = 0; c1 <= 3; c1 += 1) -; EXTRACTION-OF-MACRO-KERNEL: for (int c2 = 0; c2 <= 10; c2 += 1) { -; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Points -; EXTRACTION-OF-MACRO-KERNEL: // Register tiling - Tiles -; EXTRACTION-OF-MACRO-KERNEL: for (int c3 = 0; c3 <= 3; c3 += 1) -; EXTRACTION-OF-MACRO-KERNEL: for (int c4 = 0; c4 <= 11; c4 += 1) -; EXTRACTION-OF-MACRO-KERNEL: for (int c5 = 0; c5 <= 255; c5 += 1) { -; EXTRACTION-OF-MACRO-KERNEL: // Register tiling - Points -; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Tiles -; EXTRACTION-OF-MACRO-KERNEL: // 1st level tiling - Points -; EXTRACTION-OF-MACRO-KERNEL: { -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 1, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 2, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 1, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 2, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 3, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 4, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 5, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 6, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: Stmt_bb24(16 * c0 + 4 * c3 + 3, 96 * c2 + 8 * c4 + 7, 256 * c1 + c5); -; EXTRACTION-OF-MACRO-KERNEL: } -; EXTRACTION-OF-MACRO-KERNEL: } -; EXTRACTION-OF-MACRO-KERNEL: } -; EXTRACTION-OF-MACRO-KERNEL: } +; EXTRACTION-OF-MACRO-KERNEL: { +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Tiles +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c0 = 0; c0 <= 32; c0 += 1) +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c1 = 0; c1 <= 32; c1 += 1) { +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c2 = 0; c2 <= 31; c2 += 1) +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 0; c3 <= 31; c3 += 1) +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb14(32 * c0 + c2, 32 * c1 + c3); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: } +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Tiles +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c0 = 0; c0 <= 65; c0 += 1) +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c1 = 0; c1 <= 3; c1 += 1) +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c2 = 0; c2 <= 10; c2 += 1) { +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Tiles +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c3 = 0; c3 <= 1; c3 += 1) +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c4 = 0; c4 <= 23; c4 += 1) +; EXTRACTION-OF-MACRO-KERNEL-NEXT: for (int c5 = 0; c5 <= 255; c5 += 1) { +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // Register tiling - Points +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Tiles +; EXTRACTION-OF-MACRO-KERNEL-NEXT: // 1st level tiling - Points +; EXTRACTION-OF-MACRO-KERNEL-NEXT: { +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 1, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 2, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 1, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 2, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 3, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 4, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 5, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 6, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: Stmt_bb24(96 * c2 + 4 * c4 + 3, 16 * c0 + 8 * c3 + 7, 256 * c1 + c5); +; EXTRACTION-OF-MACRO-KERNEL-NEXT: } +; EXTRACTION-OF-MACRO-KERNEL-NEXT: } +; EXTRACTION-OF-MACRO-KERNEL-NEXT: } +; EXTRACTION-OF-MACRO-KERNEL-NEXT: } ; target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-unknown" |