author Roman Gareev <gareevroman@gmail.com> 2016-12-21 12:51:12 +0000
committer Roman Gareev <gareevroman@gmail.com> 2016-12-21 12:51:12 +0000
commit be5299af0b88b74bf986d223883c0992bd0d993f (patch)
tree 4636ffa178e20ed0b63a26ebb0f5a437c91dbd78 /llvm
parent 85e12d285188de3f02c82b0bd1da8abfb5ad12f3 (diff)
Change the determination of parameters of macro-kernel

Typically processor architectures do not include an L3 cache, which means that Nc, a parameter of the macro-kernel, is, for all practical purposes, redundant ([1]). However, small values of Nc can cause redundant packing of the same elements of the matrix A, the first operand of the matrix multiplication. At the same time, large values of Nc can cause segmentation faults if the available stack space is exceeded.

This patch adds an option to specify Nc as a multiple of Nr, a parameter of the micro-kernel.

On an Intel Core i7-3820 (Sandy Bridge) with the following options,

    clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8

it improves performance from 11.303 GFlops/sec (39.247% of theoretical peak) to 17.896 GFlops/sec (62.14% of theoretical peak).

Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28019

llvm-svn: 290256
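
The relationship between Nc and Nr, and the reported figures, can be checked with a short sketch. This is not the code from the patch: the function names, the quotient of 256, Nr = 4, Kc = 256 and the assumption that the packed panel depending on Nc holds Kc x Nc doubles are all hypothetical values chosen for illustration. Only the GFlops/sec numbers and percentages come from the message above.

```python
def nc_from_quotient(nr: int, quotient: int) -> int:
    """Choose Nc as an integer multiple of the micro-kernel parameter Nr."""
    if nr <= 0 or quotient <= 0:
        raise ValueError("nr and quotient must be positive")
    return nr * quotient


def packed_panel_bytes(kc: int, nc: int, elem_size: int = 8) -> int:
    """Size of a Kc x Nc packed panel (assumed layout; 8-byte doubles by default).

    A large Nc makes this buffer large, which is how a stack-allocated
    panel could exceed the available stack.
    """
    return kc * nc * elem_size


def implied_peak(gflops: float, percent_of_peak: float) -> float:
    """Theoretical peak implied by a measurement and its share of peak."""
    return gflops / (percent_of_peak / 100.0)


# Hypothetical parameters, not taken from the patch.
nr, quotient, kc = 4, 256, 256
nc = nc_from_quotient(nr, quotient)
print("Nc =", nc)
print("packed panel bytes =", packed_panel_bytes(kc, nc))

# Figures reported in the commit message.
before, after = 11.303, 17.896
print("implied peak (before): %.2f GFlops/sec" % implied_peak(before, 39.247))
print("implied peak (after):  %.2f GFlops/sec" % implied_peak(after, 62.14))
print("speedup: %.3fx" % (after / before))
```

Both measurements imply roughly the same theoretical peak, so the two percentages are consistent with each other once the decimal commas are read as decimal points.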
Diffstat (limited to 'llvm')
0 files changed, 0 insertions, 0 deletions