author     Justin Lebar <jlebar@google.com>    2018-01-10 03:02:12 +0000
committer  Justin Lebar <jlebar@google.com>    2018-01-10 03:02:12 +0000
commit     9d3afd3c0613685c7f839588b784080999329f68 (patch)
tree       4f246e4b5f8b1688efa8a5fa8c8e27f2ba9a4e01 /llvm/lib/Transforms/Vectorize
parent     772aea2b91ac3cea3d8b21dc8aed11df0ed6443d (diff)
Add explanatory comment to LoadStoreVectorizer.
Reviewers: arsenm
Subscribers: rengolin, sanjoy, wdng, hiraditya, asbirlea
Differential Revision: https://reviews.llvm.org/D41890
llvm-svn: 322157
Diffstat (limited to 'llvm/lib/Transforms/Vectorize')
-rw-r--r--   llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp | 32
1 file changed, 32 insertions, 0 deletions
diff --git a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
index dc83b6d4d29..2fd39766bd8 100644
--- a/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp
@@ -6,6 +6,38 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// This pass merges loads/stores to/from sequential memory addresses into vector
+// loads/stores. Although there's nothing GPU-specific in here, this pass is
+// motivated by the microarchitectural quirks of nVidia and AMD GPUs.
+//
+// (For simplicity below we talk about loads only, but everything also applies
+// to stores.)
+//
+// This pass is intended to be run late in the pipeline, after other
+// vectorization opportunities have been exploited. So the assumption here is
+// that immediately following our new vector load we'll need to extract out the
+// individual elements of the load, so we can operate on them individually.
+//
+// On CPUs this transformation is usually not beneficial, because extracting the
+// elements of a vector register is expensive on most architectures. It's
+// usually better just to load each element individually into its own scalar
+// register.
+//
+// However, nVidia and AMD GPUs don't have proper vector registers. Instead, a
+// "vector load" loads directly into a series of scalar registers. In effect,
+// extracting the elements of the vector is free. It's therefore always
+// beneficial to vectorize a sequence of loads on these architectures.
+//
+// Vectorizing (perhaps a better name might be "coalescing") loads can have
+// large performance impacts on GPU kernels, and opportunities for vectorizing
+// are common in GPU code. This pass tries very hard to find such
+// opportunities; its runtime is quadratic in the number of loads in a BB.
+//
+// Some CPU architectures, such as ARM, have instructions that load into
+// multiple scalar registers, similar to a GPU vectorized load. In theory ARM
+// could use this pass (with some modifications), but currently it implements
+// its own pass to do something similar to what we do here.
 
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/ArrayRef.h"
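
To make the transformation described in the new comment concrete, here is an illustrative sketch in 2018-era (typed-pointer) LLVM IR. It is not part of the commit; the function @sum4 and its memory layout are invented for illustration. It shows the shape the pass aims to produce: four adjacent scalar loads replaced by one aligned vector load followed by extractelements, which on the GPU targets discussed above are effectively free.

define float @sum4(float* %p) {
entry:
  ; Before the pass, this body would contain four adjacent scalar loads, e.g.:
  ;   %p1 = getelementptr inbounds float, float* %p, i64 1
  ;   %a  = load float, float* %p,  align 16
  ;   %b  = load float, float* %p1, align 4
  ;   ...and likewise for elements 2 and 3.
  ; After the pass (conceptually), they become one vector load plus extracts:
  %vp = bitcast float* %p to <4 x float>*
  %v  = load <4 x float>, <4 x float>* %vp, align 16
  %a  = extractelement <4 x float> %v, i32 0
  %b  = extractelement <4 x float> %v, i32 1
  %c  = extractelement <4 x float> %v, i32 2
  %d  = extractelement <4 x float> %v, i32 3
  %ab = fadd float %a, %b
  %cd = fadd float %c, %d
  %s  = fadd float %ab, %cd
  ret float %s
}

On an LLVM build of this era, the standalone pass could typically be exercised on such input with something like `opt -load-store-vectorizer -S input.ll` (legacy pass manager flag; exact invocation may vary by version).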