| author | Rui Ueyama <ruiu@google.com> | 2016-12-03 23:35:22 +0000 |
|---|---|---|
| committer | Rui Ueyama <ruiu@google.com> | 2016-12-03 23:35:22 +0000 |
| commit | c3aacfd91bcf41a122c64b21c39f85cb72a1b333 (patch) | |
| tree | d70f16b4858968f4f5250d8cc3948058f220db31 /lld/ELF/Threads.h | |
| parent | f421403650fa2ed5247efa66683283d2cfc0adf1 (diff) | |
Add comments about the use of threads in LLD.
llvm-svn: 288606
Diffstat (limited to 'lld/ELF/Threads.h')
| -rw-r--r-- | lld/ELF/Threads.h | 48 |
1 files changed, 48 insertions, 0 deletions
diff --git a/lld/ELF/Threads.h b/lld/ELF/Threads.h
index 4103762c7d5..58d24970e7b 100644
--- a/lld/ELF/Threads.h
+++ b/lld/ELF/Threads.h
@@ -6,6 +6,54 @@
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
+//
+// LLD supports threads to distribute workloads to multiple cores. Using
+// multicore is most effective when more than one core are idle. At the
+// last step of a build, it is often the case that a linker is the only
+// active process on a computer. So, we are naturally interested in using
+// threads wisely to reduce latency to deliver results to users.
+//
+// That said, we don't want to do "too clever" things using threads.
+// Complex multi-threaded algorithms are sometimes extremely hard to
+// justify the correctness and can easily mess up the entire design.
+//
+// Fortunately, when a linker links large programs (when the link time is
+// most critical), it spends most of the time to work on massive number of
+// small pieces of data of the same kind. Here are examples:
+//
+// - We have hundreds of thousands of input sections that need to be
+//   copied to a result file at the last step of link. Once we fix a file
+//   layout, each section can be copied to its destination and its
+//   relocations can be applied independently.
+//
+// - We have tens of millions of small strings when constructing a
+//   mergeable string section.
+//
+// For the cases such as the former, we can just use parallel_for_each
+// instead of std::for_each (or a plain for loop). Because tasks are
+// completely independent from each other, we can run them in parallel
+// without any coordination between them. That's very easy to understand
+// and justify.
+//
+// For the cases such as the latter, we can use parallel algorithms to
+// deal with massive data. We have to write code for a tailored algorithm
+// for each problem, but the complexity of multi-threading is isolated in
+// a single pass and doesn't affect the linker's overall design.
+//
+// The above approach seems to be working fairly well. As an example, when
+// linking Chromium (output size 1.6 GB), using 4 cores reduces latency to
+// 75% compared to single core (from 12.66 seconds to 9.55 seconds) on my
+// machine. Using 40 cores reduces it to 63% (from 12.66 seconds to 7.95
+// seconds). Because of the Amdahl's law, the speedup is not linear, but
+// as you add more cores, it gets faster.
+//
+// On a final note, if you are trying to optimize, keep the axiom "don't
+// guess, measure!" in mind. Some important passes of the linker are not
+// that slow. For example, resolving all symbols is not a very heavy pass,
+// although it would be very hard to parallelize it. You want to first
+// identify a slow pass and then optimize it.
+//
+//===----------------------------------------------------------------------===//
 
 #ifndef LLD_ELF_THREADS_H
 #define LLD_ELF_THREADS_H
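The first case described in the comment (copying independent input sections once the file layout is fixed) can be illustrated with a small standalone sketch. This is not LLD's code and not the `parallel_for_each` the comment refers to: `Section`, `parallelForSketch`, and `copySections` are hypothetical names invented here, and the helper is built on plain `std::thread` only so that the example stays self-contained. It assumes that the output ranges of the sections do not overlap, which is what makes the copies safe to run without coordination.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical stand-in for an input section whose output offset has
// already been fixed by the layout step.
struct Section {
  std::vector<uint8_t> Data;
  size_t OutputOffset;
};

// Illustrative helper (an assumption, not LLD's parallel_for_each):
// calls F(I) for every I in [0, N), splitting the index range into
// contiguous chunks, one chunk per worker thread. All workers are
// joined before the function returns.
template <class Fn>
void parallelForSketch(size_t N, unsigned NumThreads, Fn F) {
  if (NumThreads == 0)
    NumThreads = 1;
  size_t Chunk = (N + NumThreads - 1) / NumThreads;
  std::vector<std::thread> Workers;
  for (size_t Start = 0; Start < N; Start += Chunk) {
    size_t End = std::min(N, Start + Chunk);
    Workers.emplace_back([Start, End, &F] {
      for (size_t I = Start; I < End; ++I)
        F(I);
    });
  }
  for (std::thread &T : Workers)
    T.join();
}

// Copies every section to its fixed offset in the output buffer.
// Each task writes a disjoint byte range, so no locking is needed.
std::vector<uint8_t> copySections(const std::vector<Section> &Sections,
                                  size_t OutputSize, unsigned NumThreads) {
  std::vector<uint8_t> Out(OutputSize);
  parallelForSketch(Sections.size(), NumThreads, [&](size_t I) {
    const Section &S = Sections[I];
    assert(S.OutputOffset + S.Data.size() <= Out.size());
    if (!S.Data.empty())
      std::memcpy(Out.data() + S.OutputOffset, S.Data.data(), S.Data.size());
  });
  return Out;
}
```

Calling `copySections(Sections, OutputSize, std::thread::hardware_concurrency())` would use one worker per reported hardware thread. The sketch spawns fresh threads on every call, which a production linker would likely avoid by using a thread pool; the point here is only that independent tasks need no shared state beyond the output buffer.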

