[ELF][PPC] Allow PT_LOAD to have overlapping p_offset ranges

This change affects the non-linker script case (precisely, when the `SECTIONS` command is not used). It deletes 3 alignments at PT_LOAD boundaries for the default case: the size of a powerpc64 binary can be decreased by at most 192kb. The technique can be ported to other targets. Let me demonstrate the idea with a maxPageSize=65536 example: When assigning the address to the first output section of a new PT_LOAD, if the end p_vaddr of the previous PT_LOAD is 0x10020, we advance to the next multiple of maxPageSize: 0x20000. The new PT_LOAD will thus have p_vaddr=0x20000. Because p_offset and p_vaddr are congruent modulo maxPageSize, p_offset will be 0x20000, leaving a p_offset gap [0x10020, 0x20000) in the output. Alternatively, if we advance to 0x20020, the new PT_LOAD will have p_vaddr=0x20020. We can pick either 0x10020 or 0x20020 for p_offset! Obviously 0x10020 is the choice because it leaves no gap. At runtime, p_vaddr will be rounded down by pagesize (65536 if pagesize=maxPageSize). This PT_LOAD will load additional initial contents from p_offset ranges [0x10000,0x10020), which will also be loaded by the previous PT_LOAD. This is fine if -z noseparate-code is in effect or if we are not transiting between executable and non-executable segments. ld.bfd -z noseparate-code leverages this technique to keep output small. This patch implements the technique in lld, which is mostly effective on targets with large defaultMaxPageSize (AArch64/MIPS/PPC: 65536). The 3 removed alignments can save almost 3*65536 bytes. Two places that rely on p_vaddr%pagesize = 0 have to be updated. 1) We used to round p_memsz(PT_GNU_RELRO) up to commonPageSize (defaults to 4096 on all targets). Now p_vaddr%commonPageSize may be non-zero. The updated formula takes account of that factor. 2) Our TP offsets formulae are only correct if p_vaddr%p_align = 0. Fix them. See the updated comments in InputSection.cpp for details. On targets that we enable the technique (only PPC64 now), we can potentially make `p_vaddr(PT_TLS)%p_align(PT_TLS) != 0` if `sh_addralign(.tdata) < sh_addralign(.tbss)` This exposes many problems in ld.so implementations, especially the offsets of dynamic TLS blocks. Known issues: FreeBSD 13.0-CURRENT rtld-elf (i386/amd64/powerpc/arm64) glibc (HEAD) i386 and x86_64 https://sourceware.org/bugzilla/show_bug.cgi?id=24606 musl<=1.1.22 on TLS Variant I architectures (aarch64/powerpc64/...) So, force p_vaddr%p_align = 0 by rounding dot up to p_align(PT_TLS). The technique will be enabled (with updated tests) for other targets in subsequent patches. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D64906 llvm-svn: 369343
author: Fangrui Song <maskray@google.com> 2019-08-20 08:34:25 +0000
committer: Fangrui Song <maskray@google.com> 2019-08-20 08:34:25 +0000
commit: 01c7f4b6066935c5bd6f844fdb1d419c568aef14 (patch)
tree: 52b3d696bcbe381633714e4a2efaac1acb84e5ae /lld/ELF/InputSection.cpp
parent: ebc8fd3c0c644faabaf2affbfc3d172f592ac5f3 (diff)
download: bcm5719-llvm-01c7f4b6066935c5bd6f844fdb1d419c568aef14.tar.gz
bcm5719-llvm-01c7f4b6066935c5bd6f844fdb1d419c568aef14.zip
1 files changed, 26 insertions, 14 deletions
diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp
index e8a821bd33b..813423bb5e7 100644
--- a/lld/ELF/InputSection.cpp
+++ b/lld/ELF/InputSection.cpp
@@ -608,27 +608,39 @@ static int64_t getTlsTpOffset(const Symbol &s) {
   if (&s == ElfSym::tlsModuleBase)
     return 0;
 
+  // There are 2 TLS layouts. Among targets we support, x86 uses TLS Variant 2
+  // while most others use Variant 1. At run time TP will be aligned to p_align.
+
+  // Variant 1. TP will be followed by an optional gap (which is the size of 2
+  // pointers on ARM/AArch64, 0 on other targets), followed by alignment
+  // padding, then the static TLS blocks. The alignment padding is added so that
+  // (TP + gap + padding) is congruent to p_vaddr modulo p_align.
+  //
+  // Variant 2. Static TLS blocks, followed by alignment padding are placed
+  // before TP. The alignment padding is added so that (TP - padding -
+  // p_memsz) is congruent to p_vaddr modulo p_align.
+  elf::PhdrEntry *tls = Out::tlsPhdr;
   switch (config->emachine) {
+    // Variant 1.
   case EM_ARM:
   case EM_AARCH64:
-    // Variant 1. The thread pointer points to a TCB with a fixed 2-word size,
-    // followed by a variable amount of alignment padding, followed by the TLS
-    // segment.
-    return s.getVA(0) + alignTo(config->wordsize * 2, Out::tlsPhdr->p_align);
-  case EM_386:
-  case EM_X86_64:
-    // Variant 2. The TLS segment is located just before the thread pointer.
-    return s.getVA(0) - alignTo(Out::tlsPhdr->p_memsz, Out::tlsPhdr->p_align);
+    return s.getVA(0) + config->wordsize * 2 +
+           ((tls->p_vaddr - config->wordsize * 2) & (tls->p_align - 1));
   case EM_MIPS:
   case EM_PPC:
   case EM_PPC64:
-    // The thread pointer points to a fixed offset from the start of the
-    // executable's TLS segment. An offset of 0x7000 allows a signed 16-bit
-    // offset to reach 0x1000 of TCB/thread-library data and 0xf000 of the
-    // program's TLS segment.
-    return s.getVA(0) - 0x7000;
+    // Adjusted Variant 1. TP is placed with a displacement of 0x7000, which is
+    // to allow a signed 16-bit offset to reach 0x1000 of TCB/thread-library
+    // data and 0xf000 of the program's TLS segment.
+    return s.getVA(0) + (tls->p_vaddr & (tls->p_align - 1)) - 0x7000;
   case EM_RISCV:
-    return s.getVA(0);
+    return s.getVA(0) + (tls->p_vaddr & (tls->p_align - 1));
+
+    // Variant 2.
+  case EM_386:
+  case EM_X86_64:
+    return s.getVA(0) - tls->p_memsz -
+           ((-tls->p_vaddr - tls->p_memsz) & (tls->p_align - 1));
   default:
     llvm_unreachable("unhandled Config->EMachine");
   }
author	Fangrui Song <maskray@google.com>	2019-08-20 08:34:25 +0000
committer	Fangrui Song <maskray@google.com>	2019-08-20 08:34:25 +0000
commit	01c7f4b6066935c5bd6f844fdb1d419c568aef14 (patch)
tree	52b3d696bcbe381633714e4a2efaac1acb84e5ae /lld/ELF/InputSection.cpp
parent	ebc8fd3c0c644faabaf2affbfc3d172f592ac5f3 (diff)
download	bcm5719-llvm-01c7f4b6066935c5bd6f844fdb1d419c568aef14.tar.gz bcm5719-llvm-01c7f4b6066935c5bd6f844fdb1d419c568aef14.zip