bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS.	Peter Smith	2020-02-04	1	-6/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In D71281 a fix was put in to round up the size of a ThunkSection to the nearest 4KiB when performing errata patching. This fixed a problem with a very large instrumented program that had thunks and patches mutually trigger each other. Unfortunately it triggers an assertion failure in an AArch64 allyesconfig build of the kernel. There is a specific assertion preventing an InputSectionDescription being larger than 4KiB. This will always trigger if there is at least one Thunk needed in that InputSectionDescription, which is possible for an allyesconfig build. Abstractly the problem case is: .text : { (.text) ; ... . = ALIGN(SZ_4K); __idmap_text_start = .; (.idmap.text) __idmap_text_end = .; ... } The assertion checks that __idmap_text_end - __idmap_start is < 4 KiB. Note that there is more than one InputSectionDescription in the OutputSection so we can't just restrict the fix to OutputSections smaller than 4 KiB. The fix presented here limits the D71281 to InputSectionDescriptions that meet the following conditions: 1.) The OutputSection is bigger than the thunkSectionSpacing so adding thunks will affect the addresses of following code. 2.) The InputSectionDescription is larger than 4 KiB. This will prevent any assertion failures that an InputSectionDescription is < 4 KiB in size. We do this at ThunkSection creation time as at this point we know that the addresses are stable and up to date prior to adding the thunks as assignAddresses() will have been called immediately prior to thunk generation. The fix reverts the two tests affected by D71281 to their original state as they no longer need the 4KiB size roundup. I've added simpler tests to check for D71281 when the OutputSection size is larger than the ThunkSection spacing. Fixes https://github.com/ClangBuiltLinux/linux/issues/812 Differential Revision: https://reviews.llvm.org/D72344 (cherry picked from commit 01ad4c838466bd5db180608050ed8ccb3b62d136)
*	[ELF] Decrease alignment of ThunkSection on 64-bit targets from 8 to 4	Fangrui Song	2020-02-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	ThunkSection contains 4-byte instructions on all targets that use thunks. Thunks should not be used in any performance sensitive places, and locality/cache line/instruction fetching arguments should not apply. We use 16 bytes as preferred function alignments for modern PowerPC cores. In any case, 8 is not optimal. Differential Revision: https://reviews.llvm.org/D72819 (cherry picked from commit 870094decfc9fe80c8e0a6405421b7d09b97b02b)
*	[ELF][PPC32] Support canonical PLT	Fangrui Song	2020-01-25	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-fno-pie produces a pair of non-GOT-non-PLT relocations R_PPC_ADDR16_{HA,LO} (R_ABS) referencing external functions. ``` lis 3, func@ha la 3, func@l(3) ``` In a -no-pie/-pie link, if func is not defined in the executable, a canonical PLT entry (st_value>0, st_shndx=0) will be needed. References to func in shared objects will be resolved to this address. -fno-pie -pie should fail with "can't create dynamic relocation ... against ...", so we just need to think about -no-pie. On x86, the PLT entry passes the JMP_SLOT offset to the rtld PLT resolver. On x86-64: the PLT entry passes the JUMP_SLOT index to the rtld PLT resolver. On ARM/AArch64: the PLT entry passes &.got.plt[n]. The PLT header passes &.got.plt[fixed-index]. The rtld PLT resolver can compute the JUMP_SLOT index from the two addresses. For these targets, the canonical PLT entry can just reuse the regular PLT entry (in PltSection). On PPC32: PltSection (.glink) consists of `b PLTresolve` instructions and `PLTresolve`. The rtld PLT resolver depends on r11 having been set up to the .plt (GotPltSection) entry. On PPC64 ELFv2: PltSection (.glink) consists of `__glink_PLTresolve` and `bl __glink_PLTresolve`. The rtld PLT resolver depends on r12 having been set up to the .plt (GotPltSection) entry. We cannot reuse a `b PLTresolve`/`bl __glink_PLTresolve` in PltSection as a canonical PLT entry. PPC64 ELFv2 avoids the problem by using TOC for any external reference, even in non-pic code, so the canonical PLT entry scenario should not happen in the first place. For PPC32, we have to create a PLT call stub as the canonical PLT entry. The code sequence sets up r11. Reviewed By: Bdragon28 Differential Revision: https://reviews.llvm.org/D73399 (cherry picked from commit 837e8a9c0cd097034e023dfba146d17ce132998c)
*	[ELF] Add -z force-ibt and -z shstk for Intel Control-flow Enforcement ↵	Fangrui Song	2020-01-13	1	-2/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Technology This patch is a joint work by Rui Ueyama and me based on D58102 by Xiang Zhang. It adds Intel CET (Control-flow Enforcement Technology) support to lld. The implementation follows the draft version of psABI which you can download from https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI. CET introduces a new restriction on indirect jump instructions so that you can limit the places to which you can jump to using indirect jumps. In order to use the feature, you need to compile source files with -fcf-protection=full. * IBT is enabled if all input files are compiled with the flag. To force enabling ibt, pass -z force-ibt. * SHSTK is enabled if all input files are compiled with the flag, or if -z shstk is specified. IBT-enabled executables/shared objects have two PLT sections, ".plt" and ".plt.sec". For the details as to why we have two sections, please read the comments. Reviewed By: xiangzhangllvm Differential Revision: https://reviews.llvm.org/D59780
*	[lld] Fix -Wrange-loop-analysis warnings	Fangrui Song	2020-01-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	One instance looks like a false positive: lld/ELF/Relocations.cpp:1622:14: note: use reference type 'const std::pair<ThunkSection , uint32_t> &' (aka 'cons t pair<lld::elf::ThunkSection , unsigned int> &') to prevent copying for (const std::pair<ThunkSection *, uint32_t> ts : isd->thunkSections) It is not changed in this commit.
*	[ELF] Support input section description .gnu.version* in /DISCARD/	Fangrui Song	2019-12-26	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Linux powerpc discards `(.gnu.version)` (arch/powerpc/kernel/vmlinux.lds.S) to suppress --orphan-handling=warn warnings in the -pie output `.tmp_vmlinux1` The support is simple. Just add isLive() to: 1) Fix an assertion in SectionBase::getPartition() called by VersionTableSection::isNeeded(). 2) Suppress DT_VERSYM, DT_VERDEF, DT_VERNEED and DT_VERNEEDNUM, if the relevant section is discarded. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D71819
*	[ELF] writePlt, writeIplt: replace parameters gotPltEntryAddr and index with ↵	Fangrui Song	2019-12-18	1	-7/+3
\| \| \| \| \| \| \| \| \| \|	`const Symbol &`. NFC PPC::writeIplt (IPLT code sequence, D71621) needs to access `Symbol`. Reviewed By: grimar, ruiu Differential Revision: https://reviews.llvm.org/D71631
*	[ELF] Rename .plt to .iplt and decrease EM_PPC{,64} alignment of .glink to 4	Fangrui Song	2019-12-17	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	GNU ld creates the synthetic section .iplt, and has a built-in linker script that assigns .iplt to the output section .plt . There is no output section named .iplt . Making .iplt an output section actually has a benefit that makes the tricky toolchain feature stand out. Symbolizers don't have to deal with mixed PLT entries (e.g. llvm-objdump -d incorrectly annotates such jump targets). On EM_PPC{,64}, .glink contains a PLT resolver and a series of jump instructions. The 4-byte entry size makes it unnecessary to have an alignment of 16. Mark ppc32-gnu-ifunc.s and ppc32-gnu-ifunc-nonpreemptable.s as `XFAIL: *`. They test IPLT on EM_PPC, which never works. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D71520
*	[ELF] Add IpltSection	Fangrui Song	2019-12-17	1	-16/+51
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PltSection is used by both PLT and IPLT. The PLT section may have a header while the IPLT section does not. Split off IpltSection from PltSection to be clearer. Unlike other targets, PPC64 cannot use the same code sequence for PLT and IPLT. This helps make a future PPC64 patch (D71509) more isolated. On EM_386 and EM_X86_64, when PLT is empty while IPLT is not, currently we are inconsistent whether the PLT header is conceptually attached to in.plt or in.iplt . Consistently attach the header to in.plt can make the -z retpolineplt logic simpler. It also makes `jmp` point to an aesthetically better place for non-retpolineplt cases. Reviewed By: grimar, ruiu Differential Revision: https://reviews.llvm.org/D71519
*	[ELF] Delete relOff from TargetInfo::writePLT	Fangrui Song	2019-12-16	1	-7/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change only affects EM_386. relOff can be computed from `index` easily, so it is unnecessarily passed as a parameter. Both in.plt and in.iplt entries are written by writePLT. For in.iplt, the instruction `push reloc_offset` will change because `index` is now different. Fortunately, this does not matter because `push; jmp` is only used by PLT. IPLT does not need the code sequence. Reviewed By: grimar, ruiu Differential Revision: https://reviews.llvm.org/D71518
*	[ELF] De-template PltSection::addEntry. NFC	Fangrui Song	2019-12-16	1	-6/+1
\|
*	Revert an accidental commit af5ca40b47b3e85c3add81ccdc0b787c4bc355ae	Rui Ueyama	2019-12-13	1	-2/+0
\|
*	temporary	Rui Ueyama	2019-12-13	1	-0/+2
\|
*	[LLD][ELF][AArch64][ARM] When errata patching, round thunk size to 4KiB.	Peter Smith	2019-12-11	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On some edge cases such as Chromium compiled with full instrumentation we have a .text section over twice the size of the maximum branch range and the instrumented code generation containing many examples of the erratum sequence. The combination of Thunks and many erratum sequences causes finalizeAddressDependentContent() to not converge. We end up with: start - Thunk Creation (disturbs addresses after thunks, creating more patches) - Patch Creation (disturbs addresses after patches, creating more thunks) - goto start In most images with few thunks and patches the mutual disturbance does not cause convergence problems. As the .text size and number of patches go up the risk increases. A way to prevent the thunk creation from interfering with patch creation is to round up the size of the thunks to a 4KiB boundary when the erratum patch is enabled. As the erratum sequence only triggers when an instruction sequence starts at 0xff8 or 0xffc modulo (4 KiB) by making the thunks not affect addresses modulo (4 KiB) we prevent thunks from interfering with the patch. The patches themselves could be aggregated in the same way that Thunks are within ThunkSections and we could round up the size in the same way. This would reduce the number of patches created in a .text section size > 128 MiB but would not likely help convergence problems. Differential Revision: https://reviews.llvm.org/D71281 fixes (remaining part of) pr44071, other part in D71242
*	[ELF][PPC64] Support long branch thunks with addends	Fangrui Song	2019-12-05	1	-7/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes PPC64 part of PR40438 // clang -target ppc64le -c a.cc // .text.unlikely may be placed in a separate output section (via -z keep-text-section-prefix) // The distance between bar in .text.unlikely and foo in .text may be larger than 32MiB. static void foo() {} __attribute__((section(".text.unlikely"))) static int bar() { foo(); return 0; } __attribute__((used)) static int dummy = bar(); This patch makes such thunks with addends work for PPC64. AArch64: .text -> `__AArch64ADRPThunk_ (adrp x16, ...; add x16, x16, ...; br x16)` -> target PPC64: .text -> `__long_branch_ (addis 12, 2, ...; ld 12, ...(12); mtctr 12; bctr)` -> target AArch64 can leverage ADRP to jump to the target directly, but PPC64 needs to load an address from .branch_lt . Before Power ISA v3.0, the PC-relative ADDPCIS was not available. .branch_lt was invented to work around the limitation. Symbol::ppc64BranchltIndex is replaced by PPC64LongBranchTargetSection::entry_index which take addends into consideration. The tests are rewritten: ppc64-long-branch.s tests -no-pie and ppc64-long-branch-pi.s tests -pie and -shared. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D70937
*	[LLD][ELF][AArch64] .note.gnu.property sections should have alignment 8	Peter Smith	2019-12-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	The .note.gnu.property SHT_NOTE sections on AArch64 (a 64-bit target) should have alignment 8 to more closely match the binutils implementation where alignment is 4-bytes on 32-bit machines and 8-bytes on 64-bit machines. Previously LLD was using 4 for both 32-bit and 64-bit machines. Differential Revision: https://reviews.llvm.org/D70962
*	ELF: Discard .ARM.exidx sections for empty functions instead of misordering ↵	Peter Collingbourne	2019-11-04	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	them. The logic added in r372781 caused ARMExidxSyntheticSection::addSection() to return false for exidx sections without a link order dep that passed isValidExidxSectionDep(). This included exidx sections for empty functions. As a result, such exidx sections would end up treated like ordinary sections and would end up being laid out before the ARMExidxSyntheticSection, most likely in the wrong order relative to the exidx entries in the ARMExidxSyntheticSection, breaking the orderedness invariant relied upon by unwinders. Fix this by simply discarding such sections. Differential Revision: https://reviews.llvm.org/D69744
*	Fix a few typos in lld/ELF to cycle bots	Nico Weber	2019-10-28	1	-6/+6
\|
*	[ELF] Wrap things in `namespace lld { namespace elf {`, NFC	Fangrui Song	2019-10-07	1	-87/+89
\| \| \| \| \| \| \| \| \| \| \|	This makes it clear `ELF/*/.cpp` files define things in the `lld::elf` namespace and simplifies `elf::foo` to `foo`. Reviewed By: atanasyan, grimar, ruiu Differential Revision: https://reviews.llvm.org/D68323 llvm-svn: 373885
*	ELF: Add .interp synthetic sections first in createSyntheticSections().	Peter Collingbourne	2019-10-01	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Our .interp section is not a SyntheticSection. As a result, it terminates the loop in removeUnusedSyntheticSections(). This has at least two consequences: - The synthetic .bss and .bss.rel.ro sections are always present in dynamically linked executables, even when they are not needed. - The synthetic .ARM.exidx (and possibly other) sections are always present in partitions other than the last one, even when not needed. .ARM.exidx in particular is problematic because it assumes that its list of code sections is non-empty in getLinkOrderDep(), which can lead to a crash if the partition does not have any code sections. Fix these problems by moving the creation of the .interp sections to the top of createSyntheticSections(). While here, make the code a little less error-prone by changing the add() lambdas to take a SyntheticSection instead of an InputSectionBase. Differential Revision: https://reviews.llvm.org/D68256 llvm-svn: 373347
*	[ELF][ARM] Fix crash when discarding InputSections that have .ARM.exidx	Peter Smith	2019-09-24	1	-6/+18
\| \| \| \| \| \| \| \| \| \| \| \|	When /DISCARD/ is used on an input section, that input section may have a .ARM.exidx metadata section that depends on it. As the discard handling comes after the .ARM.exidx synthetic section is created we need to make sure that we account for the case where the .ARM.exidx output section should be removed because there are no more live input sections. Differential Revision: https://reviews.llvm.org/D67848 llvm-svn: 372781
*	[ELF] Make MergeInputSection merging aware of output sections	Fangrui Song	2019-09-24	1	-61/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes PR38748 mergeSections() calls getOutputSectionName() to get output section names. Two MergeInputSections may be merged even if they are made different by SECTIONS commands. This patch moves mergeSections() after processSectionCommands() and addOrphanSections() to fix the issue. The new pass is renamed to OutputSection::finalizeInputSections(). processSectionCommands() and addorphanSections() are changed to add sections to InputSectionDescription::sectionBases. finalizeInputSections() merges MergeInputSections and migrates `sectionBases` to `sections`. For the -r case, we drop an optimization that tries keeping sh_entsize non-zero. This is for the simplicity of addOrphanSections(). The updated merge-entsize2.s reflects the change. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D67504 llvm-svn: 372734
*	[ELF] Don't shrink RelrSection	Fangrui Song	2019-09-04	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes PR43214. The size of SHT_RELR may oscillate between 2 numbers (see D53003 for a similar --pack-dyn-relocs=android issue). This can happen if the shrink of SHT_RELR causes it to take more words to encode relocation offsets (this can happen with thunks or segments with overlapping p_offset ranges), and the expansion of SHT_RELR causes it to take fewer words to encode relocation offsets. To avoid the issue, add padding 1s to the end of the relocation section if its size would decrease. Trailing 1s do not decode to more relocations. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D67164 llvm-svn: 370923
*	[ELF][RISCV] Assign st_shndx of __global_pointer$ to 1 if .sdata does not exist	Fangrui Song	2019-08-28	1	-17/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This essentially reverts the code change of D63132 and switches to a simpler approach. In an executable/shared object, st_shndx of a symbol can be: 1) SHN_UNDEF: undefined symbol (or canonical PLT) 2) SHN_ABS: absolute symbol 3) any other value (usually a regular section index) represents a relative symbol. The actual value does not matter. Many ld.so (musl, all archs except MIPS of FreeBSD rtld-elf) even treat 2) and 3) the same. If .sdata does not exist, it does not matter what value/section __global_pointer$ has, as long as it is relative (otherwise there will be a pedantic lld error. See D63132). Just set the st_shndx arbitrarily to 1. Dummy st_shndx=1 may be used by __rela_iplt_start, linker-script-defined symbols outside a section, __dso_handle, etc. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D66798 llvm-svn: 370172
*	[ELF] EhFrameSection: postpone FDE liveness check to finalizeSections	Fangrui Song	2019-08-26	1	-27/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EhFrameSection::addSection checks liveness of FDE early. This makes it infeasible to move combineEhSections() before ICF. Postpone the check to EhFrameSection::finalizeContents(). This is what ARMExidxSyntheticSection does and it will make a subsequent patch D66717 simpler. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D66727 llvm-svn: 369890
*	Reland D65242 "[ELF] More dynamic relocation packing""	Fangrui Song	2019-08-21	1	-7/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixed a bug in r369488. When config->isRela is false, i->r_addend is not initialized (see encodeDynamicReloc). So we should check config->isRela before accessing r_addend: - if (j - i < 3 \|\| i->r_addend) + if (j - i < 3 \|\| (config->isRela && i->r_addend != 0)) Original description: Currently, with Android dynamic relocation packing, only relative relocations are grouped together. This patch implements similar packing for non-relative relocations. The implementation groups non-relative relocations with the same r_info and r_addend, if using RELA. By requiring a minimum group size of 3, this achieves smaller relocation sections. Building Android for an ARM32 device, I see the total size of /system/lib decrease by 392 KB. Grouping by r_info also allows the runtime dynamic linker to implement an 1-entry cache to reduce the number of symbol lookup required. With such 1-entry cache implemented on Android, I'm seeing 10% to 20% reduction in total time spent in runtime linker for several executables that I tested. As a simple correctness check, I've also built x86_64 Android and booted successfully. Differential Revision: https://reviews.llvm.org/D65242 Patch by Vic Yang llvm-svn: 369507
*	Revert D65242 "[ELF] More dynamic relocation packing"	Fangrui Song	2019-08-21	1	-66/+7
\| \| \| \| \| \| \| \| \|	This reverts r369488 and r369489. The change broke build bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14511 http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/34407 llvm-svn: 369497
*	[ELF] More dynamic relocation packing	Fangrui Song	2019-08-21	1	-7/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, with Android dynamic relocation packing, only relative relocations are grouped together. This patch implements similar packing for non-relative relocations. The implementation groups non-relative relocations with the same r_info and r_addend, if using RELA. By requiring a minimum group size of 3, this achieves smaller relocation sections. Building Android for an ARM32 device, I see the total size of /system/lib decrease by 392 KB. Grouping by r_info also allows the runtime dynamic linker to implement an 1-entry cache to reduce the number of symbol lookup required. With such 1-entry cache implemented on Android, I'm seeing 10% to 20% reduction in total time spent in runtime linker for several executables that I tested. As a simple correctness check, I've also built x86_64 Android and booted successfully. Differential Revision: https://reviews.llvm.org/D66491 Patch by Vic Yang! llvm-svn: 369488
*	[LLD] Migrate llvm::make_unique to std::make_unique	Jonas Devlieghere	2019-08-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. Differential revision: https://reviews.llvm.org/D66259 llvm-svn: 368936
*	[ELF] --gdb-index: fix odd variable name cUs after r365730 and replace ↵	Fangrui Song	2019-08-14	1	-7/+6
\| \| \| \| \| \|	lower_bound with partition_point. NFC llvm-svn: 368845
*	DebugInfo: Explicitly handle errors when parsing unit DIEs	David Blaikie	2019-08-09	1	-0/+4
\| \| \| \| \| \| \| \|	This ensures these errors produce a non-zero exit and improves the context (providing the name of the input object and section being parsed). llvm-svn: 368378
*	API update for change to LLVM's lib/DebugInfo/DWARF	David Blaikie	2019-08-07	1	-2/+2
\| \| \| \|	llvm-svn: 368190
*	[ELF][ARM] Fix /DISCARD/ of section with .ARM.exidx section	Peter Smith	2019-08-06	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The combineEhSections runs, by design, before processSectionCommands so that input exception sections like .ARM.exidx and .eh_frame are not assigned to OutputSections. Unfortunately if /DISCARD/ removes InputSections that have associated .ARM.exidx sections without discarding the .ARM.exidx synthetic section then we will end up crashing when trying to sort the InputSections in ascending address order. We fix this by filtering out the sections that have been discarded prior to processing the InputSections in finalizeContents(). fixes pr42890 Differential Revision: https://reviews.llvm.org/D65759 llvm-svn: 368041
*	[ELF] Consistently prioritize non-* wildcards overs "*" in version scripts	Fangrui Song	2019-08-05	1	-6/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We prioritize non-* wildcards overs VER_NDX_LOCAL/VER_NDX_GLOBAL "". This patch generalizes the rule to "" of other versions and thus fixes PR40176. I don't feel strongly about this GNU linkers' behavior but the generalization simplifies code. Delete `config->defaultSymbolVersion` which was used to special case VER_NDX_LOCAL/VER_NDX_GLOBAL "*". In `SymbolTable::scanVersionScript`, custom versions are handled the same way as VER_NDX_LOCAL/VER_NDX_GLOBAL. So merge `config->versionScript{Locals,Globals}` into `config->versionDefinitions`. Overall this seems to simplify the code. In `SymbolTable::assign{Exact,Wildcard}Versions`, `sym->verdefIndex == config->defaultSymbolVersion` is changed to `verdefIndex == UINT32_C(-1)`. This allows us to give duplicate assignment diagnostics for `{ global: foo; };` `V1 { global: foo; };` In test/linkerscript/version-script.s: vs_index of an undefined symbol changes from 0 to 1. This doesn't matter (arguably 1 is better because the binding is STB_GLOBAL) because vs_index of an undefined symbol is ignored. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D65716 llvm-svn: 367869
*	[ELF] Move R_*_IRELATIVE from .rel[a].plt to .rel[a].dyn unless ↵	Fangrui Song	2019-08-03	1	-2/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	--pack-dyn-relocs=android[+relr] An R__IRELATIVE represents the address of a STT_GNU_IFUNC symbol (redirected at runtime) which is non-preemptable and is not associated with a canonical PLT (associated with a symbol with a section index of SHN_UNDEF but a non-zero st_value). .rel[a].plt [DT_JMPREL, DT_JMPREL+DT_JMPRELSZ) contains relocations that can be lazily resolved. R__IRELATIVE are always eagerly resolved, so conceptually they do not belong to .rela.plt. "iplt" is mostly a misnomer. glibc powerpc and powerpc64 do not resolve R__IRELATIVE if they are in .rela.plt. // a.o - synthesized PLT call stub has an R__IRELATIVE void ifunc(); int main() { ifunc(); } // b.o static void real() {} asm (".type ifunc, %gnu_indirect_function"); void ifunc() { return &real; } The lld-linked executable crashes. ld.bfd places R__IRELATIVE in .rela.dyn and the executable works. glibc i386, x86_64, and aarch64 have logic (glibc/sysdeps//dl-machine.h:elf_machine_lazy_rel) to eagerly resolve R__IRELATIVE in .rel[a].plt so the lld-linked executable works. Move R_*_IRELATIVE from .rel[a].plt to .rel[a].dyn to fix the crashes on glibc powerpc/powerpc64. This also helps simplifying ifunc implementation in FreeBSD rtld-elf powerpc64. If --pack-dyn-relocs=android[+relr] is specified, the Android packed dynamic relocation format is used for .rela.dyn. We cannot name in.relaIplt ".rela.dyn" because the output section will have mixed formats. This can be improved in the future. Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D65651 llvm-svn: 367745
*	[ELF] Fix variable names in comments after VariableName -> variableName change	Fangrui Song	2019-07-16	1	-19/+17
\| \| \| \| \| \|	Also fix some typos. llvm-svn: 366181
*	[Coding style change][lld] Rename variables for non-ELF ports	Rui Ueyama	2019-07-11	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \|	This patch does the same thing as r365595 to other subdirectories, which completes the naming style change for the entire lld directory. With this, the naming style conversion is complete for lld. Differential Revision: https://reviews.llvm.org/D64473 llvm-svn: 365730
*	[Coding style change] Rename variables so that they start with a lowercase ↵	Rui Ueyama	2019-07-10	1	-1673/+1673
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	letter This patch is mechanically generated by clang-llvm-rename tool that I wrote using Clang Refactoring Engine just for creating this patch. You can see the source code of the tool at https://reviews.llvm.org/D64123. There's no manual post-processing; you can generate the same patch by re-running the tool against lld's code base. Here is the main discussion thread to change the LLVM coding style: https://lists.llvm.org/pipermail/llvm-dev/2019-February/130083.html In the discussion thread, I proposed we use lld as a testbed for variable naming scheme change, and this patch does that. I chose to rename variables so that they are in camelCase, just because that is a minimal change to make variables to start with a lowercase letter. Note to downstream patch maintainers: if you are maintaining a downstream lld repo, just rebasing ahead of this commit would cause massive merge conflicts because this patch essentially changes every line in the lld subdirectory. But there's a remedy. clang-llvm-rename tool is a batch tool, so you can rename variables in your downstream repo with the tool. Given that, here is how to rebase your repo to a commit after the mass renaming: 1. rebase to the commit just before the mass variable renaming, 2. apply the tool to your downstream repo to mass-rename variables locally, and 3. rebase again to the head. Most changes made by the tool should be identical for a downstream repo and for the head, so at the step 3, almost all changes should be merged and disappear. I'd expect that there would be some lines that you need to merge by hand, but that shouldn't be too many. Differential Revision: https://reviews.llvm.org/D64121 llvm-svn: 365595
*	[ELF] Allow placing non-string SHF_MERGE sections with different alignments ↵	Fangrui Song	2019-07-04	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into the same MergeSyntheticSection The difference from D63432/r365015 is that this patch does not place SHF_STRINGS sections with different alignments into the same MergeSyntheticSection. Doing that would: (1) create unnecessary padding and thus waste space. Add a test tail-merge-string-align2.s to check no extra padding is created. (2) make some input sections unaligned when tail merge (-O2) is enabled. The alignment of MergeTailAlignment::Builder was out of sync in D63432. MOVAPS on such unaligned strings can raise SIGSEGV. This should fix PR42289: the Linux kernel has a use case that input files have .rodata.cst32 sections with different alignments. The expectation (and what ld.bfd and gold do) is that in the -r link, there is only one .rodata.cst32 (SHF_MERGE sections with different alignments can be combined), but lld currently creates one for each different alignment. The current merging strategy: 1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize and sh_addralign). Merging is performed among a group, even if -O0 is specified. 2) Create one output section for each group. This is a special case in addInputSec(). This patch changes 1) to: 1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize). Merging is performed among a group, even if -O0 is specified. We will thus create just one .rodata.cst32 . This also improves merging efficiency when sections with the same name but different alignments are combined. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D64200 llvm-svn: 365139
*	Revert D63432 "[ELF] Allow placing SHF_MERGE sections with different ↵	Fangrui Song	2019-07-03	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	alignments into the same MergeSyntheticSection" This reverts r365015. David Zarzycki reported this change broke stage2 and stage3 tests. The root cause is still not very clear, but I guess some SHF_MERGE sections with the same name have different alignments. They were not merged before but were merged after r365015. Something that assumes address uniqueness of such mergeable data caused the bug. llvm-svn: 365048
*	[ELF] Allow placing SHF_MERGE sections with different alignments into the ↵	Fangrui Song	2019-07-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	same MergeSyntheticSection This should fix PR42289: the Linux kernel has a use case that input files have .rodata.cst32 sections with different alignments. The expectation (and what ld.bfd and gold do) is that in the -r link, there is only one .rodata.cst32 (SHF_MERGE sections with different alignments can be combined), but lld currently creates one for each different alignment. The current merging strategy: 1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize and sh_addralign). String merging is performed among a group, even if -O0 is specified. 2) Create one output section for each group. This is a special case in addInputSec(). This patch changes 1) to: 1) Group SHF_MERGE sections by (name, sh_flags, sh_entsize). String merging is performed among a group, even if -O0 is specified. We will thus create just one .rodata.cst32 . This also improves merging efficiency when sections with the same name but different alignments are combined. Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D63432 llvm-svn: 365015
*	[ELF] Do not produce DT_JMPREL and DT_PLTGOT if .rela.plt is empty.	Igor Kudrin	2019-06-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	If .rela.plt is mentioned in a linker script, it might be preserved even if it is empty. In that case, LLD created DT_JMPREL and DT_PLTGOT dynamic tags. When the tags exist, a dynamic loader writes values into reserved slots in .got.plt to support lazy symbol resolution. The problem is that, in fact, the linker has not reserved that space, and the writing may occur into the memory allocated for something else. Differential Revision: https://reviews.llvm.org/D63869 llvm-svn: 364639
*	[ELF][RISCV] Create dummy .sdata for __global_pointer$ if .sdata does not exist	Fangrui Song	2019-06-14	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If .sdata is absent, linker synthesized __global_pointer$ gets a section index of SHN_ABS. (ld.bfd has a similar issue: binutils PR24678) Scrt1.o may use `lla gp, __global_pointer$` to reference the symbol PC relatively. In -pie/-shared mode, lld complains if a PC relative relocation references an absolute symbol (SHN_ABS) but ld.bfd doesn't: ld.lld: error: relocation R_RISCV_PCREL_HI20 cannot refer to lute symbol: __global_pointer$ Let the reference of __global_pointer$ to force creation of .sdata to fix the problem. This is similar to _GLOBAL_OFFSET_TABLE_, which forces creation of .got or .got.plt . Also, change the visibility from STV_HIDDEN to STV_DEFAULT and don't define the symbol for -shared. This matches ld.bfd, though I don't understand why it uses STV_DEFAULT. Reviewed By: ruiu, jrtc27 Differential Revision: https://reviews.llvm.org/D63132 llvm-svn: 363351
*	ELF: Create synthetic sections for loadable partitions.	Peter Collingbourne	2019-06-07	1	-113/+298
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We create several types of synthetic sections for loadable partitions, including: - The dynamic symbol table. This allows code outside of the loadable partitions to find entry points with dlsym. - Creating a dynamic symbol table also requires the creation of several other synthetic sections for the partition, such as the dynamic table and hash table sections. - The partition's ELF header is represented as a synthetic section in the combined output file, and will be used by llvm-objcopy to extract partitions. Differential Revision: https://reviews.llvm.org/D62350 llvm-svn: 362819
*	[ELF][AArch64] Support for BTI and PAC	Peter Smith	2019-06-07	1	-3/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Branch Target Identification (BTI) and Pointer Authentication (PAC) are architecture features introduced in v8.5a and 8.3a respectively. The new instructions have been added in the hint space so that binaries take advantage of support where it exists yet still run on older hardware. The impact of each feature is: BTI: For executable pages that have been guarded, all indirect branches must have a destination that is a BTI instruction of the appropriate type. For the static linker, this means that PLT entries must have a "BTI c" as the first instruction in the sequence. BTI is an all or nothing property for a link unit, any indirect branch not landing on a valid destination will cause a Branch Target Exception. PAC: The dynamic loader encodes with PACIA the address of the destination that the PLT entry will load from the .plt.got, placing the result in a subset of the top-bits that are not valid virtual addresses. The PLT entry may authenticate these top-bits using the AUTIA instruction before branching to the destination. Use of PAC in PLT sequences is a contract between the dynamic loader and the static linker, it is independent of whether the relocatable objects use PAC. BTI and PAC are independent features that can be combined. So we can have several combinations of PLT: - Standard with no BTI or PAC - BTI PLT with "BTI c" as first instruction. - PAC PLT with "AUTIA1716" before the indirect branch to X17. - BTIPAC PLT with "BTI c" as first instruction and "AUTIA1716" before the first indirect branch to X17. The use of BTI and PAC in relocatable object files are encoded by feature bits in the .note.gnu.property section in a similar way to Intel CET. There is one AArch64 specific program property GNU_PROPERTY_AARCH64_FEATURE_1_AND and two target feature bits defined: - GNU_PROPERTY_AARCH64_FEATURE_1_BTI -- All executable sections are compatible with BTI. - GNU_PROPERTY_AARCH64_FEATURE_1_PAC -- All executable sections have return address signing enabled. Due to the properties of FEATURE_1_AND the static linker can tell when all input relocatable objects have the BTI and PAC feature bits set. The static linker uses this to enable the appropriate PLT sequence. Neither -> standard PLT GNU_PROPERTY_AARCH64_FEATURE_1_BTI -> BTI PLT GNU_PROPERTY_AARCH64_FEATURE_1_PAC -> PAC PLT Both properties -> BTIPAC PLT In addition to the .note.gnu.properties there are two new command line options: --force-bti : Act as if all relocatable inputs had GNU_PROPERTY_AARCH64_FEATURE_1_BTI and warn for every relocatable object that does not. --pac-plt : Act as if all relocatable inputs had GNU_PROPERTY_AARCH64_FEATURE_1_PAC. As PAC is a contract between the loader and static linker no warning is given if it is not present in an input. Two processor specific dynamic tags are used to communicate that a non standard PLT sequence is being used. DTI_AARCH64_BTI_PLT and DTI_AARCH64_BTI_PAC. Differential Revision: https://reviews.llvm.org/D62609 llvm-svn: 362793
*	[PPC32] Improve the 32-bit PowerPC port	Fangrui Song	2019-06-06	1	-6/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Many -static/-no-pie/-shared/-pie applications linked against glibc or musl should work with this patch. This also helps FreeBSD PowerPC64 to migrate their lib32 (PR40888). * Fix default image base and max page size. * Support new-style Secure PLT (see below). Old-style BSS PLT is not implemented, so it is not suitable for FreeBSD rtld now because it doesn't support Secure PLT yet. * Support more initial relocation types: R_PPC_ADDR32, R_PPC_REL16, R_PPC_LOCAL24PC, R_PPC_PLTREL24, and R_PPC_GOT16. The addend of R_PPC_PLTREL24 is special: it decides the call stub PLT type but it should be ignored for the computation of target symbol VA. Support GNU ifunc * Support .glink used for lazy PLT resolution in glibc * Add a new thunk type: PPC32PltCallStub that is similar to PPC64PltCallStub. It is used by R_PPC_REL24 and R_PPC_PLTREL24. A PLT stub used in -fPIE/-fPIC usually loads an address relative to .got2+0x8000 (-fpie/-fpic code uses _GLOBAL_OFFSET_TABLE_ relative addresses). Two .got2 sections in two object files have different addresses, thus a PLT stub can't be shared by two object files. To handle this incompatibility, change the parameters of Thunk::isCompatibleWith to `const InputSection &, const Relocation &`. PowerPC psABI specified an old-style .plt (BSS PLT) that is both writable and executable. Linkers don't make separate RW- and RWE segments, which causes all initially writable memory (think .data) executable. This is a big security concern so a new PLT scheme (secure PLT) was developed to address the security issue. TLS will be implemented in D62940. glibc older than ~2012 requires .rela.dyn to include .rela.plt, it can not handle the DT_RELA+DT_RELASZ == DT_JMPREL case correctly. A hack (not included in this patch) in LinkerScript.cpp addOrphanSections() to work around the issue: if (Config->EMachine == EM_PPC) { // Older glibc assumes .rela.dyn includes .rela.plt Add(In.RelaDyn); if (In.RelaPlt->isLive() && !In.RelaPlt->Parent) In.RelaDyn->getParent()->addSection(In.RelaPlt); } Reviewed By: ruiu Differential Revision: https://reviews.llvm.org/D62464 llvm-svn: 362721
*	Read .note.gnu.property sections and emit a merged .note.gnu.property section.	Rui Ueyama	2019-06-05	1	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch also adds `--require-cet` option for the sake of testing. The actual feature for IBT-aware PLT is not included in this patch. This is a part of https://reviews.llvm.org/D59780. Submitting this first should make it easy to work with a related change (https://reviews.llvm.org/D62609). Differential Revision: https://reviews.llvm.org/D62853 llvm-svn: 362579
*	[ELF] Delete GotEntrySize and GotPltEntrySize	Fangrui Song	2019-05-31	1	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	GotEntrySize and GotPltEntrySize were added in D22288. Later, with the introduction of wordsize() (then Config->Wordsize), they become redundant, because there is no target that sets GotEntrySize or GotPltEntrySize to a number different from Config->Wordsize. Reviewed By: grimar, ruiu Differential Revision: https://reviews.llvm.org/D62727 llvm-svn: 362220
*	ELF: Add basic partition data structures and behaviours.	Peter Collingbourne	2019-05-29	1	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change causes us to read partition specifications from partition specification sections and split output sections into partitions according to their reachability from partition entry points. This is only the first step towards a full implementation of partitions. Later changes will add additional synthetic sections to each partition so that they can be loaded independently. Differential Revision: https://reviews.llvm.org/D60353 llvm-svn: 361925
*	[ELF] -z combreloc: sort dynamic relocations by ↵	Fangrui Song	2019-05-20	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(!is_relative,symbol_index,r_offset) We currently sort dynamic relocations by (!is_relative,symbol_index). Add r_offset as the third key. This makes `readelf -r` debugging easier (relocations to the same symbol are ordered by r_offset). Refactor the test combreloc.s (renamed from combrelocs.s) to check R_X86_64_RELATIVE, and delete --expand-relocs. The difference from the reverted D61477 is that we keep !is_relative as the first key. In local dynamic TLS model, DTPMOD (e.g. R_ARM_TLS_DTPMOD32 R_X86_64_DTPMOD and R_PPC{,64}_DTPMOD) may use 0 as the symbol index. Reviewed By: grimar Differential Revision: https://reviews.llvm.org/D62141 llvm-svn: 361164