path: root/llvm/lib/CodeGen/TargetPassConfig.cpp
Commit history (newest first). Each entry lists the commit message, author, date, and the files/lines changed in this file.
* Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into ↵Clement Courbet2019-06-261-0/+13
| | | | | | | | | | | | | | opt pipeline." Breaks sanitizers: libFuzzer :: cxxstring.test libFuzzer :: memcmp.test libFuzzer :: recommended-dictionary.test libFuzzer :: strcmp.test libFuzzer :: value-profile-mem.test libFuzzer :: value-profile-strcmp.test llvm-svn: 364416
* [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.
  Clement Courbet, 2019-06-26 (1 file, -13/+0)
  This allows later passes (in particular InstCombine) to optimize more cases.
  One that's important to us is `memcmp(p, q, constant) < 0` and
  `memcmp(p, q, constant) > 0`.
  llvm-svn: 364412
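
For illustration, a minimal C++ sketch (not from the commit) of the source pattern the message above refers to: a constant-size memcmp whose result is only sign-tested. With ExpandMemCmp/MergeICmps running in the opt pipeline, later passes such as InstCombine get a chance at the expanded comparison.

```
#include <cstring>

// Constant-length memcmp calls whose results are only compared against
// zero -- the cases the commit message calls out.
bool lessThan(const char *p, const char *q) {
  return std::memcmp(p, q, 8) < 0;
}

bool greaterThan(const char *p, const char *q) {
  return std::memcmp(p, q, 8) > 0;
}
```
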
* Rename ExpandISelPseudo->FinalizeISel, delay register reservation
  Matt Arsenault, 2019-06-19 (1 file, -6/+7)
  This allows targets to make more decisions about reserved registers after
  isel. For example, it should now be certain whether there are calls or stack
  objects in the frame, which could have been introduced by legalization.
  Patch by Matthias Braun
  llvm-svn: 363757
* [SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.
  Jonas Paulsson, 2019-06-08 (1 file, -0/+4)
  This patch aims to reduce spilling and register moves by using the 3-address
  versions of instructions by default instead of the 2-address equivalent ones.
  It seems that both spilling and register moves are improved noticeably
  generally.
  Regalloc hints are passed to increase conversions to 2-address instructions,
  which are done in SystemZShortenInst.cpp (after regalloc).
  Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
  the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
  source register since the reg/reg instruction is now 3-address. In order to
  remedy this, new 3-address pseudo memory instructions are used to perform the
  folding only when the dst and lhs virtual registers are known to be allocated
  to the same physreg. In order to not let MachineCopyPropagation run and
  change registers on these transformed instructions (making it 3-address), a
  new target pass called SystemZPostRewrite.cpp is run just after
  VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
  If it had been possible to insert a COPY instruction and change a register
  operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that
  the caller (e.g. InlineSpiller) would update/repair the involved
  LiveIntervals, the solution involving pseudo instructions would not have been
  needed. This is perhaps a potential improvement (see Phabricator post).
  Common code changes:
  * A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run
    a target pass immediately before MachineCopyPropagation.
  * VirtRegMap is passed as an argument to foldMemoryOperand().
  Review: Ulrich Weigand, Quentin Colombet
  https://reviews.llvm.org/D60888
  llvm-svn: 362868
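
As a hedged sketch of the common-code hook mentioned above, a target's pass config might override addPostRewrite() roughly as follows; this assumes addPostRewrite() is a virtual hook on TargetPassConfig as the commit describes, and the class and pass-creation names here are hypothetical, not the actual SystemZ API.

```
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/Pass.h"

namespace llvm {
FunctionPass *createMyPostRewritePass(); // hypothetical target pass
} // namespace llvm

class MyPassConfig : public llvm::TargetPassConfig {
public:
  using llvm::TargetPassConfig::TargetPassConfig;

  // Runs right after VirtRegRewriter and before MachineCopyPropagation,
  // so target pseudos can be lowered before other passes see them.
  void addPostRewrite() override {
    addPass(llvm::createMyPostRewritePass());
  }
};
```
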
* [MergeICmps] Make the pass compatible with the new pass manager.
  Clement Courbet, 2019-05-23 (1 file, -1/+1)
  Reviewers: gchatelet, spatel
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62287
  llvm-svn: 361490
* [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
  Amara Emerson, 2019-04-15 (1 file, -1/+1)
  Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
  be degraded.
  This change also improves the IRTranslator so that in most places, but not
  all, it creates constants using the MIRBuilder directly instead of first
  creating a new destination vreg and then creating a constant. By doing this,
  the buildConstant() method can just return the vreg of an existing G_CONSTANT
  instead of having to create a COPY from it.
  I measured a 0.2% improvement in compile time and a 0.9% improvement in code
  size at -O0 ARM64.
  Compile time:
    Program                                      base   cse    diff
    test-suite...ark/tramp3d-v4/tramp3d-v4.test   9.04   9.12   0.8%
    test-suite...Mark/mafft/pairlocalalign.test   2.68   2.66  -0.7%
    test-suite...-typeset/consumer-typeset.test   5.53   5.51  -0.4%
    test-suite :: CTMark/lencod/lencod.test       5.30   5.28  -0.3%
    test-suite :: CTMark/Bullet/bullet.test      25.82  25.76  -0.2%
    test-suite...:: CTMark/ClamAV/clamscan.test   6.92   6.90  -0.2%
    test-suite...TMark/7zip/7zip-benchmark.test  34.24  34.17  -0.2%
    test-suite :: CTMark/SPASS/SPASS.test         6.25   6.24  -0.1%
    test-suite...:: CTMark/sqlite3/sqlite3.test   1.66   1.66  -0.1%
    test-suite :: CTMark/kimwitu++/kc.test       13.61  13.60  -0.0%
    Geomean difference                                         -0.2%
  Code size:
    Program                                      base     cse      diff
    test-suite...-typeset/consumer-typeset.test  1315632  1266480  -3.7%
    test-suite...:: CTMark/ClamAV/clamscan.test  1313892  1297508  -1.2%
    test-suite :: CTMark/lencod/lencod.test      1439504  1423112  -1.1%
    test-suite...TMark/7zip/7zip-benchmark.test  2936980  2904172  -1.1%
    test-suite :: CTMark/Bullet/bullet.test      3478276  3445460  -0.9%
    test-suite...ark/tramp3d-v4/tramp3d-v4.test  8082868  8033492  -0.6%
    test-suite :: CTMark/kimwitu++/kc.test       3870380  3853972  -0.4%
    test-suite :: CTMark/SPASS/SPASS.test        1434904  1434896  -0.0%
    test-suite...Mark/mafft/pairlocalalign.test   764528   764528   0.0%
    test-suite...:: CTMark/sqlite3/sqlite3.test   782092   782092   0.0%
    Geomean difference                                             -0.9%
  Differential Revision: https://reviews.llvm.org/D60580
  llvm-svn: 358369
* [GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
  Amara Emerson, 2019-04-15 (1 file, -0/+5)
  Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the
  CSE configs that can be passed between TargetPassConfig and the targets'
  custom pass configs. This CSEConfigBase allows targets to create custom CSE
  configs which are then used by the GISel passes for the CSEMIRBuilder.
  This support will be used in a follow up commit to allow constant-only CSE
  for -O0 compiles in D60580.
  llvm-svn: 358368
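
Hedged sketch of how a target could plug in its own config; the getCSEConfig() hook name and the shouldCSEOpc() override point are assumptions based on this and the surrounding commits, not spelled out in the log.

```
#include "llvm/CodeGen/CSEConfigBase.h"
#include "llvm/CodeGen/TargetOpcodes.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include <memory>

// Hypothetical config that only CSEs constant-producing opcodes.
class ConstantOnlyCSEConfig : public llvm::CSEConfigBase {
public:
  bool shouldCSEOpc(unsigned Opc) override {
    return Opc == llvm::TargetOpcode::G_CONSTANT;
  }
};

class MyGISelPassConfig : public llvm::TargetPassConfig {
public:
  using llvm::TargetPassConfig::TargetPassConfig;

  // Assumed hook: hands the GISel CSEMIRBuilder a target-chosen config.
  std::unique_ptr<llvm::CSEConfigBase> getCSEConfig() const override {
    return std::make_unique<ConstantOnlyCSEConfig>();
  }
};
```
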
* CodeGen: Refactor regallocator command line and target selection
  Matt Arsenault, 2019-03-19 (1 file, -27/+34)
  This will allow targets more flexibility to replace the register allocator
  core passes. In a future commit, AMDGPU will run the core register assignment
  passes twice, and will also want to disallow using the standard -regalloc
  option.
  llvm-svn: 356506
* Restore ability for C++ API users to Enable IPRA.
  Daniel Sanders, 2019-02-22 (1 file, -1/+1)
  Summary:
  Prior to r310876 one of our out-of-tree targets was enabling IPRA by
  modifying the TargetOptions::EnableIPRA. This no longer works on current
  trunk since the useIPRA() hook overrides any values that are set in advance.
  This patch adjusts the behaviour of the hook so that API users and useIPRA()
  can both enable it but useIPRA() cannot disable it if the API user already
  enabled it.
  Reviewers: arsenm
  Reviewed By: arsenm
  Subscribers: wdng, mgorny, llvm-commits
  Differential Revision: https://reviews.llvm.org/D38043
  llvm-svn: 354692
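
A small sketch of the C++-API usage this restores: enabling IPRA through TargetOptions before the target machine is created (the helper below is illustrative; the options object would then be passed to Target::createTargetMachine()).

```
#include "llvm/Target/TargetOptions.h"

// Opt in to interprocedural register allocation from the C++ API. With
// this change the target's useIPRA() hook can still turn IPRA on, but it
// can no longer turn it back off once the API user has enabled it.
llvm::TargetOptions makeOptionsWithIPRA() {
  llvm::TargetOptions Opts;
  Opts.EnableIPRA = true; // the field named in the commit message
  return Opts;
}
```
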
* CodeGen: Make RegAllocRegistry a template class
  Matt Arsenault, 2019-02-22 (1 file, -4/+0)
  Will allow re-using the machinery for independent sets of register
  allocators. This will allow AMDGPU to use separate command line options for
  the allocator to use for SGPRs separate from VGPRs.
  llvm-svn: 354687
* [GISel]: Change how CSE is enabled by default for each pass
  Aditya Nandakumar, 2019-01-24 (1 file, -0/+4)
  https://reviews.llvm.org/D57178
  Now add a hook in TargetPassConfig to query if CSE needs to be enabled. By
  default this hook returns false only for the O0 opt level, but this can be
  overridden by the target. Because CSE is now enabled by default for non-O0,
  a few tests needed to be updated to not use CSE (by passing -O0) in the run
  line.
  reviewed by: arsenm
  llvm-svn: 352126
* Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
  Chandler Carruth, 2019-01-19 (1 file, -4/+3)
  We understand that people may be surprised that we're moving the header
  entirely to discuss the new license. We checked this carefully with the
  Foundation's lawyer and we believe this is the correct approach.
  Essentially, all code in the project is now made available by the LLVM
  project under our new license, so you will see that the license headers
  include that license only. Some of our contributors have contributed code
  under our old license, and accordingly, we have retained a copy of our old
  license notice in the top-level files in each project and repository.
  llvm-svn: 351636
* [GlobalISel] Fix choice of instruction selector for AArch64 at -O0 with -global-isel=0
  Petr Pavlu, 2019-01-08 (1 file, -12/+23)
  Commit rL347861 introduced an unintentional change in the behaviour when
  compiling for AArch64 at -O0 with -global-isel=0. Previously, explicitly
  disabling GlobalISel resulted in using FastISel but an updated condition in
  the commit changed it to using SelectionDAG. The patch fixes this condition
  and slightly better organizes the code that chooses the instruction selector.
  Fixes PR40131.
  Differential Revision: https://reviews.llvm.org/D56266
  llvm-svn: 350626
* MIR: Add method to stop after specific runs of passes
  Matt Arsenault, 2018-12-04 (1 file, -8/+38)
  Currently if you use -{start,stop}-{before,after}, it picks the first
  instance with the matching pass name. If you run the same pass multiple
  times, there's no way to distinguish them. Allow specifying a run index
  with ,N to specify which one you mean.
  llvm-svn: 348285
* [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
  Petr Pavlu, 2018-11-29 (1 file, -13/+17)
  Change the meaning of TargetOptions::EnableGlobalISel. The flag was
  previously set only when a target switched on GlobalISel but it is now
  always set when the GlobalISel pipeline is enabled. This makes the flag
  consistent with TargetOptions::EnableFastISel and allows its use in other
  parts of the compiler to determine when GlobalISel is enabled.
  The EnableGlobalISel flag previously had only one use, in
  TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to
  determine if GlobalISel was enabled by a target and returned false in such a
  case. To preserve the current behaviour, a new flag
  TargetOptions::GlobalISelAbort is introduced to separately record the abort
  behaviour.
  Differential Revision: https://reviews.llvm.org/D54518
  llvm-svn: 347861
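
A minimal sketch of the kind of query this consistency enables elsewhere in the compiler (it assumes only that TargetMachine exposes its TargetOptions as the public Options member):

```
#include "llvm/Target/TargetMachine.h"

// EnableGlobalISel is now set whenever the GlobalISel pipeline is enabled,
// not just when a target switches it on, so it can be queried like
// EnableFastISel.
bool usesGlobalISel(const llvm::TargetMachine &TM) {
  return TM.Options.EnableGlobalISel;
}
```
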
* Type safe version of MachinePassRegistry
  Serge Guelton, 2018-11-09 (1 file, -1/+2)
  The previous version used type erasure through a `void* (*)()` pointer, which
  triggered a gcc warning and implied a lot of reinterpret_casts. This version
  should make it harder to hit ourselves in the foot.
  Differential revision: https://reviews.llvm.org/D54203
  llvm-svn: 346522
* LLVMTargetMachine/TargetPassConfig: Simplify handling of start/stop options; NFC
  Matthias Braun, 2018-11-02 (1 file, -2/+7)
  - Make some TargetPassConfig methods that just check whether options have
    been set static.
  - Shuffle code in LLVMTargetMachine around so addPassesToGenerateCode only
    deals with TargetPassConfig now (but not with MCContext or the creation of
    MachineModuleInfo).
  llvm-svn: 345918
* [llc] Error out when -print-machineinstrs is used with an unknown pass
  Francis Visoiu Mistrih, 2018-10-30 (1 file, -9/+11)
  We used to assert instead of reporting an error.
  PR39494
  llvm-svn: 345589
* Correct implementation of -verify-machineinstrs such that it's still overridable for EXPENSIVE_CHECKS
  Daniel Sanders, 2018-10-03 (1 file, -5/+5)
  -verify-machineinstrs was implemented as a simple bool. As a result, the
  'VerifyMachineCode == cl::BOU_UNSET' used by EXPENSIVE_CHECKS to make it on
  by default but possible to disable didn't work as intended. Changed
  -verify-machineinstrs to a boolOrDefault to correct this.
  llvm-svn: 343696
* [globalisel][verifier] Run the MachineVerifier from IRTranslator onwards
  Daniel Sanders, 2018-10-02 (1 file, -0/+2)
  -verify-machineinstrs inserts the MachineVerifier after every
  MachineInstr-based pass. However, GlobalISel creates MachineInstr-based
  passes earlier than DAGISel and the corresponding verifiers are not being
  added. This patch fixes that.
  If GlobalISel triggers the fallback path then the MIR can be left in a bad
  state that is going to be cleared by ResetMachineFunctions. In this
  situation verifying between GlobalISel passes will prevent the fallback path
  from recovering from this. As a result, we bail out of verifying a function
  if the FailedISel attribute is present.
  llvm-svn: 343613
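
A hedged sketch of the bail-out described above, assuming the FailedISel marker is the MachineFunctionProperties property of that name:

```
#include "llvm/CodeGen/MachineFunction.h"

// Skip verification for functions where GlobalISel already gave up: the
// MIR may be in a known-bad state that ResetMachineFunction will discard,
// and verifying it would only block the fallback path.
bool shouldVerify(const llvm::MachineFunction &MF) {
  return !MF.getProperties().hasProperty(
      llvm::MachineFunctionProperties::Property::FailedISel);
}
```
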
* Remove trailing space
  Fangrui Song, 2018-07-30 (1 file, -1/+1)
  sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}
  llvm-svn: 338293
* [MachineOutliner] Add support for target-default outlining.
  Jessica Paquette, 2018-06-30 (1 file, -4/+9)
  This adds functionality to the outliner that allows targets to specify
  certain functions that should be outlined from by default. If a target
  supports default outlining, then it specifies that in its TargetOptions. In
  the case that it does, and the user hasn't specified that they *never* want
  to outline, the outliner will be added to the pass pipeline and will run on
  those default functions.
  This is a preliminary patch for turning the outliner on by default under -Oz
  for AArch64.
  https://reviews.llvm.org/D48776
  llvm-svn: 336040
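
Hedged sketch of a target opting in to default outlining through its TargetOptions; the field name SupportsDefaultOutlining is an assumption here, since the log only says that the support bit lives in TargetOptions.

```
#include "llvm/Target/TargetOptions.h"

// Declare that this target is willing to be outlined from by default
// (e.g. under -Oz), unless the user explicitly asked never to outline.
llvm::TargetOptions makeOutlinerFriendlyOptions() {
  llvm::TargetOptions Opts;
  Opts.SupportsDefaultOutlining = true; // assumed field name
  return Opts;
}
```
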
* [MachineOutliner] Add always and never options to -enable-machine-outliner
  Jessica Paquette, 2018-06-29 (1 file, -4/+11)
  This is a recommit of r335887, which was erroneously committed earlier.
  To enable the MachineOutliner by default on AArch64, we need to be able to
  disable the MachineOutliner and also provide an option to "always" enable
  the outliner.
  This adds that capability. It allows the user to still use the old
  -enable-machine-outliner option, which defaults to "always". This is
  building up to allowing the user to specify "always" versus the target
  default outlining behaviour.
  https://reviews.llvm.org/D48682
  llvm-svn: 335986
* [MachineOutliner] Never add the outliner in -O0
  Jessica Paquette, 2018-06-28 (1 file, -1/+1)
  This is a recommit of r335879.
  We shouldn't add the outliner when compiling at -O0 even if
  -enable-machine-outliner is passed in. This makes sure that we don't add it
  in this case.
  This also removes -O0 from the outliner DWARF test.
  llvm-svn: 335930
* [MachineOutliner] Define MachineOutliner support in TargetOptions
  Jessica Paquette, 2018-06-28 (1 file, -1/+2)
  Targets should be able to define whether or not they support the outliner
  without the outliner being added to the pass pipeline. Before this, the
  outliner pass would be added, and ask the target whether or not it supports
  the outliner. After this, it's possible to query the target in
  TargetPassConfig, before the outliner pass is created. This ensures that
  passing -enable-machine-outliner will not modify the pass pipeline of any
  target that does not support it.
  https://reviews.llvm.org/D48683
  llvm-svn: 335887
* Revert "[MachineOutliner] Add always and never options to ↵Jessica Paquette2018-06-281-12/+4
| | | | | | | | | -enable-machine-outliner" I accidentally committed this instead of D48683 because I haven't had coffee yet. llvm-svn: 335883
* Revert "[MachineOutliner] Never add the outliner in -O0"Jessica Paquette2018-06-281-2/+1
| | | | | | | | | | | This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091. It relies on r335872 since that introduces the machine outliner flags test. I meant to commit D48683 in that commit, but got mixed up and committed D48682 instead. So, I'm reverting this and r335872, since D48682 hasn't made it through review yet. llvm-svn: 335882
* [MachineOutliner] Never add the outliner in -O0
  Jessica Paquette, 2018-06-28 (1 file, -1/+2)
  We shouldn't add the outliner when compiling at -O0 even if
  -enable-machine-outliner is passed in. This makes sure that we don't add it
  in this case.
  This also updates machine-outliner-flags to reflect the change and improves
  the comment describing what that test does.
  llvm-svn: 335879
* [MachineOutliner] Add always and never options to -enable-machine-outliner
  Jessica Paquette, 2018-06-28 (1 file, -4/+12)
  To enable the MachineOutliner by default on AArch64, we need to be able to
  disable the MachineOutliner and also provide an option to "always" enable
  the outliner.
  This adds that capability. It allows the user to still use the old
  -enable-machine-outliner option, which defaults to "always". This is
  building up to allowing the user to specify "always" versus the
  target-default outlining behaviour.
  llvm-svn: 335872
* [WebAssembly] Add Wasm exception handling prepare pass
  Heejin Ahn, 2018-05-31 (1 file, -1/+6)
  Summary:
  This adds a pass that transforms a program to be prepared for Wasm exception
  handling. This is using Windows EH instructions and based on the previous
  Wasm EH proposal.
  (https://github.com/WebAssembly/exception-handling/blob/master/proposals/Exceptions.md)
  Reviewers: dschuff, majnemer
  Subscribers: jfb, mgorny, sbc100, jgravelle-google, JDevlieghere, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D43746
  llvm-svn: 333696
* [MachineOutliner] NFC: Move EnableLinkOnceODROutlining into MachineOutliner.cpp
  Jessica Paquette, 2018-04-19 (1 file, -6/+1)
  This moves the EnableLinkOnceODROutlining flag from TargetPassConfig.cpp into
  MachineOutliner.cpp. It also removes OutlineFromLinkOnceODRs from the
  MachineOutliner constructor. This is now handled by the moved command-line
  flag.
  llvm-svn: 330373
* Transforms: Introduce Transforms/Utils.h rather than spreading the declarations amongst Scalar.h and IPO.h
  David Blaikie, 2018-03-28 (1 file, -0/+1)
  Fixes layering - Transforms/Utils shouldn't depend on including a Scalar or
  IPO header, because Scalar and IPO depend on Utils.
  llvm-svn: 328717
* [CodeGen] Add a new pass for PostRA sink
  Jun Bum Lim, 2018-03-22 (1 file, -1/+9)
  Summary:
  This pass sinks COPY instructions into a successor block, if the COPY is not
  used in the current block and the COPY is live-in to a single successor
  (i.e., doesn't require the COPY to be duplicated). This avoids executing the
  copy on paths where its result isn't needed. This also exposes additional
  opportunities for dead copy elimination and shrink wrapping.
  These copies were either not handled by, or are inserted after, the
  MachineSink pass. As an example of the former case, the MachineSink pass
  cannot sink COPY instructions with allocatable source registers; for AArch64
  this type of copy instruction is frequently used to move function parameters
  (PhyReg) into virtual registers in the entry block.
  For the machine IR below, this pass will sink %w19 in the entry into its
  successor (%bb.1) because %w19 is only live-in in %bb.1.
  ```
  %bb.0:
    %wzr = SUBSWri %w1, 1
    %w19 = COPY %w0
    Bcc 11, %bb.2
  %bb.1:
    Live Ins: %w19
    BL @fun
    %w0 = ADDWrr %w0, %w19
    RET %w0
  %bb.2:
    %w0 = COPY %wzr
    RET %w0
  ```
  As we sink %w19 (a CSR in AArch64) into %bb.1, the shrink-wrapping pass will
  be able to see %bb.0 as a candidate.
  With this change I observed 12% more shrink-wrapping candidates and 13% more
  dead copies deleted in spec2000/2006/2017 on AArch64.
  Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz
  Reviewed By: sebpop
  Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41463
  llvm-svn: 328237
* [MergeICmps] Re-land 324317 "Enable the MergeICmps Pass by default."
  Clement Courbet, 2018-03-19 (1 file, -5/+4)
  Now that PR36557 is fixed.
  llvm-svn: 327840
* [MergeICmps] Revert 324317 "Enable the MergeICmps Pass by default."
  Clement Courbet, 2018-03-02 (1 file, -4/+5)
  While working on PR36557.
  llvm-svn: 326575
* [TLS] use emulated TLS if the target supports only this mode
  Chih-Hung Hsieh, 2018-02-28 (1 file, -1/+1)
  Emulated TLS is enabled by the llc flag -emulated-tls, which is passed by the
  clang driver. When llc is called explicitly or from other drivers like LTO, a
  missing -emulated-tls flag would generate wrong TLS code for targets that
  support only this mode. Now use useEmulatedTLS() instead of
  Options.EmulatedTLS to decide whether emulated TLS code should be generated.
  Unit tests are modified to run with and without the -emulated-tls flag.
  Differential Revision: https://reviews.llvm.org/D42999
  llvm-svn: 326341
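
A hedged sketch of the new query pattern: call the TargetMachine hook named in the commit instead of reading the raw option. The lowering helpers are hypothetical and only illustrate the branch.

```
#include "llvm/Target/TargetMachine.h"

void emitEmulatedTLSAccess();  // hypothetical helper
void emitNativeTLSAccess();    // hypothetical helper

void lowerTLSAccess(const llvm::TargetMachine &TM) {
  // useEmulatedTLS() folds in the target's "emulated TLS only" default, so
  // callers such as LTO no longer depend on llc's -emulated-tls flag.
  if (TM.useEmulatedTLS())
    emitEmulatedTLSAccess();
  else
    emitNativeTLSAccess();
}
```
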
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Geoff Berry2018-02-271-0/+4
| | | | | | | | Re-enable commit r323991 now that r325931 has been committed to make MachineOperand::isRenamable() check more conservative w.r.t. code changes and opt-in on a per-target basis. llvm-svn: 326208
* [WebAssembly] Add exception handling option and feature
  Heejin Ahn, 2018-02-24 (1 file, -0/+3)
  Summary:
  Add an llc command line option and a WebAssembly architecture feature for
  exception handling.
  Reviewers: dschuff
  Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits
  Differential Revision: https://reviews.llvm.org/D43683
  llvm-svn: 326004
* Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding"Quentin Colombet2018-02-171-4/+0
| | | | | | | | | | | | | | | | | This reverts commit r323991. This commit breaks target that don't model all the register constraints in TableGen. So far the workaround was to set the hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the cases. For instance, when mutating an instruction (like in the lowering of COPYs) the isRenamable flag is not properly updated. The same problem will happen when attaching machine operand from one instruction to another. Geoff Berry is working on a fix in https://reviews.llvm.org/D43042. llvm-svn: 325421
* [MergeICmps] Re-commit rL324317 "Enable the MergeICmps Pass by default."
  Clement Courbet, 2018-02-07 (1 file, -5/+4)
  With fixes from rL324341.
  Original commit message:
  [MergeICmps] Enable the MergeICmps Pass by default.
  Summary: Now that PR33325 is fixed, this should always improve the generated
  code.
  Reviewers: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42793
  llvm-svn: 324465
* Revert "[MergeICmps] Enable the MergeICmps Pass by default."Clement Courbet2018-02-061-4/+5
| | | | | | | | Breaks clang-ppc64be-linux-multistage buildbot. This reverts commit 515bab711f308c2e8299c49dd8c84ea6a2e0b60e. llvm-svn: 324319
* [MergeICmps] Enable the MergeICmps Pass by default.
  Clement Courbet, 2018-02-06 (1 file, -5/+4)
  Summary: Now that PR33325 is fixed, this should always improve the generated
  code.
  Reviewers: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42793
  llvm-svn: 324317
* [MachineCopyPropagation] Extend pass to do COPY source forwarding
  Geoff Berry, 2018-02-01 (1 file, -0/+4)
  Summary:
  This change extends MachineCopyPropagation to do COPY source forwarding and
  adds an additional run of the pass to the default pass pipeline just after
  register allocation.
  This version of this patch uses the newly added MachineOperand::isRenamable
  bit to avoid forwarding registers in such a way as to violate constraints
  that aren't captured in the Machine IR (e.g. ABI or ISA constraints).
  This change is a continuation of the work started in D30751.
  Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar
  Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41835
  llvm-svn: 323991
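
A hedged sketch of the conservative guard this version relies on: only forward through operands explicitly marked renamable, so constraints not modeled in the MIR (ABI or ISA) are respected.

```
#include "llvm/CodeGen/MachineOperand.h"

// Check a register use before rewriting it to the COPY's source.
bool mayForwardCopySource(const llvm::MachineOperand &Use) {
  return Use.isReg() && Use.isRenamable();
}
```
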
* [GlobalISel] Don't fall back to FastISel.
  Amara Emerson, 2018-01-24 (1 file, -0/+2)
  Apparently checking the pass structure isn't enough to ensure that we don't
  fall back to FastISel, as it's set up as part of the SelectionDAGISel.
  llvm-svn: 323369
* Introduce the "retpoline" x86 mitigation technique for variant #2 of the ↵Chandler Carruth2018-01-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. 
These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. 
Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155
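
To make the scope concrete, a small self-contained C++ example (illustrative, not from the patch) of the kind of indirect call that -mretpoline routes through a retpoline thunk instead of a plain indirect branch:

```
#include <cstdio>

using Handler = void (*)(int);

void onEven(int v) { std::printf("even %d\n", v); }
void onOdd(int v) { std::printf("odd %d\n", v); }

// Built with -mretpoline, the indirect call below is emitted through a
// retpoline thunk rather than a predictable indirect branch; with
// -mretpoline-external-thunk the compiler instead calls the user-supplied
// __llvm_external_retpoline_* symbol described above.
void dispatch(Handler H, int V) { H(V); }

int main() {
  Handler Table[2] = {onEven, onOdd};
  for (int I = 0; I < 4; ++I)
    dispatch(Table[I % 2], I);
  return 0;
}
```
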
* Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC
  Matthias Braun, 2018-01-19 (1 file, -10/+4)
  This avoids playing games with pseudo pass IDs and avoids using an unreliable
  MRI::isSSA() check to determine whether register allocation has happened.
  Note that this renames:
  - MachineLICMID -> EarlyMachineLICM
  - PostRAMachineLICMID -> MachineLICMID
  to be consistent with the EarlyTailDuplicate/TailDuplicate naming.
  llvm-svn: 322927
* Split TailDuplicatePass into pre- and post-RA variant; NFC
  Matthias Braun, 2018-01-19 (1 file, -3/+1)
  Split TailDuplicatePass into EarlyTailDuplicate and TailDuplicate. This
  avoids playing games with fake pass IDs and using MRI::isSSA() to determine
  pre-/post-RA state.
  llvm-svn: 322926
* Fix the failure caused by r322773
  Volkan Keles, 2018-01-18 (1 file, -8/+3)
  Do not run GlobalISel if `-fast-isel=0 -global-isel=false`.
  llvm-svn: 322800
* Add a TargetOption to enable/disable GlobalISel
  Volkan Keles, 2018-01-17 (1 file, -15/+14)
  Summary:
  This patch adds a new target option in order to control GlobalISel. This will
  allow the users to enable/disable GlobalISel prior to the backend by calling
  `TargetMachine::setGlobalISel(bool Enable)`.
  No test case as there is already a test to check GlobalISel command line
  options. See: CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll.
  Reviewers: qcolombet, aemerson, ab, dsanders
  Reviewed By: qcolombet
  Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D42137
  llvm-svn: 322773
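
A minimal sketch of the API call quoted above, assuming a TargetMachine has already been created:

```
#include "llvm/Target/TargetMachine.h"

// Force the GlobalISel pipeline on before the backend runs, without
// relying on the -global-isel command line option.
void enableGlobalISel(llvm::TargetMachine &TM) {
  TM.setGlobalISel(true);
}
```
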
* [AArch64][GlobalISel] Enable GlobalISel at -O0 by default
  Amara Emerson, 2018-01-02 (1 file, -2/+10)
  Tests updated to explicitly use fast-isel at -O0 instead of implicitly.
  This change also allows an explicit -fast-isel option to override an
  implicitly enabled global-isel. Otherwise -fast-isel would have no effect at
  -O0.
  Differential Revision: https://reviews.llvm.org/D41362
  llvm-svn: 321655