summaryrefslogtreecommitdiffstats
path: root/clang/lib/Basic/Targets/X86.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Support -march=tigerlakePengfei Wang2019-08-121-0/+10
| | | | | | | | | | | | Support -march=tigerlake for x86. Compare with Icelake Client, It include 4 more new features ,they are avx512vp2intersect, movdiri, movdir64b, shstk. Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D65840 llvm-svn: 368543
* Test commit. NFC.Sjoerd Meijer2019-07-241-1/+1
| | | | | | | Removed 2 trailing whitespaces in 2 files that used to be in different repos to test my new github monorepo workflow. llvm-svn: 366904
* Support __seg_fs and __seg_gs on x86JF Bastien2019-07-141-0/+5
| | | | | | | | | | | | | | | | | | | | | Summary: GCC supports named address spaces macros: https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html clang does as well with address spaces: https://clang.llvm.org/docs/LanguageExtensions.html#memory-references-to-specified-segments Add the __seg_fs and __seg_gs macros for compatibility with GCC. <rdar://problem/52944935> Subscribers: jkorous, dexonsmith, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D64676 llvm-svn: 366028
* [X86] Attempt to make the Intel core CPU inheritance a little more readable ↵Craig Topper2019-06-101-24/+27
| | | | | | | | | | | | and maintainable The recently added cooperlake CPU has made our already ugly switch statement even worse. There's a CPU exclusion list around the bf16 feature in the cooper lake block. I worry that we'll have to keep adding new CPUs to that until bf16 intercepts a client space CPU. We have several other exclusion lists in other parts of the switch due to skylakeserver, cascadelake, and cooperlake not having sgx. Another for cannonlake not having clwb but having all other features from skx. This removes all these special ifs at the cost of some duplication of features and a goto. I've copied all of the skx features into either cannonlake or icelakeclient(for clwb). And pulled sklyakeserver, cascadelake, and cooperlake out of the main inheritance chain into their own chain. At the end of skylakeserver we merge back into the main chain at skylakeclient but below sgx. I think this is at least easier to follow. Differential Revision: https://reviews.llvm.org/D63018 llvm-svn: 362965
* [X86] -march=cooperlake (clang)Pengfei Wang2019-06-071-3/+11
| | | | | | | | | | Support intel -march=cooperlake in clang Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62835 llvm-svn: 362781
* [X86] Add ENQCMD instructionsPengfei Wang2019-06-061-0/+6
| | | | | | | | | | | | For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62282 llvm-svn: 362685
* [X86] Add VP2INTERSECT instructionsPengfei Wang2019-05-311-1/+7
| | | | | | | | | | Support intel AVX512 VP2INTERSECT instructions in clang Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D62367 llvm-svn: 362196
* [X86] Split multi-line chained assignments into single lines to avoid making ↵Craig Topper2019-05-231-26/+24
| | | | | | | | clang-format create triangle shaped indentation. Simplify one if statement to remove a bunch of string matches. NFCI We had an if statement that checked over every avx512* feature to see if it should enabled avx512f. Since they are all prefixed with avx512 just check for that instead. llvm-svn: 361557
* [X86] Stop implicitly enabling avx512vl when avx512bf16 is enabled.Craig Topper2019-05-161-3/+1
| | | | | | | | Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic. After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic. llvm-svn: 360924
* Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper ↵Luo, Yuanke2019-05-061-0/+13
| | | | | | | | | | | | | | | | | | | | | | | Lake Summary: 1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake; 2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision. For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference Patch by LiuTianle Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon Reviewed By: craig.topper Subscribers: mgorny, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D60552 llvm-svn: 360018
* Range-style std::find{,_if} -> llvm::find{,_if}. NFCFangrui Song2019-03-311-6/+3
| | | | llvm-svn: 357359
* [X86] Correct the value of MaxAtomicInlineWidth for pre-586 cpusCraig Topper2019-03-211-2/+10
| | | | | | | | | | Use the new cx8 feature flag that was added to the backend to represent support for cmpxchg8b. Use this flag to set the MaxAtomicInlineWidth. This also assumes all the cmpxchg instructions are enabled for CK_Generic which is what cc1 defaults to when nothing is specified. Differential Revision: https://reviews.llvm.org/D59566 llvm-svn: 356709
* [X86] Use the CPUKind enum from PROC_ALIAS to directly get the CPUKind in ↵Craig Topper2019-03-211-3/+2
| | | | | | | | fillValidCPUList. We were using getCPUKind which translates the string to the enum also using PROC_ALIAS. This just cuts out the string compares. llvm-svn: 356686
* [X86] Remove getCPUKindCanonicalName which is unused.Craig Topper2019-03-201-12/+0
| | | | | | Differential Revision: https://reviews.llvm.org/D59578 llvm-svn: 356580
* [X86] Separate PentiumPro and i686. They aren't aliases in the backend.Craig Topper2019-03-201-0/+2
| | | | | | | | | | PentiumPro has HasNOPL set in the backend. i686 does not. Despite having a function that looks like it canonicalizes alias names. It doesn't seem to be called. So I don't think this is a functional change. But its good to be consistent between the backend and frontend. llvm-svn: 356537
* [X86] Only define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 in 64-bit mode.Craig Topper2019-03-141-1/+1
| | | | | | | | | | | | | | | | | | | Summary: This define should correspond to CMPXCHG16B being available which requires 64-bit mode. I checked and gcc also seems to only define this in 64-bit mode. Reviewers: RKSimon, spatel, efriedma, jyknight, jfb Reviewed By: jfb Subscribers: jfb, cfe-commits, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D59287 llvm-svn: 356118
* [X86] Remove 'cx16' from 'prescott' and 'yonah' as they are 32-bit only CPUs ↵Craig Topper2019-03-131-2/+3
| | | | | | and cmpxchg16b requires 64-bit mode. llvm-svn: 356008
* [X86] AMD znver2 enablementGanesh Gopalasubramanian2019-02-261-0/+8
| | | | | | | | | | | | | | | | | This patch enables the following 1) AMD family 17h "znver2" tune flag (-march, -mcpu). 2) ISAs that are enabled for "znver2" architecture. 3) For the time being, it uses the znver1 scheduler model. 4) Tests are updated. 5) This patch is the clang counterpart to D58343 Reviewers: craig.topper Tags: #clang Differential Revision: https://reviews.llvm.org/D58344 llvm-svn: 354899
* [X86] Add clang support for X86 flag output parameters.Nirav Dave2019-02-141-0/+52
| | | | | | | | | | | | | | Summary: Add frontend support and expected flags for X86 inline assembly flag parameters. Reviewers: craig.topper, rnk, echristo Subscribers: eraman, nickdesaulniers, void, llvm-commits Differential Revision: https://reviews.llvm.org/D57394 llvm-svn: 354053
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* Revert "Clang side support for @cc assembly operands."Nirav Dave2019-01-181-52/+0
| | | | llvm-svn: 351561
* Clang side support for @cc assembly operands.Nirav Dave2019-01-181-0/+52
| | | | llvm-svn: 351559
* [X86] Add -march=cascadelake support in clang.Craig Topper2018-11-271-2/+10
| | | | | | | | | | This is skylake-avx512 with the addition of avx512vnni ISA. Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D54792 llvm-svn: 347682
* Add LLVM_FALLTHROUGH annotation after switchReid Kleckner2018-11-011-0/+1
| | | | | | | | | | | This silences a -Wimplicit-fallthrough warning from clang. GCC does not appear to warn when the case body ends in a switch. This is a somewhat surprising but intended fallthrough that I pulled out from my mechanical patch. The code intends to handle 'Yi' and related constraints as the 'x' constraint. llvm-svn: 345873
* [X86] Remove 'rtm' feature from KNL.Craig Topper2018-10-231-1/+0
| | | | | | | | I'm unsure if KNL has this feature, but the backend never thought it did, only clang did. The predefined-arch-macros test lost the check for __RTM__ on KNL when it was removed Skylake CPUs in r344117. I think we want to drop it from KNL for consistency with Skylake anyway regardless of how we got here. llvm-svn: 344978
* [X86] Remove FeatureRTM from Skylake processor listCraig Topper2018-10-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: There are a LOT of Skylakes and later without TSX-NI. Examples: - SKL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3-20-GHz- - KBL: https://ark.intel.com/products/97540/Intel-Core-i7-7560U-Processor-4M-Cache-up-to-3-80-GHz- - KBL-R: https://ark.intel.com/products/149091/Intel-Core-i7-8565U-Processor-8M-Cache-up-to-4-60-GHz- - CNL: https://ark.intel.com/products/136863/Intel-Core-i3-8121U-Processor-4M-Cache-up-to-3_20-GHz This feature seems to be present only on high-end desktop and server chips (I can't find any SKX without). This commit leaves it disabled for all processors, but can be re-enabled for specific builds with -mrtm. Matches https://reviews.llvm.org/D53041 Patch by Thiago Macieira Reviewers: erichkeane, craig.topper Reviewed By: craig.topper Subscribers: lebedev.ri, cfe-commits Differential Revision: https://reviews.llvm.org/D53042 llvm-svn: 344117
* Introduce code_model macrosAli Tamur2018-10-081-0/+5
| | | | | | | | | | | | | | | | | | | Summary: gcc defines macros such as __code_model_small_ based on the user passed command line flag -mcmodel. clang accepts a flag with the same name and similar effects, but does not generate any macro that the user can use. This cl narrows the gap between gcc and clang behaviour. However, achieving full compatibility with gcc is not trivial: The set of valid values for mcmodel in gcc and clang are not equal. Also, gcc defines different macros for different architectures. In this cl, we only tackle an easy part of the problem and define the macro only for x64 architecture. When the user does not specify a mcmodel, the macro for small code model is produced, as is the case with gcc. Reviewers: compnerd, MaskRay Reviewed By: MaskRay Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D52920 llvm-svn: 344000
* [X86] Add the movbe instruction intrinsics from icc.Craig Topper2018-09-281-0/+3
| | | | | | | | | | These intrinsics exist in icc. They can be found on the Intel Intrinsics Guide website. All the backend support is in place to pattern match a load+bswap or a bswap+store pattern to the MOVBE instructions. So we just need to get the frontend to emit the correct IR. The pointer arguments in icc are declared as void so I had to jump through a packed struct to forcing a specific alignment on the load/store. Same trick we use in the unaligned vector load/store intrinsics Differential Revision: https://reviews.llvm.org/D52586 llvm-svn: 343343
* Move AESNI generation to Skylake and GoldmontErich Keane2018-09-101-2/+2
| | | | | | | | | | | | | | | | | | | The instruction set first appeared with Westmere, but not all processors in that and the next few generations have the instructions. According to Wikipedia[1], the first generation in which all SKUs have AES instructions are Skylake and Goldmont. I can't find any Skylake, Kabylake, Kabylake-R or Cannon Lake currently listed at https://ark.intel.com that says "Intel® AES New Instructions" "No". This matches GCC commit https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01940.html [1] https://en.wikipedia.org/wiki/AES_instruction_set Patch By: thiagomacieira Differential Revision: https://reviews.llvm.org/D51510 llvm-svn: 341862
* [x86/retpoline] Split the LLVM concept of retpolines into separateChandler Carruth2018-08-231-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | subtarget features for indirect calls and indirect branches. This is in preparation for enabling *only* the call retpolines when using speculative load hardening. I've continued to use subtarget features for now as they continue to seem the best fit given the lack of other retpoline like constructs so far. The LLVM side is pretty simple. I'd like to eventually get rid of the old feature, but not sure what backwards compatibility issues that will cause. This does remove the "implies" from requesting an external thunk. This always seemed somewhat questionable and is now clearly not desirable -- you specify a thunk the same way no matter which set of things are getting retpolines. I really want to keep this nicely isolated from end users and just an LLVM implementation detail, so I've moved the `-mretpoline` flag in Clang to no longer rely on a specific subtarget feature by that name and instead to be directly handled. In some ways this is simpler, but in order to preserve existing behavior I've had to add some fallback code so that users who relied on merely passing -mretpoline-external-thunk continue to get the same behavior. We should eventually remove this I suspect (we have never tested that it works!) but I've not done that in this patch. Differential Revision: https://reviews.llvm.org/D51150 llvm-svn: 340515
* Remove trailing spaceFangrui Song2018-07-301-1/+1
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338291
* Implement cpu_dispatch/cpu_specific MultiversioningErich Keane2018-07-201-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As documented here: https://software.intel.com/en-us/node/682969 and https://software.intel.com/en-us/node/523346. cpu_dispatch multiversioning is an ICC feature that provides for function multiversioning. This feature is implemented with two attributes: First, cpu_specific, which specifies the individual function versions. Second, cpu_dispatch, which specifies the location of the resolver function and the list of resolvable functions. This is valuable since it provides a mechanism where the resolver's TU can be specified in one location, and the individual implementions each in their own translation units. The goal of this patch is to be source-compatible with ICC, so this implementation diverges from the ICC implementation in a few ways: 1- Linux x86/64 only: This implementation uses ifuncs in order to properly dispatch functions. This is is a valuable performance benefit over the ICC implementation. A future patch will be provided to enable this feature on Windows, but it will obviously more closely fit ICC's implementation. 2- CPU Identification functions: ICC uses a set of custom functions to identify the feature list of the host processor. This patch uses the cpu_supports functionality in order to better align with 'target' multiversioning. 1- cpu_dispatch function def/decl: ICC's cpu_dispatch requires that the function marked cpu_dispatch be an empty definition. This patch supports that as well, however declarations are also permitted, since the linker will solve the issue of multiple emissions. Differential Revision: https://reviews.llvm.org/D47474 llvm-svn: 337552
* [x86] invpcid intrinsicGabor Buella2018-05-251-0/+7
| | | | | | | | | | | | An intrinsic for an old instruction, as described in the Intel SDM. Reviewers: craig.topper, rnk Reviewed By: craig.topper, rnk Differential Revision: https://reviews.llvm.org/D47142 llvm-svn: 333256
* This patch aims to match the changes introducedAlexander Ivchenko2018-05-181-25/+0
| | | | | | | | | | | | | | | | | in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The -mibt feature flag is being removed, and the -fcf-protection option now also defines a CET macro and causes errors when used on non-X86 targets, while X86 targets no longer check for -mibt and -mshstk to determine if -fcf-protection is supported. -mshstk is now used only to determine availability of shadow stack intrinsics. Comes with an LLVM patch (D46882). Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46881 llvm-svn: 332704
* [X86] ptwrite intrinsicGabor Buella2018-05-101-0/+7
| | | | | | | | | | Reviewers: craig.topper, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46540 llvm-svn: 331962
* [x86] Introduce the pconfig intrinsicGabor Buella2018-05-081-0/+7
| | | | | | | | | | Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D46431 llvm-svn: 331740
* [X86] directstore and movdir64b intrinsicsGabor Buella2018-05-011-0/+14
| | | | | | | | | | Reviewers: spatel, craig.topper, RKSimon Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45984 llvm-svn: 331249
* [X86] WaitPKG intrinsicsGabor Buella2018-04-201-0/+7
| | | | | | | | | | Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45254 llvm-svn: 330463
* [X86] Introduce archs: goldmont-plus & tremontGabor Buella2018-04-161-0/+14
| | | | | | | | | | | | Reviewers: craig.topper Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D45613 llvm-svn: 330110
* [X86] Introduce cldemote intrinsicGabor Buella2018-04-131-0/+6
| | | | | | | | | | Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45257 llvm-svn: 329993
* [x86] wbnoinvd intrinsicGabor Buella2018-04-111-0/+8
| | | | | | | | | | | | | | The WBNOINVD instruction writes back all modified cache lines in the processor’s internal cache to main memory but does not invalidate (flush) the internal caches. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43817 llvm-svn: 329848
* [X86] Split up -march=icelake to -client & -serverGabor Buella2018-04-101-2/+4
| | | | | | | | | | Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45056 llvm-svn: 329741
* [X86] Disable SGX for Skylake ServerGabor Buella2018-04-101-1/+2
| | | | | | | | | | Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45058 llvm-svn: 329701
* Fix typos in clangAlexander Kornienko2018-04-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399
* [X86] Disable CLWB in Cannon LakeCraig Topper2018-02-211-1/+2
| | | | | | | | | | | Cannon Lake does not support CLWB, therefore it does not include all features listed under SKX. Patch by Gabor Buella Differential Revision: https://reviews.llvm.org/D43459 llvm-svn: 325655
* [X86] Add 'sahf' CPU feature to frontendDimitry Andric2018-02-171-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Make clang accept `-msahf` (and `-mno-sahf`) flags to activate the `+sahf` feature for the backend, for bug 36028 (Incorrect use of pushf/popf enables/disables interrupts on amd64 kernels). This was originally submitted in bug 36037 by Jonathan Looney <jonlooney@gmail.com>. As described there, GCC also uses `-msahf` for this feature, and the backend already recognizes the `+sahf` feature. All that is needed is to teach clang to pass this on to the backend. The mapping of feature support onto CPUs may not be complete; rather, it was chosen to match LLVM's idea of which CPUs support this feature (see lib/Target/X86/X86.td). I also updated the affected test case (CodeGen/attr-target-x86.c) to match the emitted output. Reviewers: craig.topper, coby, efriedma, rsmith Reviewed By: craig.topper Subscribers: emaste, cfe-commits Differential Revision: https://reviews.llvm.org/D43394 llvm-svn: 325446
* Add X86 Support to ValidCPUList (enabling march notes)Erich Keane2018-02-081-2/+13
| | | | | | | | | | A followup to: https://reviews.llvm.org/D42978 This patch adds X86 and X86_64 support for enabling the march notes. Differential Revision: https://reviews.llvm.org/D43041 llvm-svn: 324674
* [X86] Add 'rdrnd' feature to silvermont to match recent gcc bug fix.Craig Topper2018-01-261-1/+1
| | | | | | gcc recently fixed this bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83546 llvm-svn: 323552
* [X86] Define __IBT__ when -mibt is specified.Craig Topper2018-01-261-0/+2
| | | | llvm-svn: 323543
* Introduce the "retpoline" x86 mitigation technique for variant #2 of the ↵Chandler Carruth2018-01-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155
OpenPOWER on IntegriCloud