path: root/llvm/lib/Target/X86/X86TargetMachine.cpp
...
* SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O) (Matthias Braun, 2018-06-28; 1 file, -1/+3)
  Add NoTrapAfterNoreturn target option which skips emission of traps behind noreturn calls even if TrapUnreachable is enabled. Enable the feature on Mach-O to save code size; comments suggest it is not possible to enable it for the other users of TrapUnreachable.
  rdar://41530228
  Differential Revision: https://reviews.llvm.org/D48674
  llvm-svn: 335877
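  As a hedged illustration of the effect (the callee name is hypothetical, not from the patch):
  ```
  	callq	_abort          # noreturn call; control never returns here
  	ud2                     # trap emitted when TrapUnreachable is on; with
  	                        # NoTrapAfterNoreturn this instruction is omitted
  ```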
* Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-06-25; 1 file, -1/+7)
  The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it:
    .LtmpN:
      leaq .LtmpN(%rip), %rbx
      movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
      addq %rax, %rbx # GOT base reg
  From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax
  All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax
  The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg
  DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax
  Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax
  Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions.
  This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented.
  I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet.
  Differential Revision: https://reviews.llvm.org/D47211
  llvm-svn: 335508
* Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" (Reid Kleckner, 2018-06-21; 1 file, -4/+1)
  MCJIT can't handle R_X86_64_GOT64 yet.
  llvm-svn: 335300
* [X86] Implement more of x86-64 large and medium PIC code models (Reid Kleckner, 2018-06-21; 1 file, -1/+4)
  Summary: The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it:
    .LtmpN:
      leaq .LtmpN(%rip), %rbx
      movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch
      addq %rax, %rbx # GOT base reg
  From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this:
    movq extern_gv@GOT(%rbx), %rax
    movq local_gv@GOTOFF(%rbx), %rax
  All calls end up being indirect:
    movabsq $local_fn@GOTOFF, %rax
    addq %rbx, %rax
    callq *%rax
  The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy:
    leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg
  DSO local data accesses will use it:
    movq local_gv@GOTOFF(%rbx), %rax
  Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base:
    movq extern_gv@GOTPCREL(%rip), %rax
  Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions.
  This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented.
  Reviewers: chandlerc, echristo
  Subscribers: hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D47211
  llvm-svn: 335297
* [MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFF (Reid Kleckner, 2018-06-12; 1 file, -2/+0)
  All COFF targets should use @IMGREL32 relocations for symbol differences against __ImageBase. Do the same for getSectionForConstant, so that immediates lowered to globals get merged across TUs.
  Patch by Chris January
  Differential Revision: https://reviews.llvm.org/D47783
  llvm-svn: 334523
* Correct dwarf unwind information in function epilogue (Petar Jovanovic, 2018-04-24; 1 file, -0/+6)
  This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts.
  The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific.
  The second part is platform independent and ensures that:
  * CFI instructions do not affect code generation (they are not counted as instructions when tail duplicating or tail merging)
  * Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary.
  Added CFIInstrInserter pass:
  * analyzes each basic block to determine the cfa offset and register valid at its entry and exit
  * verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors
  * inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA
  Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them.
  CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue.
  Patch by Violeta Vukobrat.
  Differential Revision: https://reviews.llvm.org/D42848
  llvm-svn: 330706
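  As a rough sketch of the kind of directives involved (a frame-pointer-based function is assumed; the exact sequence depends on the target's frame lowering):
  ```
  	pushq	%rbp
  	.cfi_def_cfa_offset 16
  	.cfi_offset %rbp, -16
  	movq	%rsp, %rbp
  	.cfi_def_cfa_register %rbp
  	# ... function body ...
  	popq	%rbp
  	.cfi_def_cfa %rsp, 8    # epilogue CFI: the CFA rule reverts to rsp+8
  	retq
  ```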
* MachO: trap unreachable instructions (Tim Northover, 2018-04-13; 1 file, -1/+2)
  Debuggability is more important than saving 4 bytes to let us fall through to nonsense.
  llvm-svn: 330073
* [x86] Introduce a pass to begin more systematically fixing PR36028 and similar issues. (Chandler Carruth, 2018-04-10; 1 file, -0/+3)
  The key idea is to lower COPY nodes populating EFLAGS by scanning the uses of EFLAGS and introducing dedicated code to preserve the necessary state in a GPR. In the vast majority of cases, these uses are cmovCC and jCC instructions. For such cases, we can very easily save and restore the necessary information by simply inserting a setCC into a GPR where the original flags are live, and then testing that GPR directly to feed the cmov or conditional branch.
  However, things are a bit more tricky if arithmetic is using the flags. This patch handles the vast majority of cases that seem to come up in practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of partially preserved EFLAGS as LLVM doesn't currently model that at all.
  There are a large number of operations that technically observe EFLAGS currently but shouldn't in this case -- they typically are using DF. Currently, they will not be handled by this approach. However, I have never seen this issue come up in practice. It is already pretty rare to have these patterns come up in practical code with LLVM. I had to resort to writing MIR tests to cover most of the logic in this pass already. I suspect even with its current amount of coverage of arithmetic users of EFLAGS it will be a significant improvement over the current use of pushf/popf. It will also produce substantially faster code in most of the common patterns.
  This patch also removes all of the old lowering for EFLAGS copies, and the hack that forced us to use a frame pointer when EFLAGS copies were found anywhere in a function so that the dynamic stack adjustment wasn't a problem. None of this is needed as we now lower all of these copies directly in MI and without requiring stack adjustments.
  Lots of thanks to Reid who came up with several aspects of this approach, and Craig who helped me work out a couple of things tripping me up while working on this.
  Differential Revision: https://reviews.llvm.org/D45146
  llvm-svn: 329657
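  A hedged before/after sketch of the idea (registers and the specific condition are illustrative, not taken from the patch):
  ```
  # Old lowering of an EFLAGS copy: save/restore all flags via the stack
  	pushfq
  	popq	%rax            # copy EFLAGS into a GPR
  	# ... code that clobbers EFLAGS ...
  	pushq	%rax
  	popfq                   # restore EFLAGS for the user
  	cmovneq	%rcx, %rdx
  # New lowering: capture just the needed condition while flags are live
  	setne	%al             # materialize the 'not equal' condition into a GPR
  	# ... code that clobbers EFLAGS ...
  	testb	%al, %al        # recreate a flag state the user can consume
  	cmovneq	%rcx, %rdx
  ```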
* Add the ShadowCallStack pass (Vlad Tsyrklevich, 2018-04-04; 1 file, -0/+3)
  Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning.
  Reviewers: pcc, vitalybuka
  Reviewed By: pcc
  Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc
  Differential Revision: https://reviews.llvm.org/D44802
  llvm-svn: 329139
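  A sketch of the scheme the summary describes (the register choices, slot layout, and failure path are assumptions, not the pass's exact output):
  ```
  # prologue: push the return address onto the gs-based shadow stack
  	movq	%gs:0, %r10         # current shadow stack offset
  	addq	$8, %r10
  	movq	%r10, %gs:0         # update the offset at [gs:0]
  	movq	(%rsp), %r11
  	movq	%r11, %gs:(%r10)    # save the return address at [gs:offset]
  # epilogue: verify the on-stack return address against the shadow copy
  	movq	%gs:0, %r10
  	movq	%gs:(%r10), %r11
  	subq	$8, %r10
  	movq	%r10, %gs:0         # pop the shadow stack
  	cmpq	%r11, (%rsp)
  	jne	.Lscs_mismatch      # mismatch: the return address was tampered with
  	retq
  .Lscs_mismatch:
  	ud2
  ```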
* [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 (Lama Saba, 2018-04-02; 1 file, -0/+3)
  If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
  A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on Intel Core microarchitectures is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles.
  This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.
  The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
  Differential Revision: https://reviews.llvm.org/D41330
  Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
  llvm-svn: 328973
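  A schematic example of the blocked pattern and the kind of split the pass performs (offsets and registers are illustrative):
  ```
  # store-forward block: a small store followed by a wide reload of the same bytes
  	movl	%eax, 8(%rsi)       # 4-byte store into the copied range
  	vmovups	(%rsi), %ymm0       # 32-byte load covering it cannot be forwarded
  	vmovups	%ymm0, (%rdi)
  # instead, break the copy into narrower pieces so each load can forward:
  	movq	(%rsi), %rcx
  	movq	%rcx, (%rdi)
  	movl	8(%rsi), %ecx       # exactly matches the 4-byte store, so it forwards
  	movl	%ecx, 8(%rdi)
  	# ...remaining bytes copied similarly
  ```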
* Move TargetLoweringObjectFile from CodeGen to Target to fix layering (David Blaikie, 2018-03-23; 1 file, -1/+1)
  It's implemented in Target & included from other Target headers, so the header should be in Target.
  llvm-svn: 328392
* Revert r325128 ("[X86] Reduce Store Forward Block issues in HW"). (Richard Smith, 2018-02-23; 1 file, -1/+0)
  This is causing miscompiles in some situations. See the llvm-commits thread for the commit for details.
  llvm-svn: 325852
* [X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 (Lama Saba, 2018-02-14; 1 file, -0/+1)
  If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
  A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on Intel Core microarchitectures is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles.
  This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.
  The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
  Change-Id: Ic41aa9ade6512e0478db66e07e2fde41b4fb35f9
  llvm-svn: 325128
* Revert r324835 "[X86] Reduce Store Forward Block issues in HW" (Hans Wennborg, 2018-02-12; 1 file, -1/+0)
  It asserts building Chromium; see PR36346. (This also reverts the follow-up r324836.)
  > If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
  > A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on Intel Core microarchitectures is that a small store cannot be forwarded to a large load.
  > The estimated penalty for a store forward block is ~13 cycles.
  >
  > This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.
  >
  > The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
  llvm-svn: 324887
* [X86] Reduce Store Forward Block issues in HW (Lama Saba, 2018-02-11; 1 file, -0/+1)
  If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
  A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of a store forward block on Intel Core microarchitectures is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles.
  This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store.
  The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
  Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748
  llvm-svn: 324835
* [X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required (Craig Topper, 2018-02-11; 1 file, -3/+18)
  This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if it's safe to make 512-bit vectors illegal when the preference is for 256-bit vectors.
  For code that has no vectors in it originally and only gets vectors through the loop and slp vectorizers, this allows us to generate code largely similar to our AVX2-only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harming our AVX2 codegen, this patch tries to make the legalizer behavior similar.
  This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking.
  I've qualified every place I could find in X86ISelLowering.cpp and added test cases for many of them with 2 different values for the attribute to see the codegen differences.
  We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it.
  Differential Revision: https://reviews.llvm.org/D42724
  llvm-svn: 324834
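  Schematically, for a function where 512 bits are not required and the preference is 256 (the exact instruction selection shown is an assumption about the resulting codegen):
  ```
  # 512-bit vectors legal: one zmm operation
  	vmovups	(%rsi), %zmm0
  	vaddps	(%rdx), %zmm0, %zmm0
  	vmovups	%zmm0, (%rdi)
  # 512-bit vectors made illegal: the same work split into 256-bit halves
  	vmovups	(%rsi), %ymm0
  	vmovups	32(%rsi), %ymm1
  	vaddps	(%rdx), %ymm0, %ymm0
  	vaddps	32(%rdx), %ymm1, %ymm1
  	vmovups	%ymm0, (%rdi)
  	vmovups	%ymm1, 32(%rdi)
  ```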
* Introduce the "retpoline" x86 mitigation technique for variant #2 of the ↵Chandler Carruth2018-01-221-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", and is one of the two halves to Spectre.. Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description, please see the Project Zero blog post for details: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processors cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain. The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers. However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address. On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register and so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address. This "retpoline" mitigation is fully described in the following blog post: https://support.google.com/faqs/answer/7625886 We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. 
These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names must be: ``` __llvm_external_retpoline_r11 ``` or on 32-bit: ``` __llvm_external_retpoline_eax __llvm_external_retpoline_ecx __llvm_external_retpoline_edx __llvm_external_retpoline_push ``` And the target of the retpoline is passed in the named register, or in the case of the `push` suffix on the top of the stack via a `pushl` instruction. There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection. The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, and so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness. For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now` as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller. When manually apply similar transformations to `-mretpoline` to the Linux kernel we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance sensitive paths of the kernel. When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline. We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors. This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time sensitive nature of landing this and the need to backport it. 
Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design. Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D41723 llvm-svn: 323155
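  A sketch of the 64-bit retpoline device the text describes, with the call target in %r11 (the label and thunk names here are illustrative, not the emitted ones):
  ```
  __retpoline_r11_thunk:
  	callq	.Lset_target
  .Lcapture_spec:                 # speculative execution is trapped here
  	pause
  	lfence
  	jmp	.Lcapture_spec
  .Lset_target:
  	movq	%r11, (%rsp)    # smash the return address with the real target
  	retq                    # predicted to .Lcapture_spec; actually branches to %r11
  ```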
* Separate LoopTraversal, ReachingDefAnalysis and BreakFalseDeps into their own files. (Marina Yatsina, 2018-01-22; 1 file, -1/+1)
  This is one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are intended to refactor the existing code.
  Additional relevant reviews:
  https://reviews.llvm.org/D40330
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40332
  https://reviews.llvm.org/D40334
  Differential Revision: https://reviews.llvm.org/D40333
  Change-Id: Ie5f8eb34d98cfdfae23a3072eb69b5794f0e2d56
  llvm-svn: 323095
* Rename ExecutionDepsFix files to ExecutionDomainFix (Marina Yatsina, 2018-01-22; 1 file, -1/+1)
  This is one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are intended to refactor the existing code.
  Additional relevant reviews:
  https://reviews.llvm.org/D40330
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40333
  https://reviews.llvm.org/D40334
  Differential Revision: https://reviews.llvm.org/D40332
  Change-Id: I6a048cca7fdafbfc42fb1bac94343e483befded8
  llvm-svn: 323094
* Separate ExecutionDepsFix into 4 parts: (Marina Yatsina, 2018-01-22; 1 file, -9/+14)
  1. ReachingDefsAnalysis - Allows to identify for each instruction what is the "closest" reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
  2. ExecutionDomainFix - Changes the variant of the instructions in order to minimize domain crossings.
  3. BreakFalseDeps - Breaks false dependencies.
  4. LoopTraversal - Creates a traversal order of the basic blocks that is optimal for loops (introduced in revision L293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order they will traverse the basic blocks.
  This also included the following changes to the original ExecutionDepsFix logic:
  1. BreakFalseDeps and ReachingDefsAnalysis logic are no longer restricted by a register class.
  2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.
  Additional changes in affected files:
  1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps was also added to the passes they activate.
  2. Comments and references to ExecutionDepsFix replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.
  Additional refactoring changes will follow.
  This commit is (almost) NFC. The only functional change is that now BreakFalseDeps will break dependency for all register classes. Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet. In a future commit several instructions (and tests) will be added.
  This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are intended to refactor the existing code.
  Additional relevant reviews:
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40332
  https://reviews.llvm.org/D40333
  https://reviews.llvm.org/D40334
  Differential Revision: https://reviews.llvm.org/D40330
  Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
  llvm-svn: 323087
* [X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface. (Craig Topper, 2018-01-20; 1 file, -2/+20)
  This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for.
  I've written this in such a way that the interface will only return a properly supported width (0/128/256/512) even if the attribute says something funny like 384 or 10.
  This has been split from D41895 with the remainder in a follow up commit.
  llvm-svn: 323015
* Instrument Control Flow For Indirect Branch Tracking (Oren Ben Simhon, 2018-01-09; 1 file, -0/+2)
  CET (Control-Flow Enforcement Technology) introduces a new mechanism called IBT (Indirect Branch Tracking). Under IBT, each indirect branch should land on a dedicated ENDBR instruction (End Branch). The new pass adds ENDBR instructions for every indirect jmp/call (including jumps using jump tables / switches).
  For more information, please see the following: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
  Differential Revision: https://reviews.llvm.org/D40482
  Change-Id: Icb754489faf483a95248f96982a4e8b1009eb709
  llvm-svn: 322062
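  Illustratively, under IBT every indirect jmp/call must land on an ENDBR (the surrounding code is hypothetical):
  ```
  	callq	*%rax           # indirect call: the target must begin with ENDBR
  	# ...
  target_fn:                      # address-taken function
  	endbr64                 # landing pad inserted by the pass
  	pushq	%rbp
  ```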
* (Re-landing) Expose a TargetMachine::getTargetTransformInfo function (Sanjoy Das, 2017-12-22; 1 file, -4/+3)
  Re-land r321234. It had to be reverted because it broke the shared library build. The shared library build broke because there was a missing LLVMBuild dependency from lib/Passes (which calls TargetMachine::getTargetIRAnalysis) to lib/Target. As far as I can tell, this problem was always there but was somehow masked before (perhaps because TargetMachine::getTargetIRAnalysis was a virtual function).
  Original commit message:
  This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target.
  See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
  I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this.
  Reviewers: echristo, MatzeB, hfinkel
  Reviewed By: hfinkel
  Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41464
  llvm-svn: 321375
* Revert "Expose a TargetMachine::getTargetTransformInfo function"Sanjoy Das2017-12-211-3/+4
| | | | | | This reverts commit r321234. It breaks the -DBUILD_SHARED_LIBS=ON build. llvm-svn: 321243
* Expose a TargetMachine::getTargetTransformInfo function (Sanjoy Das, 2017-12-21; 1 file, -4/+3)
  Summary: This makes the TargetMachine interface a bit simpler. We still need the std::function in TargetIRAnalysis to avoid having to add a dependency from Analysis to Target.
  See discussion: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html
  I avoided adding all of the backend owners to this review since the change is simple, but let me know if you feel differently about this.
  Reviewers: echristo, MatzeB, hfinkel
  Reviewed By: hfinkel
  Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41464
  llvm-svn: 321234
* Fix a bunch more layering of CodeGen headers that are in Target (David Blaikie, 2017-11-17; 1 file, -2/+2)
  All these headers already depend on CodeGen headers, so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around).
  llvm-svn: 318490
* Revert "Correct dwarf unwind information in function epilogue for X86"Reid Kleckner2017-11-081-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts r317579, originally committed as r317100. There is a design issue with marking CFI instructions duplicatable. Not all targets support the CFIInstrInserter pass, and targets like Darwin can't cope with duplicated prologue setup CFI instructions. The compact unwind info emission fails. When the following code is compiled for arm64 on Mac at -O3, the CFI instructions end up getting tail duplicated, which causes compact unwind info emission to fail: int a, c, d, e, f, g, h, i, j, k, l, m; void n(int o, int *b) { if (g) f = 0; for (; f < o; f++) { m = a; if (l > j * k > i) j = i = k = d; h = b[c] - e; } } We get assembly that looks like this: ; BB#1: ; %if.then Lloh3: adrp x9, _f@GOTPAGE Lloh4: ldr x9, [x9, _f@GOTPAGEOFF] mov w8, wzr Lloh5: str wzr, [x9] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.lt LBB0_3 b LBB0_7 LBB0_2: ; %entry.if.end_crit_edge Lloh6: adrp x8, _f@GOTPAGE Lloh7: ldr x8, [x8, _f@GOTPAGEOFF] Lloh8: ldr w8, [x8] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.ge LBB0_7 LBB0_3: ; %for.body.lr.ph Note the multiple .cfi_def* directives. Compact unwind info emission can't handle that. llvm-svn: 317726
* Reland "Correct dwarf unwind information in function epilogue for X86"Petar Jovanovic2017-11-071-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reland r317100 with minor fix regarding ComputeCommonTailLength function in BranchFolding.cpp. Skipping top CFI instructions block needs to executed on several more return points in ComputeCommonTailLength(). Original r317100 message: "Correct dwarf unwind information in function epilogue for X86" This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. llvm-svn: 317579
* Revert "Correct dwarf unwind information in function epilogue for X86"Petar Jovanovic2017-11-011-7/+0
| | | | | | | This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf buildbot failure (build #15606). llvm-svn: 317136
* Correct dwarf unwind information in function epilogue for X86 (Petar Jovanovic, 2017-11-01; 1 file, -0/+7)
  This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts.
  The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific.
  The second part is platform independent and ensures that:
  - CFI instructions do not affect code generation
  - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary.
  Changed CFI instructions so that they:
  - are duplicable
  - are not counted as instructions when tail duplicating or tail merging
  - can be compared as equal
  Added CFIInstrInserter pass:
  - analyzes each basic block to determine the cfa offset and register valid at its entry and exit
  - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors
  - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA
  Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them.
  CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue.
  Patch by Violeta Vukobrat.
  Differential Revision: https://reviews.llvm.org/D35844
  llvm-svn: 317100
* X86: Register the X86CallFrameOptimization pass (Zvi Rackover, 2017-10-24; 1 file, -0/+2)
  Summary: The motivation of this change is to enable .mir testing for this pass. Added one test case to cover the functionality; this same case will be improved by a future patch.
  Reviewers: igorb, guyblank, DavidKreitzer
  Reviewed By: guyblank, DavidKreitzer
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D38729
  llvm-svn: 316412
* [X86] Add a pass to convert instruction chains between domains. (Guy Blank, 2017-10-22; 1 file, -0/+7)
  The pass scans the function to find instruction chains that define registers in the same domain (closures). It then calculates the cost of converting the closure to another domain. If found profitable, the instructions are converted to instructions in the other domain and the register classes are changed accordingly.
  This commit adds the pass infrastructure and a simple conversion from the GPR domain to the Mask domain.
  Differential Revision: https://reviews.llvm.org/D37251
  Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
  llvm-svn: 316288
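  A simplified example of a GPR-domain closure and its mask-domain conversion (register assignments are illustrative):
  ```
  # GPR domain: mask values round-trip through general-purpose registers
  	kmovw	%k0, %eax
  	kmovw	%k1, %ecx
  	andl	%ecx, %eax
  	kmovw	%eax, %k2
  # Mask domain: the same logic without cross-domain moves
  	kandw	%k1, %k0, %k2
  ```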
* Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine"Matthias Braun2017-10-121-1/+1
| | | | | | | | | | Reverting to investigate layering effects of MCJIT not linking libCodeGen but using TargetMachine::getNameWithPrefix() breaking the lldb bots. This reverts commit r315633. llvm-svn: 315637
* TargetMachine: Merge TargetMachine and LLVMTargetMachine (Matthias Braun, 2017-10-12; 1 file, -1/+1)
  Merge LLVMTargetMachine into TargetMachine.
  - There is no in-tree target anymore that just implements TargetMachine but not LLVMTargetMachine.
  - It should still be possible to stub out all the various functions in case a target does not want to use lib/CodeGen.
  - This simplifies the code and avoids methods ending up in the wrong interface.
  Differential Revision: https://reviews.llvm.org/D38489
  llvm-svn: 315633
* [X86][NFC] Add X86CmovConverterPass to the pass registry. (Amjad Aboud, 2017-10-02; 1 file, -0/+2)
  Differential Revision: https://reviews.llvm.org/D38355
  llvm-svn: 314726
* [GlobalISel] Make GlobalISel a non-optional library. (Quentin Colombet, 2017-08-03; 1 file, -4/+0)
  With this change, the GlobalISel library always gets built. In particular, it is no longer possible to opt GlobalISel out of the build using the LLVM_BUILD_GLOBAL_ISEL variable.
  llvm-svn: 309990
* Delete Default and JITDefault code models (Rafael Espindola, 2017-08-03; 1 file, -3/+15)
  IMHO it is an antipattern to have an enum value that is Default. At any given piece of code it is not clear if we have to handle Default or if it has already been mapped to a concrete value. In this case in particular, only the target can do the mapping and it is nice to make sure it is always done.
  This deletes the two default enum values of CodeModel and uses an explicit Optional<CodeModel> when it is possible that it is unspecified.
  llvm-svn: 309911
* [X86] X86::CMOV to Branch heuristic based optimization. (Amjad Aboud, 2017-07-16; 1 file, -0/+1)
  The LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s), which will later be lowered into an X86::CMOV instruction, assuming no other optimization eliminated the SelectInst. However, it is not always profitable to emit an X86::CMOV instruction. For example, a branch is preferable over an X86::CMOV instruction when:
  1. The branch is well predicted
  2. The condition operand is expensive, compared to the True-value and the False-value operands
  In the CodeGenPrepare pass there is a shallow optimization that tries to convert a SelectInst into a branch, but it is not enough. This commit implements a machine optimization pass that converts X86::CMOV instruction(s) into branches, based on a conservative heuristic; see the sketch after this list.
  Differential Revision: https://reviews.llvm.org/D34769
  llvm-svn: 308142
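  A schematic of the conversion (condition and registers are illustrative):
  ```
  # CMOV form: both values available, selection consumes the flags
  	cmpl	%esi, %edi
  	cmovgl	%edx, %eax
  # branch form, preferred when the branch is well predicted or the
  # condition operand is expensive relative to the two values:
  	cmpl	%esi, %edi
  	jle	.Lno_move
  	movl	%edx, %eax
  .Lno_move:
  ```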
* [X86] Move GISel accessor initialization from TargetMachine to Subtarget. (Quentin Colombet, 2017-07-01; 1 file, -47/+0)
  NFC
  llvm-svn: 306921
* [Solaris] emit .init_array instead of .ctors on Solaris (Sparc/x86) (Davide Italiano, 2017-06-21; 1 file, -0/+2)
  Patch by Fedor Sergeev.
  Differential Revision: https://reviews.llvm.org/D33868
  llvm-svn: 305948
* Sort the remaining #include lines in include/... and lib/.... (Chandler Carruth, 2017-06-06; 1 file, -1/+1)
  I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days.
  I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch.
  This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files.
  Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again).
  llvm-svn: 304787
* Add LiveRangeShrink pass to shrink live range within BB. (Dehao Chen, 2017-05-31; 1 file, -0/+1)
  Summary: The LiveRangeShrink pass moves an instruction right after the definition within the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within a BB.
  Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb
  Reviewed By: MatzeB, andreadb
  Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32563
  llvm-svn: 304371
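  One way to picture the effect, as a hedged sketch (instructions and registers are invented for illustration, not taken from the pass's tests):
  ```
  # before: %eax is defined early and stays live across unrelated instructions
  	movl	%edi, %eax
  	addl	$2, %ecx
  	imull	%esi, %ecx
  	addl	%eax, %ecx          # use of %eax
  # after: the copy sits next to its use, shrinking %eax's live range
  	addl	$2, %ecx
  	imull	%esi, %ecx
  	movl	%edi, %eax
  	addl	%eax, %ecx
  ```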
* TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC (Matthias Braun, 2017-05-30; 1 file, -2/+2)
  TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine.
  While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr.
  llvm-svn: 304247
* [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates. (Daniel Sanders, 2017-05-19; 1 file, -8/+1)
  Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750.
  Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
  Reviewed By: qcolombet
  Subscribers: igorb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32861
  llvm-svn: 303418
* Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB." (Hans Wennborg, 2017-05-18; 1 file, -1/+0)
  This also reverts follow-ups r303292 and r303298. It broke some Chromium tests under MSan, and apparently also internal tests at Google.
  llvm-svn: 303369
* [LegacyPassManager] Remove TargetMachine constructors (Francis Visoiu Mistrih, 2017-05-18; 1 file, -2/+2)
  This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency.
  The patterns replaced here are:
  * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`.
  * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`.
  * MachineFunctionPasses now use MF.getTarget().
  * Remove all the TargetMachine constructors.
  * Remove INITIALIZE_TM_PASS.
  This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults.
  Related to PR30324.
  Differential Revision: https://reviews.llvm.org/D33222
  llvm-svn: 303360
* [X86] Replace slow LEA instructions in X86 (Lama Saba, 2017-05-18; 1 file, -0/+2)
  According to Intel's Optimization Reference Manual for SNB+:
  "For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1:
    - LEA that has all three source operands: base, index, and offset
    - LEA that uses base and index registers where the base is EBP, RBP, or R13
    - LEA that uses RIP relative addressing mode
    - LEA that uses 16-bit addressing mode"
  This patch currently handles the first 2 cases only.
  Differential Revision: https://reviews.llvm.org/D32277
  llvm-svn: 303333
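  A sketch of the first case's replacement (the exact split shown is an assumption about the emitted sequence):
  ```
  # three-operand LEA (base + index + displacement): 3-cycle latency on SNB+
  	leaq	8(%rdi,%rsi), %rax
  # replaced with a cheaper sequence:
  	leaq	(%rdi,%rsi), %rax   # two-operand LEA stays fast
  	addq	$8, %rax
  ```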
* Only enable LiveRangeShrink for x86. (Dehao Chen, 2017-05-17; 1 file, -0/+1)
  Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for architectures with great register pressure.
  Reviewers: MatzeB, qcolombet
  Reviewed By: qcolombet
  Subscribers: jholewinski, jyknight, javed.absar, llvm-commits
  Differential Revision: https://reviews.llvm.org/D33294
  llvm-svn: 303292
* Revert "[X86] Replace slow LEA instructions in X86"Reid Kleckner2017-05-161-2/+0
| | | | | | | This reverts commit r303183, it broke various buildbots and introduced sanitizer errors. llvm-svn: 303199
* [X86] Replace slow LEA instructions in X86 (Lama Saba, 2017-05-16; 1 file, -0/+2)
  According to Intel's Optimization Reference Manual for SNB+:
  "For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1:
    - LEA that has all three source operands: base, index, and offset
    - LEA that uses base and index registers where the base is EBP, RBP, or R13
    - LEA that uses RIP relative addressing mode
    - LEA that uses 16-bit addressing mode"
  This patch currently handles the first 2 cases only.
  Differential Revision: https://reviews.llvm.org/D32277
  llvm-svn: 303183