path: root/llvm/lib/Target/X86/X86Subtarget.h
Commit message (Author, Date, Files, Lines)
* [X86] Remove the SlowBTMem feature flag entirely (Craig Topper, 2017-10-15, 1 file, -4/+0)
  Turns out we have no patterns on the instructions that were using this feature flag
  for other reasons. These instructions are slow on all modern CPUs so it seems
  unlikely that we will spend any effort supporting these instructions going forward.
  So we might as well just kill off the feature flag and fix up the comments.
  llvm-svn: 315862
* [codeview] Implement FPO data assembler directives (Reid Kleckner, 2017-10-11, 1 file, -6/+2)
  Summary:
  This adds a set of new directives that describe 32-bit x86 prologues. The
  directives are limited and do not expose the full complexity of codeview FPO
  data. They are merely a convenience for the compiler to generate more readable
  assembly so we don't need to generate tons of labels in CodeGen. If our
  prologue emission changes in the future, we can change the set of available
  directives to suit our needs. These are modelled after the .seh_ directives,
  which use a different format that interacts with exception handling.

  The directives are:
    .cv_fpo_proc _foo
    .cv_fpo_pushreg ebp/ebx/etc
    .cv_fpo_setframe ebp/esi/etc
    .cv_fpo_stackalloc 200
    .cv_fpo_endprologue
    .cv_fpo_endproc
    .cv_fpo_data _foo

  I tried to follow the implementation of ARM EHABI CFI directives by sinking
  most directives out of MCStreamer and into X86TargetStreamer. This helps avoid
  polluting non-X86 code with WinCOFF specific logic.

  I used cdb to confirm that this can show locals in parent CSRs in a few cases,
  most importantly the one where we use ESI as a frame pointer, i.e. the one in
  http://crbug.com/756153#c28

  Once we have cdb integration in debuginfo-tests, we can add integration tests
  there.

  Reviewers: majnemer, hans
  Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya
  Differential Revision: https://reviews.llvm.org/D38776
  llvm-svn: 315513
* X86: treat SwiftCC as Win64_CC on Win64 (Saleem Abdulrasool, 2017-09-20, 1 file, -0/+1)
  The Swift CC is identical to Win64 CC with the exception of swift error being
  passed in r12 which is a CSR. However, since this calling convention is only
  used in swift -> swift code, it does not impact interoperability and can be
  treated entirely as Win64 CC. We would previously incorrectly lower the frame
  setup as we did not treat the frame as conforming to Win64 specifications.
  llvm-svn: 313813
* [X86] Adding X86 Processor Families (Mohammed Agabaria, 2017-09-13, 1 file, -1/+19)
  Adding x86 Processor families to initialize several uArch properties (based on
  the family). This patch shows how gather cost can be initialized based on the
  proc. family.
  Differential Revision: https://reviews.llvm.org/D35348
  llvm-svn: 313132
* [X86] Provide a separate feature bit for macro fusion support instead of basing it on the AVX flag (Craig Topper, 2017-08-30, 1 file, -0/+4)
  Summary:
  Currently we determine if macro fusion is supported based on the AVX flag as a
  proxy for the processor being "Sandy Bridge". This is really strange as now AMD
  supports AVX. It also means if the user explicitly disables AVX we disable
  macro fusion.

  This patch adds an explicit macro fusion feature. I've also enabled it for the
  generic 64-bit CPU (which doesn't have AVX).

  This is probably another candidate for being in the MI layer, but for now I at
  least wanted to correct the overloading of the AVX feature.

  Reviewers: spatel, chandlerc, RKSimon, zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D37280
  llvm-svn: 312097
* [X86] Finish the subtarget and predicate implementation of CLWB. (Craig Topper, 2017-08-29, 1 file, -0/+1)
  We don't have an intrinsic implemented for this instruction yet, but it looked
  odd that we were missing the accessor method from the subtarget.
  llvm-svn: 312064
* Mark Knights Landing as having slow two memory operand instructions (Craig Topper, 2017-08-29, 1 file, -4/+4)
  Summary:
  Knights Landing, because it is Atom derived, has slow two memory operand
  instructions. Mark the Knights Landing CPU model accordingly.

  Patch by David Zarzycki.

  Reviewers: craig.topper
  Reviewed By: craig.topper
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D37224
  llvm-svn: 311979
* Reapply "[GlobalISel] Remove the GISelAccessor API." (Quentin Colombet, 2017-08-15, 1 file, -8/+9)
  This reverts commit r310425, thus reapplying r310335 with a fix for the link
  issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON.

  Original commit message:
  [GlobalISel] Remove the GISelAccessor API.
  Its sole purpose was to avoid spreading around ifdefs related to building
  global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can
  get rid of this mechanism all together.
  NFC.
  ----
  The fix for the link issue consists in adding the GlobalISel library in the
  list of dependencies for the AArch64 unittests. This dependency comes from the
  use of AArch64Subtarget that needs to know how to destruct the GISel related
  APIs when being destroyed.

  Thanks to Bill Seurer and Ahmed Bougacha for helping me reproduce and
  understand the problem.
  llvm-svn: 310969
* Revert "[GlobalISel] Remove the GISelAccessor API." (Quentin Colombet, 2017-08-08, 1 file, -9/+8)
  This reverts commit r310115.

  It causes a linker failure for one of the unittests of AArch64 on one of the
  Linux bots:
  http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429

  : && /home/fedora/gcc/install/gcc-7.1.0/bin/g++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -O2 -L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined -Wl,-O3 -Wl,--gc-sections unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o -o unittests/Target/AArch64/AArch64Tests lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread -Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && :

  unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0): undefined reference to `vtable for llvm::LegalizerInfo'
  unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8): undefined reference to `vtable for llvm::RegisterBankInfo'

  The particularity of this bot is that it is built with BUILD_SHARED_LIBS=ON.
  However, I was not able to reproduce the problem so far. Reverting to unblock
  the bot.
  llvm-svn: 310425
* [GlobalISel] Remove the GISelAccessor API. (Quentin Colombet, 2017-08-04, 1 file, -8/+9)
  Its sole purpose was to avoid spreading around ifdefs related to building
  global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can
  get rid of this mechanism all together.
  NFC.
  llvm-svn: 310115
* [AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well (Martin Storsjo, 2017-07-17, 1 file, -1/+1)
  Rename the enum value from X86_64_Win64 to plain Win64. The symbol exposed in
  the textual IR is changed from 'x86_64_win64cc' to 'win64cc', but the numeric
  value is kept, keeping support for old bitcode.
  Differential Revision: https://reviews.llvm.org/D34474
  llvm-svn: 308208
* [LLVM][X86][Goldmont] Adding new target-cpu: Goldmont (Michael Zuckerman, 2017-06-29, 1 file, -1/+1)
  [LLVM SIDE] Connecting the Goldmont processor to its features.

  Reviewers:
  1. igorb
  2. zvi
  3. delena
  4. RKSimon
  5. craig.topper

  Differential Revision: https://reviews.llvm.org/D34504
  llvm-svn: 306658
* [X86] Adding vpopcntd and vpopcntq instructions (Oren Ben Simhon, 2017-05-25, 1 file, -0/+4)
  AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch
  represents the LLVM side of the addition of two new intrinsic based
  instructions (vpopcntd and vpopcntq).
  Differential Revision: https://reviews.llvm.org/D33169
  llvm-svn: 303858
* [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates. (Daniel Sanders, 2017-05-19, 1 file, -8/+1)
  Summary:
  This causes them to be re-computed more often than necessary but resolves
  objections that were raised post-commit on r301750.

  Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
  Reviewed By: qcolombet
  Subscribers: igorb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32861
  llvm-svn: 303418
* [X86] Replace slow LEA instructions in X86 (Lama Saba, 2017-05-18, 1 file, -0/+6)
  According to Intel's Optimization Reference Manual for SNB+:
  "For LEA instructions with three source operands and some specific situations,
  instruction latency has increased to 3 cycles, and must dispatch via port 1:
    - LEA that has all three source operands: base, index, and offset
    - LEA that uses base and index registers where the base is EBP, RBP, or R13
    - LEA that uses RIP relative addressing mode
    - LEA that uses 16-bit addressing mode"
  This patch currently handles the first 2 cases only.
  Differential Revision: https://reviews.llvm.org/D32277
  llvm-svn: 303333
* Revert "[X86] Replace slow LEA instructions in X86" (Reid Kleckner, 2017-05-16, 1 file, -6/+0)
  This reverts commit r303183, it broke various buildbots and introduced
  sanitizer errors.
  llvm-svn: 303199
* [X86] Replace slow LEA instructions in X86 (Lama Saba, 2017-05-16, 1 file, -0/+6)
  According to Intel's Optimization Reference Manual for SNB+:
  "For LEA instructions with three source operands and some specific situations,
  instruction latency has increased to 3 cycles, and must dispatch via port 1:
    - LEA that has all three source operands: base, index, and offset
    - LEA that uses base and index registers where the base is EBP, RBP, or R13
    - LEA that uses RIP relative addressing mode
    - LEA that uses 16-bit addressing mode"
  This patch currently handles the first 2 cases only.
  Differential Revision: https://reviews.llvm.org/D32277
  llvm-svn: 303183
* [X86][LWP] Add llvm support for LWP instructions (reapplied). (Simon Pilgrim, 2017-05-03, 1 file, -0/+4)
  This patch adds support for the LightWeight Profiling (LWP) instructions which
  are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).

  Reapplied - this time without changing line endings of existing files.
  Differential Revision: https://reviews.llvm.org/D32769
  llvm-svn: 302041
* Revert rL302028 due to accidental line ending changes. (Simon Pilgrim, 2017-05-03, 1 file, -4/+0)
  llvm-svn: 302038
* [X86][LWP] Add llvm support for LWP instructions. (Simon Pilgrim, 2017-05-03, 1 file, -0/+4)
  This patch adds support for the LightWeight Profiling (LWP) instructions which
  are available on all AMD Bulldozer class CPUs (bdver1 to bdver4).
  Differential Revision: https://reviews.llvm.org/D32769
  llvm-svn: 302028
* [globalisel][tablegen] Compute available feature bits correctly. (Daniel Sanders, 2017-04-29, 1 file, -1/+8)
  Summary:
  Predicate<> now has a field to indicate how often it must be recomputed.
  Currently, there are two frequencies, per-module (RecomputePerFunction==0) and
  per-function (RecomputePerFunction==1). Per-function predicates are currently
  recomputed more frequently than necessary since the only predicate in this
  category is cheap to test.

  Per-module predicates are now computed in getSubtargetImpl() while per-function
  predicates are computed in selectImpl().

  Tablegen now manages the PredicateBitset internally. It should only be
  necessary to add the required includes.

  Also fixed a problem revealed by the test case where
  constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI
  had already tied.

  Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
  Reviewed By: rovka
  Subscribers: kristof.beyls, igorb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D32491
  llvm-svn: 301750
* Rename FastString flag. (Clement Courbet, 2017-04-21, 1 file, -3/+3)
  llvm-svn: 300959
* X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copies when the subtarget has fast strings. (Clement Courbet, 2017-04-21, 1 file, -0/+4)
  This has two advantages:
    - Speed is improved. For example, on Haswell throughput improvements increase
      linearly with size from 256 to 512 bytes, after which they plateau
      (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
    - Code is much smaller (no need to handle boundaries).
  llvm-svn: 300957
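  For illustration only (not from the commit): the expansion boils down to a
  single REP MOVSB with RDI/RSI/RCX set up as destination, source, and byte
  count. A hand-written C++ equivalent using GCC/Clang extended asm could look
  like the sketch below; the helper name is made up for the example.

    #include <cstddef>

    // Copy n bytes with REP MOVSB: RDI = destination, RSI = source, RCX = count.
    static inline void rep_movsb_copy(void *dst, const void *src, std::size_t n) {
        asm volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    }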
* This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs. (Andrew V. Tischenko, 2017-04-14, 1 file, -0/+3)
  The details are here: https://reviews.llvm.org/D30941
  llvm-svn: 300311
* [AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line. (Craig Topper, 2017-03-17, 1 file, -2/+2)
  We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F
  was enabled but VLX and FMA weren't.

  I didn't make FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted
  disabling FMA to also disable AVX512. Instead we just can't prevent FMA
  instructions if AVX512 is enabled.

  Another option would be to promote 128/256-bit to 512-bit, do the operation and
  extract it. But that requires a lot of extra isel patterns. Since no CPUs exist
  that support AVX512 but not FMA, just using the VEX instructions seems better.
  llvm-svn: 298051
* [X86] Generate VZEROUPPER for Skylake-avx512. (Amjad Aboud, 2017-03-03, 1 file, -3/+5)
  VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512
  it should be.
  Differential Revision: https://reviews.llvm.org/D29874
  llvm-svn: 296859
* [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack (Petr Hosek, 2017-02-24, 1 file, -0/+1)
  The Fuchsia ABI defines slots from the thread pointer where the stack-guard
  value for stack-protector, and the unsafe stack pointer for safe-stack, are
  stored. This parallels the Android ABI support.

  Patch by Roland McGrath

  Differential Revision: https://reviews.llvm.org/D30237
  llvm-svn: 296081
* [X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs (Craig Topper, 2017-02-21, 1 file, -0/+4)
  Summary:
  Sandy Bridge and later CPUs have better throughput using a SHLD to implement
  rotate versus the normal rotate instructions. Additionally it saves one uop and
  avoids a partial flag update dependency.

  This patch implements this change on any Sandy Bridge or later processor
  without BMI2 instructions. With BMI2 we will use RORX as we currently do.

  Reviewers: zvi
  Reviewed By: zvi
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D30181
  llvm-svn: 295697
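  For context, a sketch (not from the commit) of the rotate idiom this lowering
  targets; whether SHLD is actually emitted depends on the target CPU and on
  BMI2 being absent.

    #include <cstdint>

    // Classic rotate-left idiom that the backend recognises as a rotate; per
    // the change above it can be lowered to SHLD with both inputs taken from
    // the same register on Sandy Bridge and later when BMI2/RORX is unavailable.
    std::uint32_t rotl32(std::uint32_t x, unsigned n) {
        n &= 31;                                   // keep the shift amount in range
        return (x << n) | (x >> ((32 - n) & 31));  // well defined even for n == 0
    }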
* [X86] Remove the HLE feature flag. (Craig Topper, 2017-02-09, 1 file, -4/+0)
  We only implemented it for one of the 3 HLE instructions and that instruction
  is also under the RTM flag. Clang only implements the RTM flag from its command
  line.
  llvm-svn: 294562
* [X86] Remove INVPCID and SMAP feature flags. They aren't currently used by any instructions and not tested. (Craig Topper, 2017-02-09, 1 file, -6/+0)
  If we implement intrinsics for their instructions in the future, the feature
  flags can be added back with proper testing.
  llvm-svn: 294561
* [X86] Clzero intrinsic and its addition under znver1 (Craig Topper, 2017-02-09, 1 file, -0/+4)
  This patch does the following:
  1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
  2. Identifies clzero feature using cpuid info. (Function: 8000_0008, Checks if EBX[0]=1)
  3. Adds the clzero feature under znver1 architecture.
  4. The custom inserter is added in Lowering.
  5. A testcase is added to check the intrinsic.
  6. The clzero instruction is added to assembler test.

  Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a
  disassembler test, and using update_llc_test.py from me.

  Differential revision: https://reviews.llvm.org/D29385
  llvm-svn: 294558
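  As a rough sketch (not part of the patch, assuming GCC/Clang's <cpuid.h>), the
  CPUID check described in point 2 could be written like this:

    #include <cpuid.h>

    // CLZERO support is reported in bit 0 of EBX for CPUID leaf 0x80000008.
    bool cpu_has_clzero() {
        unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
            return false;        // extended leaf not available
        return (ebx & 1u) != 0;  // EBX[0] == 1 means CLZERO is supported
    }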
* [X86] Add test for clflushopt intrinsic and only enable it to be selected if the feature flag is set. (Craig Topper, 2017-02-08, 1 file, -0/+1)
  llvm-svn: 294407
* [X86] Remove the VMFUNC feature flag. It was only partially implemented and we have no support for codegening vmfunc instructions today. (Craig Topper, 2017-02-08, 1 file, -3/+0)
  If that support ever gets added, the full feature flag support should come
  along with it.
  llvm-svn: 294406
* [X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it. (Craig Topper, 2017-02-08, 1 file, -3/+0)
  Intel's documentation for the deprecation:
  https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction
  llvm-svn: 294405
* [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). (Eugene Zelenko, 2017-02-02, 1 file, -9/+21)
  llvm-svn: 293949
* [X86] Tune bypassing of slow division for Intel CPUs (Nikolai Bozhenov, 2017-01-12, 1 file, -1/+1)
  64-bit integer division in Intel CPUs is extremely slow, much slower than
  32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any
  faster. The only important exception is Atom where DIV8 is fastest. Because of
  that, the patch
  1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores.
  2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This
     doesn't make the shorter division slower but increases chances of taking it.
     Moreover, it's much more likely to prove at compile-time that a value fits
     32 bits and doesn't require a run-time check (e.g. zext i32 to i64).
  Differential Revision: https://reviews.llvm.org/D28196
  llvm-svn: 291800
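  A rough illustration (not from the patch; the real transformation is done on
  IR by LLVM's slow-division bypass logic) of what the bypass amounts to:

    #include <cstdint>

    // If both operands fit in 32 bits, use the cheap 32-bit divide;
    // otherwise fall back to the slow 64-bit divide.
    std::uint64_t div_bypassed(std::uint64_t a, std::uint64_t b) {
        if (((a | b) >> 32) == 0)                        // both values fit in 32 bits
            return std::uint32_t(a) / std::uint32_t(b);  // fast 32-bit DIV
        return a / b;                                    // slow 64-bit DIV
    }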
* [X86] Prefer reduced width multiplication over pmulld on Silvermont (Zvi Rackover, 2016-12-06, 1 file, -0/+5)
  Summary:
  Prefer expansions such as: pmullw, pmulhw, unpacklwd, unpackhwd over pmulld.
  On Silvermont [source: Optimization Reference Manual]:
    PMULLD has a throughput of 1/11 [instruction/cycles].
    PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles].

  Fixes pr31202.

  Analysis of this issue was done by Fahana Aleen.

  Reviewers: wmi, delena, mkuper
  Subscribers: RKSimon, llvm-commits
  Differential Revision: https://reviews.llvm.org/D27203
  llvm-svn: 288844
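  For illustration only (an intrinsics sketch, not part of the patch), the
  preferred expansion builds full 32-bit products of 16-bit inputs from
  PMULLW/PMULHW plus unpacks instead of widening and using PMULLD:

    #include <emmintrin.h>  // SSE2

    // Multiply eight signed 16-bit lanes of a and b, producing the eight
    // full 32-bit products split across two vectors.
    void mul16x16_to_32(__m128i a, __m128i b, __m128i &lo, __m128i &hi) {
        __m128i lo16 = _mm_mullo_epi16(a, b);  // low 16 bits of each product (PMULLW)
        __m128i hi16 = _mm_mulhi_epi16(a, b);  // high 16 bits of each product (PMULHW)
        lo = _mm_unpacklo_epi16(lo16, hi16);   // products 0..3 as 32-bit lanes
        hi = _mm_unpackhi_epi16(lo16, hi16);   // products 4..7 as 32-bit lanes
    }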
* [PS4] Tighten up a triple check. (Paul Robinson, 2016-11-30, 1 file, -1/+1)
  llvm-svn: 288286
* [X86][GlobalISel] Add minimal call lowering support to the IRTranslator (Zvi Rackover, 2016-11-15, 1 file, -0/+13)
  Summary:
  Add basic functionality to support call lowering for X86. Currently only
  supports functions which return void and take zero arguments. Inspired by
  commit 286573.

  Reviewers: ab, qcolombet, t.p.northover
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D26593
  llvm-svn: 286935
* [X86] Take advantage of the lzcnt instruction on btver2 architectures when ORing comparisons to zero. (Pierre Gousseau, 2016-10-14, 1 file, -0/+4)
  This change adds transformations such as:
    zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
  to:
    srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)))

  This optimisation is beneficial on Jaguar architecture only, where lzcnt has a
  good reciprocal throughput. Other architectures such as Intel's
  Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For
  this reason the change also adds a "HasFastLZCNT" feature which gets enabled
  for Jaguar.

  Differential Revision: https://reviews.llvm.org/D23446
  llvm-svn: 284248
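  A small example (not from the commit) of the source pattern and the idea
  behind the rewrite:

    #include <cstdint>

    // OR of equality-with-zero tests: the pattern the combine targets.
    int either_is_zero(std::uint32_t x, std::uint32_t y) {
        return (x == 0) | (y == 0);
    }
    // On a fast-LZCNT target, lzcnt of a 32-bit value equals 32 (bit 5 set) only
    // when the value is zero, so (lzcnt(x) | lzcnt(y)) >> 5 is 1 exactly when
    // either input is zero, replacing two compare+set pairs.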
* [XRay] ARM 32-bit no-Thumb support in LLVM (Dean Michael Berris, 2016-09-19, 1 file, -0/+2)
  This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay
  instrumentation support is moving up to AsmPrinter.
  This is one of 3 commits to different repositories of XRay ARM port. The other
  2 are:
    https://reviews.llvm.org/D23932 (Clang test)
    https://reviews.llvm.org/D23933 (compiler-rt)
  Differential Revision: https://reviews.llvm.org/D23931
  llvm-svn: 281878
* Revert "[XRay] ARM 32-bit no-Thumb support in LLVM" (Renato Golin, 2016-09-08, 1 file, -2/+0)
  And associated commits, as they broke the Thumb bots.
  This reverts commit r280935.
  This reverts commit r280891.
  This reverts commit r280888.
  llvm-svn: 280967
* [XRay] ARM 32-bit no-Thumb support in LLVM (Dean Michael Berris, 2016-09-08, 1 file, -0/+2)
  This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay
  instrumentation support is moving up to AsmPrinter.
  This is one of 3 commits to different repositories of XRay ARM port. The other
  2 are:
    1. https://reviews.llvm.org/D23932 (Clang test)
    2. https://reviews.llvm.org/D23933 (compiler-rt)
  Differential Revision: https://reviews.llvm.org/D23931
  llvm-svn: 280888
* [X86] Heuristic to selectively build Newton-Raphson SQRT estimation (Nikolai Bozhenov, 2016-08-04, 1 file, -0/+10)
  On modern Intel processors hardware SQRT in many cases is faster than RSQRT
  followed by Newton-Raphson refinement. The patch introduces a simple heuristic
  to choose between hardware SQRT instruction and Newton-Raphson software
  estimation.

  The patch treats scalars and vectors differently. The heuristic is that for
  scalars the compiler should optimize for latency while for vectors it should
  optimize for throughput. It is based on the assumption that throughput bound
  code is likely to be vectorized.

  Basically, the patch disables scalar NR for big cores and disables NR
  completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code
  in big cores. Secondly, vector SQRT has been greatly improved in Skylake and
  has better throughput compared to NR.

  Differential Revision: https://reviews.llvm.org/D21379
  llvm-svn: 277725
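  For reference, a sketch (not from the patch) of the Newton-Raphson estimation
  being weighed against a plain hardware SQRT:

    #include <xmmintrin.h>  // SSE

    // One Newton-Raphson step refining the ~12-bit RSQRTSS approximation,
    // then multiplying by the input: sqrt(a) ~= a * rsqrt(a).
    float sqrt_nr_estimate(float a) {
        float x0 = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(a)));  // rsqrt estimate
        float x1 = x0 * (1.5f - 0.5f * a * x0 * x0);            // NR refinement step
        return a * x1;
    }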
* Convert a few more comparisons to isPositionIndependent(). NFC. (Rafael Espindola, 2016-06-27, 1 file, -3/+1)
  llvm-svn: 273945
* Simplify PICStyles. (Rafael Espindola, 2016-06-20, 1 file, -6/+4)
  The main difference is that StubDynamicNoPIC is gone. The dynamic-no-pic mode
  as the name implies is simply not pic. It is just conservative about what it
  assumes to be dso local.
  llvm-svn: 273222
* Delete dead code. NFC. (Rafael Espindola, 2016-06-20, 1 file, -8/+0)
  llvm-svn: 273206
* [X86Subtarget] Use isPositionIndependent(). NFC. (Davide Italiano, 2016-06-18, 1 file, -0/+4)
  Differential Revision: http://reviews.llvm.org/D21480
  llvm-svn: 273071
* [X86] Reduce memory allocations in X86TargetMachine::getSubtargetImpl (David Majnemer, 2016-05-20, 1 file, -1/+1)
  We performed a number of memory allocations each time getTTI was called; remove
  them by using SmallString.
  No functionality change intended.
  llvm-svn: 270246
* Refactor X86 symbol access classification. (Rafael Espindola, 2016-05-20, 1 file, -7/+6)
  This refactors the logic in X86 to avoid code duplication. It also splits it in
  two steps: it first decides if a symbol is local to the DSO and then uses that
  information to decide how to access it.

  The first part is implemented by shouldAssumeDSOLocal. It is not in any way
  specific to X86. In a followup patch I intend to move it to somewhere common
  and reuse it in other backends.
  llvm-svn: 270209