summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86Subtarget.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Add missing initialization for the HasPREFETCHWT1 subtarget variable.Craig Topper2017-12-221-0/+1
| | | | llvm-svn: 321340
* [X86] Fix uninitialized variable sanitizer warning from rL321074Simon Pilgrim2017-12-191-0/+1
| | | | llvm-svn: 321076
* X86/AArch64/ARM: Factor out common sincos_stret logic; NFCIMatthias Braun2017-12-181-9/+0
| | | | | | | | | | | Note: - X86ISelLowering: setLibcallName(SINCOS) was superfluous as InitLibcalls() already does it. - ARMISelLowering: Setting libcallnames for sincos/sincosf seemed superfluous as in the darwin case it wouldn't be used while for all other cases InitLibcalls already does it. llvm-svn: 321036
* AArch64/X86: Factor out common bzero logic; NFCMatthias Braun2017-12-181-13/+0
| | | | llvm-svn: 321035
* Remove redundant includes from lib/Target/X86.Michael Zolotukhin2017-12-131-4/+0
| | | | llvm-svn: 320636
* Control-Flow Enforcement Technology - Shadow Stack support (LLVM side)Oren Ben Simhon2017-11-261-0/+2
| | | | | | | | | | | | | | | | | | Shadow stack solution introduces a new stack for return addresses only. The HW has a Shadow Stack Pointer (SSP) that points to the next return address. If we return to a different address, an exception is triggered. The shadow stack is managed using a series of intrinsics that are introduced in this patch as well as the new register (SSP). The intrinsics are mapped to new instruction set that implements CET mechanism. The patch also includes initial infrastructure support for IBT. For more information, please see the following: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf Differential Revision: https://reviews.llvm.org/D40223 Change-Id: I4daa1f27e88176be79a4ac3b4cd26a459e88fed4 llvm-svn: 318996
* [x86][icelake]GFNICoby Tayree2017-11-261-0/+1
| | | | | | | | | | galois field arithmetic (GF(2^8)) insns: gf2p8affineinvqb gf2p8affineqb gf2p8mulb Differential Revision: https://reviews.llvm.org/D40373 llvm-svn: 318993
* [X86] Don't report gather is legal on Skylake CPUs when AVX2/AVX512 is ↵Craig Topper2017-11-251-6/+6
| | | | | | | | | | | | | | | | | | | disabled. Allow gather on SKX/CNL/ICL when AVX512 is disabled by using AVX2 instructions. Summary: This adds a new fast gather feature bit to cover all CPUs that support fast gather that we can use independent of whether the AVX512 feature is enabled. I'm only using this new bit to qualify AVX2 codegen. AVX512 is still implicitly assuming fast gather to keep tests working and to match the scatter behavior. Test command lines have been added for these two cases. Reviewers: magabari, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D40282 llvm-svn: 318983
* [x86][icelake]BITALGCoby Tayree2017-11-211-0/+1
| | | | | | | vpopcnt{b,w} Differential Revision: https://reviews.llvm.org/D40213 llvm-svn: 318748
* [x86][icelake]VNNICoby Tayree2017-11-211-0/+1
| | | | | | | | | Introducing Vector Neural Network Instructions, consisting of: vpdpbusd{s} vpdpwssd{s} Differential Revision: https://reviews.llvm.org/D40208 llvm-svn: 318746
* [x86][icelake]vbmi2Coby Tayree2017-11-211-0/+1
| | | | | | | | | | | introducing vbmi2, consisting of vpcompress{b,w} vpexpand{b,w} vpsh{l,r}d{w,d,q} vpsh{l,r}dv{w,d,q} Differential Revision: https://reviews.llvm.org/D40206 llvm-svn: 318745
* [x86][icelake]vpclmulqdq introductionCoby Tayree2017-11-211-0/+1
| | | | | | | an icelake promotion of pclmulqdq Differential Revision: https://reviews.llvm.org/D40101 llvm-svn: 318741
* [x86][icelake]VAES introductionCoby Tayree2017-11-211-0/+1
| | | | | | | an icelake promotion of AES Differential Revision: https://reviews.llvm.org/D40078 llvm-svn: 318740
* [X86] Fix 80 column violation and remove trailing whitespace. NFCCraig Topper2017-11-191-7/+8
| | | | llvm-svn: 318611
* Attribute nonlazybind should not affect calls to functions with hidden ↵Sriraman Tallam2017-11-081-9/+4
| | | | | | | | visibility. Differential Revision: https://reviews.llvm.org/D39625 llvm-svn: 317639
* Avoid PLT for external calls when attribute nonlazybind is used.Sriraman Tallam2017-11-031-2/+9
| | | | | | Differential Revision: https://reviews.llvm.org/D39065 llvm-svn: 317292
* Handle undefined weak hidden symbols on all architectures.Rafael Espindola2017-10-271-17/+1
| | | | | | | | | | | | | | | | We were handling the non-hidden case in lib/Target/TargetMachine.cpp, but the hidden case was handled in architecture dependent code and only X86_64 and AArch64 were covered. While it is true that some code sequences in some ABIs might be able to produce the correct value at runtime, that doesn't seem to be the common case. I left the AArch64 code in place since it also forces a got access for non-pic code. It is not clear if that is needed, but it is probably better to change that in another commit. llvm-svn: 316799
* [X86] Remove the SlowBTMem feature flag entirelyCraig Topper2017-10-151-1/+0
| | | | | | Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill of the feature flag and just fix up the comments. llvm-svn: 315862
* [X86] Adding X86 Processor FamiliesMohammed Agabaria2017-09-131-0/+14
| | | | | | | | | Adding x86 Processor families to initialize several uArch properties (based on the family) This patch shows how gather cost can be initialized based on the proc. family Differential Revision: https://reviews.llvm.org/D35348 llvm-svn: 313132
* [X86] Provide a separate feature bit for macro fusion support instead of ↵Craig Topper2017-08-301-0/+1
| | | | | | | | | | | | | | | | | | | | | basing it on the AVX flag Summary: Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge". This is really strange as now AMD supports AVX. It also means if user explicitly disables AVX we disable macro fusion. This patch adds an explicit macro fusion feature. I've also enabled for the generic 64-bit CPU (which doesn't have AVX) This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature. Reviewers: spatel, chandlerc, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37280 llvm-svn: 312097
* Mark Knights Landing as having slow two memory operand instructionsCraig Topper2017-08-291-1/+1
| | | | | | | | | | | | | | | | Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly. Patch by David Zarzycki. Reviewers: craig.topper Reviewed By: craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37224 llvm-svn: 311979
* Reapply "[GlobalISel] Remove the GISelAccessor API."Quentin Colombet2017-08-151-41/+8
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r310425, thus reapplying r310335 with a fix for link issue of the AArch64 unittests on Linux bots when BUILD_SHARED_LIBS is ON. Original commit message: [GlobalISel] Remove the GISelAccessor API. Its sole purpose was to avoid spreading around ifdefs related to building global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can get rid of this mechanism all together. NFC. ---- The fix for the link issue consists in adding the GlobalISel library in the list of dependencies for the AArch64 unittests. This dependency comes from the use of AArch64Subtarget that needs to know how to destruct the GISel related APIs when being detroyed. Thanks to Bill Seurer and Ahmed Bougacha for helping me reproducing and understand the problem. llvm-svn: 310969
* Fix access to undefined weak symbols in pic codeRafael Espindola2017-08-111-1/+17
| | | | | | | | | | When the access to a weak symbol is not a call, the access has to be able to produce the value 0 at runtime. We were sometimes producing code sequences where that was not possible if the code was leaded more than 4g away from 0. llvm-svn: 310756
* Revert "[GlobalISel] Remove the GISelAccessor API."Quentin Colombet2017-08-081-8/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit r310115. It causes a linker failure for the one of the unittests of AArch64 on one of the linux bot: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/3429 : && /home/fedora/gcc/install/gcc-7.1.0/bin/g++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -O2 -L/home/fedora/gcc/install/gcc-7.1.0/lib64 -Wl,-allow-shlib-undefined -Wl,-O3 -Wl,--gc-sections unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o -o unittests/Target/AArch64/AArch64Tests lib/libLLVMAArch64CodeGen.so.6.0.0svn lib/libLLVMAArch64Desc.so.6.0.0svn lib/libLLVMAArch64Info.so.6.0.0svn lib/libLLVMCodeGen.so.6.0.0svn lib/libLLVMCore.so.6.0.0svn lib/libLLVMMC.so.6.0.0svn lib/libLLVMMIRParser.so.6.0.0svn lib/libLLVMSelectionDAG.so.6.0.0svn lib/libLLVMTarget.so.6.0.0svn lib/libLLVMSupport.so.6.0.0svn -lpthread lib/libgtest_main.so.6.0.0svn lib/libgtest.so.6.0.0svn -lpthread -Wl,-rpath,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && : unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x0): undefined reference to `vtable for llvm::LegalizerInfo' unittests/Target/AArch64/CMakeFiles/AArch64Tests.dir/InstSizes.cpp.o:(.toc+0x8): undefined reference to `vtable for llvm::RegisterBankInfo' The particularity of this bot is that it is built with BUILD_SHARED_LIBS=ON However, I was not able to reproduce the problem so far. Reverting to unblock the bot. llvm-svn: 310425
* [X86] Teach fastisel to select calls to dllimport functionsReid Kleckner2017-08-051-1/+6
| | | | | | | | | | | | | | Summary: Direct calls to dllimport functions are very common Windows. We should add them to the -O0 fast path. Reviewers: rafael Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D36197 llvm-svn: 310152
* [GlobalISel] Remove the GISelAccessor API.Quentin Colombet2017-08-041-41/+8
| | | | | | | | | | Its sole purpose was to avoid spreading around ifdefs related to building global-isel. Since r309990, GlobalISel is not optional anymore, thus, we can get rid of this mechanism all together. NFC. llvm-svn: 310115
* [GlobalISel] Make GlobalISel a non-optional library.Quentin Colombet2017-08-031-10/+0
| | | | | | | | With this change, the GlobalISel library gets always built. In particular, this is not possible to opt GlobalISel out of the build using the LLVM_BUILD_GLOBAL_ISEL variable any more. llvm-svn: 309990
* [CodeGen][X86] Fuchsia supports sincos* libcalls and sin+cos->sincos ↵Petr Hosek2017-07-231-3/+6
| | | | | | | | | | optimization Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D35748 llvm-svn: 308854
* [X86] Move GISel accessor initialization from TargetMachine to Subtarget.Quentin Colombet2017-07-011-0/+55
| | | | | | NFC llvm-svn: 306921
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* [X86] Adding vpopcntd and vpopcntq instructionsOren Ben Simhon2017-05-251-0/+1
| | | | | | | | | AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858
* [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to ↵Daniel Sanders2017-05-191-4/+2
| | | | | | | | | | | | | | | | | | per-function predicates. Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32861 llvm-svn: 303418
* [X86] Replace slow LEA instructions in X86Lama Saba2017-05-181-0/+1
| | | | | | | | | | | | | | | According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303333
* [X86] Disabling PLT in Regcall CC FunctionsOren Ben Simhon2017-05-041-2/+8
| | | | | | | | | | According to psABI, PLT stub clobbers XMM8-XMM15. In Regcall calling convention those registers are used for passing parameters. Thus we need to prevent lazy binding in Regcall. Differential Revision: https://reviews.llvm.org/D32430 llvm-svn: 302124
* [X86][LWP] Add llvm support for LWP instructions (reapplied).Simon Pilgrim2017-05-031-0/+1
| | | | | | | | | | This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041
* Revert rL302028 due to accidental line ending changes.Simon Pilgrim2017-05-031-1/+0
| | | | llvm-svn: 302038
* [X86][LWP] Add llvm support for LWP instructions.Simon Pilgrim2017-05-031-0/+1
| | | | | | | | This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028
* X86: initialize a few subtarget variables.Tim Northover2017-05-011-0/+3
| | | | | | Otherwise an indeterminate value gets read, causing a bunch of UBSan failures. llvm-svn: 301819
* [globalisel][tablegen] Compute available feature bits correctly.Daniel Sanders2017-04-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750
* Rename FastString flag.Clement Courbet2017-04-211-1/+1
| | | | llvm-svn: 300959
* X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copiesClement Courbet2017-04-211-0/+1
| | | | | | | | | | | | when the subtarget has fast strings. This has two advantages: - Speed is improved. For example, on Haswell thoughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau: (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes). - Code is much smaller (no need to handle boundaries). llvm-svn: 300957
* [X86] Generate VZEROUPPER for Skylake-avx512.Amjad Aboud2017-03-031-1/+1
| | | | | | | | VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859
* [X86] Use SHLD with both inputs from the same register to implement rotate ↵Craig Topper2017-02-211-0/+1
| | | | | | | | | | | | | | | | | | | on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697
* [X86] Remove the HLE feature flag.Craig Topper2017-02-091-1/+0
| | | | | | We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562
* [X86] Clzero intrinsic and its addition under znver1Craig Topper2017-02-091-0/+1
| | | | | | | | | | | | | | | | | This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558
* [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵Eugene Zelenko2017-02-021-6/+7
| | | | | | minor fixes (NFC). llvm-svn: 293949
* X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).Peter Collingbourne2017-02-021-2/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D28689 llvm-svn: 293844
* Remove an overeager assert from r288844.Joerg Sonnenberger2017-01-171-3/+0
| | | | llvm-svn: 292244
* IR, X86: Understand !absolute_symbol metadata on global variables.Peter Collingbourne2016-12-081-0/+4
| | | | | | | | | | | | | | | | | Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087
* [X86] Prefer reduced width multiplication over pmulld on SilvermontZvi Rackover2016-12-061-0/+4
| | | | | | | | | | | | | | | | | | | | Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844
OpenPOWER on IntegriCloud