summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86Subtarget.cpp
Commit message (Collapse)AuthorAgeFilesLines
* [X86] Move GISel accessor initialization from TargetMachine to Subtarget.Quentin Colombet2017-07-011-0/+55
| | | | | | NFC llvm-svn: 306921
* Sort the remaining #include lines in include/... and lib/....Chandler Carruth2017-06-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is *entirely* mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787
* [X86] Adding vpopcntd and vpopcntq instructionsOren Ben Simhon2017-05-251-0/+1
| | | | | | | | | AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858
* [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to ↵Daniel Sanders2017-05-191-4/+2
| | | | | | | | | | | | | | | | | | per-function predicates. Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32861 llvm-svn: 303418
* [X86] Replace slow LEA instructions in X86Lama Saba2017-05-181-0/+1
| | | | | | | | | | | | | | | According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303333
* [X86] Disabling PLT in Regcall CC FunctionsOren Ben Simhon2017-05-041-2/+8
| | | | | | | | | | According to psABI, PLT stub clobbers XMM8-XMM15. In Regcall calling convention those registers are used for passing parameters. Thus we need to prevent lazy binding in Regcall. Differential Revision: https://reviews.llvm.org/D32430 llvm-svn: 302124
* [X86][LWP] Add llvm support for LWP instructions (reapplied).Simon Pilgrim2017-05-031-0/+1
| | | | | | | | | | This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041
* Revert rL302028 due to accidental line ending changes.Simon Pilgrim2017-05-031-1/+0
| | | | llvm-svn: 302038
* [X86][LWP] Add llvm support for LWP instructions.Simon Pilgrim2017-05-031-0/+1
| | | | | | | | This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028
* X86: initialize a few subtarget variables.Tim Northover2017-05-011-0/+3
| | | | | | Otherwise an indeterminate value gets read, causing a bunch of UBSan failures. llvm-svn: 301819
* [globalisel][tablegen] Compute available feature bits correctly.Daniel Sanders2017-04-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750
* Rename FastString flag.Clement Courbet2017-04-211-1/+1
| | | | llvm-svn: 300959
* X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copiesClement Courbet2017-04-211-0/+1
| | | | | | | | | | | | when the subtarget has fast strings. This has two advantages: - Speed is improved. For example, on Haswell thoughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau: (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes). - Code is much smaller (no need to handle boundaries). llvm-svn: 300957
* [X86] Generate VZEROUPPER for Skylake-avx512.Amjad Aboud2017-03-031-1/+1
| | | | | | | | VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859
* [X86] Use SHLD with both inputs from the same register to implement rotate ↵Craig Topper2017-02-211-0/+1
| | | | | | | | | | | | | | | | | | | on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697
* [X86] Remove the HLE feature flag.Craig Topper2017-02-091-1/+0
| | | | | | We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562
* [X86] Clzero intrinsic and its addition under znver1Craig Topper2017-02-091-0/+1
| | | | | | | | | | | | | | | | | This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558
* [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵Eugene Zelenko2017-02-021-6/+7
| | | | | | minor fixes (NFC). llvm-svn: 293949
* X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).Peter Collingbourne2017-02-021-2/+12
| | | | | | Differential Revision: https://reviews.llvm.org/D28689 llvm-svn: 293844
* Remove an overeager assert from r288844.Joerg Sonnenberger2017-01-171-3/+0
| | | | llvm-svn: 292244
* IR, X86: Understand !absolute_symbol metadata on global variables.Peter Collingbourne2016-12-081-0/+4
| | | | | | | | | | | | | | | | | Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087
* [X86] Prefer reduced width multiplication over pmulld on SilvermontZvi Rackover2016-12-061-0/+4
| | | | | | | | | | | | | | | | | | | | Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844
* [X86][GlobalISel] Add minimal call lowering support to the IRTranslatorZvi Rackover2016-11-151-0/+20
| | | | | | | | | | | | | | | Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935
* [X86] Take advantage of the lzcnt instruction on btver2 architectures when ↵Pierre Gousseau2016-10-141-0/+1
| | | | | | | | | | | | | | | | ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)) This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248
* [X86] Heuristic to selectively build Newton-Raphson SQRT estimationNikolai Bozhenov2016-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725
* Delete unused includes. NFC.Rafael Espindola2016-06-301-1/+0
| | | | llvm-svn: 274225
* Drop support for creating $stubs.Rafael Espindola2016-06-291-7/+0
| | | | | | They are created by ld64 since OS X 10.5. llvm-svn: 274130
* Move shouldAssumeDSOLocal to Target.Rafael Espindola2016-06-271-3/+2
| | | | | | Should fix the shared library build. llvm-svn: 273958
* Simplify PICStyles.Rafael Espindola2016-06-201-14/+6
| | | | | | | | The main difference is that StubDynamicNoPIC is gone. The dynamic-no-pic mode as the name implies is simply not pic. It is just conservative about what it assumes to be dso local. llvm-svn: 273222
* [X86Subtarget] Use isPositionIndependent(). NFC.Davide Italiano2016-06-181-3/+3
| | | | | | Differential Revision: http://reviews.llvm.org/D21480 llvm-svn: 273071
* Use shouldAssumeDSOLocal on AArch64.Rafael Espindola2016-05-261-43/+1
| | | | | | This reduces code duplication and now AArch64 also handles PIE. llvm-svn: 270844
* Fix shouldAssumeDSOLocal for private linkage.Rafael Espindola2016-05-251-1/+1
| | | | llvm-svn: 270746
* [X86] Reduce memory allocations in X86TargetMachine::getSubtargetImplDavid Majnemer2016-05-201-2/+2
| | | | | | | | We performed a number of memory allocations each time getTTI was called, remove them by using SmallString. No functionality change intended. llvm-svn: 270246
* Refactor X86 symbol access classification.Rafael Espindola2016-05-201-100/+110
| | | | | | | | | | | | This refactors the logic in X86 to avoid code duplication. It also splits it in two steps: it first decides if a symbol is local to the DSO and then uses that information to decide how to access it. The first part is implemented by shouldAssumeDSOLocal. It is not in any way specific to X86. In a followup patch I intend to move it to somewhere common and reused it in other backends. llvm-svn: 270209
* Record a TargetMachine instead of a Reloc::Model.Rafael Espindola2016-05-191-6/+7
| | | | | | Addresses r270095's code review. llvm-svn: 270147
* Remember the relocation model. NFC.Rafael Espindola2016-05-191-9/+8
| | | | | | This avoids passing a TargetMachine in a few places. llvm-svn: 270095
* Style fixes. NFC.Rafael Espindola2016-05-191-3/+3
| | | | llvm-svn: 270093
* Add new flag and intrinsic support for MWAITX and MONITORX instructionsAshutosh Nema2016-05-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29. The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". These opcode information is used in adding tests for the disassembler. These instructions are enabled for AMD's bdver4 architecture. Patch by Ganesh Gopalasubramanian! Reviewers: echristo, craig.topper, RKSimon Subscribers: RKSimon, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19795 llvm-svn: 269911
* Simplify handling of hidden stub.Rafael Espindola2016-05-171-3/+3
| | | | | | | | | Since r207518 they are printed exactly like non-hidden stubs on x86 and since r207517 on ARM. This means we can use a single set for all stubs in those platforms. llvm-svn: 269776
* [X86] Extend some Linux special cases to cover kFreeBSD.Marcin Koscielnicki2016-05-051-2/+2
| | | | | | | | | | | | Both Linux and kFreeBSD use glibc, so follow similiar code paths. Add isTargetGlibc to check for this, and use it instead of isTargetLinux in a few places. Fixes PR22248 for kFreeBSD. Differential Revision: http://reviews.llvm.org/D19104 llvm-svn: 268624
* Differential Revision: http://reviews.llvm.org/D19733Sriraman Tallam2016-04-291-2/+1
| | | | llvm-svn: 268106
* Differential Revision: http://reviews.llvm.org/D19040Sriraman Tallam2016-04-221-4/+11
| | | | llvm-svn: 267229
* [X86] enable PIE for functionsAsaf Badouh2016-04-201-0/+29
| | | | | | | | Call locally defined function directly for PIE/fPIE Differential Revision: http://reviews.llvm.org/D19226 llvm-svn: 266863
* Test commit.Sriraman Tallam2016-04-111-1/+1
| | | | llvm-svn: 265976
* [X86] Introduction of FeatureX87.Andrey Turetskiy2016-03-231-0/+1
| | | | | | | | Add FeatureX87 in X86 backend to be able to define CPUs which doesn't have x87. Differential Revision: http://reviews.llvm.org/D13979 llvm-svn: 264148
* Disable the vzeroupper insertion pass on PS4.Yunzhong Gao2016-02-121-0/+1
| | | | | | Differential Revision: http://reviews.llvm.org/D16837 llvm-svn: 260764
* Added Skylake client to X86 targets and featuresElena Demikhovsky2016-01-241-0/+1
| | | | | | | | | | | | | Changes in X86.td: I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X .. I added Skylake client processor and defined it's features FeatureADX was missing on KNL Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others Differential Revision: http://reviews.llvm.org/D16357 llvm-svn: 258659
* [AVX512] adding AVXVBMI feature flagMichael Zuckerman2016-01-171-0/+1
| | | | | | | | | | The feature flag is for VPERMB,VPERMI2B,VPERMT2B and VPMULTISHIFTQB instructions. More about the instruction can be found in: hattps://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Differential Revision: http://reviews.llvm.org/D16190 llvm-svn: 258012
* [x86] adding PKU feature flagAsaf Badouh2015-12-151-0/+1
| | | | | | | | | the feature flag is essential for RDPKRU and WRPKRU instruction more about the instruction can be found in the SDM rev 56, vol 2 from http://www.intel.com/sdm Differential Revision: http://reviews.llvm.org/D15491 llvm-svn: 255644
* X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supportedHans Wennborg2015-12-041-0/+10
| | | | | | | | | | | | | | | These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 llvm-svn: 254793
OpenPOWER on IntegriCloud