summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/X86/X86Subtarget.h
Commit message (Collapse)AuthorAgeFilesLines
* [LLVM][X86][Goldmont] Adding new target-cpu: GoldmontMichael Zuckerman2017-06-291-1/+1
| | | | | | | | | | | | | | | | [LLVM SIDE] Connecting the GoldMont processor to his feature. Reviewers: 1. igorb 2. zvi 3. delena 4. RKSimon 5. craig.topper Differential Revision: https://reviews.llvm.org/D34504 llvm-svn: 306658
* [X86] Adding vpopcntd and vpopcntq instructionsOren Ben Simhon2017-05-251-0/+4
| | | | | | | | | AVX512_VPOPCNTDQ is a new feature set that was published by Intel. The patch represents the LLVM side of the addition of two new intrinsic based instructions (vpopcntd and vpopcntq). Differential Revision: https://reviews.llvm.org/D33169 llvm-svn: 303858
* [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to ↵Daniel Sanders2017-05-191-8/+1
| | | | | | | | | | | | | | | | | | per-function predicates. Summary: This causes them to be re-computed more often than necessary but resolves objections that were raised post-commit on r301750. Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls Reviewed By: qcolombet Subscribers: igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32861 llvm-svn: 303418
* [X86] Replace slow LEA instructions in X86Lama Saba2017-05-181-0/+6
| | | | | | | | | | | | | | | According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303333
* Revert "[X86] Replace slow LEA instructions in X86"Reid Kleckner2017-05-161-6/+0
| | | | | | | This reverts commit r303183, it broke various buildbots and introduced sanitizer errors. llvm-svn: 303199
* [X86] Replace slow LEA instructions in X86Lama Saba2017-05-161-0/+6
| | | | | | | | | | | | | | | According to Intel's Optimization Reference Manual for SNB+: " For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must dispatch via port 1: - LEA that has all three source operands: base, index, and offset - LEA that uses base and index registers where the base is EBP, RBP,or R13 - LEA that uses RIP relative addressing mode - LEA that uses 16-bit addressing mode " This patch currently handles the first 2 cases only. Differential Revision: https://reviews.llvm.org/D32277 llvm-svn: 303183
* [X86][LWP] Add llvm support for LWP instructions (reapplied).Simon Pilgrim2017-05-031-0/+4
| | | | | | | | | | This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Reapplied - this time without changing line endings of existing files. Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302041
* Revert rL302028 due to accidental line ending changes.Simon Pilgrim2017-05-031-4/+0
| | | | llvm-svn: 302038
* [X86][LWP] Add llvm support for LWP instructions.Simon Pilgrim2017-05-031-0/+4
| | | | | | | | This patch adds support for the the LightWeight Profiling (LWP) instructions which are available on all AMD Bulldozer class CPUs (bdver1 to bdver4). Differential Revision: https://reviews.llvm.org/D32769 llvm-svn: 302028
* [globalisel][tablegen] Compute available feature bits correctly.Daniel Sanders2017-04-291-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Predicate<> now has a field to indicate how often it must be recomputed. Currently, there are two frequencies, per-module (RecomputePerFunction==0) and per-function (RecomputePerFunction==1). Per-function predicates are currently recomputed more frequently than necessary since the only predicate in this category is cheap to test. Per-module predicates are now computed in getSubtargetImpl() while per-function predicates are computed in selectImpl(). Tablegen now manages the PredicateBitset internally. It should only be necessary to add the required includes. Also fixed a problem revealed by the test case where constrainSelectedInstRegOperands() would attempt to tie operands that BuildMI had already tied. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D32491 llvm-svn: 301750
* Rename FastString flag.Clement Courbet2017-04-211-3/+3
| | | | llvm-svn: 300959
* X86 memcpy: use REPMOVSB instead of REPMOVS{Q,D,W} for inline copiesClement Courbet2017-04-211-0/+4
| | | | | | | | | | | | when the subtarget has fast strings. This has two advantages: - Speed is improved. For example, on Haswell thoughput improvements increase linearly with size from 256 to 512 bytes, after which they plateau: (e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes). - Code is much smaller (no need to handle boundaries). llvm-svn: 300957
* This patch closes PR#32216: Better testing of schedule model instruction ↵Andrew V. Tischenko2017-04-141-0/+3
| | | | | | | | latencies/throughputs. The details are here: https://reviews.llvm.org/D30941 llvm-svn: 300311
* [AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled ↵Craig Topper2017-03-171-2/+2
| | | | | | | | | | | | regardless of whether +fma was added on the command line. We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't. I didn't mask FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled. Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512, but not FMA just using the VEX instructions seems better. llvm-svn: 298051
* [X86] Generate VZEROUPPER for Skylake-avx512.Amjad Aboud2017-03-031-3/+5
| | | | | | | | VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859
* [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stackPetr Hosek2017-02-241-0/+1
| | | | | | | | | | | | The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081
* [X86] Use SHLD with both inputs from the same register to implement rotate ↵Craig Topper2017-02-211-0/+4
| | | | | | | | | | | | | | | | | | | on Sandy Bridge and later Intel CPUs Summary: Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency. This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do. Reviewers: zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30181 llvm-svn: 295697
* [X86] Remove the HLE feature flag.Craig Topper2017-02-091-4/+0
| | | | | | We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line. llvm-svn: 294562
* [X86] Remove INVPCID and SMAP feature flags. They aren't currently used by ↵Craig Topper2017-02-091-6/+0
| | | | | | | | any instructions and not tested. If we implement intrinsics for their instructions in the future, the feature flags can be added back with proper testing. llvm-svn: 294561
* [X86] Clzero intrinsic and its addition under znver1Craig Topper2017-02-091-0/+4
| | | | | | | | | | | | | | | | | This patch does the following. 1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero 2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1) 3. Adds the clzero feature under znver1 architecture. 4. The custom inserter is added in Lowering. 5. A testcase is added to check the intrinsic. 6. The clzero instruction is added to assembler test. Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me. Differential revision: https://reviews.llvm.org/D29385 llvm-svn: 294558
* [X86] Add test for clflushopt intrinsic and only enable it to be selected if ↵Craig Topper2017-02-081-0/+1
| | | | | | the feature flag is set. llvm-svn: 294407
* [X86] Remove the VMFUNC feature flag. It was only partially implemented and ↵Craig Topper2017-02-081-3/+0
| | | | | | | | we have no support for codegening vmfunc instructions today. If that support ever gets added, the full feature flag support should come along with it. llvm-svn: 294406
* [X86] Remove PCOMMIT instruction support since Intel has deprecated this ↵Craig Topper2017-02-081-3/+0
| | | | | | | | instruction with no plans to release products with it. Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction llvm-svn: 294405
* [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵Eugene Zelenko2017-02-021-9/+21
| | | | | | minor fixes (NFC). llvm-svn: 293949
* [X86] Tune bypassing of slow division for Intel CPUsNikolai Bozhenov2017-01-121-1/+1
| | | | | | | | | | | | | | | | | | 64-bit integer division in Intel CPUs is extremely slow, much slower than 32-bit division. On the other hand, 8-bit and 16-bit divisions aren't any faster. The only important exception is Atom where DIV8 is fastest. Because of that, the patch 1) Enables bypassing of 64-bit division for Atom, Silvermont and all big cores. 2) Modifies 64-bit bypassing to use 32-bit division instead of 16-bit one. This doesn't make the shorter division slower but increases chances of taking it. Moreover, it's much more likely to prove at compile-time that a value fits 32 bits and doesn't require a run-time check (e.g. zext i32 to i64). Differential Revision: https://reviews.llvm.org/D28196 llvm-svn: 291800
* [X86] Prefer reduced width multiplication over pmulld on SilvermontZvi Rackover2016-12-061-0/+5
| | | | | | | | | | | | | | | | | | | | Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844
* [PS4] Tighten up a triple check.Paul Robinson2016-11-301-1/+1
| | | | llvm-svn: 288286
* [X86][GlobalISel] Add minimal call lowering support to the IRTranslatorZvi Rackover2016-11-151-0/+13
| | | | | | | | | | | | | | | Summary: Add basic functionality to support call lowering for X86. Currently only supports functions which return void and take zero arguments. Inspired by commit 286573. Reviewers: ab, qcolombet, t.p.northover Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26593 llvm-svn: 286935
* [X86] Take advantage of the lzcnt instruction on btver2 architectures when ↵Pierre Gousseau2016-10-141-0/+4
| | | | | | | | | | | | | | | | ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)) This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248
* [XRay] ARM 32-bit no-Thumb support in LLVMDean Michael Berris2016-09-191-0/+2
| | | | | | | | | | | | This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of XRay ARM port. The other 2 are: https://reviews.llvm.org/D23932 (Clang test) https://reviews.llvm.org/D23933 (compiler-rt) Differential Revision: https://reviews.llvm.org/D23931 llvm-svn: 281878
* Revert "[XRay] ARM 32-bit no-Thumb support in LLVM"Renato Golin2016-09-081-2/+0
| | | | | | | | | | And associated commits, as they broke the Thumb bots. This reverts commit r280935. This reverts commit r280891. This reverts commit r280888. llvm-svn: 280967
* [XRay] ARM 32-bit no-Thumb support in LLVMDean Michael Berris2016-09-081-0/+2
| | | | | | | | | | | | This is a port of XRay to ARM 32-bit, without Thumb support yet. The XRay instrumentation support is moving up to AsmPrinter. This is one of 3 commits to different repositories of XRay ARM port. The other 2 are: 1. https://reviews.llvm.org/D23932 (Clang test) 2. https://reviews.llvm.org/D23933 (compiler-rt) Differential Revision: https://reviews.llvm.org/D23931 llvm-svn: 280888
* [X86] Heuristic to selectively build Newton-Raphson SQRT estimationNikolai Bozhenov2016-08-041-0/+10
| | | | | | | | | | | | | | | | | | | | | On modern Intel processors hardware SQRT in many cases is faster than RSQRT followed by Newton-Raphson refinement. The patch introduces a simple heuristic to choose between hardware SQRT instruction and Newton-Raphson software estimation. The patch treats scalars and vectors differently. The heuristic is that for scalars the compiler should optimize for latency while for vectors it should optimize for throughput. It is based on the assumption that throughput bound code is likely to be vectorized. Basically, the patch disables scalar NR for big cores and disables NR completely for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores. Secondly, vector SQRT has been greatly improved in Skylake and has better throughput compared to NR. Differential Revision: https://reviews.llvm.org/D21379 llvm-svn: 277725
* Convert a few more comparisons to isPositionIndependent(). NFC.Rafael Espindola2016-06-271-3/+1
| | | | llvm-svn: 273945
* Simplify PICStyles.Rafael Espindola2016-06-201-6/+4
| | | | | | | | The main difference is that StubDynamicNoPIC is gone. The dynamic-no-pic mode as the name implies is simply not pic. It is just conservative about what it assumes to be dso local. llvm-svn: 273222
* Delete dead code. NFC.Rafael Espindola2016-06-201-8/+0
| | | | llvm-svn: 273206
* [X86Subtarget] Use isPositionIndependent(). NFC.Davide Italiano2016-06-181-0/+4
| | | | | | Differential Revision: http://reviews.llvm.org/D21480 llvm-svn: 273071
* [X86] Reduce memory allocations in X86TargetMachine::getSubtargetImplDavid Majnemer2016-05-201-1/+1
| | | | | | | | We performed a number of memory allocations each time getTTI was called, remove them by using SmallString. No functionality change intended. llvm-svn: 270246
* Refactor X86 symbol access classification.Rafael Espindola2016-05-201-7/+6
| | | | | | | | | | | | This refactors the logic in X86 to avoid code duplication. It also splits it in two steps: it first decides if a symbol is local to the DSO and then uses that information to decide how to access it. The first part is implemented by shouldAssumeDSOLocal. It is not in any way specific to X86. In a followup patch I intend to move it to somewhere common and reused it in other backends. llvm-svn: 270209
* Record a TargetMachine instead of a Reloc::Model.Rafael Espindola2016-05-191-3/+2
| | | | | | Addresses r270095's code review. llvm-svn: 270147
* Remember the relocation model. NFC.Rafael Espindola2016-05-191-5/+5
| | | | | | This avoids passing a TargetMachine in a few places. llvm-svn: 270095
* Style fixes. NFC.Rafael Espindola2016-05-191-8/+6
| | | | llvm-svn: 270093
* Add new flag and intrinsic support for MWAITX and MONITORX instructionsAshutosh Nema2016-05-181-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT pair while adding a timer function, such that another termination of the MWAITX instruction occurs when the timer expires. The presence of the MONITORX and MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29. The MONITORX and MWAITX instructions are intercepted by the same bits that intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be monitored. MWAITX instruction causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is "0F 01 FB". These opcode information is used in adding tests for the disassembler. These instructions are enabled for AMD's bdver4 architecture. Patch by Ganesh Gopalasubramanian! Reviewers: echristo, craig.topper, RKSimon Subscribers: RKSimon, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D19795 llvm-svn: 269911
* [X86] Extend some Linux special cases to cover kFreeBSD.Marcin Koscielnicki2016-05-051-0/+2
| | | | | | | | | | | | Both Linux and kFreeBSD use glibc, so follow similiar code paths. Add isTargetGlibc to check for this, and use it instead of isTargetLinux in a few places. Fixes PR22248 for kFreeBSD. Differential Revision: http://reviews.llvm.org/D19104 llvm-svn: 268624
* Differential Revision: http://reviews.llvm.org/D19733Sriraman Tallam2016-04-291-1/+1
| | | | llvm-svn: 268106
* Differential Revision: http://reviews.llvm.org/D19040Sriraman Tallam2016-04-221-0/+8
| | | | llvm-svn: 267229
* [X86] enable PIE for functionsAsaf Badouh2016-04-201-0/+5
| | | | | | | | Call locally defined function directly for PIE/fPIE Differential Revision: http://reviews.llvm.org/D19226 llvm-svn: 266863
* [X86] Introduction of FeatureX87.Andrey Turetskiy2016-03-231-0/+4
| | | | | | | | Add FeatureX87 in X86 backend to be able to define CPUs which doesn't have x87. Differential Revision: http://reviews.llvm.org/D13979 llvm-svn: 264148
* Remove Proc feature flags for X86 processors that are used to inherit ↵Craig Topper2016-02-131-2/+1
| | | | | | features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't. Replace with just inherited listconcats. llvm-svn: 260832
* [x86-64] allow mfence even with -mno-sse (PR23203)Sanjay Patel2016-02-131-0/+5
| | | | | | | | | | | | | As shown in: https://llvm.org/bugs/show_bug.cgi?id=23203 ...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64, but the instruction def doesn't know that. I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :) Differential Revision: http://reviews.llvm.org/D17219 llvm-svn: 260828
OpenPOWER on IntegriCloud