summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* AMDGPU: Annotate implicitarg.ptr usageMatt Arsenault2017-07-286-6/+32
| | | | | | | | | | | We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398
* [AArch64] Standardize suffixes for LSE Atomics mnemonics (NFCI)Joel Jones2017-07-283-130/+130
| | | | | | | | | | | | | | | | This NFC changeset standardizes the suffixes used for LSE Atomics instructions. It changes the existing suffixes - 'b', 'h', 's', 'd' - to the existing standard 'B', 'H', 'W' and 'X'. This changeset is the result of the code review discussion for D35319. Patch by: steleman Differential Revision: https://reviews.llvm.org/D35927 llvm-svn: 309384
* [ARM] Add the option to directly access TLS pointerStrahinja Petrovic2017-07-283-1/+16
| | | | | | | | | This patch enables choice for accessing thread local storage pointer (like '-mtp' in gcc). Differential Revision: https://reviews.llvm.org/D34408 llvm-svn: 309381
* [MachineOutliner] NFC: Split up getOutliningBenefitJessica Paquette2017-07-284-247/+250
| | | | | | | | | | | | | | | | | | | | | This is some more cleanup in preparation for some actual functional changes. This splits getOutliningBenefit into two cost functions: getOutliningCallOverhead and getOutliningFrameOverhead. These functions return the number of instructions that would be required to call a specific function and the number of instructions that would be required to construct a frame for a specific funtion. The actual outlining benefit logic is moved into the outliner, which calls these functions. The goal of refactoring getOutliningBenefit is to: - Get us closer to getting rid of the IsTailCall flag - Further split up "target-specific" things and "general algorithm" things llvm-svn: 309356
* ARMFrameLowering: Only set ExtraCSSpill for actually unused registers.Matthias Braun2017-07-281-9/+18
| | | | | | | | | | The code assumed that unclobbered/unspilled callee saved registers are unused in the function. This is not true for callee saved registers that are also used to pass parameters such as swiftself. rdar://33401922 llvm-svn: 309350
* [X86] Fix latent bug in sibcall eligibility logicReid Kleckner2017-07-281-0/+7
| | | | | | | | | | | | | | | | | | | The X86 tail call eligibility logic was correct when it was written, but the addition of inalloca and argument copy elision broke its assumptions. It was assuming that fixed stack objects were immutable. Currently, we aim to emit a tail call if no arguments have to be re-arranged in memory. This code would trace the outgoing argument values back to check if they are loads from an incoming stack object. If the stack argument is immutable, then we won't need to store it back to the stack when we tail call. Fortunately, stack objects track their mutability, so we can just make the obvious check to fix the bug. This was http://crbug.com/749826 llvm-svn: 309343
* Remove unused function from AArch64 backend (NFC)Adrian Prantl2017-07-272-16/+0
| | | | llvm-svn: 309336
* [X86] Don't lie about legality to TLI's demanded bits.Ahmed Bougacha2017-07-271-2/+2
| | | | | | | | | | | Like r309323, X86 had a typo where it passed the wrong flags to TLO. Found by inspection; I haven't been able to tickle this into having observable behavior. I don't think it does, given that X86 doesn't have custom demanded bits logic, and the generic logic doesn't have a lot of exposure to illegal constructs. llvm-svn: 309325
* [AArch64] Remove outdated comment. NFC.Ahmed Bougacha2017-07-271-2/+0
| | | | | | There hasn't been a ternary since r231987. llvm-svn: 309324
* [AArch64] Fix legality info passed to demanded bits for TBI opt.Ahmed Bougacha2017-07-271-2/+2
| | | | | | | | | | | | | | | The (seldom-used) TBI-aware optimization had a typo lying dormant since it was first introduced, in r252573: when asking for demanded bits, it told TLI that it was running after legalize, where the opposite was true. This is an important piece of information, that the demanded bits analysis uses to make assumptions about the node. r301019 added such an assumption, which was broken by the TBI combine. Instead, pass the correct flags to TLO. llvm-svn: 309323
* [ARM] Add use-misched feature, to enable the MachineScheduler.Florian Hahn2017-07-273-8/+16
| | | | | | | | | | | | | | | | | | | Summary: This change makes it easier to experiment with the MachineScheduler in the ARM backend and also makes it very explicit which CPUs use the MachineScheduler (currently only swift and cyclone). Reviewers: MatzeB, t.p.northover, javed.absar Reviewed By: MatzeB Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35935 llvm-svn: 309316
* [X86] SET0 to use XMM registers where possible PR26018 PR32862Dinar Temirbulatov2017-07-271-5/+14
| | | | | | Differential Revision: https://reviews.llvm.org/D35839 llvm-svn: 309298
* [TargetParser] Use enum classes for various ARM kind enums.Florian Hahn2017-07-277-62/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Using c++11 enum classes ensures that only valid enum values are used for ArchKind, ProfileKind, VersionKind and ISAKind. This removes the need for checks that the provided values map to a proper enum value, allows us to get rid of AK_LAST and prevents comparing values from different enums. It also removes a bunch of static_cast from unsigned to enum values and vice versa, at the cost of introducing static casts to access AArch64ARCHNames and ARMARCHNames by ArchKind. FPUKind and ArchExtKind are the only remaining old-style enum in TargetParser.h. I think it's beneficial to keep ArchExtKind as old-style enum, but FPUKind can be converted too, but this patch is quite big, so could do this in a follow-up patch. I could also split this patch up a bit, if people would prefer that. Reviewers: rengolin, javed.absar, chandlerc, rovka Reviewed By: rovka Subscribers: aemerson, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35882 llvm-svn: 309287
* [ARM] Mark labels in skipAlignedDPRCS2Spills as fallthrough (NFC).Florian Hahn2017-07-271-0/+2
| | | | | | | | | The comment at the top of the switch statement indicates that the fall-through behavior is intentional. By using LLVM_FALLTHROUGH, -Wimplicit-fallthrough are silenced, which is enabled by default in GCC 7. llvm-svn: 309272
* Added cost of ZEROALL and ZEROUPPER instrs in btver2 cpu.Andrew V. Tischenko2017-07-271-0/+11
| | | | | | Differential Revision https://reviews.llvm.org/D35834 llvm-svn: 309269
* Re-commit: r309094 [globalisel][tablegen] Fuse the generated tables together.Daniel Sanders2017-07-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Now that we have control flow in place, fuse the per-rule tables into a single table. This is a compile-time saving at this point. However, this will also enable the optimization of a table so that similar instructions can be tested together, reducing the time spent on the matching the code. This is NFC in terms of externally visible behaviour but some internals have changed slightly. State.MIs is no longer reset between each rule that is attempted because it's not necessary to do so. As a consequence of this the restriction on the order that instructions are added to State.MIs has been relaxed to only affect recorded instructions that require new elements to be added to the vector. GIM_RecordInsn can now write to any element from 1 to State.MIs.size() instead of just State.MIs.size(). The compile-time regressions from the last commit were caused by the ARM target including a non-const variable (zero_reg) in the table and therefore generating an initializer for it. That variable is now const. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35681 llvm-svn: 309264
* [X86] Tidyup MaskedLoad/Store mask creation. NFCI.Simon Pilgrim2017-07-271-8/+3
| | | | | | Assign all concat elements to zero and then just replace the first element, instead of setting them all to null and copying everything in. llvm-svn: 309261
* [PowerPC] enable optimizeCompareInstr for branch with static branch hintHiroshi Inoue2017-07-272-7/+31
| | | | | | | | | | In optimizeCompareInstr, a compare instruction is eliminated by using a record form instruction if possible. If the branch instruction that uses the result of the compare has a static branch hint, the optimization does not happen. This patch makes this optimization happen regardless of the branch hint by splitting branch hint and branch condition before checking the predicate to identify the possible optimizations. Differential Revision: https://reviews.llvm.org/D35801 llvm-svn: 309255
* [Hexagon] Fix expensive checks build bot broken in r309230.Eugene Zelenko2017-07-261-0/+1
| | | | llvm-svn: 309236
* [Hexagon] Partially revert r309230 which caused some build bots failures.Eugene Zelenko2017-07-261-9/+7
| | | | llvm-svn: 309233
* [Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-07-2614-179/+237
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 309230
* [AMDGPU] Optimize SI_IF lowering for simple if regionsStanislav Mekhanoshin2017-07-261-8/+23
| | | | | | | | | | | | | Currently SI_IF results in a s_and_saveexec_b64 followed by s_xor_b64. The xor is used to extract only the changed bits. In case of a simple if region where the only use of that value is in the SI_END_CF to restore the old exec mask, we can omit the xor and perform an or of the exec mask with the original exec value saved by the s_and_saveexec_b64. Differential Revision: https://reviews.llvm.org/D35861 llvm-svn: 309185
* [ARM] Minor cosmetic edits (NFC)Evandro Menezes2017-07-262-2/+2
| | | | | | Change the order of a case and the description for Exynos Mx processors. llvm-svn: 309184
* [AArch64] Adjust the cost model for Exynos M1 and M2Evandro Menezes2017-07-261-2/+4
| | | | | | Add the information for the scalar reciprocal square root approximation. llvm-svn: 309183
* AMDGPU : Widen extending scalar loads to 32-bits.Wei Ding2017-07-261-0/+45
| | | | | | Differential Revision: http://reviews.llvm.org/D35146 llvm-svn: 309178
* AMDGPU: Fix using SMRD instructions for argument loads in functionsMatt Arsenault2017-07-262-1/+31
| | | | | | These are not actually uniform values except in kernels. llvm-svn: 309172
* AMDGPU/GlobalISel: Mark 32-bit G_OR as legalTom Stellard2017-07-261-0/+2
| | | | | | | | | | | | Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35127 llvm-svn: 309165
* Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. ↵Peter Collingbourne2017-07-2610-42/+40
| | | | | | | | NFCI. This was a use-after-free waiting to happen. llvm-svn: 309159
* Simplify. NFC.Rafael Espindola2017-07-261-176/+114
| | | | llvm-svn: 309141
* [Hexagon] Mark raise_relocation_error as NORETURN.Florian Hahn2017-07-261-0/+1
| | | | | | | | | | | | | | | | Summary: This silences a couple of implicit fallthrough warnings with GCC 7.1 in this file. Reviewers: colinl, kparzysz Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35889 llvm-svn: 309129
* [NFC] test commit.Stefan Pintilie2017-07-261-0/+8
| | | | | | Added a comment to explain how to add a PPCISD node. llvm-svn: 309114
* DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offsetZvi Rackover2017-07-262-0/+34
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: Adding support for combining power2-strided build_vector's where the first build_vectori's operand is extracted from a non-zero index. Example: v4i32 build_vector((extract_elt V, 1), (extract_elt V, 3), (extract_elt V, 5), (extract_elt V, 7)) --> v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64) Reviewers: delena, RKSimon, guyblank Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35700 llvm-svn: 309108
* [COFF, ARM64] Fix symbol offsets in ADRP/ADD/LDR/STR relocationsMartin Storsjo2017-07-261-13/+33
| | | | | | | | | | | | | | | | | | | | | | | In COFF, a symbol offset can't be stored in the relocation (as is done in ELF or MachO), but is stored as the immediate in the instruction itself. The immediate in the ADRP thus is the symbol offset in bytes, not in pages. For the PAGEOFFSET_12A/L relocations, ignore any offset outside of the lowest 12 bits; they won't have any effect on the ADD/LDR/STR instruction itself but only on the associated ADRP. This is similar to how the same issue is handled for MOVW/MOVT instructions in ELF (see e.g. SVN r307713, and r307728 in lld). This fixes "fixup out of range" errors while building larger object files, where temporary symbols end up as a plain section symbol and an offset, and fixes any cases where the symbol offset mean that the actual target ended up on a different page than the symbol itself. Differential Revision: https://reviews.llvm.org/D35791 llvm-svn: 309105
* [ARM] GlobalISel: Map G_GLOBAL_VALUE to GPRDiana Picus2017-07-261-0/+1
| | | | | | A G_GLOBAL_VALUE is basically a pointer, so it should live in the GPR. llvm-svn: 309101
* [ARM] GlobalISel: Mark G_GLOBAL_VALUE as legalDiana Picus2017-07-261-0/+1
| | | | llvm-svn: 309090
* [X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess.Michael Zuckerman2017-07-261-7/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch expands the support of lowerInterleavedStore to 32x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns". We include two mask creators. The first function creates shuffle's mask equivalent to unpacklo/unpackhi instructions. The other creator creates mask equivalent to a concat of two half vectors(high/low). The patch goal is to optimize the following sequence: At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding each 32 chars: c0, c1, , c31 m0, m1, , m31 y0, y1, , y31 k0, k1, ., k31 And these need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Reviewers: dorit Farhana RKSimon guyblank DavidKreitzer Differential Revision: https://reviews.llvm.org/D34601 llvm-svn: 309086
* TargetLowering: Change isShuffleMaskLegal's mask argument type to ↵Zvi Rackover2017-07-2611-22/+12
| | | | | | | | | | | | | ArrayRef<int>. NFCI. Changing mask argument type from const SmallVectorImpl<int>& to ArrayRef<int>. This came up in D35700 where a mask is received as an ArrayRef<int> and we want to pass it to TargetLowering::isShuffleMaskLegal(). Also saves a few lines of code. llvm-svn: 309085
* [X86][LLVM]Expanding Supports lowerInterleavedStore() in ↵Michael Zuckerman2017-07-262-47/+47
| | | | | | | | | | | | | | | | X86InterleavedAccess part1. splitting patch D34601 into two part. This part changes the location of two functions. The second part will be based on that patch. This was requested by @RKSimon. Reviewers: 1. dorit 2. Farhana 3. RKSimon 4. guyblank 5. DavidKreitzer llvm-svn: 309084
* [X86] Prevent selecting masked aligned load instructions if the load should ↵Craig Topper2017-07-261-3/+6
| | | | | | | | | | | | | | | | be non-temporal Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load the the non-temporal loads have priority over those. The exception are masked loads. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35712 llvm-svn: 309079
* [AArch64] Fix some Clang-tidy modernize-use-using and Include What You Use ↵Eugene Zelenko2017-07-2511-184/+279
| | | | | | warnings; other minor fixes (NFC). llvm-svn: 309062
* Update the comments on default subtargets based on feedback.Eric Christopher2017-07-254-8/+12
| | | | llvm-svn: 309041
* AMDGPU/SI: Fix Depth and Height computation for SI schedulerMarek Olsak2017-07-251-3/+3
| | | | | | | | Patch by: Axel Davy Differential Revision: https://reviews.llvm.org/D34967 llvm-svn: 309028
* AMDGPU/SI: Force exports at the end for SI schedulerMarek Olsak2017-07-252-0/+60
| | | | | | | | Patch by: Axel Davy Differential Revision: https://reviews.llvm.org/D34965 llvm-svn: 309027
* Revert "This patch enables the usage of constant Enum identifiers within ↵Eric Christopher2017-07-251-55/+21
| | | | | | | | Microsoft style inline assembly statements." This reverts commit r308966. llvm-svn: 309005
* [PowerPC] Pretty-print CR bits the way the binutils disassembler doesNemanja Ivanovic2017-07-251-11/+26
| | | | | | | | | This patch just adds printing of CR bit registers in a more human-readable form akin to that used by the GNU binutils. Differential Revision: https://reviews.llvm.org/D31494 llvm-svn: 309001
* [PowerPC] - Recommit r304907 now that the issue has been fixedNemanja Ivanovic2017-07-251-0/+26
| | | | | | | This is just a recommit since the issue that the commit exposed is now resolved. llvm-svn: 308995
* [X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)Simon Pilgrim2017-07-251-2/+2
| | | | | | | | | | | | D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914). Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os). This patch should be considered for the 5.0.0 release branch as well Differential Revision: https://reviews.llvm.org/D35830 llvm-svn: 308986
* [Sparc] invalid adjustments in TLS_LE/TLS_LDO relocations removedFedor Sergeev2017-07-251-8/+7
| | | | | | | | | | | | | | | | | | | Summary: Some SPARC TLS relocations were applying nontrivial adjustments to zero value, leading to unexpected non-zero values in ELF and then Solaris linker failures. Getting rid of these adjustments. Fixes PR33825. Reviewers: rafael, asb, jyknight Subscribers: joerg, jyknight, llvm-commits Differential Revision: https://reviews.llvm.org/D35567 llvm-svn: 308978
* X86 Asm uses assertions instead of proper diagnostic. This patch fixes that.Andrew V. Tischenko2017-07-251-23/+57
| | | | | | Differential Revision: https://reviews.llvm.org/D35115 llvm-svn: 308972
* This patch enables the usage of constant Enum identifiers within Microsoft ↵Matan Haroush2017-07-251-21/+55
| | | | | | | | | | style inline assembly statements. Differential Revision: https://reviews.llvm.org/D33277 https://reviews.llvm.org/D33278 llvm-svn: 308966
OpenPOWER on IntegriCloud