path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* Revert "ARM: switch armv7em triple to hard-float defaults and libcalls." | Tim Northover | 2018-07-18 | 1 | -1/+0
| | | | | | This reverts commit r337385 until it can be targeted at MachO only. llvm-svn: 337424
* [X86][SSE] Canonicalize scalar fp arithmetic shuffle patterns | Simon Pilgrim | 2018-07-18 | 1 | -2/+31
| | | | | | | | | | | | As discussed on PR38197, this canonicalizes MOVS*(N0, OP(N0, N1)) --> MOVS*(N0, SCALAR_TO_VECTOR(OP(N0[0], N1[0]))) This returns the scalar-fp codegen lost by rL336971. Additionally it handles the OP(N1, N0)) case for commutable (FADD/FMUL) ops. Differential Revision: https://reviews.llvm.org/D49474 llvm-svn: 337419
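The canonicalization described above can be illustrated with a small lane-level model. This is only a sketch in Python, not the DAG combine itself: `movs`, `vector_op`, and `scalar_to_vector` are hypothetical helpers that mimic the node semantics named in the commit message, and `None` stands in for the undefined upper lanes of SCALAR_TO_VECTOR.

```python
import operator

def movs(n0, n1):
    """MOVSS/MOVSD-style blend: lane 0 comes from n1, the other lanes from n0."""
    return [n1[0]] + list(n0[1:])

def vector_op(op, a, b):
    """Apply a binary op lane by lane, as a full-width vector operation would."""
    return [op(x, y) for x, y in zip(a, b)]

def scalar_to_vector(value, width):
    """Place a scalar in lane 0; upper lanes are undefined (modelled as None)."""
    return [value] + [None] * (width - 1)

def before(op, n0, n1):
    # MOVS*(N0, OP(N0, N1)): full vector op, then keep only lane 0 of it.
    return movs(n0, vector_op(op, n0, n1))

def after(op, n0, n1):
    # MOVS*(N0, SCALAR_TO_VECTOR(OP(N0[0], N1[0]))): scalar op on lane 0 only.
    return movs(n0, scalar_to_vector(op(n0[0], n1[0]), len(n0)))

n0 = [1.0, 2.0, 3.0, 4.0]
n1 = [10.0, 20.0, 30.0, 40.0]
for op in (operator.add, operator.sub, operator.mul, operator.truediv):
    assert before(op, n0, n1) == after(op, n0, n1)
```

Both forms agree because the blend discards every lane of the arithmetic result except lane 0, which is why only a scalar operation is needed.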
* [mips] Fix predicate for the MipsTruncIntFP pattern | Simon Atanasyan | 2018-07-18 | 1 | -1/+1
| | | | | | | | | This is a follow-up to the rL337171. This patch fixes regression introduced by the r337171 and enables MipsTruncIntFP pattern. Differential revision: https://reviews.llvm.org/D49469 llvm-svn: 337392
* [X86][SSE] Remove BLENDPD canonicalization from combineTargetShuffle | Simon Pilgrim | 2018-07-18 | 1 | -25/+0
| | | | | | When rL336971 removed the scalar-fp isel patterns, we lost the need for this canonicalization - commutation/folding can handle everything else. llvm-svn: 337387
* ARM: stop explicitly marking armv7k libcalls as hard-float. NFC. | Tim Northover | 2018-07-18 | 1 | -7/+0
| | | | | | | Since the triple's default is hard float, the libcalls will already use VFP registers. llvm-svn: 337386
* ARM: switch armv7em triple to hard-float defaults and libcalls. | Tim Northover | 2018-07-18 | 1 | -0/+1
| | | | | | | We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. llvm-svn: 337385
* ARM: deduplicate hard-float detection code. NFC. | Tim Northover | 2018-07-18 | 4 | -12/+12
| | | | | | | | ARMSubtarget had a copy/pasted block to determine whether the target was hard-float, but it just delegated to triple features anyway so it's better at the TargetMachine level. llvm-svn: 337384
* [AArch64][SVE] Asm: Support for unpredicated FP operations. | Sander de Smalen | 2018-07-18 | 2 | -0/+35
| | | | | | | | | | | | | | | | | | This patch adds support for the following unpredicated floating-point instructions: FADD Floating point add FSUB Floating point subtract FMUL Floating point multiplication FTSMUL Floating point trigonometric starting value FRECPS Floating point reciprocal step FRSQRTS Floating point reciprocal square root step The instructions have the following assembly format: fadd z0.h, z1.h, z2.h and have variants for 16, 32 and 64-bit FP elements. llvm-svn: 337383
* Revert "[Sparc] Use the IntPair reg class for r constraints with value type f64" | Daniel Cederman | 2018-07-18 | 1 | -1/+1
| | | | | | | | | This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1. I missed that this patch has a dependency on https://reviews.llvm.org/D49219 that has not been approved yet. llvm-svn: 337373
* [AArch64][SVE] Asm: Support for UDOT/SDOT instructions. | Sander de Smalen | 2018-07-18 | 2 | -0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | The signed/unsigned DOT instructions perform a dot-product on quadtuplets from two source vectors and accumulate the result in the destination register. The instructions come in two forms: Vector form, e.g. sdot z0.s, z1.b, z2.b - signed dot product on four 8-bit quad-tuplets, accumulating results in 32-bit elements. udot z0.d, z1.h, z2.h - unsigned dot product on four 16-bit quad-tuplets, accumulating results in 64-bit elements. Indexed form, e.g. sdot z0.s, z1.b, z2.b[3] - signed dot product on four 8-bit quad-tuplets with specified quadtuplet from second source vector, accumulating results in 32-bit elements. udot z0.d, z1.h, z2.h[1] - dot product on four 16-bit quad-tuplets with specified quadtuplet from second source vector, accumulating results in 64-bit elements. llvm-svn: 337372
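As a rough illustration of the signed 8-bit form described above, the following Python sketch models `sdot z0.s, z1.b, z2.b` and its indexed variant. It assumes a single 128-bit vector (16 byte lanes, four 32-bit accumulators), so the index simply picks one group of four bytes from the second source; the function names are hypothetical and nothing here is taken from the patch itself.

```python
def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def sdot(acc, zn, zm, index=None):
    """Accumulate dot products of 4-byte groups into 32-bit lanes.

    index=None models the vector form; an integer models the indexed form,
    where every lane uses the same group of zm (assumption: one 128-bit vector).
    """
    result = []
    for lane, total in enumerate(acc):
        group_n = zn[4 * lane:4 * lane + 4]
        start = 4 * lane if index is None else 4 * index
        group_m = zm[start:start + 4]
        for a, b in zip(group_n, group_m):
            total += to_signed(a, 8) * to_signed(b, 8)
        result.append(to_signed(total, 32))  # wrap to the 32-bit accumulator
    return result

acc = [0, 0, 0, 0]
zn = list(range(1, 17))
zm = [1] * 12 + [-1] * 4
print(sdot(acc, zn, zm))           # [10, 26, 42, -58]
print(sdot(acc, zn, zm, index=3))  # [-10, -26, -42, -58]
```

The unsigned UDOT form would differ only in skipping the sign reinterpretation of the inputs.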
* [Sparc] Use the IntPair reg class for r constraints with value type f64 | Daniel Cederman | 2018-07-18 | 1 | -1/+1
| | | | | | | | | | | | | | | Summary: This is how it appears to be handled in GCC and it prevents a "Unknown mismatch" error in the SelectionDAGBuilder. Reviewers: venkatra, jyknight, jrtc27 Reviewed By: jyknight, jrtc27 Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49218 llvm-svn: 337370
* [AArch64][SVE] Asm: Integer divide instructions. | Sander de Smalen | 2018-07-18 | 2 | -0/+11
| | | | | | | | | | | | | | | | | | This patch adds the following predicated instructions: UDIV Unsigned divide active elements UDIVR Unsigned divide active elements, reverse form. SDIV Signed divide active elements SDIVR Signed divide active elements, reverse form. e.g. udiv z0.s, p0/m, z0.s, z1.s (unsigned divide active elements in z0 by z1, store result in z0) sdivr z0.s, p0/m, z0.s, z1.s (signed divide active elements in z1 by z0, store result in z0) llvm-svn: 337369
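The difference between the plain and reverse forms, and the effect of the merging predicate `p0/m`, may be easier to see in a lane-level model. The Python sketch below is an illustration under stated assumptions: division rounds toward zero, division by zero yields 0 (assumed here to match AArch64 scalar divides), and overflow wrapping is not modelled. All function names are hypothetical.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero; b == 0 yields 0 (assumption)."""
    if b == 0:
        return 0
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient

def predicated(op, pg, zdn, zm):
    """Merging predication: inactive lanes keep the old destination value."""
    return [op(d, m) if active else d for active, d, m in zip(pg, zdn, zm)]

def sdiv(pg, zdn, zm):
    # sdiv zdn, pg/m, zdn, zm: destination lanes are divided by zm lanes.
    return predicated(trunc_div, pg, zdn, zm)

def sdivr(pg, zdn, zm):
    # sdivr zdn, pg/m, zdn, zm: zm lanes are divided by destination lanes.
    return predicated(lambda d, m: trunc_div(m, d), pg, zdn, zm)

pg = [True, False, True, True]
z0 = [7, 7, -7, 2]
z1 = [2, 2, 2, -9]
print(sdiv(pg, z0, z1))   # [3, 7, -3, 0]
print(sdivr(pg, z0, z1))  # [0, 7, 0, -4]
```

The second lane is inactive in both calls, so it keeps the original value 7 from `z0`.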
* [AArch64][SVE] Asm: Support for integer MUL instructions. | Sander de Smalen | 2018-07-18 | 2 | -8/+26
| | | | | | | | | | | | | | | | This patch adds the following instructions: MUL - multiply vectors, e.g. mul z0.h, p0/m, z0.h, z1.h - multiply with immediate, e.g. mul z0.h, z0.h, #127 SMULH - signed multiply returning high half, e.g. smulh z0.h, p0/m, z0.h, z1.h UMULH - unsigned multiply returning high half, e.g. umulh z0.h, p0/m, z0.h, z1.h llvm-svn: 337358
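To make "returning high half" concrete, here is a small Python sketch for a single 16-bit lane (the `.h` element size used in the examples above). It is a model of the arithmetic only; predication and the immediate form are left out, and the helper names are hypothetical.

```python
BITS = 16
MASK = (1 << BITS) - 1

def to_signed(value):
    """Reinterpret a 16-bit pattern as a signed integer."""
    value &= MASK
    return value - (1 << BITS) if value & (1 << (BITS - 1)) else value

def mul(a, b):
    """Low 16 bits of the product (same bits for signed and unsigned)."""
    return (a * b) & MASK

def umulh(a, b):
    """High 16 bits of the 32-bit unsigned product."""
    return ((a & MASK) * (b & MASK)) >> BITS

def smulh(a, b):
    """High 16 bits of the 32-bit signed product, returned as a bit pattern."""
    return ((to_signed(a) * to_signed(b)) >> BITS) & MASK

print(hex(mul(0xFFFF, 2)))    # 0xfffe
print(hex(umulh(0xFFFF, 2)))  # 0x1    (65535 * 2 = 0x1FFFE)
print(hex(smulh(0xFFFF, 2)))  # 0xffff (-1 * 2 = -2, high half is all ones)
```

The low half is identical for signed and unsigned inputs, which is why only the high-half instructions come in separate signed and unsigned flavours.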
* [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address. | Craig Topper | 2018-07-18 | 3 | -16/+38
| | | | | | This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable. llvm-svn: 337357
* [NFC] fix trivial typos in comments | Hiroshi Inoue | 2018-07-18 | 2 | -4/+4
| | | | llvm-svn: 337351
* Fix build failures from r337347, found by clang | Justin Hibbits | 2018-07-18 | 3 | -15/+6
| | | | | | | | | | | * Delete a no-longer-used override, and mark the other getRegisterTypeForCallingConv() as override. * SPE only supports i32, not i64, as the internal type, so simply remove the type check, so that DestReg and Opc are provably always set. GCC 6.4 did not warn about either of the above. llvm-svn: 337350
* [X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types. | Craig Topper | 2018-07-18 | 2 | -33/+0
| | | | | | The X86ISD::MOVLHPS/MOVHLPS should now only be emitted in SSE1 only. This means that the v2i64/v2f64 types would be illegal thus we don't need these patterns. llvm-svn: 337349
* [X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2. | Craig Topper | 2018-07-18 | 2 | -4/+17
| | | | | | | | | | I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts. I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code. We already support domain switching for UNPCKLPD and MOVLHPS. llvm-svn: 337348
* Introduce codegen for the Signal Processing Engine | Justin Hibbits | 2018-07-18 | 18 | -614/+1323
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347
* Complete the SPE instruction set patterns | Justin Hibbits | 2018-07-18 | 6 | -225/+562
| | | | | | | | | This is the lead-up to having SPE codegen. Add the rest of the instructions, along with MC tests. Differential Revision: https://reviews.llvm.org/D44829 llvm-svn: 337346
* Add PowerPC e500(v2) core scheduler and directives. | Justin Hibbits | 2018-07-18 | 7 | -220/+497
| | | | | | Differential Revision: https://reviews.llvm.org/D44828 llvm-svn: 337345
* [X86] Remove the vector alignment requirement from the patterns added in r337320. | Craig Topper | 2018-07-17 | 1 | -2/+4
| | | | | | The resulting instruction will only load 64 bits so alignment isn't required. llvm-svn: 337334
* [X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only. | Craig Topper | 2018-07-17 | 2 | -16/+25
| | | | llvm-svn: 337320
* [x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig. | Chandler Carruth | 2018-07-17 | 1 | -7/+37
| | | | | | | | | | | | | | | | | | | | | | | | | Summary: The only thing he suggested that I've skipped here is the double-wide multiply instructions. Multiply is an area I'm nervous about there being some hidden data-dependent behavior, and it doesn't seem important for any benchmarks I have, so skipping it and sticking with the minimal multiply support that matches what I know is widely used in existing crypto libraries. We can always add double-wide multiply when we have clarity from vendors about its behavior and guarantees. I've tried to at least cover the fundamentals here with tests, although I've not tried to cover every width or permutation. I can add more tests where folks think it would be helpful. Reviewers: craig.topper Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49413 llvm-svn: 337308
* [WebAssembly] Update WebAssemblyLowerEmscriptenEHSjLj to handle separate compilation | Sam Clegg | 2018-07-17 | 1 | -38/+17
| | | | | | | | | | | | | | | | | | | | | Previously we were assuming whole program compilation. Now that separate compilation is a thing we need to update this pass. Firstly, it can no longer assert on the existence of malloc and free. These functions might not be in the current translation unit. If we need them then we will generate imports for them. Secondly the global helper function we create should be marked as weak since we will be generating a separate copy in each translation unit. Finally the names of the symbols used must be unique and fixed since they need to agree across translation units. Differential Revision: https://reviews.llvm.org/D49263 llvm-svn: 337301
* [X86] Remove some standalone patterns in favor of the patterns in the MOVLPD instruction definitions. | Craig Topper | 2018-07-17 | 2 | -20/+2
| | | | | | Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out by passing X86Movsd it produces patterns equivalent to some standalone patterns. llvm-svn: 337299
* [AArch64][SVE]: Integer multiply-add/subtract instructions. | Sander de Smalen | 2018-07-17 | 2 | -0/+69
| | | | | | | | | | This patch adds support for the following instructions: MLA mul-add, writing addend (Zda = Zda + Zn * Zm) MLS mul-sub, writing addend (Zda = Zda + -Zn * Zm) MAD mul-add, writing multiplicand (Zdn = Za + Zdn * Zm) MSB mul-sub, writing multiplicand (Zdn = Za + -Zdn * Zm) llvm-svn: 337293
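The four formulas above differ only in the sign of the product and in which operand doubles as the destination. A one-lane Python sketch, assuming 16-bit lanes with wraparound and ignoring predication, makes that explicit; the parameter names follow the formulas in the commit message, not the assembly operand order.

```python
MASK = 0xFFFF  # model a single 16-bit lane with wraparound (assumption)

def mla(zda, zn, zm):
    """Zda = Zda + Zn * Zm: the addend register is overwritten."""
    return (zda + zn * zm) & MASK

def mls(zda, zn, zm):
    """Zda = Zda + -Zn * Zm: the addend register is overwritten."""
    return (zda - zn * zm) & MASK

def mad(zdn, zm, za):
    """Zdn = Za + Zdn * Zm: a multiplicand register is overwritten."""
    return (za + zdn * zm) & MASK

def msb(zdn, zm, za):
    """Zdn = Za + -Zdn * Zm: a multiplicand register is overwritten."""
    return (za - zdn * zm) & MASK

addend, x, y = 100, 7, 9
# Same arithmetic result; the pairs differ only in which input is clobbered.
assert mla(addend, x, y) == mad(x, y, addend) == 163
assert mls(addend, x, y) == msb(x, y, addend) == 37
```

Having both variants lets a register allocator choose whichever input is dead after the operation as the destination.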
* [Mips][FastISel] Fix handling of icmp with i1 type | Petar Jovanovic | 2018-07-17 | 1 | -0/+4
| | | | | | | | | | | The Mips FastISel back-end does not extend i1 values while lowering icmp. Ensure that we bail into DAG ISel when handling this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D49290 llvm-svn: 337288
* [AArch64][SVE] Asm: FP fused multiply-add/subtract instructions. | Sander de Smalen | 2018-07-17 | 2 | -0/+119
| | | | | | | | | | | | | | | | | | This patch adds support for the following instructions: FMLA mul-add, writing addend (Zda = Zda + Zn * Zm) FNMLA negated mul-add, writing addend (Zda = -Zda + -Zn * Zm) FMLS mul-sub, writing addend (Zda = Zda + -Zn * Zm) FNMLS negated mul-sub, writing addend (Zda = -Zda + Zn * Zm) FMAD mul-add, writing multiplicand (Zdn = Za + Zdn * Zm) FNMAD negated mul-add, writing multiplicand (Zdn = -Za + -Zdn * Zm) FMSB mul-sub, writing multiplicand (Zdn = Za + -Zdn * Zm) FNMSB negated mul-sub, writing multiplicand (Zdn = -Za + Zdn * Zm) llvm-svn: 337282
* [AArch64][SVE] Asm: Support for predicated FP operations (FP immediate) | Sander de Smalen | 2018-07-17 | 1 | -0/+5
| | | | | | | | | | | | | | | | | | | | | | | | This patch completes support for the following floating point instructions that take FP immediates: FADD* (addition) FSUB (subtract) FSUBR (subtract reverse form) FMUL* (multiplication) FMAX* (maximum) FMAXNM (maximum number) FMIN (minimum) FMINNM (minimum number) All operations are predicated and take a FP immediate operand, e.g. fadd z0.h, p0/m, z0.h, #0.5 fmin z0.s, p0/m, z0.s, #1.0 ^___________^ (tied) * Instructions added in a previous patch. llvm-svn: 337272
* [LLVM-C] Add target triple normalization to the C API. | whitequark | 2018-07-17 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | rL333307 was introduced to remove automatic target triple normalization when calling sys::getDefaultTargetTriple(), arguing that users of the latter already called Triple::normalize() if necessary. However, users of the C API currently have no way of doing target triple normalization. This patch introduces an LLVMNormalizeTargetTriple function to the C API which wraps Triple::normalize() and can be used on the result of LLVMGetDefaultTargetTriple to achieve the same effect. Differential Revision: https://reviews.llvm.org/D49414 Reviewed By: whitequark llvm-svn: 337263
* [AArch64][SVE] Asm: Support for predicated FP operations. | Sander de Smalen | 2018-07-17 | 2 | -0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the following floating point instructions: FABD (absolute difference) FADD (addition) FSUB (subtract) FSUBR (subtract reverse form) FDIV (divide) FDIVR (divide reverse form) FMAX (maximum) FMAXNM (maximum number) FMIN (minimum) FMINNM (minimum number) FSCALE (adjust exponent) FMULX (multiply extended) All operations are predicated and binary form, e.g. fadd z0.h, p0/m, z0.h, z1.h ^___________^ (tied) Supporting 16, 32 and 64-bit FP elements. llvm-svn: 337259
* [DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT | Simon Pilgrim | 2018-07-17 | 1 | -10/+26
| | | | | | | | If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258
* [AArch64][SVE] Asm: Support for SPLICE instruction. | Sander de Smalen | 2018-07-17 | 2 | -0/+26
| | | | | | | | | | | | | | | | The SPLICE instruction splices two vectors into one vector using a predicate. It copies the active elements from the first vector, and then fills the remaining elements with the low-numbered elements from the second vector. The instruction has the following form, e.g. splice z0.b, p0, z0.b, z1.b for 8-bit elements. It also supports 16, 32 and 64-bit elements. llvm-svn: 337253
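A lane-level sketch of the splice behaviour described above, in Python. One detail is an assumption on my part rather than something the commit message states: the model copies the span from the first active element to the last active element, which coincides with "the active elements" whenever the predicate is contiguous, as in the example. The function name is hypothetical.

```python
def splice(pg, zdn, zm):
    """Model of splice zdn, pg, zdn, zm on lists of lanes.

    Copies the span from the first to the last active lane of zdn (assumed
    treatment of gaps), then fills the rest with the low lanes of zm.
    """
    active = [i for i, bit in enumerate(pg) if bit]
    segment = list(zdn[active[0]:active[-1] + 1]) if active else []
    fill = list(zm[:len(zdn) - len(segment)])
    return segment + fill

pg = [False, True, True, True, False, False, False, False]
z0 = [0, 1, 2, 3, 4, 5, 6, 7]
z1 = [10, 11, 12, 13, 14, 15, 16, 17]
print(splice(pg, z0, z1))  # [1, 2, 3, 10, 11, 12, 13, 14]
```

With an all-false predicate the model returns a copy of the second vector, since nothing is taken from the first.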
* [AArch64][SVE] Asm: Support for EXT instruction. | Sander de Smalen | 2018-07-17 | 2 | -0/+21
| | | | | | | | | | | | | This patch adds an instruction that allows extracting a vector from a pair of vectors, given an immediate index that describes the element position to extract from. The instruction has the following assembly: ext z0.b, z0.b, z1.b, #imm where #imm is an immediate between 0 and 255. llvm-svn: 337251
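The extraction can be pictured as taking a vector-length window out of the two sources laid end to end. The Python sketch below models that on byte lists; it is an illustration only, and it deliberately rejects an immediate that is not smaller than the vector length instead of guessing at the architectural behaviour for that case. The function name is hypothetical.

```python
def ext(zdn, zm, imm):
    """Model of ext zdn.b, zdn.b, zm.b, #imm on lists of byte lanes."""
    length = len(zdn)
    if not 0 <= imm < length:
        # Out-of-range positions are not modelled in this sketch.
        raise ValueError("imm must be smaller than the vector length in bytes")
    # Window of `length` bytes starting at byte `imm` of the pair zdn:zm.
    return (list(zdn) + list(zm))[imm:imm + length]

z0 = list(range(16))       # bytes 0..15
z1 = list(range(16, 32))   # bytes 16..31
print(ext(z0, z1, 3))      # [3, 4, ..., 18]
```

With `imm` equal to 0 the result is simply the first source vector.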
* [X86] Properly qualify some MOVSS/MOVSD patterns with OptSize. | Craig Topper | 2018-07-17 | 1 | -12/+13
| | | | | | These are integer versions of patterns that I already fixed for floating point. llvm-svn: 337240
* [Sparc] Do not depend on icc for ta 1 | Daniel Cederman | 2018-07-17 | 1 | -2/+2
| | | | | | | | | | The ta instruction will always trap, regardless of the value of the integer condition codes. TRAPri is marked as using icc, so we cannot use a pattern for TRAPri to implement ta 1, as verify-machineinstrs can complain that icc is not defined. Instead we implement ta 1 the same way as ta 5. llvm-svn: 337236
* [X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking. | Craig Topper | 2018-07-17 | 1 | -178/+197
| | | | | | This amounts to a pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from. llvm-svn: 337234
* [X86] Add a missing FMA3 scalar intrinsic pattern. | Craig Topper | 2018-07-16 | 1 | -0/+7
| | | | | | This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics. llvm-svn: 337223
* [WebAssembly] Remove ELF file support. | Sam Clegg | 2018-07-16 | 20 | -341/+55
| | | | | | | This support was partial and temporary. Now that we have wasm object file support it's no longer needed. Differential Revision: https://reviews.llvm.org/D48744 llvm-svn: 337222
* [AMDGPU] Support a fdot2 pattern. | Farhana Aleen | 2018-07-16 | 6 | -1/+88
| | | | | | | | | | | | | Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
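The rewrite relies on the nested multiply-adds computing a two-element dot product plus an accumulator. The Python sketch below checks that algebraic identity on values that are exactly representable, so rounding plays no role; it does not model half-precision inputs or fused rounding, and `fma`, `nested_fma`, and `fdot2` here are hypothetical stand-ins rather than the AMDGPU nodes.

```python
def fma(a, b, c):
    """Multiply-add; rounding differences of a real fused op are ignored."""
    return a * b + c

def nested_fma(s0, s1, z):
    """The source pattern: fma(S0.x, S1.x, fma(S0.y, S1.y, z))."""
    return fma(s0[0], s1[0], fma(s0[1], s1[1], z))

def fdot2(s0, s1, z):
    """The target pattern: two-element dot product added to z."""
    return s0[0] * s1[0] + s0[1] * s1[1] + z

s0 = (1.5, 2.0)
s1 = (4.0, 0.5)
z = 10.0
assert nested_fma(s0, s1, z) == fdot2(s0, s1, z) == 17.0
```

For arbitrary inputs the two forms can differ in the last bits because the intermediate rounding points differ, which is outside what this sketch shows.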
* [x86/SLH] Completely rework how we sink post-load hardening past data | Chandler Carruth | 2018-07-16 | 1 | -24/+184
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | invariant instructions to be both more correct and much more powerful. While testing, I continued to find issues with sinking post-load hardening. Unfortunately, it was amazingly hard to create any useful tests of this because we were mostly sinking across copies and other loading instructions. The fact that we couldn't sink past normal arithmetic was really a big oversight. So first, I've ported roughly the same set of instructions from the data invariant loads to also have their non-loading varieties understood to be data invariant. I've also added a few instructions that came up so often it again made testing complicated: inc, dec, and lea. With this, I was able to shake out a few nasty bugs in the validity checking. We need to restrict to hardening single-def instructions with defined registers that match a particular form: GPRs that don't have a NOREX constraint directly attached to their register class. The (tiny!) test case included catches all of the issues I was seeing (once we can sink the hardening at all) except for the NOREX issue. The only test I have there is horrible. It is large, inexplicable, and doesn't even produce an error unless you try to emit encodings. I can keep looking for a way to test it, but I'm out of ideas really. Thanks to Ben for giving me at least a sanity-check review. I'll follow up with Craig to go over this more thoroughly post-commit, but without it SLH crashes everywhere so landing it for now. Differential Revision: https://reviews.llvm.org/D49378 llvm-svn: 337177
* [mips] Eliminate the usage of hasStdEnc in MipsPat. | Simon Atanasyan | 2018-07-16 | 7 | -161/+206
| | | | | | | | | | | Instead, the pattern is tagged with the correct predicate when it is declared. Some patterns have been duplicated as necessary. Patch by Simon Dardis. Differential revision: https://reviews.llvm.org/D48365 llvm-svn: 337171
* [MIPS GlobalISel] Select instructions to load and store i32 on stack | Petar Jovanovic | 2018-07-16 | 3 | -2/+88
| | | | | | | | | | | Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and G_CONSTANT. Support loads and stores of i32 values. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D48957 llvm-svn: 337168
* [X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern | Roman Lebedev | 2018-07-16 | 2 | -0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 | PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal pattern does not lower efficiently, so we want to undo it. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here; we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166
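The commit message does not spell the pattern out, so the following Python sketch rests on an assumption: that the IR-optimal form is an add followed by an unsigned compare, `(x + 2^(K-1)) u< 2^K`, and that the form it is unfolded into compares `x` with the sign extension of its low K bits. Under that assumption the two checks can be compared exhaustively for a 16-bit value truncated to 8 bits; both function names are hypothetical.

```python
def add_ult_form(x, kept_bits=8, width=16):
    """Assumed IR-optimal form: (x + 2^(K-1)) u< 2^K, with wraparound."""
    mask = (1 << width) - 1
    return ((x + (1 << (kept_bits - 1))) & mask) < (1 << kept_bits)

def sext_eq_form(x, kept_bits=8, width=16):
    """Assumed unfolded form: sign-extend the low K bits and compare with x."""
    mask = (1 << width) - 1
    low_mask = (1 << kept_bits) - 1
    low = x & low_mask
    if low >> (kept_bits - 1):      # sign bit of the truncated value is set
        low |= mask & ~low_mask     # replicate it into the upper bits
    return low == x

# Exhaustive check over every 16-bit pattern.
assert all(add_ult_form(x) == sext_eq_form(x) for x in range(1 << 16))
```

Both predicates are true exactly when the 16-bit value, read as signed, fits in 8 signed bits, that is, lies in the range -128 to 127.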
* [Sparc] Use the correct encoding for ta 3 | Daniel Cederman | 2018-07-16 | 1 | -1/+1
| | | | | | | | | | | | | | | Summary: The old encoding generated a "tn %g1 + 3" instruction instead of the expected "ta 3". Reviewers: venkatra, jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49171 llvm-svn: 337165
* [Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3 | Daniel Cederman | 2018-07-16 | 1 | -0/+3
| | | | | | | | | | | | | | Summary: These are the names used in libgcc. Reviewers: venkatra, jyknight, ekedaigle Reviewed By: jyknight Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48915 llvm-svn: 337164
* [Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic | Daniel Cederman | 2018-07-16 | 2 | -0/+4
| | | | | | | | | | | | | | | Summary: Software trap number one is the trap used for breakpoints in the Sparc ABI. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48637 llvm-svn: 337163
* [x86/SLH] Fix a bug where we would try to post-load harden non-GPRs. | Chandler Carruth | 2018-07-16 | 1 | -13/+25
| | | | | | | | | | | | | | | | Found cases that hit the assert I added. This patch factors the validity checking into a nice helper routine and calls it when deciding to harden post-load, and asserts it when doing so later. I've added tests for the various ways of loading a floating point type, as well as loading all vector permutations. Even though many of these go to identical instructions, it seems good to somewhat comprehensively test them. I'm confident there will be more fixes needed here, I'll try to add tests each time as I get this predicate adjusted. llvm-svn: 337160
* [x86/SLH] Extract another small helper function, add better comments and | Chandler Carruth | 2018-07-16 | 1 | -23/+34
| | | | | | use better terminology. NFC. llvm-svn: 337157