path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [X86] Remove result type constraints from the ↵ | Craig Topper | 2019-05-30 | 1 | -3/+3
| | | | | | | | | | extloadv2f32/extloadv4f32/extloadv8f32 PatFrags. NFC The result types aren't mentioned in the pattern name so really shouldn't be in the PatFrags. The users of these either have their own type constraint or rely on the type constraint system to realize the only legal extend would be to f64. llvm-svn: 362175
* [X86] Remove code that unnecessarily sets EXTLOAD with src type of ↵ | Craig Topper | 2019-05-30 | 1 | -9/+0
| | | | | | | | | | v2f32/v4f32/v8f32 as Legal for SSE2/AVX/AVX512 respectively. NFC The LoadExt table defaults to all combinations being Legal. For vector types, only src VTs with an i1 element type were ever changed. So we don't need to mark them legal manually. llvm-svn: 362170
* AMDGPU/GlobalISel: Add wave scratch offset argument | Matt Arsenault | 2019-05-30 | 1 | -0/+42
| | | | | | Avoids crashing in PEI in a future change. llvm-svn: 362136
* [AMDGPU] Added target-specific attribute amdgpu-max-memory-clause | Tim Renouf | 2019-05-30 | 1 | -1/+3
| | | | | | | | | | | | | | | | With LLPC, previous investigation has suggested that si-scheduler interacts badly with SIFormMemoryClauses on an XNACK target in some games. That needs further investigation in the future. In the meantime, this commit adds a target-specific attribute to allow us to disable SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC to use. Differential Revision: https://reviews.llvm.org/D62572 Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92 llvm-svn: 362127
* [NFC][ARM][ParallelDSP] Refactor narrow sequence | Sam Parker | 2019-05-30 | 1 | -48/+19
| | | | | | | Most of the code used for finding a 'narrow' sequence is not used, so I've removed it and simplified the calls from the smlad matcher. llvm-svn: 362104
* [ARM] Change the MC names for VMAXNM/VMINNM | Sjoerd Meijer | 2019-05-30 | 3 | -35/+36
| | | | | | | | | | | | | Now the NEON ones have a prefix "NEON_", and the VFP ones have a prefix "VFP_". This is so that the regex in ARMScheduleA57.td can be made to match both of _those_ classes of VMAXNM without also matching the MVE ones that are going to be introduced soon. NFCI. Patch by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60700 llvm-svn: 362097
* [ARM] LowerVECTOR_SHUFFLE - fix uninitialized variable warnings. NFCI. | Simon Pilgrim | 2019-05-30 | 1 | -4/+4
| | | | llvm-svn: 362094
* [ARM] add target arch definitions for 8.1-M and MVE | Sjoerd Meijer | 2019-05-30 | 4 | -2/+52
| | | | | | | | | | | | | | | | | This adds: - LLVM subtarget features to make all the new instructions conditional on, - CPU and FPU names for use on clang's command line, with default FPUs set so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right FPU features, - architecture extension names "mve" and "mve.fp", - ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE (a new actual tag). Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60698 llvm-svn: 362090
* [ARM] Introduce separate features for FP registers | Sjoerd Meijer | 2019-05-30 | 5 | -18/+69
| | | | | | | | | | | | | | | The MVE extension in Arm v8.1-M permits the use of some move, load and store instructions which access the FP registers, even if there's no actual FP support in the processor (in particular, if you have the integer-only version of MVE). Therefore, we need separate subtarget features to condition those instructions on, which are implied by both FP and MVE but are not part of either. Patch mostly by Simon Tatham. Differential Revision: https://reviews.llvm.org/D60694 llvm-svn: 362088
* [X86][SSE] Improve bool vector extload (PR26091) | Simon Pilgrim | 2019-05-30 | 1 | -0/+15
| | | | | | | | We already have good codegen for (vXiY *ext(vXi1 bitcast(iX))) cases, this patch uses it for loads of vXi1 types as well - changing the load into a iX integer load, and bitcasting so that combineToExtendBoolVectorInReg can then use it. Differential Revision: https://reviews.llvm.org/D62449 llvm-svn: 362081
* [AArch64][SVE2] Asm: support SVE2 vector splice (constructive) | Cullen Rhodes | 2019-05-30 | 2 | -0/+27
| | | | | | | | | | | | Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62530 llvm-svn: 362073
* [AArch64][SVE2] Asm: support SVE2 load instructions | Cullen Rhodes | 2019-05-30 | 2 | -0/+55
| | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * LDNT1SB, LDNT1B, LDNT1SH, LDNT1H, LDNT1SW, LDNT1W, LDNT1D The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62528 llvm-svn: 362072
* [AArch64][SVE2] Asm: support FCVTX/FLOGB instructions | Cullen Rhodes | 2019-05-30 | 2 | -0/+12
| | | | | | | | | | | | | | | | | Summary: Patch completes SVE2 support for: SVE Floating Point Unary Operations - Predicated Group The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62526 llvm-svn: 362071
* [AArch64][SVE2] Asm: add ext (immediate offset, constructive) instruction | Cullen Rhodes | 2019-05-30 | 2 | -0/+18
| | | | | | | | | | | | Summary: The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62518 llvm-svn: 362070
* [ARM] Add an MVE execution domain | Sjoerd Meijer | 2019-05-30 | 2 | -6/+8
| | | | | | | | | | | | | | | | | | | | | | MVE architecturally specifies a 'beat' system in which a vector instruction executed now will complete its actual operation over the next four cycles, so it can overlap with the execution of the previous and next MVE instruction. This makes it generally an advantage to avoid moving values back and forth between MVE registers and anywhere else, if there's any sensible way to do the same processing in whatever register type the values already occupied. That's just what the 'execution domain' system is supposed to achieve. So here we add a new execution domain which will contain all the MVE vector instructions when they are added. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60703 llvm-svn: 362068
* [X86] Add ENQCMD instructions | Pengfei Wang | 2019-05-30 | 6 | -0/+74
| | | | | | | | | | | | For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference. Patch by Tianqing Wang (tianqing) Differential Revision: https://reviews.llvm.org/D62281 llvm-svn: 362053
* [ARC] Cleanup ARCAsmPrinter. | Pete Couperus | 2019-05-29 | 1 | -16/+0
| | | | | | | | | | | | | | | | Summary: Remove unused getTargetStreamer. Remove unused headers. Reviewers: dantrushin Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62549 llvm-svn: 362021
* AMDGPU: Return address lowering | Aakanksha Patil | 2019-05-29 | 2 | -1/+27
| | | | | | | | The patch computes the return address for the current function. Differential revision: https://reviews.llvm.org/D59666 llvm-svn: 362001
* [mips] Iterate over MSACtrlRegClass to reserve all MSA control registers. NFC | Simon Atanasyan | 2019-05-29 | 1 | -8/+2
| | | | llvm-svn: 361965
* [mips] Use range-based for loops. NFC | Simon Atanasyan | 2019-05-29 | 1 | -8/+4
| | | | llvm-svn: 361964
* [ARM] Split predicates out into their own .td file | Sjoerd Meijer | 2019-05-29 | 3 | -184/+189
| | | | | | | | | | | | | | | | The new ARMPredicates.td is included from ARM.td, early enough that the predicate definitions are already in scope when ARMSchedule.td is included. This will make it possible to refer to them in UnsupportedFeatures fields of scheduling models. NFC: the chunk of Tablegen being moved here is copied and pasted verbatim. Patch by: Simon Tatham Differential Revision: https://reviews.llvm.org/D60693 llvm-svn: 361958
* [AArch64][SVE2] Asm: support SVE Bitwise Logical - Unpredicated Group | Cullen Rhodes | 2019-05-29 | 2 | -0/+81
| | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: * EOR3, BSL, BCAX, BSL1N, BSL2N, NBSL, XAR Aliases for types .B/.H/.S for EOR3 and BCAX have been added, the preferred disassembly is .D. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62387 llvm-svn: 361936
* [AArch64][SVE2] Asm: support Floating Point Widening Multiply-Add | Cullen Rhodes | 2019-05-29 | 2 | -0/+68
| | | | | | | | | | | | | | | Summary: Patch adds support for the indexed and unpredicated vectors forms of the FMLALB, FMLALT, FMLSLB and FMLSLT instructions. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62386 llvm-svn: 361935
* [AArch64][SVE2] Asm: support SVE2 Floating Point Pairwise Group | Cullen Rhodes | 2019-05-29 | 2 | -0/+40
| | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 floating-point pairwise operations: * FADDP, FMAXNMP, FMINNMP, FMAXP, FMINP The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62383 llvm-svn: 361933
* Revert "[X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to" | Pengfei Wang | 2019-05-29 | 1 | -3/+3
| | | | | | This reverts commit c1b3716614bc0a107e6f41a7d3d503baefad8a5b. llvm-svn: 361918
* [X86] Use 'llvm_unreachable' instead of nullptr in unreachable code to | Pengfei Wang | 2019-05-29 | 1 | -3/+3
| | | | | | | | | | | | | | | avoid static check fail RegClassOrBank is an object of RegClassOrRegBank, which is defined as using llvm::RegClassOrRegBank = typedef PointerUnion<const TargetRegisterClass *, const RegisterBank *> so control flow can not get here. Use "llvm_unreachable" here to avoid "null pointer" confusion. Patch by Shengchen Kan (skan) Differential Revision: https://reviews.llvm.org/D62006 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 361912
* [X86] Fix x86-64 call *foo@tlsdesc(%rax) and support R_386_TLSGOTDESC ↵ | Fangrui Song | 2019-05-29 | 2 | -3/+21
| R_386_TLS_DESC_CALL
|
| D18885 emitted 5 bytes for call *foo@tlsdesc(%rax). It should use the 2-byte form instead and let R_X86_64_TLSDESC_CALL apply to the beginning of the call instruction.
|
| The 2-byte form was deliberately chosen to make ->LE and ->IE relaxation work:
|
|   0: 48 8d 05 00 00 00 00    lea 0x0(%rip),%rax  # 7 <.text+0x7>
|     3: R_X86_64_GOTPC32_TLSDESC a-0x4
|   7: ff 10                   callq *(%rax)
|     7: R_X86_64_TLSDESC_CALL a
| =>
|   0: 48 c7 c0 fc ff ff ff    mov $0xfffffffffffffffc,%rax
|   7: 66 90                   xchg %ax,%ax
|
| Also change the symbol type to STT_TLS when VK_TLSCALL or VK_TLSDESC is seen.
|
| Reviewed By: compnerd
| Differential Revision: https://reviews.llvm.org/D62512
| llvm-svn: 361910
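The point about the 2-byte call form can be illustrated with a small model. This is only a sketch, not linker code: the byte strings are copied from the disassembly in the commit message, while `relax_in_place` is a hypothetical helper invented here to show that an in-place rewrite only works when the replacement has the same size as the original instruction.

```python
# Byte sequences copied from the commit message's disassembly.
LEA_TLSDESC = bytes.fromhex("48 8d 05 00 00 00 00")   # lea 0x0(%rip),%rax
CALL_TLSDESC = bytes.fromhex("ff 10")                 # callq *(%rax)
MOV_RELAXED = bytes.fromhex("48 c7 c0 fc ff ff ff")   # mov $0xfffffffffffffffc,%rax
NOP_RELAXED = bytes.fromhex("66 90")                  # xchg %ax,%ax


def relax_in_place(code, offset, old, new):
    """Hypothetical helper: overwrite `old` with `new` without moving later code."""
    if len(old) != len(new):
        raise ValueError("relaxation must preserve instruction size")
    if bytes(code[offset:offset + len(old)]) != old:
        raise ValueError("unexpected bytes at relaxation site")
    code[offset:offset + len(new)] = new


# The sequence before relaxation: 7-byte lea followed by the 2-byte call.
original = bytearray(LEA_TLSDESC + CALL_TLSDESC)

# Rewrite both instructions in place, as in the "=>" half of the listing.
relaxed = bytearray(original)
relax_in_place(relaxed, 0, LEA_TLSDESC, MOV_RELAXED)
relax_in_place(relaxed, len(LEA_TLSDESC), CALL_TLSDESC, NOP_RELAXED)
```

Because the call occupies exactly two bytes, the 2-byte `xchg %ax,%ax` can replace it without shifting any following code; a 5-byte call would not fit this rewrite.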
* [WebAssembly] Add signatures for RINT builtins | Thomas Lively | 2019-05-29 | 1 | -0/+6
| | | | | | | | | | | | Reviewers: azakai, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62564 llvm-svn: 361904
* [AArch64][GlobalISel] Select FCMPSri/FCMPDri when comparing against 0.0 | Jessica Paquette | 2019-05-28 | 1 | -13/+27
| | | | | | | | | | | Add support for selecting FCMPSri and FCMPDri when comparing against 0.0, and factor out opcode selection for G_FCMP into its own function. Add a test to show that we don't do this with other immediates. Differential Revision: https://reviews.llvm.org/D62539 llvm-svn: 361888
* [WebAssembly] Support for atomic fences | Heejin Ahn | 2019-05-28 | 3 | -4/+107
| | | | | | | | | | | | | | | | Summary: This adds support for translation of LLVM IR fence instruction. We convert a singlethread fence to a pseudo compiler barrier which becomes 0 instructions in final binary, and a thread fence to an idempotent atomicrmw instruction to a memory address. Reviewers: dschuff, jfb, sunfish, tlively Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D50277 llvm-svn: 361884
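The lowering rule described in this commit can be sketched as a two-way decision. This is an illustrative model only: `lower_fence` and the symbolic string `"idempotent_atomic_rmw"` are hypothetical names chosen here, and the commit message does not say which concrete instruction or address the backend uses.

```python
def lower_fence(sync_scope):
    """Return a symbolic instruction list for an LLVM IR `fence`.

    `sync_scope` is "singlethread" for a single-thread fence; any other
    value is treated here as a cross-thread fence (an assumption of this sketch).
    """
    if sync_scope == "singlethread":
        # Pseudo compiler barrier: constrains compiler reordering but
        # contributes zero instructions to the final binary.
        return []
    # Cross-thread fence: one idempotent atomic read-modify-write on memory.
    return ["idempotent_atomic_rmw"]


single = lower_fence("singlethread")
cross = lower_fence("system")
```

The single-thread case yields an empty list, matching the "0 instructions in final binary" wording, while the cross-thread case yields exactly one atomic operation.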
* AMDGPU: Temporary drop s_mul_hi_i/u32 patterns | Konstantin Zhuravlyov | 2019-05-28 | 1 | -6/+2
| | | | | | | | It introduces performance regressions in several applications. This has already been submitted downstream. llvm-svn: 361879
* [AArch64] Handle ISD::LRINT and ISD::LLRINT | Adhemerval Zanella | 2019-05-28 | 2 | -0/+15
| | | | | | | | | | | This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus fcvtzs. It currently only handles the scalar version. Reviewed By: SjoerdMeijer, mstorsjo Differential Revision: https://reviews.llvm.org/D62018 llvm-svn: 361877
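Why two instructions suffice can be seen from a scalar model. The sketch below is not AArch64 code: `frintx` and `fcvtzs` are Python stand-ins named after the instructions, and it assumes the default rounding mode (round to nearest, ties to even) and finite inputs that fit in the result type.

```python
import math


def frintx(x):
    # Stand-in for FRINTX: round to an integral floating-point value.
    # Python's round() uses ties-to-even, matching the assumed default mode.
    return float(round(x))


def fcvtzs(x):
    # Stand-in for FCVTZS: convert to a signed integer, rounding toward zero.
    return math.trunc(x)


def lrint_model(x):
    # Round first, then convert; the truncation is exact because the
    # input to fcvtzs is already integral.
    return fcvtzs(frintx(x))


results = [lrint_model(v) for v in (2.5, 3.5, -2.5, 2.7, -2.7)]
```

Converting with truncation alone would turn 2.7 into 2, so the rounding step is what gives the `lrint` result under the assumed rounding mode.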
* [CodeGen] Add lrint/llrint builtins | Adhemerval Zanella | 2019-05-28 | 1 | -0/+2
| | | | | | | | | | | | | | | This patch adds ISD::LRINT and ISD::LLRINT along with new intrinsics. The changes are straightforward, as for other floating-point rounding functions, with just some adjustments required to handle the return value being an integer. The idea is to optimize lrint/llrint generation for AArch64 in a subsequent patch. The current semantics just route it to the libm symbol. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D62017 llvm-svn: 361875
* [AMDGPU] Correct the handling of inlineasm output registers. | Michael Liao | 2019-05-28 | 1 | -2/+1
| | | | | | | | | | | | | | | | Summary: - There's a regression due to the cross-block RC assignment. Use the proper way to derive the output register RC in inline asm. Reviewers: rampitec, alex-t Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, eraman, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62537 llvm-svn: 361868
* Revert "[x86] split 256-bit store of concatenated vectors" | Sanjay Patel | 2019-05-28 | 1 | -11/+0
| | | | | | | | | This reverts commit d5a8637072f4c556b88156bd2f6237a2ead47d31. Most likely suspect for this bot failure: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/9684 llvm-svn: 361850
* AMDGPU: Don't enable all lanes with non-CSR VGPR spills | Matt Arsenault | 2019-05-28 | 1 | -39/+49
| | | | | | If the only VGPRs used for SGPR spilling were not CSRs, this was enabling all lanes and immediately restoring exec. This is the usual situation in leaf functions. llvm-svn: 361848
* [AMDGPU] Fix the mis-handling of `vreg_1` copied from scalar register. | Michael Liao | 2019-05-28 | 1 | -1/+5
| | | | | | | | | | | | | | | | | | | | Summary: - Don't treat the use of a scalar register as `vreg_1` as a VGPR usage. Otherwise, that promotes the scalar register into a vector one, which breaks the assumption that the scalar register holds the lane mask. - The issue is triggered in a complicated case, where the uses of that (lane mask) scalar register are legalized before its definition, e.g., due to a mismatch between block placement and topological order, or due to a loop. In that case, the legalization of PHI introduces the use of that scalar register as `vreg_1`. Reviewers: rampitec, nhaehnle, arsenm, alex-t Subscribers: kzhuravl, jvesely, wdng, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl Tags: #llvm Differential Revision: https://reviews.llvm.org/D62492 llvm-svn: 361847
* [ARM] Replace fp-only-sp and d16 with fp64 and d32. | Simon Tatham | 2019-05-28 | 16 | -175/+203
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Those two subtarget features were awkward because their semantics are reversed: each one indicates the _lack_ of support for something in the architecture, rather than the presence. As a consequence, you don't get the behavior you want if you combine two sets of feature bits. Each SubtargetFeature for an FP architecture version now comes in four versions, one for each combination of those options. So you can still say (for example) '+vfp2' in a feature string and it will mean what it's always meant, but there's a new string '+vfp2d16sp' meaning the version without those extra options. A lot of this change is just mechanically replacing positive checks for the old features with negative checks for the new ones. But one more interesting change is that I've rearranged getFPUFeatures() so that the main FPU feature is appended to the output list *before* rather than after the features derived from the Restriction field, so that -fp64 and -d32 can override defaults added by the main feature. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D60691 llvm-svn: 361845
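The problem with "reversed" features can be shown with plain sets. This is a deliberately simplified model, not the real subtarget machinery: feature strings are treated as set members, combining means set union, and the expansion of a full FPU into `vfp4d16sp` plus `fp64` and `d32` is an assumption made for illustration.

```python
def combine(*feature_sets):
    # Combining feature bits is modelled as a plain set union.
    return set().union(*feature_sets)


# Old scheme: restrictions are expressed as extra (negative) features.
old_full = {"vfp4"}
old_restricted = {"vfp4", "fp-only-sp", "d16"}


def old_has_double(features):
    return "fp-only-sp" not in features


# New scheme: capabilities are expressed as extra (positive) features.
new_restricted = {"vfp4d16sp"}
new_full = {"vfp4d16sp", "fp64", "d32"}


def new_has_double(features):
    return "fp64" in features


old_combined = combine(old_full, old_restricted)
new_combined = combine(new_full, new_restricted)
```

In the old scheme the union of a full and a restricted FPU reports no double-precision support, because the restriction bit survives the union. In the new scheme the union keeps `fp64` and `d32`, so combining can only add capabilities.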
* [AArch64] Delete unused VariantKind in AArch64MCExpr | Fangrui Song | 2019-05-28 | 2 | -4/+1
| | | | llvm-svn: 361844
* [X86-64] Fix 256-bit SET0 lowering for non-VLX targets | David Greene | 2019-05-28 | 1 | -0/+6
| | | | | | | | | | If we don't have VLX then 256-bit SET0 should be lowered to VPXOR with ZMM registers. This restores functionality accidentally removed by r309926. Differential Revision: https://reviews.llvm.org/D62415 llvm-svn: 361843
* [x86] split 256-bit store of concatenated vectors | Sanjay Patel | 2019-05-28 | 1 | -0/+11
| | | | | | | | | | | | | | | | | | | | This shows up as a side issue to the main problem for the AVX target example from PR37428: https://bugs.llvm.org/show_bug.cgi?id=37428 - https://godbolt.org/z/7tpRa3 But as we can see in the pile of existing test diffs, it's actually a widespread problem that affects any AVX or later target. Apart from a couple of oddballs, I think these are all improvements for the reasons stated in the code comment: we do not want to enable YMM unnecessarily (avoid vzeroupper and frequency throttling) and some cores split 256-bit stores anyway. We could say that MergeConsecutiveStores() is going overboard on some of these examples, but that won't solve the problem completely. But that is the reason I'm proposing this as a lowering rather than a combine: we will infinite loop fighting the merge code if we try this earlier. Differential Revision: https://reviews.llvm.org/D62498 llvm-svn: 361822
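The equivalence this lowering relies on can be checked with a byte-level model. It is only a sketch: memory is a `bytearray`, `store` is a hypothetical helper, and alignment, volatility, and performance effects are ignored.

```python
def store(mem, addr, data):
    # Hypothetical helper: write `data` into `mem` starting at `addr`.
    mem[addr:addr + len(data)] = data


lo = bytes(range(0, 16))    # low 128-bit half
hi = bytes(range(16, 32))   # high 128-bit half
addr = 8

# One 256-bit store of the concatenated vector.
wide = bytearray(64)
store(wide, addr, lo + hi)

# Two 128-bit stores of the halves, the second one 16 bytes further on.
split = bytearray(64)
store(split, addr, lo)
store(split, addr + 16, hi)
```

Both sequences leave memory in the same state, which is why the split is legal for ordinary stores whose source is a concatenation of two 128-bit values.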
* [x86] fix 256-bit vector store splitting to honor 'volatile' | Sanjay Patel | 2019-05-28 | 1 | -14/+30
| | | | | | | | | | | Forking this out of the discussion in D62498 (and assuming that will be committed later, so adding the helper function here). The LangRef says: "the backend should never split or merge target-legal volatile load/store instructions." Differential Revision: https://reviews.llvm.org/D62506 llvm-svn: 361815
* [X86] Custom lower CONCAT_VECTORS of v2i1 | Benjamin Kramer | 2019-05-28 | 1 | -7/+2
| | | | | | | The generic legalizer cannot handle this. Add an assert instead of silently miscompiling vectors with elements smaller than 8 bits. llvm-svn: 361814
* [NFC] Test commit, delete trailing whitespace | Graham Hunter | 2019-05-28 | 1 | -1/+1
| | | | llvm-svn: 361813
* [X86] X86CmovConverterPass::collectCmovCandidates - fix uninitialized ↵ | Simon Pilgrim | 2019-05-28 | 1 | -1/+2
| | | | | | variable warnings. NFCI. llvm-svn: 361804
* [AArch64][SVE2] Asm: support SVE2 Floating Point Convert Group | Cullen Rhodes | 2019-05-28 | 2 | -0/+42
| | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 floating-point convert precision: * FCVTXNT, FCVTNT, FCVTLT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62382 llvm-svn: 361801
* [AArch64][SVE2] Asm: support SVE2 Crypto Extensions Group | Cullen Rhodes | 2019-05-28 | 2 | -0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 crypto constructive binary operations: * SM4EKEY, RAX1 SVE2 crypto destructive binary operations: * AESE, AESD, SM4E SVE2 crypto unary operations: * AESMC, AESIMC AESE, AESD, AESMC and AESIMC are enabled with +sve2-aes. SM4E and SM4EKEY are enabled with +sve2-sm4. RAX1 is enabled with +sve2-sha3. The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D62307 llvm-svn: 361797
* [AArch64][SVE2] Asm: support SVE2 Histogram Computation Groups | Cullen Rhodes | 2019-05-28 | 2 | -0/+53
| | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 histogram generation (segment): * HISTSEG SVE2 histogram generation (vector): * HISTCNT The specification can be found here: https://developer.arm.com/docs/ddi0602/latest Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62306 llvm-svn: 361796
* [AArch64][SVE2] Asm: support SVE2 Misc Group | Cullen Rhodes | 2019-05-28 | 2 | -0/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Patch adds support for the following instructions: SVE2 bitwise exclusive-or interleaved: * EORBT, EORTB SVE2 bitwise permute: * BEXT, BDEP, BGRP SVE2 bitwise shift left long: * SSHLLB, SSHLLT, USHLLB, USHLLT SVE2 integer add/subtract interleaved long: * SADDLBT, SSUBLBT, SSUBLTB BDEP, BEXT and BGRP are enabled with SVE2 feature +bitperm, all other instructions in this group are enabled with +sve2. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D62304 llvm-svn: 361795
* [AMDGPU] Fix for the address sanitizer failure. Fixing typo | Alexander Timofeev | 2019-05-27 | 1 | -1/+1
| | | | llvm-svn: 361776