summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64][SVE] Fold constant multiply of element countCullen Rhodes2019-12-203-1/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: E.g. %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31) %mul = mul i64 %0, <const> Should emit: cntw x0, all, mul #<const> For <const> in the range 1-16. Patch by Kerry McLaughlin Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71014
* [AArch64][SVE] Add intrnisics for saturating scalar arithmeticAndrzej Warzynski2019-12-202-69/+137
| | | | | | | | | | | | | | | | | | | | | | Summary: The following intrnisics are added: * @llvm.aarch64.sve.sqdec{b|h|w|d|p} * @llvm.aarch64.sve.sqinc{b|h|w|d|p} * @llvm.aarch64.sve.uqdec{b|h|w|d|p} * @llvm.aarch64.sve.uqinc{b|h|w|d|p} For every intrnisic there a scalar variants (with n32 or n64 suffix) and vector variants (no suffix). Reviewers: sdesmalen, rengolin, efriedma Reviewed By: sdesmalen, efriedma Subscribers: eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71252
* Recommit "[AArch64][SVE] Add permutation and selection intrinsics"Cullen Rhodes2019-12-205-42/+264
| | | | | | | | Recommit 23c28c40436143006be740533375c036d11c92cd (reverted in dcb48f50bdfa0fa47b62d089b6ed999d857fc9f8) with a fix for an assert "Request for a fixed size on a scalable object" being triggered in `LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the TypeSize object.
* [AArch64][SVE] Add intrinsics for binary narrowing operationsAndrzej Warzynski2019-12-203-22/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The following intrinsics for binary narrowing shift righ operations are added: * @llvm.aarch64.sve.shrnb * @llvm.aarch64.sve.uqshrnb * @llvm.aarch64.sve.sqshrnb * @llvm.aarch64.sve.sqshrunb * @llvm.aarch64.sve.uqrshrnb * @llvm.aarch64.sve.sqrshrnb * @llvm.aarch64.sve.sqrshrunb * @llvm.aarch64.sve.shrnt * @llvm.aarch64.sve.uqshrnt * @llvm.aarch64.sve.sqshrnt * @llvm.aarch64.sve.sqshrunt * @llvm.aarch64.sve.uqrshrnt * @llvm.aarch64.sve.sqrshrnt * @llvm.aarch64.sve.sqrshrunt Reviewers: sdesmalen, rengolin, efriedma Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71552
* [ARM][MVE] Fixes for tail predication.Sam Parker2019-12-203-12/+61
| | | | | | | | | | | | | | | | | | 1) Fix an issue with the incorrect value being used for the number of elements being passed to [d|w]lstp. We were trying to check that the value was available at LoopStart, but this doesn't consider that the last instruction in the block could also define the register. Two helpers have been added to RDA for this. 2) Insert some code to now try to move the element count def or the insertion point so that we can perform more tail predication. 3) Related to (1), the same off-by-one could prevent us from generating a low-overhead loop when a mov lr could have been the last instruction in the block. 4) Fix up some instruction attributes so that not all the low-overhead loop instructions are labelled as branches and terminators - as this is not true for dls/dlstp. Differential Revision: https://reviews.llvm.org/D71609
* [ARM][MVE] Tail predicate in the presence of vcmpSam Parker2019-12-203-76/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Record the discovered VPT blocks while checking for validity and, for now, only handle blocks that begin with VPST and not VPT. We're now allowing more than one instruction to define vpr, but each block must somehow be predicated using the vctp. This leaves us with several scenarios which need fixing up: 1) A VPT block with is only predicated by the vctp and has no internal vpr defs. 2) A VPT block which is only predicated by the vctp but has an internal vpr def. 3) A VPT block which is predicated upon the vctp as well as another vpr def. 4) A VPT block which is not predicated upon a vctp, but contains it and all instructions within the block are predicated upon in. The changes needed are, for: 1) The easy one, just remove the vpst and unpredicate the instructions in the block. 2) Remove the vpst and unpredicate the instructions up to the internal vpr def. Need insert a new vpst to predicate the remaining instructions. 3) No nothing. 4) The vctp will be inside a vpt and the instruction will be removed, so adjust the size of the mask on the vpst. Differential Revision: https://reviews.llvm.org/D71107
* [ARM][MVE] Tail predicate bottom/top muls.Sam Parker2019-12-201-0/+3
| | | | | | Add VMULL and VQDMULL variants to our tail predication white list. Differential Revision: https://reviews.llvm.org/D71465
* [X86] Make EmitCmp into a static function and explicitly return chain result ↵Craig Topper2019-12-192-18/+19
| | | | | | | | | | | | for STRICT_FCMP. NFCI The only thing its getting from the X86TargetLowering class is the subtarget which we can easily pass. This function only has one call site now since this might help the compiler inline it. Explicitly return both the flag result and the chain result for STRICT_FCMP nodes. This removes an assumption in the caller that getValue(1) is the right way to get the chain.
* [X86] Directly call EmitTest in two places instead of creating a null ↵Craig Topper2019-12-191-4/+2
| | | | | | | | | constant and calling EmitCmp. NFCI EmitCmp will just immediately call EmitTest and discard the null constant only to have EmitTest create it again if it doesn't fold. So just skip all that and go directly to EmitTest.
* [StackMaps] Be explicit about label formation [NFC] (try 2)Philip Reames2019-12-194-9/+43
| | | | | | Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all. For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
* Temporarily Revert "[StackMaps] Be explicit about label formation [NFC]"Eric Christopher2019-12-191-14/+3
| | | | | | as it broke the aarch64 build. This reverts commit bc7595d934b958ab481288d7b8e768fe5310be8f.
* [StackMaps] Be explicit about label formation [NFC]Philip Reames2019-12-191-3/+14
| | | | For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
* [FaultMaps] Make label formation a bit more explicit [NFC]Philip Reames2019-12-191-1/+5
| | | | This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
* [RISCV] Don't crash on unsupported relocationsLuís Marques2019-12-191-2/+11
| | | | | | | | | | Summary: Instead of crashing due to the `llvm_unreachable`, provide a proper error when invalid fixups/relocations are encountered. Reviewers: asb, lenary Reviewed By: asb Tags: #llvm Differential Revision: https://reviews.llvm.org/D71536
* [SystemZ] Recognize mrecord-mcount in backendJonas Paulsson2019-12-192-3/+18
| | | | | | | Emit the __mcount_loc section for all fentry calls. Review: Ulrich Weigand https://reviews.llvm.org/D71629
* [RISCV] Enable the machine outliner for RISC-Vlewis-revill2019-12-193-0/+190
| | | | | | | | | | This patch enables the machine outliner for RISC-V and adds the necessary logic for checking whether sequences can be safely outlined, and describing how they should be outlined. Outlined functions are called using the register t0 (x5) as the return address register, which must be available for an occurrence of a sequence to be safely outlined. Differential Revision: https://reviews.llvm.org/D66210
* [PowerPC] Only use PLT annotations if using PIC relocation modelJustin Hibbits2019-12-191-1/+7
| | | | | | | | | | | | | | Summary: The default static (non-PIC, non-PIE) model for 32-bit powerpc does not use @PLT annotations and relocations in GCC. LLVM shouldn't use @PLT annotations either, because it breaks secure-PLT linking with (some versions of?) GNU LD. Update the available-externally.ll test to reflect that default mode should be the same as the static relocation, by using the same check prefix. Reviewed by: sfertile Differential Revision: https://reviews.llvm.org/D70570
* Revert "[AArch64][SVE] Add permutation and selection intrinsics"Cullen Rhodes2019-12-195-264/+42
| | | | | | | | | | | This reverts commit 23c28c40436143006be740533375c036d11c92cd. It caused build failures in the following expensive checks builders: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295 http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700 Reverting for now whilst I figure what the issue is.
* [AArch64][SVE] Add permutation and selection intrinsicsCullen Rhodes2019-12-195-42/+264
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Adds the following intrinsics: * @llvm.aarch64.sve.clasta * @llvm.aarch64.sve.clasta_n * @llvm.aarch64.sve.clastb * @llvm.aarch64.sve.clastb_n * @llvm.aarch64.sve.compact * @llvm.aarch64.sve.ext * @llvm.aarch64.sve.lasta * @llvm.aarch64.sve.lastb * @llvm.aarch64.sve.rev * @llvm.aarch64.sve.splice * @llvm.aarch64.sve.tbl * @llvm.aarch64.sve.trn1 * @llvm.aarch64.sve.trn2 * @llvm.aarch64.sve.uzp1 * @llvm.aarch64.sve.uzp2 * @llvm.aarch64.sve.zip1 * @llvm.aarch64.sve.zip2 Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin Reviewed By: sdesmalen, efriedma Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71401
* [llvm-exegesis] Fix pfm counter names for Haswell for older versions of libpfmMiloš Stojanović2019-12-191-8/+8
| | | | | | | | The inconsistency caused uops mode to fail on an older version of libpfm since the dispatched_port was added as an alias for executed_port only after v4.6.0 of libpfm. Differential revision: https://reviews.llvm.org/D71665
* Make more use of MachineInstr::mayLoadOrStore.Jay Foad2019-12-1910-14/+14
|
* [ARM] Improve codegen of volatile load/store of i64Victor Campos2019-12-196-6/+158
| | | | | | | | | | | | | | | | | | Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072
* [AArch64][SVE] Implement pfirst and pnext intrinsicsCullen Rhodes2019-12-192-5/+12
| | | | | | | | | | | | | Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally Reviewed By: cameron.mcinally Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71472
* [AArch64][SVE] Implement ptrue intrinsicCullen Rhodes2019-12-193-10/+19
| | | | | | | | | | | | | | Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally, huntergr, efriedma Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71457
* [AMDGPU] Implemented fma cost analysisStanislav Mekhanoshin2019-12-182-0/+53
| | | | Differential Revision: https://reviews.llvm.org/D71676
* Enable STRICT_FP_TO_SINT/UINT on X86 backendLiu, Chen32019-12-195-113/+196
| | | | | | This patch is mainly for custom lowering the vector operation. Differential Revision: https://reviews.llvm.org/D71592
* [PowerPC] make lwa as a valid ds candidate in ppcloopinstrformprep passczhengsz2019-12-181-5/+9
| | | | | | | | Fix a FIXME in ppcloopinstrformprep pass. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D71346
* [WebAssembly] Add avgr_u intrinsics and require nuw in patternsThomas Lively2019-12-181-16/+17
| | | | | | | | | | | | | | | | | | Summary: The vector pattern `(a + b + 1) / 2` was previously selected to an avgr_u instruction regardless of nuw flags, but this is incorrect in the case where either addition may have an unsigned wrap. This CL changes the existing pattern to require both adds to have nuw flags and adds builtin functions and intrinsics for the avgr_u instructions because the corrected pattern is not representable in C. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71648
* [X86] Add a simple hack to IsProfitableToFold to prevent vselect+strict fp ↵Craig Topper2019-12-181-0/+6
| | | | | | | | operations from being folded into masked instructions. We really need to update the isel patterns to prevent this, but that requires some tablegen de-tangling. So this hack will work for correctness in the short term.
* [FPEnv] Strict versions of llvm.minimum/llvm.maximumUlrich Weigand2019-12-182-2/+4
| | | | | | | | | | | | | Add new intrinsics llvm.experimental.constrained.minimum llvm.experimental.constrained.maximum as strict versions of llvm.minimum and llvm.maximum. Includes SystemZ back-end support. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71624
* Revert "[AArch64][SVE] Replace integer immediate intrinsics with splat ↵Danilo Carvalho Grael2019-12-182-39/+22
| | | | | | | | | | vector variant" This reverts commit 830e08b98bcb427136443093c282b25328137cf0 and eb1857ce0da481caf82271e6d0c9fc745dfab26f. This commit leads to an unexpected failure on test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll. The review will need more changes before its re-commited.
* [Clang FE, SystemZ] Don't add "true" value for the "mnop-mcount" attribute.Jonas Paulsson2019-12-182-3/+2
| | | | | | | | Let the "mnop-mcount" function attribute simply be present or non-present. Update SystemZ backend as well to use hasFnAttribute() instead. Review: Ulrich Weigand https://reviews.llvm.org/D71669
* [PowerPC][NFC] Refactor splat of constant to vector.Stefan Pintilie2019-12-181-25/+4
| | | | | | | | | Refactor the splatting of a constant to a vector so that common code is used both for Power9 and Power8. Patch by: Anil Mahmud Differential Revision: https://reviews.llvm.org/D71481
* [AArch64][SVE] Replace integer immediate intrinsics with splat vector variantDanilo Carvalho Grael2019-12-182-22/+39
| | | | | | | | | | | | Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics. Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan Tags: #llvm Differential Revision: https://reviews.llvm.org/D71614
* [gicombiner] Import tryCombineIndexedLoadStore()Daniel Sanders2019-12-181-6/+1
| | | | | | | | | | | | | | | | | Summary: Now that arbitrary data is supported, import tryCombineIndexedLoadStore() Depends on D69147 Reviewers: bogner, volkan Reviewed By: volkan Subscribers: hiraditya, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69151
* [AArch64] match fcvtl2 with bitcasted extractSanjay Patel2019-12-182-6/+35
| | | | | | | | | | | | | This should eliminate a regression seen in D63815. If we are FP extending the high half extract of a vector, we should be able to peek through a bitcast sitting between the extract and extend. This replaces tablegen patterns with a more general DAG to DAG override, so we can handle any casted type. Differential Revision: https://reviews.llvm.org/D71515
* [gicombiner] Add support for arbitrary match data being passed from match to ↵Daniel Sanders2019-12-181-14/+12
| | | | | | | | | | | | | | | | | | | | | apply Summary: This is used by the extending_loads combine to tell the apply step which use is the preferred one to fold and the other uses should be re-written to consume. Depends on D69117 Reviewers: volkan, bogner Reviewed By: volkan Subscribers: hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69147
* [AArch64] Improve codegen of volatile load/store of i128Victor Campos2019-12-183-14/+69
| | | | | | | | | | | | | | | | Summary: Instead of generating two i64 instructions for each load or store of a volatile i128 value (two LDRs or STRs), now emit a single LDP or STP. Reviewers: labrinea, t.p.northover, efriedma Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69559
* [AArch64] Enable clustering memory accesses to fixed stack objectsJay Foad2019-12-183-118/+90
| | | | | | | | | | | | | | | | | | Summary: r347747 added support for clustering mem ops with FI base operands including support for fixed stack objects in shouldClusterFI, but apparently this was never tested. This patch fixes shouldClusterFI to work with scaled as well as unscaled load/store instructions, and fixes the ordering of memory ops in MemOpInfo::operator< to ensure that memory addresses always increase, regardless of which direction the stack grows. Subscribers: MatzeB, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71334
* [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]Anna Welker2019-12-183-7/+13
| | | | | | | Add an extra parameter so alignment can be taken under consideration in gather/scatter legalization. Differential Revision: https://reviews.llvm.org/D71610
* [X86] Add strict fma supportWang, Pengfei2019-12-184-19/+26
| | | | | | | | | | | | Summary: Add strict fma support Reviewers: craig.topper, RKSimon, LiuChen3 Subscribers: hiraditya, llvm-commits, LuoYuanke Tags: #llvm Differential Revision: https://reviews.llvm.org/D71604
* [PowerPC] Add missing legalization for vector BSWAPNemanja Ivanovic2019-12-174-3/+31
| | | | | | | | We somehow missed doing this when we were working on Power9 exploitation. This just adds the missing legalization and cost for producing the vector intrinsics. Differential revision: https://reviews.llvm.org/D70436
* [X86] Manually format some setOperationAction calls to line up arguments to ↵Craig Topper2019-12-171-8/+8
| | | | improve readability. NFC
* [AArch64][GlobalISel]: Fix a crash in GlobalIsel in dealing with 16bit ↵Xiaoqing Wu2019-12-171-1/+2
| | | | | | | | | | | | | | | | uadd.with.overflow. Summary: AArch64 doesn't support uadd.with.overflow.i16 natively. This change adds a legalization rule to convert the 32bit add result to 16bit. This should fix PR43981. Reviewers: arsenm, qcolombet, paquette, aemerson Reviewed By: paquette Subscribers: wdng, rovka, kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71587
* [AMDGPU] Fixed cost model for packed 16 bit opsStanislav Mekhanoshin2019-12-171-1/+13
| | | | Differential Revision: https://reviews.llvm.org/D71622
* [WebAssembly] Implement SIMD {i8x16,i16x8}.avgr_u instructionsThomas Lively2019-12-171-1/+20
| | | | | | | | | | | | | | | | | Summary: These instructions were added to the spec proposal in https://github.com/WebAssembly/simd/pull/126. Their semantics are equivalent to `(a + b + 1) / 2`. The opcode for the experimental i32x4.dot_i16x8_s is also bumped due to a collision with the i8x16.avgr_u opcode. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71628
* [AIX] Avoid unset csect assert for functions defined after their use in TOCDavid Tenty2019-12-171-16/+17
| | | | | | | | | | | | | | | | | Summary: If a function is defined after it appears in a TOC expression, we may try to access an unset containing csect when returning a symbol for the expression. Reviewers: Xiangling_L, DiggerLin, jasonliu, hubert.reinterpretcast Reviewed By: hubert.reinterpretcast Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71125
* AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo structTom Stellard2019-12-171-241/+216
| | | | | | | | | | | | | | | | | Summary: Modify CombineInfo to only store information about a single instruction. This is a little easier to work with and removes a lot of duplicate initialization code. Reviewers: arsenm, nhaehnle Reviewed By: arsenm, nhaehnle Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71045
* [AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtrJay Foad2019-12-171-1/+1
| | | | | | | | | | | | | | | Summary: The typo has been present since memOpsHaveSameBasePtr was introduced in r313208. It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than it was supposed to. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71616
* [RISCV] Add subtargets initialized with target featureZakk Chen2019-12-172-6/+31
| | | | | | | | | | | | expected failed test (RV32IF-ILP32F) will be fixed in a subsequent patch. Reviewers: efriedma, lenary, asb Reviewed By: efriedma, lenary Tags: #llvm Differential Revision: https://reviews.llvm.org/D70116
OpenPOWER on IntegriCloud