path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* MipsDelaySlotFiller: Don't move BUNDLE instructions into the delay slot
  Alex Richardson | 2019-12-04 | 1 file | -0/+6

  Summary:
  In our CHERI fork we use BUNDLE instructions to ensure that a three-instruction
  sequence to generate a program-counter-relative value is emitted without
  reordering or insertions (since that would break the 32-bit offset computation).
  This sequence is created in MipsExpandPseudo and we use finalizeBundle() to
  create the BUNDLE instruction. However, the delay slot filler currently breaks
  this pattern since the BUNDLE will be removed and so all instructions are moved
  into the delay slot. Since the delay slot only executes the first instruction,
  this results in incorrect computations (and run-time crashes) if the branch is
  taken. The original test case uses CHERI instructions, so for the test case here
  I simply filled a BUNDLE with a no-op DADDiu $sp_64, -16 and DADDiu $sp_64, 16.

  Reviewers: atanasyan
  Reviewed By: atanasyan
  Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70944
* Add debug output to MipsDelaySlotFiller pass
  Alex Richardson | 2019-12-04 | 1 file | -5/+28

  Summary:
  I was tracking down a code-generation bug in this pass and found that the added
  output was useful. It is also helpful to find out why a delay slot could not be
  filled even though there is clearly a valid instruction (which appears to mostly
  be caused by CFI instructions).

  Reviewers: atanasyan
  Reviewed By: atanasyan
  Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70940
* [AArch64TTI] Compute imm materialization cost for AArch64 intrinsics
  Florian Hahn | 2019-12-04 | 1 file | -0/+6

  Currently, getIntImmCost returns TCC_Free for almost all intrinsics. For most
  AArch64-specific intrinsics, however, it looks like integer constants cannot be
  folded into most of them (at least the ones I checked). Unless we know that we
  can fold integer operands with the intrinsic, we handle more cases correctly by
  returning the cost to materialize the immediate rather than returning TCC_Free.

  Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka
  Reviewed By: SjoerdMeijer
  Differential Revision: https://reviews.llvm.org/D70669
* AMDGPU: Avoid folding 2 constant operands into an SALU operation
  David Stuttard | 2019-12-04 | 1 file | -0/+23

  Summary:
  Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2
  constant operands into the same SALU operation, with neither operand able to
  be encoded as an inline constant.

  Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927
  Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70896
* [PowerPC] folding rlwinm + rlwinm to rlwinm
  czhengsz | 2019-12-03 | 1 file | -0/+138

  For example:
    x3 = rlwinm x3, 27, 5, 31
    x3 = rlwinm x3, 19, 0, 12
  can be combined to
    x3 = rlwinm x3, 14, 0, 12

  Reviewed by: steven.zhang, lkail
  Differential Revision: https://reviews.llvm.org/D70374
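  As a purely illustrative aside (not part of the commit): the arithmetic behind
  the example can be sanity-checked outside of LLVM. Composing two left-rotates
  adds the rotate amounts modulo 32, so (27 + 19) mod 32 = 14, and for this mask
  pair the outer mask 0..12 makes the intermediate clearing irrelevant. The C++
  sketch below checks only that; the helpers rotl32 and maskMBME are invented
  here for illustration and the real transform must also verify mask
  compatibility.

    #include <cassert>
    #include <cstdint>

    // Left-rotate a 32-bit value (the rotate part of PowerPC's rlwinm).
    static uint32_t rotl32(uint32_t V, unsigned Amt) {
      Amt &= 31;
      return Amt ? (V << Amt) | (V >> (32 - Amt)) : V;
    }

    // Mask selecting bit positions MB..ME in PowerPC's big-endian bit
    // numbering (bit 0 is the most significant bit).
    static uint32_t maskMBME(unsigned MB, unsigned ME) {
      uint32_t M = 0;
      for (unsigned I = MB; I <= ME; ++I)
        M |= 1u << (31 - I);
      return M;
    }

    int main() {
      const uint32_t Tests[] = {0, 1, 0xdeadbeef, 0xffffffff};
      for (uint32_t X : Tests) {
        // x3 = rlwinm x3, 27, 5, 31 followed by x3 = rlwinm x3, 19, 0, 12 ...
        uint32_t Two =
            rotl32(rotl32(X, 27) & maskMBME(5, 31), 19) & maskMBME(0, 12);
        // ... matches the single x3 = rlwinm x3, 14, 0, 12, because
        // (27 + 19) % 32 == 14 and the bits cleared by the inner mask never
        // land inside the outer 0..12 mask.
        uint32_t One = rotl32(X, 14) & maskMBME(0, 12);
        assert(Two == One);
      }
    }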
* [X86] Model DAZ and FTZ
  Wang, Pengfei | 2019-12-04 | 2 files | -21/+43

  Summary: This is a follow-up of D70881. It models DAZ and FTZ for related
  instructions.

  Reviewers: craig.topper, RKSimon, andrew.w.kaylor
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70938
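  For readers unfamiliar with the two control bits: DAZ (denormals-are-zero) and
  FTZ (flush-to-zero) live in MXCSR and change how SSE/AVX arithmetic treats
  subnormal inputs and results, which is why the affected instructions need a
  modelled MXCSR dependency. A small user-level sketch of their effect (an
  illustration only, assuming an x86-64 host and the standard intrinsics
  headers, not code from the patch):

    #include <cstdio>
    #include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE (DAZ control)
    #include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE (FTZ control)

    int main() {
      volatile float Normal = 1.0e-37f;  // a normal float
      volatile float Scale = 1.0e4f;

      float Sub = Normal / Scale;        // ~1e-41: a subnormal result by default
      std::printf("default : %g\n", Sub);

      // Both bits live in MXCSR, so any SSE/AVX arithmetic that honours them
      // effectively reads MXCSR as an extra input.
      _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          // FTZ: subnormal results become 0
      _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  // DAZ: subnormal inputs read as 0

      float Flushed = Normal / Scale;    // now produces 0
      std::printf("ftz/daz : %g\n", Flushed);
    }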
* [X86] Model MXCSR for all AVX512 instructions
  Wang, Pengfei | 2019-12-04 | 2 files | -57/+79

  Summary: Model MXCSR for all AVX512 instructions

  Reviewers: craig.topper, RKSimon, andrew.w.kaylor
  Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70881
* [RISCV] Don't force Local Exec TLS for non-PIC
  James Clarke | 2019-12-03 | 1 file | -4/+1

  Summary:
  Forcing Local Exec TLS requires the use of copy relocations. Copy relocations
  need special handling in the runtime linker when being used against TLS
  symbols, which is present in glibc, but not in FreeBSD nor musl, and so cannot
  be relied upon. Moreover, copy relocations are a hack that embed the size of
  an object in the ABI when it otherwise wouldn't be, and break protected
  symbols (which are expected to be DSO local), whilst also wasting space, thus
  they should be avoided whenever possible. As discussed in D70398, RISC-V
  should move away from forcing Local Exec, and instead use Initial Exec like
  other targets, with possible linker relaxation to follow. The RISC-V GCC
  maintainers also intend to adopt this more-conventional behaviour (see
  https://github.com/riscv/riscv-elf-psabi-doc/issues/122).

  Reviewers: asb, MaskRay
  Reviewed By: MaskRay
  Subscribers: emaste, krytarowski, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits, bsdjhb
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70649
* [NFC][KnownBits] Add getMinValue() / getMaxValue() methods
  Roman Lebedev | 2019-12-03 | 1 file | -1/+1

  As can be seen from the accompanying cleanup, it is not unheard of to write
  `~Known.Zero` meaning "what maximal value can this KnownBits produce".
  But I think `~Known.Zero` isn't *that* self-explanatory, as compared to a
  method with a name. Note that not all `~Known.Zero` places were cleaned up,
  only those where this arguably improves things.
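  For context, a toy model of what those accessors compute (a plain uint32_t
  stand-in written for this note, not LLVM's APInt-based KnownBits class): the
  maximum value consistent with the known bits sets every bit not known to be
  zero, and the minimum sets only the bits known to be one.

    #include <cassert>
    #include <cstdint>

    // Minimal stand-in for the Zero/One masks tracked by KnownBits.
    struct KnownBits32 {
      uint32_t Zero = 0;  // bits known to be 0
      uint32_t One = 0;   // bits known to be 1

      // Largest value consistent with the known bits: set every bit that is
      // not known to be zero. This is the "~Known.Zero" idiom with a name.
      uint32_t getMaxValue() const { return ~Zero; }

      // Smallest value consistent with the known bits: only the bits known
      // to be one are set.
      uint32_t getMinValue() const { return One; }
    };

    int main() {
      KnownBits32 K;
      K.Zero = 0xFFFFFF00;  // the top 24 bits are known to be 0
      K.One = 0x00000001;   // the lowest bit is known to be 1
      assert(K.getMaxValue() == 0x000000FF);
      assert(K.getMinValue() == 0x00000001);
    }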
* [AArch64] Fix over-eager fusing of NEON SIMD MUL/ADD
  Sanne Wouda | 2019-12-03 | 2 files | -8/+362

  Summary:
  The ISel pattern for SIMD MLA is a bit too eager: it replaces the ADD with an
  MLA even when the MUL cannot be eliminated, e.g. when it has another use. An
  MLA usually has a higher latency than an ADD (and there are fewer pipes
  available that can execute it), so trading an MLA for an ADD is not great.

  ISel is not taking the number of uses of the MUL result into account, nor any
  other factors such as the length of the critical path or other resource
  pressure. The MachineCombiner is able to make these judgments, so this patch
  ports the ISel pattern for MUL/ADD fusing to the MachineCombiner. Similarly
  for MUL/SUB -> MLS, as well as the indexed variants.

  The change has no impact on SPEC CPU intrate or fprate.

  Reviewers: dmgreen, SjoerdMeijer, fhahn, Gerolf
  Subscribers: kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70673
* [Aarch64][SVE] Add intrinsics for gather loads (vector + imm)
  Sander de Smalen | 2019-12-03 | 4 files | -29/+47

  This patch adds intrinsics for SVE gather loads from memory addresses
  generated by a vector base plus immediate index:
    * @llvm.aarch64.sve.ld1.gather.imm

  This intrinsic maps 1-1 to the corresponding SVE instruction (example for
  half-words):
    * ld1h { z0.d }, p0/z, [z0.d, #16]

  Committed on behalf of Andrzej Warzynski (andwar)

  Reviewers: sdesmalen, huntergr, kmclaughlin, eli.friedman, rengolin, rovka, dancgr, mgudim, efriedma
  Reviewed By: sdesmalen
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70806
* [Aarch64][SVE] Add intrinsics for gather loads with 32-bits offsets
  Sander de Smalen | 2019-12-03 | 4 files | -46/+98

  This patch adds intrinsics for SVE gather loads for which the offsets are
  32-bits wide and are:
    * unscaled
      * @llvm.aarch64.sve.ld1.gather.sxtw
      * @llvm.aarch64.sve.ld1.gather.uxtw
    * scaled (offsets become indices)
      * @llvm.aarch64.sve.ld1.gather.sxtw.index
      * @llvm.aarch64.sve.ld1.gather.uxtw.index

  The offsets are either zero (uxtw) or sign (sxtw) extended to 64 bits.

  These intrinsics map 1-1 to the corresponding SVE instructions (examples for
  half-words):
    * unscaled
      * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw]
      * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw]
    * scaled
      * ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
      * ld1h { z0.s }, p0/z, [x0, z0.s, uxtw #1]

  Committed on behalf of Andrzej Warzynski (andwar)

  Reviewers: sdesmalen, kmclaughlin, eli.friedman, rengolin, rovka, huntergr, dancgr, mgudim, efriedma
  Reviewed By: sdesmalen
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70782
* [AArch64][SVE2] Implement remaining SVE2 floating-point intrinsics
  Kerry McLaughlin | 2019-12-03 | 2 files | -17/+42

  Summary:
  Adds the following intrinsics:
    - faddp
    - fmaxp, fminp, fmaxnmp & fminnmp
    - fmlalb, fmlalt, fmlslb & fmlslt
    - flogb

  Reviewers: huntergr, sdesmalen, dancgr, efriedma
  Reviewed By: sdesmalen
  Subscribers: efriedma, tschuett, kristof.beyls, hiraditya, cameron.mcinally, cfe-commits, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70253
* [AArch64][SVE] Add intrinsics for gather loads with 64-bit offsets
  Sander de Smalen | 2019-12-03 | 6 files | -29/+158

  This patch adds the following intrinsics for gather loads with 64-bit offsets:
    * @llvm.aarch64.sve.ld1.gather (unscaled offset)
    * @llvm.aarch64.sve.ld1.gather.index (scaled offset)

  These intrinsics map 1-1 to the following AArch64 instructions respectively
  (examples for half-words):
    * ld1h { z0.d }, p0/z, [x0, z0.d]
    * ld1h { z0.d }, p0/z, [x0, z0.d, lsl #1]

  Committing on behalf of Andrzej Warzynski (andwar)

  Reviewers: sdesmalen, huntergr, rovka, mgudim, dancgr, rengolin, efriedma
  Reviewed By: efriedma
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70542
* [AArch64][SVE] Implement shift intrinsics
  Kerry McLaughlin | 2019-12-03 | 5 files | -21/+69

  Summary:
  Adds the following intrinsics:
    - asr & asrd
    - insr
    - lsl & lsr

  This patch also adds a new AArch64ISD node (INSR) to represent the
  int_aarch64_sve_insr intrinsic.

  Reviewers: huntergr, sdesmalen, dancgr, mgudim, rengolin, efriedma
  Reviewed By: sdesmalen
  Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cameron.mcinally, cfe-commits, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70437
* [CodeGen] Move ARMCodegenPrepare to TypePromotion
  Sam Parker | 2019-12-03 | 4 files | -1074/+1

  Convert ARMCodeGenPrepare into a generic type promotion pass by:
    - Removing the insertion of ARM-specific intrinsics to handle narrow types,
      as we weren't using this.
    - Removing ARMSubtarget references.
    - Querying a generic TLI object to know which types should be promoted and
      what they should be promoted to.
    - Moving all codegen tests into the Transforms folder and testing with opt
      and not llc, which is how they should have been written in the first
      place...

  The pass searches up from icmp operands in an attempt to safely promote types
  so we can avoid generating unnecessary unsigned extends during DAG ISel.

  Differential Revision: https://reviews.llvm.org/D69556
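  As a purely illustrative aside (not from the patch): a hypothetical
  source-level example of the shape the pass targets, where narrow i8 arithmetic
  feeds a comparison and codegen would otherwise need zero-extends unless the
  chain is promoted to a wider legal type first. The function and its parameters
  are invented for this sketch.

    #include <cstdint>

    // i8 arithmetic feeding a compare: without type promotion, the backend has
    // to zero-extend the narrow values around the comparison; promoting the
    // whole chain to i32 (when provably safe) removes those extends.
    bool allBelowThreshold(const uint8_t *Data, int N, uint8_t Threshold) {
      for (int I = 0; I < N; ++I) {
        uint8_t Adjusted = static_cast<uint8_t>(Data[I] + 1);  // narrow arithmetic
        if (Adjusted >= Threshold)                             // compare on a narrow type
          return false;
      }
      return true;
    }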
* [NFC][PowerPC] Add the inheritable and additional features to make the processor definition more clear
  QingShan Zhang | 2019-12-03 | 1 file | -39/+81

  The old processor design assumes that all of an old processor's features must
  be inherited by future processors. That is not true, as instruction fusion and
  some implementation-defined features are not inheritable. What this patch does:
    * Rename the old "specific features" to "additional features", which keep
      the newly added inheritable features.
    * Use the "specific features" to keep those features that apply only to a
      specific processor.
    * Add the "inheritable features" to keep all the features inherited from
      earlier processors.

  Differential Revision: https://reviews.llvm.org/D70768
* [X86] Model MXCSR for AVX instructions other than AVX512
  Wang, Pengfei | 2019-12-03 | 2 files | -12/+17

  Summary: Model MXCSR for AVX instructions other than AVX512

  Reviewers: craig.topper, RKSimon
  Subscribers: hiraditya, llvm-commits, LuoYuanke, LiuChen3
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70875
* [MIBundles] Move analyzePhysReg out of MIBundleOperands iterator (NFC).
  Florian Hahn | 2019-12-02 | 1 file | -2/+1

  analyzePhysReg does not really fit into the iterator and moving it makes it
  easier to change the base iterator.

  Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm, qcolombet
  Reviewed By: arsenm
  Differential Revision: https://reviews.llvm.org/D70559
* [ARM] Add ARMVCCThen to tablegen and make use of it. NFC
  David Green | 2019-12-02 | 2 files | -71/+76

  Similar to the parent, this adds some constants to tablegen to replace the
  existing magic values.

  Differential Revision: https://reviews.llvm.org/D70825
* [ARM] Add ARMCC constants to tablegen. NFC
  David Green | 2019-12-02 | 3 files | -151/+133

  I got tired of looking at magic constants in tablegen files. This adds
  condition codes like ARMCCeq and makes use of them. I also removed the extra
  patterns for reverse condition codes from D70296; they should now be covered
  by the parent commit.

  Differential Revision: https://reviews.llvm.org/D70824
* [ARM] Add some VCMP folding and canonicalisation
  David Green | 2019-12-02 | 3 files | -27/+62

  The VCMP instructions in MVE can accept a register or ZR, but only as the
  right-hand operand. Most of the time this will already be correct, because the
  icmp will have been canonicalised that way. There are some cases in the
  lowering of float conditions that this will not apply to, though. This code
  should fix up those cases.

  Differential Revision: https://reviews.llvm.org/D70822
* [ARM,MVE] Add intrinsics to deal with predicates.
  Simon Tatham | 2019-12-02 | 1 file | -12/+14

  Summary:
  This commit adds the `vpselq` intrinsics which take an MVE predicate word and
  select lanes from two vectors; the `vctp` intrinsics which create a tail
  predicate word suitable for processing the first m elements of a vector (e.g.
  in the last iteration of a loop); and `vpnot`, which simply complements a
  predicate word and is just syntactic sugar for the `~` operator.

  The `vctp` ACLE intrinsics are lowered to the IR intrinsics we've already
  added (and which D70592 just reorganized). I've filled in the missing isel
  rule for VCTP64, and added another set of rules to generate the predicated
  forms.

  I needed one small tweak in MveEmitter to allow the `unpromoted` type modifier
  to apply to predicates as well as integers, so that `vpnot` doesn't
  pointlessly convert its input integer to an `<n x i1>` before complementing it.

  Reviewers: ostannard, MarkMurrayARM, dmgreen
  Reviewed By: dmgreen
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70485
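  As a purely illustrative aside (not part of the commit, and not the MVE
  encoding): a conceptual C++ model of the tail-predicate idea, where lane i of
  the predicate is active exactly when i < m, so the final loop iteration can
  run the same vector body on just the leftover elements. The helper name
  vctpLanes is invented for this sketch.

    #include <array>
    #include <cstdio>

    // Lane i is enabled iff i < M: the shape of predicate that vctp produces
    // for a 16-lane (byte-element) vector.
    std::array<bool, 16> vctpLanes(unsigned M) {
      std::array<bool, 16> P{};
      for (unsigned I = 0; I < 16; ++I)
        P[I] = I < M;
      return P;
    }

    int main() {
      // Processing 35 bytes with 16-byte vectors leaves 35 - 2 * 16 = 3
      // elements for the last iteration, so only the first 3 lanes stay active.
      for (bool Lane : vctpLanes(3))
        std::printf("%d", Lane ? 1 : 0);
      std::printf("\n");  // prints 1110000000000000
    }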
* [ARM,MVE] Rename and clean up VCTP IR intrinsics.
  Simon Tatham | 2019-12-02 | 2 files | -7/+12

  Summary:
  D65884 added a set of Arm IR intrinsics for the MVE VCTP instruction, to use
  in tail predication. But the 64-bit one doesn't work properly: its predicate
  type is `<2 x i1>` / `v2i1`, which isn't a legal MVE type (due to not having a
  full set of instructions that manipulate it usefully). The test of `vctp64` in
  `basic-tail-pred.ll` goes through `opt` fine, as the test expects, but if you
  then feed it to `llc` it causes a type legality failure at isel time.

  The usual workaround we've been using in the rest of the MVE intrinsics family
  is to bodge `v2i1` into `v4i1`. So I've adjusted the `vctp64` IR intrinsic to
  do that, and completely removed the code (and test) that uses that intrinsic
  for 64-bit tail predication. That will allow me to add isel rules (upcoming in
  D70485) that actually generate the VCTP64 instruction.

  Also renamed all four of these IR intrinsics so that they have `mve` in the
  name, since its absence was confusing.

  Reviewers: ostannard, MarkMurrayARM, dmgreen
  Reviewed By: MarkMurrayARM
  Subscribers: samparker, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70592
* [PowerPC] Fix crash in peephole optimization
  Nemanja Ivanovic | 2019-12-02 | 1 file | -2/+4

  When converting reg+reg shifts to reg+imm rotates, we neglect to consider the
  CodeGenOnly versions of the 32-bit shift mnemonics. This means we produce a
  rotate with missing operands which causes a crash.

  Committing this fix without review since it is non-controversial that the list
  of mnemonics to consider should include the 64-bit aliases for the exact
  mnemonics.

  Fixes PR44183.
* [ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A
  Victor Campos | 2019-12-02 | 2 files | -0/+44

  Summary:
  Add support for the vcadd_* family of intrinsics. This set of intrinsics is
  available in Armv8.3-A. The fp16 versions require the FP16 extension, which
  has been available (opt-in) since Armv8.2-A.

  Reviewers: t.p.northover
  Reviewed By: t.p.northover
  Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70862
* AMDGPU: Fixed indeterminate map iteration in SIPeepholeSDWA
  Tim Renouf | 2019-12-02 | 1 file | -2/+3

  Differential Revision: https://reviews.llvm.org/D70783
  Change-Id: Ic26f915a4acb4c00ecefa9d09d7c24cec370ed06
* [ARM][MVE][Intrinsics] Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics.
  Mark Murray | 2019-12-02 | 1 file | -38/+55

  Summary: Add VMINQ/VMAXQ/VMINNMQ/VMAXNMQ intrinsics and their predicated
  versions. Add unit tests.

  Subscribers: kristof.beyls, hiraditya, dmgreen, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70829
* [ARM] Remove VHADD patterns
  David Green | 2019-12-02 | 1 file | -54/+0

  These instructions do not work quite like I expected them to. They perform the
  addition and then shift in a higher precision integer, so do not match up with
  the patterns that we added. For example with s8s, adding 100 and 100 should
  wrap, leaving the shift to work on a negative number. VHADD will instead do
  the arithmetic in higher precision, giving 100 overall. The vhadd gives a
  "better" result, but not one that matches up with the input.

  I am just removing the patterns here. We might be able to re-add them in the
  future by checking for wrap flags or changing bitwidths. But for the moment
  just remove them to remove the problem cases.
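  As a purely illustrative aside (plain C++, not the removed patterns
  themselves), the 100 + 100 example per lane looks like this:

    #include <cstdint>
    #include <cstdio>

    int main() {
      int8_t A = 100, B = 100;

      // Pattern that was being matched: add in i8 (wraps), then arithmetic shift.
      int8_t Wrapped = static_cast<int8_t>(static_cast<int8_t>(A + B) >> 1);

      // What VHADD actually computes: add in a wider type, then shift.
      int8_t Halving = static_cast<int8_t>((static_cast<int16_t>(A) + B) >> 1);

      std::printf("(a + b) >> 1 in i8 : %d\n", Wrapped);  // -28 (wraps to -56, then >> 1)
      std::printf("vhadd-style        : %d\n", Halving);  // 100
    }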
* Fix broken comment phrasing and indentation
  Matt Arsenault | 2019-12-02 | 1 file | -7/+6
* AMDGPU/GlobalISel: Add AGPR bank and RegBankSelect mfma intrinsics
  Austin Kerbow | 2019-12-01 | 5 files | -14/+134

  Differential Revision: https://reviews.llvm.org/D70871
* [X86] Add floating point execution domain to comi/ucomi/cvtss2si/cvtsd2si/cvttss2si/cvttsd2si/cvtsi2ss/cvtsi2sd instructions.
  Craig Topper | 2019-11-30 | 2 files | -49/+74
* Revert 651f07908a1 "[AArch64] Don't combine callee-save and local stack adjustment when optimizing for size"
  Hans Wennborg | 2019-11-30 | 1 file | -3/+0

  This caused asserts (and perhaps also miscompiles) while building for Windows
  on AArch64. See the discussion on D68530 for details and reproducer. Reverting
  until this can be investigated and fixed.

  > For arm64, D18619 introduced the ability to combine bumping the stack
  > pointer upfront in case it needs to be bumped for both the callee-save area
  > as well as the local stack area.
  >
  > That diff already remarks that "This change can cause an increase in
  > instructions", but argues that even when that happens, it should still be a
  > performance benefit because the number of micro-ops is reduced.
  >
  > We have observed that this code-size increase can be significant in
  > practice. This diff disables combining stack bumping for methods that are
  > marked as optimize-for-size.
  >
  > Example of a prologue with the behavior before this diff (combining stack
  > bumping when possible):
  >   sub sp, sp, #0x40
  >   stp d9, d8, [sp, #0x10]
  >   stp x20, x19, [sp, #0x20]
  >   stp x29, x30, [sp, #0x30]
  >   add x29, sp, #0x30
  >   [... compute x8 somehow ...]
  >   stp x0, x8, [sp]
  >
  > And after this diff, if the method is marked as optimize-for-size:
  >   stp d9, d8, [sp, #-0x30]!
  >   stp x20, x19, [sp, #0x10]
  >   stp x29, x30, [sp, #0x20]
  >   add x29, sp, #0x20
  >   [... compute x8 somehow ...]
  >   stp x0, x8, [sp, #-0x10]!
  >
  > Note that without combining the stack bump there are two auto-decrements,
  > nicely folded into the stp instructions, whereas otherwise there is a single
  > sub sp, ... instruction, but not folded.
  >
  > Patch by Nikolai Tillmann!
  >
  > Differential Revision: https://reviews.llvm.org/D68530
* [PowerPC][AIX] Add support for lowering int/float/double formal arguments.
  Sean Fertile | 2019-11-29 | 2 files | -3/+119

  This patch adds LowerFormalArguments_AIX, with support for lowering int,
  float, and double formal arguments into general purpose and floating point
  registers only.

  The AIX calling convention testcases have been redone to test for caller and
  callee functionality in the same lit test.

  Patch by Zarko Todorovski!

  Differential Revision: https://reviews.llvm.org/D69578
* Revert "[ARM] Allocatable Global Register Variables for ARM"Carey Williams2019-11-298-68/+22
| | | | This reverts commit 2d739f98d8a53e38bf9faa88cdb6b0c2a363fb77.
* [ARM] Fix instruction selection for ARMISD::CMOV with f16 type
  Victor Campos | 2019-11-29 | 2 files | -1/+8

  Summary:
  In the cases where the CMOV (f16) SDNode is used with condition codes LT, LE,
  VC or NE, it is successfully selected into a VSEL instruction. In the
  remaining cases, however, instruction selection fails since VSEL does not
  support other condition codes.

  This patch handles such cases by using the single-precision version of the
  VMOV instruction.

  Reviewers: ostannard, dmgreen
  Reviewed By: dmgreen
  Subscribers: kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70667
* AMDGPU: Reuse carry out register during FI elimination
  Austin Kerbow | 2019-11-28 | 2 files | -6/+14

  Summary:
  Pre-gfx9 we need to scavenge a 64-bit SGPR to use as the carry out for an Add.
  If only one SGPR was available, this crashed when trying to scavenge another
  32-bit SGPR to materialize the offset. Instead, reuse a 32-bit SGPR from the
  carry out as the offset register. Also prefer to use vcc for the unused carry
  out when it is available.

  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70614
* [AArch64][v8.3a] Don't emit LDRA '[xN]!' alias in disassembly.
  Simon Tatham | 2019-11-28 | 1 file | -1/+1

  Summary:
  In rG643ac6c0420b, the syntax `ldraa x1, [x0]!` was added as an alias for
  `ldraa x1, [x0, #0]!`. That syntax is less obvious in meaning, and also will
  not be accepted by assemblers that haven't been updated yet. So it would be
  better not to emit it as the preferred disassembly for that instruction.

  This change lowers the EmitPriority of the new alias so that the more explicit
  syntax `[x0, #0]!` is preferred by the disassembler. The new syntax is still
  accepted by the assembler.

  Reviewers: ab, ostannard
  Reviewed By: ostannard
  Subscribers: kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70813
* [X86] Add support for STRICT_FP_TO_UINT/SINT from fp128.
  Craig Topper | 2019-11-27 | 1 file | -4/+9
* [X86] Add SSEPackedSingle/Double execution domain to COMI/UCOMI SSE/AVX instructions.
  Craig Topper | 2019-11-27 | 2 files | -32/+34
* [AIX] Emit TOC entries for ASM printing
  David Tenty | 2019-11-27 | 3 files | -22/+117

  Summary:
  Emit the correct .toc pseudo op when we change to the TOC and emit TC entries.
  Make sure TOC pseudos get the right symbols via overriding
  getMCSymbolForTOCPseudoMO on AIX. Add a test for TOC assembly writing and
  update tests to include TOC entries. Also make sure external globals have a
  csect set, and handle the external function descriptor case (originally
  authored by Jason Liu) so we can emit TOC entries for them.

  Reviewers: DiggerLin, sfertile, Xiangling_L, jasonliu, hubert.reinterpretcast
  Reviewed By: jasonliu
  Subscribers: arphaman, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D70461
* [PowerPC] Separate Features that are known to be Power9 specific from Future CPU
  Stefan Pintilie | 2019-11-27 | 1 file | -4/+13

  The Power 9 CPU has some features that are unlikely to be passed on to future
  versions of the CPU. This patch separates these out so that the future CPU
  does not inherit them.

  Differential Revision: https://reviews.llvm.org/D70466
* [PowerPC] Add new Future CPU for PowerPC in LLVM
  Stefan Pintilie | 2019-11-27 | 5 files | -4/+23

  This is a continuation of D70262. The previous patch, listed above, added the
  future CPU in clang. This patch adds the future CPU in the PowerPC backend. At
  this point the patch simply assumes that a future CPU will have the same
  characteristics as pwr9. Those characteristics may change with later patches.

  Differential Revision: https://reviews.llvm.org/D70333
* [x86] make SLM extract vector element more expensive than default
  Sanjay Patel | 2019-11-27 | 1 file | -0/+14

  I'm not sure what the effect of this change will be on all of the affected
  tests or a larger benchmark, but it fixes the horizontal add/sub problems
  noted here:
  https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc

  The costs are based on reciprocal throughput numbers in Agner's tables for
  PEXTR*; these appear to be very slow ops on Silvermont.

  This is a small step towards the larger motivation discussed in PR43605:
  https://bugs.llvm.org/show_bug.cgi?id=43605

  Also, it seems likely that insert/extract is the source of perf regressions on
  other CPUs (up to 30%) that were cited as part of the reason to revert D59710,
  so maybe we'll extend the table-based approach to other subtargets.

  Differential Revision: https://reviews.llvm.org/D70607
* [ARM][MVE][Intrinsics] Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests.
  Mark Murray | 2019-11-27 | 1 file | -45/+53

  Summary: Add MVE VAND/VORR/VORN/VEOR/VBIC intrinsics. Add unit tests.

  Reviewers: simon_tatham, ostannard, dmgreen
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70547
* [ARM][MVE][Intrinsics] Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions. Add unit tests.
  Mark Murray | 2019-11-27 | 1 file | -21/+49

  Summary: Add MVE VMUL intrinsics. Remove annoying "t1" from VMUL* instructions.
  Add unit tests.

  Reviewers: simon_tatham, ostannard, dmgreen
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70546
* [ARM][MVE][Intrinsics] Add MVE VABD intrinsics. Add unit tests.
  Mark Murray | 2019-11-27 | 1 file | -9/+53

  Summary: Add MVE VABD intrinsics. Add unit tests.

  Reviewers: simon_tatham, ostannard, dmgreen
  Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
  Tags: #clang, #llvm
  Differential Revision: https://reviews.llvm.org/D70545
* [ARM] Replace arm_neon_vqadds with sadd_sat
  David Green | 2019-11-27 | 2 files | -28/+31

  This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics with
  the target-independent sadd_sat, uadd_sat, ssub_sat and usub_sat. This helps
  generate vqadds from standard IR nodes, which might be produced from the
  vectoriser. The old variants are removed in the process.

  Differential Revision: https://reviews.llvm.org/D69350
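  As a purely illustrative aside (not part of the commit): a scalar C++ sketch
  of the saturating-add semantics that the target-independent sadd_sat node
  provides, which NEON VQADD applies per lane. The helper saddSat16 is written
  only for this illustration.

    #include <cstdint>
    #include <cstdio>
    #include <limits>

    // Saturating signed 16-bit add: clamp to the type's range instead of wrapping.
    int16_t saddSat16(int16_t A, int16_t B) {
      int32_t Sum = static_cast<int32_t>(A) + B;
      if (Sum > std::numeric_limits<int16_t>::max())
        return std::numeric_limits<int16_t>::max();
      if (Sum < std::numeric_limits<int16_t>::min())
        return std::numeric_limits<int16_t>::min();
      return static_cast<int16_t>(Sum);
    }

    int main() {
      std::printf("%d\n", saddSat16(30000, 10000));    // 32767, not a wrapped negative value
      std::printf("%d\n", saddSat16(-30000, -10000));  // -32768
    }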
* AArch64: support the Apple NEON syntax for v8.2 crypto instructions.
  Tim Northover | 2019-11-27 | 1 file | -11/+15

  Very simple change, just adding the extra syntax variant.
* [X86] [Win64] Avoid truncating large (> 32 bit) stack allocations
  Martin Storsjö | 2019-11-27 | 1 file | -1/+1

  This fixes PR44129, which was broken in a7adc3185b (in 7.0.0 and newer).

  Differential Revision: https://reviews.llvm.org/D70741