summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC/PPCInstrFormats.td
Commit message (Collapse)AuthorAgeFilesLines
* [PowerPC][NFC] Rename record instructions to use _rec suffix instead of oJinsong Ji2020-01-061-19/+19
| | | | | | | | | | | | | | | | | | | We use o suffix to indicate record form instuctions, (as it is similar to dot '.' in mne?) This was fine before, as we did not support XO-form. However, with https://reviews.llvm.org/D66902, we now have XO-form support. It becomes confusing now to still use 'o' for record form, and it is weird to have something like 'Oo' . This patch rename all 'o' instructions to use '_rec' instead. Also rename `isDot` to `isRecordForm`. Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail Differential Revision: https://reviews.llvm.org/D70758
* [PowerPC] Add Support for indirect calls on AIX.Sean Fertile2019-12-131-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extends the desciptor-based indirect call support for 32-bit codegen, and enables indirect calls for AIX. In-depth Description: In a function descriptor based ABI, a function pointer points at a descriptor structure as opposed to the function's entry point. The descriptor takes the form of 3 pointers: 1 for the function's entry point, 1 for the TOC anchor of the module containing the function definition, and 1 for the environment pointer: struct FunctionDescriptor { void *EntryPoint; void *TOCAnchor; void *EnvironmentPointer; }; An indirect call has several steps of loading the the information from the descriptor into the proper registers for setting up the call. Namely it has to: 1) Save the caller's TOC pointer into the TOC save slot in the linkage area, and then load the callee's TOC pointer into the TOC register (GPR 2 on AIX). 2) Load the function descriptor's entry point into the count register. 3) Load the environment pointer into the environment pointer register (GPR 11 on AIX). 4) Perform the call by branching on count register. 5) Restore the caller's TOC pointer after returning from the indirect call. A couple important caveats to the above: - There is no way to directly load a value from memory into the count register. Instead we populate the count register by loading the entry point address into a gpr and then moving the gpr to the count register. - The TOC restore has to come immediately after the branch on count register instruction (i.e., the 1st instruction executed after we return from the call). This is an implementation limitation. We could, in theory, schedule the restore elsewhere as long as no uses of the TOC pointer fall in between the call and the restore; however, to keep it simple, we insert a pseudo instruction that represents both the indirect branch instruction and the load instruction that restores the caller's TOC from the linkage area. As they flow through the compiler as a single pseudo instruction, nothing can be inserted between them and the caller's TOC is then valid at any use. Differtential Revision: https://reviews.llvm.org/D70724
* [PowerPC][NFC] Consolidate duplicate XX3Form_SetZero and XX3Form_Zero.Jinsong Ji2019-08-141-8/+1
| | | | | | Rename one to XX3Form_SameOp, remove the other one. llvm-svn: 368856
* Add slbfee instruction.Sean Fertile2019-04-151-0/+1
| | | | llvm-svn: 358425
* [PowerPC] Fix reversed bit issue in DCMX mask for "xvtstdcdp" and ↵Stefan Pintilie2019-04-021-2/+2
| | | | | | | | | | | | | | | | "xvtstdcsp" P9 implementation Did experiments on power 9 machine, checked the outputs for NaN & Infinity+ cases with corresponding DCMX bit set. Confirmed the DCMX mask bit for NaN and infinity+ are reversed. This patch fixes the issue. Patch by Victor Huang. Differential Revision: https://reviews.llvm.org/D59384 llvm-svn: 357494
* [PowerPC] Remove UseVSXRegStefan Pintilie2019-03-261-9/+0
| | | | | | | | | | The UseVSXReg flag can be safely removed and the code cleaned up. Patch By: Yi-Hong Liu Differential Revision: https://reviews.llvm.org/D58685 llvm-svn: 357028
* Update the file headers across all of the LLVM projects in the monorepoChandler Carruth2019-01-191-4/+3
| | | | | | | | | | | | | | | | | to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
* [PowerPC][NFC] Sorting out Pseudo related classes to avoid confusionJinsong Ji2018-12-131-2/+19
| | | | | | | | | | | | | | | | | | | | | | | | | There are several Pseudo in PowerPC backend. eg: * ISel Pseudo-instructions , which has let usesCustomInserter=1 in td ExpandISelPseudos -> EmitInstrWithCustomInserter will deal with them. * Post-RA pseudo instruction, which has let isPseudo = 1 in td, or Standard pseudo (SUBREG_TO_REG,COPY etc.) ExpandPostRAPseudos -> expandPostRAPseudo will expand them * Multi-instruction pseudo operations will expand them PPCAsmPrinter::EmitInstruction * Pseudo instruction in CodeEmitter, which has encoding of 0. Currently, in td files, especially PPCInstrVSX.td, we did not distinguish Post-RA pseudo instruction and Pseudo instruction in CodeEmitter very clearly. This patch is to * Rename Pseudo<> class to PPCEmitTimePseudo, which means encoding of 0 in CodeEmitter * Introduce new class PPCPostRAExpPseudo <> for previous PostRA Pseudo * Introduce new class PPCCustomInserterPseudo <> for previous Isel Pseudo Differential Revision: https://reviews.llvm.org/D55143 llvm-svn: 349044
* [NFC] [Power] Fix instruction format for xsrqpiZaara Syeda2018-05-141-0/+21
| | | | | | | | | | | | xsrqpi is currently using Z23Form_1. The instruction format is xsrqpi R,VRT,VRB,RMC. Rathar than bits 11-15 being used for FRA, it should have bits 11-14 reserved and bit 15 for R. This patch adds a new class Z23Form_4 to fix the instruction format. Differential Revision: https://reviews.llvm.org/D46761 llvm-svn: 332253
* [PowerPC] Infrastructure work. Implement getting the opcode for a spill in ↵Stefan Pintilie2018-03-261-5/+46
| | | | | | | | | | | one place. A new function getOpcodeForSpill should now be the only place to get the opcode for a given spilled register. Differential Revision: https://reviews.llvm.org/D43086 llvm-svn: 328556
* [PowerPC] Mark P9 scheduling model completeStefan Pintilie2017-09-221-0/+1
| | | | | | | | | | | | This patch just adds the missing information to the P9 scheduling model to allow the model to be marked as complete. The model has been verified against P9 documentation. The model was verified with utils/schedcover.py. Differential Revision: https://reviews.llvm.org/D35695 llvm-svn: 314026
* [Power9] Add missing Power9 instructions.Tony Jiang2017-09-191-0/+44
| | | | | | | The following 8 instructions are implemented in this patch. addpcis(subpcis, lnia), darn, maddhd, maddhdu, maddld, setb llvm-svn: 313636
* [Power9] Add new instructions for floating point status and control registers.Stefan Pintilie2017-08-281-0/+62
| | | | | | | | | Added the following P9 instructions: mffsce, mffscdrn, mffscdrni, mffscrn, mffscrni, mffsl Differential Revision: https://reviews.llvm.org/D37167 llvm-svn: 311903
* [PowerPC] Implement missing ISA 2.06 instructions.Tony Jiang2017-01-051-0/+6
| | | | | | | Instructions: fctidu[.], fctiwu[.], ftdiv, ftsqrt are not implemented. Implement them and add corresponding test cases in this patch. llvm-svn: 291116
* [PPC] Generate positive FP zero using xor insn instead of loading from ↵Ehsan Amiri2016-10-241-0/+7
| | | | | | | | | | | constant area https://reviews.llvm.org/D23614 Currently we load +0.0 from constant area. That can change to be generated using XOR instruction. llvm-svn: 284995
* [Power9] Part-word VSX integer scalar loads/stores and sign extend instructionsNemanja Ivanovic2016-10-041-0/+10
| | | | | | | | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D23155 This patch removes the VSHRC register class (based on D20310) and adds exploitation of the Power9 sub-word integer loads into VSX registers as well as vector sign extensions. The new instructions are useful for a few purposes: Int to Fp conversions of 1 or 2-byte values loaded from memory Building vectors of 1 or 2-byte integers with values loaded from memory Storing individual 1 or 2-byte elements from integer vectors This patch implements all of those uses. llvm-svn: 283190
* [Power9] Exploit move and splat instructions for build_vector improvementNemanja Ivanovic2016-09-231-0/+7
| | | | | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D21135 This patch exploits the following instructions: mtvsrws lxvwsx mtvsrdd mfvsrld In order to improve some build_vector and extractelement patterns. llvm-svn: 282246
* [PowerPC] Support asm parsing for bc[l][a][+-] mnemonicsHal Finkel2016-09-031-0/+16
| | | | | | | | | | | | | | | | | | | | PowerPC assembly code in the wild, so it seems, has things like this: bc+ 12, 28, .L9 This is a bit odd because the '+' here becomes part of the BO field, and the BO field is otherwise the first operand. Nevertheless, the ISA specification does clearly say that the +- hint syntax applies to all conditional-branch mnemonics (that test either CTR or a condition register, although not the forms which check both), both basic and extended, so this is supposed to be valid. This introduces some asm-parser-only definitions which take only the upper three bits from the specified BO value, and the lower two bits are implied by the +- suffix (via some associated aliases). Fixes PR23646. llvm-svn: 280571
* [PowerPC] Add asm parser/disassembler support for hrfid,nap,slbmfevHal Finkel2016-09-021-0/+19
| | | | | | | | These few book-III instructions are used by the Linux kernel. Partially fixes PR24796. llvm-svn: 280560
* This reverts commit r265505.Kit Barton2016-04-281-73/+0
| | | | | | | Revert "[Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random number, set bool, and dfp test significance". This patch has caused a functional regression in SPEC2k6 namd, and a performance regression in mesa-pipe. llvm-svn: 267927
* [PowerPC] Basic support for P9 byte comparison and count trailing zero insnsNemanja Ivanovic2016-04-131-0/+37
| | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D17850 This patch implements the following instructions: cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd. llvm-svn: 266228
* [Power9] Implement add-pc, multiply-add, modulo, extend-sign-shift, random ↵Chuang-Yu Cheng2016-04-061-0/+74
| | | | | | | | | | | | | | | | | | | | | number, set bool, and dfp test significance This patch implement the following instructions: - addpcis subpcis - maddhd maddhdu maddld - modsw moduw modsd modud - darn - extswsli extswsli. - setb - dtstsfi dtstsfiq Total 15 instructions Reviewers: nemanjai hfinkel tjablin amehsan kbarton http://reviews.llvm.org/D17885 llvm-svn: 265505
* [Power9] Implement copy-paste, msgsync, slb, and stop instructionsChuang-Yu Cheng2016-04-061-0/+11
| | | | | | | | | | | | | This patch implements the following BookII and Book III instructions: - copy copy_first cp_abort paste paste. paste_last - msgsync - slbieg slbsync - stop Total 10 instructions Reviewers: nemanjai hfinkel tjablin amehsan kbarton llvm-svn: 265504
* [PowerPC] Basic support for P9 atomic loads and storesNemanja Ivanovic2016-03-311-0/+14
| | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D18032 This patch provides asm implementation for the following instructions: lwat, ldat, stwat, stdat, ldmx, mcrxrx llvm-svn: 265022
* [Power9] Implement new altivec instructions: bcd* seriesChuang-Yu Cheng2016-03-281-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the following altivec instructions: - Decimal Convert From/to National/Zoned/Signed-QWord: bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq. - Decimal Copy-Sign/Set-Sign: bcdcpsgn. bcdsetsgn. - Decimal Shift/Unsigned-Shift/Shift-and-Round: bcds. bcdus. bcdsr. - Decimal (Unsigned) Truncate: bcdtrunc. bcdutrunc. Total 13 instructions Thanks Amehsan's advice! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D17838 llvm-svn: 264568
* [Power9] Implement new vsx instructions: insert, extract, test data class, ↵Chuang-Yu Cheng2016-03-281-0/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | min/max, reverse, permute, splat This change implements the following vsx instructions: - Scalar Insert/Extract xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp - Vector Insert/Extract xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp xxextractuw xxinsertw - Scalar/Vector Test Data Class xststdcdp xststdcsp xststdcqp xvtstdcdp xvtstdcsp - Maximum/Minimum xsmaxcdp xsmaxjdp xsmincdp xsminjdp - Vector Byte-Reverse/Permute/Splat xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib 30 instructions Thanks Nemanja for invaluable discussion! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16842 llvm-svn: 264567
* [Power9] Implement new altivec instructions: permute, count zero, extend ↵Chuang-Yu Cheng2016-03-261-0/+15
| | | | | | | | | | | | | | | | | | | | | sign, negate, parity, shift/rotate, mul10 This change implements the following vector operations: - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d - vnegd vnegw - vprtybd vprtybq vprtybw - vbpermd vpermr - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv - vmul10cuq vmul10uq vmul10ecuq vmul10euq 28 instructions Thanks Nemanja, Kit for invaluable hints and discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan Phabricator: http://reviews.llvm.org/D15887 llvm-svn: 264504
* [Power9] Implement new vsx instructions: load, store instructions for vector ↵Kit Barton2016-03-081-0/+15
| | | | | | | | | | | | | | | | | | | | and scalar We follow the comments mentioned in http://reviews.llvm.org/D16842#344378 to implement this new patch. This patch implements the following vsx instructions: Vector load/store: lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx stxv stxvb16x stxvh8x stxvl stxvll stxvx Scalar load/store: lxsd lxssp lxsibzx lxsihzx stxsd stxssp stxsibx stxsihx 21 instructions Phabricator: http://reviews.llvm.org/D16919 llvm-svn: 262906
* Power9] Implement new vsx instructions: compare and conversionKit Barton2016-02-261-0/+23
| | | | | | | | | | | | | | | | | | | This change implements the following vsx instructions: Quad/Double-Precision Compare: xscmpoqp xscmpuqp xscmpexpdp xscmpexpqp xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp xvcmpnedp(.) xvcmpnesp(.) Quad-Precision Floating-Point Conversion xscvqpdp(o) xscvdpqp xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz xscvsdqp xscvudqp xscvdphp xscvhpdp xvcvhpsp xvcvsphp xsrqpi xsrqpix xsrqpxp 28 instructions Phabricator: http://reviews.llvm.org/D16709 llvm-svn: 262068
* [PPC64] Add support for clrbhrb, mfbhrbe, rfebb.Bill Schmidt2015-05-221-0/+26
| | | | | | | | | | | This patch adds support for the ISA 2.07 additions involving the branch history rolling buffer and event-based branching. These will not be used by typical applications, so built-in support is not required. They will only be available via inline assembly. Assembly/disassembly tests are included in the patch. llvm-svn: 238032
* [PowerPC] Add asm/disasm support for dcbt with hintHal Finkel2015-04-231-0/+15
| | | | | | | | | | | | | | | | | | Add assembler/disassembler support for dcbt/dcbtst (and aliases) with the hint field specified (non-zero). Unforunately, the syntax for this instruction is special in that it differs for server vs. embedded cores: dcbt ra, rb, th [server] dcbt th, ra, rb [embedded] where th can be omitted when it is 0. dcbtst is the same. Thus we need to play games in the parser and the printer to flip the operands around on the embedded cores. We'll use the server syntax as the default (binutils currently uses the embedded form by default, but IBM is changing that). We also stop marking dcbtst as having unmodeled side effects (this is not necessary, it is just a hint like dcbt -- noticed by inspection, so no separate test case). llvm-svn: 235657
* Add direct moves to/from VSR and exploit them for FP/INT conversionsNemanja Ivanovic2015-04-111-0/+6
| | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D8928 It adds direct move instructions to/from VSX registers to GPR's. These are exploited for FP <-> INT conversions. llvm-svn: 234682
* Add Hardware Transactional Memory (HTM) SupportKit Barton2015-03-251-0/+54
| | | | | | | | | | | | | | | | | | | | | | | | This patch adds Hardware Transaction Memory (HTM) support supported by ISA 2.07 (POWER8). The intrinsic support is based on GCC one [1], but currently only the 'PowerPC HTM Low Level Built-in Function' are implemented. The HTM instructions follows the RC ones and the transaction initiation result is set on RC0 (with exception of tcheck). Currently approach is to create a register copy from CR0 to GPR and comapring. Although this is suboptimal, since the branch could be taken directly by comparing the CR0 value, it generates code correctly on both test and branch and just return value. A possible future optimization could be elimitate the MFCR instruction to branch directly. The HTM usage requires a recently newer kernel with PPC HTM enabled. Tested on powerpc64 and powerpc64le. This is send along a clang patch to enabled the builtins and option switch. [1] https://gcc.gnu.org/onlinedocs/gcc/PowerPC-Hardware-Transactional-Memory-Built-in-Functions.html Phabricator Review: http://reviews.llvm.org/D8247 llvm-svn: 233204
* Add LLVM support for PPC cryptography builtinsNemanja Ivanovic2015-03-041-0/+33
| | | | | | Review: http://reviews.llvm.org/D7955 llvm-svn: 231285
* [PowerPC] Add support for the QPX vector instruction setHal Finkel2015-02-251-0/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1> are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely-separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirrors that supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations). I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form). The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work. llvm-svn: 230413
* [PowerPC] Add assembler support for mcrfs and friendsHal Finkel2015-01-151-0/+38
| | | | | | | | | | Fill out our support for the floating-point status and control register instructions (mcrfs and friends). As it turns out, these are necessary for compiling src/test/harness_fp.h in TBB for PowerPC. Thanks to Raf Schietekat for reporting the issue! llvm-svn: 226070
* [PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64Hal Finkel2014-12-231-0/+39
| | | | | | | | | | | | | | | | On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction sequence is interpreted by the unwinding code in libgcc. To make sure these occur as a pair, as with other pairings interpreted by the linker, fuse the two instructions into one instruction (for code generation only). In the future, we might wish to do this by emitting CFI directives instead, but this solution is simpler, and mirrors what GCC does. Additional discussion on this point is contained in the PR. Fixes PR22015. llvm-svn: 224788
* [PowerPC] Add the 'attn' instructionHal Finkel2014-11-251-0/+6
| | | | | | | | The attn instruction is not part of the Power ISA, but is documented in the A2 user manual, and is accepted by the GNU assembler for the A2 and the POWER4+. Reported as part of PR21650. llvm-svn: 222712
* [PowerPC] Add support for dcbtst and icbt (prefetch)Hal Finkel2014-08-231-0/+15
| | | | | | | | | | | | Adds code generation support for dcbtst (data cache prefetch for write) and icbt (instruction cache prefetch for read - Book E cores only). We still end up with a 'cannot select' error for the non-supported prefetch intrinsic forms. This will be fixed in a later commit. Fixes PR20692. llvm-svn: 216339
* tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs.Joerg Sonnenberger2014-08-041-0/+16
| | | | llvm-svn: 214784
* Don't use additional arguments for dss and friends to satisfy DSS_Form,Joerg Sonnenberger2014-08-021-2/+1
| | | | | | | | | when let can do the same thing. Keep the 64bit variants as codegen-only. While they have a different register class, the encoding is the same for 32bit and 64bit mode. Having both present would otherwise confuse the disassembler. llvm-svn: 214636
* Refactor TLBIVAX and add tlbsx.Joerg Sonnenberger2014-07-301-0/+5
| | | | llvm-svn: 214354
* Recognize BookE's mbar instruction.Joerg Sonnenberger2014-07-291-0/+9
| | | | llvm-svn: 214244
* Support move to/from segment register.Joerg Sonnenberger2014-07-291-0/+22
| | | | llvm-svn: 214234
* [PowerPC] Simplify and improve loading into TOC registerUlrich Weigand2014-06-181-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During an indirect function call sequence on the 64-bit SVR4 ABI, generate code must load and then restore the TOC register. This does not use a regular LOAD instruction since the TOC register r2 is marked as reserved. Instead, the are two special instruction patterns: let RST = 2, DS = 2 in def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg), "ld 2, 8($reg)", IIC_LdStLD, [(PPCload_toc i64:$reg)]>, isPPC64; let RST = 2, DS = 10, RA = 1 in def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins), "ld 2, 40(1)", IIC_LdStLD, [(PPCtoc_restore)]>, isPPC64; Note that these not only restrict the destination of the load to r2, but they also restrict the *source* of the load to particular address combinations. The latter is a problem when we want to support the ELFv2 ABI, since there the TOC save slot is no longer at 40(1). This patch replaces those two instructions with a single instruction pattern that only hard-codes r2 as destination, but supports generic addresses as source. This will allow supporting the ELFv2 ABI, and also helps generate more efficient code for calls to absolute addresses (allowing simplification of the ppc64-calls.ll test case). llvm-svn: 211193
* [PowerPC] Initial support for the VSX instruction setHal Finkel2014-03-131-0/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure. The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below). Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However: - CodeGen support causes miscompiles; test-suite runtime failures: MultiSource/Benchmarks/FreeBench/distray/distray MultiSource/Benchmarks/McCat/08-main/main MultiSource/Benchmarks/Olden/voronoi/voronoi MultiSource/Benchmarks/mafft/pairlocalalign MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 SingleSource/Benchmarks/CoyoteBench/almabench SingleSource/Benchmarks/Misc/matmul_f64_4x4 - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be. - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed. - Many more regression tests are needed. Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures. llvm-svn: 203768
* Add CR-bit tracking to the PowerPC backend for i1 valuesHal Finkel2014-02-281-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change enables tracking i1 values in the PowerPC backend using the condition register bits. These bits can be treated on PowerPC as separate registers; individual bit operations (and, or, xor, etc.) are supported. Tracking booleans in CR bits has several advantages: - Reduction in register pressure (because we no longer need GPRs to store boolean values). - Logical operations on booleans can be handled more efficiently; we used to have to move all results from comparisons into GPRs, perform promoted logical operations in GPRs, and then move the result back into condition register bits to be used by conditional branches. This can be very inefficient, because the throughput of these CR <-> GPR moves have high latency and low throughput (especially when other associated instructions are accounted for). - On the POWER7 and similar cores, we can increase total throughput by using the CR bits. CR bit operations have a dedicated functional unit. Most of this is more-or-less mechanical: Adjustments were needed in the calling-convention code, support was added for spilling/restoring individual condition-register bits, and conditional branch instruction definitions taking specific CR bits were added (plus patterns and code for generating bit-level operations). This is enabled by default when running at -O2 and higher. For -O0 and -O1, where the ability to debug is more important, this feature is disabled by default. Individual CR bits do not have assigned DWARF register numbers, and storing values in CR bits makes them invisible to the debugger. It is critical, however, that we don't move i1 values that have been promoted to larger values (such as those passed as function arguments) into bit registers only to quickly turn around and move the values back into GPRs (such as happens when values are returned by functions). A pair of target-specific DAG combines are added to remove the trunc/extends in: trunc(binary-ops(binary-ops(zext(x), zext(y)), ...) and: zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...) In short, we only want to use CR bits where some of the i1 values come from comparisons or are used by conditional branches or selects. To put it another way, if we can do the entire i1 computation in GPRs, then we probably should (on the POWER7, the GPR-operation throughput is higher, and for all cores, the CR <-> GPR moves are expensive). POWER7 test-suite performance results (from 10 runs in each configuration): SingleSource/Benchmarks/Misc/mandel-2: 35% speedup MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown MultiSource/Applications/lemon/lemon: 8% slowdown llvm-svn: 202451
* Add a disassembler to the PowerPC backendHal Finkel2013-12-191-0/+4
| | | | | | | | | | | | | | | | | | | | | The tests for the disassembler were adapted from the encoder tests, and for the most part, the output from the disassembler matches that encoder-test inputs. There are some places where more-informative mnemonics could be produced (notably for the branch instructions), and those cases are noted in the tests with FIXMEs. Future work includes: - Generating more-informative mnemonics when possible (this may also be done in the printer). - Remove the dependence on positional "numbered" operand-to-variable mapping (for both encoding and decoding). - Internally using 64-bit instruction variants in 64-bit mode (if this turns out to matter). llvm-svn: 197693
* Improve instruction scheduling for the PPC POWER7Hal Finkel2013-12-121-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards. significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative): speedups: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168% MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598% MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058% MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391% SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104% SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061% SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458% SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234% SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496% regressions: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156% SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733% llvm-svn: 197099
* Add IIC_ prefix to PPC instruction-class namesHal Finkel2013-11-271-3/+3
| | | | | | | | | | | | | This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890
OpenPOWER on IntegriCloud