path: root/llvm/lib/Target/PowerPC/PPC.td
Commit message | Author | Age | Files | Lines
...
* [PowerPC] Activate FeatureVSX for the Power target | Bill Seurer | 2014-12-08 | 1 | -5/+3
    This change activates FeatureVSX for Power 7 and Power 8 in PPC.td.
    http://reviews.llvm.org/D6570
    llvm-svn: 223709
* [PowerPC] Reduce names from Power8Vector to P8Vector | Bill Schmidt | 2014-10-10 | 1 | -4/+3
    Per Hal Finkel's review, improving typability of some variable names.
    llvm-svn: 219514
* [PowerPC] Add feature for Power8 vector extensions | Bill Schmidt | 2014-10-10 | 1 | -1/+4
    The current VSX feature for PowerPC specifies availability of the VSX instructions added with the 2.06 architecture version. With 2.07, the architecture adds new instructions to both the Category:Vector and Category:VSX instruction sets. Additionally, unaligned vector storage operations have improved performance.
    This patch adds a feature to provide access to the new instructions and performance capabilities of Power8. For compatibility with GCC, the feature is controlled via a new -mpower8-vector switch, and the feature causes the __POWER8_VECTOR__ builtin define to be generated by the preprocessor.
    There is a companion patch for cfe being committed at the same time.
    llvm-svn: 219501
* [PowerPC] Modern Book-E cores support sync | Hal Finkel | 2014-10-02 | 1 | -2/+7
    Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form.
    This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently).
    llvm-svn: 218923
* Spell e500 feature in lower case. | Joerg Sonnenberger | 2014-08-07 | 1 | -1/+1
    llvm-svn: 215103
* Add first bunch of SPE instructions. As they overlap with Altivec, mark | Joerg Sonnenberger | 2014-08-07 | 1 | -0/+2
    them as parser-only until the disassembler is extended to handle predicates properly.
    llvm-svn: 215102
* Add support for m[ft][di]bat[ul] instructions. | Joerg Sonnenberger | 2014-08-04 | 1 | -0/+2
    llvm-svn: 214731
* Add features for PPC 4xx and e500/e500mc instructions. | Joerg Sonnenberger | 2014-08-04 | 1 | -0/+4
    Move the test cases for them into separate files.
    llvm-svn: 214724
* [PowerPC] Support ELFv1/ELFv2 ABI selection via features | Ulrich Weigand | 2014-07-28 | 1 | -0/+10
    While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently hard-coded via the target triple: powerpc64-linux is always ELFv1, while powerpc64le-linux is always ELFv2.
    These are of course the most common scenarios, but in principle it is possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on little-endian systems (and GCC does support that), and there are some special use cases for that (e.g. certain Linux kernel versions could only be built using ELFv1 on LE).
    This patch implements the LLVM side of supporting this. As precedent on other platforms suggests, ABI options are passed to the back-end as features. Thus, this patch implements two features "elfv1" and "elfv2" that select the desired ABI if present. (If not, LLVM uses the same default rules as now.)
    llvm-svn: 214072
* add ppc64/pwr8 as target | Will Schmidt | 2014-06-26 | 1 | -0/+10
    includes handling DIR_PWR8 where appropriate
    The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later.
    llvm-svn: 211779
* [PowerPC] Add a TableGen relation for A-type and M-type VSX FMA instructions | Hal Finkel | 2014-03-25 | 1 | -0/+19
    TableGen will create a lookup table for the A-type FMA instructions providing their corresponding M-form opcodes. This will be used by upcoming commits.
    llvm-svn: 204746
* [PowerPC] Initial support for the VSX instruction set | Hal Finkel | 2014-03-13 | 1 | -1/+2
    VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure.
    The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the first 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below).
    Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However:
    - CodeGen support causes miscompiles; test-suite runtime failures:
        MultiSource/Benchmarks/FreeBench/distray/distray
        MultiSource/Benchmarks/McCat/08-main/main
        MultiSource/Benchmarks/Olden/voronoi/voronoi
        MultiSource/Benchmarks/mafft/pairlocalalign
        MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4
        SingleSource/Benchmarks/CoyoteBench/almabench
        SingleSource/Benchmarks/Misc/matmul_f64_4x4
    - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be.
    - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed.
    - Many more regression tests are needed.
    Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work on this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures.
    llvm-svn: 203768
* [TableGen] Optionally forbid overlap between named and positional operands | Hal Finkel | 2014-03-13 | 1 | -0/+2
    There are currently two schemes for mapping instruction operands to instruction-format variables for generating the instruction encoders and decoders for the assembler and disassembler respectively: a) to map by name and b) to map by position.
    In the long run, we'd like to remove the position-based scheme and use only name-based mapping. Unfortunately, the name-based scheme currently cannot deal with complex operands (those with suboperands), and so we currently must use the position-based scheme for those. On the other hand, the position-based scheme cannot deal with (register) variables that are split into multiple ranges. An upcoming commit to the PowerPC backend (adding VSX support) will require this capability. While we could teach the position-based scheme to handle that, since we'd like to move away from the position-based mapping generally, it seems silly to teach it new tricks now. What makes more sense is to allow for partial transitioning: use the name-based mapping when possible, and only use the position-based scheme when necessary.
    Now the problem is that mixing the two sensibly was not possible: the position-based mapping would map based on position, but would not skip those variables that were mapped by name. Instead, the two sets of assignments would overlap. However, I cannot currently change the current behavior, because there are some backends that rely on it [I think mistakenly, but I'll send a message to llvmdev about that]. So I've added a new TableGen bit variable: noNamedPositionallyEncodedOperands, that can be used to cause the position-based mapping to skip variables mapped by name.
    llvm-svn: 203767
* Add CR-bit tracking to the PowerPC backend for i1 values | Hal Finkel | 2014-02-28 | 1 | -0/+2
    This change enables tracking i1 values in the PowerPC backend using the condition register bits. These bits can be treated on PowerPC as separate registers; individual bit operations (and, or, xor, etc.) are supported. Tracking booleans in CR bits has several advantages:
    - Reduction in register pressure (because we no longer need GPRs to store boolean values).
    - Logical operations on booleans can be handled more efficiently; we used to have to move all results from comparisons into GPRs, perform promoted logical operations in GPRs, and then move the result back into condition register bits to be used by conditional branches. This can be very inefficient, because these CR <-> GPR moves have high latency and low throughput (especially when other associated instructions are accounted for).
    - On the POWER7 and similar cores, we can increase total throughput by using the CR bits. CR bit operations have a dedicated functional unit.
    Most of this is more-or-less mechanical: Adjustments were needed in the calling-convention code, support was added for spilling/restoring individual condition-register bits, and conditional branch instruction definitions taking specific CR bits were added (plus patterns and code for generating bit-level operations).
    This is enabled by default when running at -O2 and higher. For -O0 and -O1, where the ability to debug is more important, this feature is disabled by default. Individual CR bits do not have assigned DWARF register numbers, and storing values in CR bits makes them invisible to the debugger.
    It is critical, however, that we don't move i1 values that have been promoted to larger values (such as those passed as function arguments) into bit registers only to quickly turn around and move the values back into GPRs (such as happens when values are returned by functions). A pair of target-specific DAG combines are added to remove the trunc/extends in:
      trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
    and:
      zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
    In short, we only want to use CR bits where some of the i1 values come from comparisons or are used by conditional branches or selects. To put it another way, if we can do the entire i1 computation in GPRs, then we probably should (on the POWER7, the GPR-operation throughput is higher, and for all cores, the CR <-> GPR moves are expensive).
    POWER7 test-suite performance results (from 10 runs in each configuration):
      SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
      MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
      MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
      SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
      SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
      SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup
      SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
      MultiSource/Applications/lemon/lemon: 8% slowdown
    llvm-svn: 202451
* Add a disassembler to the PowerPC backend | Hal Finkel | 2013-12-19 | 1 | -0/+3
    The tests for the disassembler were adapted from the encoder tests, and for the most part, the output from the disassembler matches the encoder-test inputs. There are some places where more-informative mnemonics could be produced (notably for the branch instructions), and those cases are noted in the tests with FIXMEs.
    Future work includes:
    - Generating more-informative mnemonics when possible (this may also be done in the printer).
    - Remove the dependence on positional "numbered" operand-to-variable mapping (for both encoding and decoding).
    - Internally using 64-bit instruction variants in 64-bit mode (if this turns out to matter).
    llvm-svn: 197693
* Change the default of AsmWriterClassName and isMCAsmWriter. | Rafael Espindola | 2013-12-02 | 1 | -7/+1
    llvm-svn: 196065
* Add a scheduling model (with itinerary) for the PPC POWER7 | Hal Finkel | 2013-11-30 | 1 | -1/+1
    This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job).
    One interesting aspect of this change is that I've also enabled the use of AA during CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful.
    Test suite significant performance differences (where negative is a speedup, and positive is a regression) vs. the current situation:
      MultiSource/Benchmarks/BitBench/drop3/drop3
        with AA: N/A
        without AA: -28.7614% +/- 19.8356%
        (significantly against AA)
      MultiSource/Benchmarks/FreeBench/neural/neural
        with AA: -17.7406% +/- 11.2712%
        without AA: N/A
        (significantly in favor of AA)
      MultiSource/Benchmarks/SciMark2-C/scimark2
        with AA: -11.2079% +/- 1.80543%
        without AA: -11.3263% +/- 2.79651%
      MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt
        with AA: -41.8649% +/- 17.0053%
        without AA: -34.5256% +/- 23.7072%
      MultiSource/Benchmarks/mafft/pairlocalalign
        with AA: 25.3016% +/- 17.8614%
        without AA: 38.6629% +/- 14.9391%
        (significantly in favor of AA)
      MultiSource/Benchmarks/sim/sim
        with AA: N/A
        without AA: 13.4844% +/- 7.18195%
        (significantly in favor of AA)
      SingleSource/Benchmarks/BenchmarkGame/Large/fasta
        with AA: 15.0664% +/- 6.70216%
        without AA: 12.7747% +/- 8.43043%
      SingleSource/Benchmarks/BenchmarkGame/puzzle
        with AA: 82.2713% +/- 26.3567%
        without AA: 75.7525% +/- 41.1842%
      SingleSource/Benchmarks/Misc/flops-2
        with AA: -37.1621% +/- 20.7964%
        without AA: -35.2342% +/- 20.2999%
        (significantly in favor of AA)
    These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win.
    llvm-svn: 195981
* Create a PPC440 SchedMachineModel | Hal Finkel | 2013-11-29 | 1 | -6/+6
    Some of the older PPC processor definitions don't have associated SchedMachineModels; correct this for the PPC440.
    llvm-svn: 195949
* Add support for the VSX target attribute. No functional change | Eric Christopher | 2013-10-16 | 1 | -0/+2
    as we don't actually use it to emit any code yet.
    llvm-svn: 192837
* Mark PPC MFTB and DST (and friends) as deprecated | Hal Finkel | 2013-09-12 | 1 | -12/+25
    Use the new instruction deprecation feature to mark mftb (now replaced with mfspr) and dst (along with the other Altivec cache control instructions) as deprecated when targeting cores supporting at least ISA v2.03.
    llvm-svn: 190605
* Add the PPC fcpsgn instruction | Hal Finkel | 2013-08-19 | 1 | -5/+7
    Modern PPC cores support a floating-point copysign instruction, and we can use this to lower the FCOPYSIGN node (which is created from calls to the libm copysign function). A couple of extra patterns are necessary because the operand types of FCOPYSIGN need not agree.
    llvm-svn: 188653
* [PowerPC] Support powerpc64le as a syntax-checking target. | Bill Schmidt | 2013-07-26 | 1 | -0/+5
    This patch provides basic support for powerpc64le as an LLVM target. However, use of this target will not actually generate little-endian code. Instead, use of the target will cause the correct little-endian built-in defines to be generated, so that code that tests for __LITTLE_ENDIAN__, for example, will be correctly parsed for syntax-only testing. Code generation will otherwise be the same as powerpc64 (big-endian), for now.
    The patch leaves open the possibility of creating a little-endian PowerPC64 back end, but there is no immediate intent to create such a thing.
    The LLVM portions of this patch simply add ppc64le coverage everywhere that ppc64 coverage currently exists. There is nothing of any import worth testing until such time as little-endian code generation is implemented. In the corresponding Clang patch, there is a new test case variant to ensure that correct built-in defines for little-endian code are generated.
    llvm-svn: 187179
* [PowerPC] Support basic compare mnemonics | Ulrich Weigand | 2013-07-08 | 1 | -0/+10
    This adds support for the basic mnemonics (with the L operand) for the fixed-point compare instructions. These are defined as aliases for the already existing CMPW/CMPD patterns, depending on the value of L.
    This requires use of InstAlias patterns with immediate literal operands. To make this work, we need two further changes:
    - define a RegisterPrefix, because otherwise literals 0 and 1 would be parsed as literal register names
    - provide a PPCAsmParser::validateTargetOperandClass routine to recognize immediate literals (like ARM does)
    llvm-svn: 185826
* [PowerPC] Add assembler parser | Ulrich Weigand | 2013-05-03 | 1 | -0/+5
    This adds assembler parser support to the PowerPC back end.
    The parser will run for any powerpc-*-* and powerpc64-*-* triples, but was tested only on 64-bit Linux. The supported syntax is intended to be compatible with the GNU assembler.
    The parser does not yet support all PowerPC instructions, but it does support anything that is generated by LLVM itself. There is no support for testing restricted instruction sets yet, i.e. the parser will always accept any instructions it knows, no matter what feature flags are given.
    Instruction operands will be checked for validity and errors generated. (Error handling in general could still be improved.)
    The patch adds a number of test cases to verify instruction and operand encodings. The tests currently cover all instructions from the following PowerPC ISA v2.06 Book I facilities: Branch, Fixed-point, Floating-Point, and Vector. Note that a number of these instructions are not yet supported by the back end; they are marked with FIXME.
    A number of follow-on check-ins will add extra features. When they are all included, LLVM passes all tests (including bootstrap) when using clang -cc1as as the system assembler.
    llvm-svn: 181050
* Add PPC instruction record forms and associated query functions | Hal Finkel | 2013-04-12 | 1 | -1/+37
    This is prep. work for the implementation of optimizeCompare. Many PPC instructions have 'record' forms (in almost all cases, this means that the RC bit is set) that cause the result of the instruction to be compared with zero, and the result of that comparison saved in a predefined condition register. In order to add the record forms of the instructions without too much copy-and-paste, the relevant functions have been refactored into multiclasses which define both the record and normal forms.
    Also, two TableGen-generated mapping functions have been added which allow querying the instruction code for the record form given the normal form (and vice versa).
    No functionality change intended.
    llvm-svn: 179356
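Editorial sketch (not part of the commit): a rough Python model of what a record form computes, assuming 64-bit operands and modeling only the LT, GT, and EQ outcomes of the compare with zero. The names `to_signed64` and `add_record` are hypothetical and do not correspond to LLVM or ISA identifiers.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret the low 64 bits of value as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def add_record(a, b):
    """Model a record-form add: return the sum plus the compare-with-zero bits."""
    result = to_signed64(a + b)
    # The record form saves the outcome of comparing the result with zero.
    cr = {"LT": result < 0, "GT": result > 0, "EQ": result == 0}
    return result, cr


result, cr = add_record(5, -5)
print(result, cr)
```

A later compare of the result against zero then has nothing left to compute, which appears to be the opportunity the optimizeCompare work mentioned above is aimed at.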
* Add a SchedMachineModel for the PPC G5 | Hal Finkel | 2013-04-05 | 1 | -10/+10
    llvm-svn: 178850
* Add a SchedMachineModel for the PPC A2 | Hal Finkel | 2013-04-05 | 1 | -2/+2
    llvm-svn: 178848
* PPC: Enable FRES and FRSQRTE on the default PPC64 description | Hal Finkel | 2013-04-03 | 1 | -1/+2
    I discussed this with Bill Schmidt on IRC, and it was decided that this is a safe and reasonable default.
    llvm-svn: 178659
* Remove some unsupported-feature comments from PPC.td | Hal Finkel | 2013-04-03 | 1 | -3/+0
    These refer to the reciprocal estimate support recently committed.
    llvm-svn: 178618
* Use PPC reciprocal estimates with Newton iteration in fast-math mode | Hal Finkel | 2013-04-03 | 1 | -27/+67
    When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together with some Newton iteration, in order to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, and so each has its own feature flag (except for the Altivec instructions, which are covered under the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these computations to be pipelined with other computations in order to hide their overall latency.
    I've also added a couple of fnmsub patterns which turned out to be missing (but are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32.
    llvm-svn: 178617
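Editorial sketch (not part of the commit): the Newton refinement described above, written in Python under the assumption that a rough starting estimate is supplied by the caller in place of the hardware fre/frsqrte result. The function names and the step counts are illustrative only; the commit does not say how many iterations the backend emits.

```python
def refine_reciprocal(a, estimate, steps):
    """Refine an estimate of 1/a with Newton's method: x <- x * (2 - a*x)."""
    x = estimate
    for _ in range(steps):
        x = x * (2.0 - a * x)
    return x


def refine_rsqrt(a, estimate, steps):
    """Refine an estimate of 1/sqrt(a): x <- x * (1.5 - 0.5*a*x*x)."""
    x = estimate
    for _ in range(steps):
        x = x * (1.5 - 0.5 * a * x * x)
    return x


# Division b/a becomes b * (1/a); sqrt(a) becomes a * (1/sqrt(a)).
print(7.0 * refine_reciprocal(3.0, 0.33, 3))
print(4.0 * refine_rsqrt(4.0, 0.45, 4))
```

Each step uses only multiplies and a multiply-subtract, which is why fused forms such as fnmsub matter for the generated code, and the error roughly squares per step, so a few iterations suffice from a reasonable estimate.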
* Add more PPC floating-point conversion instructions | Hal Finkel | 2013-04-01 | 1 | -10/+11
    The P7 and A2 have additional floating-point conversion instructions which allow a direct two-instruction sequence (plus load/store) to convert from all combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores, only some combinations were directly available).
    llvm-svn: 178480
* Add the PPC lfiwax instruction | Hal Finkel | 2013-03-31 | 1 | -12/+16
    This instruction is available on modern PPC64 CPUs, and is now used to improve the SINT_TO_FP lowering (by eliminating the need for the separate sign extension instruction and decreasing the amount of needed stack space).
    llvm-svn: 178446
* Add PPC FP rounding instructions fri[mnpz] | Hal Finkel | 2013-03-29 | 1 | -12/+15
    These instructions are available on the P5x (and later) and on the A2. They implement the standard floating-point rounding operations (floor, trunc, etc.). One caveat: frin (round to nearest) does not implement "ties to even", and so is only enabled in fast-math mode.
    llvm-svn: 178337
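Editorial sketch (not part of the commit): why the tie-breaking rule matters. The commit only states that frin does not implement "ties to even"; the sketch assumes the alternative is rounding halfway cases away from zero, and `round_half_away` is a hypothetical helper. Python's built-in `round` rounds ties to even, so it serves as the reference here.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, sending halfway cases away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)


# The two rules agree except on exact halfway values.
for value in (0.5, 1.5, 2.5, -2.5, 1.4):
    print(value, round_half_away(value), float(round(value)))
```

Because the results differ on values such as 0.5 and 2.5, substituting one rule for the other changes observable results, which is consistent with restricting the instruction to fast-math mode.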
* Add the PPC64 ldbrx/stdbrx instructions | Hal Finkel | 2013-03-28 | 1 | -14/+14
    These are 64-bit load/store with byte-swap, and available on the P7 and the A2. Like the similar instructions for 16- and 32-bit words, these are matched in the target DAG-combine phase against load/store-bswap pairs.
    llvm-svn: 178276
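Editorial sketch (not part of the commit): the equivalence that makes a load followed by a byte swap replaceable by a single byte-reversed load, shown in Python for a big-endian machine model. All function names are hypothetical.

```python
import struct


def bswap64(value):
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(8, "big"), "little")


def load_be64(buf, offset=0):
    """Ordinary 64-bit load on a big-endian machine."""
    return struct.unpack_from(">Q", buf, offset)[0]


def load_byte_reversed64(buf, offset=0):
    """Byte-reversed 64-bit load: the bytes are read in the opposite order."""
    return struct.unpack_from("<Q", buf, offset)[0]


memory = bytes(range(1, 9))
# A load followed by bswap gives the same value as one byte-reversed load.
print(hex(bswap64(load_be64(memory))), hex(load_byte_reversed64(memory)))
```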
* Add the PPC64 popcntd instruction | Hal Finkel | 2013-03-28 | 1 | -4/+7
    PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and tell TTI about it so that popcount-loop recognition will know about it.
    llvm-svn: 178233
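Editorial sketch (not part of the commit): a bit-counting loop of the kind that loop-idiom recognition can replace with a single population-count operation. It is written in Python for illustration; whether this exact shape is the one matched is an assumption, and `popcount_loop` is a hypothetical name.

```python
def popcount_loop(x):
    """Count set bits in a 64-bit value by clearing the lowest set bit each pass."""
    x &= (1 << 64) - 1
    count = 0
    while x:
        x &= x - 1  # clears the lowest set bit
        count += 1
    return count


print(popcount_loop(0xFF))
```

The loop runs once per set bit, whereas a hardware popcount instruction produces the same answer in one operation, so the transformation only pays off when the target reports that such an instruction is available.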
* Add notes about future PowerPC features | Bill Schmidt | 2013-02-01 | 1 | -0/+17
    llvm-svn: 174232
* LLVM enablement for some older PowerPC CPUs | Bill Schmidt | 2013-02-01 | 1 | -0/+20
    llvm-svn: 174230
* Add definitions for the PPC a2q core marked as having QPX available | Hal Finkel | 2013-01-30 | 1 | -0/+7
    This is the first commit of a large series which will add support for the QPX vector instruction set to the PowerPC backend. This instruction set is used on the IBM Blue Gene/Q supercomputers.
    llvm-svn: 173973
* Add PPC Freescale e500mc and e5500 subtargets. | Hal Finkel | 2012-08-28 | 1 | -0/+10
    Add subtargets for Freescale e500mc (32-bit) and e5500 (64-bit) to the PowerPC backend.
    Patch by Tobias von Koch.
    llvm-svn: 162764
* Add support for the PPC isel instruction. | Hal Finkel | 2012-06-22 | 1 | -7/+12
    The isel (integer select) instruction is supported on the 440 and A2 embedded cores and on the POWER7.
    llvm-svn: 159045
* Fixes for PPC host detection and features. | Hal Finkel | 2012-06-12 | 1 | -3/+3
    POWER4 is a 64-bit CPU (better matched to the 970). The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750).
    Patch by Andreas Tobler.
    llvm-svn: 158363
* Enable MFOCRF generation on the PPC A2 core. | Hal Finkel | 2012-06-11 | 1 | -2/+2
    llvm-svn: 158324
* Rename the PPC target feature gpul to mfocrf. | Hal Finkel | 2012-06-11 | 1 | -7/+7
    The PPC target feature gpul (IsGigaProcessor) was only used for one thing: to enable the generation of the MFOCRF instruction. Furthermore, this instruction is available on other PPC cores outside of the G5 line. This feature now corresponds to the HasMFOCRF flag.
    No functionality change.
    llvm-svn: 158323
* Add POWER6 and POWER7 CPU types to the PPC backend. | Hal Finkel | 2012-06-11 | 1 | -0/+10
    No functional change; these will be used by upcoming scheduler enhancements.
    llvm-svn: 158313
* Fix some 80-col. violations I introduced with the A2 PPC64 core. | Hal Finkel | 2012-04-01 | 1 | -1/+2
    llvm-svn: 153852
* Add instruction itinerary for the PPC64 A2 core. | Hal Finkel | 2012-04-01 | 1 | -0/+4
    This adds a full itinerary for IBM's PPC64 A2 embedded core. These cores form the basis for the CPUs in the new IBM BG/Q supercomputer.
    llvm-svn: 153842
* Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore. | Jia Liu | 2012-02-18 | 1 | -3/+3
    llvm-svn: 150878
* Add PPC 440 scheduler and some associated tests | Hal Finkel | 2011-10-17 | 1 | -0/+5
    llvm-svn: 142170
* initial test commit (remove whitespace) | Hal Finkel | 2011-10-14 | 1 | -2/+2
    llvm-svn: 141972
* dissolve some more hacks. | Chris Lattner | 2010-11-15 | 1 | -0/+6
    llvm-svn: 119115