summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC/PPCSubtarget.h
Commit message (Collapse)AuthorAgeFilesLines
...
* Add LLVM support for PPC cryptography builtinsNemanja Ivanovic2015-03-041-0/+2
| | | | | | Review: http://reviews.llvm.org/D7955 llvm-svn: 231285
* [PowerPC] Add support for the QPX vector instruction setHal Finkel2015-02-251-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the QPX vector instruction set, which is used by the enhanced A2 cores on the IBM BG/Q supercomputers. QPX vectors are 256 bytes wide, holding 4 double-precision floating-point values. Boolean values, modeled here as <4 x i1> are actually also represented as floating-point values (essentially { -1, 1 } for { false, true }). QPX shares many features with Altivec and VSX, but is distinct from both of them. One major difference is that, instead of adding completely-separate vector registers, QPX vector registers are extensions of the scalar floating-point registers (lane 0 is the corresponding scalar floating-point value). The operations supported on QPX vectors mirrors that supported on the scalar floating-point values (with some additional ones for permutations and logical/comparison operations). I've been maintaining this support out-of-tree, as part of the bgclang project, for several years. This is not the entire bgclang patch set, but is most of the subset that can be cleanly integrated into LLVM proper at this time. Adding this to the LLVM backend is part of my efforts to rebase bgclang to the current LLVM trunk, but is independently useful (especially for codes that use LLVM as a JIT in library form). The assembler/disassembler test coverage is complete. The CodeGen test coverage is not, but I've included some tests, and more will be added as follow-up work. llvm-svn: 230413
* Move ABI handling and 64-bitness to the PowerPC target machine.Eric Christopher2015-02-171-9/+4
| | | | | | | This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. llvm-svn: 229471
* Move the target machine variable so that it's initialized earlyEric Christopher2015-02-131-2/+1
| | | | | | enough we can use it to initialize frame lowering. llvm-svn: 229168
* Stash the TargetMachine on the subtarget so we can access it later.Eric Christopher2015-02-131-2/+3
| | | | | | Clean up a subtarget function that has it passed in while we're at it. llvm-svn: 229164
* [PowerPC] Implement the vpopcnt instructions for POWER8Bill Schmidt2015-02-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Patch by Kit Barton. Add the vector population count instructions for byte, halfword, word, and doubleword sizes. There are two major changes here: PPCISelLowering.cpp: Make CTPOP legal for vector types. PPCRegisterInfo.td: Added v2i64 to the VRRC register definition. This is needed for the doubleword variations of the integer ops that were added in P8. Test Plan Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s Test the generation of the vpopcnt instructions for various vector data types. When adding the v2i64 type to the Vector Register set, I also needed to add the appropriate bit conversion patterns between v2i64 and the existing vector types. Testing for these conversions were also added in the test case by passing a different vector type as a parameter into the test functions. There is also a run step that will ensure the vpopcnt instructions are generated when the vsx feature is disabled. llvm-svn: 228046
* Move DataLayout back to the TargetMachine from TargetSubtargetInfoEric Christopher2015-01-261-4/+0
| | | | | | | | | | | | | | | | | | | derived classes. Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be layed out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets(*) have had subtarget dependent code moved out and onto the TargetMachine. *One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113
* [PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect callsHal Finkel2015-01-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the POWER7, A2 and earlier cores) are really pointers to a function descriptor, a structure with three pointers: the actual pointer to the code to which to jump, the pointer to the TOC needed by the callee, and an environment pointer. We used to chain these loads, and make them opaque to the rest of the optimizer, so that they'd always occur directly before the call. This is not necessary, and in fact, highly suboptimal on embedded cores. Once the function pointer is known, the loads can be performed ahead of time; in fact, they can be hoisted out of loops. Now these function descriptors are almost always generated by the linker, and thus the contents of the descriptors are invariant. As a result, by default, we'll mark the associated loads as invariant (allowing them to be hoisted out of loops). I've added a target feature to turn this off, however, just in case someone needs that option (constructing an on-stack descriptor, casting it to a function pointer, and then calling it cannot be well-defined C/C++ code, but I can imagine some JIT-compilation system doing so). Consider this simple test: $ cat call.c typedef void (*fp)(); void bar(fp x) { for (int i = 0; i < 1600000000; ++i) x(); } $ cat main.c typedef void (*fp)(); void bar(fp x); void foo() {} int main() { bar(foo); } On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads as invariant brings the execution time down to ~8 seconds from ~32 seconds with the loads in the loop. The difference on the POWER7 is smaller. Compiling with: gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2] clang -O3 -mcpu=native call.c main.c : ~5.3 seconds clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds (looks like we'd benefit from additional loop unrolling here, as a first guess, because this is faster with the extra loads) The -mno-invariant-function-descriptors will be added to Clang shortly. llvm-svn: 226207
* [PPC64] Add support for the ICBT instruction on POWER8.Bill Schmidt2015-01-141-0/+2
| | | | | | | | | | | | | | | | | | | Patch by Kit Barton. Support for the ICBT instruction is currently present, but limited to embedded processors. This change adds a new FeatureICBT that can be used to identify whether the ICBT instruction is available on a specific processor. Two new tests are added: * Positive test to ensure the icbt instruction is present when using -mcpu=pwr8 * Negative test to ensure the icbt instruction is not generated when using -mcpu=pwr7 Both test cases use the Prefetch opcode in LLVM. They are based on the ppc64-prefetch.ll test case. llvm-svn: 226033
* [cleanup] Re-sort all the #include lines in LLVM usingChandler Carruth2015-01-141-1/+1
| | | | | | | | | | | utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. llvm-svn: 225974
* [PowerPC] Add a flag for experimenting with subreg liveness trackingHal Finkel2015-01-091-0/+2
| | | | | | | This cannot yet be enabled by default, it causes ~50 miscompiles in the test suite. llvm-svn: 225497
* [PowerPC] Add support for the CMPB instructionHal Finkel2015-01-031-0/+2
| | | | | | | | | | | | | | Newer POWER cores, and the A2, support the cmpb instruction. This instruction compares its operands, treating each of the 8 bytes in the GPRs separately, returning a 'mask' result of 0 (for false) or -1 (for true) in each byte. Code generation support is added, in the form of a PPCISelDAGToDAG DAG-preprocessing routine, that recognizes patterns close to what the instruction computes (either exactly, or related by a constant masking operation), and generates the cmpb instruction (along with any necessary constant masking operation). This can be expanded if use cases arise. llvm-svn: 225106
* [PowerPC] Reduce names from Power8Vector to P8VectorBill Schmidt2014-10-101-2/+2
| | | | | | Per Hal Finkel's review, improving typability of some variable names. llvm-svn: 219514
* [PowerPC] Add feature for Power8 vector extensionsBill Schmidt2014-10-101-0/+2
| | | | | | | | | | | | | | | | | | The current VSX feature for PowerPC specifies availability of the VSX instructions added with the 2.06 architecture version. With 2.07, the architecture adds new instructions to both the Category:Vector and Category:VSX instruction sets. Additionally, unaligned vector storage operations have improved performance. This patch adds a feature to provide access to the new instructions and performance capabilities of Power8. For compatibility with GCC, the feature is controlled via a new -mpower8-vector switch, and the feature causes the __POWER8_VECTOR__ builtin define to be generated by the preprocessor. There is a companion patch for cfe being committed at the same time. llvm-svn: 219501
* [PowerPC] Modern Book-E cores support syncHal Finkel2014-10-021-0/+2
| | | | | | | | | | | | | Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923
* constify the TargetMachine argument used in the subtarget andEric Christopher2014-10-011-1/+1
| | | | | | lowering constructors. llvm-svn: 218832
* Now that the optimization level is adjusting the feature stringEric Christopher2014-10-011-5/+1
| | | | | | before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817
* Remove resetSubtargetFeatures as it is unused.Eric Christopher2014-09-031-3/+1
| | | | llvm-svn: 217071
* Reinstate "Nuke the old JIT."Eric Christopher2014-09-021-11/+0
| | | | | | | | Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982
* Canonicalize header guards into a common format.Benjamin Kramer2014-08-131-2/+2
| | | | | | | | | | Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558
* Initialize PPC DataLayout based on the Triple only.Eric Christopher2014-08-091-1/+3
| | | | llvm-svn: 215281
* Remove extraneous 64-bit argument to the PPC TargetMachine constructorEric Christopher2014-08-091-4/+4
| | | | | | and update initialization. llvm-svn: 215280
* Temporarily Revert "Nuke the old JIT." as it's not quite ready toEric Christopher2014-08-071-0/+11
| | | | | | | | | | | be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154
* Nuke the old JIT.Rafael Espindola2014-08-071-11/+0
| | | | | | | | | I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111
* Add first bunch of SPE instructions. As they overlap with Altivec, markJoerg Sonnenberger2014-08-071-0/+2
| | | | | | | them as parser-only until the disassembler is extended to handle predicates properly. llvm-svn: 215102
* Remove the TargetMachine forwards for TargetSubtargetInfo basedEric Christopher2014-08-041-8/+19
| | | | | | information and update all callers. No functional change. llvm-svn: 214781
* Add support for m[ft][di]bat[ul] instructions.Joerg Sonnenberger2014-08-041-0/+2
| | | | llvm-svn: 214731
* Add features for PPC 4xx and e500/e500mc instructions.Joerg Sonnenberger2014-08-041-0/+4
| | | | | | Move the test cases for them into separate files. llvm-svn: 214724
* [PowerPC] Support ELFv1/ELFv2 ABI selection via featuresUlrich Weigand2014-07-281-3/+7
| | | | | | | | | | | | | | | | | | | | While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently hard-coded via the target triple: powerpc64-linux is always ELFv1, while powerpc64le-linux is always ELFv2. These are of course the most common scenarios, but in principle it is possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on little-endian systems (and GCC does support that), and there are some special use cases for that (e.g. certain Linux kernel versions could only be built using ELFv1 on LE). This patch implements the LLVM side of supporting this. As precedent on other platforms suggests, ABI options are passed to the back-end as features. Thus, this patch implements two features "elfv1" and "elfv2" that select the desired ABI if present. (If not, the LLVM uses the same default rules as now.) llvm-svn: 214072
* [PowerPC] ELFv2 function call changesUlrich Weigand2014-07-201-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch builds upon the two preceding MC changes to implement the basic ELFv2 function call convention. In the ELFv1 ABI, a "function descriptor" was associated with every function, pointing to both the entry address and the related TOC base (and a static chain pointer for nested functions). Function pointers would actually refer to that descriptor, and the indirect call sequence needed to load up both entry address and TOC base. In the ELFv2 ABI, there are no more function descriptors, and function pointers simply refer to the (global) entry point of the function code. Indirect function calls simply branch to that address, after loading it up into r12 (as required by the ABI rules for a global entry point). Direct function calls continue to just do a "bl" to the target symbol; this will be resolved by the linker to the local entry point of the target function if it is local, and to a PLT stub if it is global. That PLT stub would then load the (global) entry point address of the final target into r12 and branch to it. Note that when performing a local function call, r2 must be set up to point to the current TOC base: if the target ends up local, the ABI requires that its local entry point is called with r2 set up; if the target ends up global, the PLT stub requires that r2 is set up. This patch implements all LLVM changes to implement that scheme: - No longer create a function descriptor when emitting a function definition (in EmitFunctionEntryLabel) - Emit two entry points *if* the function needs the TOC base (r2) anywhere (this is done EmitFunctionBodyStart; note that this cannot be done in EmitFunctionBodyStart because the global entry point prologue code must be *part* of the function as covered by debug info). - In order to make use tracking of r2 (as needed above) work correctly, mark direct function calls as implicitly using r2. - Implement the ELFv2 indirect function call sequence (no function descriptors; load target address into r12). - When creating an ELFv2 object file, emit the .abiversion 2 directive to tell the linker to create the appropriate version of PLT stubs. Reviewed by Hal Finkel. llvm-svn: 213489
* [PowerPC] 32-bit ELF PIC supportHal Finkel2014-07-181-0/+3
| | | | | | | | | | This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend. Patch by Justin Hibbits! llvm-svn: 213427
* Move Post RA Scheduling flag bit into SchedMachineModelSanjay Patel2014-07-151-5/+5
| | | | | | | | | | | | | | | | | | | | | Refactoring; no functional changes intended Removed PostRAScheduler bits from subtargets (X86, ARM). Added PostRAScheduler bit to MCSchedModel class. This bit is set by a CPU's scheduling model (if it exists). Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses. Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!). Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling. Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values. Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. c. PPC overrides the CPU's postRA settings by enabling postRA for everything. d. X86 is the only target that actually has postRA specified via sched model info. Differential Revision: http://reviews.llvm.org/D4217 llvm-svn: 213101
* add ppc64/pwr8 as targetWill Schmidt2014-06-261-0/+1
| | | | | | | includes handling DIR_PWR8 where appropriate The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later. llvm-svn: 211779
* Fix typo.Eric Christopher2014-06-131-1/+1
| | | | llvm-svn: 210947
* Move the PPCSelectionDAGInfo off the TargetMachine and onto theEric Christopher2014-06-121-0/+3
| | | | | | subtarget. llvm-svn: 210854
* Move PPCTargetLowering off of the TargetMachine and onto the subtarget.Eric Christopher2014-06-121-1/+4
| | | | llvm-svn: 210852
* Move PPCJITInfo off of the TargetMachine and onto the subtarget.Eric Christopher2014-06-121-1/+4
| | | | | | | Needed to migrate a few functions around to avoid circular header dependencies. llvm-svn: 210845
* Move PPCInstrInfo off of the target machine and onto the subtarget.Eric Christopher2014-06-121-0/+3
| | | | llvm-svn: 210839
* Move DataLayout from the PPCTargetMachine to the subtarget.Eric Christopher2014-06-121-0/+4
| | | | llvm-svn: 210824
* Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. UseEric Christopher2014-06-121-0/+8
| | | | | | | | the initializeSubtargetDependencies code to obtain an initialized subtarget and migrate a couple of subtarget using functions to the .cpp file to avoid circular includes. llvm-svn: 210822
* Make early if conversion dependent upon the subtarget and addEric Christopher2014-05-211-0/+2
| | | | | | | a subtarget hook to enable. Unconditionally add to the pass pipeline for targets that might want to use it. No functional change. llvm-svn: 209340
* Save the optimization level the subtarget was created with in aEric Christopher2014-05-131-0/+3
| | | | | | | | | | member variable and sink the initialization of crbits into the subtarget feature reset code. No functional change, but this refactor will be used in a future commit. llvm-svn: 208726
* [C++11] Add 'override' keywords and remove 'virtual'. Additionally add ↵Craig Topper2014-04-291-5/+5
| | | | | | 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition llvm-svn: 207504
* [PowerPC] Initial support for the VSX instruction setHal Finkel2014-03-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VSX is an ISA extension supported on the POWER7 and later cores that enhances floating-point vector and scalar capabilities. Among other things, this adds <2 x double> support and generally helps to reduce register pressure. The interesting part of this ISA feature is the register configuration: there are 64 new 128-bit vector registers, the 32 of which are super-registers of the existing 32 scalar floating-point registers, and the second 32 of which overlap with the 32 Altivec vector registers. This makes things like vector insertion and extraction tricky: this can be free but only if we force a restriction to the right register subclass when needed. A new "minipass" PPCVSXCopy takes care of this (although it could do a more-optimal job of it; see the comment about unnecessary copies below). Please note that, currently, VSX is not enabled by default when targeting anything because it is not yet ready for that. The assembler and disassembler are fully implemented and tested. However: - CodeGen support causes miscompiles; test-suite runtime failures: MultiSource/Benchmarks/FreeBench/distray/distray MultiSource/Benchmarks/McCat/08-main/main MultiSource/Benchmarks/Olden/voronoi/voronoi MultiSource/Benchmarks/mafft/pairlocalalign MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 SingleSource/Benchmarks/CoyoteBench/almabench SingleSource/Benchmarks/Misc/matmul_f64_4x4 - The lowering currently falls back to using Altivec instructions far more than it should. Worse, there are some things that are scalarized through the stack that shouldn't be. - A lot of unnecessary copies make it past the optimizers, and this needs to be fixed. - Many more regression tests are needed. Normally, I'd fix these things prior to committing, but there are some students and other contributors who would like to work this, and so it makes sense to move this development process upstream where it can be subject to the regular code-review procedures. llvm-svn: 203768
* Don't avoid cfi instructions on the bg/p.Rafael Espindola2014-03-071-2/+0
| | | | | | | The integrated assembler now works for ppc. Since this was the last use of the bg/p predicate and Hal says that it is now dead, drop the predicate too. llvm-svn: 203269
* Add CR-bit tracking to the PowerPC backend for i1 valuesHal Finkel2014-02-281-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change enables tracking i1 values in the PowerPC backend using the condition register bits. These bits can be treated on PowerPC as separate registers; individual bit operations (and, or, xor, etc.) are supported. Tracking booleans in CR bits has several advantages: - Reduction in register pressure (because we no longer need GPRs to store boolean values). - Logical operations on booleans can be handled more efficiently; we used to have to move all results from comparisons into GPRs, perform promoted logical operations in GPRs, and then move the result back into condition register bits to be used by conditional branches. This can be very inefficient, because the throughput of these CR <-> GPR moves have high latency and low throughput (especially when other associated instructions are accounted for). - On the POWER7 and similar cores, we can increase total throughput by using the CR bits. CR bit operations have a dedicated functional unit. Most of this is more-or-less mechanical: Adjustments were needed in the calling-convention code, support was added for spilling/restoring individual condition-register bits, and conditional branch instruction definitions taking specific CR bits were added (plus patterns and code for generating bit-level operations). This is enabled by default when running at -O2 and higher. For -O0 and -O1, where the ability to debug is more important, this feature is disabled by default. Individual CR bits do not have assigned DWARF register numbers, and storing values in CR bits makes them invisible to the debugger. It is critical, however, that we don't move i1 values that have been promoted to larger values (such as those passed as function arguments) into bit registers only to quickly turn around and move the values back into GPRs (such as happens when values are returned by functions). A pair of target-specific DAG combines are added to remove the trunc/extends in: trunc(binary-ops(binary-ops(zext(x), zext(y)), ...) and: zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...) In short, we only want to use CR bits where some of the i1 values come from comparisons or are used by conditional branches or selects. To put it another way, if we can do the entire i1 computation in GPRs, then we probably should (on the POWER7, the GPR-operation throughput is higher, and for all cores, the CR <-> GPR moves are expensive). POWER7 test-suite performance results (from 10 runs in each configuration): SingleSource/Benchmarks/Misc/mandel-2: 35% speedup MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown MultiSource/Applications/lemon/lemon: 8% slowdown llvm-svn: 202451
* Move PPC's getDataLayoutString out of line and document it better.Rafael Espindola2013-12-111-16/+0
| | | | llvm-svn: 196987
* Add support for the VSX target attribute. No functional changeEric Christopher2013-10-161-0/+1
| | | | | | as we don't actually use it to emit any code yet. llvm-svn: 192837
* Mark PPC MFTB and DST (and friends) as deprecatedHal Finkel2013-09-121-0/+4
| | | | | | | | Use the new instruction deprecation feature to mark mftb (now replaced with mfspr) and dst (along with the other Altivec cache control instructions) as deprecated when targeting cores supporting at least ISA v2.03. llvm-svn: 190605
* Enable MI scheduling (and CodeGen AA) by default for embedded PPC coresHal Finkel2013-09-111-0/+8
| | | | | | | For embedded PPC cores (especially the A2 core), using the MI scheduler with AA is far superior to the other scheduling options. llvm-svn: 190558
OpenPOWER on IntegriCloud