path: root/llvm/lib/Target/PowerPC
...
* Add mftblo and mftbhi for PPC 4xx. (Joerg Sonnenberger, 2014-08-05; 1 file, +5/-0)
  llvm-svn: 214863
* Add lswi / stswi for assembler use with a warning to not add patterns for them. (Joerg Sonnenberger, 2014-08-05; 1 file, +10/-0)
  llvm-svn: 214862
* Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier, and have the DAG use that to do the same lookup. (Eric Christopher, 2014-08-05; 5 files, +42/-64)
  This can be used in the future for TargetMachine-based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs.
  llvm-svn: 214838
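
  A minimal sketch of the access pattern this enables (the accessor name and headers are assumptions for illustration, not lifted from the patch):

  ```cpp
  #include "llvm/CodeGen/MachineFunction.h"
  #include "llvm/Target/TargetSubtargetInfo.h"
  using namespace llvm;

  // With the subtarget cached on the MachineFunction, a pass can ask the
  // function directly instead of routing through the TargetMachine each time.
  static const TargetSubtargetInfo &subtargetFor(const MachineFunction &MF) {
    // Before: MF.getTarget().getSubtargetImpl()
    return MF.getSubtarget(); // cached pointer, refreshed on subtarget switches
  }
  ```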
* Add TCR register access (Joerg Sonnenberger, 2014-08-04; 1 file, +3/-0)
  llvm-svn: 214826
* Add PPC 603's tlbld and tlbli instructions. (Joerg Sonnenberger, 2014-08-04; 1 file, +5/-0)
  llvm-svn: 214825
* [PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi (Bill Schmidt, 2014-08-04; 1 file, +12/-30)
  My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR.
  llvm-svn: 214800
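
  For reference, a sketch of the interface in question (standard altivec.h usage, not patch code):

  ```cpp
  #include <altivec.h>

  // vec_sld(a, b, N) selects 16 bytes starting at byte offset N from the
  // 32-byte concatenation of a and b (big-endian byte numbering); the patch
  // makes the emitted shufflevector reflect these semantics directly instead
  // of emitting an incorrect mask and compensating in the back end.
  __vector signed int shift_concat(__vector signed int a,
                                   __vector signed int b) {
    return vec_sld(a, b, 8); // N must be a compile-time constant in [0,15]
  }
  ```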
* Add simplified aliases for access to DCCR, ICCR, DEAR and ESR (Joerg Sonnenberger, 2014-08-04; 1 file, +12/-0)
  llvm-svn: 214797
* tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs. (Joerg Sonnenberger, 2014-08-04; 2 files, +39/-0)
  llvm-svn: 214784
* Remove the TargetMachine forwards for TargetSubtargetInfo-based information and update all callers. No functional change. (Eric Christopher, 2014-08-04; 15 files, +154/-122)
  llvm-svn: 214781
* Recognize mftbl as alias for mftb, for symmetry with mttb. (Joerg Sonnenberger, 2014-08-04; 1 file, +1/-0)
  llvm-svn: 214769
* Rename PPCLinuxMCAsmInfo to PPCELFMCAsmInfo to better reflect the systems it represents. (Joerg Sonnenberger, 2014-08-04; 3 files, +5/-5)
  llvm-svn: 214755
* Allow .lcomm with alignment on ELF targets. (Joerg Sonnenberger, 2014-08-04; 1 file, +1/-0)
  llvm-svn: 214754
* Refactor SPRG instructions. (Joerg Sonnenberger, 2014-08-04; 1 file, +16/-35)
  llvm-svn: 214733
* Add support for m[ft][di]bat[ul] instructions. (Joerg Sonnenberger, 2014-08-04; 4 files, +33/-0)
  llvm-svn: 214731
* Add features for PPC 4xx and e500/e500mc instructions. (Joerg Sonnenberger, 2014-08-04; 4 files, +18/-4)
  Move the test cases for them into separate files.
  llvm-svn: 214724
* [PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian (Ulrich Weigand, 2014-08-04; 3 files, +68/-36)
  In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl* by swapping the input arguments. As it turns out, the exact same fix is also required for the vpkuhum/vpkuwum patterns. This fixes another regression in llvmpipe when vector support is enabled.
  Reviewed by Bill Schmidt.
  llvm-svn: 214718
* [PowerPC] MULHU/MULHS are not legal for vector types (Ulrich Weigand, 2014-08-04; 1 file, +2/-0)
  I ran into some test failures where common code changed vector division by constant into a multiply-high operation (MULHU). But these are not implemented by the back end, so we failed to recognize the insn. Fixed by marking MULHU/MULHS as Expand for vector types.
  llvm-svn: 214716
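
  The fix is the standard legalization idiom; a minimal sketch (the exact set of vector types covered is an assumption):

  ```cpp
  // Inside the PPCTargetLowering constructor: multiply-high has no Altivec
  // instruction, so tell legalization to expand it into supported operations.
  for (MVT VT : {MVT::v16i8, MVT::v8i16, MVT::v4i32}) {
    setOperationAction(ISD::MULHU, VT, Expand);
    setOperationAction(ISD::MULHS, VT, Expand);
  }
  ```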
* [PowerPC] Fix and improve vector comparisons (Ulrich Weigand, 2014-08-04; 2 files, +111/-150)
  This patch refactors code generation of vector comparisons. This fixes a wrong code-gen bug for ISD::SETGE for floating-point types, and improves generated code for vector comparisons in general.
  Specifically, the patch moves all logic deciding how to implement vector comparisons into getVCmpInst, which gets two extra boolean outputs indicating to its caller whether it needs to swap the input operands and/or negate the result of the comparison. Apart from implementing these two modifications as directed by getVCmpInst, there is no need to ever implement vector comparisons in any other manner; in particular, there is never a need to perform two separate comparisons (e.g. one for equal and one for greater-than, as code used to do before this patch).
  Reviewed by Bill Schmidt.
  llvm-svn: 214714
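
  A sketch of the resulting single-path selection (names other than getVCmpInst, and the exact signature, are assumptions):

  ```cpp
  // getVCmpInst picks the one vector-compare opcode to use and reports via
  // the two boolean outputs whether the operands must be swapped and/or the
  // comparison result negated.
  bool Swap = false, Negate = false;
  unsigned Opc = getVCmpInst(VecVT, CC, Subtarget.hasVSX(), Swap, Negate);
  if (Swap)
    std::swap(LHS, RHS); // e.g. implement x < y as y > x
  SDValue Res = SDValue(CurDAG->getMachineNode(Opc, dl, VecVT, LHS, RHS), 0);
  if (Negate)            // e.g. implement x >= y as NOT(y > x)
    Res = SDValue(CurDAG->getMachineNode(PPC::VNOR, dl, VecVT, Res, Res), 0);
  ```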
* tlbia support (Joerg Sonnenberger, 2014-08-02; 2 files, +4/-0)
  llvm-svn: 214640
* mfdcr / mtdcr support (Joerg Sonnenberger, 2014-08-02; 1 file, +5/-0)
  llvm-svn: 214639
* Don't use additional arguments for dss and friends to satisfy DSS_Form, when let can do the same thing. (Joerg Sonnenberger, 2014-08-02; 2 files, +54/-63)
  Keep the 64bit variants as codegen-only. While they have a different register class, the encoding is the same for 32bit and 64bit mode. Having both present would otherwise confuse the disassembler.
  llvm-svn: 214636
* Add a non-const subtarget-returning function to the target machine so that we can use it to get the old-style JIT out of the subtarget. (Eric Christopher, 2014-08-01; 1 file, +2/-1)
  This code should be removed when the old-style JIT is removed (imminently).
  llvm-svn: 214560
* [PowerPC] PR20280 - Slots for byval parameters are not immutable (Ulrich Weigand, 2014-08-01; 1 file, +1/-1)
  Found by inspection while looking at PR20280: code would mark slots in the parameter save area where a byval parameter is passed as "immutable". This is not correct since code is allowed to modify byval parameters in place in the parameter save area.
  llvm-svn: 214517
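
  The one-line fix is in how the fixed stack object for the byval slot is created; a hedged sketch (variable names are illustrative):

  ```cpp
  // The byval slot in the parameter save area must be a mutable fixed
  // object, because the callee may legally modify byval data in place.
  int FI = MF.getFrameInfo()->CreateFixedObject(ObjSize, CurArgOffset,
                                                /*Immutable=*/false);
  SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
  ```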
* [PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads (Hal Finkel, 2014-08-01; 1 file, +28/-50)
  Altivec vector loads on PowerPC have an interesting property: They always load from an aligned address (by rounding down the address actually provided if necessary). In order to generate an actual unaligned load, you can generate two load instructions, one with the original address, one offset by one vector length, and use a special permutation to extract the bytes desired.
  When this was originally implemented, I generated these two loads using regular ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with this: The alignment of a load does not contribute to its identity, and SDNodes are uniqued. So, imagine that we have some load, L1, that is not aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned). Further imagine that there had already existed a load (L1+16)(unaligned) with the same chain operand as the load L1. When (L1+16)(aligned) is created as part of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just now marked as aligned (because the new alignment overwrites the old). But the original users of (L1+16)(unaligned) now get the data intended for the permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists to get its own permutation-based expansion. This was PR19991.
  A second potential problem has to do with the MMOs on these loads, which can be used by AA during instruction scheduling to break chain-based dependencies. If the new "aligned" loads get the MMO from the original unaligned load, this does not represent the fact that it will load data from below the original address. Normally, this would not matter, but this load might be combined with another load pair for a previous vector, and then the dependency on the otherwise-ignored lower bytes can matter.
  To fix both problems, instead of generating the necessary loads using regular ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are provided with MMOs with a conservative address range.
  Unfortunately, I no longer have a failing test case (since PR19991 was reported, other changes in CodeGen have forced this bug back into hiding again). Nevertheless, this should fix the underlying problem.
  llvm-svn: 214481
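
  For context, the classic (big-endian) Altivec sequence that this lowering implements, written with the standard altivec.h intrinsics (illustrative only, not the patch's DAG code):

  ```cpp
  #include <altivec.h>

  // lvx silently rounds its address down to a 16-byte boundary, so load the
  // two enclosing aligned vectors and use vperm with an lvsl-generated mask
  // to extract the 16 bytes starting at p. The +15 offset (vector width
  // minus one) keeps the second load inside the allocation when p happens
  // to be aligned already.
  __vector unsigned char load_unaligned(const unsigned char *p) {
    __vector unsigned char lo   = vec_ld(0, p);   // aligned vector at/below p
    __vector unsigned char hi   = vec_ld(15, p);  // aligned vector covering the tail
    __vector unsigned char mask = vec_lvsl(0, p); // permute control from p's low bits
    return vec_perm(lo, hi, mask);
  }
  ```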
* [PowerPC] Recognize consecutive memory accesses from intrinsics (Hal Finkel, 2014-08-01; 1 file, +63/-9)
  When generating unaligned vector loads, we need to search for other loads or stores nearby offset by one vector width. If we find one, then we know that we can safely generate another aligned load at that address. Otherwise, we must generate the next load using an offset of the vector width minus one byte (so we don't read off the end of the allocation if the base unaligned address happened to be aligned at runtime).
  We had previously done this using only other vector loads and stores, but did not consider the PowerPC-specific vector load/store intrinsics. Now we'll also consider vector intrinsics. By itself, this change is a feature enhancement, but is a necessary step toward fixing the underlying problem behind PR19991.
  llvm-svn: 214469
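
  A sketch of the offset rule described above (the helper name and code shape are illustrative assumptions):

  ```cpp
  // If another access proves Base + 16 is dereferenceable, load there
  // directly; otherwise use Base + 15, which lvx rounds down to the same
  // aligned address without ever reading past the end of the allocation.
  int64_t Offset = findConsecutiveAccess(Base, /*VecWidth=*/16) ? 16 : 15;
  SDValue Ptr2 = DAG.getNode(ISD::ADD, dl, PtrVT, Base,
                             DAG.getConstant(Offset, PtrVT));
  ```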
* Make sure no loads resulting from load->switch DAGCombine are marked invariant (Louis Gerbarg, 2014-07-31; 1 file, +4/-4)
  Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load, the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reorder normal loads.
  This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load.
  llvm-svn: 214449
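
  A hedged sketch of an updated call site (the surrounding variables and the exact parameter order for this era of the API are assumptions):

  ```cpp
  // Propagate the original load's isInvariant flag explicitly instead of
  // letting the combined load acquire invariance it is not entitled to.
  SDValue NewLoad = DAG.getExtLoad(ISD::EXTLOAD, dl, VT, Chain, Ptr,
                                   LD->getPointerInfo(), MemVT,
                                   LD->isVolatile(), LD->isNonTemporal(),
                                   LD->isInvariant(), LD->getAlignment());
  ```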
* Add mtpid/mfpid for BookE. (Joerg Sonnenberger, 2014-07-30; 1 file, +3/-0)
  llvm-svn: 214363
* Refactor TLBIVAX and add tlbsx. (Joerg Sonnenberger, 2014-07-30; 2 files, +10/-8)
  llvm-svn: 214354
* Add rfdi and rfmci from the e500/e500mc ISA. (Joerg Sonnenberger, 2014-07-30; 1 file, +3/-0)
  llvm-svn: 214339
* Add BookE's tlbre, tlbwe and tlbivax instructions. (Joerg Sonnenberger, 2014-07-30; 1 file, +15/-0)
  llvm-svn: 214332
* Add BookE's wrtee and wrteei instructions. (Joerg Sonnenberger, 2014-07-30; 1 file, +13/-0)
  llvm-svn: 214297
* SPRG 0 to 3 are valid outside BookE, so move them to the normal test file. Add support for accessing SPRG 4 to 7 on BookE. (Joerg Sonnenberger, 2014-07-30; 2 files, +20/-0)
  llvm-svn: 214295
* Add rfci instruction. (Joerg Sonnenberger, 2014-07-29; 1 file, +4/-1)
  llvm-svn: 214256
* mbar without argument is equivalent to mbar 0. (Joerg Sonnenberger, 2014-07-29; 1 file, +2/-0)
  llvm-svn: 214250
* Recognize BookE's mbar instruction. (Joerg Sonnenberger, 2014-07-29; 2 files, +12/-0)
  llvm-svn: 214244
* Fix typo in alias: DSIR -> DSISR (Joerg Sonnenberger, 2014-07-29; 1 file, +2/-2)
  llvm-svn: 214238
* Support move to/from segment register. (Joerg Sonnenberger, 2014-07-29; 5 files, +52/-0)
  llvm-svn: 214234
* Add a number of aliases for SPR access. (Joerg Sonnenberger, 2014-07-29; 1 file, +27/-0)
  llvm-svn: 214196
* Add rfi instruction. Based on feedback by Ulrich Weigand. (Joerg Sonnenberger, 2014-07-29; 1 file, +2/-0)
  llvm-svn: 214181
* Fix typos / grammar. (Matt Arsenault, 2014-07-29; 1 file, +3/-3)
  llvm-svn: 214147
* [PowerPC] Support ELFv1/ELFv2 ABI selection via features (Ulrich Weigand, 2014-07-28; 3 files, +28/-4)
  While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently hard-coded via the target triple: powerpc64-linux is always ELFv1, while powerpc64le-linux is always ELFv2.
  These are of course the most common scenarios, but in principle it is possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on little-endian systems (and GCC does support that), and there are some special use cases for that (e.g. certain Linux kernel versions could only be built using ELFv1 on LE).
  This patch implements the LLVM side of supporting this. As precedent on other platforms suggests, ABI options are passed to the back end as features. Thus, this patch implements two features "elfv1" and "elfv2" that select the desired ABI if present. (If not, LLVM uses the same default rules as now.)
  llvm-svn: 214072
* Add alignment value to allowsUnalignedMemoryAccess (Matt Arsenault, 2014-07-27; 2 files, +8/-6)
  Rename to allowsMisalignedMemoryAccess. On R600, 8- and 16-byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code.
  llvm-svn: 214055
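
  A sketch of the reworked hook (the signature is an assumption based on the commit message and the surrounding API):

  ```cpp
  // The alignment now arrives as a parameter, so a target can accept wide
  // accesses it handles natively instead of having them split.
  bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AddrSpace,
                                      unsigned Align,
                                      bool *Fast) const override {
    if (Fast)
      *Fast = true;
    return Align >= 4; // e.g. 8/16-byte accesses are fine at 4-byte alignment
  }
  ```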
* [PowerPC] Support TLS on PPC32/ELF (Hal Finkel, 2014-07-25; 7 files, +211/-62)
  Patch by Justin Hibbits!
  llvm-svn: 213960
* [PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl*. (Bill Schmidt, 2014-07-25; 3 files, +101/-48)
  Because the PowerPC vmrgh* and vmrgl* instructions have a built-in big-endian bias, it is necessary to swap their inputs in little-endian mode when using them to implement a vector shuffle. This was previously missed in the vector LE implementation.
  There was already logic to distinguish between unary and "normal" vmrg* vector shuffles, so this patch extends that logic to use a third option: "swapped" vmrg* vector shuffles that are used for little endian in place of the "normal" ones.
  I've updated the vec-shuffle-le.ll test to check for the expected register ordering on the generated instructions.
  This bug was discovered when testing the LE and ELFv2 patches for safety if they were backported to 3.4. A different vectorization decision was made in 3.4 than on mainline trunk, and that exposed the problem. I've verified this fix takes care of that issue.
  llvm-svn: 213915
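
  For reference, the language-level semantics the back end must preserve (standard altivec.h usage, not patch code):

  ```cpp
  #include <altivec.h>

  // vec_mergeh interleaves the first ("high") elements of its inputs:
  // { a0, b0, a1, b1, a2, b2, a3, b3 } for eight-element vectors. The vmrgh*
  // instructions number elements big-endian, so on little-endian targets the
  // back end must select a swapped variant to preserve these semantics.
  __vector unsigned short merge_high(__vector unsigned short a,
                                     __vector unsigned short b) {
    return vec_mergeh(a, b);
  }
  ```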
* Don't use 128bit functions on PPC32. (Joerg Sonnenberger, 2014-07-24; 1 file, +7/-0)
  llvm-svn: 213899
* Prune the dependency on MC from each target disassembler. (NAKAMURA Takumi, 2014-07-24; 1 file, +1/-1)
  llvm-svn: 213856
* Update library dependencies. (NAKAMURA Takumi, 2014-07-24; 1 file, +1/-1)
  llvm-svn: 213832
* [PowerPC] ELFv2 aggregate passing support (Ulrich Weigand, 2014-07-21; 3 files, +171/-49)
  This patch adds infrastructure support for passing array types directly. These can be used by the front end to pass aggregate types (coerced to an appropriate array type). The details of the array type being used inform the back end about ABI-relevant properties. Specifically, the array element type encodes:
  - whether the parameter should be passed in FPRs, VRs, or just GPRs/stack slots (for float / vector / integer element types, respectively)
  - what the alignment requirements of the parameter are when passed in GPRs/stack slots (8 for float / 16 for vector / the element type size for integer element types) -- this corresponds to the "byval align" field
  Using the infrastructure provided by this patch, a companion patch to clang will enable two features:
  - In the ELFv2 ABI, pass (and return) "homogeneous" floating-point or vector aggregates in FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI)
  - As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates that fit fully in registers without using the "byval" mechanism
  The patch uses the functionArgumentNeedsConsecutiveRegisters callback to encode that special treatment is required for all directly-passed array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set as a result are then used to implement the required size and alignment rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc.
  As a related change, the ABI routines have to be modified to support passing floating-point types in GPRs. This is necessary because with homogeneous aggregates of 4-byte float type we can now run out of FPRs *before* we run out of the 64-byte argument save area that is shadowed by GPRs. Any extra floating-point arguments that no longer fit in FPRs must now be passed in GPRs until we run out of those too.
  Note that there was already code to pass floating-point arguments in GPRs used with vararg parameters, which was done by writing the argument out to the argument save area first and then reloading into GPRs. The patch re-implements this, however, in favor of code packing float arguments directly via extension/truncation, BITCAST, and BUILD_PAIR operations. This is required to support the ELFv2 ABI, since we cannot unconditionally write to the argument save area (which the caller might not have allocated). The change does, however, affect ELFv1 varargs routines too; but even here the overall effect should be advantageous: Instead of loading the argument into the FPR, then storing the argument to the stack slot, and finally reloading the argument from the stack slot into a GPR, the new code now just loads the argument into the FPR, and subsequently loads the argument into the GPR (via BITCAST). That BITCAST might imply a save/reload from a stack temporary (in which case we're no worse than before); but it might be implemented more efficiently in some cases.
  The final part of the patch enables up to 8 FPRs and VRs for argument return in PPCCallingConv.td; this is required to support returning ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs since LLVM will only look for which register to use if the parameter is marked as "direct" return anyway.)
  Reviewed by Hal Finkel.
  llvm-svn: 213493
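
  A hypothetical illustration (not from the patch) of the kind of aggregate this enables passing in registers:

  ```cpp
  // A "homogeneous" floating-point aggregate: with the companion Clang
  // change, Quad is coerced to a float array type, which the infrastructure
  // above routes to FPRs under ELFv2 (and returns in up to 8 FPRs) instead
  // of going through the byval mechanism.
  struct Quad {
    float x, y, z, w;
  };

  Quad scale(Quad q, float s) {
    return {q.x * s, q.y * s, q.z * s, q.w * s};
  }
  ```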
* [PowerPC] ELFv2 explicit CFI for CR fields (Ulrich Weigand, 2014-07-21; 1 file, +9/-1)
  This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a single CFI record referring to CR2. In ELFv2 instead, each of the CR fields is represented by its own CFI record. The advantage is that the compiler can now choose to save just a single (or two) CR fields instead of all of them, if those are the only ones that actually need saving. That can lead to more efficient code using mf(o)crf instead of the (slow) mfcr instruction.
  Note that this patch does not (yet) implement this more efficient code generation, but it does implement the part that is required to be ABI compliant: creating multiple CFI records if multiple CR fields are saved.
  Reviewed by Hal Finkel.
  llvm-svn: 213492
* [PowerPC] ELFv2 stack space reduction (Ulrich Weigand, 2014-07-20; 4 files, +112/-21)
  The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways:
  * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating two unused doublewords
  * the 64-byte "parameter save area" is now optional and need not be present in certain cases (it remains mandatory in functions with variable arguments, and functions that have any parameter that is passed on the stack)
  The following patch implements these required changes:
  - reducing the linkage area, and the associated relocation of the TOC save slot, in getLinkageSize / getTOCSaveOffset (this requires updating all callers of these routines to pass in the isELFv2ABI flag)
  - (partially) handling the case where the parameter save area is optional
  This latter part requires some extra explanation: Currently, we still always allocate the parameter save area when *calling* a function. That is certainly always compliant with the ABI, but may cause code to allocate stack unnecessarily. This can be addressed by a follow-on optimization patch.
  On the *callee* side, in LowerFormalArguments, we *must* track correctly whether the ABI guarantees that the caller has allocated the parameter save area for our use, and the patch does so. However, there is one complication: the code that handles incoming "byval" arguments will currently *always* write to the parameter save area, because it has to force incoming register arguments to the stack since it must return an *address* to implement the byval semantics. To fix this, the patch changes the LowerFormalArguments code to write arguments to a freshly allocated stack slot on the function's own stack frame instead of the argument save area in those cases where that area is not present.
  Reviewed by Hal Finkel.
  llvm-svn: 213490
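
  The linkage-area change itself reduces to a size selection; a hedged sketch (the 32-bit value reflects pre-existing behavior, and the code shape is an assumption):

  ```cpp
  // ELFv1 linkage area: back chain, CR save, LR save, two reserved
  // doublewords, TOC save = 48 bytes. ELFv2 drops the two reserved
  // doublewords, leaving 32 bytes, and the TOC save slot moves from
  // offset 40 down to offset 24.
  unsigned getLinkageSize(bool isPPC64, bool isELFv2ABI) {
    if (!isPPC64)
      return 8;                  // 32-bit SVR4: back chain + LR save word
    return isELFv2ABI ? 32 : 48;
  }
  ```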