summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC/PPCISelLowering.h
Commit message (Collapse)AuthorAgeFilesLines
...
* [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to ↵Craig Topper2018-07-301-1/+1
| | | | | | BuildSDIVPow2. llvm-svn: 338303
* Remove trailing spaceFangrui Song2018-07-301-1/+1
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-281-0/+2
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* Fix build failures from r337347, found by clangJustin Hibbits2018-07-181-2/+1
| | | | | | | | | | | * Delete a no-longer-used override, and mark the other getRegisterTypeForCallingConv() as override. * SPE only supports i32, not i64, as the internal type, so simply remove the type check, so that DestReg and Opc are provably always set. GCC 6.4 did not warn about either of the above. llvm-svn: 337350
* Introduce codegen for the Signal Processing EngineJustin Hibbits2018-07-181-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347
* [Power9] Ensure float128 in non-homogenous aggregates are passed via VSX regLei Huang2018-07-051-0/+4
| | | | | | | | | | | | | Non-homogenous aggregates are passed in consecutive GPRs, in GPRs and in memory, or in memory. This patch ensures that float128 members of non-homogenous aggregates are passed via VSX registers. This is done via custom lowering a bitcast of a build_pari(i64,i64) to float128 to a new PPCISD node, BUILD_FP128. Differential Revision: https://reviews.llvm.org/D48308 llvm-svn: 336310
* [PowerPC] Unify handling for conversion of FP_TO_INT feeding a storeLei Huang2018-05-081-0/+7
| | | | | | | | | | | | Existing DAG combine only handles conversions for FP_TO_SINT: "{f32, f64} x { i32, i16 }" This patch simplifies the code to handle: "{ FP_TO_SINT, FP_TO_UINT } x { f64, f32 } x { i64, i32, i16, i8 }" Differential Revision: https://reviews.llvm.org/D46102 llvm-svn: 331778
* [PowerPC] Implement isMaskAndCmp0FoldingBeneficialNemanja Ivanovic2018-05-021-0/+1
| | | | | | | | | | | Sinking the and closer to a compare against zero is beneficial on PPC as it allows us to emit record-form instructions. In the future, we may expand this to a larger set of operations that feed compares against zero since PPC has lots of record-form instructions. Differential revision: https://reviews.llvm.org/D46060 llvm-svn: 331416
* Remove \brief commands from doxygen comments.Adrian Prantl2018-05-011-2/+2
| | | | | | | | | | | | | | | | We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
* [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to ↵Craig Topper2018-03-291-1/+1
| | | | | | | | | | | | CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806
* Fix layering by moving ValueTypes.h from CodeGen to IRDavid Blaikie2018-03-231-1/+1
| | | | | | ValueTypes.h is implemented in IR already. llvm-svn: 328397
* Fix layering of MachineValueType.h by moving it from CodeGen to SupportDavid Blaikie2018-03-231-1/+1
| | | | | | | | | This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395
* [PowerPC] Make AddrSpaceCast noopNemanja Ivanovic2018-03-191-0/+5
| | | | | | | | | | | PowerPC targets do not use address spaces. As a result, we can get selection failures with address space casts. This patch makes those casts noops. Patch by Valentin Churavy. Differential revision: https://reviews.llvm.org/D43781 llvm-svn: 327877
* [PowerPC] Zero-extend the compare operand for ATOMIC_CMP_SWAPNemanja Ivanovic2018-01-121-0/+6
| | | | | | | | | | | | Part of the fix for https://bugs.llvm.org/show_bug.cgi?id=35812. This patch ensures that the compare operand for the atomic compare and swap is properly zero-extended to 32 bits if applicable. A follow-up commit will fix the extension for the SETCC node generated when expanding an ATOMIC_CMP_SWAP_WITH_SUCCESS. That will complete the bug fix. Differential Revision: https://reviews.llvm.org/D41856 llvm-svn: 322372
* MachineFunction: Return reference from getFunction(); NFCMatthias Braun2017-12-151-2/+2
| | | | | | The Function can never be nullptr so we can return a reference. llvm-svn: 320884
* TLI: Allow using PSV for intrinsic mem operandsMatt Arsenault2017-12-141-0/+1
| | | | llvm-svn: 320756
* [CodeGen] Print register names in lowercase in both MIR and debug outputFrancis Visoiu Mistrih2017-11-281-8/+8
| | | | | | | | | | | As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187
* Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie2017-11-171-1/+1
| | | | | | | | All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
* [PowerPC] Implement mayBeEmittedAsTailCall for PPCSean Fertile2017-11-151-0/+4
| | | | | | | | | Implements TargetLowering callback 'mayBeEmittedAsTailCall' that enables CodeGenPrepare to duplicate returns when they might enable a tail-call. Differential Revision: https://reviews.llvm.org/D39777 llvm-svn: 318321
* Adds code to PPC ISEL lowering to recognize byte inserts from ↵Graham Yiu2017-11-061-0/+5
| | | | | | | | vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm. Extends tests from vector insert half-word. Differential Revision: https://reviews.llvm.org/D34497 llvm-svn: 317503
* [PPC] Use xxbrd to speed up bswap64Guozhi Wei2017-11-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Power doesn't have bswap instructions, so llvm generates following code sequence for bswap64. rotldi 5, 3, 16 rotldi 4, 3, 8 rotldi 9, 3, 24 rotldi 10, 3, 32 rotldi 11, 3, 48 rotldi 12, 3, 56 rldimi 4, 5, 8, 48 rldimi 4, 9, 16, 40 rldimi 4, 10, 24, 32 rldimi 4, 11, 40, 16 rldimi 4, 12, 48, 8 rldimi 4, 3, 56, 0 But Power9 has vector bswap instructions, they can also be used to speed up scalar bswap intrinsic. With this patch, bswap64 can be translated to: mtvsrdd 34, 3, 3 xxbrd 34, 34 mfvsrld 3, 34 Differential Revision: https://reviews.llvm.org/D39510 llvm-svn: 317499
* Adds code to PPC ISEL lowering to recognize half-word inserts from ↵Graham Yiu2017-11-011-1/+8
| | | | | | | | vector_shuffles, and use P9 shift and vector insert instructions instead of vperm. Differential Revision: https://reviews.llvm.org/D34160 llvm-svn: 317111
* DAG: Add opcode and source type to isFPExtFreeMatt Arsenault2017-10-131-1/+1
| | | | | | | | This is only currently used for mad/fma transforms. This is the only case where it should be used for AMDGPU, so add an opcode to be sure. llvm-svn: 315740
* [PPC][NFC] Renaming things with 'xxinsert' moniker to 'vecinsert' to make it ↵Tony Jiang2017-09-051-3/+3
| | | | | | | | more general. Commit on behalf of Graham Yiu (gyiu@ca.ibm.com) llvm-svn: 312547
* [DAG] convert vector select-of-constants to logic/mathSanjay Patel2017-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | This goes back to a discussion about IR canonicalization. We'd like to preserve and convert more IR to 'select' than we currently do because that's likely the best choice in IR: http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html ...but that's often not true for codegen, so we need to account for this pattern coming in to the backend and transform it to better DAG ops. Steps in this patch: 1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to more finely enable this transform. Other targets will probably want that anyway to distinguish scalars from vectors. We're using that here to exclude AVX512 targets, but it may not be necessary. 2. Convert a vselect to ext+add. This eliminates a constant load/materialization, and the vector ext is often free. Implementing a more general fold using xor+and can be a follow-up for targets that don't have a legal vselect. It's also possible that we can remove the TLI hook for the special case fold implemented here because we're eliminating a constant, but it needs to be tested on other targets. Differential Revision: https://reviews.llvm.org/D36840 llvm-svn: 311731
* Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. ↵Peter Collingbourne2017-07-261-5/+5
| | | | | | | | NFCI. This was a use-after-free waiting to happen. llvm-svn: 309159
* [NFC] test commit.Stefan Pintilie2017-07-261-0/+8
| | | | | | Added a comment to explain how to add a PPCISD node. llvm-svn: 309114
* [SystemZ, LoopStrengthReduce]Jonas Paulsson2017-07-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729
* [PowerPC] Ensure displacements for DQ-Form instructions are multiples of 16Nemanja Ivanovic2017-07-131-1/+1
| | | | | | | | | | | | | As outlined in the PR, we didn't ensure that displacements for DQ-Form instructions are multiples of 16. Since the instruction encoding encodes a quad-word displacement, a sub-16 byte displacement is meaningless and ends up being encoded incorrectly. Fixes https://bugs.llvm.org/show_bug.cgi?id=33671. Differential Revision: https://reviews.llvm.org/D35007 llvm-svn: 307934
* fix formatting; NFCHiroshi Inoue2017-07-101-2/+2
| | | | llvm-svn: 307523
* [PowerPC] NFC : Common up definitions of isIntS16Immediate and update ↵Lei Huang2017-07-071-0/+3
| | | | | | parameter to int16_t llvm-svn: 307442
* [Power9] Exploit vector integer extend instructions when indices aren't correct.Tony Jiang2017-07-051-0/+4
| | | | | | | | | | | | | | | This patch adds on to the exploitation added by https://reviews.llvm.org/D33510. This now catches build vector nodes where the inputs are coming from sign extended vector extract elements where the indices used by the vector extract are not correct. We can still use the new hardware instructions by adding a shuffle to move the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine. Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com) Differential Revision: https://reviews.llvm.org/D34009 llvm-svn: 307169
* [PowerPC] Match vec_revb builtins to P9 instructions.Tony Jiang2017-06-121-0/+21
| | | | | | | | | | | | Power9 has instructions that will reverse the bytes within an element for all sizes (half-word, word, double-word and quad-word). These can be used for the vec_revb builtins in altivec.h. However, we implement these to match vector shuffle nodes as that will cover both the builtins and vector shuffles that occur in the SDAG through other means. Differential Revision: https://reviews.llvm.org/D33690 llvm-svn: 305214
* [Power9] Added support for the modsw, moduw, modsd, modud hardware instructions.Tony Jiang2017-06-121-0/+1
| | | | | | | | | | | Note that if we need the result of both the divide and the modulo then we compute the modulo based on the result of the divide and not using the new hardware instruction. Commit on behalf of STEFAN PINTILIE. Differential Revision: https://reviews.llvm.org/D33940 llvm-svn: 305210
* [PowerPC] Fix a performance bug for PPC::XXPERMDI.Tony Jiang2017-05-311-0/+8
| | | | | | | | | | There are some VectorShuffle Nodes in SDAG which can be selected to XXPERMDI Instruction, this patch recognizes them and does the selection to improve the PPC performance. Differential Revision: https://reviews.llvm.org/D33404 llvm-svn: 304298
* [PowerPC] Fix a performance bug for PPC::XXSLDWI.Tony Jiang2017-05-241-1/+5
| | | | | | | | There are some VectorShuffle Nodes in SDAG which can be selected to XXSLDWI instruction, this patch recognizes them and does the selection to improve the PPC performance. llvm-svn: 303822
* [PPC] Lower load acquire/seq_cst trailing fence to cmp + bne + isync.Tim Shen2017-05-161-0/+1
| | | | | | | | | | | | | | | | | Summary: This fixes pr32392. The lowering pipeline is: llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in expandPostRAPseudo. The reason why expandPostRAPseudo is chosen is because previous passes are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne- 7, .+4 (some branch pass(s)). Differential Revision: https://reviews.llvm.org/D32763 llvm-svn: 303205
* [PPC] Move the combine "a << (b % (sizeof(a) * 8)) -> (PPCshl a, b)" to the ↵Tim Shen2017-05-121-11/+10
| | | | | | | | | | | | | | | | | | | | | | backend. NFC. Summary: Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc., because those are not defined for b > sizeof(a) * 8, even after some of the combiners run. However, PPCISD::SHL defines that behavior (as the instructions themselves). Move the combination to the backend. The tests in shift_mask.ll still pass. Reviewers: echristo, hfinkel, efriedma, iteratee Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D33076 llvm-svn: 302937
* [Atomic] Remove IsStore/IsLoad in the interface, and pass the instruction ↵Tim Shen2017-05-091-4/+4
| | | | | | | | | | | | | | | | | instead. NFC. Now both emitLeadingFence and emitTrailingFence take the instruction itself, instead of taking IsLoad/IsStore pairs. Instruction::mayReadFromMemory and Instrucion::mayWriteToMemory are used for determining those two booleans. The instruction argument is also useful for later D32763, in emitTrailingFence. For emitLeadingFence, it seems to have cleaner interface with the proposed change. Differential Revision: https://reviews.llvm.org/D32762 llvm-svn: 302539
* [PowerPC, DAGCombiner] Fold a << (b % (sizeof(a) * 8)) back to a single ↵Tim Shen2017-05-031-0/+8
| | | | | | | | | | | | | | | | instruction Summary: This is the corresponding llvm change to D28037 to ensure no performance regression. Reviewers: bogner, kbarton, hfinkel, iteratee, echristo Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D28329 llvm-svn: 301990
* [SelectionDAG] Use KnownBits struct in DAG's computeKnownBits and ↵Craig Topper2017-04-281-2/+1
| | | | | | | | | | | | simplifyDemandedBits This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently. This is largely a mechanical transformation from KnownZero to Known.Zero. Differential Revision: https://reviews.llvm.org/D32569 llvm-svn: 301620
* [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to ↵Sanjay Patel2017-04-051-0/+4
| | | | | | | | | | | | | | | | | bitwise logic+setcc (PR32401) This is a generic combine enabled via target hook to reduce icmp logic as discussed in: https://bugs.llvm.org/show_bug.cgi?id=32401 It's likely that other targets will want to enable this hook for scalar transforms, and there are probably other patterns that can use bitwise logic to reduce comparisons. Note that we are missing an IR canonicalization for these patterns, and we will probably prefer the pair-of-compares form in IR (shorter, more likely to fold). Differential Revision: https://reviews.llvm.org/D31483 llvm-svn: 299542
* [DAGCombiner] Add vector demanded elements support to ↵Simon Pilgrim2017-03-311-0/+1
| | | | | | | | | | computeKnownBitsForTargetNode Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes. Differential Revision: https://reviews.llvm.org/D31249 llvm-svn: 299201
* [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)Sanjay Patel2017-03-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977
* [PowerPC] Fix some Clang-tidy modernize and Include What You Use warnings; ↵Eugene Zelenko2017-01-131-23/+39
| | | | | | other minor fixes (NFC). llvm-svn: 291872
* [PowerPC] Implement missing ISA 2.06 instructions.Tony Jiang2017-01-051-1/+1
| | | | | | | Instructions: fctidu[.], fctiwu[.], ftdiv, ftsqrt are not implemented. Implement them and add corresponding test cases in this patch. llvm-svn: 291116
* [PPC] Prefer direct move on power8 if load 1 or 2 bytes to VSRGuozhi Wei2016-12-121-0/+2
| | | | | | | | | Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so move 1 or 2 bytes to VSR through MTVSRWZ is much faster than store the extended value into stack and load it with LXSIWZX. This patch fixes pr31144. Differential Revision: https://reviews.llvm.org/D27287 llvm-svn: 289473
* [PowerPC] Improvements for BUILD_VECTOR Vol. 2Nemanja Ivanovic2016-11-291-0/+4
| | | | | | | | | | | | | | | This patch corresponds to review: https://reviews.llvm.org/D25980 This is the 2nd patch in a series of 4 that improve the lowering and combining for BUILD_VECTOR nodes on PowerPC. This particular patch combines a build vector of fp-to-int conversions into an fp-to-int conversion of a build vector of fp values. For example: Converts (build_vector (fp_to_[su]i $A), (fp_to_[su]i $B), ...) Into (fp_to_[su]i (build_vector $A, $B, ...))). Which is a natural match for much cleaner code. llvm-svn: 288218
* [PPC][DAGCombine] Convert SETCC to subtract when the result is zero extendedEhsan Amiri2016-11-181-0/+5
| | | | | | | | | | | | | | | | | When we see a SETCC whose only users are zero extend operations, we can replace it with a subtraction. This results in doing all calculations in GPRs and avoids CR use. Currently we do this only for ULT, ULE, UGT and UGE condition codes. There are ways that this can be extended. For example for signed condition codes. In that case we will be introducing additional sign extend instructions, so more careful profitability analysis may be required. Another direction to extend this is for equal, not equal conditions. Also when users of SETCC are any_ext or sign_ext, we might be able to do something similar. llvm-svn: 287329
* Always use relative jump table encodings on PowerPC64.Joerg Sonnenberger2016-11-161-0/+9
| | | | | | | | | | | | | | | | | For the default, small and medium code model, use the existing difference from the jump table towards the label. For all other code models, setup the picbase and use the difference between the picbase and the block address. Overall, this results in smaller data tables at the expensive of one or two more arithmetic operation at the jump site. Given that we only create jump tables with a lot more than two entries, it is a net win in size. For larger code models the assumption remains that individual functions are no larger than 2GB. Differential Revision: https://reviews.llvm.org/D26336 llvm-svn: 287059
OpenPOWER on IntegriCloud