path: root/llvm/lib/Target/PowerPC
Commit message | Author | Age | Files | Lines
* Add MachineFunctionProperty checks for AllVRegsAllocated for target passes | Derek Schuff | 2016-04-04 | 2 files | -2/+10

    Summary: This adds the same checks that were added in r264593 to all
    target-specific passes that run after register allocation.

    Reviewers: qcolombet
    Subscribers: jyknight, dsanders, llvm-commits
    Differential Revision: http://reviews.llvm.org/D18525

    llvm-svn: 265313
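
    For context, a post-RA pass opts into this verification by declaring the
    properties it requires. A minimal sketch, assuming the
    MachineFunctionProperties API from this point in the tree:

        // Require that all virtual registers are already allocated when
        // this pass runs; the checks added here enforce it.
        MachineFunctionProperties getRequiredProperties() const override {
          return MachineFunctionProperties().set(
              MachineFunctionProperties::Property::AllVRegsAllocated);
        }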
* [PPC64] Bug fix: when enabling sibling-call-opt and shrink-wrapping, the tail call branch instruction might disappear | Chuang-Yu Cheng | 2016-04-01 | 2 files | -26/+62

    Bug pattern:

        # BB#0:                 # %entry
            cmpldi   3, 0
            beq-     0, .LBB0_2
        # BB#1:                 # %exit
            lwz 4, 0(3)
            #TC_RETURNd8 LVComputationKind 0
        .LBB0_2:                # %cond.false
            mflr 0
            std 0, 16(1)
            stdu 1, -96(1)
        .Ltmp0:
            .cfi_def_cfa_offset 96
        .Ltmp1:
            .cfi_offset lr, 16
            bl __assert_fail
            nop

    The branch instruction for the tail-call return is not generated because
    the shrink-wrapping pass chooses a new restore point, %cond.false, so the
    %exit block is never passed to emitEpilogue; that is why the branch is
    missing.

    Thanks to Kit for his opinions!

    Reviewers: nemanjai, hfinkel, tjablin, kbarton

    http://reviews.llvm.org/D17606

    llvm-svn: 265112
* [PowerPC] Add a late MI-level pass for QPX load/splat simplification | Hal Finkel | 2016-03-31 | 5 files | -4/+170

    Chapter 3 of the QPX manual states that, "Scalar floating-point load
    instructions, defined in the Power ISA, cause a replication of the source
    data across all elements of the target register." Thus, if we have a load
    followed by a QPX splat (from the first lane), the splat is redundant.

    This adds a late MI-level pass to remove the redundant splats in some of
    these cases (specifically when both occur in the same basic block).

    This optimization is scheduled just prior to post-RA scheduling. It can't
    happen before anything that might replace the load with some
    already-computed quantity (i.e. store-to-load forwarding).

    llvm-svn: 265047
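
    The per-block scan could look like the following sketch; the helpers
    isScalarFPLoad, isSplatFromLaneZero, and replaceWithCopy are hypothetical
    simplifications, not names from the patch:

        // Walk one basic block; a scalar FP load already replicates its
        // result across all QPX lanes, so a following splat from lane 0
        // of that same register adds nothing and can become a plain copy.
        unsigned LastLoadDef = 0;
        for (MachineInstr &MI : MBB) {
          if (isScalarFPLoad(MI))
            LastLoadDef = MI.getOperand(0).getReg();
          else if (isSplatFromLaneZero(MI) &&
                   MI.getOperand(1).getReg() == LastLoadDef)
            replaceWithCopy(MI); // hypothetical replacement step
        }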
* Change eliminateCallFramePseudoInstr() to return an iterator | Hans Wennborg | 2016-03-31 | 2 files | -5/+5

    This will become necessary in a subsequent change to make this method
    merge adjacent stack adjustments, i.e. it might erase the previous and/or
    next instruction.

    It also greatly simplifies the calls to this function from
    PrologEpilogInserter. Previously, that had a bunch of logic to resume
    iteration after the call; now it just continues with the returned
    iterator.

    Note that this changes the behaviour of PEI a little. Previously, it
    attempted to re-visit the new instruction created by
    eliminateCallFramePseudoInstr(). That code was added in r36625, but I
    can't see any reason for it: the new instructions will obviously not be
    pseudo instructions, they will not have FrameIndex operands, and we have
    already accounted for the stack adjustment.

    Differential Revision: http://reviews.llvm.org/D18627

    llvm-svn: 265036
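
    The new shape of the TargetFrameLowering hook, shown as a sketch of the
    signature this change describes:

        // Returns the iterator to continue iteration with, since the
        // callee may erase the previous and/or next instruction.
        MachineBasicBlock::iterator
        eliminateCallFramePseudoInstr(MachineFunction &MF,
                                      MachineBasicBlock &MBB,
                                      MachineBasicBlock::iterator I) const override;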
* [PPC] Basic support for Power 9 direct move instructions | Ehsan Amiri | 2016-03-31 | 1 file | -2/+17

    http://reviews.llvm.org/D18097

    Initial support does not include any patterns to generate these
    instructions.

    llvm-svn: 265031
* [PowerPC] Correctly compute 64-bit offsets in fast isel | Ulrich Weigand | 2016-03-31 | 1 file | -6/+5

    PPCSimplifyAddress contains this code:

        IntegerType *OffsetTy = ((VT == MVT::i32) ? Type::getInt32Ty(*Context)
                                                  : Type::getInt64Ty(*Context));

    to determine the type to be used for an index register, if one needs to
    be created. However, the "VT" here is the type of the data being loaded
    or stored, *not* the type of an address. This means that if a data
    element of type i32 is accessed using an index that does not fit into 32
    bits, a wrong address is computed here.

    Note that PPCFastISel is only ever used on 64-bit currently, so the type
    of an address is actually *always* MVT::i64. Other parts of the code,
    even in this same PPCSimplifyAddress routine, already rely on that fact.
    Thus, this patch changes the code to simply unconditionally use
    Type::getInt64Ty(*Context) as OffsetTy.

    llvm-svn: 265023
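
    The fix follows directly from that description; as a one-line sketch:

        // Addresses on PPC64 are always 64-bit, so the index-register type
        // must not depend on the type of the data being accessed.
        IntegerType *OffsetTy = Type::getInt64Ty(*Context);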
* [PowerPC] Basic support for P9 atomic loads and stores | Nemanja Ivanovic | 2016-03-31 | 7 files | -0/+66

    This patch corresponds to review: http://reviews.llvm.org/D18032

    This patch provides asm implementation for the following instructions:
    lwat, ldat, stwat, stdat, ldmx, mcrxrx

    llvm-svn: 265022
* [PowerPC] Remove incorrect use of COPY_TO_REGCLASS in fast isel | Ulrich Weigand | 2016-03-31 | 3 files | -20/+3

    The fast isel pass currently emits a COPY_TO_REGCLASS node to convert
    from a F4RC to a F8RC register class during conversion of a
    floating-point number to integer. There is actually no support in the
    common code instruction printers to emit COPY_TO_REGCLASS nodes, so the
    PowerPC back-end has special code there to simply ignore
    COPY_TO_REGCLASS.

    This is correct *if and only if* the source and destination registers of
    COPY_TO_REGCLASS are the same (except for the different register class).
    But nothing guarantees this to be the case, and if the register allocator
    does end up allocating source and destination to different registers
    after all, the back-end simply generates incorrect code. I've included a
    test case that shows such incorrect code generation.

    However, it seems that COPY_TO_REGCLASS is actually not intended to be
    used at the MI layer at all. It is used during SelectionDAG, but always
    lowered to a plain COPY before emitting MI. Other back-ends' fast isel
    passes never emit COPY_TO_REGCLASS at all. I suspect it is simply wrong
    for the PowerPC back-end to emit it here.

    This patch changes the PowerPC back-end to directly emit COPY instead of
    COPY_TO_REGCLASS and removes the special handling in the instruction
    printers.

    Differential Revision: http://reviews.llvm.org/D18605

    llvm-svn: 265020
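
    A sketch of the emission change (the register variable names are
    illustrative, not taken from the patch; TargetOpcode::COPY is the
    generic MI-level copy):

        // Emit a plain COPY instead of COPY_TO_REGCLASS; register-class
        // adjustment is handled by the allocator, and the instruction
        // printers understand COPY without any special casing.
        BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
                TII.get(TargetOpcode::COPY), DestReg)
            .addReg(SrcReg);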
* [PowerPC] Load two floats directly instead of using one 64-bit integer load | Hal Finkel | 2016-03-31 | 1 file | -0/+105

    When dealing with complex<float>, and similar structures with two
    single-precision floating-point numbers, especially when such things are
    being passed around by value, we'll sometimes end up loading both float
    values by extracting them from one 64-bit integer load. It looks like
    this:

        t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
        t16: i64 = srl t13, Constant:i32<32>
        t17: i32 = truncate t16
        t18: f32 = bitcast t17
        t19: i32 = truncate t13
        t20: f32 = bitcast t19

    The problem, especially before the P8 where those bitcasts aren't legal
    (and get expanded via the stack), is that it would have been better to
    use two floating-point loads directly. Here we add a target-specific
    DAGCombine to do just that. In short, we turn:

        ld 3, 0(5)
        stw 3, -8(1)
        rldicl 3, 3, 32, 32
        stw 3, -4(1)
        lfs 3, -4(1)
        lfs 0, -8(1)

    into:

        lfs 3, 4(5)
        lfs 0, 0(5)

    llvm-svn: 264988
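
    An illustrative (not from the patch) C++ source shape that can produce
    the pattern above:

        #include <complex>

        // Returning a complex<float> by value can load both f32 halves by
        // extracting them from a single i64 load, which the new combine
        // turns into two direct lfs loads.
        std::complex<float> passthrough(const std::complex<float> *p) {
          return *p;
        }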
* Remove HasFnAttribute guards around getFnAttribute calls | Nirav Dave | 2016-03-30 | 1 file | -2/+1

    These checks are redundant and can be removed.

    Reviewers: hans
    Subscribers: llvm-commits, mzolotukhin
    Differential Revision: http://reviews.llvm.org/D18564

    llvm-svn: 264872
* [PPC] Remove -ppc-loop-prefetch-distance in favor of -prefetch-distance | Adam Nemet | 2016-03-29 | 1 file | -7/+5

    After the previous change, this can now be overridden centrally in the
    pass.

    llvm-svn: 264807
* [PowerPC] Refactor popcnt[dw] target features | Hal Finkel | 2016-03-29 | 5 files | -18/+28

    Instead of using two feature bits, one to indicate the availability of
    the popcnt[dw] instructions, and another to indicate whether or not
    they're fast, use a single enum. This allows more consistent control via
    target attribute strings, and via Clang's command line.

    llvm-svn: 264690
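
    A minimal sketch of the single-enum approach; the enumerator names are
    assumptions for illustration:

        // One three-state value replaces the two boolean feature bits.
        enum POPCNTDKind {
          POPCNTD_Unavailable, // instructions not implemented
          POPCNTD_Slow,        // implemented, but slower than emulation
          POPCNTD_Fast         // implemented and fast
        };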
* [PowerPC] Clarify a comment in PPCTTI about vector loads | Hal Finkel | 2016-03-28 | 1 file | -1/+1

    This should say that we could do unaligned vector loads on the P7 using
    VSX instructions, not that we should.

    llvm-svn: 264683
* [PowerPC] On the A2, popcnt[dw] are very slow | Hal Finkel | 2016-03-28 | 5 files | -6/+16

    The A2 cores support the popcntw/popcntd instructions, but they're
    microcoded, and slower than our default software emulation.
    Specifically, popcnt[dw] take approximately 74 cycles, whereas our
    software emulation takes only 24-28 cycles.

    I've added a new target feature to indicate a slow popcnt[dw], instead
    of just removing the existing target feature from the a2/a2q processor
    models, because:

      1. This allows us to return more accurate information via the TTI
         interface (I recognize that this currently makes no practical
         difference).
      2. It is hopefully easier to understand (it allows the core's features
         to match its manual while still having the desired effect).

    llvm-svn: 264600
* [Power9] Implement new altivec instructions: bcd* series | Chuang-Yu Cheng | 2016-03-28 | 3 files | -0/+126

    This patch implements the following altivec instructions:

    - Decimal Convert From/to National/Zoned/Signed-QWord:
      bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq.
    - Decimal Copy-Sign/Set-Sign:
      bcdcpsgn. bcdsetsgn.
    - Decimal Shift/Unsigned-Shift/Shift-and-Round:
      bcds. bcdus. bcdsr.
    - Decimal (Unsigned) Truncate:
      bcdtrunc. bcdutrunc.

    Total: 13 instructions

    Thanks for Amehsan's advice! Thanks for Kit's great help!

    Reviewers: hal, nemanja, kbarton, tjablin, amehsan

    http://reviews.llvm.org/D17838

    llvm-svn: 264568
* [Power9] Implement new vsx instructions: insert, extract, test data class, min/max, reverse, permute, splat | Chuang-Yu Cheng | 2016-03-28 | 7 files | -0/+345

    This change implements the following vsx instructions:

    - Scalar Insert/Extract:
      xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp
    - Vector Insert/Extract:
      xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp
      xxextractuw xxinsertw
    - Scalar/Vector Test Data Class:
      xststdcdp xststdcsp xststdcqp
      xvtstdcdp xvtstdcsp
    - Maximum/Minimum:
      xsmaxcdp xsmaxjdp xsmincdp xsminjdp
    - Vector Byte-Reverse/Permute/Splat:
      xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib

    30 instructions

    Thanks to Nemanja for the invaluable discussion! Thanks for Kit's great
    help!

    Reviewers: hal, nemanja, kbarton, tjablin, amehsan

    http://reviews.llvm.org/D16842

    llvm-svn: 264567
* [Power9] Implement new vsx instructions: quad-precision move, fp-arithmetic | Chuang-Yu Cheng | 2016-03-28 | 2 files | -0/+171

    This change implements the following vsx instructions:

    - Quad-precision move:
      xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp
    - Quad-precision fp-arithmetic:
      xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o)
      xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o)

    22 instructions

    Thanks to Nemanja and Kit for careful review and invaluable discussion!

    Reviewers: hal, nemanja, kbarton, tjablin, amehsan

    http://reviews.llvm.org/D16110

    llvm-svn: 264565
* [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based loop legality | Hal Finkel | 2016-03-27 | 1 file | -3/+13

    Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc
    function calls (fmax[f], etc.) generally map to function calls after
    lowering. For some vector types with QPX at least, however, we can
    legally lower these, and we don't need to prohibit CTR-based loops on
    their account.

    It turned out, however, that the logic that checked the opcodes
    associated with intrinsics was broken: it would set the Opcode variable,
    but that variable was later checked only if it had been set for some
    otherwise-external function call. This fixes the latter problem and adds
    the FMAXNUM/FMINNUM mappings.

    llvm-svn: 264532
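
    A hedged reconstruction of the added mapping (the surrounding switch and
    the Opcode variable are assumptions based on the description above):

        // Map the intrinsics to ISD opcodes so the CTR-legality check can
        // ask whether the target lowers them without a libcall.
        switch (II->getIntrinsicID()) {
        default: break;
        case Intrinsic::maxnum: Opcode = ISD::FMAXNUM; break;
        case Intrinsic::minnum: Opcode = ISD::FMINNUM; break;
        }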
* [PowerPC] Disable the CTR optimization in the presence of {min,max}num | David Majnemer | 2016-03-26 | 1 file | -0/+2

    The minnum and maxnum intrinsics get lowered to libcalls, which
    invalidates the CTR optimization.

    This fixes PR27083.

    llvm-svn: 264508
* [Power9] Implement new altivec instructions: permute, count zero, extend sign, negate, parity, shift/rotate, mul10 | Chuang-Yu Cheng | 2016-03-26 | 3 files | -0/+181

    This change implements the following vector operations:

    - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw
    - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d
    - vnegd vnegw
    - vprtybd vprtybq vprtybw
    - vbpermd vpermr
    - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv
    - vmul10cuq vmul10uq vmul10ecuq vmul10euq

    28 instructions

    Thanks to Nemanja and Kit for invaluable hints and discussion!

    Reviewers: hal, nemanja, kbarton, tjablin, amehsan

    Phabricator: http://reviews.llvm.org/D15887

    llvm-svn: 264504
* Finish the incomplete 'd' inline asm constraint support for PPC by making sure we give it a register and mark it as a register constraint | Eric Christopher | 2016-03-24 | 1 file | -0/+5

    llvm-svn: 264340
* [PowerPC] Disable direct moves for extractelement and bitcast in 32-bit mode | Nemanja Ivanovic | 2016-03-24 | 1 file | -2/+2

    This patch corresponds to review: http://reviews.llvm.org/D17711

    It disables direct moves on these operations in 32-bit mode since the
    patterns assume 64-bit registers. The final patch is slightly different
    from the Phabricator review, as the bitcast operations needed to be
    disabled in 32-bit mode as well. This fixes PR26617.

    llvm-svn: 264282
* Codegen: [PPC] Word Rotates are Zero Extending | Kyle Butt | 2016-03-23 | 1 file | -1/+8

    Add word rotates to the list of instructions that are zero extending.
    This allows them to be used in dot form to compare with zero.

    llvm-svn: 264183
* Add another optimization opportunity to the README file | Ehsan Amiri | 2016-03-18 | 1 file | -0/+11

    llvm-svn: 263775
* [PPC, FastISel] Fix ordered/unordered fcmp | Tim Shen | 2016-03-17 | 1 file | -7/+23

    For fcmp, the major concern with the following six cases is the NaN
    result. The comparison result consists of four bits, indicating lt, eq,
    gt and un (unordered), only one of which will be set. The result is
    generated by the fcmpu instruction. However, the bc instruction only
    inspects one of the first three bits, so when un is set, bc may jump to
    an undesired place.

    More specifically, if we expect an unordered comparison and un is set,
    we expect to always go to the true branch; in such a case UEQ, UGT and
    ULT still give false, which is undesired; but UNE, UGE and ULE happen to
    give true, since they are tested by inspecting !eq, !lt and !gt,
    respectively.

    Similarly, for an ordered comparison, when un is set, we always expect
    the result to be false. In such a case OGT, OLT and OEQ are good, since
    they actually test gt, lt and eq respectively, which are false. OGE, OLE
    and ONE are tested through !lt, !gt and !eq, and these are true.

    llvm-svn: 263753
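
    The usual remedy, and a plausible sketch of this fix, is to emit a
    second conditional branch on the un bit for the predicates the single bc
    test gets wrong (the opcode and predicate names follow PPCPredicates.h;
    the exact emission in the patch may differ):

        // For UEQ ("unordered or equal"): branch to the true block when eq
        // is set, and also when un is set, so NaN operands go to the true
        // branch as expected.
        BuildMI(*BrBB, BrBB->end(), DbgLoc, TII.get(PPC::BCC))
            .addImm(PPC::PRED_EQ).addReg(CondReg).addMBB(TrueMBB);
        BuildMI(*BrBB, BrBB->end(), DbgLoc, TII.get(PPC::BCC))
            .addImm(PPC::PRED_UN).addReg(CondReg).addMBB(TrueMBB);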
* [PowerPC] Disable CTR loops optimization for soft float operations | Petar Jovanovic | 2016-03-17 | 1 file | -0/+19

    This patch prevents the CTR loops optimization when soft-float
    operations are used inside the loop body. Soft-float operations use
    function calls, but function calls are not allowed inside CTR-optimized
    loops.

    Patch by Aleksandar Beserminji.

    Differential Revision: http://reviews.llvm.org/D17600

    llvm-svn: 263727
* Tweak some atomics functions in preparation for larger changes; NFC | James Y Knight | 2016-03-16 | 2 files | -1/+4

    - Rename getATOMIC to getSYNC, as llvm will soon be able to emit both
      '__sync' libcalls and '__atomic' libcalls, and this function is for
      the '__sync' ones.

    - getInsertFencesForAtomic() has been replaced with
      shouldInsertFencesForAtomic(Instruction), so that the decision can be
      made per-instruction. This functionality will be used soon.

    - emitLeadingFence/emitTrailingFence are no longer called if
      shouldInsertFencesForAtomic returns false, and thus don't need to
      check the condition themselves.

    llvm-svn: 263665
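
    A minimal sketch of the new per-instruction hook; the always-true body
    is an illustrative target policy, not necessarily what any in-tree
    target does:

        // Replaces the old getInsertFencesForAtomic() flag; a target that
        // always wants fences around atomics simply returns true.
        bool shouldInsertFencesForAtomic(const Instruction *I) const override {
          return true;
        }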
* [DAG] use !isUndef(); NFCI | Sanjay Patel | 2016-03-14 | 1 file | -1/+1

    llvm-svn: 263453
* [DAG] use isUndef(); NFCI | Sanjay Patel | 2016-03-14 | 1 file | -8/+8

    llvm-svn: 263448
* Fix for PR 26378 | Nemanja Ivanovic | 2016-03-12 | 1 file | -0/+6

    This patch corresponds to review: http://reviews.llvm.org/D17712

    We were not clearing the TOC vector in PPCAsmPrinter when initializing
    it. This caused duplicate definition asserts when the pass is reused on
    the module (i.e. with -compile-twice or in JIT contexts).

    llvm-svn: 263338
* [PPC] Backend changes to generate xvabs[s,d]p and xvnabs[s,d]p instructions | Kit Barton | 2016-03-09 | 1 file | -0/+2

    This has to be committed before the FE changes.

    Phabricator: http://reviews.llvm.org/D17837

    llvm-svn: 263035
* [Power9] Implement new vsx instructions: load, store instructions for vector and scalar | Kit Barton | 2016-03-08 | 7 files | -0/+214

    We follow the comments mentioned in http://reviews.llvm.org/D16842#344378
    to implement this new patch.

    This patch implements the following vsx instructions:

    - Vector load/store:
      lxv lxvx lxvb16x lxvl lxvll lxvh8x lxvwsx
      stxv stxvb16x stxvh8x stxvl stxvll stxvx
    - Scalar load/store:
      lxsd lxssp lxsibzx lxsihzx
      stxsd stxssp stxsibx stxsihx

    21 instructions

    Phabricator: http://reviews.llvm.org/D16919

    llvm-svn: 262906
* A couple more UB fixes for C++14 sized deallocation | Richard Smith | 2016-03-08 | 1 file | -0/+4

    llvm-svn: 262891
* [PPCVSXFMAMutate] Temporarily disable this pass | Tim Shen | 2016-03-03 | 1 file | -2/+8

    llvm-svn: 262573
* [Power9] Implement new vector compare, extract, insert instructions | Kit Barton | 2016-03-01 | 2 files | -0/+96

    This change implements the following vector operations:

    - Vector Compare Not Equal:
      vcmpneb(.) vcmpneh(.) vcmpnew(.)
      vcmpnezb(.) vcmpnezh(.) vcmpnezw(.)
    - Vector Extract Unsigned:
      vextractub vextractuh vextractuw vextractd
      vextublx vextubrx vextuhlx vextuhrx vextuwlx vextuwrx
    - Vector Insert:
      vinsertb vinserth vinsertw vinsertd

    26 instructions

    Phabricator: http://reviews.llvm.org/D15916

    llvm-svn: 262392
* New file to track implementation status of new POWER9 instructions | Kit Barton | 2016-03-01 | 1 file | -0/+442

    llvm-svn: 262386
* TableGen: Check scheduling models for completeness | Matthias Braun | 2016-03-01 | 7 files | -0/+14

    TableGen checks at compile time that for scheduling models with
    "CompleteModel = 1" one of the following holds for each instruction:

    - It is marked with the hasNoSchedulingInfo flag.
    - The instruction is a subclass of Sched.
    - There are InstRW definitions in the scheduling model.

    Typical steps necessary to complete a model:

    - Ensure scheduling information exists for all pseudo instructions that
      are expanded before machine scheduling (usually everything handled
      with EmitYYY() functions in XXXTargetLowering).
    - If a CPU does not support some instructions, mark the corresponding
      resource unsupported:
      "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
    - Add missing scheduling information.

    Differential Revision: http://reviews.llvm.org/D17747

    llvm-svn: 262384
* Fix for PR26180 | Nemanja Ivanovic | 2016-02-29 | 3 files | -6/+6

    Corresponds to Phabricator review: http://reviews.llvm.org/D16592

    This fix includes both an update to how we handle the "generic" CPU on
    LE systems and Anton's fix for the fast isel issue.

    llvm-svn: 262233
* CodeGen: Change MachineInstr to use MachineInstr&, NFC | Duncan P. N. Exon Smith | 2016-02-27 | 1 file | -3/+3

    Change MachineInstr API to prefer MachineInstr& over MachineInstr*
    whenever the parameter is expected to be non-null. Slowly inching toward
    being able to fix PR26753.

    llvm-svn: 262149
* CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC | Duncan P. N. Exon Smith | 2016-02-27 | 1 file | -2/+2

    Take MachineInstr by reference instead of by pointer in SlotIndexes and
    the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
    never null, so this cleans up the API a bit. It also incidentally
    removes a few implicit conversions from MachineInstrBundleIterator to
    MachineInstr* (see PR26753).

    At a couple of call sites it was convenient to convert to a range-based
    for loop over MachineBasicBlock::instr_begin/instr_end, so I added
    MachineBasicBlock::instrs.

    llvm-svn: 262115
* [PPC] Legalize FNEG on PPC when possible | Kit Barton | 2016-02-26 | 1 file | -0/+3

    Currently we always expand ISD::FNEG. For the v4f32 and v2f64 vector
    types, VSX has native support for this opcode.

    Phabricator: http://reviews.llvm.org/D17647

    llvm-svn: 262079
* [Power9] Implement new vsx instructions: compare and conversion | Kit Barton | 2016-02-26 | 6 files | -0/+257

    This change implements the following vsx instructions:

    - Quad/Double-Precision Compare:
      xscmpoqp xscmpuqp
      xscmpexpdp xscmpexpqp
      xscmpeqdp xscmpgedp xscmpgtdp xscmpnedp
      xvcmpnedp(.) xvcmpnesp(.)
    - Quad-Precision Floating-Point Conversion:
      xscvqpdp(o) xscvdpqp
      xscvqpsdz xscvqpswz xscvqpudz xscvqpuwz
      xscvsdqp xscvudqp
      xscvdphp xscvhpdp xvcvhpsp xvcvsphp
      xsrqpi xsrqpix xsrqpxp

    28 instructions

    Phabricator: http://reviews.llvm.org/D16709

    llvm-svn: 262068
* Silencing a signed vs unsigned mismatch | Aaron Ballman | 2016-02-23 | 1 file | -1/+1

    llvm-svn: 261640
* CodeGen: TII: Take MachineInstr& in predicate API, NFC | Duncan P. N. Exon Smith | 2016-02-23 | 2 files | -69/+66

    Change the TargetInstrInfo API to take `MachineInstr&` instead of
    `MachineInstr*` in the functions related to predicated instructions
    (I'll try to come back later and get some of the rest). All of these
    functions require non-null parameters already, so references are
    clearer. As a bonus, this happens to factor away a host of implicit
    iterator => pointer conversions.

    No functionality change intended.

    llvm-svn: 261605
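
    One of the affected hooks, shown as a sketch of the new reference-based
    shape (the full list of changed functions is not given in this message):

        // Before: bool PredicateInstruction(MachineInstr *MI, ...);
        // After: the instruction comes by reference, since it can never
        // be null here.
        bool PredicateInstruction(MachineInstr &MI,
                                  ArrayRef<MachineOperand> Pred) const override;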
* Fix for PR26690 take 2 | Nemanja Ivanovic | 2016-02-22 | 1 file | -1/+1

    This is what was meant to be in the initial commit to fix this bug. The
    parens were missing. This commit also adds a test case for the bug and
    has undergone full testing on PPC and X86.

    llvm-svn: 261546
* Revert bad fix for PR26690 | Nemanja Ivanovic | 2016-02-22 | 1 file | -1/+1

    llvm-svn: 261527
* Fix for PR26690 | Nemanja Ivanovic | 2016-02-22 | 1 file | -1/+1

    I mistook BitVector::empty() to mean BitVector::count() == 0, and it
    does not. Corrected the issue with the fix for PR26500.

    llvm-svn: 261525
* Fix some abuse of auto flagged by clang's -Wrange-loop-analysis | Benjamin Kramer | 2016-02-22 | 1 file | -5/+5

    llvm-svn: 261524
* Fix the build bot break caused by rL261441 | Nemanja Ivanovic | 2016-02-20 | 1 file | -5/+11

    The patch had a necessary call to a function inside an assert, which is
    fine when asserts are turned on, but not so much when they're off. Sorry
    about the regression.

    llvm-svn: 261447
* Fix for PR 26500 | Nemanja Ivanovic | 2016-02-20 | 2 files | -52/+182

    This patch corresponds to review: http://reviews.llvm.org/D17294

    It ensures that, whatever block we are emitting the prologue/epilogue
    into, we have the necessary scratch registers. It removes the hard-coded
    register numbers previously used as scratch registers, because registers
    that are guaranteed to be available in the function prologue/epilogue
    are not guaranteed to be available within the function body. Since we
    shrink-wrap, the prologue/epilogue may end up in the function body.

    llvm-svn: 261441
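
    The idea as a hedged sketch: rather than hard-coding registers such as
    r0/r12, scan for registers that are actually free at the insertion
    point. The helper name and signature below are assumptions for
    illustration, not taken from the patch:

        // With shrink-wrapping, the prologue block may land mid-function,
        // where the traditional scratch registers can be live; search the
        // block for ones that are actually available.
        unsigned ScratchReg = PPC::NoRegister;
        findScratchRegister(&MBB, /*UseAtEnd=*/false, &ScratchReg);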