summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
...
* [PowerPC] Combine ADD to ADDZEQingShan Zhang2018-09-072-0/+98
| | | | | | | | | | | | | | | | | | | | | | | | | | On the ppc64le platform, if ir has the following form, define i64 @addze1(i64 %x, i64 %z) local_unnamed_addr #0 { entry: %cmp = icmp ne i64 %z, CONSTANT (-32767 <= CONSTANT <= 32768) %conv1 = zext i1 %cmp to i64 %add = add nsw i64 %conv1, %x ret i64 %add } we can optimize it to the form below. when C == 0 --> addze X, (addic Z, -1)) / add X, (zext(setne Z, C))-- \ when -32768 <= -C <= 32767 && C != 0 --> addze X, (addic (addi Z, -C), -1) Patch By: HLJ2009 (Li Jia He) Differential Revision: https://reviews.llvm.org/D51403 Reviewed By: Nemanjai llvm-svn: 341634
* [PowerPC] Add Itineraries of IIC_IntRotateDI for P7/P8QingShan Zhang2018-09-032-0/+8
| | | | | | | | | | | | | | | | | | When doing some instruction scheduling work, we noticed some missing itineraries. Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling, because we can still get same latency due to default values. With machine scheduler, however, itineraries will have impact to scheduling. eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class. And most of the instruction class with itineraries will have NumMicroOps default to 1. This will has impact on the count of RetiredMOps, affects the Pending/Available Queue, then causing different scheduling or suboptimal scheduling further. Patch by jsji (Jinsong Ji) Differential Revision: https://reviews.llvm.org/D51506 llvm-svn: 341293
* [PPC] Remove Darwin support from POWER backend.Kit Barton2018-08-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch issues an error message if Darwin ABI is attempted with the PPC backend. It also cleans up existing test cases, either converting the test to use an alternative triple or removing the test if the coverage is no longer needed. Updated Tests ------------- The majority of test cases were updated to use a different triple that does not include the Darwin ABI. Many tests were also updated to use FileCheck, in place of grep. Deleted Tests ------------- llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test specific functionality of dsymutil using an object file created with an old version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he suggested removing the test. llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a PPC test to a SystemZ test, as the behavior is also reproducible there. All other tests that were deleted were specific to the darwin/ppc ABI and no longer necessary. Phabricator Review: https://reviews.llvm.org/D50988 llvm-svn: 340795
* [PowerPC][MC] Support expressions in getMemRIX16Encoding.Sean Fertile2018-08-271-3/+9
| | | | | | | | | | Loosens an assert in getMemRIX16Encoding that restricts DQ-form instructions to using an immediate, so that we can assemble instructions like lxv/stxv where the offset is an expression. Differential Revision: https://reviews.llvm.org/D51122 llvm-svn: 340761
* fix comment typoNico Weber2018-08-271-1/+1
| | | | llvm-svn: 340744
* [PowerPC] Revert commit r339779Nemanja Ivanovic2018-08-274-9/+16
| | | | | | | This commit has caused failures in some internal benchmarks. Temporarily reverting this patch until the issue can be diagnosed and fixed. llvm-svn: 340740
* [PowerPC] Recommit r340016 after fixing the reported issueNemanja Ivanovic2018-08-274-3/+38
| | | | | | | | The internal benchmark failure reported by Google was due to a missing check for the result type for the sign-extend and shift DAG. This commit adds the check and re-commits the patch. llvm-svn: 340734
* [PowerPC] Emit xscpsgndp instead of xxlor when copying floating point scalar ↵Stefan Pintilie2018-08-241-1/+1
| | | | | | | | | | | | | | | registers for P9 This patch will address using the xscpsgndp instruction to copy floating point scalar registers instead of the xxlor (specifically XXLORf) instruction that is currently used. Additionally, this patch of utilizing xscpsgndp will apply to P9, while pre-P9 will still use xxlor. Patch by amyk Differential Revision: https://reviews.llvm.org/D50004 llvm-svn: 340643
* Temporarily Revert "[PowerPC] Generate Power9 extswsli extend sign and shift ↵Eric Christopher2018-08-214-38/+3
| | | | | | | | immediate instruction" due to it causing a compiler crash on valid. This reverts commit r340016, testcase forthcoming. llvm-svn: 340315
* [PowerPC] Add a peephole post RA to transform the inst that fed by addQingShan Zhang2018-08-202-50/+353
| | | | | | | | | | | | | If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e. y = add imm, reg LFDX. 0, y --> LFD imm(reg) Reviewers: Nemanjai Differential Revision: https://reviews.llvm.org/D49007 llvm-svn: 340149
* [PowerPC] Generate lxsd instead of the ld->mtvsrd sequence for vector loadsStefan Pintilie2018-08-171-0/+29
| | | | | | | | | | | | | | | | | | This patch addresses: - Implementation within PPCISelLowering.cpp to check if we should use direct load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector function is used; which will allow us to catch as many cases of the scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into lxsd. - Test cases to exhibit the behaviour of emitting lxsd/lfd. Patch by amyk Differential revision: https://reviews.llvm.org/D49698 llvm-svn: 340037
* [PowerPC] Generate Power9 extswsli extend sign and shift immediate instructionNemanja Ivanovic2018-08-174-3/+38
| | | | | | | | | | | Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli extend sign and shift immediate instruction. Patch by RolandF. Differential revision: https://reviews.llvm.org/D49879 llvm-svn: 340016
* [MI] Change the array of `MachineMemOperand` pointers to beChandler Carruth2018-08-161-24/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a generically extensible collection of extra info attached to a `MachineInstr`. The primary change here is cleaning up the APIs used for setting and manipulating the `MachineMemOperand` pointer arrays so chat we can change how they are allocated. Then we introduce an extra info object that using the trailing object pattern to attach some number of MMOs but also other extra info. The design of this is specifically so that this extra info has a fixed necessary cost (the header tracking what extra info is included) and everything else can be tail allocated. This pattern works especially well with a `BumpPtrAllocator` which we use here. I've also added the basic scaffolding for putting interesting pointers into this, namely pre- and post-instruction symbols. These aren't used anywhere yet, they're just there to ensure I've actually gotten the data structure types correct. I'll flesh out support for these in a subsequent patch (MIR dumping, parsing, the works). Finally, I've included an optimization where we store any single pointer inline in the `MachineInstr` to avoid the allocation overhead. This is expected to be the overwhelmingly most common case and so should avoid any memory usage growth due to slightly less clever / dense allocation when dealing with >1 MMO. This did require several ergonomic improvements to the `PointerSumType` to reasonably support the various usage models. This also has a side effect of freeing up 8 bits within the `MachineInstr` which could be repurposed for something else. The suggested direction here came largely from Hal Finkel. I hope it was worth it. ;] It does hopefully clear a path for subsequent extensions w/o nearly as much leg work. Lots of thanks to Reid and Justin for careful reviews and ideas about how to do all of this. Differential Revision: https://reviews.llvm.org/D50701 llvm-svn: 339940
* [PowerPC] Enhance the selection(ISD::VSELECT) of vector typeNemanja Ivanovic2018-08-154-16/+9
| | | | | | | | | | | | | | | To make ISD::VSELECT available(legal) so long as there are altivec instruction, otherwise it's default behavior is expanding. Use xxsel to match vselect if vsx is open, or use vsel. In order to do not write many patterns in td file, promote (for vector it's bitcast) all other type into v4i32 and only pattern match vselect of v4i32 into vsel or xxsel. Patch by wuzish Differential revision: https://reviews.llvm.org/D49531 llvm-svn: 339779
* [PowerPC] Don't run BV DAG Combine before legalization if it assumes legal typesNemanja Ivanovic2018-08-151-3/+10
| | | | | | | | | | | | | | When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization so account for that in the matching code. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087 Differential Revision: https://reviews.llvm.org/D49080 llvm-svn: 339769
* [SDAG] Remove the reliance on MI's allocation strategy forChandler Carruth2018-08-141-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array. Prior to this change, the memory management was deeply confusing here -- The way the MI was built relied on the `SelectionDAG` allocating memory for these arrays of pointers using the `MachineFunction`'s allocator so that the raw pointer to the array could be blindly copied into an eventual `MachineInstr`. This creates a hard coupling between how `MachineInstr`s allocate their array of `MachineMemOperand` pointers and how the `MachineSDNode` does. This change is motivated in large part by a change I am making to how `MachineFunction` allocates these pointers, but it seems like a layering improvement as well. This would run the risk of increasing allocations overall, but I've implemented an optimization that should avoid that by storing a single `MachineMemOperand` pointer directly instead of allocating anything. This is expected to be a net win because the vast majority of uses of these only need a single pointer. As a side-effect, this makes the API for updating a `MachineSDNode` and a `MachineInstr` reasonably different which seems nice to avoid unexpected coupling of these two layers. We can map between them, but we shouldn't be *surprised* at where that occurs. =] Differential Revision: https://reviews.llvm.org/D50680 llvm-svn: 339740
* [PowerPC] Improve codegen for vector loads using scalar_to_vectorZaara Syeda2018-08-083-20/+83
| | | | | | | | | | | | | | | | This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize: LXSD and LXSDX for i64 and f64 LXSIWAX for i32 (sign extension to i64) LXSIWZX for i32 and f64 Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 llvm-svn: 339260
* [PowerPC] Do not round values prior to converting to integerNemanja Ivanovic2018-08-022-3/+105
| | | | | | | | | | | | | | | Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector loses precision. This patch removes the code that adds these nodes to true f64 operands. It also adds patterns required to ensure the code is still vectorized rather than converting individual elements and inserting into a vector. Fixes https://bugs.llvm.org/show_bug.cgi?id=38342 Differential Revision: https://reviews.llvm.org/D50121 llvm-svn: 338658
* [DAGCombiner][TargetLowering] Pass a SmallVector instead of a std::vector to ↵Craig Topper2018-07-302-3/+3
| | | | | | | | BuildSDIV/BuildUDIV/etc. The vector contains the SDNodes that these functions create. The number of nodes is always a small number so we should use SmallVector to avoid a heap allocation. llvm-svn: 338329
* [DAGCombiner][PowerPC][AArch64] Pass Created vector by reference to ↵Craig Topper2018-07-302-6/+4
| | | | | | BuildSDIVPow2. llvm-svn: 338303
* Remove trailing spaceFangrui Song2018-07-3019-46/+46
| | | | | | sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h} llvm-svn: 338293
* DAG: Add calling convention argument to calling convention funcsMatt Arsenault2018-07-283-1/+5
| | | | | | | | This seems like a pretty glaring omission, and AMDGPU wants to treat kernels differently from other calling conventions. llvm-svn: 338194
* [Power9] Code Cleanup - Remove needsAggressiveScheduling()Stefan Pintilie2018-07-191-27/+8
| | | | | | | | | As we already return true from needsAggressiveScheduling() for the most recent hardware it would be cleaner to just return true for all PowerPC hardware. Differential Revision: https://reviews.llvm.org/D48663 llvm-svn: 337488
* [NFC] fix trivial typos in commentsHiroshi Inoue2018-07-182-4/+4
| | | | llvm-svn: 337351
* Fix build failures from r337347, found by clangJustin Hibbits2018-07-183-15/+6
| | | | | | | | | | | * Delete a no-longer-used override, and mark the other getRegisterTypeForCallingConv() as override. * SPE only supports i32, not i64, as the internal type, so simply remove the type check, so that DestReg and Opc are provably always set. GCC 6.4 did not warn about either of the above. llvm-svn: 337350
* Introduce codegen for the Signal Processing EngineJustin Hibbits2018-07-1818-614/+1323
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347
* Complete the SPE instruction set patternsJustin Hibbits2018-07-186-225/+562
| | | | | | | | | This is the lead-up to having SPE codegen. Add the rest of the instructions, along with MC tests. Differential Revision: https://reviews.llvm.org/D44829 llvm-svn: 337346
* Add PowerPC e500(v2) core scheduler and directives.Justin Hibbits2018-07-187-220/+497
| | | | | | Differential Revision: https://reviews.llvm.org/D44828 llvm-svn: 337345
* [PowerPC] Materialize more constants with CR-field set in late peepholeNemanja Ivanovic2018-07-131-5/+28
| | | | | | | | | | | | | | | | | Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set. However the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no futher uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than it needs to compare to zero correctly (lt, gt, eq). Differential revision: https://reviews.llvm.org/D42109 llvm-svn: 337008
* [Power9] Add remaining __flaot128 builtin support for FMA round to oddStefan Pintilie2018-07-111-3/+12
| | | | | | | | | | | | | | Implement this as it is done on GCC: __float128 a, b, c, d; a = __builtin_fmaf128_round_to_odd (b, c, d); // generates xsmaddqpo a = __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsmsubqpo a = - __builtin_fmaf128_round_to_odd (b, c, d); // generates xsnmaddqpo a = - __builtin_fmaf128_round_to_odd (b, c, -d); // generates xsnmsubpqp Differential Revision: https://reviews.llvm.org/D48218 llvm-svn: 336754
* [Power9] Add __float128 builtins for Rounding OperationsStefan Pintilie2018-07-092-0/+22
| | | | | | | | | | | | | | | Added __float128 support for a number of rounding operations: trunc rint nearbyint round floor ceil Differential Revision: https://reviews.llvm.org/D48415 llvm-svn: 336601
* [Power9] [LLVM] Add __float128 support for trunc to double round to oddStefan Pintilie2018-07-091-1/+4
| | | | | | | | | Add support for this builtin: double builtin_truncf128_round_to_odd(float128) Differential Revision: https://reviews.llvm.org/D48483 llvm-svn: 336595
* [Power9] Add __float128 builtins for Round To OddStefan Pintilie2018-07-091-6/+25
| | | | | | | | | | | | GCC has builtins for these round to odd instructions: __float128 __builtin_sqrtf128_round_to_odd (__float128) __float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128) __float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128) Differential Revision: https://reviews.llvm.org/D47550 llvm-svn: 336578
* [Power9] Add __float128 support for compare operationsStefan Pintilie2018-07-093-2/+75
| | | | | | | | Added handling for the select f128. Differential Revision: https://reviews.llvm.org/D48294 llvm-svn: 336548
* [Power9] Add __float128 library call for fremStefan Pintilie2018-07-061-0/+2
| | | | | | | | Power 9 does not have a hardware instruction for frem but we can call fmodf128. Differential Revision: https://reviews.llvm.org/D48552 llvm-svn: 336406
* [Power9] Add lib calls for float128 operations with no equivalent PPC ↵Lei Huang2018-07-051-0/+19
| | | | | | | | | | | instructions Map the following instructions to the proper float128 lib calls: pow[i], exp[2], log[2|10], sin, cos, fmin, fmax Differential Revision: https://reviews.llvm.org/D48544 llvm-svn: 336361
* [Power9] Optimize codgen for conversions of int to float128Lei Huang2018-07-051-0/+17
| | | | | | | | | | | | Optimize code sequences for integer conversion to fp128 when the integer is a result of: * float->int * float->long * double->int * double->long Differential Revision: https://reviews.llvm.org/D48429 llvm-svn: 336316
* [Power9] Ensure float128 in non-homogenous aggregates are passed via VSX regLei Huang2018-07-054-0/+43
| | | | | | | | | | | | | Non-homogenous aggregates are passed in consecutive GPRs, in GPRs and in memory, or in memory. This patch ensures that float128 members of non-homogenous aggregates are passed via VSX registers. This is done via custom lowering a bitcast of a build_pari(i64,i64) to float128 to a new PPCISD node, BUILD_FP128. Differential Revision: https://reviews.llvm.org/D48308 llvm-svn: 336310
* [Power9]Legalize and emit code for quad-precision convert from single-precisionLei Huang2018-07-052-2/+10
| | | | | | | | | Legalize and emit code for quad-precision floating point operation conversion of single-precision value to quad-precision. Differential Revision: https://reviews.llvm.org/D47569 llvm-svn: 336307
* [Power9] Implement float128 parameter passing and return valuesLei Huang2018-07-052-5/+25
| | | | | | | | | | This patch enable parameter passing and return by value for float128 types. Passing aggregate/union which contain float128 members will be submitted in subsequent patches. Differential Revision: https://reviews.llvm.org/D47552 llvm-svn: 336306
* [Power9]Legalize and emit code for round & convert quad-precision valuesLei Huang2018-07-043-3/+26
| | | | | | | | | Legalize and emit code for round & convert float128 to double precision and single precision. Differential Revision: https://reviews.llvm.org/D46997 llvm-svn: 336299
* [PowerPC] Replace the Post RA List Scheduler with the Machine SchedulerStefan Pintilie2018-07-041-1/+7
| | | | | | | | | | | We want to run the Machine Scheduler instead of the List Scheduler after RA. Checked with a performance run on a Power 9 machine with SPEC 2006 and while some benchmarks improved and others degraded the geomean was slightly improved with the Machine Scheduler. Differential Revision: https://reviews.llvm.org/D45265 llvm-svn: 336295
* [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's ↵QingShan Zhang2018-07-021-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | multiple for i64 pre-inc load/store For the below case, pre-inc prep think it's a good candidate to use pre-inc for the bucket, but 64bit integer load/store update (pre-inc) instruction on Power requires the displacement field should be DS-form (4's multiple). Since it can't satisfy the constraint, we have to do some fix ups later. As below, the original load/stores could be well-form, it makes things worse. unsigned long long result = 0; unsigned long long foo(char *p, unsigned long long n) { for (unsigned long long i = 0; i < n; i++) { unsigned long long x1 = *(unsigned long long *)(p - 50000 + i); unsigned long long x2 = *(unsigned long long *)(p - 61024 + i); unsigned long long x3 = *(unsigned long long *)(p - 62048 + i); unsigned long long x4 = *(unsigned long long *)(p - 64096 + i); result *= x1 * x2 * x3 * x4; } return result; } Patch by jedilyn(Kewen Lin). Differential Revision: https://reviews.llvm.org/D48813 --This line, and those below, will be ignored-- M lib/Target/PowerPC/PPCLoopPreIncPrep.cpp A test/CodeGen/PowerPC/preincprep-i64-check.ll llvm-svn: 336074
* [PowerPC] Fix incorrectly encoded wait instructionLei Huang2018-06-251-1/+1
| | | | | | | | Encoding for the wait instruction was wrong. Fix according to ISA 3.0. Differential Revision: https://reviews.llvm.org/D48550 llvm-svn: 335514
* [PowerPC] Fix label address calculation for ppc32Strahinja Petrovic2018-06-191-3/+4
| | | | | | | | This patch fixes calculating address of label on ppc32 (for -fPIC). Differential Revision: https://reviews.llvm.org/D46582 llvm-svn: 335043
* If the arch is P9, we will select the DFLOADf32/DFLOADf64 pseudo instruction ↵QingShan Zhang2018-06-192-11/+23
| | | | | | | | | | | when we are loading a floating, and expand it post RA basing on the register pressure. However, we miss to do the add-imm peephole for these pseudo instruction. Differential Revision: https://reviews.llvm.org/D47568 Reviewed By: Nemanjai llvm-svn: 335024
* [PowerPC] Add support for high and higha symbol modifiers on tls modifers.Sean Fertile2018-06-151-0/+12
| | | | | | | | | | Enables using the high and high-adjusted symbol modifiers on thread local storage modifers in powerpc assembly. Needed to be able to support 64 bit thread-pointer and dynamic-thread-pointer access sequences. Differential Revision: https://reviews.llvm.org/D47754 llvm-svn: 334856
* [PPC64] Support "symbol@high" and "symbol@higha" symbol modifers.Sean Fertile2018-06-154-0/+34
| | | | | | | | | | Add support for the "@high" and "@higha" symbol modifiers in powerpc64 assembly. The modifiers represent accessing the segment consiting of bits 16-31 of a 64-bit address/offset. Differential Revision: https://reviews.llvm.org/D47729 llvm-svn: 334855
* [PowerPC] fix trivial typos in comment, NFCHiroshi Inoue2018-06-1310-26/+26
| | | | llvm-svn: 334583
* [PowerPC] avoid verification failure due to PowerPC VSX Swap Removal passHiroshi Inoue2018-06-131-0/+6
| | | | | | | This patch fixes a failure in lnt tests with -verify-machineinstrs option. When VSX Swap Removal pass swaps two register operands, it did not maintain kill flags associated with operands. This patch swaps flags as well as register number to avoid inconsistent kill flags information. llvm-svn: 334579
OpenPOWER on IntegriCloud