summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
* This patch adds the VSX logical instructions introduced in the Power ISA ↵Kit Barton2015-02-182-2/+25
| | | | | | | | | | 2.07. It also removes the added complexity that favors VMX versions of the three instructions. Phabricator review: http://reviews.llvm.org/D7616 Commiting on Nemanja's behalf. llvm-svn: 229694
* Make the PowerPC AsmPrinter independent of global subtargetEric Christopher2015-02-171-15/+24
| | | | | | | | | initialization. Initialize the subtarget once per function and migrate EmitStartOfAsmFile to either use attributes on the TargetMachine or get information from all of the various subtargets. llvm-svn: 229475
* Add a FIXME to move IsLittleEndian to the target machine.Eric Christopher2015-02-171-0/+1
| | | | llvm-svn: 229472
* Move ABI handling and 64-bitness to the PowerPC target machine.Eric Christopher2015-02-175-32/+42
| | | | | | | This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. llvm-svn: 229471
* [PowerPC] Support non-direct-sub/superclass VSX copiesHal Finkel2015-02-161-4/+4
| | | | | | | | | | | | | | | | Our register allocation has become better recently, it seems, and is now starting to generate cross-block copies into inflated register classes. These copies are not transformed into subregister insertions/extractions by the PPCVSXCopy class, and so need to be handled directly by PPCInstrInfo::copyPhysReg. The code to do this was *almost* there, but not quite (it was unnecessarily restricting itself to only the direct sub/super-register-class case (not copying between, for example, something in VRRC and the lower-half of VSRC which are super-registers of F8RC). Triggering this behavior manually is difficult; I'm including two bugpoint-reduced test cases from the test suite. llvm-svn: 229457
* AArch64: Safely handle the incoming sret call argument.Andrew Trick2015-02-161-6/+12
| | | | | | | | | | | | | | | | | | This adds a safe interface to the machine independent InputArg struct for accessing the index of the original (IR-level) argument. When a non-native return type is lowered, we generate the hidden machine-level sret argument on-the-fly. Before this fix, we were representing this argument as OrigArgIndex == 0, which is an outright lie. In particular this crashed in the AArch64 backend where we actually try to access the type of the original argument. Now we use a sentinel value for machine arguments that have no original argument index. AArch64, ARM, Mips, and PPC now check for this case before accessing the original argument. Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering llvm-svn: 229413
* Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for ↵Aaron Ballman2015-02-151-2/+2
| | | | | | requiring the macro. NFC; LLVM edition. llvm-svn: 229340
* Remove a variable only used in an assert and sink its initializer intoChandler Carruth2015-02-141-2/+1
| | | | | | the assert. Fixes -Wunused-variable on non-asserts builds. llvm-svn: 229250
* PowerPC: Canonicalize access to function attributes, NFCDuncan P. N. Exon Smith2015-02-144-19/+9
| | | | | | | | | | | | Canonicalize access to function attributes to use the simpler API. getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind) => getFnAttribute(Kind) getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind) => hasFnAttribute(Kind) llvm-svn: 229224
* The base pointer save offset can be computed at initialization time,Eric Christopher2015-02-132-23/+20
| | | | | | do so and fix up the calls. llvm-svn: 229169
* Move the target machine variable so that it's initialized earlyEric Christopher2015-02-132-4/+3
| | | | | | enough we can use it to initialize frame lowering. llvm-svn: 229168
* Stash the TargetMachine on the subtarget so we can access it later.Eric Christopher2015-02-133-6/+6
| | | | | | Clean up a subtarget function that has it passed in while we're at it. llvm-svn: 229164
* PPC LinkageSize can be computed at initialization time, do so.Eric Christopher2015-02-134-27/+20
| | | | llvm-svn: 229163
* [PM] Remove the old 'PassManager.h' header file at the top level ofChandler Carruth2015-02-131-1/+1
| | | | | | | | | | | | | | | | | | | | LLVM's include tree and the use of using declarations to hide the 'legacy' namespace for the old pass manager. This undoes the primary modules-hostile change I made to keep out-of-tree targets building. I sent an email inquiring about whether this would be reasonable to do at this phase and people seemed fine with it, so making it a reality. This should allow us to start bootstrapping with modules to a certain extent along with making it easier to mix and match headers in general. The updates to any code for users of LLVM are very mechanical. Switch from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h". Qualify the types which now produce compile errors with "legacy::". The most common ones are "PassManager", "PassManagerBase", and "FunctionPassManager". llvm-svn: 229094
* Re-sort #include lines using my handy dandy ./utils/sort_includes.pyChandler Carruth2015-02-131-2/+2
| | | | | | script. This is in preparation for changes to lots of include lines. llvm-svn: 229088
* PPCFrameLowering's FramePointerOffset can be computed at initializationEric Christopher2015-02-133-28/+25
| | | | | | time. Do so. llvm-svn: 228998
* The TOC save offset can be computed at compile time, do so andEric Christopher2015-02-133-7/+10
| | | | | | propagate changes. llvm-svn: 228997
* The return save offset can be computed at initialization time - doEric Christopher2015-02-133-17/+19
| | | | | | so and save the value. llvm-svn: 228996
* Change max interleave factor to 12 for POWER7 and POWER8.Olivier Sallenave2015-02-121-0/+6
| | | | llvm-svn: 228973
* MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line ↵Benjamin Kramer2015-02-121-1/+1
| | | | | | | | with countTrailingZeros Update all callers. llvm-svn: 228930
* [PowerPC] Mark jumps as expensive (using using CR bits)Hal Finkel2015-02-121-1/+3
| | | | | | | | | | | | | | | On PowerPC, which has a full set of logical operations on (its multiple sets of) condition-register bits, it is not profitable to break of complex conditions feeding a jump into multiple jumps. We can turn off this feature of CGP/SDAGBuilder by marking jumps as "expensive". P7 test-suite speedups (no regressions): MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.626647% +/- 0.323583% MultiSource/Benchmarks/Olden/power/power -18.2821% +/- 8.06481% llvm-svn: 228895
* Fix up r228725, missed change in PPCSubtarget definitionBill Schmidt2015-02-101-6/+6
| | | | llvm-svn: 228728
* [PowerPC] Fix reverted patch r227976 to avoid register assignment issuesBill Schmidt2015-02-1011-114/+379
| | | | | | | | | | | | | | | | | | | See full discussion in http://reviews.llvm.org/D7491. We now hide the add-immediate and call instructions together in a separate pseudo-op, which is tagged to define GPR3 and clobber the call-killed registers. The PPCTLSDynamicCall pass prior to RA now expands this op into the two separate addi and call ops, with explicit definitions of GPR3 on both instructions, and explicit clobbers on the call instruction. The pass is now marked as requiring and preserving the LiveIntervals and SlotIndexes analyses, and fixes these up after the replacement sequences are introduced. Self-hosting has been verified on LE P8 and BE P7 with various optimization levels, etc. It has also been verified with the --no-tls-optimize flag workaround removed. llvm-svn: 228725
* [PowerPC] Support the (old) cntlz instruction aliasHal Finkel2015-02-101-0/+3
| | | | | | | Some old assembly code uses the cntlz alias for cntlzw, binutils supports this, and we should too. Fixes PR22519. llvm-svn: 228719
* Migrate PPCAsmPrinter's subtarget from reference to pointer inEric Christopher2015-02-101-48/+49
| | | | | | preparation for making it MachineFunction dependent. llvm-svn: 228638
* This change implements the following three logical vector operations:Kit Barton2015-02-091-0/+25
| | | | | | | | | | | | veqv (vector equivalence) vnand vorc I increased the AddedComplexity for these instructions to 500 to ensure they are generated instead of issuing other VSX instructions. Phabricator review: http://reviews.llvm.org/D7469 llvm-svn: 228580
* [PowerPC] Handle loop predecessor invokesHal Finkel2015-02-071-4/+12
| | | | | | | | | | If a loop predecessor has an invoke as its terminator, and the return value from that invoke is used to determine the loop iteration space, then we can't insert a computation based on that value in the loop predecessor prior to the terminator (oops). If there's such an invoke, or just no predecessor for that matter, insert a new loop preheader. llvm-svn: 228488
* Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and ↵Hal Finkel2015-02-0611-234/+108
| | | | | | | | | | | | | | related fixups Unfortunately, even with the workaround of disabling the linker TLS optimizations in Clang restored (which has already been done), this still breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native). Bill is currently working on an alternate implementation to address the TLS issue in a way that also fully elides the linker bug (which, unfortunately, this approach did not fully), so I'm reverting this now. llvm-svn: 228460
* Make helper functions/classes/globals static. NFC.Benjamin Kramer2015-02-062-6/+10
| | | | llvm-svn: 228410
* Fix an incorrect identifierSylvestre Ledru2015-02-051-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: EIEIO is not a correct declaration and breaks the build under Debian HURD. Instead, E_IEIO is used. // http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html Some additional classes of identifier names are reserved for future extensions to the C language or the POSIX.1 environment. While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of the C or POSIX standards, so you should avoid these names. ... Names beginning with a capital ‘E’ followed a digit or uppercase letter may be used for additional error code names. See Error Reporting.// Reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776965 And patch wrote by Svante Signell With this patch, LLVM, Clang & LLDB build under Debian HURD: https://buildd.debian.org/status/fetch.php?pkg=llvm-toolchain-3.6&arch=hurd-i386&ver=1%3A3.6~%2Brc2-2&stamp=1423040039 Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7437 llvm-svn: 228331
* [PowerPC] Prepare loops for pre-increment loads/storesHal Finkel2015-02-054-0/+383
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PowerPC supports pre-increment load/store instructions (except for Altivec/VSX vector load/stores). Using these on embedded cores can be very important, but most loops are not naturally set up to use them. We can often change that, however, by placing loops into a non-canonical form. Generically, this means transforming loops like this: for (int i = 0; i < n; ++i) array[i] = c; to look like this: T *p = array[-1]; for (int i = 0; i < n; ++i) *++p = c; the key point is that addresses accessed are pulled into dedicated PHIs and "pre-decremented" in the loop preheader. This allows the use of pre-increment load/store instructions without loop peeling. A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is introduced to perform this transformation. I've used this code out-of-tree for generating code for the PPC A2 for over a year. Somewhat to my surprise, running the test suite + externals on a P7 with this transformation enabled showed no performance regressions, and one speedup: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk -2.32514% +/- 1.03736% So I'm going to enable it on everything for now. I was surprised by this because, on the POWER cores, these pre-increment load/store instructions are cracked (and, thus, harder to schedule effectively). But seeing no regressions, and feeling that it is generally easier to split instructions apart late than it is to combine them late, this might be the better approach regardless. In the future, we might want to integrate this functionality into LSR (but currently LSR does not create new PHI nodes, so (for that and other reasons) significant work would need to be done). llvm-svn: 228328
* [PowerPC] Generate pre-increment floating-point ld/st instructionsHal Finkel2015-02-051-0/+4
| | | | | | | | PowerPC supports pre-increment floating-point load/store instructions, both r+r and r+i, and we had patterns for them, but they were not marked as legal. Mark them as legal (and add a test case). llvm-svn: 228327
* [PowerPC] Implement the vclz instructions for PWR8Bill Schmidt2015-02-052-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | Patch by Kit Barton. Add the vector count leading zeros instruction for byte, halfword, word, and doubleword sizes. This is a fairly straightforward addition after the changes made for vpopcnt: 1. Add the correct definitions for the various instructions in PPCInstrAltivec.td 2. Make the CTLZ operation legal on vector types when using P8Altivec in PPCISelLowering.cpp Test Plan Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the instructions are being generated when the CTLZ operation is used in LLVM. Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively. llvm-svn: 228301
* Replace tabs with spaces from r228116. Oops.Bill Schmidt2015-02-041-2/+2
| | | | llvm-svn: 228117
* [PowerPC] Handle 32-bit targets properly in PPCTLSDynamicCall.cppBill Schmidt2015-02-041-1/+3
| | | | llvm-svn: 228116
* [PowerPC] Implement the vpopcnt instructions for POWER8Bill Schmidt2015-02-036-6/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | Patch by Kit Barton. Add the vector population count instructions for byte, halfword, word, and doubleword sizes. There are two major changes here: PPCISelLowering.cpp: Make CTPOP legal for vector types. PPCRegisterInfo.td: Added v2i64 to the VRRC register definition. This is needed for the doubleword variations of the integer ops that were added in P8. Test Plan Test the instruction vpcnt* encoding/decoding in ppc64-encoding-vmx.s Test the generation of the vpopcnt instructions for various vector data types. When adding the v2i64 type to the Vector Register set, I also needed to add the appropriate bit conversion patterns between v2i64 and the existing vector types. Testing for these conversions were also added in the test case by passing a different vector type as a parameter into the test functions. There is also a run step that will ensure the vpopcnt instructions are generated when the vsx feature is disabled. llvm-svn: 228046
* [PowerPC] Yet another approach to __tls_get_addrBill Schmidt2015-02-0311-108/+232
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a third attempt to properly handle the local-dynamic and global-dynamic TLS models. In my original implementation, calls to __tls_get_addr were hidden from view until the asm-printer phase, at which point the underlying branch-and-link instruction was created with proper relocations. This mostly worked well, but I used some repellent techniques to ensure that the TLS_GET_ADDR nodes at the SD and MI levels correctly received input from GPR3 and produced output into GPR3. This proved to work badly in the presence of multiple TLS variable accesses, with the copies to and from GPR3 being scheduled incorrectly and generally creating havoc. In r221703, I addressed that problem by representing the calls to __tls_get_addr as true calls during instruction lowering. This had the advantage of removing all of the bad hacks and relying on the existing call machinery to properly glue the copies in place. It looked like this was going to be the right way to go. However, as a side effect of the recent discovery of problems with linker optimizations for TLS, we discovered cases of suboptimal code generation with this strategy. The problem comes when tls_get_addr is called for the same address, and there is a resulting CSE opportunity. It turns out that in such cases MachineCSE will common the addis/addi instructions that set up the input value to tls_get_addr, but will not common the calls themselves. MachineCSE does not have any machinery to common idempotent calls. This is perfectly sensible, since presumably this would be done at the IR level, and introducing calls in the back end isn't commonplace. In any case, we end up with two calls to __tls_get_addr when one would suffice, and that isn't good. I presumed that the original design would have allowed commoning of the machine-specific nodes that hid the __tls_get_addr calls, so as suggested by Ulrich Weigand, I went back to that design and cleaned it up so that the copies were properly held together by glue nodes. However, it turned out that this didn't work either...the presence of copies to physical registers kept the machine-specific nodes from being commoned also. All of which leads to the design presented here. This is a return to the original design, except that no attempt is made to introduce copies to and from GPR3 during instruction lowering. Virtual registers are used until prior to register allocation. At that point, a special pass is run that identifies the machine-specific nodes that hide the tls_get_addr calls and introduces the copies to and from GPR3 around them. The register allocator then coalesces these copies away. With this design, MachineCSE succeeds in commoning tls_get_addr calls where possible, and we get nice optimal code generation (better than GCC at the moment, which does not common these calls). One additional problem must be dealt with: After introducing the mentions of the physical register GPR3, the aggressive anti-dependence breaker sees opportunities to improve scheduling by selecting a different register instead. Flags must be used on the instruction descriptions to tell the anti-dependence breaker to keep its hands in its pockets. One thing missing from the original design was recording a definition of the link register on the GET_TLS_ADDR nodes. Doing this was found to be insufficient to force a stack frame to be created, which led to looping behavior because two different LR values were stored at the same address. This appears to have been an oversight in PPCFrameLowering::determineFrameLayout(), which is repaired here. Because MustSaveLR() returns true for calls to builtin_return_address, this changed the expected behavior of test/CodeGen/PowerPC/retaddr2.ll, which now stacks a frame but formerly did not. I've fixed the test case to reflect this. There are existing TLS tests to catch regressions; the checks in test/CodeGen/PowerPC/tls-store2.ll proved to be too restrictive in the face of instruction scheduling with these changes, so I fixed that up. I've added a new test case based on the PrettyStackTrace module that demonstrated the original problem. This checks that we get correct code generation and that CSE of the calls to __get_tls_addr has taken place. llvm-svn: 227976
* [PowerPC] Put PPCEarlyReturn into its own source fileHal Finkel2015-02-013-165/+202
| | | | | | | | | | | PPCInstrInfo.cpp has ended up containing several small MI-level passes, and this is making the file harder to read than necessary. Split out PPCEarlyReturn into its own source file. NFC. Now that PPCInstrInfo.cpp does not also contain pass implementations, I hope that it will be slightly less unwieldy. llvm-svn: 227775
* [PowerPC] Remove unnecessary includeHal Finkel2015-02-011-1/+0
| | | | llvm-svn: 227772
* [PowerPC] Put PPCVSXCopy into its own source fileHal Finkel2015-02-013-139/+177
| | | | | | | | PPCInstrInfo.cpp has ended up containing several small MI-level passes, and this is making the file harder to read than necessary. Split out PPCVSXCopy into its own source file. NFC. llvm-svn: 227771
* [PowerPC] Put PPCVSXFMAMutate into its own source fileHal Finkel2015-02-013-291/+337
| | | | | | | | PPCInstrInfo.cpp has ended up containing several small MI-level passes, and this is making the file harder to read than necessary. Split out PPCVSXFMAMutate into its own source file. NFC. llvm-svn: 227770
* [PowerPC] Remove the PPCVSXCopyCleanup passHal Finkel2015-02-013-79/+0
| | | | | | | | | | | This MI-level pass was necessary when VSX support was first being developed, specifically, before the ABI code had been updated to use VSX registers for arguments (the register assignments did not change, in a physical sense, but the VSX super-registers are now used). Unfortunately, I never went back and removed this pass after that was done. I believe this code is now effectively dead. llvm-svn: 227767
* [PowerPC] Add implicit ops to conditional returns in PPCEarlyReturnHal Finkel2015-02-011-8/+13
| | | | | | | | | | | | | When PPCEarlyReturn, it should really copy implicit ops from the old return instruction to the new one. This currently does not matter much, because we run PPCEarlyReturn very late in the pipeline (there is nothing to do DCE on definitions of those registers). However, for completeness, we should do it anyway. Noticed by inspection (and there should be no functional change); thus, no test case. llvm-svn: 227763
* [PowerPC] VSX stores don't also readHal Finkel2015-02-011-4/+6
| | | | | | | | | | The VSX store instructions were also picking up an implicit "may read" from the default pattern, which was an intrinsic (and we don't currently have a way of specifying write-only intrinsics). This was causing MI verification to fail for VSX spill restores. llvm-svn: 227759
* [PowerPC] Better scheduling for isel on P7/P8Hal Finkel2015-02-019-2/+34
| | | | | | | | | | isel is actually a cracked instruction on the P7/P8, and must start a dispatch group. The scheduling model should reflect this so that we don't bunch too many of them together when possible. Thanks to Bill Schmidt and Pat Haugen for helping to sort this out. llvm-svn: 227758
* [PowerPC] Make r2 allocatable on PPC64/ELF for some leaf functionsHal Finkel2015-02-016-10/+96
| | | | | | | | | | | | | | | | | The TOC base pointer is passed in r2, and we normally reserve this register so that we can depend on it being there. However, for leaf functions, and specifically those leaf functions that don't do any TOC access of their own (which is generally due to accessing the constant pool, using TLS, etc.), we can treat r2 as an ordinary callee-saved register (it must be callee-saved because, for local direct calls, the linker will not insert any save/restore code). The allocation order has been changed slightly for PPC64/ELF systems to put r2 at the end of the list (while leaving it near the beginning for Darwin systems to prevent unnecessary output changes). While r2 is allocatable, using it still requires spill/restore traffic, and thus comes at the end of the list. llvm-svn: 227745
* [multiversion] Remove the function parameter from the unrollingChandler Carruth2015-02-012-4/+3
| | | | | | preferences interface on TTI now that all of TTI is per-function. llvm-svn: 227741
* [multiversion] Switch the TTI queries from TargetMachine to SubtargetChandler Carruth2015-02-012-11/+6
| | | | | | | | | | | | | | | | | | | now that we have a correct and cached subtarget specific to the function. Also, finish providing a cached per-function subtarget in the core LLVMTargetMachine -- that layer hadn't switched over yet. The only use of the TargetMachine was to re-lookup a subtarget for a particular function to work around the fact that TTI was immutable. Now that it is per-function and we haved a cached subtarget, use it. This still leaves a few interfaces with real warts on them where we were passing Function objects through the TTI interface. I'll remove these and clean their usage up in subsequent commits now that this isn't necessary. llvm-svn: 227738
* [multiversion] Remove the cached TargetMachine pointer from theChandler Carruth2015-02-011-4/+13
| | | | | | | | | | | | | | | | intermediate TTI implementation template and instead query up to the derived class for both the TargetMachine and the TargetLowering. Most of the derived types had a TLI cached already and there is no need to store a less precisely typed target machine pointer. This will in turn make it much cleaner to look up the TLI via a per-function subtarget instead of the generic subtarget, and it will pave the way toward pulling the subtarget used for unroll preferences into the same form once we are *always* using the function to look up the correct subtarget. llvm-svn: 227737
* [multiversion] Switch all of the targets over to use theChandler Carruth2015-02-013-5/+6
| | | | | | | | | | | | | | | | TargetIRAnalysis access path directly rather than implementing getTTI. This even removes getTTI from the interface. It's more efficient for each target to just register a precise callback that creates their specific TTI. As part of this, all of the targets which are building their subtargets individually per-function now build their TTI instance with the function and thus look up the correct subtarget and cache it. NVPTX, R600, and XCore currently don't leverage this functionality, but its trivial for them to add it now. llvm-svn: 227735
OpenPOWER on IntegriCloud