summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
* Move lib/Analysis/DebugInfo.cpp to lib/VMCore/DebugInfo.cpp andBill Wendling2012-06-281-1/+1
| | | | | | | | | include/llvm/Analysis/DebugInfo.h to include/llvm/DebugInfo.h. The reasoning is because the DebugInfo module is simply an interface to the debug info MDNodes and has nothing to do with analysis. llvm-svn: 159312
* There are a number of generic inline asm operand modifiers thatJack Carter2012-06-261-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | up to r158925 were handled as processor specific. Making them generic and putting tests for these modifiers in the CodeGen/Generic directory caused a number of targets to fail. This commit addresses that problem by having the targets call the generic routine for generic modifiers that they don't currently have explicit code for. For now only generic print operands 'c' and 'n' are supported.vi Affected files: test/CodeGen/Generic/asm-large-immediate.ll lib/Target/PowerPC/PPCAsmPrinter.cpp lib/Target/NVPTX/NVPTXAsmPrinter.cpp lib/Target/ARM/ARMAsmPrinter.cpp lib/Target/XCore/XCoreAsmPrinter.cpp lib/Target/X86/X86AsmPrinter.cpp lib/Target/Hexagon/HexagonAsmPrinter.cpp lib/Target/CellSPU/SPUAsmPrinter.cpp lib/Target/Sparc/SparcAsmPrinter.cpp lib/Target/MBlaze/MBlazeAsmPrinter.cpp lib/Target/Mips/MipsAsmPrinter.cpp MSP430 isn't represented because it did not even run with the long existing 'c' modifier and it was not apparent what needs to be done to get it inline asm ready. Contributer: Jack Carter llvm-svn: 159203
* llvm/lib: [CMake] Add explicit dependency to intrinsics_gen.NAKAMURA Takumi2012-06-241-0/+2
| | | | llvm-svn: 159112
* Silence an unused variable warning on release builds.Craig Topper2012-06-231-2/+2
| | | | llvm-svn: 159074
* Add support for the PPC isel instruction.Hal Finkel2012-06-228-14/+84
| | | | | | | The isel (integer select) instruction is supported on the 440 and A2 embedded cores and on the POWER7. llvm-svn: 159045
* Convert the PPC backend to use the new FMA infrastructure.Hal Finkel2012-06-224-42/+48
| | | | | | | The existing contraction patterns are replaced with fma/fneg. Overall functionality should be the same. llvm-svn: 158955
* Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc ↵Hal Finkel2012-06-211-1/+6
| | | | | | | | stores on PPC. Thanks to Tobias von Koch for pointing out this problem. llvm-svn: 158932
* Add support for generating reg+reg (indexed) pre-inc loads on PPC.Hal Finkel2012-06-204-10/+106
| | | | llvm-svn: 158823
* Add DAG-combines for aggressive FMA formation.Lang Hames2012-06-191-1/+1
| | | | | | | | | | | | | | | | | | | | This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or FSUB + FMUL. The combines are performed when: (a) Either AllowExcessFPPrecision option (-enable-excess-fp-precision for llc) OR UnsafeFPMath option (-enable-unsafe-fp-math) are set, and (b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of the FADD/FSUB, and (c) The FMUL only has one user (the FADD/FSUB). If your target has fast FMA instructions you can make use of these combines by overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for types supported by your FMA instruction, and adding patterns to match ISD::FMA to your FMA instructions. llvm-svn: 158757
* Implement PPCInstrInfo::isCoalescableExtInstr().Jakob Stoklund Olesen2012-06-192-0/+19
| | | | | | | | | | | | | The PPC::EXTSW instruction preserves the low 32 bits of its input, just like some of the x86 instructions. Use it to reduce register pressure when the low 32 bits have multiple uses. This requires a small change to PeepholeOptimizer since EXTSW takes a 64-bit input register. This is related to PR5997. llvm-svn: 158743
* Mark most PPC register classes to avoid write-after-write.Hal Finkel2012-06-192-0/+16
| | | | | | | | | | | | | | | | | | | | | | | For processors with the G5-like instruction-grouping scheme, this helps avoid early group termination due to a write-after-write dependency within the group. It should also help on pipelined embedded cores. On POWER7, over the test suite, this gives an average 0.5% speedup. The largest speedups are: SingleSource/Benchmarks/Stanford/Quicksort - 33% MultiSource/Applications/d/make_dparser - 21% MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12% MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12% Largest slowdowns: SingleSource/Benchmarks/Stanford/Bubblesort - 23% MultiSource/Benchmarks/Prolangs-C++/city/city - 21% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16% MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13% llvm-svn: 158719
* Add support for generating reg+reg preinc stores on PPC.Hal Finkel2012-06-196-25/+114
| | | | | | PPC will now generate STWUX and friends. llvm-svn: 158698
* Cleanup trip-count finding for PPC CTR loops (and some bug fixes).Hal Finkel2012-06-161-86/+127
| | | | | | | | | | | | | | This cleans up the method used to find trip counts in order to form CTR loops on PPC. This refactoring allows the pass to find loops which have a constant trip count but also happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different classes of loops that are currently ignored. In addition, we now search through all potential induction operations instead of just the first. Also, we check the predicate code on the conditional branch and abort the transformation if the code is not EQ or NE, and we then make sure that the branch to be transformed matches the condition register defined by the comparison (multiple possible comparisons will be considered). llvm-svn: 158607
* Add another missing 64-bit itinerary definition for the PPC A2 core.Hal Finkel2012-06-131-0/+11
| | | | llvm-svn: 158393
* Add some missing 64-bit itinerary definitions for the PPC A2 core.Hal Finkel2012-06-121-0/+22
| | | | llvm-svn: 158373
* Split out the PPC instruction class IntSimple from IntGeneral.Hal Finkel2012-06-129-65/+90
| | | | | | | On the POWER7, adds and logical operations can also be handled in the load/store pipelines. We'll call these IntSimple. llvm-svn: 158366
* Fixes for PPC host detection and features.Hal Finkel2012-06-121-3/+3
| | | | | | | | | POWER4 is a 64-bit CPU (better matched to the 970). The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750). Patch by Andreas Tobler. llvm-svn: 158363
* Reapply r158337, this time properly protect Darwin/PPC host CPU use with ↵Hal Finkel2012-06-121-135/+4
| | | | | | | | | | | | | __ppc__. Original commit message: Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName(). Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158349
* Revert r158337 "Move PPC host-CPU detection logic from PPCSubtarget into ↵Jakob Stoklund Olesen2012-06-121-2/+133
| | | | | | | | | sys::getHostCPUName()." This commit broke most of the PowerPC unit tests when running on Intel/Apple. llvm-svn: 158345
* Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().Hal Finkel2012-06-111-133/+2
| | | | | | | | Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158337
* Enable MFOCRF generation on the PPC A2 core.Hal Finkel2012-06-111-2/+2
| | | | llvm-svn: 158324
* Rename the PPC target feature gpul to mfocrf.Hal Finkel2012-06-115-13/+13
| | | | | | | | | | | The PPC target feature gpul (IsGigaProcessor) was only used for one thing: To enable the generation of the MFOCRF instruction. Furthermore, this instruction is available on other PPC cores outside of the G5 line. This feature now corresponds to the HasMFOCRF flag. No functionality change. llvm-svn: 158323
* Add A2 to the list of PPC CPUs recognized by Linux host CPU-type detection.Hal Finkel2012-06-111-0/+1
| | | | llvm-svn: 158322
* Emit the two-operand form of the PPC mfcr instruction as mfocrf.Hal Finkel2012-06-111-1/+1
| | | | | | This is necessary on Linux and supported on Darwin, see PR2604. llvm-svn: 158315
* Add local CPU detection for Linux PPC.Hal Finkel2012-06-111-1/+95
| | | | | | This functionality mirrors that available on PPC/Darwin. llvm-svn: 158314
* Add POWER6 and POWER7 CPU types to the PPC backend.Hal Finkel2012-06-113-0/+14
| | | | | | No functional change; these will be used by upcoming scheduler enhancements. llvm-svn: 158313
* Enable ILP scheduling for all nodes by default on PPC.Hal Finkel2012-06-101-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294). Largest speedups: SingleSource/Benchmarks/Stanford/Quicksort - 28% SingleSource/Benchmarks/Stanford/Towers - 24% SingleSource/Benchmarks/Shootout-C++/matrix - 23% MultiSource/Benchmarks/SciMark2-C/scimark2 - 19% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15% (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change) Largest slowdowns: MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26% MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21% SingleSource/Benchmarks/CoyoteBench/lpbench - 20% MultiSource/Applications/d/make_dparser - 16% llvm-svn: 158296
* Use critical anti-dep. breaking on all PPC targets, but also add other ↵Hal Finkel2012-06-101-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | register classes. Using 'all' instead of 'critical' would be better because it would make it easier to satisfy the bundling constraints, but, as noted in the FIXME, that is currently not possible with the crs. This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups: SingleSource/Benchmarks/Shootout-C++/moments - 40% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26% SingleSource/Benchmarks/McGill/misr - 23% MultiSource/Applications/JM/ldecod/ldecod - 22% Largest slowdowns: SingleSource/Benchmarks/Shootout-C++/matrix - -29% SingleSource/Benchmarks/Shootout-C++/ary3 - -22% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18% SingleSource/Benchmarks/Shootout-C++/ary - -17% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15% llvm-svn: 158294
* Improve ext/trunc patterns on PPC64.Hal Finkel2012-06-091-11/+4
| | | | | | | | | | The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code. Thanks to Jakob and Owen for their suggestions in this matter. llvm-svn: 158283
* Enable tail merging on PPC.Hal Finkel2012-06-091-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive). Largest test-suite speedups: MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23% SingleSource/Benchmarks/Shootout-C++/ary - 21% SingleSource/Benchmarks/Stanford/Queens - 17% Largest slowdowns: MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22% MultiSource/Applications/JM/ldecod/ldecod - 14% MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9% This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default. llvm-svn: 158259
* Remove the TODO statement in the PPC README re: CTR loopsHal Finkel2012-06-081-1/+0
| | | | | | | As Chris points out, this can now be removed! TODO: check if the associated section on viterbi's inner loop can also be removed. llvm-svn: 158224
* Enable PPC CTR loop formation by default.Hal Finkel2012-06-082-11/+9
| | | | | | | | | | | | | | | | | | | | | | Thanks to Jakob's help, this now causes no new test suite failures! Over the entire test suite, this gives an average 1% speedup. The largest speedups are: SingleSource/Benchmarks/Misc/pi - 108% SingleSource/Benchmarks/CoyoteBench/lpbench - 54% MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50% SingleSource/Benchmarks/Shootout/ary3 - 32% SingleSource/Benchmarks/Shootout-C++/matrix - 30% The largest slowdowns are: MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30% MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22% MultiSource/Applications/d/make_dparser - -14% SingleSource/Benchmarks/Shootout-C++/ary - -13% In light of these slowdowns, additional profiling work is obviously needed! llvm-svn: 158223
* Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable.Hal Finkel2012-06-081-2/+10
| | | | | | | | | | | Marking these classes as non-alocatable allows CTR loop generation to work correctly with the block placement passes, etc. These register classes are currently used only by some unused TCRETURN patterns. In future cleanup, these will be removed. Thanks again to Jakob for suggesting this fix to the CTR loop problem! llvm-svn: 158221
* Disable the PPC CTR-Loops pass by default.Hal Finkel2012-06-082-4/+17
| | | | | | | | | | The pass itself works well, but the something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc. this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure. llvm-svn: 158206
* Fix a bug in the new PPC CTR-Loops pass.Hal Finkel2012-06-081-0/+1
| | | | | | | | | The code which tests for an induction operation cannot assume that any ADDI instruction will have a register operand because the operand could also be a frame index; for example: %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16 llvm-svn: 158205
* Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form ↵Hal Finkel2012-06-089-18/+812
| | | | | | | | | | CTR-based loop branching code. This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used. llvm-svn: 158204
* PPC32 uses R2 as the TLS register. Fix the copy and paste.Roman Divacky2012-06-051-3/+3
| | | | llvm-svn: 158004
* Implement local-exec TLS on PowerPC.Roman Divacky2012-06-046-11/+69
| | | | llvm-svn: 157935
* Fix a copy-and-paste duplication error in the PPC 440 and A2 schedules (no ↵Hal Finkel2012-06-042-42/+0
| | | | | | functionality change). llvm-svn: 157912
* Enable generating PPC pre-increment (r+imm) instructions by default.Hal Finkel2012-06-041-5/+3
| | | | | | | It seems that this no longer causes test suite failures on PPC64 (after r157159), and often gives a performance benefit, so it can be enabled by default. llvm-svn: 157911
* Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCallJustin Holewinski2012-05-252-17/+20
| | | | | | | | | | to pass around a struct instead of a large set of individual values. This cleans up the interface and allows more information to be added to the struct for future targets without requiring changes to each and every target. NV_CONTRIB llvm-svn: 157479
* Add a missing PPC 64-bit stwu pattern.Hal Finkel2012-05-201-0/+8
| | | | | | | This seems to fix the remaining compile-time failures on PPC64 when compiling with -enable-ppc-preinc. llvm-svn: 157159
* Add a FIXME about access to negative stack-pointer offsets on PPC32.Hal Finkel2012-05-191-0/+2
| | | | | | | | | | | | | | The current code will generate a prologue which starts with something like: mflr 0 stw 31, -4(1) stw 0, 4(1) stwu 1, -16(1) But under the PPC32 SVR4 ABI, access to negative offsets from R1 is not allowed. This was pointed out by Peter Bergner. llvm-svn: 157133
* Allow MCCodeEmitter access to the target MCRegisterInfo.Jim Grosbach2012-05-152-0/+3
| | | | | | | | Add the MCRegisterInfo to the factories and constructors. Patch by Tom Stellard <Tom.Stellard@amd.com>. llvm-svn: 156828
* Mark .opd @progbits, thus avoiding a warning from asm.Roman Divacky2012-05-091-1/+1
| | | | llvm-svn: 156494
* Add an MF argument to TRI::getPointerRegClass() and TII::getRegClass().Jakob Stoklund Olesen2012-05-072-2/+4
| | | | | | | | | | | | | The getPointerRegClass() hook can return register classes that depend on the calling convention of the current function (ptr_rc_tailcall). So far, we have been able to infer the calling convention from the subtarget alone, but as we add support for multiple calling conventions per target, that no longer works. Patch by Yiannis Tsiouris! llvm-svn: 156328
* Remove the SubRegClasses field from RegisterClass descriptions.Jakob Stoklund Olesen2012-05-041-3/+1
| | | | | | This information in now computed by TableGen. llvm-svn: 156152
* Change the PassManager from a reference to a pointer.Bill Wendling2012-05-011-2/+2
| | | | | | | | | The TargetPassManager's default constructor wants to initialize the PassManager to 'null'. But it's illegal to bind a null reference to a null l-value. Make the ivar a pointer instead. PR12468 llvm-svn: 155902
* This patch fixes a problem which arose when using the Post-RA schedulerPreston Gurd2012-04-232-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | on X86 Atom. Some of our tests failed because the tail merging part of the BranchFolding pass was creating new basic blocks which did not contain live-in information. When the anti-dependency code in the Post-RA scheduler ran, it would sometimes rename the register containing the function return value because the fact that the return value was live-in to the subsequent block had been lost. To fix this, it is necessary to run the RegisterScavenging code in the BranchFolding pass. This patch makes sure that the register scavenging code is invoked in the X86 subtarget only when post-RA scheduling is being done. Post RA scheduling in the X86 subtarget is only done for Atom. This patch adds a new function to the TargetRegisterClass to control whether or not live-ins should be preserved during branch folding. This is necessary in order for the anti-dependency optimizations done during the PostRASchedulerList pass to work properly when doing Post-RA scheduling for the X86 in general and for the Intel Atom in particular. The patch adds and invokes the new function trackLivenessAfterRegAlloc() instead of using the existing requiresRegisterScavenging(). It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of requiresRegisterScavenging(). It changes the all the targets that implemented requiresRegisterScavenging() to also implement trackLivenessAfterRegAlloc(). It adds an assertion in the Post RA scheduler to make sure that post RA liveness information is available when it is needed. It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order to avoid running into the added assertion. Finally, this patch restores the use of anti-dependency checking (which was turned off temporarily for the 3.1 release) for Intel Atom in the Post RA scheduler. Patch by Andy Zhang! Thanks to Jakob and Anton for their reviews. llvm-svn: 155395
* effectively back out my last change (r155190)Gabor Greif2012-04-201-2/+2
| | | | llvm-svn: 155195
OpenPOWER on IntegriCloud