summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
...
* Refactor for clarity and efficiency.Rafael Espindola2013-12-021-23/+22
| | | | | | | The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so this should be a nop. llvm-svn: 196059
* Add a scheduling model (with itinerary) for the PPC POWER7Hal Finkel2013-11-304-2/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job). One interesting aspect of this change is that I've also enabled to use of AA duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful. Test suite significant performance differences (where negative is a speedup, and positive is a regression) vs. the current situation: MultiSource/Benchmarks/BitBench/drop3/drop3 with AA: N/A without AA: -28.7614% +/- 19.8356% (significantly against AA) MultiSource/Benchmarks/FreeBench/neural/neural with AA: -17.7406% +/- 11.2712% without AA: N/A (significantly in favor of AA) MultiSource/Benchmarks/SciMark2-C/scimark2 with AA: -11.2079% +/- 1.80543% without AA: -11.3263% +/- 2.79651% MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt with AA: -41.8649% +/- 17.0053% without AA: -34.5256% +/- 23.7072% MultiSource/Benchmarks/mafft/pairlocalalign with AA: 25.3016% +/- 17.8614% without AA: 38.6629% +/- 14.9391% (significantly in favor of AA) MultiSource/Benchmarks/sim/sim with AA: N/A without AA: 13.4844% +/- 7.18195% (significantly in favor of AA) SingleSource/Benchmarks/BenchmarkGame/Large/fasta with AA: 15.0664% +/- 6.70216% without AA: 12.7747% +/- 8.43043% SingleSource/Benchmarks/BenchmarkGame/puzzle with AA: 82.2713% +/- 26.3567% without AA: 75.7525% +/- 41.1842% SingleSource/Benchmarks/Misc/flops-2 with AA: -37.1621% +/- 20.7964% without AA: -35.2342% +/- 20.2999% (significantly in favor of AA) These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win. llvm-svn: 195981
* Split some PPC itinerary classesHal Finkel2013-11-3011-31/+154
| | | | | | | | | | | | | In preparation for adding scheduling definitions for the POWER7, split some PPC itinerary classes so that the P7's latencies and hazards can be better described. For the most part, this means differentiating indexed from non-index pre-increment loads and stores. Also, differentiate single from double-precision sqrt. No functionality change intended (except for a more-specific latency for single-precision sqrt on the A2). llvm-svn: 195980
* Adjust PPC A2 input operand latenciesHal Finkel2013-11-291-52/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the PPC A2, instructions are only issued after their input operands are ready. Model this by specifying that input operands are read at dispatch (0 cycles after issue). This changes all input operand latencies from 1 to 0. Significant test-suite performance changes (these are 99.5% confidence intervals on 6 runs for both before and after): speedups: MultiSource/Benchmarks/sim/sim -1.21915% +/- 0.175063% MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -1.23946% +/- 1.05133% SingleSource/Benchmarks/Misc/flops-2 -1.24237% +/- 0.681362% MultiSource/Applications/JM/lencod/lencod -1.33992% +/- 0.757498% MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt -1.51802% +/- 1.21468% MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -2.18818% +/- 1.28605% MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt -2.21977% +/- 1.19499% SingleSource/Benchmarks/BenchmarkGame/spectral-norm -2.29822% +/- 0.671871% MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -2.40975% +/- 0.355931% SingleSource/Benchmarks/Misc/fp-convert -2.41899% +/- 1.04751% MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -2.50349% +/- 0.126765% SingleSource/Benchmarks/Misc/flops-3 -3.00214% +/- 0.700795% MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt -3.56995% +/- 3.2929% MultiSource/Applications/sgefa/sgefa -4.24908% +/- 2.00413% MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk -18.1294% +/- 3.96489% regressions: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 1.03249% +/- 0.178547% MultiSource/Applications/hexxagon/hexxagon 1.16597% +/- 0.285235% MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 1.39576% +/- 1.07855% SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 1.71539% +/- 0.173182% MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1 1.90013% +/- 0.866472% MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 2.39854% +/- 1.05914% MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 2.4402% +/- 0.817904% MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 5.87997% +/- 3.3172% MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 9.02643% +/- 5.79591% MultiSource/Benchmarks/VersaBench/bmm/bmm 10.3517% +/- 1.227% Obviously, there are data points on both sides of this; but I think, overall, this supports making the change. llvm-svn: 195951
* Create a PPC440 SchedMachineModelHal Finkel2013-11-292-6/+20
| | | | | | | Some of the older PPC processor definitions don't have associated SchedMachineModels; correct this for the PPC440. llvm-svn: 195949
* Fixup PPC440 load/store operand latenciesHal Finkel2013-11-291-19/+19
| | | | | | | | The operand latencies for loads and stores in the PPC440 itinerary were wrong (the store operands are all inputs, and the "with update" (pre-increment) instructions need a latency for the additional output). llvm-svn: 195948
* Adjust PPC440 operand latenciesHal Finkel2013-11-291-54/+54
| | | | | | | | | | | | The operand latencies for the PPC440 should be specified relative to dispatch, not relative to the initial fetch-and-decode stages. Because most instructions (ignoring bypass) wait in dispatch until their operands are ready, this is modeled as reading input operands "at dispatch" (0 cycles after issue), and so every input and output operand has 4 cycles subtracted from it. This could alter scheduling slightly, but I don't expect a large effect. llvm-svn: 195947
* Don't model the fetch and decode units for the PPC440Hal Finkel2013-11-291-180/+61
| | | | | | | | | | Modeling the fetch and decode units in the PPC440 itinerary does not add anything to the hazard detection capability (and so modeling them just wastes compile time). No functionality change intended. llvm-svn: 195946
* [CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.NAKAMURA Takumi2013-11-281-2/+0
| | | | | | | | | | I think, in principle, intrinsics_gen may be added explicitly. That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen. Almost all source files depend on both CommonTaleGen and intrinsics_gen. Explicit add_dependencies() have been pruned under lib/Target. llvm-svn: 195929
* [CMake] Let add_public_tablegen_target responsible to provide dependency to ↵NAKAMURA Takumi2013-11-285-9/+1
| | | | | | | | | CommonTableGen. add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS. LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope. llvm-svn: 195927
* [CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() ↵NAKAMURA Takumi2013-11-283-7/+0
| | | | | | sets them. llvm-svn: 195921
* Use the mangler consistently instead of using getGlobalPrefix directly.Rafael Espindola2013-11-282-5/+4
| | | | llvm-svn: 195911
* Don't share functional units among the PPC itinerariesHal Finkel2013-11-289-1261/+1364
| | | | | | | | | | | Instead of sharing functional unit names between the various PPC itineraries, give each core its own unit names prefixed with the core name. This follows the convention used by other backends (such as ARM), and removes a non-obvious ordering dependency between the various PPCSchedule*.td files. No functionality change intended. llvm-svn: 195908
* Add IIC_ prefix to PPC instruction-class namesHal Finkel2013-11-2713-2355/+2366
| | | | | | | | | | | | | This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890
* Don't set GlobalPrefix to the default value.Rafael Espindola2013-11-271-1/+0
| | | | llvm-svn: 195884
* Fix comment in PPCA2ModelHal Finkel2013-11-271-1/+1
| | | | llvm-svn: 195807
* PPC popcnt[dw] do not have record formsHal Finkel2013-11-201-6/+6
| | | | | | | The instruction definitions incorrectly specified that popcntd and popcntw have record forms; they do not. This mistake was causing invalid code generation. llvm-svn: 195272
* PPC: Optimize rldicl generation for masked shiftsHal Finkel2013-11-201-1/+15
| | | | | | | | | | | | | | Masking operations (where only some number of the low bits are being kept) are selected to rldicl(x, 0, mb). If x is a logical right shift (which would become rldicl(y, 64-n, n)), we might be able to fold the two instructions together: rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb The right shift is really a left rotate followed by a mask, and if the explicit mask is a more-restrictive sub-mask of the mask implied by the shift, only one rldicl is needed. llvm-svn: 195185
* [weak vtables] Remove a bunch of weak vtablesJuergen Ributzka2013-11-194-1/+9
| | | | | | | | | | | | This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. The memory leaks in this version have been fixed. Thanks Alexey for pointing them out. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 195064
* Revert r194865 and r194874.Alexey Samsonov2013-11-184-9/+1
| | | | | | | | | | | | This change is incorrect. If you delete virtual destructor of both a base class and a subclass, then the following code: Base *foo = new Child(); delete foo; will not cause the destructor for members of Child class. As a result, I observe plently of memory leaks. Notable examples I investigated are: ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl. llvm-svn: 194997
* [weak vtables] Remove a bunch of weak vtablesJuergen Ributzka2013-11-154-1/+9
| | | | | | | | | | | This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865
* Avoid illegal integer promotion in fastiselBob Wilson2013-11-151-7/+2
| | | | | | | | | | | | | | | | | Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when: sext(a) + sext(b) != sext(a + b) Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code. <rdar://problem/15292280> Patch by Duncan Exon Smith! llvm-svn: 194840
* Add PPC option for full register names in asmHal Finkel2013-11-111-0/+10
| | | | | | | | | | | | | | | | | | | | | On non-Darwin PPC systems, we currently strip off the register name prefix prior to instruction printing. So instead of something like this: mr r3, r4 we print this: mr 3, 4 The first form is the default on Darwin, and is understood by binutils, but not yet understood by our integrated assembler. Once our integrated-as understands full register names as well, this temporary option will be replaced by tying this functionality to the verbose-asm option. The numeric-only form is compatible with legacy assemblers and tools, and is also gcc's default on most PPC systems. On the other hand, it is harder to read, and there are some analysis tools that expect full register names. llvm-svn: 194384
* Use StringRef::startswith_lower. No functionality change.Rui Ueyama2013-10-311-4/+4
| | | | llvm-svn: 193796
* Add a helper getSymbol to AsmPrinter.Rafael Espindola2013-10-292-21/+21
| | | | llvm-svn: 193627
* Add support for the VSX target attribute. No functional changeEric Christopher2013-10-162-0/+3
| | | | | | as we don't actually use it to emit any code yet. llvm-svn: 192837
* Add a MCAsmInfoELF class and factor some code into it.Rafael Espindola2013-10-162-3/+3
| | | | | | We had a MCAsmInfoCOFF, but no common class for all the ELF MCAsmInfos before. llvm-svn: 192760
* Add a MCTargetStreamer interface.Rafael Espindola2013-10-083-2/+72
| | | | | | | | | | | | | This patch fixes an old FIXME by creating a MCTargetStreamer interface and moving the target specific functions for ARM, Mips and PPC to it. The ARM streamer is still declared in a common place because it is used from lib/CodeGen/ARMException.cpp, but the Mips and PPC are completely hidden in the corresponding Target directories. I will send an email to llvmdev with instructions on how to use this. llvm-svn: 192181
* Remove getEHExceptionRegister and getEHHandlerRegister.Rafael Espindola2013-10-072-12/+0
| | | | | | They haven't been used for a long time. Patch by MathOnNapkins. llvm-svn: 192099
* [PowerPC] Fix PR17354: Generate nop after local calls for PIC code.Bill Schmidt2013-09-261-1/+3
| | | | | | | | When generating code for shared libraries, even local calls may be intercepted, so we need a nop after the call for the linker to fix up the TOC. Test case adapted from the one provided in PR17354. llvm-svn: 191440
* PPC: Allow partial fills in writeNopData()David Majnemer2013-09-261-4/+7
| | | | | | | | | | | | | | | | | | When asked to pad an irregular number of bytes, we should fill with zeros. This is consistent with the behavior specified in the AIX Assembler Language Reference as well as other LLVM and binutils assemblers. N.B. There is a small deviation from binutils' PPC assembler: when handling pads which are greater than 4 bytes but not mod 4, binutils will not emit any NOP sequences at all and only use zeros. This may or may not be a bug but there is no excellent rationale as to why that behavior is important to emulate. If that behavior is needed, we can change writeNopData() to behave in the same way. This fixes PR17352. llvm-svn: 191426
* PPC: Do not introduce ISD nodes for fctid and fctiwDavid Majnemer2013-09-263-8/+6
| | | | llvm-svn: 191421
* PPC: Add support for fctid and fctiwDavid Majnemer2013-09-263-4/+12
| | | | | | | | | Encodings were checked against the Power ISA documents and double checked against binutils. This fixes PR17350. llvm-svn: 191419
* MC: Add support for treating $ as a reference to the PCDavid Majnemer2013-09-251-0/+2
| | | | | | | | | | | | | | | | | The binutils assembler supports a mode called DOLLAR_DOT which treats the dollar sign token as a reference to the current program counter if the dollar sign doesn't precede a constant or identifier. This commit adds a new MCAsmInfo flag stating whether or not a given target supports this interpretation of the dollar sign token; by default, this flag is not enabled. Further, enable this flag for PPC. The system assembler for AIX and binutils both support using the dollar sign in this manner. This fixes PR17353. llvm-svn: 191368
* MC: Remove vestigial PCSymbol field from AsmInfoDavid Majnemer2013-09-251-3/+0
| | | | llvm-svn: 191362
* ISelDAG: spot chain cycles involving MachineNodesTim Northover2013-09-221-1/+3
| | | | | | | | | | | | | | | | | Previously, the DAGISel function WalkChainUsers was spotting that it had entered already-selected territory by whether a node was a MachineNode (amongst other things). Since it's fairly common practice to insert MachineNodes during ISelLowering, this was not the correct check. Looking around, it seems that other nodes get their NodeId set to -1 upon selection, so this makes sure the same thing happens to all MachineNodes and uses that characteristic to determine whether we should stop looking for a loop during selection. This should fix PR15840. llvm-svn: 191165
* Correct the pre-increment load latencies in the PPC A2 itineraryHal Finkel2013-09-221-3/+3
| | | | | | | | Pre-increment loads are microcoded on the A2, and the address increment occurs only after the load completes. As a result, the latency of the GPR address update is an additional 2 cycles on top of the load latency. llvm-svn: 191156
* [PowerPC] Add a FIXME.Bill Schmidt2013-09-171-0/+4
| | | | | | | | Documenting a design choice to generate only medium model sequences for TLS addresses at this time. Small and large code models could be supported if necessary. llvm-svn: 190883
* [PowerPC] Fix problems with large code model (PR17169).Bill Schmidt2013-09-172-8/+22
| | | | | | | | | | | | | | Large code model on PPC64 requires creating and referencing TOC entries when using the addis/ld form of addressing. This was not being done in all cases. The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this. Two test cases are also modified to reflect this requirement. Fast-isel was not creating correct code for loading floating-point constants using large code model. This also requires the addis/ld form of addressing. Previously we were using the addis/lfd shortcut which is only applicable to medium code model. One test case is modified to reflect this requirement. llvm-svn: 190882
* [PowerPC] Fix PR17155 - Ignore COPY_TO_REGCLASS during emit.Bill Schmidt2013-09-161-1/+8
| | | | | | | | | Fast-isel generates a COPY_TO_REGCLASS for widening f32 to f64, which is a nop on PPC64. This is needed to keep the register class system happy, but on the fast-isel path it is not removed before emit as it is for DAG select. Ignore this op when emitting instructions. llvm-svn: 190795
* PPC: Don't restrict lvsl generation to after type legalizationHal Finkel2013-09-151-1/+2
| | | | | | | | | | | | | | | | | | | This is a re-commit of r190764, with an extra check to make sure that we're not performing the transformation on illegal types (a small test case has been added for this as well). Original commit message: The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190771
* Revert r190764: PPC: Don't restrict lvsl generation to after type legalizationHal Finkel2013-09-151-0/+1
| | | | | | | | | | | | | | | | | This is causing test-suite failures. Original commit message: The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190765
* PPC: Don't restrict lvsl generation to after type legalizationHal Finkel2013-09-151-1/+0
| | | | | | | | | | | | | The PPC backend uses a target-specific DAG combine to turn unaligned Altivec loads into a permutation-based sequence when possible. Unfortunately, the target-specific DAG combine is not always called on all loads of interest (sometimes the routines in DAGCombine call CombineTo such that the new node and users are not added to the worklist); allowing the combine to trigger early (before type legalization) mitigates this problem. Because the autovectorizers only create legal vector types, I don't expect a lot of cases where this optimization is enabled by type legalization in practice. llvm-svn: 190764
* Add missing break statement in PPCISelLoweringHal Finkel2013-09-131-0/+2
| | | | | | As it turns out, not a problem in practice, but it should be there. llvm-svn: 190720
* Remove an unused variable, fixing -Werror build with latest Clang.Chandler Carruth2013-09-121-1/+0
| | | | llvm-svn: 190640
* Fix PPC ABI for ByVal structs with vector membersHal Finkel2013-09-121-9/+49
| | | | | | | | | | When a structure is passed by value, and that structure contains a vector member, according to the PPC ABI, the structure will receive enhanced alignment (so that the vector within the structure will always be aligned). This should resolve PR16641. llvm-svn: 190636
* Make the PPC fast-math sqrt expansion safe at 0Hal Finkel2013-09-121-1/+21
| | | | | | | | | | | In fast-math mode sqrt(x) is calculated using the fast expansion of the reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal sqrt expansions use the associated estimate instructions along with some Newton iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN, which is not correct. Now we explicitly return a result of zero if the input is zero. llvm-svn: 190624
* Implement asm support for a few PowerPC bookIII that are needed for assemblingRoman Divacky2013-09-124-0/+112
| | | | | | FreeBSD kernel. llvm-svn: 190618
* Mark PPC MFTB and DST (and friends) as deprecatedHal Finkel2013-09-126-25/+56
| | | | | | | | Use the new instruction deprecation feature to mark mftb (now replaced with mfspr) and dst (along with the other Altivec cache control instructions) as deprecated when targeting cores supporting at least ISA v2.03. llvm-svn: 190605
* Add an instruction deprecation feature to TableGen.Joey Gouly2013-09-121-2/+3
| | | | | | | | | | | | | | | | | | | | | | The 'Deprecated' class allows you to specify a SubtargetFeature that the instruction is deprecated on. The 'ComplexDeprecationPredicate' class allows you to define a custom predicate that is called to check for deprecation. For example: ComplexDeprecationPredicate<"MCR"> would mean you would have to define the following function: bool getMCRDeprecationInfo(MCInst &MI, MCSubtargetInfo &STI, std::string &Info) Which returns 'false' for not deprecated, and 'true' for deprecated and store the warning message in 'Info'. The MCTargetAsmParser constructor was chaned to take an extra argument of the MCInstrInfo class, so out-of-tree targets will need to be changed. llvm-svn: 190598
OpenPOWER on IntegriCloud