summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/PowerPC
Commit message (Collapse)AuthorAgeFilesLines
...
* [PPC] Use the correct immediate operands on 64-bit instructionsHal Finkel2014-01-022-12/+12
| | | | | | | | | | | Several of the 64-bit fixed-point instructions with immediate operands were using the 32-bit (i32) operand nodes instead of the corresponding 64-bit (i64) operand definitions (u16imm instead of u16imm64, for example). This error has had no effect so far, but would have caused type-checking violations with an upcoming change. llvm-svn: 198356
* Use r2 when encoding tls on ppc32. Fixes PR18305.Roman Divacky2013-12-221-1/+2
| | | | llvm-svn: 197878
* Add some comments.Roman Divacky2013-12-221-0/+2
| | | | llvm-svn: 197875
* Implement initial-exec TLS for PPC32.Roman Divacky2013-12-205-13/+58
| | | | llvm-svn: 197824
* Long doubles are required to be aligned to 128 bits and svr4 32 bits.Rafael Espindola2013-12-191-4/+0
| | | | | | Clang was already getting this right. llvm-svn: 197694
* Add a disassembler to the PowerPC backendHal Finkel2013-12-1911-3/+361
| | | | | | | | | | | | | | | | | | | | | The tests for the disassembler were adapted from the encoder tests, and for the most part, the output from the disassembler matches that encoder-test inputs. There are some places where more-informative mnemonics could be produced (notably for the branch instructions), and those cases are noted in the tests with FIXMEs. Future work includes: - Generating more-informative mnemonics when possible (this may also be done in the printer). - Remove the dependence on positional "numbered" operand-to-variable mapping (for both encoding and decoding). - Internally using 64-bit instruction variants in 64-bit mode (if this turns out to matter). llvm-svn: 197693
* Fix f64 and f128 for ppc-darwin.Rafael Espindola2013-12-181-1/+3
| | | | | | | | | | This patch adds -f64:32:64 to 32 bit ppc darwin since a f64 inside a structure are only 32 bit aligned. The patch also drop -f128:64:128 from all ppc darwin, since f128 is 128 bit aligned. llvm-svn: 197574
* One ppc32-darwin, a i64 inside a structure can have 32 bit alignment.Rafael Espindola2013-12-181-1/+2
| | | | | | | | Thanks for Iain Sandoe for testing this with the original gcc. Clang was already getting this right. llvm-svn: 197572
* Eliminate PPC instruction decoding ambiguitiesHal Finkel2013-12-172-36/+47
| | | | | | | | | | | | The instruction definitions in the PPC backend have a number of variants defined for the same instruction to represent differences between 64-bit and 32-bit semantics. In order to generate a disassembler for the PPC backend, we need to mark all but one of these as CodeGen only. No functionality change intended; this is prep work for PPC disassembly support. llvm-svn: 197535
* Fix the pointer size for the PS3 datalayout.Rafael Espindola2013-12-171-2/+5
| | | | | | This will be tested from clang. llvm-svn: 197501
* Allow MachineCSE to coalesce trivial subregister copies the same way that it ↵Andrew Trick2013-12-171-2/+8
| | | | | | | | | | | | | | | | | | | | coalesces normal copies. Without this, MachineCSE is powerless to handle redundant operations with truncated source operands. This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled: %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1 %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2 %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def> Test case: cse-add-with-overflow.ll. This exposed an existing bug in PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case: PowerPC/crash.ll. llvm-svn: 197465
* whitespaceAndrew Trick2013-12-171-3/+3
| | | | llvm-svn: 197464
* The preferred alignment defaults to the abi alignment. Omit if it is the same.Rafael Espindola2013-12-161-1/+1
| | | | llvm-svn: 197400
* On DataLayout, omit the default of p:64:64:64.Rafael Espindola2013-12-161-4/+2
| | | | llvm-svn: 197397
* Set has_asmparser in PowerPC/LLVMBuild.txtHal Finkel2013-12-161-0/+1
| | | | | | | PowerPC now has an asm parser (and has for many months now); indicate this in PowerPC/LLVMBuild.txt. llvm-svn: 197393
* [Powerpc darwin] AsmParser Base implementation.Iain Sandoe2013-12-141-13/+134
| | | | | | | | | | | | This is a base implementation of the powerpc-apple-darwin asm parser dialect. * Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions. * Enables parsing of {r,f,v}XX as register identifiers. * Enables parsing of lo16() hi16() and ha16() as modifiers. The changes to the test case are from David Fang (fangism). llvm-svn: 197324
* Assume defaults to produce smaller datalayout strings.Rafael Espindola2013-12-131-12/+2
| | | | llvm-svn: 197249
* test commit.Iain Sandoe2013-12-131-1/+1
| | | | | | Amend a comment. llvm-svn: 197237
* typo in commentGabor Greif2013-12-121-2/+2
| | | | llvm-svn: 197136
* Remove unused multiclass from PPCInstrInfo.tdHal Finkel2013-12-121-14/+0
| | | | llvm-svn: 197100
* Improve instruction scheduling for the PPC POWER7Hal Finkel2013-12-127-3/+337
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards. significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative): speedups: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168% MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598% MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058% MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391% SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104% SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061% SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458% SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234% SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496% regressions: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156% SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733% llvm-svn: 197099
* Fix the PPC subsumes-predicate checkHal Finkel2013-12-111-0/+4
| | | | | | | | | For one predicate to subsume another, they must both check the same condition register. Failure to check this prerequisite was causing miscompiles. Fixes PR18003. llvm-svn: 197089
* Prune redundant dependencies in LLVMBuild.txt.NAKAMURA Takumi2013-12-111-1/+1
| | | | llvm-svn: 196988
* Move PPC's getDataLayoutString out of line and document it better.Rafael Espindola2013-12-112-17/+39
| | | | llvm-svn: 196987
* on darwin<10, fallback to .weak_definition (PPC,X86)David Fang2013-12-103-3/+12
| | | | | | .weak_def_can_be_hidden was not yet supported by the system assembler llvm-svn: 196970
* Add proper dependencies to LLVMBuild.txt in llvm/lib.NAKAMURA Takumi2013-12-101-1/+1
| | | | | | I'll prune redundant deps in LLVMBuild.txt, later. llvm-svn: 196881
* Remove the isImplicitlyPrivate argument of getNameWithPrefix.Rafael Espindola2013-12-051-1/+1
| | | | | | | | | | | | getSymbolWithGlobalValueBase use is to create a name of a new symbol based on the name of an existing GV. Assert that and then remove the last call to pass true to isImplicitlyPrivate. This gives the mangler API a 1:1 mapping from GV to names, which is what we need to drop the mangler dependency on the target (and use an extended datalayout instead). llvm-svn: 196472
* Correct word hyphenationsAlp Toker2013-12-051-1/+1
| | | | | | | This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471
* Remove PPCScoreboardHazardRecognizerHal Finkel2013-12-023-41/+2
| | | | | | | | | | PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer which did only one thing: filtered out nodes in EmitInstruction for which DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo instructions. As far as I can tell, this is no longer true, and so we can use ScoreboardHazardRecognizer directly. llvm-svn: 196171
* Refactor the setting of PrivateGlobalPrefix.Rafael Espindola2013-12-021-1/+0
| | | | | | No functionality change. llvm-svn: 196170
* Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile.Rafael Espindola2013-12-021-3/+3
| | | | | | This allows it to be used in TargetLoweringObjectFileImpl.cpp. llvm-svn: 196117
* Remove dead code.Rafael Espindola2013-12-021-24/+0
| | | | | | | | | MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm. Keeping parts of the old asm printer just to print inline asm to a string that we then parse back looks like a hack. llvm-svn: 196111
* Change the default of AsmWriterClassName and isMCAsmWriter.Rafael Espindola2013-12-021-7/+1
| | | | llvm-svn: 196065
* Refactor for clarity and efficiency.Rafael Espindola2013-12-021-23/+22
| | | | | | | The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so this should be a nop. llvm-svn: 196059
* Add a scheduling model (with itinerary) for the PPC POWER7Hal Finkel2013-11-304-2/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job). One interesting aspect of this change is that I've also enabled to use of AA duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful. Test suite significant performance differences (where negative is a speedup, and positive is a regression) vs. the current situation: MultiSource/Benchmarks/BitBench/drop3/drop3 with AA: N/A without AA: -28.7614% +/- 19.8356% (significantly against AA) MultiSource/Benchmarks/FreeBench/neural/neural with AA: -17.7406% +/- 11.2712% without AA: N/A (significantly in favor of AA) MultiSource/Benchmarks/SciMark2-C/scimark2 with AA: -11.2079% +/- 1.80543% without AA: -11.3263% +/- 2.79651% MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt with AA: -41.8649% +/- 17.0053% without AA: -34.5256% +/- 23.7072% MultiSource/Benchmarks/mafft/pairlocalalign with AA: 25.3016% +/- 17.8614% without AA: 38.6629% +/- 14.9391% (significantly in favor of AA) MultiSource/Benchmarks/sim/sim with AA: N/A without AA: 13.4844% +/- 7.18195% (significantly in favor of AA) SingleSource/Benchmarks/BenchmarkGame/Large/fasta with AA: 15.0664% +/- 6.70216% without AA: 12.7747% +/- 8.43043% SingleSource/Benchmarks/BenchmarkGame/puzzle with AA: 82.2713% +/- 26.3567% without AA: 75.7525% +/- 41.1842% SingleSource/Benchmarks/Misc/flops-2 with AA: -37.1621% +/- 20.7964% without AA: -35.2342% +/- 20.2999% (significantly in favor of AA) These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win. llvm-svn: 195981
* Split some PPC itinerary classesHal Finkel2013-11-3011-31/+154
| | | | | | | | | | | | | In preparation for adding scheduling definitions for the POWER7, split some PPC itinerary classes so that the P7's latencies and hazards can be better described. For the most part, this means differentiating indexed from non-index pre-increment loads and stores. Also, differentiate single from double-precision sqrt. No functionality change intended (except for a more-specific latency for single-precision sqrt on the A2). llvm-svn: 195980
* Adjust PPC A2 input operand latenciesHal Finkel2013-11-291-52/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the PPC A2, instructions are only issued after their input operands are ready. Model this by specifying that input operands are read at dispatch (0 cycles after issue). This changes all input operand latencies from 1 to 0. Significant test-suite performance changes (these are 99.5% confidence intervals on 6 runs for both before and after): speedups: MultiSource/Benchmarks/sim/sim -1.21915% +/- 0.175063% MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -1.23946% +/- 1.05133% SingleSource/Benchmarks/Misc/flops-2 -1.24237% +/- 0.681362% MultiSource/Applications/JM/lencod/lencod -1.33992% +/- 0.757498% MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt -1.51802% +/- 1.21468% MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -2.18818% +/- 1.28605% MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt -2.21977% +/- 1.19499% SingleSource/Benchmarks/BenchmarkGame/spectral-norm -2.29822% +/- 0.671871% MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -2.40975% +/- 0.355931% SingleSource/Benchmarks/Misc/fp-convert -2.41899% +/- 1.04751% MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -2.50349% +/- 0.126765% SingleSource/Benchmarks/Misc/flops-3 -3.00214% +/- 0.700795% MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt -3.56995% +/- 3.2929% MultiSource/Applications/sgefa/sgefa -4.24908% +/- 2.00413% MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk -18.1294% +/- 3.96489% regressions: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 1.03249% +/- 0.178547% MultiSource/Applications/hexxagon/hexxagon 1.16597% +/- 0.285235% MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 1.39576% +/- 1.07855% SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 1.71539% +/- 0.173182% MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1 1.90013% +/- 0.866472% MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 2.39854% +/- 1.05914% MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 2.4402% +/- 0.817904% MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 5.87997% +/- 3.3172% MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 9.02643% +/- 5.79591% MultiSource/Benchmarks/VersaBench/bmm/bmm 10.3517% +/- 1.227% Obviously, there are data points on both sides of this; but I think, overall, this supports making the change. llvm-svn: 195951
* Create a PPC440 SchedMachineModelHal Finkel2013-11-292-6/+20
| | | | | | | Some of the older PPC processor definitions don't have associated SchedMachineModels; correct this for the PPC440. llvm-svn: 195949
* Fixup PPC440 load/store operand latenciesHal Finkel2013-11-291-19/+19
| | | | | | | | The operand latencies for loads and stores in the PPC440 itinerary were wrong (the store operands are all inputs, and the "with update" (pre-increment) instructions need a latency for the additional output). llvm-svn: 195948
* Adjust PPC440 operand latenciesHal Finkel2013-11-291-54/+54
| | | | | | | | | | | | The operand latencies for the PPC440 should be specified relative to dispatch, not relative to the initial fetch-and-decode stages. Because most instructions (ignoring bypass) wait in dispatch until their operands are ready, this is modeled as reading input operands "at dispatch" (0 cycles after issue), and so every input and output operand has 4 cycles subtracted from it. This could alter scheduling slightly, but I don't expect a large effect. llvm-svn: 195947
* Don't model the fetch and decode units for the PPC440Hal Finkel2013-11-291-180/+61
| | | | | | | | | | Modeling the fetch and decode units in the PPC440 itinerary does not add anything to the hazard detection capability (and so modeling them just wastes compile time). No functionality change intended. llvm-svn: 195946
* [CMake] Let add_public_tablegen_target() provide intrinsics_gen, too.NAKAMURA Takumi2013-11-281-2/+0
| | | | | | | | | | I think, in principle, intrinsics_gen may be added explicitly. That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen. Almost all source files depend on both CommonTaleGen and intrinsics_gen. Explicit add_dependencies() have been pruned under lib/Target. llvm-svn: 195929
* [CMake] Let add_public_tablegen_target responsible to provide dependency to ↵NAKAMURA Takumi2013-11-285-9/+1
| | | | | | | | | CommonTableGen. add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS. LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope. llvm-svn: 195927
* [CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() ↵NAKAMURA Takumi2013-11-283-7/+0
| | | | | | sets them. llvm-svn: 195921
* Use the mangler consistently instead of using getGlobalPrefix directly.Rafael Espindola2013-11-282-5/+4
| | | | llvm-svn: 195911
* Don't share functional units among the PPC itinerariesHal Finkel2013-11-289-1261/+1364
| | | | | | | | | | | Instead of sharing functional unit names between the various PPC itineraries, give each core its own unit names prefixed with the core name. This follows the convention used by other backends (such as ARM), and removes a non-obvious ordering dependency between the various PPCSchedule*.td files. No functionality change intended. llvm-svn: 195908
* Add IIC_ prefix to PPC instruction-class namesHal Finkel2013-11-2713-2355/+2366
| | | | | | | | | | | | | This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890
* Don't set GlobalPrefix to the default value.Rafael Espindola2013-11-271-1/+0
| | | | llvm-svn: 195884
* Fix comment in PPCA2ModelHal Finkel2013-11-271-1/+1
| | | | llvm-svn: 195807
* PPC popcnt[dw] do not have record formsHal Finkel2013-11-201-6/+6
| | | | | | | The instruction definitions incorrectly specified that popcntd and popcntw have record forms; they do not. This mistake was causing invalid code generation. llvm-svn: 195272
OpenPOWER on IntegriCloud