path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* In MipsRegisterInfo::eliminateFrameIndex, call Mips::loadImmediate
  to load an immediate that does not fit into 16-bit.
  llvm-svn: 158431
  Author: Akira Hatanaka | Age: 2012-06-14 | Files: 1 | Lines: -25/+7
* In MipsFrameLowering::emitPrologue and emitEpilogue, call Mips::loadImmediate
  to load an immediate that does not fit into 16-bit. Also, take into consideration the global base register slot on the stack when computing the stack size.
  llvm-svn: 158430
  Author: Akira Hatanaka | Age: 2012-06-14 | Files: 1 | Lines: -39/+16
* Define function MipsInstrInfo::GetInstSizeInBytes, which will be called to
  compute the size of basic blocks in a function. Also, define a function which emits a series of instructions to load an immediate.
  llvm-svn: 158429
  Author: Akira Hatanaka | Age: 2012-06-14 | Files: 2 | Lines: -6/+76
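The entries above refer to loading an immediate that does not fit in 16 bits. As a rough illustration only, and not the code of Mips::loadImmediate, a 32-bit value is commonly split into an upper half (for an instruction such as lui) and a lower half (for an instruction such as ori). All names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; not taken from the LLVM sources.
struct ImmParts {
  uint16_t Hi; // upper 16 bits, as an lui-style instruction would load them
  uint16_t Lo; // lower 16 bits, as an ori-style instruction would OR them in
};

// True if Imm can be encoded directly as a signed 16-bit immediate.
inline bool fitsInSignedImm16(int32_t Imm) {
  return Imm >= -32768 && Imm <= 32767;
}

// Split a 32-bit immediate into its two 16-bit halves.
inline ImmParts splitForLuiOri(int32_t Imm) {
  uint32_t U = static_cast<uint32_t>(Imm);
  return ImmParts{static_cast<uint16_t>(U >> 16),
                  static_cast<uint16_t>(U & 0xFFFFu)};
}

// Recombine the halves the way "lui; ori" would: (Hi << 16) | Lo.
inline uint32_t rebuildBits(ImmParts P) {
  return (static_cast<uint32_t>(P.Hi) << 16) | P.Lo;
}

// Round-trip check: the two halves reproduce the original bit pattern.
inline void checkRoundTrip(int32_t Imm) {
  assert(rebuildBits(splitForLuiOri(Imm)) == static_cast<uint32_t>(Imm));
}
```

A caller would first test fitsInSignedImm16 and only fall back to the two-instruction sequence when the test fails.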
* In MipsISelDAGToDAG.cpp, store the global base register to a stack frame object.
  Long-branches need access to the global base register to get the destination address.
  llvm-svn: 158428
  Author: Akira Hatanaka | Age: 2012-06-14 | Files: 1 | Lines: -3/+10
* Add methods to MipsFunctionInfo for initializing and accessing the stack frame
  object for the global base register. This is the first of a series of patches which implements long branch expansion for MIPS.
  llvm-svn: 158427
  Author: Akira Hatanaka | Age: 2012-06-14 | Files: 1 | Lines: -1/+23
* Bundle jump/branch instructions with the instructions in the delay slot in
  delay slot filler pass of MIPS, per suggestion of Jakob Stoklund Olesen. This change, along with the fix in r158154, enables machine verification to be run after delay slot filling.
  llvm-svn: 158426
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 2 | Lines: -19/+30
* Implement a DAGCombine in MipsISelLowering.cpp which transforms the following
  pattern:
    (add v0, (add v1, abs_lo(tjt))) => (add (add v0, v1), abs_lo(tjt))
  "tjt" is a TargetJumpTable node.
  llvm-svn: 158419
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 2 | Lines: -4/+34
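The rewrite in the entry above is a re-association of nested adds. A toy sketch on a hand-rolled expression tree, assuming nothing about the real SelectionDAG API (the Node type and every function name here are made up for illustration), might look like this:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy expression node; stands in for a DAG node for illustration only.
struct Node {
  std::string Op;              // "add", "abs_lo", or a leaf name such as "v0"
  std::shared_ptr<Node> L, R;  // operands (null for leaves)
};
using NodeP = std::shared_ptr<Node>;

inline NodeP leaf(std::string Name) {
  return std::make_shared<Node>(Node{std::move(Name), nullptr, nullptr});
}
inline NodeP add(NodeP A, NodeP B) {
  return std::make_shared<Node>(Node{"add", std::move(A), std::move(B)});
}
inline NodeP absLo(NodeP T) {
  return std::make_shared<Node>(Node{"abs_lo", std::move(T), nullptr});
}

// (add v0, (add v1, abs_lo(x))) => (add (add v0, v1), abs_lo(x)).
// Any other shape is returned unchanged.
inline NodeP reassociate(const NodeP &N) {
  if (N->Op != "add" || !N->R || N->R->Op != "add")
    return N;
  const NodeP &Inner = N->R;
  if (!Inner->R || Inner->R->Op != "abs_lo")
    return N;
  return add(add(N->L, Inner->L), Inner->R);
}

// Print in the notation used by the commit message.
inline std::string print(const NodeP &N) {
  if (!N->L)
    return N->Op;                                   // leaf
  if (!N->R)
    return N->Op + "(" + print(N->L) + ")";         // unary wrapper
  return "(" + N->Op + " " + print(N->L) + ", " + print(N->R) + ")";
}

// Small self-check on a shape that must be left alone.
inline void checkUnchanged() {
  NodeP Plain = add(leaf("a"), leaf("b"));
  assert(reassociate(Plain) == Plain);
}
```

The sketch only shows the shape of the transformation; the actual combine operates on SelectionDAG nodes and has its own legality checks.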
* Set a higher value for maxStoresPerMemcpy in MipsISelLowering.cpp.
  llvm-svn: 158414
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 2 | Lines: -0/+17
* Simplify CreateLoadLR and CreateStoreLR in MipsISelLowering.cpp.
  llvm-svn: 158413
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 1 | Lines: -11/+6
* Implement fastcc calling convention for MIPS.
  llvm-svn: 158410
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 2 | Lines: -3/+59
* Fix pattern for MKMSK instruction.
  llvm-svn: 158409
  Author: Richard Osborne | Age: 2012-06-13 | Files: 1 | Lines: -1/+1
* *typo: Cyles changed to Cycles
  llvm-svn: 158404
  Author: Kay Tiong Khoo | Age: 2012-06-13 | Files: 2 | Lines: -2/+2
* Fix intrinsics for XOP frczss/sd instructions. These instructions only take
  one source register and zero the upper bits of the destination rather than preserving them.
  llvm-svn: 158396
  Author: Craig Topper | Age: 2012-06-13 | Files: 1 | Lines: -12/+6
* Add another missing 64-bit itinerary definition for the PPC A2 core.
  llvm-svn: 158393
  Author: Hal Finkel | Age: 2012-06-13 | Files: 1 | Lines: -0/+11
* Clean up trailing blanks in Mips16InstrFormats.td
  Patch by Reed Kotler.
  llvm-svn: 158382
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 1 | Lines: -46/+46
* disable use of directive .set nomicromips
  until this directive is pushed in gas to open source fsf
  Patch by Reed Kotler.
  llvm-svn: 158381
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 1 | Lines: -1/+2
* 1. fix places where immed is used in place of imm to be consistent with
  non mips16
  2. fix some comments to change OPcode->EXTEND for extended instructions
  Patch by Reed Kotler.
  llvm-svn: 158378
  Author: Akira Hatanaka | Age: 2012-06-13 | Files: 1 | Lines: -38/+38
* Add some missing 64-bit itinerary definitions for the PPC A2 core.
  llvm-svn: 158373
  Author: Hal Finkel | Age: 2012-06-12 | Files: 1 | Lines: -0/+22
* [arm-fast-isel] Add support for -arm-long-calls.
  Patch by Jush Lu <jush.msn@gmail.com>.
  llvm-svn: 158368
  Author: Chad Rosier | Age: 2012-06-12 | Files: 1 | Lines: -41/+57
* Split out the PPC instruction class IntSimple from IntGeneral.
  On the POWER7, adds and logical operations can also be handled in the load/store pipelines. We'll call these IntSimple.
  llvm-svn: 158366
  Author: Hal Finkel | Age: 2012-06-12 | Files: 9 | Lines: -65/+90
* Fixes for PPC host detection and features.
  POWER4 is a 64-bit CPU (better matched to the 970). The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750).
  Patch by Andreas Tobler.
  llvm-svn: 158363
  Author: Hal Finkel | Age: 2012-06-12 | Files: 1 | Lines: -3/+3
* Reapply r158337, this time properly protect Darwin/PPC host CPU use with
  __ppc__.
  Original commit message:
  Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().
  Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support).
  llvm-svn: 158349
  Author: Hal Finkel | Age: 2012-06-12 | Files: 1 | Lines: -135/+4
* Revert r158337 "Move PPC host-CPU detection logic from PPCSubtarget into
  sys::getHostCPUName()."
  This commit broke most of the PowerPC unit tests when running on Intel/Apple.
  llvm-svn: 158345
  Author: Jakob Stoklund Olesen | Age: 2012-06-12 | Files: 1 | Lines: -2/+133
* Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().
  Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support).
  llvm-svn: 158337
  Author: Hal Finkel | Age: 2012-06-11 | Files: 1 | Lines: -133/+2
* Enable MFOCRF generation on the PPC A2 core.
  llvm-svn: 158324
  Author: Hal Finkel | Age: 2012-06-11 | Files: 1 | Lines: -2/+2
* Rename the PPC target feature gpul to mfocrf.
  The PPC target feature gpul (IsGigaProcessor) was only used for one thing: To enable the generation of the MFOCRF instruction. Furthermore, this instruction is available on other PPC cores outside of the G5 line. This feature now corresponds to the HasMFOCRF flag.
  No functionality change.
  llvm-svn: 158323
  Author: Hal Finkel | Age: 2012-06-11 | Files: 5 | Lines: -13/+13
* Add A2 to the list of PPC CPUs recognized by Linux host CPU-type detection.
  llvm-svn: 158322
  Author: Hal Finkel | Age: 2012-06-11 | Files: 1 | Lines: -0/+1
* Emit the two-operand form of the PPC mfcr instruction as mfocrf.
  This is necessary on Linux and supported on Darwin, see PR2604.
  llvm-svn: 158315
  Author: Hal Finkel | Age: 2012-06-11 | Files: 1 | Lines: -1/+1
* Add local CPU detection for Linux PPC.
  This functionality mirrors that available on PPC/Darwin.
  llvm-svn: 158314
  Author: Hal Finkel | Age: 2012-06-11 | Files: 1 | Lines: -1/+95
* Add POWER6 and POWER7 CPU types to the PPC backend.
  No functional change; these will be used by upcoming scheduler enhancements.
  llvm-svn: 158313
  Author: Hal Finkel | Age: 2012-06-11 | Files: 3 | Lines: -0/+14
* Re-enable the CMN instruction.
  We turned off the CMN instruction because it had semantics which we weren't getting correct. If we are comparing with an immediate, then it's okay to use the CMN instruction.
  <rdar://problem/7569620>
  llvm-svn: 158302
  Author: Bill Wendling | Age: 2012-06-11 | Files: 5 | Lines: -69/+144
* Enable ILP scheduling for all nodes by default on PPC.
  Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294).
  Largest speedups:
    SingleSource/Benchmarks/Stanford/Quicksort - 28%
    SingleSource/Benchmarks/Stanford/Towers - 24%
    SingleSource/Benchmarks/Shootout-C++/matrix - 23%
    MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
    MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
  (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change)
  Largest slowdowns:
    MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
    MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
    MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
    SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
    MultiSource/Applications/d/make_dparser - 16%
  llvm-svn: 158296
  Author: Hal Finkel | Age: 2012-06-10 | Files: 1 | Lines: -4/+6
* Use critical anti-dep. breaking on all PPC targets, but also add other
  register classes.
  Using 'all' instead of 'critical' would be better because it would make it easier to satisfy the bundling constraints, but, as noted in the FIXME, that is currently not possible with the crs.
  This yields an average 1% speedup over the entire test suite (on Power 7).
  Largest speedups:
    SingleSource/Benchmarks/Shootout-C++/moments - 40%
    MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
    SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
    SingleSource/Benchmarks/McGill/misr - 23%
    MultiSource/Applications/JM/ldecod/ldecod - 22%
  Largest slowdowns:
    SingleSource/Benchmarks/Shootout-C++/matrix - -29%
    SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
    MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
    SingleSource/Benchmarks/Shootout-C++/ary - -17%
    MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%
  llvm-svn: 158294
  Author: Hal Finkel | Age: 2012-06-10 | Files: 1 | Lines: -4/+11
* Add intrinsics for immediate form of XOP vprot instructions. Use i128mem
  instead of f128mem for integer XOP instructions.
  llvm-svn: 158291
  Author: Craig Topper | Age: 2012-06-10 | Files: 1 | Lines: -27/+37
* Improve ext/trunc patterns on PPC64.
  The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code.
  Thanks to Jakob and Owen for their suggestions in this matter.
  llvm-svn: 158283
  Author: Hal Finkel | Age: 2012-06-09 | Files: 1 | Lines: -11/+4
* Use XOP vpcom intrinsics in patterns instead of a target specific SDNode
  type. Remove the custom lowering code that selected the SDNode type.
  llvm-svn: 158279
  Author: Craig Topper | Age: 2012-06-09 | Files: 4 | Lines: -60/+13
* Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate
  as an argument.
  llvm-svn: 158278
  Author: Craig Topper | Age: 2012-06-09 | Files: 1 | Lines: -174/+20
* Silence a gcc-4.6 warning: GCC fails to understand that secondReg and cmpOp2 are
  correlated, and thinks that cmpOp2 may be used uninitialized.
  llvm-svn: 158263
  Author: Duncan Sands | Age: 2012-06-09 | Files: 1 | Lines: -1/+1
* Enable tail merging on PPC.
  Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive).
  Largest test-suite speedups:
    MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
    MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
    SingleSource/Benchmarks/Shootout-C++/ary - 21%
    SingleSource/Benchmarks/Stanford/Queens - 17%
  Largest slowdowns:
    MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
    MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
    MultiSource/Applications/JM/ldecod/ldecod - 14%
    MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%
  This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default.
  llvm-svn: 158259
  Author: Hal Finkel | Age: 2012-06-09 | Files: 1 | Lines: -7/+1
* Test commit
  llvm-svn: 158250
  Author: Jack Carter | Age: 2012-06-09 | Files: 1 | Lines: -0/+1
* Remove the TODO statement in the PPC README re: CTR loops
  As Chris points out, this can now be removed!
  TODO: check if the associated section on viterbi's inner loop can also be removed.
  llvm-svn: 158224
  Author: Hal Finkel | Age: 2012-06-08 | Files: 1 | Lines: -1/+0
* Enable PPC CTR loop formation by default.
  Thanks to Jakob's help, this now causes no new test suite failures!
  Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
    SingleSource/Benchmarks/Misc/pi - 108%
    SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
    MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
    SingleSource/Benchmarks/Shootout/ary3 - 32%
    SingleSource/Benchmarks/Shootout-C++/matrix - 30%
  The largest slowdowns are:
    MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
    MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
    MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
    MultiSource/Applications/d/make_dparser - -14%
    SingleSource/Benchmarks/Shootout-C++/ary - -13%
  In light of these slowdowns, additional profiling work is obviously needed!
  llvm-svn: 158223
  Author: Hal Finkel | Age: 2012-06-08 | Files: 2 | Lines: -11/+9
* Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable.
  Marking these classes as non-allocatable allows CTR loop generation to work correctly with the block placement passes, etc. These register classes are currently used only by some unused TCRETURN patterns. In future cleanup, these will be removed.
  Thanks again to Jakob for suggesting this fix to the CTR loop problem!
  llvm-svn: 158221
  Author: Hal Finkel | Age: 2012-06-08 | Files: 1 | Lines: -2/+10
* Enable optimization for integer ABS on X86 if Subtarget has CMOV.
  llvm-svn: 158220
  Author: Manman Ren | Age: 2012-06-08 | Files: 1 | Lines: -3/+5
* Fix Target->Codegen dependence.
  Bulk move of TargetInstrInfo implementation into TargetInstrInfoImpl.
  This is dirty because the code isn't part of TargetInstrInfoImpl class, nor should it be, because the methods are not target hooks. However, it's the current mechanism for keeping libTarget useful outside the backend. You'll get a not-so-nice link error if you invoke a TargetInstrInfo method that depends on CodeGen. The TargetInstrInfoImpl class should probably be removed since it doesn't really solve this problem.
  To really fix this, we probably need separate interfaces for the CodeGen/nonCodeGen sides of TargetInstrInfo.
  llvm-svn: 158212
  Author: Andrew Trick | Age: 2012-06-08 | Files: 1 | Lines: -195/+5
* Disable the PPC CTR-Loops pass by default.
  The pass itself works well, but something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc. this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure.
  llvm-svn: 158206
  Author: Hal Finkel | Age: 2012-06-08 | Files: 2 | Lines: -4/+17
* Fix a bug in the new PPC CTR-Loops pass.
  The code which tests for an induction operation cannot assume that any ADDI instruction will have a register operand because the operand could also be a frame index; for example:
    %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16
  llvm-svn: 158205
  Author: Hal Finkel | Age: 2012-06-08 | Files: 1 | Lines: -0/+1
* Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form
  CTR-based loop branching code.
  This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used.
  llvm-svn: 158204
  Author: Hal Finkel | Age: 2012-06-08 | Files: 9 | Lines: -18/+812
* X86: optimize generated code for integer ABS
  This patch will generate the following for integer ABS:
    movl %edi, %eax
    negl %eax
    cmovll %edi, %eax
  INSTEAD OF
    movl %edi, %ecx
    sarl $31, %ecx
    leal (%rdi,%rcx), %eax
    xorl %ecx, %eax
  There exists a target-independent DAG combine for integer ABS, which converts integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. This is implemented in PerformXorCombine.
  rdar://10695237
  llvm-svn: 158175
  Author: Manman Ren | Age: 2012-06-07 | Files: 1 | Lines: -2/+44
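The two instruction sequences in the entry above compute the same value. The following is a minimal sketch of both forms in portable C++, written for illustration and not taken from the patch; the function names are made up, and unsigned arithmetic is used so that the INT32_MIN input does not trigger signed-overflow undefined behavior.

```cpp
#include <cassert>
#include <cstdint>

// Models the sar + add + xor form: mask = x >> 31; (x + mask) ^ mask.
// The mask is built with a comparison so the code does not rely on how
// the compiler shifts negative values.
inline uint32_t absSarAddXor(int32_t X) {
  uint32_t Mask = X < 0 ? 0xFFFFFFFFu : 0u; // what "sarl $31" leaves behind
  uint32_t U = static_cast<uint32_t>(X);
  return (U + Mask) ^ Mask;
}

// Models the neg + cmov form: negate, then select the original value
// when it was positive.
inline uint32_t absNegSelect(int32_t X) {
  uint32_t U = static_cast<uint32_t>(X);
  uint32_t Neg = 0u - U;                    // two's-complement negate
  return X > 0 ? U : Neg;
}

// Both forms should agree on every input, including INT32_MIN,
// where the result wraps to 0x80000000.
inline void checkAgree(int32_t X) {
  assert(absSarAddXor(X) == absNegSelect(X));
}
```

Calling checkAgree over a range of inputs is a quick way to convince oneself that matching the sar+add+xor pattern back to neg+cmov preserves the result.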
* Do not optimize the used bits of the x86 vselect condition operand, when the
  condition operand is a vector of 1-bit predicates. This may happen on MIC devices.
  llvm-svn: 158168
  Author: Nadav Rotem | Age: 2012-06-07 | Files: 1 | Lines: -4/+6