Commit log
to load an immediate that does not fit into 16 bits.
llvm-svn: 158431
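This entry and the two that follow concern emitting an instruction sequence for an immediate wider than 16 bits. Below is a minimal sketch of the standard MIPS lui/ori idiom, with assumed names; it is not the actual MipsLongBranch helper.

#include <cstdint>

// Split a 32-bit value into the two 16-bit halves that the usual MIPS
// lui/ori pair materializes:
//   lui  $reg, Hi        ; put Hi into the upper 16 bits
//   ori  $reg, $reg, Lo  ; fill in the lower 16 bits
void splitImmediate(uint32_t Imm, uint32_t &Hi, uint32_t &Lo) {
  Hi = (Imm >> 16) & 0xffff;
  Lo = Imm & 0xffff;
}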
to load an immediate that does not fit into 16 bits. Also, take the global
base register slot on the stack into account when computing the stack size.
llvm-svn: 158430
compute the size of basic blocks in a function. Also, define a function which
emits a series of instructions to load an immediate.
llvm-svn: 158429
Long branches need access to the global base register to get the destination
address.
llvm-svn: 158428
object for the global base register.
This is the first in a series of patches that implement long branch expansion
for MIPS.
llvm-svn: 158427
delay slot filler pass of MIPS, at the suggestion of Jakob Stoklund Olesen.
This change, along with the fix in r158154, enables machine verification
to be run after delay slot filling.
llvm-svn: 158426
pattern:
(add v0, (add v1, abs_lo(tjt))) => (add (add v0, v1), abs_lo(tjt))
"tjt" is a TargetJumpTable node.
llvm-svn: 158419
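A hedged sketch of how such a reassociation can be written as a target DAG combine. The function name, the structure of the checks, and the use of current SelectionDAG APIs (with MipsISD::Lo standing in for abs_lo) are assumptions; this is not the 2012 MipsISelLowering code.

// Assumed to live inside the Mips backend (lib/Target/Mips), where
// MipsISD::Lo is visible.
#include "MipsISelLowering.h"
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Rewrite (add v0, (add v1, MipsLo(tjt))) into (add (add v0, v1), MipsLo(tjt))
// so the low part of the jump-table address stays next to the final add.
static SDValue reassociateJumpTableAdd(SDNode *N, SelectionDAG &DAG) {
  SDValue V0 = N->getOperand(0);
  SDValue Inner = N->getOperand(1);
  if (Inner.getOpcode() != ISD::ADD)
    return SDValue();

  // Only fire when the inner add's second operand is the low part of a
  // jump-table address.
  SDValue Lo = Inner.getOperand(1);
  if (Lo.getOpcode() != MipsISD::Lo ||
      !isa<JumpTableSDNode>(Lo.getOperand(0)))
    return SDValue();

  EVT VT = N->getValueType(0);
  SDLoc DL(N);
  SDValue Sum = DAG.getNode(ISD::ADD, DL, VT, V0, Inner.getOperand(0));
  return DAG.getNode(ISD::ADD, DL, VT, Sum, Lo);
}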
llvm-svn: 158414
llvm-svn: 158413
llvm-svn: 158410
llvm-svn: 158409
llvm-svn: 158404
one source register and zero the upper bits of the destination rather than preserving them.
llvm-svn: 158396
llvm-svn: 158393
Patch by Reed Kotler.
llvm-svn: 158382
until this directive is pushed into the open-source FSF gas.
Patch by Reed Kotler.
llvm-svn: 158381
non-MIPS16.
2. Fix some comments to change Opcode -> EXTEND for extended instructions.
Patch by Reed Kotler.
llvm-svn: 158378
llvm-svn: 158373
Patch by Jush Lu <jush.msn@gmail.com>.
llvm-svn: 158368
On the POWER7, adds and logical operations can also be handled
in the load/store pipelines. We'll call these IntSimple.
llvm-svn: 158366
POWER4 is a 64-bit CPU (better matched to the 970).
The G3 is really the 750 (no AltiVec), and the G4+ is the 74xx (not the 750).
Patch by Andreas Tobler.
llvm-svn: 158363
__ppc__.
Original commit message:
Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName().
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).
llvm-svn: 158349
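As a usage note for the API involved here, a small illustrative program that queries the detected host CPU the way a frontend implementing -mcpu=native would; the header path is an assumption and differs across LLVM versions.

#include "llvm/Support/Host.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  // With this change, PowerPC hosts report a real CPU name here instead
  // of a generic default, which is what -mcpu=native support relies on.
  llvm::outs() << llvm::sys::getHostCPUName() << "\n";
  return 0;
}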
sys::getHostCPUName()."
This commit broke most of the PowerPC unit tests when running on
Intel/Apple.
llvm-svn: 158345
Both the new Linux functionality and the old Darwin functions have been moved.
This change also allows this information to be queried directly by clang and
other frontends (clang, for example, will now have real -mcpu=native support).
llvm-svn: 158337
llvm-svn: 158324
The PPC target feature gpul (IsGigaProcessor) was used for only one thing:
to enable the generation of the MFOCRF instruction. Furthermore, this
instruction is available on other PPC cores outside of the G5 line. This
feature now corresponds to the HasMFOCRF flag.
No functionality change.
llvm-svn: 158323
llvm-svn: 158322
This is necessary on Linux and supported on Darwin; see PR2604.
llvm-svn: 158315
This functionality mirrors that available on PPC/Darwin.
llvm-svn: 158314
No functional change; these will be used by upcoming scheduler enhancements.
llvm-svn: 158313
We turned off the CMN instruction because it had semantics that we weren't
handling correctly. If we are comparing against an immediate, then it is okay
to use the CMN instruction.
<rdar://problem/7569620>
llvm-svn: 158302
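A hedged source-level illustration of the safe case described above; the selection mentioned in the comment is the typical expectation, not a claim about a specific revision.

// Comparing against a negative immediate is the case where CMN is safe:
// a check like this can be selected as "cmn r0, #1" (flags of r0 + 1)
// followed by a conditional branch, instead of materializing -1 for a cmp.
bool isMinusOne(int x) {
  return x == -1;
}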
Over the entire test-suite, this has an insignificantly negative average
performance impact, but reduces some of the worst slowdowns from the
anti-dep. change (r158294).
Largest speedups:
SingleSource/Benchmarks/Stanford/Quicksort - 28%
SingleSource/Benchmarks/Stanford/Towers - 24%
SingleSource/Benchmarks/Shootout-C++/matrix - 23%
MultiSource/Benchmarks/SciMark2-C/scimark2 - 19%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15%
(matrix and automotive-bitcount were both in the top-5 slowdown list from the
anti-dep. change)
Largest slowdowns:
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21%
SingleSource/Benchmarks/CoyoteBench/lpbench - 20%
MultiSource/Applications/d/make_dparser - 16%
llvm-svn: 158296
register classes.
Using 'all' instead of 'critical' would be better because it would make it easier to
satisfy the bundling constraints, but, as noted in the FIXME, that is currently not
possible with the CRs.
This yields an average 1% speedup over the entire test suite (on POWER7). Largest speedups:
SingleSource/Benchmarks/Shootout-C++/moments - 40%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28%
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26%
SingleSource/Benchmarks/McGill/misr - 23%
MultiSource/Applications/JM/ldecod/ldecod - 22%
Largest slowdowns:
SingleSource/Benchmarks/Shootout-C++/matrix - -29%
SingleSource/Benchmarks/Shootout-C++/ary3 - -22%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18%
SingleSource/Benchmarks/Shootout-C++/ary - -17%
MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15%
llvm-svn: 158294
instead of f128mem for integer XOP instructions.
llvm-svn: 158291
The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that
would leave self-moves in the final assembly. Replacing those patterns with ones
based on the SUBREG builtins yields better-looking code.
Thanks to Jakob and Owen for their suggestions in this matter.
llvm-svn: 158283
type. Remove the custom lowering code that selected the SDNode type.
llvm-svn: 158279
as an argument.
llvm-svn: 158278
correlated, and thinks that cmpOp2 may be used uninitialized.
llvm-svn: 158263
Tail merging had been disabled on PPC because it would disturb bundling decisions
made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions
are made during post-RA scheduling, and tail merging is generally beneficial (the
average test-suite speedup is insignificantly positive).
Largest test-suite speedups:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23%
SingleSource/Benchmarks/Shootout-C++/ary - 21%
SingleSource/Benchmarks/Stanford/Queens - 17%
Largest slowdowns:
MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24%
MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22%
MultiSource/Applications/JM/ldecod/ldecod - 14%
MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9%
This is improved by using full (instead of just critical) anti-dependency breaking,
but doing so still causes miscompiles and so cannot yet be enabled by default.
llvm-svn: 158259
llvm-svn: 158250
As Chris points out, this can now be removed!
TODO: check if the associated section on viterbi's inner loop can also be removed.
llvm-svn: 158224
Thanks to Jakob's help, this now causes no new test suite failures!
Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
SingleSource/Benchmarks/Misc/pi - 108%
SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
SingleSource/Benchmarks/Shootout/ary3 - 32%
SingleSource/Benchmarks/Shootout-C++/matrix - 30%
The largest slowdowns are:
MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
MultiSource/Applications/d/make_dparser - -14%
SingleSource/Benchmarks/Shootout-C++/ary - -13%
In light of these slowdowns, additional profiling work is obviously needed!
llvm-svn: 158223
Marking these classes as non-allocatable allows CTR loop generation to
work correctly with the block placement passes, etc. These register
classes are currently used only by some unused TCRETURN patterns.
In future cleanup, these will be removed.
Thanks again to Jakob for suggesting this fix to the CTR loop problem!
llvm-svn: 158221
llvm-svn: 158220
Bulk move of TargetInstrInfo implementation into
TargetInstrInfoImpl. This is dirty because the code isn't part of the
TargetInstrInfoImpl class, nor should it be, because the methods are
not target hooks. However, it's the current mechanism for keeping
libTarget useful outside the backend. You'll get a not-so-nice link
error if you invoke a TargetInstrInfo method that depends on CodeGen.
The TargetInstrInfoImpl class should probably be removed since it
doesn't really solve this problem.
To really fix this, we probably need separate interfaces for the
CodeGen/nonCodeGen sides of TargetInstrInfo.
llvm-svn: 158212
The pass itself works well, but something in the Machine* infrastructure
does not understand terminators that define registers. Without the ability
to use the block-placement pass, etc., this causes performance regressions (and
so is turned off by default). Turning off the analysis turns off the problems
with the Machine* infrastructure.
llvm-svn: 158206
The code that tests for an induction operation cannot assume that any
ADDI instruction will have a register operand because the operand could
also be a frame index; for example:
%vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16
llvm-svn: 158205
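A minimal sketch of the kind of operand check this fix implies, with assumed names; the actual PPCCTRLoops code is structured differently.

#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineOperand.h"

using namespace llvm;

// An ADDI feeding the induction update may take a frame index rather than
// a register (e.g. "ADDI8 <fi#0>, 0"), so check the operand kind before
// asking for a register.
static bool isRegisterIncrement(const MachineInstr &MI, unsigned &Reg) {
  const MachineOperand &MO = MI.getOperand(1);
  if (!MO.isReg())
    return false;   // frame-index operand: not an induction update
  Reg = MO.getReg();
  return true;
}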
CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon
pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are
no longer otherwise used. Also, an invalid preheader DebugLoc is not used.
llvm-svn: 158204
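An illustrative example of the sort of counted loop the pass targets; whether a given loop is actually converted depends on the pass's own legality checks.

// On PowerPC, the trip count of a simple counted loop like this can be
// moved into the CTR register (mtctr) and the latch branch becomes bdnz,
// letting the pass delete the now-unused add and compare.
void scale(double *a, int n, double k) {
  for (int i = 0; i < n; ++i)
    a[i] *= k;
}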
This patch will generate the following for integer ABS:
movl %edi, %eax
negl %eax
cmovll %edi, %eax
INSTEAD OF
movl %edi, %ecx
sarl $31, %ecx
leal (%rdi,%rcx), %eax
xorl %ecx, %eax
There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov.
This is implemented in PerformXorCombine.
rdar://10695237
llvm-svn: 158175
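The corresponding source pattern is plain integer absolute value; a minimal example (assuming x86-64 with optimization enabled) that exercises the combine:

// The target-independent combine lowers this to sar+add+xor; the X86
// change matches that pattern back to neg+cmov (movl/negl/cmovll above).
int integerAbs(int x) {
  return x < 0 ? -x : x;
}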
condition operand is a vector of 1-bit predicates.
This may happen on MIC devices.
llvm-svn: 158168