The fast register allocator is not supposed to work in the optimizing
pipeline. It doesn't make sense to compute live intervals, run full copy
coalescing, and then run RAFast.
Fast register allocation in the optimizing pipeline is better done by
RABasic.
llvm-svn: 158242
  -%a + 42
into
  42 - %a
Previously we were emitting:
  -(%a + 42)
This fixes the infinite loop in PR12338. The generated code is still not perfect, though; will work on that next.
llvm-svn: 158237
Thanks to Jakob's help, this now causes no new test suite failures!
Over the entire test suite, this gives an average 1% speedup. The largest speedups are:
  SingleSource/Benchmarks/Misc/pi - 108%
  SingleSource/Benchmarks/CoyoteBench/lpbench - 54%
  MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50%
  SingleSource/Benchmarks/Shootout/ary3 - 32%
  SingleSource/Benchmarks/Shootout-C++/matrix - 30%
The largest slowdowns are:
  MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30%
  MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25%
  MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22%
  MultiSource/Applications/d/make_dparser - -14%
  SingleSource/Benchmarks/Shootout-C++/ary - -13%
In light of these slowdowns, additional profiling work is obviously needed!
llvm-svn: 158223
llvm-svn: 158218
Patch by James Benton <jbenton@vmware.com>.
llvm-svn: 158213
llvm-svn: 158209
The pass itself works well, but something in the Machine* infrastructure
does not understand terminators which define registers. Without the ability
to use the block-placement pass, etc., this causes performance regressions (and
so it is turned off by default). Turning off the analysis avoids the problems
with the Machine* infrastructure.
llvm-svn: 158206
The code which tests for an induction operation cannot assume that any
ADDI instruction will have a register operand, because the operand could
also be a frame index; for example:
  %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16
llvm-svn: 158205
CTR-based loop branching code.
This pass is derived from the Hexagon HardwareLoops pass. The only
significant enhancement over the Hexagon pass is that PPCCTRLoops will also
attempt to delete the replaced add and compare operations if they are no
longer otherwise used. Also, an invalid preheader DebugLoc is not used.
llvm-svn: 158204
can move instructions within the instruction list. If the instruction just
happens to be the one the basic block iterator is pointing to, and it is
moved to a different basic block, then we get into an infinite loop due to
the iterator running off the end of the basic block (for some reason this
doesn't fire any assertions). Original commit message:
Grab-bag of reassociate tweaks. Unify handling of dead instructions and
instructions to reoptimize. Exploit this to more systematically eliminate
dead instructions (this isn't very useful in practice but is convenient for
analysing some testcase I am working on). No need for WeakVH any more: use
an AssertingVH instead.
llvm-svn: 158199
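
A minimal C++ sketch of the iterator hazard described above, with std::list
standing in for LLVM's instruction list (names and the simplification are
illustrative, not from the commit):

  #include <list>

  using Inst = int;
  using Block = std::list<Inst>;

  // Splices the element at It out of BB and into Other, like a transform
  // that moves an instruction to a different basic block.
  void moveToOther(Block &BB, Block::iterator It, Block &Other) {
    Other.splice(Other.end(), BB, It);
  }

  void walkBuggy(Block &BB, Block &Other) {
    for (Block::iterator It = BB.begin(); It != BB.end(); ++It)
      // BUG (shown only to illustrate the failure): the iterator follows
      // the spliced node into Other, so ++It now walks Other and never
      // compares equal to BB.end(); the loop runs off the end of the block
      // without firing an assertion.
      moveToOther(BB, It, Other);
  }

  void walkSafe(Block &BB, Block &Other) {
    for (Block::iterator It = BB.begin(); It != BB.end();) {
      Block::iterator Cur = It++; // advance first, then transform
      moveToOther(BB, Cur, Other);
    }
  }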
This patch will generate the following for integer ABS:
  movl   %edi, %eax
  negl   %eax
  cmovll %edi, %eax
instead of:
  movl %edi, %ecx
  sarl $31, %ecx
  leal (%rdi,%rcx), %eax
  xorl %ecx, %eax
There exists a target-independent DAG combine for integer ABS, which converts
integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov.
This is implemented in PerformXorCombine.
rdar://10695237
llvm-svn: 158175
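
For reference, both sequences compute the same value; the sar+add+xor form
relies on the classic mask identity below (a minimal C++ sketch, not part of
the patch; both variants wrap for INT32_MIN, as the machine code does):

  #include <cstdint>

  // The sar+add+xor form: mask is 0 for x >= 0 and all-ones for x < 0, so
  // (x + mask) ^ mask yields either x or ~(x - 1) == -x. Arithmetic is done
  // in uint32_t to sidestep signed-overflow UB at INT32_MIN.
  int32_t absSarAddXor(int32_t x) {
    uint32_t mask = (uint32_t)(x >> 31);           // sarl $31: 0 or 0xFFFFFFFF
    return (int32_t)(((uint32_t)x + mask) ^ mask); // leal, then xorl
  }

  // The neg+cmov form the patch emits instead: negate, then conditionally
  // keep the original when the negated copy went negative.
  int32_t absNegCmov(int32_t x) {
    int32_t n = (int32_t)(0u - (uint32_t)x); // negl (wraps at INT32_MIN)
    return n < 0 ? x : n;                    // cmovll %edi, %eax
  }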
elements, which may disagree with the select condition type.
llvm-svn: 158166
Fixes PR13048.
llvm-svn: 158158
llvm-svn: 158128
This patch will optimize the following:
  movq   %rdi, %rax
  subq   %rsi, %rax
  cmovsq %rsi, %rdi
  movq   %rdi, %rax
to:
  cmpq   %rsi, %rdi
  cmovsq %rsi, %rdi
  movq   %rdi, %rax
Perform this optimization if the actual result of SUB is not used.
rdar://11540023
llvm-svn: 158126
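
The kind of source that feeds this peephole is a select that consumes only
the sign of the subtraction (a hedged C++ sketch; the function is
illustrative, not from the commit):

  #include <cstdint>

  // Only the flags of (a - b) are consumed by the cmov; the subtraction's
  // value is dead, so the movq+subq pair can become a flag-only cmpq.
  int64_t pickLarger(int64_t a, int64_t b) {
    // Wrapping subtraction mirrors the hardware SUB; only its sign is used.
    int64_t diff = (int64_t)((uint64_t)a - (uint64_t)b);
    return diff < 0 ? b : a; // cmovsq: pick b when the sign flag is set
  }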
llvm-svn: 158123
The commit is intended to fix rdar://11540023.
It is implemented as part of the peephole optimization; we could alternatively
implement this in the SelectionDAG lowering phase.
llvm-svn: 158122
<rdar://problem/10889741>
llvm-svn: 158121
X86.
rdar://11496434
llvm-svn: 158087
matter.
rdar://11579835
llvm-svn: 158084
instructions to reoptimize. Exploit this to more systematically eliminate
dead instructions (this isn't very useful in practice but is convenient for
analysing some testcase I am working on). No need for WeakVH any more: use
an AssertingVH instead.
llvm-svn: 158073
llvm-svn: 158055
llvm-svn: 158045
llvm-svn: 158044
llvm-svn: 157972
when a compile-time constant is known. This occurs when implicitly zero-extending
function arguments from 16 bits to 32 bits.
<rdar://problem/11481151>
llvm-svn: 157966
replacement to make it at least as generic as the instruction being replaced.
This includes:
* dropping nsw/nuw flags
* getting the least restrictive tbaa and fpmath metadata
* merging ranges
Fixes PR12979.
llvm-svn: 157958
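
Every merge moves toward the weaker claim: the patch drops nsw/nuw outright,
and intersecting them is the general form of the same rule (an illustrative
C++ sketch, not LLVM's API):

  // When one instruction replaces an equivalent one, the survivor may only
  // keep a no-wrap guarantee if *both* originals made it; otherwise a use
  // that relied on the weaker instruction could observe a stronger promise
  // than it was ever given.
  struct WrapFlags {
    bool NoSignedWrap;
    bool NoUnsignedWrap;
  };

  WrapFlags mergeForReplacement(WrapFlags Kept, WrapFlags Replaced) {
    return {Kept.NoSignedWrap && Replaced.NoSignedWrap,
            Kept.NoUnsignedWrap && Replaced.NoUnsignedWrap};
  }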
llvm-svn: 157939
llvm-svn: 157938
llvm-svn: 157935
llvm-svn: 157925
It seems that this no longer causes test suite failures on PPC64 (after r157159),
and often gives a performance benefit, so it can be enabled by default.
llvm-svn: 157911
llvm-svn: 157903
FMA3. Autoupgrade support coming in a separate commit.
llvm-svn: 157898
llvm-svn: 157896
loads to match instruction behavior.
llvm-svn: 157895
rdar://9877866
llvm-svn: 157876
llvm-svn: 157874
llvm-svn: 157868
This removes a bit of context from the verifier errors, but it reduces code
duplication in a fairly critical part of LLVM and makes dominates() easier to test.
llvm-svn: 157845
This patch will optimize the following:
  sub r1, r3
  cmp r3, r1 or cmp r1, r3
  bge L1
to:
  sub r1, r3
  bge L1 or ble L1
If the branch instruction can use the flags from the "sub", then we can
eliminate the "cmp" instruction.
llvm-svn: 157831
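
In source terms the compare re-tests a relation the flag-setting subtract
already established; only the condition code must be swapped when the
compare's operands are reversed (an illustrative C++ sketch):

  // After a flag-setting "subs r1, r1, r3", the flags already encode the
  // signed relation between the old r1 (a) and r3 (b):
  //   cmp r1, r3 ; bge L1   becomes   bge L1   (same operand order)
  //   cmp r3, r1 ; bge L1   becomes   ble L1   (operands reversed, cond swapped)
  int step(int a, int b) {
    int d = a - b;   // subs: produces d and sets N/Z/C/V
                     // (a - b assumed not to overflow in this sketch)
    if (a >= b)      // a separate cmp here would recompute the same flags
      return d;
    return -d;
  }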
doesn't dominate a use.
llvm-svn: 157829
llvm-svn: 157824
could leave dangling references in the cache; add regression tests for this problem.
Can already compile & run: PHP, PCRE, and ICU (i.e., all the software I tried).
llvm-svn: 157822
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic accesses on
an execution path.
llvm-svn: 157818
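
A small C++ example of the access pattern the clean-up pass helps
(illustrative; built with -fPIC into a shared object, where internal
thread-locals can use the local-dynamic model, e.g. via
-ftls-model=local-dynamic):

  // Two internal (non-exported) thread-locals in one shared library: the
  // local-dynamic model fetches the module's TLS block base once (a single
  // __tls_get_addr call) and both variables become fixed offsets from it.
  static thread_local int counter = 0;
  static thread_local int high_water = 0;

  int bump(int n) {
    counter += n;              // offset from the shared TLS base
    if (counter > high_water)  // second access re-uses the same base
      high_water = counter;
    return high_water;
  }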
attention to FP_CONTRACT and matching @llvm.fma, which is not available yet. This will allow us to enable intrinsic use, at least.
llvm-svn: 157804
casts in multiple-return-value scenarios, like what happens on X86-64 when
returning small structs.
llvm-svn: 157800
types, as well as int<->ptr casts. This allows us to tailcall functions
with some trivial casts between the call and return (i.e. because the
return types disagree).
llvm-svn: 157798
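
A C++ example of what this enables (hedged; the helper is hypothetical): the
pointer cast between the call and the return is trivial, so the outer
function can still end in a tail call.

  struct Node;
  void *lookupRaw(unsigned key); // hypothetical external helper

  // The void* -> Node* cast is a no-op at the machine level, so despite the
  // differing return types this can lower to a plain jmp (tail call) rather
  // than call + ret.
  Node *lookup(unsigned key) {
    return static_cast<Node *>(lookupRaw(key));
  }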
llvm-svn: 157797
llvm-svn: 157795