llvm-svn: 40819

faster than with the 'local to a block' fastpath.  This speeds
up PR1432 from 2.1232s to 2.0686s (2.6%).
llvm-svn: 40818
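
For illustration, a minimal sketch of the kind of 'local to a block'
fastpath referenced above.  This is not the commit's code: it is written
against present-day LLVM C++ APIs, and the helper name and the
all-uses-in-one-block precondition are assumptions.  When every use of an
alloca sits in a single basic block, one forward scan forwards the latest
stored value to each load, with no dominance computation at all.

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // Assumed precondition: every user of AI is a non-volatile load or
    // store of AI located in BB.
    static void promoteSingleBlockAllocaSketch(AllocaInst *AI, BasicBlock *BB) {
      // Value held by the alloca at the current scan point; loads that
      // execute before any store see undef.
      Value *CurVal = UndefValue::get(AI->getAllocatedType());
      for (Instruction &I : make_early_inc_range(*BB)) {
        if (auto *SI = dyn_cast<StoreInst>(&I)) {
          if (SI->getPointerOperand() == AI) {
            CurVal = SI->getValueOperand(); // the store defines the new value
            SI->eraseFromParent();
          }
        } else if (auto *LI = dyn_cast<LoadInst>(&I)) {
          if (LI->getPointerOperand() == AI) {
            LI->replaceAllUsesWith(CurVal); // forward the latest stored value
            LI->eraseFromParent();
          }
        }
      }
      AI->eraseFromParent(); // all uses are gone; drop the dead alloca
    }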

to increment NumLocalPromoted, and didn't actually delete the
dead alloca, leading to an extra iteration of mem2reg.
llvm-svn: 40817

llvm-svn: 40816

Predsimplify fails llvm-gcc bootstrap.
llvm-svn: 40815

stored value was a non-instruction value.  Doh.
This increases the number of single-store allocas from 8982 to 9026, and
speeds up mem2reg on the testcase in PR1432 from 2.17s to 2.13s.
llvm-svn: 40813

and the alloca so they don't get reprocessed.
This speeds up PR1432 from 2.20s to 2.17s.
llvm-svn: 40812

1. Check for revisiting a block before checking domination, which is faster.
2. If the stored value isn't an instruction, we don't have to check for
   domination.
3. If we have a value used in the same block more than once, make sure to
   remove the block from the UsingBlocks vector.  Not doing so forces us to
   go through the slow path for the alloca.
The combination of these improvements increases the number of allocas on the
fastpath from 8935 to 8982 on PR1432.  This speeds it up from 2.90s to 2.20s
(31%).  A sketch of this fastpath follows below.
llvm-svn: 40811
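
A minimal sketch of the single-store fastpath these commits describe, again
using present-day LLVM C++ APIs rather than the original code; the helper
name is an assumption.  One store makes the stored value the value of every
dominated load, and a non-instruction stored value (an argument or constant)
is available everywhere in the function, so it needs no dominance check.

    #include "llvm/ADT/STLExtras.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    static bool rewriteSingleStoreAllocaSketch(AllocaInst *AI,
                                               StoreInst *OnlyStore,
                                               DominatorTree &DT) {
      Value *StoredVal = OnlyStore->getValueOperand();
      // Arguments, constants, and globals dominate everything in the
      // function, so only instruction values need the dominance check.
      bool NeedDomCheck = isa<Instruction>(StoredVal);

      // Pass 1: every other user must be a load the store dominates.
      for (User *U : AI->users()) {
        if (U == OnlyStore)
          continue;
        auto *LI = dyn_cast<LoadInst>(U);
        if (!LI || (NeedDomCheck && !DT.dominates(OnlyStore, LI)))
          return false; // fall back to the general promotion path
      }

      // Pass 2: forward the stored value to every load, then delete the
      // store and the now-dead alloca so neither gets reprocessed (the
      // bookkeeping that r40817 above fixes).
      for (User *U : make_early_inc_range(AI->users())) {
        if (auto *LI = dyn_cast<LoadInst>(U)) {
          LI->replaceAllUsesWith(StoredVal);
          LI->eraseFromParent();
        }
      }
      OnlyStore->eraseFromParent();
      AI->eraseFromParent();
      return true;
    }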

testcase in PR1432 from 6.33s to 2.90s (2.22x).
llvm-svn: 40810

a using block from the list if we handle it.  Not doing this prevented us
from promoting (with the fast path) allocas which have uses (whoops).
This increases the number of allocas hitting this fastpath from 4042 to 8935
on the testcase in PR1432, speeding up mem2reg by 2.6x.
llvm-svn: 40809

llvm-svn: 40808

LLVM.  It cleans up the intrinsic definitions and generally smooths the
process of writing more complicated intrinsics.  It will be used by the
upcoming atomic intrinsics as well as vector and float intrinsics in the
future.
This also changes the syntax for llvm.bswap, llvm.part.set, llvm.part.select,
and llvm.ct* intrinsics.  They are automatically upgraded by both the LLVM
ASM reader and the bitcode reader.  The test cases have been updated, with
special tests added to ensure the automatic upgrading is supported.  A sketch
of the overloaded naming follows below.
llvm-svn: 40807
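
A small illustration of the suffix-by-type naming these intrinsics use in
later LLVM releases, written with the modern C++ API and not part of this
patch: the overload type list selects the concrete intrinsic, so requesting
bswap at i32 yields a declaration named llvm.bswap.i32.

    #include "llvm/IR/Intrinsics.h"
    #include "llvm/IR/Module.h"
    #include "llvm/IR/Type.h"
    using namespace llvm;

    // Declares (or finds) i32 @llvm.bswap.i32(i32) in M; the ".i32"
    // suffix is derived from the overload type list.
    static Function *declareBswapI32(Module &M) {
      return Intrinsic::getDeclaration(&M, Intrinsic::bswap,
                                       {Type::getInt32Ty(M.getContext())});
    }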

method.
llvm-svn: 40806

llvm-svn: 40805

llvm-svn: 40804

in PR1432 by 6%.
llvm-svn: 40803

llvm-svn: 40802

Darwin (which makes size within a struct==96)
llvm-svn: 40796

llvm-svn: 40793

(I've tried to get the info right for all targets,
but I'm not an expert on all of them - check yours.)
llvm-svn: 40792

llvm-svn: 40791

llvm-svn: 40776

llvm-svn: 40774

llvm-svn: 40772

llvm-svn: 40758

llvm-svn: 40757

Generalize isPSHUFDMask and add a unary SHUFPD pattern so that SHUFPD's
memory operand alignment can be tested as well, with a fix to avoid
breaking MMX's use of isPSHUFDMask.
llvm-svn: 40756

llvm-svn: 40754

llvm-svn: 40751

llvm-svn: 40750

llvm-svn: 40749

llvm-svn: 40748

llvm-svn: 40746

llvm-svn: 40745

casts in the input.
llvm-svn: 40741

llvm-svn: 40739

gvn, gvnpre, dse, and predsimplify.  To see these, use:
  make check-line-length
llvm-svn: 40738

exit edge to preserve LCSSA.
Fix dominance frontier update during loop unswitch.  This fixes PR1589,
again.
llvm-svn: 40737

X86InstrInfo::isReallyTriviallyReMaterializable knows how to handle
with the isReMaterializable flag so that it is given a chance to handle
them. Without hoisting constant-pool loads from loops this isn't very
visible, though it does keep CodeGen/X86/constant-pool-remat-0.ll from
making a copy of the constant pool on the stack.
llvm-svn: 40736

operations of casts.  This implements InstCombine/zext-fold.ll
llvm-svn: 40726

llvm-svn: 40723

llvm-svn: 40722

llvm-svn: 40720

llvm-svn: 40712

llvm-svn: 40711

simply specify them as results and let scheduledag handle them.  That is,
instead of:
  SDOperand Flag   = DAG.getTargetNode(Opc, MVT::i32, MVT::Flag, ...)
  SDOperand Result = DAG.getCopyFromReg(Chain, X86::EAX, MVT::i32, Flag)
just write:
  SDOperand Result = DAG.getTargetNode(Opc, MVT::i32, MVT::i32, ...)
and let scheduledag emit the move from X86::EAX to a virtual register.
llvm-svn: 40710

llvm-svn: 40703

llvm-svn: 40702

llvm-svn: 40701

llvm-svn: 40698