| Commit message (Collapse) | Author | Age | Files | Lines | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
CodeGenPrepare sinks compare instructions down to their uses to prevent
live flags and predicate registers across basic blocks.
PRE of a compare instruction prevents that, forcing the i1 compare
result into a general purpose register.  That is usually more expensive
than the redundant compare PRE was trying to eliminate in the first
place.
llvm-svn: 153657
 | 
| | 
| 
| 
| 
| 
|  | 
the end of a non-void function in Release builds.
llvm-svn: 153643
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
http://llvm.org/docs/SourceLevelDebugging.html#objcproperty
including type and DECL. Expand the metadata needed accordingly.
rdar://11144023
llvm-svn: 153639
 | 
| | 
| 
| 
| 
| 
|  | 
with 'v' version of instructions.
llvm-svn: 153636
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is a code change to add support for changing instruction sequences of the form:
  load
  inc/dec of 8/16/32/64 bits
  store
into the appropriate X86 inc/dec through memory instruction:
  inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode.  The comments have also been expanded.
llvm-svn: 153635
 | 
| | 
| 
| 
|  | 
llvm-svn: 153623
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is a code change to add support for changing instruction sequences of the form:
  load
  inc/dec of 8/16/32/64 bits
  store
into the appropriate X86 inc/dec through memory instruction:
  inc[qlwb] / dec[qlwb]
The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better
named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode.  The comments have also been expanded.
llvm-svn: 153617
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Some targets still mess up the liveness information, but that isn't
verified after MRI->invalidateLiveness().
The verifier can still check other useful things like register classes
and CFG, so it should be enabled after all passes.
llvm-svn: 153615
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The late scheduler depends on accurate liveness information if it is
breaking anti-dependencies, so we should be able to verify it.
Relax the terminator checking in the machine code verifier so it can
handle the basic blocks created by if conversion.
llvm-svn: 153614
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When an strd instruction doesn't get the registers it wants, it can be
expanded into two str instructions. Make sure the first str doesn't kill
the base register in the case where the base and data registers are
identical:
  t2STRi12 %R0<kill>, %R0, 4, pred:14, pred:%noreg
  t2STRi12 %R2<kill>, %R0, 8, pred:14, pred:%noreg
<rdar://problem/11101911>
llvm-svn: 153611
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When a number of sub-register VLRDS instructions are combined into a
VLDM, preserve any super-register implicit defs. This is required to
keep the register scavenger and machine code verifier happy.
Enable machine code verification after ARMLoadStoreOptimizer.
ARM/2012-01-26-CopyPropKills.ll was failing because of this.
llvm-svn: 153610
 | 
| | 
| 
| 
|  | 
llvm-svn: 153607
 | 
| | 
| 
| 
|  | 
llvm-svn: 153604
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
The arm_neon intrinsics can create virtual registers from the DPair
register class which allows both even-odd and odd-even D-register pairs.
This fixes PR12389.
llvm-svn: 153603
 | 
| | 
| 
| 
|  | 
llvm-svn: 153599
 | 
| | 
| 
| 
| 
| 
| 
|  | 
Branch folding invalidates liveness and disables liveness verification
on some targets.
llvm-svn: 153597
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Extract the liveness verification into its own method.
This makes it possible to run the machine code verifier after liveness
information is no longer required to be valid.
llvm-svn: 153596
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Revert r153519: "ARMLoadStoreOptimizer invalidates register liveness."
These patches caused miscompilations in povray by turning off branch
folding's updating of live-in lists.
It turns out the the late scheduler depends on the live-in lists, even
if it doesn't need correct kill flags.
<rdar://problem/11139228>
llvm-svn: 153593
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This avoids the silly double search:
  if (isLiveIn(Reg))
    removeLiveIn(Reg);
llvm-svn: 153592
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Original commit message for r153521 (aka r153423):
Use the new range metadata in computeMaskedBits and add a new optimization to
instruction simplify that lets us remove an and when loding a boolean value.
llvm-svn: 153587
 | 
| | 
| 
| 
| 
| 
|  | 
indices weren't commuted
llvm-svn: 153579
 | 
| | 
| 
| 
| 
| 
|  | 
that we just determined to be constant, replace all loads from it with a zero value.
llvm-svn: 153576
 | 
| | 
| 
| 
|  | 
llvm-svn: 153574
 | 
| | 
| 
| 
|  | 
llvm-svn: 153573
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
blocks in the function cloner. This removes the last case of trivially
dead code that I've been seeing in the wild getting inlined, analyzed,
re-inlined, optimized, only to be deleted. Nukes a FIXME from the
cleanup tests.
llvm-svn: 153572
 | 
| | 
| 
| 
|  | 
llvm-svn: 153571
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
and not the rest of the member tag.
Fixes PR11695
llvm-svn: 153570
 | 
| | 
| 
| 
|  | 
llvm-svn: 153557
 | 
| | 
| 
| 
|  | 
llvm-svn: 153556
 | 
| | 
| 
| 
|  | 
llvm-svn: 153554
 | 
| | 
| 
| 
| 
| 
| 
|  | 
imposes a constraint that GOT16 referring to a local symbol or HI16 has to be
followed immediately by a matching LO16 relocation.
llvm-svn: 153553
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
them as machine instructions. Directives ".set noat" and ".set at" are now
emitted only at the beginning and end of a function except in the case where
they are emitted to enclose .cpload with an immediate operand that doesn't fit
in 16-bit field or unaligned load/stores.
Also, make the following changes:
- Remove function isUnalignedLoadStore and use a switch-case statement to
  determine whether an instruction is an unaligned load or store.
- Define helper function CreateMCInst which generates an instance of an MCInst
  from an opcode and a list of operands.
llvm-svn: 153552
 | 
| | 
| 
| 
| 
| 
|  | 
any side effects.
llvm-svn: 153551
 | 
| | 
| 
| 
|  | 
llvm-svn: 153543
 | 
| | 
| 
| 
|  | 
llvm-svn: 153542
 | 
| | 
| 
| 
|  | 
llvm-svn: 153536
 | 
| | 
| 
| 
| 
| 
| 
|  | 
will always be tiny sets, so DenseSet is overkill (SmallSet won't work as we
need iteration support). 
llvm-svn: 153529
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
MipsFunctionInfo.
If EmitNOAT is true, directives ".set noat" and ".set at" are emitted at the
beginning and end of a function. 
llvm-svn: 153528
 | 
| | 
| 
| 
| 
| 
|  | 
Fixes PR10105
llvm-svn: 153524
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
undefined behavior, which Rafael was kind enough to fix.
Original commit message for r153423:
Use the new range metadata in computeMaskedBits and add a new optimization to
instruction simplify that lets us remove an and when loding a boolean value.
llvm-svn: 153521
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This pass tries to update kill flags, but there are still many bugs.
Passes after the load/store optimizer don't need accurate liveness, so
don't even try.
<rdar://problem/11101911>
llvm-svn: 153519
 | 
| | 
| 
| 
|  | 
llvm-svn: 153518
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Branch folding can use a register scavenger to update liveness
information when required. Don't do that if liveness information is
already invalid.
llvm-svn: 153517
 | 
| | 
| 
| 
|  | 
llvm-svn: 153516
 | 
| | 
| 
| 
|  | 
llvm-svn: 153513
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Late optimization passes like branch folding and tail duplication can
transform the machine code in a way that makes it expensive to keep the
register liveness information up to date. There is a fuzzy line between
register allocation and late scheduling where the liveness information
degrades.
The MRI::tracksLiveness() flag makes the line clear: While true,
liveness information is accurate, and can be used for register
scavenging. Once the flag is false, liveness information is not
accurate, and can only be used as a hint.
Late passes generally don't need the liveness information, but they will
sometimes use the register scavenger to help update it. The scavenger
enforces strict correctness, and we have to spend a lot of code to
update register liveness that may never be used.
llvm-svn: 153511
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
size bloat. Unfortunately, I expect this to disable the majority of the
benefit from r152737. I'm hopeful at least that it will fix PR12345. To
explain this requires... quite a bit of backstory I'm afraid.
TL;DR: The change in r152737 actually did The Wrong Thing for
linkonce-odr functions. This change makes it do the right thing. The
benefits we saw were simple luck, not any actual strategy. Benchmark
numbers after a mini-blog-post so that I've written down my thoughts on
why all of this works and doesn't work...
To understand what's going on here, you have to understand how the
"bottom-up" inliner actually works. There are two fundamental modes to
the inliner:
1) Standard fixed-cost bottom-up inlining. This is the mode we usually
   think about. It walks from the bottom of the CFG up to the top,
   looking at callsites, taking information about the callsite and the
   called function and computing th expected cost of inlining into that
   callsite. If the cost is under a fixed threshold, it inlines. It's
   a touch more complicated than that due to all the bonuses, weights,
   etc. Inlining the last callsite to an internal function gets higher
   weighth, etc. But essentially, this is the mode of operation.
2) Deferred bottom-up inlining (a term I just made up). This is the
   interesting mode for this patch an r152737. Initially, this works
   just like mode #1, but once we have the cost of inlining into the
   callsite, we don't just compare it with a fixed threshold. First, we
   check something else. Let's give some names to the entities at this
   point, or we'll end up hopelessly confused. We're considering
   inlining a function 'A' into its callsite within a function 'B'. We
   want to check whether 'B' has any callers, and whether it might be
   inlined into those callers. If so, we also check whether inlining 'A'
   into 'B' would block any of the opportunities for inlining 'B' into
   its callers. We take the sum of the costs of inlining 'B' into its
   callers where that inlining would be blocked by inlining 'A' into
   'B', and if that cost is less than the cost of inlining 'A' into 'B',
   then we skip inlining 'A' into 'B'.
Now, in order for #2 to make sense, we have to have some confidence that
we will actually have the opportunity to inline 'B' into its callers
when cheaper, *and* that we'll be able to revisit the decision and
inline 'A' into 'B' if that ever becomes the correct tradeoff. This
often isn't true for external functions -- we can see very few of their
callers, and we won't be able to re-consider inlining 'A' into 'B' if
'B' is external when we finally see more callers of 'B'. There are two
cases where we believe this to be true for C/C++ code: functions local
to a translation unit, and functions with an inline definition in every
translation unit which uses them. These are represented as internal
linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for
linkonce-odr in r152737.
Unfortunately, when I did that, I also introduced a subtle bug. There
was an implicit assumption that the last caller of the function within
the TU was the last caller of the function in the program. We want to
bonus the last caller of the function in the program by a huge amount
for inlining because inlining that callsite has very little cost.
Unfortunately, the last caller in the TU of a linkonce-odr function is
*not* the last caller in the program, and so we don't want to apply this
bonus. If we do, we can apply it to one callsite *per-TU*. Because of
the way deferred inlining works, when it sees this bonus applied to one
callsite in the TU for 'B', it decides that inlining 'B' is of the
*utmost* importance just so we can get that final bonus. It then
proceeds to essentially force deferred inlining regardless of the actual
cost tradeoff.
The result? PR12345: code bloat, code bloat, code bloat. Another result
is getting *damn* lucky on a few benchmarks, and the over-inlining
exposing critically important optimizations. I would very much like
a list of benchmarks that regress after this change goes in, with
bitcode before and after. This will help me greatly understand what
opportunities the current cost analysis is missing.
Initial benchmark numbers look very good. WebKit files that exhibited
the worst of PR12345 went from growing to shrinking compared to Clang
with r152737 reverted.
- Bootstrapped Clang is 3% smaller with this change.
- Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4%
  faster with this change.
Please let me know about any other performance impact you see. Thanks to
Nico for reporting and urging me to actually fix, Richard Smith, Duncan
Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues
today.
llvm-svn: 153506
 | 
| | 
| 
| 
|  | 
llvm-svn: 153502
 | 
| | 
| 
| 
|  | 
llvm-svn: 153500
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
MachinePointerInfo when getStore is called to create a node that stores an
argument passed in register to the stack. Without this change, the post RA 
scheduler will fail to discover the dependencies between the stores
instructions and the instructions that load from a structure passed by value. 
The link to the related discussion is here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-March/048055.html
llvm-svn: 153499
 |