| Commit message (Collapse) | Author | Age | Files | Lines | 
| ... |  | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
We already have this fold when the lshr has one use, but it doesn't need that
restriction. We may be able to remove some code from foldShiftedShift().
Also, move the similar:
(X << C) >>u C --> X & (-1 >>u C)
...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().
That whole function seems questionable since it is called by commonShiftTransforms(),
but there's really not much in common if we're checking the shift opcodes for every
fold.
llvm-svn: 293215
 | 
| | 
| 
| 
|  | 
llvm-svn: 293213
 | 
| | 
| 
| 
|  | 
llvm-svn: 293212
 | 
| | 
| 
| 
| 
| 
|  | 
splat vectors
llvm-svn: 293208
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary: Refine floating point SQRT and DIV with accurate latency information.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D29191
llvm-svn: 293204
 | 
| | 
| 
| 
| 
| 
|  | 
This simplifies skipping debug instructions and shrinking ranges.
llvm-svn: 293202
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
1) Explicitly sets mayLoad/mayStore property in the tablegen files on load/store
   instructions.
2) Updated the flags on a number of intrinsics indicating that they write
    memory.
3) Added SDNPMemOperand flags for some target dependent SDNodes so that they
   propagate their memory operand
Review: https://reviews.llvm.org/D28818
llvm-svn: 293200
 | 
| | 
| 
| 
|  | 
llvm-svn: 293196
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.
This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.
Differential Revision: https://reviews.llvm.org/D28336
llvm-svn: 293189
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds
llvm-svn: 293188
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
tail calls - in LLVM
Summary:
This patch provides more staging for tail calls in XRay Arm32 . When the logging part of XRay is ready for tail calls, its support in the core part of XRay Arm32 may be as easy as changing the number passed to the handler from 1 to 2.
Coupled patch:
- https://reviews.llvm.org/D28674
Reviewers: dberris, rengolin
Reviewed By: dberris
Subscribers: llvm-commits, iid_iunknown, aemerson, rengolin, dberris
Differential Revision: https://reviews.llvm.org/D28673
llvm-svn: 293185
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
enabled.
    * Simplify Consecutive Merge Store Candidate Search
    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.
    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.
    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).
    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
    Additional Minor Changes:
      1. Finishes removing unused AliasLoad code
      2. Unifies the chain aggregation in the merged stores across code
         paths
      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.
      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.
      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence
      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.
      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)
      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.
    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.
    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:
      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.
    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293184
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
And teach shouldAssumeDSOLocal that ppc has no copy relocations.
The resulting code handle a few more case than before. For example, it
knows that a weak symbol can be resolved to another .o file, but it
will still be in the main executable.
llvm-svn: 293180
 | 
| | 
| 
| 
|  | 
llvm-svn: 293178
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Inlining in getAddExpr() can cause abnormal computational time in some cases.
New parameter -scev-addops-inline-threshold is intruduced with default value 500.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D28812
llvm-svn: 293176
 | 
| | 
| 
| 
| 
| 
|  | 
Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit.
llvm-svn: 293175
 | 
| | 
| 
| 
| 
| 
|  | 
Differential revision: https://reviews.llvm.org/D28980
llvm-svn: 293171
 | 
| | 
| 
| 
| 
| 
|  | 
This reverts commit r293164. There are multiple tests failing.
llvm-svn: 293170
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
change the set of uniform instructions in the loop causing an assert
failure.
The problem is that the legalization checking also builds data
structures mapping various facts about the loop body. The immediate
cause was the set of uniform instructions. If these then change when
LCSSA is formed, the data structures would already have been built and
become stale. The included test case triggered an assert in loop
vectorize that was reduced out of the new PM's pipeline.
The solution is to form LCSSA early enough that no information is cached
across the changes made. The only really obvious position is outside of
the main logic to vectorize the loop. This also has the advantage of
removing one case where forming LCSSA could mutate the loop but we
wouldn't track that as a "Changed" state.
If it is significantly advantageous to do some legalization checking
prior to this, we can do a more careful positioning but it seemed best
to just back off to a safe position first.
llvm-svn: 293168
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.
Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.
The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.
The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.
This partially resolves PR/23485.
Thanks to Brooks Davis for reporting the issue!
Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris
Differential Revision: https://reviews.llvm.org/D23652
llvm-svn: 293164
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.
When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.
Differential Revision: https://reviews.llvm.org/D27803
llvm-svn: 293163
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
with it.
This code was dereferencing the PoisoningVH which isn't allowed once it
is poisoned. But the code itself really doesn't need to access the
pointer, it is just doing the safe stuff of clearing out data structures
keyed on the pointer value.
Change the code to use iterators to erase directly from a DenseMap. This
is also substantially more efficient as it avoids lots of hashing and
lookups to do the erasure. DenseMap supports iterating behind the
iteration which is fairly easy to implement.
Sadly, I don't have a test case here. I'm not even close and I don't
know that I ever will be. The issue is that several of the tricky
aspects of fixing this only show up when you cause the stack's
SmallVector to be in *EXACTLY* the right location. I only ever got
a reproduction for those with Clang, and only with *exactly* the right
command line flags. Any adjustment, even to seemingly unrelated flags,
would make partial and half-way solutions magically start to "work". In
good news, all of this was caught with the LLVM test suite. Also, there
is no *specific* code here that is untested, just that the old pattern
of code won't immediately fail on any test case I've managed to
contrive.
llvm-svn: 293160
 | 
| | 
| 
| 
| 
| 
|  | 
DAG combine phase where I had originally meant to put it.
llvm-svn: 293157
 | 
| | 
| 
| 
| 
| 
|  | 
operations, use the correct type for the immediate operand.
llvm-svn: 293156
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Refactoring to remove duplications of this method.
New method getOperandsScalarizationOverhead() that looks at the present unique
operands and add extract costs for them. Old behaviour was to just add extract
costs for one operand of the type always, which still happens in
getArithmeticInstrCost() if no operands are provided by the caller.
This is a good start of improving on this, but there are more places
that can be improved by using getOperandsScalarizationOverhead().
Review: Hal Finkel
https://reviews.llvm.org/D29017
llvm-svn: 293155
 | 
| | 
| 
| 
| 
| 
|  | 
undef subvectors.
llvm-svn: 293152
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors.
Differential Revision: https://reviews.llvm.org/D28979
llvm-svn: 293151
 | 
| | 
| 
| 
|  | 
llvm-svn: 293150
 | 
| | 
| 
| 
|  | 
llvm-svn: 293148
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
factory functions for the two modes the loop unroller is actually used
in in-tree: simplified full-unrolling and the entire thing including
partial unrolling.
I've also wired these up to nice names so you can express both of these
being in a pipeline easily. This is a precursor to actually enabling
these parts of the O2 pipeline.
Differential Revision: https://reviews.llvm.org/D28897
llvm-svn: 293136
 | 
| | 
| 
| 
|  | 
llvm-svn: 293129
 | 
| | 
| 
| 
|  | 
llvm-svn: 293128
 | 
| | 
| 
| 
|  | 
llvm-svn: 293127
 | 
| | 
| 
| 
| 
| 
|  | 
sure it is not asan/msan-instrumented
llvm-svn: 293125
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Even when we don't create a remainder loop (that is, when we unroll by 2), we
may duplicate nested loops into the remainder. This is complicated by the fact
the remainder may itself be either inserted into an outer loop, or at the top
level. In the latter case, we may need to create new top-level loops.
Differential Revision: https://reviews.llvm.org/D29156
llvm-svn: 293124
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Otherwise we ask for a domtree node that's not there, and we crash.
Differential Revision:  https://reviews.llvm.org/D29145
llvm-svn: 293122
 | 
| | 
| 
| 
| 
| 
| 
|  | 
This is the opt/llc counterpart of -fsave-optimization-record to output
optimization remarks in a YAML file.
llvm-svn: 293121
 | 
| | 
| 
| 
| 
| 
|  | 
Thanks to Davide Italiano for finding the problem and providing a test case.
llvm-svn: 293119
 | 
| | 
| 
| 
| 
| 
|  | 
dumping the PCs
llvm-svn: 293117
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary: Fix comments in response to jlebar's comments in D27872.
Reviewers: jlebar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29109
llvm-svn: 293116
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
Previously we assumed that the result of sqrt(x) always had 0 as its
sign bit.  But sqrt(-0) == -0.
Reviewers: hfinkel, efriedma, sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28928
llvm-svn: 293115
 | 
| | 
| 
| 
|  | 
llvm-svn: 293112
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This allows MIR passes to emit optimization remarks with the same level
of functionality that is available to IR passes.
It also hooks up the greedy register allocator to report spills.  This
allows for interesting use cases like increasing interleaving on a loop
until spilling of registers is observed.
I still need to experiment whether reporting every spill scales but this
demonstrates for now that the functionality works from llc
using -pass-remarks*=<pass>.
Differential Revision: https://reviews.llvm.org/D29004
llvm-svn: 293110
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Code region is the only part of this class that is IR-specific.  Code
region is moved down in the inheritance tree to a new derived class,
called DiagnosticInfoIROptimization.
All the existing remarks are derived from this new class now.
This allows the new MIR pass-remark classes to be derived from
DiagnosticInfoOptimizationBase.
Also because we keep the name DiagnosticInfoOptimizationBase, the clang
parts don't need any adjustment.
Differential Revision: https://reviews.llvm.org/D29003
llvm-svn: 293109
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This eliminates one overload on the term Raw.
Differential Revision: https://reviews.llvm.org/D29098
llvm-svn: 293104
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
supports it"
This reverts commit r292680. It is causing significantly worse
performance and test timeouts in our internal builds. I have already
routed reproduction instructions your way.
llvm-svn: 293092
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is not a list of pairs, it is a hash table data structure. We now
correctly parse this out and dump it from llvm-pdbdump.
We still need to understand the conditions that lead to a type
getting an entry in the hash adjuster table.  That will be done
in a followup investigation / patch.
Differential Revision: https://reviews.llvm.org/D29090
llvm-svn: 293090
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Later code expects the vector loads produced to be directly
concatenable, which means we shouldn't pad anything except the last load
produced with UNDEF.
llvm-svn: 293088
 | 
| | 
| 
| 
| 
| 
|  | 
Thanks to Quentin for suggesting the refactoring.
llvm-svn: 293087
 | 
| | 
| 
| 
| 
| 
| 
|  | 
I think it's a hold-over from some previous iteration, but it's never
set to true in LLVM as it exists now.
llvm-svn: 293086
 |