|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE.  See 
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.
Reverting as per the developer's request.
llvm-svn: 295849 | 
| | 
| 
| 
| 
| 
| 
| | Avoids test regressions in future AMDGPU commits when
more vector types are custom lowered.
llvm-svn: 295782 | 
| | 
| 
| 
| | llvm-svn: 295653 | 
| | 
| 
| 
| 
| 
| | Thanks to Mikael Holmén for the initial test case
llvm-svn: 295652 | 
| | 
| 
| 
| | llvm-svn: 295607 | 
| | 
| 
| 
| 
| 
| | I can't find any tests of the non-i1 code path, so it may be unnecessary at this point.
llvm-svn: 295463 | 
| | 
| 
| 
| | llvm-svn: 295453 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | masks
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).
This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.
The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).
Differential Revision: https://reviews.llvm.org/D29454
llvm-svn: 295451 | 
| | 
| 
| 
| | llvm-svn: 295447 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Resubmit -r295314 with PowerPC and AMDGPU tests updated.
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295336 | 
| | 
| 
| 
| 
| 
| 
| 
| | load combine"
This change causes some of AMDGPU and PowerPC tests to fail.
llvm-svn: 295316 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295314 | 
| | 
| 
| 
| 
| 
| | Tests will be included with future commit.
llvm-svn: 295242 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.
This fixes PR311956.
Differential Revision: https://reviews.llvm.org/D29961
llvm-svn: 295213 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | inputs are larger than the mask
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in a extract and creates a properly aligned start index for the extract.
This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each indice we can't use an extract.
Reviewers: zvi, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29926
llvm-svn: 295152 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV
llvm-svn: 295081 | 
| | 
| 
| 
| | llvm-svn: 295055 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.
rdar://30495920
llvm-svn: 294982 | 
| | 
| 
| 
| 
| 
| 
| 
| | This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.
llvm-svn: 294970 | 
| | 
| 
| 
| 
| 
| | into the same location of a an undef vector can just use the original input to the extract.
llvm-svn: 294932 | 
| | 
| 
| 
| 
| 
| 
| 
| | EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.
llvm-svn: 294930 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | The bug was introduced with:
https://reviews.llvm.org/rL294863
...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should 
still be able to fold this setcc/compare.
llvm-svn: 294924 | 
| | 
| 
| 
| 
| 
| | generic to support larger concats.
llvm-svn: 294875 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | I don't know if anything other than x86 vectors is affected by this change, but this may allow 
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.
The original motivation starts with a bug for DSE of masked stores that seems completely unrelated, 
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210
Differential Revision: https://reviews.llvm.org/D29687
llvm-svn: 294863 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | legalization
The patch comes in 2 parts:
1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.
2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.
Fix for PR30760
Differential Revision: https://reviews.llvm.org/D29568
llvm-svn: 294749 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | type legalization
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.
I'm only dumping if the legalize vector ops changes something since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump happens. So if nothing changed this second dump is unnecessary.
Having said that though, I think we should probably fix legalize vector ops to log what its doing.
Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc
Reviewed By: RKSimon
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29554
llvm-svn: 294711 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen.  Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.
Reviewers: bogner, MatzeB, qcolombet
Subscribers: mcrosier, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D29680
llvm-svn: 294605 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Enable folding patterns which load the value from non-zero offset:
  i8 *a = ...
  i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24)
=>
  i32 val = *((i32*)(a+4))
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D29394
llvm-svn: 294582 | 
| | 
| 
| 
| | llvm-svn: 294494 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: As per title.
Reviewers: mkuper, spatel, bkramer, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29528
llvm-svn: 294394 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Hoist entry block code for arguments and swift error values out of the
basic block instruction selection loop. Lowering arguments once up front
seems much more readable than doing it conditionally inside the loop. It
also makes it clear that argument lowering can update StaticAllocaMap
because no instructions have been selected yet.
Also use range-based for loops where possible.
llvm-svn: 294329 | 
| | 
| 
| 
| | llvm-svn: 294325 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This reverts commit r294186.
On an internal test, this triggers an out-of-memory error on PPC,
presumably because there is another dagcombine that does the exact
opposite triggering and endless loop consuming more and more memory.
Chandler has started at creating a reduced test case and we'll attach it
as soon as possible.
llvm-svn: 294288 | 
| | 
| 
| 
| 
| 
| 
| 
| | Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D29397
llvm-svn: 294201 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: As per title.
Reviewers: bkramer, sunfish, lattner, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29521
llvm-svn: 294188 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: Leverage it to transform addc into add.
Reviewers: mkuper, spatel, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29524
llvm-svn: 294187 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: This is extracted from D29443 .
Reviewers: mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29564
llvm-svn: 294186 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.
We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.
This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.
Differential Revision: https://reviews.llvm.org/D29399
llvm-svn: 294183 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Without this change, the getVR() call would hit an assert since it was
being passed a physical register.
Update the AArch64/ldst-opt.ll test with a case that triggers this
behavior by adding a run with strict-align, which causes an unaligned
STR XZR instruction to be split into byte stores, creating an
EXTRACT_SUBREG of XZR that triggers the original problem.
Reviewers: bogner, qcolombet, MatzeB, atrick
Subscribers: aemerson, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D29495
llvm-svn: 294129 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: This avoid the need to duplicate all pattern and actually end up exposing some opportunity to optimize existing pattern that did not exists in both directions on an existing test case.
Reviewers: mkuper, spatel, bkramer, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29541
llvm-svn: 294125 | 
| | 
| 
| 
| 
| 
| | Based on similar code for INSERT_VECTOR_ELT.
llvm-svn: 294110 | 
| | 
| 
| 
| | llvm-svn: 294109 | 
| | 
| 
| 
| 
| 
| | legal below code that just canonicalizes INSERT_VECTOR_ELT without creating BUILD_VECTORS.
llvm-svn: 294108 | 
| | 
| 
| 
| | llvm-svn: 294091 | 
| | 
| 
| 
| 
| 
| 
| 
| | other minor fixes (NFC).
This is preparation to reduce TargetInstrInfo.h dependencies.
llvm-svn: 294084 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This re-applies commit r292189, reverted in r292191.
SelectionDAGBuilder recognizes libfuncs using some homegrown
parameter type-checking.
Use TLI instead, removing another heap of redundant code.
This isn't strictly NFC, as the SDAG code was too lax.
Concretely, this means changes are required to a few tests:
- calling a non-variadic function via a variadic prototype isn't OK;
  it just happens to work on x86_64 (but not on, e.g., aarch64).
- mempcpy has a size_t parameter;  the SDAG code accepts any integer
  type, which meant using i32 on x86_64 worked.
- a handful of SystemZ tests check the SDAG support for lax prototype
  checking: Ulrich agrees on removing them.
I don't think it's worth supporting any of these (IMO) invalid
testcases.  Instead, fix them to be more meaningful.
llvm-svn: 294028 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | ISD::DELETED_NODE && "NodeToMatch was removed partway through
selection"' failed.
NodeToMatch can be modified during matching, but code does not handle
this situation.
Differential Revision: https://reviews.llvm.org/D29292
llvm-svn: 294003 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.
llvm-svn: 293915 | 
| | 
| 
| 
| | llvm-svn: 293903 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | enabled.
    Recommiting after fixing X86 inc/dec chain bug.
    * Simplify Consecutive Merge Store Candidate Search
    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.
    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.
    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).
    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
    Additional Minor Changes:
      1. Finishes removing unused AliasLoad code
      2. Unifies the chain aggregation in the merged stores across code
         paths
      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.
      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.
      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence
      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.
      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)
      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.
    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.
    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:
      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.
    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293893 |