| Commit message (Collapse) | Author | Age | Files | Lines |
| |
intermediate nodes have a single use across all results, not just the result that was used to reach the chain node.
This recovers a test case that was severely broken by r296476, by making sure we don't create an ADD/ADC that loads and stores when there is also a flag dependency.
llvm-svn: 296486
|
| |
enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads but then promote them to float operations, which requires
more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes, as these are captured by data dependences.
6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as in
some contexts invalid 64-bit operations are being generated. This
can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
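For illustration only (a hypothetical example, not part of the original commit), this is the kind of consecutive constant-store pattern the improved candidate search can now merge:
store i8 1, i8* %p
%p1 = getelementptr i8, i8* %p, i64 1
store i8 2, i8* %p1
%p2 = getelementptr i8, i8* %p, i64 2
store i8 3, i8* %p2
%p3 = getelementptr i8, i8* %p, i64 3
store i8 4, i8* %p3
Assuming %p is suitably aligned, these four i8 stores can be merged into a single i32 store of the combined constant.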
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296476
|
| |
other minor fixes (NFC).
llvm-svn: 296404
|
| |
createSwiftErrorEntriesInEntryBlock
Otherwise, it will insert instructions before it.
rdar://30536186
llvm-svn: 296395
|
| |
This reverts r295782. This could potentially result in some
legalization loops and I avoided the need for this.
llvm-svn: 296393
|
| |
DAGCombiner already supports peeking through shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.
This patch adds support for EXTRACT_VECTOR_ELT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios, including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB; there is more that could be done here (mentioned in TODOs), but I haven't found many cases where it's worth it.
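As a rough IR-level analogue (illustrative only; the patch itself operates on already-lowered target shuffle nodes), the underlying idea is:
%s = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
%e = extractelement <4 x i32> %s, i32 0
which can be folded to extract the shuffled source element directly:
%e = extractelement <4 x i32> %a, i32 3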
Differential Revision: https://reviews.llvm.org/D30176
llvm-svn: 296381
|
| |
targets
This pattern is essentially an i16 load from the p+1 address:
%p1.i16 = bitcast i8* %p to i16*
%p2.i8 = getelementptr i8, i8* %p, i64 2
%v1 = load i16, i16* %p1.i16
%v2.i8 = load i8, i8* %p2.i8
%v2 = zext i8 %v2.i8 to i16
%v1.shl = shl i16 %v1, 8
%res = or i16 %v1.shl, %v2
The current implementation would identify the %v1 load as the first byte load and would mistakenly emit an i16 load from the %p1.i16 address. This patch adds a check that the first byte is loaded from a non-zero offset of the first load's address, so that address can be used as the base address for the combined value. Otherwise, just give up combining.
llvm-svn: 296336
|
| |
This refactoring will simplify the upcoming change to fix the bug in folding patterns with non-zero offsets on BE targets.
llvm-svn: 296332
|
| |
load node
This refactoring will simplify the upcoming change to fix a bug in folding patterns with non-zero offsets on BE targets.
llvm-svn: 296331
|
| |
UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.
llvm-svn: 296279
|
| |
llvm-svn: 296259
|
| |
enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates the handling of non-interfering loads/stores from the
store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads but then promote them to float operations, which requires
more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes, as these are captured by data dependences.
6. Forward load-store values through TokenFactors containing
{CopyToReg,CopyFromReg} values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as in
some contexts invalid 64-bit operations are being generated. This
can be removed once appropriate checks are added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 296252
|
| |
The motivation for filling out these select-of-constants cases goes back to D24480,
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230
The idea is that we should always canonicalize patterns like this to a select-of-constants
in IR because that's the smallest IR and the best for value tracking. Note that we currently
do the opposite in some cases (like the cases in *this* patch). I.e., the proposed folds in
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151
As this patch shows, most targets generate better machine code for simple ext/add/not ops
rather than a select of constants. So the follow-up steps to make this less of a patchwork
of special-case folds and missing IR canonicalization:
1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2. Have InstCombine canonicalize in the other direction (create more selects).
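For illustration only (hypothetical IR, not taken from the patch), one such correspondence between a select of constants and ext/add ops:
%sel = select i1 %b, i32 4, i32 3
is equivalent to zero-extending the bool and adding the smaller constant:
%z = zext i1 %b to i32
%sel = add i32 %z, 3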
Differential Revision: https://reviews.llvm.org/D30180
llvm-svn: 296137
|
| |
llvm-svn: 296004
|
| |
r295336 causes a bootstrapped clang to fail for many compilations on
powerpc BE. See
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315
for example.
Reverting as per the developer's request.
llvm-svn: 295849
|
| |
Avoids test regressions in future AMDGPU commits when
more vector types are custom lowered.
llvm-svn: 295782
|
| |
llvm-svn: 295653
|
| |
Thanks to Mikael Holmén for the initial test case
llvm-svn: 295652
|
| |
llvm-svn: 295607
|
| |
I can't find any tests of the non-i1 code path, so it may be unnecessary at this point.
llvm-svn: 295463
|
| |
llvm-svn: 295453
|
| |
masks
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).
This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.
The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).
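Illustrative only (a hypothetical shuffle, assuming little-endian lane order): a mask with the "any_extend_vector_inreg" shape interleaves the low source elements with undef lanes:
%x = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef>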
Differential Revision: https://reviews.llvm.org/D29454
llvm-svn: 295451
|
| |
llvm-svn: 295447
|
| |
Resubmit -r295314 with PowerPC and AMDGPU tests updated.
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.
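A minimal sketch (hypothetical values, not from the patch) of the kind of pattern this now covers - the byte loads feeding the combine may already be zero/sign-extended, i.e. they become {z|s}extload nodes in the DAG:
%b0 = load i8, i8* %p0
%b1 = load i8, i8* %p1          ; assuming %p1 == %p0 + 1
%w0 = zext i8 %b0 to i16        ; selected as a zextload node
%w1 = zext i8 %b1 to i16
%hi = shl i16 %w1, 8
%res = or i16 %w0, %hi          ; on little-endian this is just the i16 load at %p0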
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295336
|
| |
load combine"
This change causes some AMDGPU and PowerPC tests to fail.
llvm-svn: 295316
|
| |
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295314
|
| |
Tests will be included with a future commit.
llvm-svn: 295242
|
| |
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.
This fixes PR31956.
Differential Revision: https://reviews.llvm.org/D29961
llvm-svn: 295213
|
| |
inputs are larger than the mask
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in an extract and creates a properly aligned start index for the extract.
This range finding is unnecessary; we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each index, we can't use an extract.
Reviewers: zvi, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29926
llvm-svn: 295152
|
| |
To assist in debugging ISel or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing the source locations of those patterns.
Enabled through the CMake variable LLVM_ENABLE_DAGISEL_COV.
llvm-svn: 295081
|
| |
llvm-svn: 295055
|
| |
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.
rdar://30495920
llvm-svn: 294982
|
| |
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.
llvm-svn: 294970
|
| |
into the same location of an undef vector can just use the original input to the extract.
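An IR-level analogue of the DAG fold (illustrative only):
%e = extractelement <4 x i32> %v, i32 2
%i = insertelement <4 x i32> undef, i32 %e, i32 2
Every lane of %i other than lane 2 is undef, so %i can simply be replaced by %v.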
llvm-svn: 294932
|
| |
EXTRACT_SUBVECTOR from an INSERT_SUBVECTOR.
This gives more parallelism opportunities for AVX-512 when dealing with 128-bit extracts from 512-bit vectors.
llvm-svn: 294930
|
| |
The bug was introduced with:
https://reviews.llvm.org/rL294863
...and manifests as a selection failure in x86, but that's actually
another bug. This fix prevents wrong codegen with -0.0, but in the
more common case when we have NSZ and NNAN (-ffast-math), we should
still be able to fold this setcc/compare.
llvm-svn: 294924
|
| |
generic to support larger concats.
llvm-svn: 294875
|
| |
I don't know if anything other than x86 vectors is affected by this change, but this may allow
us to remove target-specific intrinsics for blendv* (vector selects). The simplification arises
from the fact that blendv* instructions only use the sign-bit when deciding which vector element
to choose for the destination vector. The mechanism to fold VSELECT into SHRUNKBLEND nodes already
exists in x86 lowering; this demanded bits change just enables the transform to fire more often.
The original motivation starts with a bug for DSE of masked stores that seems completely unrelated,
but I've explained the likely steps in this series here:
https://llvm.org/bugs/show_bug.cgi?id=11210
Differential Revision: https://reviews.llvm.org/D29687
llvm-svn: 294863
|
| |
legalization
The patch comes in 2 parts:
1 - it makes use of the SelectionDAG::NewNodesMustHaveLegalTypes flag to tell when it can safely constant fold illegal types.
2 - it correctly resets SelectionDAG::NewNodesMustHaveLegalTypes at the start of each call to SelectionDAGISel::CodeGenAndEmitDAG so all the pre-legalization stages can make use of it - not just the first basic block that gets handled.
Fix for PR30760
Differential Revision: https://reviews.llvm.org/D29568
llvm-svn: 294749
|
| |
type legalization
Summary:
With -debug, we aren't dumping the DAG after legalizing vector ops. In particular, on X86 with AVX1 only, we don't dump the DAG after we split 256-bit integer ops into pairs of 128-bit ADDs since this occurs during vector legalization.
I'm only dumping if legalize vector ops changes something, since we don't print anything during legalize vector ops. So this dump shows up right after the first type-legalization dump. So if nothing changed, this second dump is unnecessary.
Having said that though, I think we should probably fix legalize vector ops to log what it's doing.
Reviewers: RKSimon, eli.friedman, spatel, arsenm, chandlerc
Reviewed By: RKSimon
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D29554
llvm-svn: 294711
|
| |
Summary:
Fix two bugs in SelectionDAGBuilder::FindMergedConditions reported by
Mikael Holmen. Handle non-canonicalized xor not operation
correctly (was assuming operand 0 was always the non-constant operand)
and check that the negated condition is also in the same block as the
original and/or instruction (as is done for and/or operands already)
before proceeding with optimization.
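For illustration (hypothetical IR, not from the patch), the two forms of a boolean not that are now both handled:
%na = xor i1 %a, true     ; canonical form: the constant is operand 1
%nb = xor i1 true, %b     ; non-canonicalized form that the code previously mishandled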
Reviewers: bogner, MatzeB, qcolombet
Subscribers: mcrosier, uabelho, llvm-commits
Differential Revision: https://reviews.llvm.org/D29680
llvm-svn: 294605
|
| |
Enable folding of patterns which load the value from a non-zero offset:
i8 *a = ...
i32 val = a[4] | (a[5] << 8) | (a[6] << 16) | (a[7] << 24)
=>
i32 val = *((i32*)(a+4))
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D29394
llvm-svn: 294582
|
| |
llvm-svn: 294494
|
| |
Summary: As per title.
Reviewers: mkuper, spatel, bkramer, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29528
llvm-svn: 294394
|
| |
Hoist entry block code for arguments and swift error values out of the
basic block instruction selection loop. Lowering arguments once up front
seems much more readable than doing it conditionally inside the loop. It
also makes it clear that argument lowering can update StaticAllocaMap
because no instructions have been selected yet.
Also use range-based for loops where possible.
llvm-svn: 294329
|
| |
llvm-svn: 294325
|
| |
This reverts commit r294186.
On an internal test, this triggers an out-of-memory error on PPC,
presumably because there is another DAG combine that does the exact
opposite, triggering an endless loop consuming more and more memory.
Chandler has started creating a reduced test case and we'll attach it
as soon as possible.
llvm-svn: 294288
|
| |
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D29397
llvm-svn: 294201
|
| |
Summary: As per title.
Reviewers: bkramer, sunfish, lattner, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29521
llvm-svn: 294188
|
| |
Summary: Leverage it to transform addc into add.
Reviewers: mkuper, spatel, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29524
llvm-svn: 294187
|