| Commit message |
|
Split into smaller functions and prepare for handling
non-entry functions.
llvm-svn: 299998
|
This does not do what it was being used for here, and requires
working around in LowerFormalArguments.
llvm-svn: 299667
|
llvm-svn: 299391
|
llvm-svn: 299372
|
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zeroes the high 16 bits of the result.
Alternatively, we could try making v2f16 legal and canonicalizing
on build_vectors.
llvm-svn: 299246
|
Currently ComputeNumSignBits returns the minimum number of sign bits across all elements of vector data, even when we may only be interested in one or some of the elements.
This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits entry point calls with a DemandedElts mask demanding all elements, to match current behaviour. Scalar types set this to 1.
I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far; all others default to demanding all elements, but can be updated in due course.
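A minimal sketch of the new query (a hypothetical helper; only the DemandedElts overload described above is assumed):

#include "llvm/ADT/APInt.h"
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Ask for the sign-bit count of a single lane of a vector value,
// rather than the conservative minimum across every lane.
static unsigned numSignBitsOfLane(SelectionDAG &DAG, SDValue Vec,
                                  unsigned Lane) {
  unsigned NumElts = Vec.getValueType().getVectorNumElements();
  APInt DemandedElts = APInt::getOneBitSet(NumElts, Lane); // one lane only
  return DAG.ComputeNumSignBits(Vec, DemandedElts);
}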
Followup to D25691.
Differential Revision: https://reviews.llvm.org/D31311
llvm-svn: 299219
|
computeKnownBitsForTargetNode
Follow-up to D25691, this sets up the plumbing necessary to support vector demanded-elements in known-bits calculations in target nodes.
Differential Revision: https://reviews.llvm.org/D31249
llvm-svn: 299201
|
computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI.
Based on comment in D31249.
llvm-svn: 298991
|
As we introduced the target triple environments amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the values by target triple.
The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which do not depend on the target triple, use static
const members, so that they don't occupy extra memory space and are equivalent
to compile-time constants.
Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage, or it can be added as a member to a pass and created at
the beginning of the run* function.
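A minimal sketch of the scheme (field names, numeric values, and the helper are illustrative, not the exact in-tree definition):

#include "llvm/ADT/Triple.h"
using namespace llvm;

struct AMDGPUAS {
  // Values that depend on the target triple environment.
  unsigned FLAT_ADDRESS;
  unsigned PRIVATE_ADDRESS;
  // Triple-independent values: static const members occupy no
  // per-instance storage and act as compile-time constants.
  static const unsigned LOCAL_ADDRESS = 3;
};

// Cheap to create on the fly at the point of usage.
inline AMDGPUAS getAMDGPUAS(const Triple &TT) {
  bool IsGIZ = TT.getEnvironmentName() == "amdgiz" ||
               TT.getEnvironmentName() == "amdgizcl";
  AMDGPUAS AS;
  AS.FLAT_ADDRESS = IsGIZ ? 0 : 4;    // illustrative numbering
  AS.PRIVATE_ADDRESS = IsGIZ ? 5 : 0; // illustrative numbering
  return AS;
}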
Differential Revision: https://reviews.llvm.org/D31284
llvm-svn: 298846
|
llvm-svn: 298730
|
This is used for a specific type of return to a shader part's
epilog code. Rename to try to avoid confusion with a true
call's return.
llvm-svn: 298452
|
Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the type list.
It's surprising this was working before. fdiv.fast had
the wrong number of parameters. The control flow intrinsic
declaration attributes were not being applied, and
their types were inconsistent. The actual IR use types
did not match the declaration, and were closer to the
types used for the patterns. The brcond lowering
was changing the types, so introduce new nodes for those.
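A sketch of the pattern (a hypothetical helper; only the Intrinsic::getDeclaration API named above is assumed):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Let Intrinsic::getDeclaration build the correctly typed declaration
// for an overloaded unary intrinsic, instead of hand-constructing the
// FunctionType (which is how mismatched parameter counts crept in).
static CallInst *emitUnaryIntrinsic(Module &M, IRBuilder<> &B,
                                    Intrinsic::ID ID, Value *X) {
  Function *Decl = Intrinsic::getDeclaration(&M, ID, {X->getType()});
  return B.CreateCall(Decl, {X});
}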
llvm-svn: 298119
|
computeKnownBits didn't handle fp_to_fp16, so it failed to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known to be 0.
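The known-bits fact being added, as a sketch (KnownZero stands in for the computeKnownBits output; this is not the patch code itself):

#include "llvm/ADT/APInt.h"
#include <cassert>
using namespace llvm;

// fp_to_fp16 yields a half value in the low 16 bits of the integer
// result, so every bit above bit 15 is known zero.
static void markFpToFp16HighBitsZero(APInt &KnownZero, unsigned BitWidth) {
  assert(BitWidth >= 16 && "result must be at least 16 bits wide");
  KnownZero |= APInt::getHighBitsSet(BitWidth, BitWidth - 16);
}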
llvm-svn: 297873
|
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.
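A plain C++ illustration of the intent (not the combiner code):

#include <cstdio>

static float rcp(float X) { return 1.0f / X; } // stand-in for v_rcp_f32

int main() {
  float X = 3.0f;
  // With arcp, X / 4.0f becomes X * rcp(4.0f); since the denominator is
  // constant, the rcp should fold away to the constant 0.25f instead of
  // surviving as an rcp instruction with a constant input.
  std::printf("%f %f\n", X * rcp(4.0f), X * 0.25f);
}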
llvm-svn: 297248
|
llvm-svn: 296396
|
Differential Revision: http://reviews.llvm.org/D29958
llvm-svn: 296186
|
Convert llvm.SI.packf16 test uses
llvm-svn: 295797
|
llvm-svn: 295789
|
Change the implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use them whether or not denormals are enabled.
Also allow using clamp with f16, and use knowledge
of dx10_clamp.
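A scalar sketch of why med3 can implement clamp (plain C++, NaN handling ignored; not the backend code):

#include <algorithm>

// med3 returns the middle of its three operands.
static float med3(float A, float B, float C) {
  return std::max(std::min(A, B), std::min(std::max(A, B), C));
}

// Clamp to [0,1] is the median of {X, 0, 1}, and a med3/max-based
// implementation does not flush denormal inputs the way an add can.
static float clamp01(float X) { return med3(X, 0.0f, 1.0f); }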
llvm-svn: 295788
|
llvm-svn: 294635
|
llvm-svn: 293972
|
llvm-svn: 293968
|
In multi-use cases this can save a few instructions.
llvm-svn: 293962
|
UseAA is enabled."
This reverts commit r293893, which was miscompiling lua on ARM and
breaking bootstrapping for x86-windows.
llvm-svn: 293915
|
enabled.
Recommiting after fixing X86 inc/dec chain bug.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates non-interfering loads/stores from the
store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations, which
requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes, as these are captured by data dependence
6. Forward load/store values through TokenFactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as
in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
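As a plain C++ illustration of the adjacent-store pattern this search now handles (not the DAG code itself):

#include <cstdint>

void writePair(uint32_t *P, uint32_t A, uint32_t B) {
  P[0] = A; // two independent 32-bit stores to adjacent addresses...
  P[1] = B; // ...can be recognized as parallel and merged into one
            // 64-bit store on targets that permit it
}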
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293893
|
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.
For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.
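A sketch of the bit patterns involved (plain C++ on the raw half bits; illustrative, not the matcher itself):

#include <cstdint>

// For targets without legal f16, |x| and -x on a half value show up as
// integer operations on its 16-bit container (the sign bit is bit 15):
static uint16_t absHalfBits(uint16_t H) { return H & 0x7fff; } // clear sign
static uint16_t negHalfBits(uint16_t H) { return H ^ 0x8000; } // flip sign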
llvm-svn: 293857
|
Use a more specific subtarget check and combine hasOneUse checks
llvm-svn: 293726
|
Fix build when global-isel is disabled and fix a warning.
Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
Differential Revision: https://reviews.llvm.org/D26730
llvm-svn: 293551
|
This reverts commit r293503.
Revert while I investigate some of the buildbot failures.
llvm-svn: 293509
|
Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.
Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm
Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris
Differential Revision: https://reviews.llvm.org/D26730
llvm-svn: 293503
|
We had various ways of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html).
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif
llvm-svn: 293359
|
UseAA is enabled."
This reverts commit r293184, which was failing in LTO builds
llvm-svn: 293188
|
enabled.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through a
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner,
as it separates non-interfering loads/stores from the
store-merging logic.
When merging stores, search up the chain through a single load, and
find all possible stores by looking down through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly construct
wider loads, but then promote them to float operations, which
requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code
paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seems sufficient to not cause regressions in
tests.
5. Remove Chain dependencies of Memory operations on CopyFromReg
nodes, as these are captured by data dependence
6. Forward load/store values through TokenFactors containing
{CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to
extract_subvector if possible (see
CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit, as
in some contexts invalid 64-bit operations are being
generated. This can be removed once appropriate checks are
added.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable, improving load-store forwarding. One test in
particular is worth noting:
CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
forwarding converts a load-store pair into a parallel store and
a memory-realized bitcast of the same value. However, because we
lose the sharing of the explicit and implicit store values we
must create another local store. A similar transformation
happens before SelectionDAG as well.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
llvm-svn: 293184
|
llvm-svn: 293127
|
Can't do this for fma/mad, since it seems they can't have flags currently.
llvm-svn: 292818
|
Add DUMMY_CHAIN SDNode to denote stores of interest
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915
Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411
Differential Revision: https://reviews.llvm.org/D27964
llvm-svn: 292651
|
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem, it also
applies there.
fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.
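A worked example of the hazard (plain C++, IEEE-754 semantics):

#include <cstdio>

int main() {
  float X = 1.0f, Y = -1.0f;        // X == -Y
  std::printf("%g\n", -(X + Y));    // prints -0
  std::printf("%g\n", (-X) + (-Y)); // prints 0
}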
llvm-svn: 292473
|
llvm-svn: 291792
|
llvm-svn: 291790
|
llvm-svn: 291784
|
llvm-svn: 291779
|
llvm-svn: 291778
|
llvm-svn: 291777
|
Patch mostly by Fiona Glaser
llvm-svn: 291733
|
Patch mostly by Fiona Glaser
llvm-svn: 291732
|
Patch mostly by Fiona Glaser
llvm-svn: 291731
|
Allows better source modifier usage.
llvm-svn: 291729
|
This was enabled without many specific tests or the comment.
llvm-svn: 291586
|
This will make the transition to SCRATCH_MEMORY easier.
Differential Revision: https://reviews.llvm.org/D24746
llvm-svn: 291279
|
v2: expose using amdgcn prefix
Differential Revision: https://reviews.llvm.org/D23511
llvm-svn: 290977
|