llvm-svn: 215395

This will lower them using register copies rather than loads and stores
to the stack.
llvm-svn: 215270

These tests were using SI-NOT: MOVREL to make sure concat vectors
weren't being lowered to stack loads and stores, but we are using
scratch buffers for the stack now instead of registers, so we need
to add an additional SI-NOT check for scratch buffers.
With this change I was able to uncover one broken test which will
be fixed in a future commit.
llvm-svn: 215269

llvm-svn: 214991

This partially fixes weird-looking load scheduling
in the memcpy test. The load clustering doesn't seem
particularly smart, but this method seems to be partially
deprecated so it might not be worth trying to fix.
llvm-svn: 214943

This currently has a noticeable effect on the kernel argument loads.
LDS and global loads are more problematic, I think because of how copies
are currently inserted to ensure that the address is a VGPR.
llvm-svn: 214942

llvm-svn: 214866

SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.
llvm-svn: 214865

This slipped in with r214467, so something like
V_MOV_B32_e32 v0, ... is now printed with 2 spaces
between the instruction name and first operand.
llvm-svn: 214660

llvm-svn: 214612

This reverts commit r214566.
I did not mean to commit this yet.
llvm-svn: 214572

SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code
path for 8-bit and 16-bit private loads.
llvm-svn: 214566

Remove -CHECKs, use multiple prefixes, name values,
also test the @llvm.fabs version
llvm-svn: 214525

Abs/neg folding has moved out of foldOperands and into the instruction
selection phase using complex patterns. As a consequence of this
change, we now prefer to select the 64-bit encoding for most
instructions and the modifier operands have been dropped from
integer VOP3 instructions.
llvm-svn: 214467

This will prevent us from using extra MOV instructions once we prefer
selecting 64-bit instructions.
llvm-svn: 214464

We were commuting the instruction but still shrinking it using the
original opcode.
NOTE: This is a candidate for the 3.5 branch.
llvm-svn: 214463

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 214451

llvm-svn: 214108

The default guess uses i32. This needs an address space argument
to really do the right thing in all cases.
llvm-svn: 214104

Rename to allowsMisalignedMemoryAccesses.
On R600, 8- and 16-byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.
llvm-svn: 214055
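A hedged sketch of what such a hook override looks like; the signature follows the 2014-era TargetLowering interface, and the class name and exact policy are illustrative rather than the real R600 implementation:

```cpp
// Sketch only: loosen alignment requirements so 8- and 16-byte
// accesses with 4-byte alignment are not split by legalization.
// Illustrative class name and policy; not the actual R600 override.
bool MyTargetLowering::allowsMisalignedMemoryAccesses(EVT VT,
                                                      unsigned AddrSpace,
                                                      unsigned Align,
                                                      bool *IsFast) const {
  if (!VT.isSimple() || VT == MVT::Other)
    return false;

  // A wide access that is at least 4-byte aligned is handled as a
  // single load/store rather than being split into pieces.
  if (VT.getStoreSize() >= 8 && Align >= 4) {
    if (IsFast)
      *IsFast = true;
    return true;
  }
  return false;
}
```

Returning true here is what stops the generic code from breaking an underaligned vector load into element-sized accesses.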
There was no check prefix for the instruction lines.
Match what is emitted, although I'm pretty sure it is
incorrect.
llvm-svn: 214035

over each node in the worklist prior to combining.
This allows the combiner to produce new nodes which need to go back
through legalization. This is particularly useful when generating
operands to target specific nodes in a post-legalize DAG combine where
the operands are significantly easier to express as pre-legalized
operations. My immediate use case will be PSHUFB formation where we need
to build a constant shuffle mask with a build_vector node.
This also refactors the relevant functionality in the legalizer to
support this, and updates relevant tests. I've spoken to the R600 folks
and these changes look like improvements to them. The avx512 change
needs to be investigated; I suspect there is a disagreement between the
legalizer and the DAG combiner there, but it seems a minor issue, so it
is left to be re-evaluated after this patch.
Differential Revision: http://reviews.llvm.org/D4564
llvm-svn: 214020
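The hook this feeds into is PerformDAGCombine. A hedged sketch of a post-legalize target combine that builds a fresh constant build_vector operand, the situation this patch makes safe; MyTargetLowering and MyISD::SHUFFLE_BYTES are hypothetical names, and this is not the actual PSHUFB combine:

```cpp
// Hedged sketch of a post-legalize target DAG combine that creates a
// new build_vector operand. With this patch, such freshly created
// nodes can be run back through legalization by the combiner.
SDValue MyTargetLowering::PerformDAGCombine(SDNode *N,
                                            DAGCombinerInfo &DCI) const {
  SelectionDAG &DAG = DCI.DAG;
  // Only run after legalization; anything we build here must be
  // re-legalizable.
  if (DCI.isBeforeLegalize())
    return SDValue();
  if (N->getOpcode() != ISD::VECTOR_SHUFFLE)
    return SDValue();

  SDLoc DL(N);
  // Build a constant shuffle mask as a build_vector node (an identity
  // mask here, purely for illustration).
  SmallVector<SDValue, 16> MaskElts;
  for (unsigned i = 0; i != 16; ++i)
    MaskElts.push_back(DAG.getConstant(i, MVT::i8));
  SDValue Mask = DAG.getNode(ISD::BUILD_VECTOR, DL, MVT::v16i8, MaskElts);

  // MyISD::SHUFFLE_BYTES is a hypothetical target node standing in for
  // something like x86's PSHUFB.
  return DAG.getNode(MyISD::SHUFFLE_BYTES, DL, MVT::v16i8,
                     N->getOperand(0), Mask);
}
```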
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.
This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.
This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS code generation is
particularly exciting for programs that have this in the critical path,
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyway).
I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.
With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).
Differential Revision: http://reviews.llvm.org/D4638
llvm-svn: 213898
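A schematic of the worklist discipline this describes, as a hedged sketch; the member names and the Combine() callback are illustrative, not the actual DAGCombiner code:

```cpp
// Hedged sketch of the round-trip set described above.
SmallVector<SDNode *, 64> Worklist;
SmallPtrSet<SDNode *, 64> CombinedNodes; // nodes that round-tripped

void AddToWorklist(SDNode *N) {
  // Re-adding a node (e.g. one just created by a combine) clears its
  // round-tripped status so it will be combined (again).
  CombinedNodes.erase(N);
  Worklist.push_back(N);
}

void Run() {
  while (!Worklist.empty()) {
    SDNode *N = Worklist.pop_back_val();

    // Ensure every operand has round-tripped through the combiner
    // before N itself is combined; this is what catches nodes that
    // are *introduced* during combining.
    bool Revisit = false;
    for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) {
      SDNode *Op = N->getOperand(i).getNode();
      if (CombinedNodes.count(Op))
        continue;
      if (!Revisit) {
        Worklist.push_back(N); // come back to N after its operands
        Revisit = true;
      }
      Worklist.push_back(Op); // operands run first (LIFO order)
    }
    if (Revisit)
      continue;

    CombinedNodes.insert(N);
    Combine(N); // may AddToWorklist() newly created nodes
  }
}
```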
llvm-svn: 213882

llvm-svn: 213844

Use ComputeNumSignBits instead of checking for i8 / i16, which only
worked when AMDIL was lying about having legal i8 / i16.
If an integer is known to fit in 24 bits, we can
do division faster with float ops.
llvm-svn: 213843
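The sign-bit test works because a 32-bit value with at least 32 - 24 + 1 = 9 known sign bits occupies at most 24 significant bits, and every 24-bit integer is exactly representable as an f32, so the division can be done with the faster float ops. A hedged sketch of the check (simplified; not the actual AMDGPU code):

```cpp
// Sketch: a 32-bit value fits in 24 signed bits iff it has at least
// 9 sign bits. Such values convert to float exactly.
static bool fitsInSigned24Bits(SelectionDAG &DAG, SDValue Op) {
  return Op.getValueType() == MVT::i32 &&
         DAG.ComputeNumSignBits(Op) >= 9;
}
```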
insertions.
The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).
This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied, which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.
A major change required to make this work is to significantly harden the
way in which the DAG combiner handles nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.
I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.
Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.
Differential Revision: http://reviews.llvm.org/D4616
llvm-svn: 213727
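The instcombine-style worklist it moves to looks roughly like this, as a hedged sketch with illustrative member names rather than the actual DAGCombiner code:

```cpp
// Hedged sketch of the null-out worklist scheme borrowed from
// instcombine.
SmallVector<SDNode *, 64> Worklist;
DenseMap<SDNode *, unsigned> WorklistMap; // node -> index in Worklist

void AddToWorklist(SDNode *N) {
  // Duplicate insertions no longer grow the vector: a node already on
  // the worklist stays where it is.
  if (WorklistMap.insert(std::make_pair(N, Worklist.size())).second)
    Worklist.push_back(N);
}

void RemoveFromWorklist(SDNode *N) {
  auto It = WorklistMap.find(N);
  if (It == WorklistMap.end())
    return;
  // Null out the entry rather than erasing it, so removal is O(1) and
  // the recorded indices of other entries stay valid.
  Worklist[It->second] = nullptr;
  WorklistMap.erase(It);
}

SDNode *GetNextWorklistEntry() {
  // Skip the nulled-out tombstones left behind by removals.
  while (!Worklist.empty())
    if (SDNode *N = Worklist.pop_back_val()) {
      WorklistMap.erase(N);
      return N;
    }
  return nullptr;
}
```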
This pass converts 64-bit instructions to 32-bit when possible.
llvm-svn: 213561

There are a few more cleanups to do, but I ran into some problems
with ext loads and trunc stores, when I tried to change some of the
vector loads and stores from custom to legal, so I wasn't able to
get rid of everything.
llvm-svn: 213552

llvm-svn: 213551

This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.
This saves us the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.
llvm-svn: 213530

llvm-svn: 213528

llvm-svn: 213473

This probably was killed by some generic DAGCombiner
improvements in checking the TargetBooleanContents instead
of just 1.
llvm-svn: 213471

These instructions can only take a limited input range, and return
the constant value 1 for out-of-range inputs. We should do range reduction to
be able to process arbitrary values. Use a FRACT instruction after
normalization to achieve this. Also add a test for constant folding
with the lowered code with unsafe-fp-math enabled.
v2: use DAG lowering instead of intrinsic, adapt test
v3: calculate constant, fold pattern into instruction definition
v4: misc style fixes, add sin-fold testcase, cosmetics
Patch by Grigori Goronzy
llvm-svn: 213458
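The underlying identity is sin(x) = sin(2π · fract(x / (2π))): fract maps the scaled argument into [0, 1), so after normalization the hardware instruction only ever sees its supported range. A hedged plain-C++ model of the lowered sequence, where std::sin stands in for the hardware sin; this is an illustration, not the DAG lowering code:

```cpp
#include <cmath>

// Model of the range reduction: normalize by 1/(2*pi), take the
// fractional part, then evaluate sin on the reduced argument.
// By periodicity, sin(2*pi*fract(x/(2*pi))) == sin(x) for finite x,
// including negative x, since fract(x) = x - floor(x) is in [0, 1).
static float fract(float X) { return X - std::floor(X); }

float sinRangeReduced(float X) {
  const float TwoPi = 6.283185307179586f;
  float Reduced = fract(X / TwoPi);  // now in [0, 1)
  return std::sin(Reduced * TwoPi);  // argument now in [0, 2*pi)
}
```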
llvm-svn: 213376

Actual support for softening f16 operations is still limited, and can be added
when it's needed. But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics
for most targets; it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.
llvm-svn: 213372
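One way a target could express that middle ground with existing hooks, assuming f16 is a legal register type on the target in question, is operation-level promotion. A hedged sketch, not taken from any specific backend:

```cpp
// Hedged sketch: promote the four basic f16 operations to f32 instead
// of libcalling them. Assumes f16 is a legal register type on this
// target; not taken from a specific backend.
for (unsigned Op : {ISD::FADD, ISD::FSUB, ISD::FMUL, ISD::FDIV}) {
  setOperationAction(Op, MVT::f16, Promote);
  AddPromotedToType(Op, MVT::f16, MVT::f32);
}
```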
This test is actually going in the opposite direction to what the
filename and function name suggested.
llvm-svn: 213358

Unfortunately, we don't seem to have a direct truncation, but the
extension can be legally split into two operations so we should
support that.
llvm-svn: 213357

This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
llvm-svn: 213248
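The reason the extend can be split but the truncate cannot is rounding: both steps of f16 → f32 → f64 are exact, while f64 → f32 → f16 rounds twice and can disagree with a single correctly rounded f64 → f16 conversion. A hedged sketch of the two-step expansion; the node spelling has changed over time (FP16_TO_FP vs. FP16_TO_FP32), so treat it as illustrative:

```cpp
// Hedged sketch of expanding an f16 -> f64 extension as two extends.
// Both steps are exact (every f16 value is an f32 value, and every
// f32 value is an f64 value), so no double rounding occurs.
static SDValue expandFP16ToF64(SelectionDAG &DAG, SDLoc DL,
                               SDValue HalfBits) {
  // First widen the half to float...
  SDValue AsF32 = DAG.getNode(ISD::FP16_TO_FP, DL, MVT::f32, HalfBits);
  // ...then extend the float to double.
  return DAG.getNode(ISD::FP_EXTEND, DL, MVT::f64, AsF32);
}
```

No analogous two-step split exists for the truncate, which is why the message notes that a new libcall is probably needed there.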
These are precise enough to use for OpenCL unless denormals
are handled.
llvm-svn: 213107

llvm-svn: 213096

Assuming single precision denormals and accurate sqrt/div are not
reported, this passes the OpenCL conformance test.
llvm-svn: 213089

v2: use ffbh/l if available
v3: Rebase on top of Matt's SI patches
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 213072

This helps avoid redundant instructions to unpack and repack
the vectors. Ideally we could recognize that pattern and eliminate
it. Currently v4i8 and other small element type vectors are scalarized,
so this has the added bonus of avoiding that.
llvm-svn: 213031

llvm-svn: 213018

llvm-svn: 213017

Re-run tests changed in r211110 to test both paths.
Also fix broken check line.
llvm-svn: 212895

The unpromoted path still needs to be tested since we can't
always promote to using LDS.
llvm-svn: 212894

llvm-svn: 212870