| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
| |
Legalize by casting to a 64-bit constant address. This isn't how the
DAG implements it, but it should.
llvm-svn: 371535
|
|
|
|
|
|
|
|
| |
There's still a lot more to do, but this handles decomposing due to
alignment. I've gotten it to the point where nothing crashes or
sends the legalizer into an infinite loop.
llvm-svn: 371533
|
|
|
|
|
|
|
|
| |
Reviewers: rampitec, vpykhtin
Differential Revision: https://reviews.llvm.org/D67101
llvm-svn: 371508
|
|
|
|
| |
llvm-svn: 371471
|
|
|
|
|
|
|
|
| |
Handle it the same way as G_BUILD_VECTOR_TRUNC. Arguably only
G_BUILD_VECTOR_TRUNC should be legal for this, but G_BUILD_VECTOR will
probably be more convenient in most cases.
llvm-svn: 371440
|
|
|
|
|
|
|
| |
This was getting chosen as the preferred 32-bit register class based
on how TableGen selects subregister classes.
llvm-svn: 371438
|
|
|
|
|
|
| |
Also fixes missing SubtargetPredicate on f16 class instructions.
llvm-svn: 371436
|
|
|
|
| |
llvm-svn: 371435
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This enables GlobalISel to handle various intrinsics. The custom node
pattern will be ignored, and the intrinsic will work. This will also
allow SelectionDAG to directly select the intrinsics, but as they are
all custom lowered to the nodes, this ends up leaving dead code in the
table.
Eventually either GlobalISel should add the equivalent of custom
nodes, or intrinsics should be directly used. These each have
different tradeoffs.
There are a few more to handle, but these are the easy ones.
Some others fail for other reasons.
llvm-svn: 371432
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unfortunately MnemonicAlias defines a "Predicates" field just like an
instruction or pattern, with a somewhat different interpretation.
This ends up overriding the intended Predicates set by
PredicateControl on the pseudoinstruction definitions with an empty
list. This allowed incorrectly selecting instructions that should have
been rejected due to the SubtargetPredicate from patterns on the
instruction definition.
This does remove the divergent predicate from the 64-bit shift
patterns, which were already not used for the 32-bit shift, so I'm not
sure what the point was. This also removes a second, redundant copy of
the 64-bit divergent patterns.
llvm-svn: 371427
|
|
|
|
|
|
| |
Handle the simple case that lowers to a constant.
llvm-svn: 371424
|
|
|
|
|
|
|
|
|
|
| |
Treat this as legal on gfx9 since it can use S_PACK_* instructions for
this.
This isn't used by anything yet. The same will probably apply to
16-bit G_BUILD_VECTOR without the trunc.
llvm-svn: 371423
|
|
|
|
|
|
|
| |
A new check for an explicitly atomic MMO is needed to avoid
incorrectly matching the pattern for non-atomic loads.
llvm-svn: 371418
|
|
|
|
| |
llvm-svn: 371416
|
|
|
|
|
|
| |
There are no scalar extloads.
llvm-svn: 371414
|
|
|
|
|
|
|
| |
Fixes 8-byte, 8-byte aligned LDS loads. 16-byte case still broken due
to not being reported as legal.
llvm-svn: 371413
|
|
|
|
| |
llvm-svn: 371412
|
|
|
|
|
|
|
| |
The pointer is always a VGPR. Also fix hardcoding the pointer size to
64.
llvm-svn: 371411
|
|
|
|
| |
llvm-svn: 371409
|
|
|
|
| |
llvm-svn: 371407
|
|
|
|
| |
llvm-svn: 371253
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D66958
llvm-svn: 371214
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes poor scheduling in a function containing a barrier and a few
load instructions.
Without this fix, ScheduleDAGInstrs::buildSchedGraph adds an artificial
edge in the dependency graph from the barrier instruction to the exit
node representing live-out latency, with a latency of about 500 cycles.
Because of this it thinks the critical path through the graph also has
a latency of about 500 cycles. And because of that it does not think
that any of the load instructions are on the critical path, so it
schedules them with no regard for their (80 cycle) latency, which gives
poor results.
Reviewers: arsenm, dstuttard, tpr, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67218
llvm-svn: 371192
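
The effect described in this summary can be illustrated with a toy latency-weighted graph. This is only a sketch: the node names and graph shape are hypothetical, only the 500 and 80 cycle figures come from the message above, and the second graph simply zeroes the barrier edge for comparison rather than modelling the actual patch.

```python
def longest_path_to_exit(edges, node, exit_node="exit"):
    """Longest latency-weighted path from node to exit_node.

    edges maps (src, dst) -> latency in cycles. The graph must be acyclic.
    """
    if node == exit_node:
        return 0
    lengths = [
        latency + longest_path_to_exit(edges, dst, exit_node)
        for (src, dst), latency in edges.items()
        if src == node
    ]
    return max(lengths, default=0)


def critical_path(edges, roots):
    """Critical path length: the longest path starting at any root."""
    return max(longest_path_to_exit(edges, root) for root in roots)


def on_critical_path(edges, node, roots):
    """True if a root node's longest path equals the critical path length."""
    return longest_path_to_exit(edges, node) == critical_path(edges, roots)


# Hypothetical graph: one load feeding one use, plus a barrier.
with_barrier_edge = {
    ("load", "use"): 80,       # load latency from the commit message
    ("use", "exit"): 1,        # assumed 1-cycle consumer
    ("barrier", "exit"): 500,  # artificial live-out latency edge
}
# Same graph with the artificial latency zeroed, for comparison only.
without_barrier_edge = dict(with_barrier_edge)
without_barrier_edge[("barrier", "exit")] = 0

roots = ["load", "barrier"]
```

With the 500-cycle edge in place the load's 81-cycle path is far shorter than the critical path, so a scheduler that prioritises critical-path nodes has no reason to issue the load early. Once the edge is zeroed, the load's path becomes the critical path.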
|
|
|
|
|
|
| |
There should probably be a size-only matcher.
llvm-svn: 371155
|
|
|
|
|
|
|
| |
Report soffset as a base register if the scratch resource can be
ignored.
llvm-svn: 371149
|
|
|
|
|
|
|
|
|
|
| |
The same stack is loaded for each workitem ID, and each use. Nothing
prevents you from creating multiple fixed stack objects with the same
offsets, so this was creating a load for each unique frame index,
despite them being the same offset. Re-use the same frame index so the
loads are CSEable.
llvm-svn: 371148
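
The idea of re-using one frame index per offset can be sketched as a small cache. The `FrameInfo` class and its method names below are hypothetical stand-ins, not LLVM's `MachineFrameInfo` API; the sketch only shows why keying on the offset yields a single index that later loads can share.

```python
class FrameInfo:
    """Hypothetical frame-object table (not LLVM's MachineFrameInfo)."""

    def __init__(self):
        self.objects = []        # list of (size, offset), indexed by frame index
        self._fixed_cache = {}   # (size, offset) -> frame index

    def create_fixed_object(self, size, offset):
        # Always allocates a fresh index, even if the offset was seen before.
        self.objects.append((size, offset))
        return len(self.objects) - 1

    def get_or_create_fixed_object(self, size, offset):
        # Re-use the existing index for an identical (size, offset) pair,
        # so every load of that slot refers to the same frame index.
        key = (size, offset)
        if key not in self._fixed_cache:
            self._fixed_cache[key] = self.create_fixed_object(size, offset)
        return self._fixed_cache[key]
```

Two calls to `create_fixed_object(4, 0)` return different indexes, so loads from them look distinct to CSE. Two calls to `get_or_create_fixed_object(4, 0)` return the same index.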
|
|
|
|
|
|
|
| |
The library currently uses ptrtoint and directly checks the queue ptr
for this, which counts as a pointer capture.
llvm-svn: 371009
|
|
|
|
|
|
| |
Avoids SSA violations in a future patch.
llvm-svn: 371008
|
|
|
|
| |
llvm-svn: 371007
|
|
|
|
| |
llvm-svn: 371006
|
|
|
|
|
|
|
| |
GOTPCREL32 doesn't exist on COFF, so it isn't used when this test runs
on Windows.
llvm-svn: 371000
|
|
|
|
| |
llvm-svn: 370980
|
|
|
|
| |
llvm-svn: 370979
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since an add instruction must produce an unused carry out, this
requires additional SGPRs. This can be avoided by keeping the entire
offset computation in SGPRs. If one SGPR is still available, this only
costs one extra mov. If none are available, the entire computation can
be done in place and reversed.
This does assume the use is a VGPR operand. This was already assumed,
and we currently only select frame indexes to VALU instructions. This
should probably be fixed at some point to handle more possible MIR.
llvm-svn: 370929
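
The "done in place and reversed" fallback can be sketched with plain 32-bit wraparound arithmetic. This is an illustration under simplifying assumptions: the register file is a dictionary, the register name and the `use` callback are hypothetical, and any scaling or shifting the real lowering performs is omitted.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit scalar register


def use_offset_in_place(regs, frame_reg, offset, use):
    """Temporarily add offset to frame_reg, call use(value), then undo it.

    No extra register is needed: the add is reversed by a matching subtract,
    which restores the original value even if the add wrapped around.
    """
    regs[frame_reg] = (regs[frame_reg] + offset) & MASK32
    try:
        return use(regs[frame_reg])
    finally:
        regs[frame_reg] = (regs[frame_reg] - offset) & MASK32
```

For example, with `regs = {"s32": 0x100}` the call `use_offset_in_place(regs, "s32", 0x40, lambda v: v)` hands `0x140` to the consumer and leaves `regs["s32"]` at `0x100` afterwards.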
|
|
|
|
|
|
| |
This is mostly for the benefit of patterns which use 16-bit constants.
llvm-svn: 370921
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SI_PC_ADD_REL_OFFSET is 0"
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.
Reviewers: nhaehnle, arsenm, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65813
llvm-svn: 370667
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add test checking that the redundant immediate MOV instruction
(by-product of handling phi nodes) is not found in the generated code.
Reviewers: arsenm, anton-afanasyev, craig.topper, rtereshin, bogner
Reviewed By: arsenm
Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, wdng, jvesely, nhaehnle, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63860
llvm-svn: 370634
|
|
|
|
| |
llvm-svn: 370402
|
|
|
|
|
|
|
|
| |
This reverts r369664 (git commit 51f48295cbe8fa3a44db263b528dd9f7bae7bf9a)
It causes many benchmark regressions, internally and in llvm's benchmark suite.
llvm-svn: 370398
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SGPR spills aren't really handled after SILowerSGPRSpills. In order to
directly control what happens if the scavenger needs to spill, the
scavenger needs to be used directly. There is an alternative to
spilling in these contexts anyway since the frame register can be
incremented and restored.
This does present another possible issue if spilling is needed for the
unused carry out if an add is needed. I think this can be avoided by
using a scalar add (although that clobbers SCC, which happens anyway).
llvm-svn: 370281
|
|
|
|
|
|
|
|
|
|
|
| |
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.
This mostly enables G_FCMP for AMDGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.
llvm-svn: 370280
|
|
|
|
|
|
|
|
|
| |
This reduces the number of SGPRs due to some concerns about running
out of SGPRs if you make all the SGPRs that aren't reserved available
for the calling convention.
Change-Id: Idb4ca4dc72f5b6808cb524ff7270915a8de5b4c1
llvm-svn: 370215
|
|
|
|
|
|
|
| |
If the result register already had a register class assigned, the
sources may not have been properly constrained.
llvm-svn: 370150
|
|
|
|
| |
llvm-svn: 370140
|
|
|
|
|
|
|
|
|
|
| |
Copied directly from the IR version.
Most of the testcases I've added for this are somewhat problematic
because they really end up testing the yet to be implemented version
for MUL_I24/MUL_U24.
llvm-svn: 370099
|
|
|
|
| |
llvm-svn: 370098
|
|
|
|
|
|
|
| |
This is something of a workaround since computeRegisterProperties
seems to be doing the wrong thing.
llvm-svn: 370086
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(-X) * (-Y) + Z --> X * Y + Z
This is a missing optimization that shows up as a potential regression in D66050,
so we should solve it first. We appear to be partly missing this fold in IR as well.
We do handle the simpler case already:
(-X) * (-Y) --> X * Y
And it might be beneficial to make the constraint less conservative (eg, if both
operands are cheap, but not necessarily cheaper), but that causes infinite looping
for the existing fmul transform.
Differential Revision: https://reviews.llvm.org/D66755
llvm-svn: 370071
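
The fold above can be sketched on a tiny tuple-based expression tree. The node encoding (`"fma"`, `"fneg"`) and the function names are hypothetical, and the sketch ignores the cost check ("cheaper" negation) that the message says guards the real transform.

```python
def is_fneg(expr):
    """True for a node of the form ("fneg", X)."""
    return isinstance(expr, tuple) and len(expr) == 2 and expr[0] == "fneg"


def fold_double_neg_fma(expr):
    """Rewrite ("fma", ("fneg", X), ("fneg", Y), Z) to ("fma", X, Y, Z).

    Any other expression is returned unchanged.
    """
    if isinstance(expr, tuple) and len(expr) == 4 and expr[0] == "fma":
        _, a, b, c = expr
        if is_fneg(a) and is_fneg(b):
            return ("fma", a[1], b[1], c)
    return expr


def mul_add(x, y, z):
    """Unfused x * y + z, used only to spot-check the identity numerically."""
    return x * y + z
```

The rewrite is value-preserving for IEEE floats because negating both operands of a multiply flips the sign of the product twice, so `mul_add(-x, -y, z)` and `mul_add(x, y, z)` agree for non-NaN inputs.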
|
|
|
|
|
|
|
|
|
| |
Fix typos. Use Hi and Lo prefixes for Or instead of LHS and RHS
to match names of surrounding variables.
Differential Revision: https://reviews.llvm.org/D66587
llvm-svn: 370062
|
|
|
|
|
|
|
|
|
| |
The problem these are supposed to work around can occur before the
intrinsics are lowered into the nodes. Try to directly simplify them
so they are matched before the bit assert operations can be optimized
out.
llvm-svn: 369994
|