| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
| |
Tests deferred since the existing DAG test depends on some other
operations, but isn't far from working as-is.
|
| |
|
|
|
|
|
|
|
|
| |
Follow-up of D72172.
Reviewed By: jhenderson, rnk
Differential Revision: https://reviews.llvm.org/D72180
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.
It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.
Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.
The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.
In any case, downstream projects which don't know `Address` can pass 0 as
the argument.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72172
|
| |
|
|
|
|
| |
This will enable automatic GlobalISel support in a future commit.
|
|
|
|
|
|
|
| |
We have a lot of complex pattern variants that just set the source
modifiers that are really handled, and then set the output modifiers
to 0. We're unlikely to ever match output modifiers from the use
instruction side, and we already match clamp/omod in a separate pass.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This solves selection failures with generated selection patterns,
which would fail due to inferring the SGPR reg bank for virtual
registers with a set register class instead of VCC bank. Use
instruction selection would constrain the virtual register to a
specific class, so when the def was selected later the bank no longer
was set to VCC.
Remove the SCC reg bank. SCC isn't directly addressable, so it
requires copying from SCC to an allocatable 32-bit register during
selection, so these might as well be treated as 32-bit SGPR values.
Now any scalar boolean value that will produce an outupt in SCC should
be widened during RegBankSelect to s32. Any s1 value should be a
vector boolean during selection. This makes the vcc register bank
unambiguous with a normal SGPR during selection.
Summary of how this should now work:
- G_TRUNC is always a no-op, and never should use a vcc bank result.
- SALU boolean operations should be promoted to s32 in RegBankSelect
apply mapping
- An s1 value means vcc bank at selection. The exception is for
legalization artifacts that use s1, which are never VCC. All other
contexts should infer the VCC register classes for s1 typed
registers. The LLT for the register is now needed to infer the
correct register class. Extensions with vcc sources should be
legalized to a select of constants during RegBankSelect.
- Copy from non-vcc to vcc ensures high bits of the input value are
cleared during selection.
- SALU boolean inputs should ensure the inputs are 0/1. This includes
select, conditional branches, and carry-ins.
There are a few somewhat dirty details. One is that G_TRUNC/G_*EXT
selection ignores the usual register-bank from register class
functions, and can't handle truncates with VCC result banks. I think
this is OK, since the artifacts are specially treated anyway. This
does require some care to avoid producing cases with vcc. There will
also be no 100% reliable way to verify this rule is followed in
selection in case of register classes, and violations manifests
themselves as invalid copy instructions much later.
Standard phi handling also only considers the bank of the result
register, and doesn't insert copies to make the source banks
match. This doesn't work for vcc, so we have to manually correct phi
inputs in this case. We should add a verifier check to make sure there
are no phis with mixed vcc and non-vcc register bank inputs.
There's also some duplication with the LegalizerHelper, and some code
which should live in the helper. I don't see a good way to share
special knowledge about what types to use for intermediate operations
depending on the bank for example. Using the helper to replace
extensions with selects also seems somewhat awkward to me.
Another issue is there are some contexts calling
getRegBankFromRegClass that apparently don't have the LLT type for the
register, but I haven't yet run into a real issue from this.
This also introduces new unnecessary instructions in most cases, since
we don't yet try to optimize out the zext when the source is known to
come from a compare.
|
|
|
|
|
| |
Mostly copied from AMDGPU lowering implementation, except used
G_SITOFP instead of directly creating a select on -1.0, 0.0.
|
|
|
|
|
|
|
|
| |
This would complain about invalid legalizer rules otherwise.
Mark some operations as unsupported for AMDGPU. This currently seems
to produce the same legalize error as when no rules are defined, but
eventually this should produce a proper user facing error.
|
|
|
|
|
|
| |
The existing test only covered one case for r600. The use of
mul_legacy also looks suspicious to me, but leave it for now. The
patterns are also not making use of source modifiers.
|
|
|
|
|
| |
This solves one GlobalISel importer error, but the pattern still fails
for another reason.
|
| |
|
|
|
|
| |
Add "unreachable" default case to AMDGPUTargetStreamer::getArchNameFromElfMach
|
| |
|
|
|
|
|
|
|
|
| |
This assumed a 32-bit extract size, which would produce invalid copies
with 64-bit extracts. Handle the easy case. Ideally we would have a
way to get the proper subreg index for any 32-bit offset, but there
should probably be a tablegenerated way of getting the subreg index
for any size and offset.
|
|
|
|
|
|
|
|
| |
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72143
Patch by Kazuaki Ishizaki.
|
|
|
|
|
|
| |
Implementing the APFloat part in PR4745.
Differential Revision: https://reviews.llvm.org/D69770
|
|
|
|
|
|
| |
This only handled G_SDIV, but they all are trivially scalarizable.
Also define placeholder AMDGPU division legalizer rules.
|
|
|
|
|
| |
Fix selecting these for volatile global loads, and ensure the loads
are constant enough.
|
|
|
|
|
| |
The attempts to widen sufficently aligned, odd sized loads wasn't
consistently applied.
|
|
|
|
|
|
|
| |
This produces more intelligible looking results, more comparabble to
the DAG output in the simplest cases. This is probably wrong in
complex control flow, but RegBankSelect doesn't attempt analyzing if
this is on a masked path for selecting the bank yet.
|
|
|
|
|
|
|
|
|
|
|
| |
We're checking the current register bank of the registers in the
instruction, but the mapping may have inserted cross bank copies and
is expecting to replace the registers.
We mostly get away with this currently, because VGPR->SGPR copies are
illegal, and we assume this won't happen. In a future change, we'll
start relying on more cross register bank copies being inserted, and
this starts to break down.
|
|
|
|
|
|
|
|
|
|
| |
same address to avoid WAR conflict.
Reviewers: rampitec, vpykhtin, nhaehnle
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D71934
|
|
|
|
|
|
|
|
|
|
| |
We can revert region schedule if new schedule decreases occupancy.
However, if we already have only one wave we would accept any new
schedule even if it blows up register pressure. Such schedule may
result in quite heavy spilling which can be avoided if we reject
this new schedule.
Differential Revision: https://reviews.llvm.org/D72181
|
|
|
|
|
|
| |
AMDGPU can't unambiguously go back from the selected instruction
register class to the register bank without knowing if this was used
in a boolean context.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- CF users won't be non-instruction values. Skip them to save the
compilation time. It's especially true when there are multiple
functions in that module, where, says, a constant may be used in most
functions. The current CF user tracing adds significant overhead.
Reviewers: alex-t, rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72174
|
|
|
|
|
|
| |
There are some things that are shareable between the legalizer,
regbankselect, and the selector that don't have an obvious place to
go.
|
|
|
|
|
| |
This would incorrectly allowing folding immediates. These currently
aren't selectable, but will be from GlobalISel soon.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
|
|
|
|
| |
This should be looking at the RHS of the add for a constant.
|
|
|
|
|
| |
The tablegen emitter now handles the immediate operand correctly, so
let the generatedd matcher works.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: - Other counters are accidentally cleared.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71866
|
|
|
|
|
|
|
|
|
|
|
| |
instead of creating a MERGE_VALUES node. NFCI
This allows us to clean up some places that were peeking through
the MERGE_VALUES node after the call. By returning the SDValues
directly, we can clean that up.
Unfortunately, there are several call sites in AMDGPU that wanted
the MERGE_VALUES and now need to create their own.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
The path already used for f16/f32 works a lot better when v_trunc_f64
is available.
|
|
|
|
|
|
|
|
|
| |
Sometimes the result bank of the phi is already assigned to something,
and should not be ignored. This is in preparation for additional
boolean phi handling changes.
Also refine the logic to fix some cases that were incorrectly deciding
to use SGPRs.
|
|
|
|
|
| |
This matches the DAG behavior where we don't use SReg_32_XM0
everywhere anymore, and fixes not coalescing the copies into m0.
|
| |
|
| |
|
|
|
|
|
|
| |
There ended up being two result registers, which would fail on
select. It was really defing a new temp register in the correct def
position, instead of the correct result register.
|
| |
|
| |
|