| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
This reverts commit r285939 and r285948. These broke some conformance tests.
llvm-svn: 285995
|
|
|
|
|
|
|
|
| |
Patch By: Wei Ding
Differential Revision: https://reviews.llvm.org/D18049
llvm-svn: 285939
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
A single flat memory operations that might access the scratch buffer
can only access MaxPrivateElementSize bytes.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25788
llvm-svn: 285198
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.
This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.
Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.
Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.
This fixes several GL45-CTS.shaders.indexing.* tests.
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25633
llvm-svn: 284980
|
|
|
|
|
|
|
|
|
| |
This will prevent following regression when enabling i16 support (D18049):
test/CodeGen/AMDGPU/cvt_f32_ubyte.ll
Differential Revision: https://reviews.llvm.org/D25805
llvm-svn: 284891
|
|
|
|
|
|
|
|
| |
relocations instead of fixups (amdhsa only)
Differential Revision: https://reviews.llvm.org/D25693
llvm-svn: 284759
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25203
llvm-svn: 284398
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25198
llvm-svn: 284397
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The hardware doesn't support this.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25523
llvm-svn: 284257
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25312
llvm-svn: 284215
|
|
|
|
|
|
|
|
| |
and global address space variables
Differential Revision: https://reviews.llvm.org/D25562
llvm-svn: 284196
|
|
|
|
|
|
|
|
| |
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.
llvm-svn: 284119
|
|
|
|
| |
llvm-svn: 284116
|
|
|
|
|
|
|
|
|
|
|
| |
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.
llvm-svn: 284031
|
|
|
|
|
|
|
| |
Allow inserting multiple instructions in the
expanded loop.
llvm-svn: 283177
|
|
|
|
|
|
|
|
| |
instructions
Differential Revision: https://reviews.llvm.org/D24125
llvm-svn: 282624
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.
The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.
llvm-svn: 281824
|
|
|
|
|
|
|
|
|
|
|
| |
These clean up some unnecessary or instructions in
cases with complex loops.
In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.
llvm-svn: 281786
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument. The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.
This consolidates all the logic AMDGPU uses for deducing memory types into a single
function. This will make it much easier to support different ABIs in the future.
Reviewers: arsenm
Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24614
llvm-svn: 281781
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22783
llvm-svn: 281779
|
|
|
|
|
|
|
|
| |
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.
llvm-svn: 281488
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
An IR load can be invariant, dereferenceable, neither, or both. But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.
This patch splits up the notions of invariance and dereferenceability at
the MI level. It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
|
|
|
|
|
|
|
| |
s should be SReg_32 to be as general as possible. This can avoid a copy
from m0.
llvm-svn: 280154
|
|
|
|
|
|
|
|
| |
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.
llvm-svn: 279901
|
|
|
|
| |
llvm-svn: 279408
|
|
|
|
| |
llvm-svn: 278965
|
|
|
|
|
|
|
| |
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.
llvm-svn: 278902
|
|
|
|
|
|
| |
Add test if the constant offset looks unaligned.
llvm-svn: 278589
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
adjustments.
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.
Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
for a maximum size allowed and relies on the next condition checking for %4 for correctness.
This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).
Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.
Reviewers: arsenm, jlebar, tstellarAMD
Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D23068
llvm-svn: 277735
|
|
|
|
| |
llvm-svn: 277535
|
|
|
|
|
|
|
| |
This should really be true for any immediate, not just
inline ones.
llvm-svn: 277260
|
|
|
|
|
|
|
| |
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.
llvm-svn: 277017
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D22482
llvm-svn: 276998
|
|
|
|
|
|
|
| |
This no longer uses the more complicated classification
of constants.
llvm-svn: 276945
|
|
|
|
|
|
|
|
|
| |
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.
llvm-svn: 276766
|
|
|
|
|
|
|
| |
This could use some additional optimization work
to use mad/mac legacy.
llvm-svn: 276764
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D22732
llvm-svn: 276682
|
|
|
|
|
|
|
|
|
| |
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.
This should probably be merged to the release branch.
llvm-svn: 276438
|
|
|
|
| |
llvm-svn: 276437
|
|
|
|
| |
llvm-svn: 276434
|
|
|
|
| |
llvm-svn: 276257
|
|
|
|
|
|
|
|
|
|
|
| |
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.
Simplify the lowering tests by using per function
subtarget features.
llvm-svn: 276051
|
|
|
|
|
|
|
|
|
|
|
| |
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.
These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.
llvm-svn: 275988
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.
The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.
v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.
llvm-svn: 275934
|
|
|
|
| |
llvm-svn: 275871
|
|
|
|
|
|
|
|
| |
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.
Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.
llvm-svn: 275617
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).
Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.
This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted. It also greatly simplifies the process of adding another flag
to getLoad.
Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits
Differential Revision: http://reviews.llvm.org/D22249
llvm-svn: 275592
|
|
|
|
| |
llvm-svn: 275508
|
|
|
|
| |
llvm-svn: 275371
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21484
llvm-svn: 275268
|