| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 264736
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.
shader-db stats:
2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18451
llvm-svn: 264589
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're erasing MI here, but then immediately using it again inside the
`if`. This moves the erase after we're done using it.
Doing that reveals a second problem though - this case is missing a
break, so we fall through to the default and dereference MI again.
This is obviously a bug, though I don't know how to write a test that
triggers it - all we do in the error case is print some extra debug
output.
Both of these issue crash on lots of tests under ASAN with the
recycling allocator changes from PR26808 applied.
llvm-svn: 264442
|
|
|
|
|
|
|
| |
This resolves bug 21148 by preventing promotion to
i64 induction variables.
llvm-svn: 264376
|
|
|
|
| |
llvm-svn: 264374
|
|
|
|
|
|
| |
We don't want to have a cost to scalarizing operations.
llvm-svn: 264364
|
|
|
|
|
|
| |
Patch By: Sonny Jiang
llvm-svn: 264295
|
|
|
|
| |
llvm-svn: 264255
|
|
|
|
|
|
|
| |
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max
llvm-svn: 264215
|
|
|
|
| |
llvm-svn: 264214
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Strengthen tests of storing frame indices.
Right now this just creates irrelevant scheduling changes.
We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.
There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.
This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.
llvm-svn: 264200
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D18351
llvm-svn: 264137
|
|
|
|
|
|
|
|
|
|
| |
We can statically decide whether or not a register pressure set is for
SGPRs or VGPRs, so we don't need to re-compute this information in
SIRegisterInfo::getRegPressureSetLimit().
Differential Revision: http://reviews.llvm.org/D14805
llvm-svn: 264126
|
|
|
|
|
|
|
| |
Fixes Valgrind errors on the test cases that were reported as failing
by buildbots.
llvm-svn: 264000
|
|
|
|
|
|
|
| |
I meant to add these before committing r263982 as per the review,
but I forgot to squash.
llvm-svn: 263983
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).
This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.
This pass is run before register coalescing so that we can use
machine SSA for analysis.
The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.
This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18162
llvm-svn: 263982
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.
The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18282
llvm-svn: 263969
|
|
|
|
|
|
|
| |
These instructions do not have the same negative base
address problem that DS instructions do on SI.
llvm-svn: 263964
|
|
|
|
| |
llvm-svn: 263948
|
|
|
|
|
|
| |
This fixes an issue with rL263658 pointed out by Tom Stellard.
llvm-svn: 263823
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18255
llvm-svn: 263792
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).
The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, rivanvx, llvm-commits
Differential Revision: http://reviews.llvm.org/D18151
llvm-svn: 263791
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.
Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18218
llvm-svn: 263790
|
|
|
|
|
| |
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Symmary:
ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.
Reviewers
arsenm, tstellarAMD
Subscribers
llvm-commits, arsenm
Differential Revision:
http://reviews.llvm.org/D18197
llvm-svn: 263720
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18156
llvm-svn: 263719
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18137
llvm-svn: 263658
|
|
|
|
|
|
|
|
|
|
|
|
| |
And emit an error if it fails.
This prevents illegal instructions from getting sent to the GPU, which
would potentially result in a hang.
This is a candidate for the stable branch(es).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 263627
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 263626
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.
Reviewers:
arsenm
tstellarAMD
Subscribers
llvm-commits, arsenm
Differential Revision:
http://reviews.llvm.org/D18064
llvm-svn: 263563
|
|
|
|
| |
llvm-svn: 263448
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17543
llvm-svn: 263447
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D18058
llvm-svn: 263441
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18101
llvm-svn: 263440
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)
Differential Revision: http://reviews.llvm.org/D18040
llvm-svn: 263420
|
|
|
|
| |
llvm-svn: 263411
|
|
|
|
| |
llvm-svn: 263409
|
|
|
|
| |
llvm-svn: 263407
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17966
llvm-svn: 263242
|
|
|
|
|
|
|
|
|
|
| |
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.
Differential Revision: http://reviews.llvm.org/D17984
llvm-svn: 263212
|
|
|
|
|
|
|
|
| |
Frontend authors are strongly encouraged to keep allocas
in the entry block, so don't bother visiting every instruction
in the other blocks of the function.
llvm-svn: 263206
|
|
|
|
|
|
|
| |
Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.
llvm-svn: 263204
|
|
|
|
|
|
|
| |
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.
llvm-svn: 263201
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.
The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.
For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.
BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17277
llvm-svn: 263140
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D17892
llvm-svn: 263124
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D17651
llvm-svn: 263108
|
|
|
|
|
|
| |
http://reviews.llvm.org/D17967
llvm-svn: 263021
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes error from building with clang:
/usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12:
error: comparison of unsigned expression >= 0 is always true
[-Werror,-Wtautological-compare]
if ((Imm >= 0x000) && (Imm <= 0x0ff)) {
~~~ ^ ~~~~~
llvm-svn: 263014
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.
ToDo:
- VOP2bInst:
- vcc is considered as operand
- AsmMatcher doesn't apply mnemonic aliases when parsing operands
- v_mac_f32
- v_nop
- disable instructions with 64-bit operands
- change dpp_ctrl assembler representation to conform sp3
Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
|
|
|
|
|
|
|
|
|
| |
Support legacy SP3 abs(v1) syntax. InstPrinter still uses |v1|.
Add tests.
Differential Revision: http://reviews.llvm.org/D17887
llvm-svn: 263006
|