| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It is off by default, but can be used
with --misched=si
Patch by: Axel Davy
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D11885
llvm-svn: 257609
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.
Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.
Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.
With the IR generated by current Mesa, the changes are obviously relatively
minor:
7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)
Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)
Reviewers: tstellarAMD, arsenm, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15875
llvm-svn: 257074
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15629
llvm-svn: 256073
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.
Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)
Together with r255906, this fixes a VM fault in Unreal Elemental Demo.
While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15622
llvm-svn: 256072
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is just my first commit. Test!
Reviewers: none
Subscribers: none
Differential Revision: none
llvm-svn: 256022
|
|
|
|
|
|
| |
This reverts commit a493cb636e0152ad28210934a47c6c44b1437193.
llvm-svn: 256021
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is just my first commit. Test!
Reviewers: none
Subscribers: none
Differential Revision: none
llvm-svn: 256020
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.
I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15542
llvm-svn: 255906
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15257
llvm-svn: 255204
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.
This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.
llvm-svn: 254452
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.
If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.
This also only selectively enables all of the input registers
which are really required instead of always enabling them.
llvm-svn: 254331
|
|
|
|
| |
llvm-svn: 254330
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.
The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.
Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.
The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.
llvm-svn: 254329
|
|
|
|
|
|
|
|
|
|
| |
v2: added more tests, moved the SALU->VALU conversion to a separate function
It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.
llvm-svn: 254095
|
|
|
|
|
|
| |
Test has a bogus verifier error which will be fixed by later commits.
llvm-svn: 252327
|
|
|
|
|
|
| |
The SGPR spill pseudos don't actually use them.
llvm-svn: 252324
|
|
|
|
|
|
|
| |
Instead of forcing 4 alignment when spilled, set register class
alignments.
llvm-svn: 252322
|
|
|
|
| |
llvm-svn: 252145
|
|
|
|
|
|
|
|
|
|
| |
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.
This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.
llvm-svn: 252140
|
|
|
|
|
|
| |
Add more comments etc.
llvm-svn: 251996
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.
We should not be selecting instructions with invalid operands in the
first place. Most of the time for registers copys are inserted
to the correct operand register class.
For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).
The only possibly tricky case to maybe worry about is if when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand. However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.
llvm-svn: 250951
|
|
|
|
|
|
|
| |
When verifying constant bus restrictions, this wasn't catching
uses in implicit operands.
llvm-svn: 250948
|
|
|
|
| |
llvm-svn: 250797
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This stops using an unknown reg class operand.
Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.
With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.
llvm-svn: 249494
|
|
|
|
|
|
|
| |
Make sure we aren't accidentally not setting
these in the instruction definitions.
llvm-svn: 249170
|
|
|
|
|
|
|
|
|
|
| |
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.
This is a candidate for stable llvm 3.7.
Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
|
|
|
|
| |
llvm-svn: 248742
|
|
|
|
|
|
|
|
| |
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.
llvm-svn: 248741
|
|
|
|
| |
llvm-svn: 248740
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting an an instruction missing an operand
which would assert later.
Test will be included in a following commit
which fixes a related issue.
llvm-svn: 248739
|
|
|
|
|
|
|
|
|
|
| |
mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)
Differential Revision: http://reviews.llvm.org/D11370
llvm-svn: 248735
|
|
|
|
|
|
|
|
|
| |
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.
llvm-svn: 248627
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
actually is actually emitting such a REG_SEQUENCE now.
If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.
llvm-svn: 248585
|
|
|
|
|
|
| |
This avoids needting to re-legalize the new REG_SEQUENCE.
llvm-svn: 248584
|
|
|
|
| |
llvm-svn: 248476
|
|
|
|
| |
llvm-svn: 248475
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.
This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.
llvm-svn: 248467
|
|
|
|
|
|
|
| |
If the instruction doesn't have enough operands, it
either shouldn't be marked as isCommutable or is malformed.
llvm-svn: 248242
|
|
|
|
| |
llvm-svn: 247230
|
|
|
|
|
|
|
|
|
|
|
|
| |
Instead of extracting both 32-bit components from the 128-bit
register. This produces fewer copies and is easier for
the copy peephole optimizer to understand and see the actual uses
as extracts from a reg_sequence.
This avoids needing to handle subregister composing in the
PeepholeOptimizer's ValueTracker for this case.
llvm-svn: 247162
|
|
|
|
|
|
|
| |
These are already added during the MachineInstr construction,
so this was adding the implicit registers twice.
llvm-svn: 246525
|
|
|
|
| |
llvm-svn: 246357
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without a memory operand, mayLoad or mayStore instructions
are treated as hasUnorderedMemRef, which results in much worse
scheduling.
We really should have a verifier check that any
non-side effecting mayLoad or mayStore has a memory operand.
There are a few instructions (interp and images) which I'm
not sure what / where to add these.
llvm-svn: 246356
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no context where s_mov_b64 is emitted
and could potentially be moved to the VALU.
It is currently only emitted for materializing
immediates, which can't be dependent on vector sources.
The immediate splitting is already done when selecting
constants. I'm not sure what contexts if any the register
splitting would have been used before.
Also clean up using s_mov_b64 in place of v_mov_b64_pseudo,
although this isn't required and just skips the extra step
of eliminating the copy from the SReg_64.
llvm-svn: 246080
|
|
|
|
| |
llvm-svn: 246079
|
|
|
|
|
|
|
| |
This wouldn't propagate to users of the original BFE
and would hit a verifier error.
llvm-svn: 246078
|
|
|
|
|
|
|
|
|
|
|
|
| |
When splitting 64-bit operations, create the correct
VALU instructions immediately.
This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.
llvm-svn: 246077
|
|
|
|
| |
llvm-svn: 244402
|
|
|
|
| |
llvm-svn: 244380
|
|
|
|
|
|
| |
For the same reasons as the other physical registers.
llvm-svn: 244062
|