| Commit message (Collapse) | Author | Age | Files | Lines |
|
Delete inliner before replacing it.
llvm-svn: 313723
|
llvm-svn: 313719
|
llvm-svn: 313718
|
Also starts selecting global loads for constant address
in some cases. Some end up selecting to mubuf still, which
requires investigation.
We still get sub-optimal regalloc and extra waitcnts inserted
due to not really tracking the liveness of the separate register
halves.
llvm-svn: 313716
|
Differential Revision: https://reviews.llvm.org/D36849
llvm-svn: 313714
|
Try to use a consistent naming scheme.
llvm-svn: 313713
|
llvm-svn: 313712
|
The pre-RA scheduler does load/store clustering, but post-RA
scheduler undoes it. Add mutation to prevent it.
Differential Revision: https://reviews.llvm.org/D38014
llvm-svn: 313670
|
The relocations used for externally visible functions
aren't supported, so the direct call emitted ends
up hitting a linker error.
llvm-svn: 313616
|
Differential Revision: https://reviews.llvm.org/D37981
llvm-svn: 313565
|
llvm-svn: 313302
|
You can't use madmk/madak if it already uses an SGPR input.
llvm-svn: 313298
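The restriction above can be modeled with a small check. This is only a sketch under stated assumptions: it assumes the instruction may read at most one constant-bus value and that the literal carried by the madmk/madak form takes that slot. The operand strings ("v3" for a VGPR, "s4" for an SGPR) and the function name are hypothetical and are not taken from the backend.

```python
def can_use_literal_form(src_operands):
    """Return True if folding a literal into the instruction is allowed.

    Assumption for this sketch: one constant-bus read per instruction,
    and the folded literal already consumes it. An SGPR source would be
    a second constant-bus read, so it blocks the fold.
    """
    uses_sgpr = any(op.startswith("s") for op in src_operands)
    return not uses_sgpr


# Only VGPR sources: the literal form is acceptable in this model.
print(can_use_literal_form(["v0", "v1"]))
# One SGPR source: the literal form would exceed the assumed limit.
print(can_use_literal_form(["s4", "v1"]))
```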
|
Differential Revision: https://reviews.llvm.org/D35089
llvm-svn: 313297
|
llvm-svn: 313282
|
Because the stack grows and is addressed in the same direction,
modifying SP at the beginning of the call sequence was incorrect.
If we had a stack-passed argument, we would end up skipping that
number of bytes before pushing arguments, leaving
unused/inconsistent space.
The callee creates fixed stack objects in its frame, so
the space necessary for these is already logically allocated
in the callee, so we just let the callee increment SP if
it really requires it.
llvm-svn: 313279
|
Using SplitCSR for the frame register was very broken. Often
the copies in the prolog and epilog were optimized out, and they
were also inserted after the true prolog, where the FP
was clobbered.
I have a hacky but working solution that continues to use
split CSR, but for now this is simpler and will get us to working
programs.
llvm-svn: 313274
|
llvm-svn: 313217
|
When clustering loads or stores, MachineScheduler checks whether the
base pointers point to the same memory. This check is done by
comparing the base registers of the two memory instructions. This
works fine when the instructions have a separate offset operand. If
they require a fully calculated pointer, such instructions can
never be clustered under this logic.
Changed shouldClusterMemOps to accept base registers as well and
let it decide what to do about it.
Differential Revision: https://reviews.llvm.org/D37698
llvm-svn: 313208
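A toy model of the change described above, not the actual scheduler code. The old check compares base registers directly. The new form hands both base registers to a hook and lets it decide. The `MemOp` type, the `SAME_OBJECT` table, and `toy_hook` are all hypothetical and stand in for whatever policy a target chooses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    base_reg: str  # register holding the base, or the full pointer
    offset: int    # 0 when there is no separate offset operand


def old_can_cluster(a, b):
    # Scheduler-side check: only ops sharing a base register qualify.
    return a.base_reg == b.base_reg


def new_can_cluster(a, b, should_cluster):
    # The decision is delegated; the hook sees both base registers.
    return should_cluster(a.base_reg, b.base_reg)


# Hypothetical target knowledge: registers holding pointers that were
# computed from the same underlying object.
SAME_OBJECT = {"v10": "buf", "v12": "buf"}


def toy_hook(reg1, reg2):
    if reg1 == reg2:
        return True
    return reg1 in SAME_OBJECT and SAME_OBJECT.get(reg1) == SAME_OBJECT.get(reg2)


a = MemOp("v10", 0)
b = MemOp("v12", 0)
print(old_can_cluster(a, b), new_can_cluster(a, b, toy_hook))
```

With fully calculated pointers each instruction has its own base register, so the old comparison never succeeds, while a hook that receives the registers is free to apply its own rule.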
|
Missed in r312936
llvm-svn: 313205
|
llvm-svn: 312936
|
These two instructions are normally selected, but when the
two-address pass converts mac into mad we end up with a
mad where we could have one of these.
Differential Revision: https://reviews.llvm.org/D37389
llvm-svn: 312928
|
A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports
the exec mask as the valid mask to determine which pixels to render.
This commit marks any exp as needing to be in exact mode.
Actually, if there are multiple mrt exps, only one needs to have vm=1,
and only that one needs to be in exact mode. But that is an optimization
for another day.
Differential Revision: https://reviews.llvm.org/D36305
llvm-svn: 312915
|
... to check commit access for new committer.
llvm-svn: 312900
|
llvm-svn: 312836
|
We have a lot of operand definition work essentially producing
every valid permutation of operands to work around building
operand lists based on the instruction features. Apparently tablegen
already has a mostly undocumented operator to concat dags which
simplifies this.
Convert one simple place to use this. The BUF instruction definitions
have much more complicated logic that can be totally rewritten now.
llvm-svn: 312822
|
The various scalar bit operations set SCC,
so if one is erased or moved, SCC needs to be recomputed.
Not sure why the existing tests don't fail on this.
llvm-svn: 312819
|
llvm-svn: 312732
|
Differential Revision: https://reviews.llvm.org/D36862
llvm-svn: 312729
|
Differential Revision: https://reviews.llvm.org/D37397
llvm-svn: 312725
|
Keeping non-i16 extloads makes it easier to match some new
gfx9 load instructions.
llvm-svn: 312699
|
Differential Revision: https://reviews.llvm.org/D37325
llvm-svn: 312676
|
Differential Revision: https://reviews.llvm.org/D37522
llvm-svn: 312660
|
Flat loads do not have a vdata operand but have vdst instead.
Differential Revision: https://reviews.llvm.org/D37502
llvm-svn: 312640
|
Summary:
Mesa still uses a hack where empty inline assembly is used as a kind of
optimization barrier. This exposed a problem where not enough wait states
were inserted, because the hazard recognizer implicitly assumed that each
inline assembly "instruction" has at least one wait state.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37205
llvm-svn: 312635
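The problem described in the summary can be illustrated with a toy wait-state counter. This is a sketch, not the hazard recognizer itself: instructions are plain strings, `"HAZARD_DEF"` and `"INLINEASM"` are hypothetical markers, and the assumption is that an empty inline assembly contributes zero wait states.

```python
def wait_states_since(instrs, is_hazard_source, count_inline_asm):
    """Walk backwards over `instrs` and count wait states until an
    instruction satisfying `is_hazard_source` is found.

    Returns None if no such instruction is in the window.
    """
    states = 0
    for ins in reversed(instrs):
        if is_hazard_source(ins):
            return states
        if ins == "INLINEASM" and not count_inline_asm:
            # Assumed fix: inline asm is not credited with a wait state.
            continue
        states += 1
    return None


window = ["HAZARD_DEF", "INLINEASM", "other"]


def is_def(ins):
    return ins == "HAZARD_DEF"


# Old assumption: inline asm counts as one wait state.
print(wait_states_since(window, is_def, count_inline_asm=True))
# Without that assumption fewer wait states are credited, so more
# padding would be required before the hazardous use.
print(wait_states_since(window, is_def, count_inline_asm=False))
```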
|
When packet size equals packet align and is a power of 2, transform
__read_pipe* and __write_pipe* to a specialized library function.
Differential Revision: https://reviews.llvm.org/D36831
llvm-svn: 312598
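The condition above is easy to state as code. The sketch below assumes a naming scheme in which the packet size is appended to the function name; that scheme, the example name `__read_pipe_2`, and the helper names are illustrative assumptions, not confirmed by the commit message.

```python
def is_power_of_two(n):
    # A positive integer is a power of two if it has exactly one bit set.
    return n > 0 and (n & (n - 1)) == 0


def specialize_pipe_call(name, packet_size, packet_align):
    """Return the name of the library function to call.

    Hypothetical naming: the specialized variant is `<name>_<size>`.
    """
    if packet_size == packet_align and is_power_of_two(packet_size):
        return f"{name}_{packet_size}"
    return name


# Size equals alignment and is a power of 2: specialized.
print(specialize_pipe_call("__read_pipe_2", 4, 4))
# Size differs from alignment: left unchanged.
print(specialize_pipe_call("__read_pipe_2", 12, 4))
```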
|
- Refactor SIMemOpInfo's constructors
- Allow construction of NotAtomic SIMemOpInfo
Differential Revision: https://reviews.llvm.org/D37396
llvm-svn: 312563
|
If the only call in a function is a tail call, the
function isn't considered to have a call since it's a
type of return.
llvm-svn: 312561
|
- Make SIMemOpInfo a class
- Add accessor methods to SIMemOpInfo
- Move get*Info methods to SIMemOpInfo
Differential Revision: https://reviews.llvm.org/D37395
llvm-svn: 312541
|
- Rename MemOpInfo -> SIMemOpInfo
- Move SIMemOpInfo class out of SIMemoryLegalizer class
Differential Revision: https://reviews.llvm.org/D37394
llvm-svn: 312540
|
Differential Revision: https://reviews.llvm.org/D37392
llvm-svn: 312364
|
llvm-svn: 312349
|
Doesn't include the tied operand necessary for the loads,
but is enough for the assembler to work.
llvm-svn: 312347
|
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D36193
llvm-svn: 312337
|
llvm-svn: 312297
|
warnings; other minor fixes. Also affected in files (NFC).
llvm-svn: 312289
|
build_vector is a more useful canonical form when
pattern matching packed operations, so turn a shift
into the high element into a build_vector.
Should show no change for now.
llvm-svn: 312282
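The equivalence behind this canonicalization can be checked numerically. The sketch below models a packed pair of 16-bit lanes in a 32-bit integer; it assumes the low lane occupies bits 0-15 and the high lane bits 16-31. The function names are made up for the illustration.

```python
def build_vector_v2i16(lo, hi):
    # Pack two 16-bit lanes into one 32-bit value (low lane in bits 0-15).
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def shl16_i32(x):
    # 32-bit left shift by 16: moves the low half into the high lane.
    return (x << 16) & 0xFFFFFFFF


# Shifting into the high element matches a build_vector whose low
# element is zero and whose high element is the low 16 bits of x.
for x in (0, 1, 0x7FFF, 0xFFFF, 0x1234ABCD):
    assert shl16_i32(x) == build_vector_v2i16(0, x)
```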
|
Also refine for f16 and rcp cases.
llvm-svn: 312213
|
The majority of the time spent in the pass is checking
for register reads. Rather than searching all of
the defined registers for uses in each instruction,
use a set of defined registers and check the operands
of the instruction.
This process still is algorithmically not great,
but with the additional trick of skipping the analysis
for addresses with one use, this brings one slow
testcase into a reasonable range.
llvm-svn: 312206
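The change of lookup direction described above can be sketched as follows. Registers are plain strings and both function names are hypothetical. The first version scans the instruction's operands once per defined register; the second keeps the defined registers in a set and tests each operand against it.

```python
def reads_defined_slow(defined_regs, operands):
    # For every defined register, search the instruction's operand list.
    for reg in defined_regs:
        if reg in operands:  # linear scan of a list
            return True
    return False


def reads_defined_fast(defined_set, operands):
    # One set lookup per operand of the instruction.
    return any(op in defined_set for op in operands)


defined = ["v1", "v2", "v3", "v4"]
instr_operands = ["v9", "v3"]

print(reads_defined_slow(defined, instr_operands))
print(reads_defined_fast(set(defined), instr_operands))
```

Both give the same answer. The second does work proportional to the number of operands of the instruction, which is usually small, instead of to the number of defined registers.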
|
These aren't really packed instructions, so the default
op_sel_hi should be 0 since this indicates a conversion.
The operand types are scalar values that behave similarly
to an f16 scalar that may be converted to f32.
Doesn't change the default printing for op_sel_hi, just
the parsing.
llvm-svn: 312179
|
The merge is only possible if the base address register is the
same for the two instructions. If there is only the one use,
there's no point in doing an expensive forward scan checking
for memory interference looking for a merge candidate.
This gives a significant improvement in one extreme testcase.
The code to do the scan is still algorithmically terrible,
so this is still the slowest pass in that example.
llvm-svn: 312096
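The early exit described above can be sketched with a toy instruction list. Each instruction is a hypothetical `(opcode, base_reg)` pair and `use_count` maps a register to its number of uses. The memory interference check performed by the real scan is deliberately left out of this sketch.

```python
def find_merge_candidate(instrs, i, use_count):
    """Return the index of a later instruction sharing the base register
    of instrs[i], or None if there is none."""
    base = instrs[i][1]
    if use_count.get(base, 0) <= 1:
        # Only one use of the base register: no other instruction can
        # share it, so the forward scan is skipped entirely.
        return None
    for j in range(i + 1, len(instrs)):
        if instrs[j][1] == base:
            return j
    return None


instrs = [("load", "v1"), ("load", "v2"), ("load", "v1")]
use_count = {"v1": 2, "v2": 1}

print(find_merge_candidate(instrs, 0, use_count))
print(find_merge_candidate(instrs, 1, use_count))
```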
|