| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 280841
|
| |
|
|
|
|
|
|
| |
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.
Differential Revision: https://reviews.llvm.org/D23424
llvm-svn: 280829
|
| |
|
|
| |
llvm-svn: 280784
|
| |
|
|
|
|
|
|
| |
This reapplies r252565 and r252674, effectively reverting r252956.
This allows VS_32/VS_64 to be unallocatable like they should be.
llvm-svn: 280783
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch
Patch by Tom Stellard and Konstantin Zhuravlyov
Differential Revision: https://reviews.llvm.org/D21562
llvm-svn: 280747
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I put this code here, because I want to re-use it in a few other places.
This supersedes some of the immediate folding code we have in SIFoldOperands.
I think the peephole optimizers is probably a better place for folding
immediates into copies, since it does some register coalescing in the same time.
This will also make it easier to transition SIFoldOperands into a smarter pass,
where it looks at all uses of instruction at once to determine the optimal way to
fold operands. Right now, the pass just considers one operand at a time.
Reviewers: arsenm
Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23402
llvm-svn: 280744
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D24276
llvm-svn: 280742
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D24072
llvm-svn: 280655
|
| |
|
|
| |
llvm-svn: 280595
|
| |
|
|
|
|
|
|
| |
I'm not sure if this should be considered a bug in
copyImplicitOps or not, but implicit operands that are part
of the static instruction definition should not be copied.
llvm-svn: 280594
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:
1. Sampling instructions by themselves don't need to run in WQM, only their
coordinate inputs need it (unless of course there is a dependent sampling
instruction). The initial scanInstructions step is modified accordingly.
2. When switching back from WQM to Exact, switch back as soon as possible.
This affects the logic in processBlock.
This should always be a win or at best neutral.
There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D22092
llvm-svn: 280590
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.
The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.
Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23131
llvm-svn: 280589
|
| |
|
|
|
|
|
|
|
| |
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.
llvm-svn: 280584
|
| |
|
|
|
|
|
|
|
|
| |
have the same number of elements.
Fixes R600 piglit regressions since r280298
Differential Revision: https://reviews.llvm.org/D24174
llvm-svn: 280535
|
| |
|
|
|
|
|
|
|
| |
LOCAL and GLOBAL AS only
PRIVATE needs special treatment
Differential Revision: https://reviews.llvm.org/D23971
llvm-svn: 280526
|
| |
|
|
|
|
|
|
| |
Add runtime metdata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer.
Differential Revision: https://reviews.llvm.org/D24145
llvm-svn: 280399
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Created a new td file MIMGInstructions.td which contains all definitions
of MIMG related instructions.
Reviewed by:
kzhuravl, vpykhtin
Differential Revision:
http://reviews.llvm.org/D24106
llvm-svn: 280385
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D23996
llvm-svn: 280349
|
| |
|
|
| |
llvm-svn: 280298
|
| |
|
|
|
|
|
| |
This occurs before RA pseudos are expanded. It's less
code to emit the copy.
llvm-svn: 280297
|
| |
|
|
|
|
| |
This will make future changes easier.
llvm-svn: 280296
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: This fixes some OpenCV tests that were broken by libclc commit r276443.
Reviewers: arsenm, jvesely
Subscribers: arsenm, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24051
llvm-svn: 280274
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Simply replace usage of aliases to functions with aliasee.
This came up when bitcode linking to builtin library and
calls to aliases not being resolved.
Also made minor improvements to existing test.
Reviewers: tstellarAMD, alex-t, vpykhtin
Subscribers: arsenm, wdng, rampitec
Differential Revision: https://reviews.llvm.org/D24023
llvm-svn: 280221
|
| |
|
|
|
|
|
| |
s should be SReg_32 to be as general as possible. This can avoid a copy
from m0.
llvm-svn: 280154
|
| |
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D23617
llvm-svn: 280101
|
| |
|
|
| |
llvm-svn: 280075
|
| |
|
|
|
|
|
|
|
| |
Move SDLoc initialization to comon place.
fall back to AMDGPU version in one place
Differential Revision: https://reviews.llvm.org/D23900
llvm-svn: 280030
|
| |
|
|
|
|
|
|
| |
This is handled by DAGCombiner in a more generic way
Differential Revision: https://reviews.llvm.org/D23970
llvm-svn: 280019
|
| |
|
|
| |
llvm-svn: 280006
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.
It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.
Shader DB stats:
32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23688
llvm-svn: 279995
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The SILoadStoreOptimizer can now look ahead more then one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.
Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate. It also gives more consistent results, since it is no longer
affected by minor scheduling changes.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23814
llvm-svn: 279991
|
| |
|
|
|
|
| |
Fixes bug 29289
llvm-svn: 279986
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For shrinking SOPK instructions, we were creating a hint to tell the
register allocator to use the register allocated for src0 for the dst
operand as well. However, this seems to not work sometimes depending
on the order virtual registers are assigned physical registers.
To fix this, I've added a second allocation hint which does the reverse,
asks that the register allocated for dst is used for src0.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23862
llvm-svn: 279968
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The SILoadStoreOptimizer will need to use AliasAnalysis here in order to
move it before scheduling.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23813
llvm-svn: 279963
|
| |
|
|
|
|
|
|
| |
Fix and improve tests
Differential Revision: https://reviews.llvm.org/D23899
llvm-svn: 279925
|
| |
|
|
|
|
| |
Fixes bug 26800
llvm-svn: 279910
|
| |
|
|
|
|
|
| |
SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
SI_IF_BREAK and SI_ELSE_BREAK do not read it either.
llvm-svn: 279909
|
| |
|
|
| |
llvm-svn: 279902
|
| |
|
|
|
|
|
|
| |
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.
llvm-svn: 279901
|
| |
|
|
| |
llvm-svn: 279900
|
| |
|
|
|
|
|
|
|
| |
It isn't used for anything, and is also misleading since
it could be spilled at the end of the block, so it can't be relied
on. There ends up being a verifier error about using an undefined
register since the spill kills the register.
llvm-svn: 279899
|
| |
|
|
|
|
| |
Unfortunately this seems to only help the assembler diagnostic.
llvm-svn: 279895
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to scheduler loads together
without out explicitly clustering them, which would give us non-sorted
offsets.
Also, we will want to do this if we move the load/store optimizer before
the scheduler.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23776
llvm-svn: 279870
|
| |
|
|
| |
llvm-svn: 279868
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
There are a few different sgpr pressure sets, but we only care about
the one which covers all of the sgprs. We were using hard-coded
register pressure set names to determine the reg set id for the
biggest sgpr set. However, we were using the wrong name, and this
method is pretty fragile, since the reg pressure set names may
change.
The new method just looks for the pressure set that contains the most
reg units and sets that set as our SGPR pressure set. We've also
adopted the same technique for determining our VGPR pressure set.
Reviewers: arsenm
Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23687
llvm-svn: 279867
|
| |
|
|
|
|
|
|
|
|
|
| |
MCContext already has many tasks, and separating CodeView out from it is
probably a good idea. The .cv_loc tracking was modelled on the DWARF
tracking which lived directly in MCContext.
Removes the inclusion of MCCodeView.h from MCContext.h, so now there are
only 10 build actions while I hack on CodeView support instead of 265.
llvm-svn: 279847
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.
Reviewed by:
arsenm and tstellarAMD
Differential Revision:
http://reviews.llvm.org/D22489
llvm-svn: 279660
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D23069
llvm-svn: 279629
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.
This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.
Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.
Differential Revision: http://reviews.llvm.org/D23736
llvm-svn: 279602
|
| |
|
|
|
|
|
| |
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.
llvm-svn: 279580
|