| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D34407
llvm-svn: 307097
|
|
|
|
|
|
|
|
|
| |
It broke a testcase.
Failing Tests (1):
LLVM :: CodeGen/AMDGPU/alignbit-pat.ll
llvm-svn: 307054
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D34407
llvm-svn: 307026
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
|
|
|
|
|
|
| |
Merge some of the old, smaller tests into more complete versions.
llvm-svn: 295792
|
|
|
|
|
|
|
|
|
|
|
| |
This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
|
|
|
|
| |
llvm-svn: 286912
|
|
|
|
|
|
| |
I'm guessing at how it is supposed to be printed
llvm-svn: 285490
|
|
|
|
| |
llvm-svn: 285449
|
|
|
|
| |
llvm-svn: 280970
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.
This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
can be folded with the base pointer arithmetic.
llvm-svn: 268293
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:
Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)
Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)
Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)
llvm-svn: 266378
|
|
|
|
|
|
|
| |
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.
llvm-svn: 263201
|
|
|
|
|
|
|
| |
Make the REG_SEQUENCE be a VGPR, and do the register class
copy first.
llvm-svn: 251855
|
|
llvm-svn: 239657
|