| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
We weren't treating i16->f16 casts as legal on targets with these
instructions, and always using a pair of casts through i32.
|
|
|
|
|
| |
This wasn't catching a regression on targets with legal i16 triggered
in a future commit.
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
|
|
|
|
|
|
|
|
|
|
| |
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.
llvm-svn: 282832
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:
Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)
Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)
Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)
llvm-svn: 266378
|
|
|
|
|
|
|
| |
This cleans up some messes since the individual scalar components
can be CSEed.
llvm-svn: 266376
|
|
|
|
| |
llvm-svn: 260493
|
|
|
|
| |
llvm-svn: 245774
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11061
llvm-svn: 242146
|
|
llvm-svn: 239657
|