| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D33793
llvm-svn: 304571
|
|
|
|
|
|
|
| |
Flat instructions gain an immediate offset, and 2 new
sets of segment specific flat instructions are added.
llvm-svn: 302729
|
|
|
|
|
|
|
|
|
| |
If workgroup size is known inform llvm about range returned by local
id and local size queries.
Differential Revision: https://reviews.llvm.org/D31804
llvm-svn: 300102
|
|
|
|
|
|
|
|
|
|
|
|
| |
The unused dummy src2_modifiers is missing, so it crashes
when trying to print it.
I tried to fully remove src2_modifiers, but there are some
irritations in the places where it is converted to mad since
it starts to require modifying use lists while iterating over
them.
llvm-svn: 299861
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.
The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.
Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.
Differential Revision: https://reviews.llvm.org/D31284
llvm-svn: 298846
|
|
|
|
|
|
|
|
| |
Add a few non-VOP3P but instructions related to packed.
Includes hack with dummy operands for the benefit of the assembler
llvm-svn: 296368
|
|
|
|
|
|
|
|
|
|
|
| |
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.
Also allow using clamp with f16, and use knowledge
of dx10_clamp.
llvm-svn: 295788
|
|
|
|
|
|
| |
This was accepting GFX9 instructions on VI.
llvm-svn: 295557
|
|
|
|
| |
llvm-svn: 295554
|
|
|
|
|
|
| |
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.
llvm-svn: 295054
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D26010
llvm-svn: 294692
|
|
|
|
|
|
|
|
| |
using switch statement
Differential Revision: https://reviews.llvm.org/D29741
llvm-svn: 294627
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D28760#fb670e28
llvm-svn: 294449
|
|
|
|
| |
llvm-svn: 294445
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D29318
llvm-svn: 294440
|
|
|
|
|
|
|
|
| |
lane masks.
Differential revision: https://reviews.llvm.org/D29442
llvm-svn: 294324
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.
In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.
Differential Revision: https://reviews.llvm.org/D29423
llvm-svn: 293837
|
|
|
|
|
|
|
|
|
|
|
| |
Accomplishes what r292982 was supposed to, which ended up
only really making the necessary test changes.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 293310
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].
Patch By: Dave Airlie
Reviewers: nhaehnle, arsenm, tstellarAMD
Reviewed By: arsenm
Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D25428
llvm-svn: 293000
|
|
|
|
|
|
|
|
|
|
|
| |
This switches to the workaround that HSA defaults to
for the mesa path.
This should be applied to the 4.0 branch.
Patch by Vedran Miletić <vedran@miletic.net>
llvm-svn: 292982
|
|
|
|
|
|
|
| |
The same control register controls both, and are set to
the same defaults. Keep the old names around as aliases.
llvm-svn: 292837
|
|
|
|
| |
llvm-svn: 292599
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: vpykhtin, artem.tamazov, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D28900
llvm-svn: 292596
|
|
|
|
|
|
| |
You Use warnings; other minor fixes (NFC).
llvm-svn: 289475
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
LC can currently select scalar load for uniform memory access
basing on readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along the all paths from the function entry.
Reviewers: rampitec, tstellarAMD, arsenm
Subscribers: wdng, arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D26917
llvm-svn: 289076
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D25975
llvm-svn: 286753
|
|
|
|
|
|
| |
I'm guessing at how it is supposed to be printed
llvm-svn: 285490
|
|
|
|
|
|
| |
This is possible when using inline asm.
llvm-svn: 285447
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: The hardware doesn't support this.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25523
llvm-svn: 284257
|
|
|
|
|
|
|
| |
VI added a second method of indexing into VGPRs
besides using v_movrel*
llvm-svn: 284027
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D24835
llvm-svn: 282223
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch
Patch by Tom Stellard and Konstantin Zhuravlyov
Differential Revision: https://reviews.llvm.org/D21562
llvm-svn: 280747
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.
It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.
Shader DB stats:
32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)
Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: https://reviews.llvm.org/D23688
llvm-svn: 279995
|
|
|
|
| |
llvm-svn: 278369
|
|
|
|
| |
llvm-svn: 276675
|
|
|
|
| |
llvm-svn: 274398
|
|
|
|
|
|
|
|
|
|
| |
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator. One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.
llvm-svn: 274304
|
|
|
|
| |
llvm-svn: 274039
|
|
|
|
| |
llvm-svn: 273964
|
|
|
|
| |
llvm-svn: 273940
|
|
|
|
| |
llvm-svn: 273937
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.
Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
- offset 0: work group ID x
- offset 4: work group ID y
- offset 8: work group ID z
- offset 16: work item ID x
- offset 20: work item ID y
- offset 24: work item ID z
Set
- amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
- amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
- amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
Differential Revision: http://reviews.llvm.org/D20335
llvm-svn: 273769
|
|
|
|
|
|
|
|
| |
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.
llvm-svn: 273653
|
|
|
|
|
|
|
|
|
| |
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.
llvm-svn: 273652
|
|
|
|
|
|
|
| |
We don't need it to be that high. The natural alignment
for a single workitem's stack is 16.
llvm-svn: 273448
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.
If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.
Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.
llvm-svn: 271561
|
|
|
|
|
|
|
|
|
|
| |
Summary: Enable load-store-opt by default, and update LIT tests.
Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D20694
llvm-svn: 270894
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D20081
llvm-svn: 270594
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.
By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.
Based on the exist LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.
llvm-svn: 269708
|
|
|
|
| |
llvm-svn: 269145
|