| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
llvm-svn: 239657
|
|
|
|
|
|
|
| |
Use unsigned as the underlying storage type of the AMDGPU address space
enum.
llvm-svn: 239355
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we sometimes know the address space, this can
theoretically do a better job.
This needs better test coverage, but this mostly depends on
first updating the loop optimizatiosn to provide the address
space.
llvm-svn: 239053
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Spilling can insert instructions almost anywhere, and this can mess
up control flow lowering in a multitude of ways, due to instruction
reordering. Let's sort this out the easy way: never spill registers
involved with control flow, i.e. saved EXEC masks.
Unfortunately, this does not work at all with optimizations disabled,
as the register allocator ignores spill weights. This should be
addressed in a future commit.
The test was reduced from the "stacks" shader of [1]. Some issues
trigger the machine verifier while another one is checked manually.
[1] http://madebyevan.com/webgl-path-tracing/
v2: only insert pass with optimizations enabled, merge test runs.
Patch by: Grigori Goronzy
llvm-svn: 237152
|
|
|
|
|
|
|
| |
their definitions, but forgot to clean up all the declarations which are
in different files.
llvm-svn: 227698
|
|
|
|
|
|
|
|
| |
We were passing the scratch buffer address to the shaders via user sgprs,
but now we use external symbols and have the driver patch the shader
using reloc information.
llvm-svn: 226586
|
|
|
|
| |
llvm-svn: 225988
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is equivalent to the AMDGPUTargetMachine now, but it is the
starting point for separating R600 and GCN functionality into separate
targets.
It is recommened that users start using the gcn triple for GCN-based
GPUs, because using the r600 triple for these GPUs will be deprecated in
the future.
llvm-svn: 225277
|
|
|
|
|
|
|
| |
This pass attempts to fold the source operands of mov and copy
instructions into their uses.
llvm-svn: 222581
|
|
|
|
|
|
|
|
| |
Function calls aren't supported yet.
This was reverted due to build breakages, which should be fixed now.
llvm-svn: 221173
|
|
|
|
|
|
|
|
|
| |
This reverts commit r220996.
It introduced layering violations causing link errors in many
configurations.
llvm-svn: 221020
|
|
|
|
|
|
| |
Function calls aren't supported yet.
llvm-svn: 220996
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently this only functions to match simple cases
where ds_read2_* / ds_write2_* instructions can be used.
In the future it might match some of the other weird
load patterns, such as direct to LDS loads.
Currently enabled only with a subtarget feature to enable
easier testing.
llvm-svn: 219533
|
|
|
|
|
|
|
|
|
|
| |
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
|
|
|
|
|
|
| |
This pass converts 64-bit instructions to 32-bit when possible.
llvm-svn: 213561
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements a solution for constant initializers suggested
by Vadim Girlin, where we store the data after the shader code
and then use the S_GETPC instruction to compute its address.
This saves use the trouble of creating a new buffer for constant data
and then having to pass the pointer to the kernel via user SGPRs or the
input buffer.
llvm-svn: 213530
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SGPRs are written by instructions that sometimes will ignore control flow,
which means if you have code like:
if (VGPR0) {
SGPR0 = S_MOV_B32 0
} else {
SGPR0 = S_MOV_B32 1
}
The value of SGPR0 will 1 no matter what the condition is.
In order to deal with this situation correctly, we need to view the
program as if it were a single basic block when we calculate the
live ranges for the SGPRs. They way we actually update the live
range is by iterating over all of the segments in each LiveRange
object and setting the end of each segment equal to the start of
the next segment. So a live range like:
[3888r,9312r:0)[10032B,10384B:0) 0@3888r
will become:
[3888r,10032B:0)[10032B,10384B:0) 0@3888r
This change will allow us to use SALU instructions within branches.
llvm-svn: 212215
|
|
|
|
| |
llvm-svn: 211110
|
|
|
|
|
|
| |
Most of these are no longer used any more.
llvm-svn: 210915
|
|
|
|
|
|
|
|
| |
Use 4 since that's probably what it will be for spir.
Move ADDRESS_NONE to the end to keep the constant_buffer_* values
unchanged, since apparently a bunch of r600 tests use those directly.
llvm-svn: 209463
|
|
|
|
|
|
|
|
|
| |
We can't use SALU instructions for this since they ignore the EXEC mask
and are always executed.
This fixes several OpenCV tests.
llvm-svn: 207661
|
|
|
|
|
|
|
| |
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.
llvm-svn: 200018
|
|
|
|
|
|
|
| |
This enables -print-before-all to dump MachineInstrs after it is run.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197057
|
|
|
|
|
|
|
| |
This enables -print-before-all to dump MachineInstrs after it is run.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 197056
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The AMDGPUIndirectAddressing pass was previously responsible for
lowering private loads and stores to indirect addressing instructions.
However, this pass was buggy and way too complicated. The only
advantage it had over the new simplified code was that it saved one
instruction per direct write to private memory. This optimization
likely has a minimal impact on performance, and we may be able
to duplicate it using some other transformation.
For the private address space, we now:
1. Lower private loads/store to Register(Load|Store) instructions
2. Reserve part of the register file as 'private memory'
3. After regalloc lower the Register(Load|Store) instructions to
MOV instructions that use indirect addressing.
llvm-svn: 193179
|
|
|
|
| |
llvm-svn: 191790
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.
This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=66805
llvm-svn: 188429
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:
SGPR = COPY VGPR
Will now be emitted like this:
VSrC = COPY VGPR
This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur. Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.
llvm-svn: 187831
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
conditions
Merge consecutive if-regions if they contain identical statements.
Both transformations reduce number of branches. The transformation
is guarded by a target-hook, and is currently enabled only for +R600,
but the correctness has been tested on X86 target using a variety of
CPU benchmarks.
Patch by: Mei Ye
llvm-svn: 187278
|
|
|
|
|
| |
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 186918
|
|
|
|
|
|
| |
exit
llvm-svn: 186724
|
|
|
|
|
|
|
|
| |
This should simplify the subtarget definitions and make it easier to
add new ones.
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183566
|
|
|
|
|
| |
Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183558
|
|
|
|
|
|
|
| |
Previously commited @183279 but tests were failing, reverted @183286
It was broken because @183336 was missing, now it's there.
llvm-svn: 183343
|
|
|
|
|
|
| |
This reverts commit r183279. CodeGen/R600/texture-input-merge.ll was failing.
llvm-svn: 183286
|
|
|
|
| |
llvm-svn: 183279
|
|
|
|
| |
llvm-svn: 182125
|
|
|
|
| |
llvm-svn: 180760
|
|
|
|
| |
llvm-svn: 178505
|
|
|
|
| |
llvm-svn: 178503
|
|
|
|
|
|
|
|
| |
v2: update CMakeLists.txt as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176626
|
|
|
|
|
|
|
| |
Maintaining CONST_COPY Instructions until Pre Emit may prevent some ifcvt case
and taking them in account for scheduling is difficult for no real benefit.
llvm-svn: 176488
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Seems to be allot simpler, and also paves the
way for further improvements.
v2: rebased on master, use 0 in BUFFER_LOAD_FORMAT_XYZW,
use VGPR0 in dummy EXP, avoid compiler warning, break
after encoding the first literal.
v3: correctly use V_ADD_F32_e64
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175354
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only implemented for R600 so far. SI is missing implementations of a
few callbacks used by the Indirect Addressing pass and needs code to
handle frame indices.
At the moment R600 only supports array sizes of 16 dwords or less.
Register packing of vector types is currently disabled, which means that a
vec4 is stored in T0_X, T1_X, T2_X, T3_X, rather than T0_XYZW. In order
to correctly pack registers in all cases, we will need to implement an
analysis pass for R600 that determines the correct vector width for each
array.
v2:
- Add support for i8 zext load from stack.
- Coding style fixes
v3:
- Don't reserve registers for indirect addressing when it isn't
being used.
- Fix bug caused by LLVM limiting the number of SubRegIndex
declarations.
v4:
- Fix 64-bit defines
llvm-svn: 174525
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove Cxxx registers, add new special register - "ALU_CONST" and new
operand for each alu src - "sel". ALU_CONST is used to designate that the
new operand contains the value to override src.sel, src.kc_bank, src.chan
for constants in the driver.
Patch by: Vadim Girlin
Vincent Lejeune:
- Use pointers for constants
- Fold CONST_ADDRESS when possible
Tom Stellard:
- Give CONSTANT_BUFFER_0 its own address space
- Use integer types for constant loads
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 173222
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some instructions like memory reads/writes are executed
asynchronously, so we need to insert S_WAITCNT instructions
to block before accessing their results. Previously we have
just inserted S_WAITCNT instructions after each async
instruction, this patch fixes this and adds a prober
insertion pass.
Patch by: Christian König
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 172846
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch replaces the control flow handling with a new
pass which structurize the graph before transforming it to
machine instruction. This has a couple of different advantages
and currently fixes 20 piglit tests without a single regression.
It is now a general purpose transformation that could be not
only be used for SI/R6xx, but also for other hardware
implementations that use a form of structurized control flow.
v2: further cleanup, fixes and documentation
Patch by: Christian König
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 170591
|
|
A new backend supporting AMD GPUs: Radeon HD2XXX - HD7XXX
llvm-svn: 169915
|