| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
V_CNDMASK_B32_e64 v0, v0, -1.#QNAN0e+00, s[2:3], 0, 0, 0, 0
FIXME: We really need to implement our formatter...
llvm-svn: 204118
|
|
|
|
| |
llvm-svn: 204073
|
|
|
|
| |
llvm-svn: 204072
|
|
|
|
|
|
|
|
| |
The type of the immediates should not matter as long as the encoding is
equivalent to the encoding of one of the legal inline constants.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 204056
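
A minimal sketch of the idea in the message above: the check looks only at the raw 32-bit encoding, so an integer immediate and a float immediate with the same bits are treated alike. The helper names (`floatBits`, `isInlineConstantBits`) are hypothetical, and the set of legal inline constants used here (integers -16 through 64, and +-0.5, +-1.0, +-2.0, +-4.0) is an assumption about the SI hardware, not something stated in the commit.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: raw 32-bit encoding of a float.
static uint32_t floatBits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

// Returns true if the encoding matches one of the (assumed) legal inline
// constants, regardless of whether the immediate was typed as int or float.
static bool isInlineConstantBits(uint32_t Bits) {
  // Reinterpret the bits as a signed integer without relying on
  // implementation-defined conversions.
  int32_t Signed;
  std::memcpy(&Signed, &Bits, sizeof(Signed));
  if (Signed >= -16 && Signed <= 64)
    return true;

  // Assumed float inline constants.
  const float Floats[] = {0.5f, -0.5f, 1.0f, -1.0f, 2.0f, -2.0f, 4.0f, -4.0f};
  for (float F : Floats)
    if (Bits == floatBits(F))
      return true;
  return false;
}
```

Under these assumptions, an i32 immediate of 0x3f800000 and an f32 immediate of 1.0 are both accepted, because they share one encoding.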
|
|
|
|
|
|
|
|
|
|
|
|
| |
This instruction writes to a 32-bit SGPR. This change required adding
the 32-bit VCC_LO and VCC_HI registers, because the full VCC register
is 64 bits.
This fixes verifier errors on several of the indirect addressing piglit
tests.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 204055
|
|
|
|
|
|
|
|
|
| |
LDS instructions are pseudo instructions which model
the OQAP defs and uses within a single instruction.
This fixes a hang in the opencv MedianFilter tests.
llvm-svn: 203818
|
|
|
|
| |
llvm-svn: 203695
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203281
|
|
|
|
|
|
|
|
| |
These are sometimes created by the shrink to boolean optimization in the
globalopt pass.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203280
|
|
|
|
|
|
|
| |
This appears to only be working for global loads. Private
and local break for other reasons.
llvm-svn: 203135
|
|
|
|
| |
llvm-svn: 203134
|
|
|
|
|
|
| |
Simple cases hit a variety of problems at -O0.
llvm-svn: 202601
|
|
|
|
| |
llvm-svn: 202543
|
|
|
|
|
|
|
|
| |
If the SI_KILL operand is constant, we can either clear the exec mask if
the operand is negative, or do nothing otherwise.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202337
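
The constant-operand case above can be sketched as follows. `WaveState` and `lowerKillConstant` are hypothetical names used only for illustration, not the backend's code, and testing the sign bit of the raw 32-bit immediate is an assumption about what "negative" means here.

```cpp
#include <cstdint>

// Hypothetical model of a wavefront's 64-bit exec mask.
struct WaveState {
  uint64_t Exec;
};

// Constant operand: a negative value disables every lane by clearing the
// exec mask; any other value leaves the mask untouched, so nothing needs
// to be emitted for it.
static void lowerKillConstant(WaveState &W, uint32_t ImmBits) {
  const uint32_t SignBit = 0x80000000u; // assumed test for "negative"
  if (ImmBits & SignBit)
    W.Exec = 0;
}
```

The benefit is that the decision is made at compile time, so no per-lane comparison is needed for a constant operand.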
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202336
|
|
|
|
| |
llvm-svn: 202194
|
|
|
|
|
|
|
| |
Does not yet include larger part required
to match v_mad_i64_i32 / v_mad_u64_u32.
llvm-svn: 202077
|
|
|
|
|
|
|
|
|
| |
The API expects an ISD opcode, not an IR opcode.
Fixes a regression for R600.
Related to <rdar://problem/15519855>.
llvm-svn: 201923
|
|
|
|
| |
llvm-svn: 201493
|
|
|
|
|
|
|
| |
transformation does not bring any immediate benefits and introduces an illegal
operation.
llvm-svn: 201439
|
|
|
|
| |
llvm-svn: 201433
|
|
|
|
| |
llvm-svn: 201371
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 201370
|
|
|
|
| |
llvm-svn: 201369
|
|
|
|
| |
llvm-svn: 201368
|
|
|
|
| |
llvm-svn: 201367
|
|
|
|
|
|
|
| |
This isn't the most useful case to fix in the real world,
but bugpoint runs into this.
llvm-svn: 201177
|
|
|
|
|
|
|
|
|
|
|
| |
DS instructions that access local memory can only use addresses that
are less than or equal to the value of M0. When M0 is uninitialized,
then we experience undefined behavior.
This patch also changes the behavior to emit S_WQM_B64 on pixel shaders
no matter what kind of DS instruction is used.
llvm-svn: 201097
|
|
|
|
|
|
|
| |
Stores of <4 x i64> do work (although they do expand to 4 stores
instead of 2), but 3 x i64 vectors fail to select.
llvm-svn: 200989
|
|
|
|
| |
llvm-svn: 200935
|
|
|
|
| |
llvm-svn: 200934
|
|
|
|
| |
llvm-svn: 200933
|
|
|
|
|
|
|
|
| |
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.
llvm-svn: 200932
|
|
|
|
|
|
|
|
|
| |
Fixes opencl-example if_* tests with radeonsi.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200830
|
|
|
|
| |
llvm-svn: 200774
|
|
|
|
|
|
|
|
|
|
|
| |
The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."
Patch by: Jan Vesely
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 200773
|
|
|
|
|
|
|
|
|
|
|
|
| |
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.
Also add a pattern for (fneg (fabs ...)).
Fixes a bunch of bit encoding piglit tests with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200743
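
A minimal sketch of the sign-bit manipulation the message describes, written as plain host-side C++ rather than backend patterns. All function names are hypothetical. Operating on the bits directly gives -0.0 for a negated +0.0, which an arithmetic formulation such as `0.0f - x` does not.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for moving between a float and its raw encoding.
static uint32_t toBits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return Bits;
}

static float fromBits(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

static const uint32_t SignBit = 0x80000000u;

// fneg: flip the sign bit.
static float fnegBits(float X) { return fromBits(toBits(X) ^ SignBit); }

// fabs: clear the sign bit.
static float fabsBits(float X) { return fromBits(toBits(X) & ~SignBit); }

// fneg(fabs(x)): set the sign bit unconditionally.
static float fnegFabsBits(float X) { return fromBits(toBits(X) | SignBit); }
```

The combined `fneg(fabs(x))` form needs only a single OR, which is presumably why it is worth a dedicated pattern.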
|
|
|
|
| |
llvm-svn: 200620
|
|
|
|
|
|
|
|
| |
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.
llvm-svn: 200619
|
|
|
|
|
|
|
| |
Fixes half a dozen piglit tests with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200283
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195
|
|
|
|
|
|
|
|
|
|
| |
This pattern uses an SDNodeXForm, which isn't being emitted for some
reason. I can get it to work by attaching the PatLeaf that has the
XForm to the argument in the output pattern, but this results in an
immediate being used in a register operand, which the backend can't
handle yet.
llvm-svn: 199918
|
|
|
|
|
|
|
|
| |
The control flow finalizer would sometimes use an ALU_POP_AFTER
instruction before the vertex fetch clause instead of using a POP
instruction after it.
llvm-svn: 199917
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement the getUnrollingPreferences() function for
AMDGPUTargetTransformInfo so that loops that do address calculations
on pointers derived from alloca are unconditionally unrolled.
Unrolling these loops makes it more likely that SROA will be able to
eliminate the allocas, which is a big win for R600 since memory
allocated by alloca (private memory) is really slow.
llvm-svn: 199916
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The unit test is now disabled on non-asserts builds.
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries modulo 8 is either 7 or 0).
We choose to be conservative and always apply the work-around when the
number of sub-entries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199905
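
The two conditions in the message above can be sketched as predicates. The function and parameter names are hypothetical and this is not the backend's code; it only restates the arithmetic: an exact condition for the hardware bug, and the conservative condition the commit chooses to apply.

```cpp
// Exact condition under which the CF stack can be corrupted, as described:
// the sub-entry count has reached the stack entry size, and the count
// modulo 4 is 0 or 3 (on cedar: modulo 8 is 7 or 0).
static bool hitsStackBug(unsigned SubEntries, unsigned StackEntrySize,
                         bool IsCedar) {
  if (SubEntries < StackEntrySize)
    return false;
  if (IsCedar) {
    unsigned R = SubEntries % 8;
    return R == 7 || R == 0;
  }
  unsigned R = SubEntries % 4;
  return R == 0 || R == 3;
}

// Conservative condition: ignore the modulo test entirely, so the
// work-around stays correct even if the stack has been over-allocated.
static bool needsWorkaround(unsigned SubEntries, unsigned StackEntrySize) {
  return SubEntries >= StackEntrySize;
}
```

Every input for which `hitsStackBug` returns true also makes `needsWorkaround` return true, so the conservative check never misses a case; it only applies the work-around more often than strictly required.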
|
|
|
|
|
|
|
|
|
| |
This reverts commit 35b8331cad6eb512a2506adbc394201181da94ba.
The -debug-only flag for llc doesn't appear to be available in
all build configurations.
llvm-svn: 199845
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries modulo 8 is either 7 or 0).
We choose to be conservative and always apply the work-around when the
number of sub-entries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199842
|
|
|
|
|
| |
reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199840
|
|
|
|
| |
llvm-svn: 199827
|