| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
| |
llvm-svn: 202194
|
|
|
|
| |
llvm-svn: 202080
|
|
|
|
|
|
|
| |
Does not yet include larger part required
to match v_mad_i64_i32 / v_mad_u64_u32.
llvm-svn: 202077
|
|
|
|
|
|
|
| |
The check is clearer as southern islands or later,
rather than checking for later than northern islands.
llvm-svn: 202076
|
|
|
|
| |
llvm-svn: 202075
|
|
|
|
| |
llvm-svn: 201371
|
|
|
|
|
| |
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 201370
|
|
|
|
| |
llvm-svn: 201369
|
|
|
|
| |
llvm-svn: 201368
|
|
|
|
| |
llvm-svn: 201222
|
|
|
|
|
|
|
| |
This isn't the most useful case to fix in the real world,
but bugpoint runs into this.
llvm-svn: 201177
|
|
|
|
|
|
|
| |
Truncation is just accessing a subregister for any multiple of
the register size, so it's free.
llvm-svn: 201107
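
The rule in this message can be sketched as a small predicate. This is a minimal sketch, not the backend's actual hook: the function name is hypothetical, and the 32-bit register size is an assumption that the message itself does not state.

```cpp
#include <cassert>

// Assumed register size in bits; the commit message does not give a number.
constexpr unsigned kRegisterBits = 32;

// Hypothetical helper: a truncation is treated as free when the result is
// narrower than the source and is a whole multiple of the register size,
// because it then amounts to reading a subregister.
bool isTruncateFreeSketch(unsigned srcBits, unsigned dstBits) {
  assert(srcBits > 0 && dstBits > 0 && "bit widths must be non-zero");
  return dstBits < srcBits && dstBits % kRegisterBits == 0;
}
```

Under these assumptions, truncating 64 bits to 32 bits is free, while truncating 32 bits to 16 bits is not.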
|
|
|
|
|
|
|
|
|
|
|
| |
DS instructions that access local memory can only use addresses that
are less than or equal to the value of M0. When M0 is uninitialized,
then we experience undefined behavior.
This patch also changes the behavior to emit S_WQM_B64 on pixel shaders
no matter what kind of DS instruction is used.
llvm-svn: 201097
|
|
|
|
|
|
|
|
| |
This doesn't change any functionality, since we only have two shader
types (compute and pixel) that use local memory. We're just changing
the logic to match the documentation.
llvm-svn: 201096
|
|
|
|
| |
llvm-svn: 200935
|
|
|
|
| |
llvm-svn: 200934
|
|
|
|
| |
llvm-svn: 200933
|
|
|
|
|
|
|
|
| |
There was a problem with the old pattern, so we were copying some
larger immediates into registers when we could have been encoding
them in the instruction.
llvm-svn: 200932
|
|
|
|
|
|
|
| |
On R600, some address spaces have stricter alignment
requirements than others.
llvm-svn: 200887
|
|
|
|
|
|
|
|
|
| |
Fixes opencl-example if_* tests with radeonsi.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74469
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200830
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
No functional change. Updated loops from:
for (I = scc_begin(), E = scc_end(); I != E; ++I)
to:
for (I = scc_begin(); !I.isAtEnd(); ++I)
for teh win.
llvm-svn: 200789
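
The loop shape described in this message relies on an iterator that knows when it is exhausted. Here is a minimal, self-contained sketch of that idea. `GroupIterator` and `countNodes` are hypothetical stand-ins, not the real SCC iterator; only the `isAtEnd()` name is taken from the message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical iterator over groups of node ids. It tracks its own end,
// so callers do not need a separate end iterator.
class GroupIterator {
  const std::vector<std::vector<int>> *Groups;
  std::size_t Index = 0;

public:
  explicit GroupIterator(const std::vector<std::vector<int>> &G)
      : Groups(&G) {}

  bool isAtEnd() const { return Index == Groups->size(); }

  GroupIterator &operator++() {
    ++Index;
    return *this;
  }

  const std::vector<int> &operator*() const {
    assert(!isAtEnd() && "dereferencing an exhausted iterator");
    return (*Groups)[Index];
  }
};

// Counts all nodes using the single-iterator loop form.
std::size_t countNodes(const std::vector<std::vector<int>> &Groups) {
  std::size_t Total = 0;
  for (GroupIterator I(Groups); !I.isAtEnd(); ++I)
    Total += (*I).size();
  return Total;
}
```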
|
|
|
|
| |
llvm-svn: 200782
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes crashes in the OpenCV test suite and also in the scrypt
kernel in bfgminer.
I was unable to come up with a reduced test case for this.
https://bugs.freedesktop.org/show_bug.cgi?id=72785
llvm-svn: 200776
|
|
|
|
|
|
|
| |
There is no lit test for this, because it would be too big and
complicated, but it does fix a crash in the Arithm/Absdiff.* OpenCV test.
llvm-svn: 200775
|
|
|
|
| |
llvm-svn: 200774
|
|
|
|
|
|
|
|
|
|
|
| |
The OpenCL specs say: "The vector versions of the math functions operate
component-wise. The description is per-component."
Patch by: Jan Vesely
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 200773
|
|
|
|
|
|
|
|
|
|
|
|
| |
V_ADD_F32 with source modifier does not produce -0.0 for this. Just
manipulate the sign bit directly instead.
Also add a pattern for (fneg (fabs ...)).
Fixes a bunch of bit encoding piglit tests with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200743
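
Direct sign-bit manipulation can be illustrated on the host with ordinary 32-bit floats. This is a sketch of the bit-level idea only, assuming IEEE 754 single precision; the helper names are hypothetical and it does not model the backend's patterns or instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

constexpr std::uint32_t kSignBit = 0x80000000u;

// Reinterpret a float as its raw bit pattern and back.
static std::uint32_t toBits(float F) {
  std::uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}

static float fromBits(std::uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// fneg: flip the sign bit, so +0.0 becomes -0.0.
float fnegBits(float X) { return fromBits(toBits(X) ^ kSignBit); }

// fneg(fabs(x)): force the sign bit on regardless of the input sign.
float fnegFabsBits(float X) { return fromBits(toBits(X) | kSignBit); }
```

With this approach, negating `0.0f` yields the bit pattern `0x80000000`, which is -0.0.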
|
|
|
|
| |
llvm-svn: 200720
|
|
|
|
|
|
|
|
| |
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.
llvm-svn: 200619
|
|
|
|
|
|
|
|
| |
There is nothing wrong with printing the disassembly section when printing
text. A hypothetical assembler would then produce a .o just like our
direct object emission produces.
llvm-svn: 200583
|
|
|
|
| |
llvm-svn: 200582
|
|
|
|
| |
llvm-svn: 200581
|
|
|
|
|
|
|
|
| |
The subtarget info is explicitly passed to the EncodeInstruction
method and we should use that subtarget info to influence any
encoding decisions.
llvm-svn: 200350
|
|
|
|
| |
llvm-svn: 200349
|
|
|
|
| |
llvm-svn: 200348
|
|
|
|
| |
llvm-svn: 200345
|
|
|
|
|
|
|
| |
Fixes half a dozen piglit tests with radeonsi.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200283
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200196
|
|
|
|
|
| |
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 200195
|
|
|
|
|
|
| |
Sorry about that.
llvm-svn: 200171
|
|
|
|
| |
llvm-svn: 200170
|
|
|
|
|
|
|
| |
With this the target streamers will be able to know the target features that
are in use.
llvm-svn: 200135
|
|
|
|
|
|
|
|
|
|
| |
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.
llvm-svn: 200129
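
The second and third advantages listed in this message can be sketched with two toy classes. `Streamer` and `TargetStreamer` are hypothetical stand-ins, not the real MCStreamer and MCTargetStreamer interfaces; the sketch only shows why holding a reference is convenient when the owner always exists first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the main streamer: it records what was emitted.
class Streamer {
  std::vector<std::string> Log;

public:
  void emitRaw(const std::string &Text) { Log.push_back(Text); }
  const std::vector<std::string> &log() const { return Log; }
};

// Hypothetical stand-in for a target streamer.
class TargetStreamer {
  Streamer &S; // a reference: there is never a target streamer without a streamer

public:
  explicit TargetStreamer(Streamer &Owner) : S(Owner) {
    // The streamer already exists, so the constructor can talk to it.
    S.emitRaw("target-init");
  }

  void emitDirective(const std::string &Name) {
    assert(!Name.empty() && "directive name must not be empty");
    S.emitRaw("." + Name);
  }
};
```

Because the member is a reference, no null checks are needed before each use.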
|
|
|
|
| |
llvm-svn: 200021
|
|
|
|
|
|
|
| |
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.
llvm-svn: 200018
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a crash in the OpenCV OpenCL test suite.
There is no lit test for this, because the test would be very large
and could easily be invalidated by changes to the scheduler
or other parts of the compiler.
Patch by: Vincent Lejeune
llvm-svn: 199919
|
|
|
|
|
|
|
|
|
|
| |
This pattern uses an SDNodeXForm, which isn't being emitted for some
reason. I can get it to work by attaching the PatLeaf that has the
XForm to the argument in the output pattern, but this results in an
immediate being used in a register operand, which the backend can't
handle yet.
llvm-svn: 199918
|
|
|
|
|
|
|
|
| |
The control flow finalizer would sometimes use an ALU_POP_AFTER
instruction before the vertex fetch clause instead of using a POP
instruction after it.
llvm-svn: 199917
|
|
|
|
|
|
|
|
|
|
|
|
| |
Implement the getUnrollingPreferences() function for
AMDGPUTargetTransformInfo so that loops that do address calculations
on pointers derived from alloca are unconditionally unrolled.
Unrolling these loops makes it more likely that SROA will be able to
eliminate the allocas, which is a big win for R600 since memory
allocated by alloca (private memory) is really slow.
llvm-svn: 199916
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The unit test is now disabled on non-asserts builds.
The CF stack can be corrupted if you use CF_ALU_PUSH_BEFORE,
CF_ALU_ELSE_AFTER, CF_ALU_BREAK, or CF_ALU_CONTINUE when the number of
sub-entries on the stack is greater than or equal to the stack entry
size and sub-entries modulo 4 is either 0 or 3 (on cedar the bug is
present when number of sub-entries modulo 8 is either 7 or 0).
We choose to be conservative and always apply the work-around when the
number of sub-entries is greater than or equal to the stack entry size,
so that we can safely over-allocate the stack when we are unsure of the
stack allocation rules.
Reviewed-by: Vincent Lejeune <vljn at ovi.com>
llvm-svn: 199905
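
The two conditions in this message, the precise trigger for the hardware bug and the conservative check that is actually applied, can be written side by side. This is a sketch under stated assumptions: the function names, the parameters, and the `isCedar` flag are hypothetical, and it models only the arithmetic in the message, not the backend code.

```cpp
#include <cassert>

// Precise trigger as described: the stack is at or past the entry size and
// the sub-entry count lands on a bad residue (modulo 4 is 0 or 3, or on
// cedar modulo 8 is 7 or 0).
bool hitsStackBug(unsigned subEntries, unsigned stackEntrySize, bool isCedar) {
  assert(stackEntrySize > 0 && "stack entry size must be non-zero");
  if (subEntries < stackEntrySize)
    return false;
  if (isCedar) {
    unsigned R = subEntries % 8;
    return R == 7 || R == 0;
  }
  unsigned R = subEntries % 4;
  return R == 0 || R == 3;
}

// Conservative check: apply the work-around whenever the sub-entry count
// reaches the stack entry size, ignoring the residue.
bool needsWorkaround(unsigned subEntries, unsigned stackEntrySize) {
  assert(stackEntrySize > 0 && "stack entry size must be non-zero");
  return subEntries >= stackEntrySize;
}
```

Every input that satisfies `hitsStackBug` also satisfies `needsWorkaround`, which is what makes the conservative check safe.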
|