| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D45073
llvm-svn: 328874
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.
This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.
Reviewers: arsenm, #amdgpu
Reviewed By: arsenm, #amdgpu
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44364
llvm-svn: 328856
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.
I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.
llvm-svn: 328831
|
| |
|
|
|
|
|
| |
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.
llvm-svn: 328821
|
| |
|
|
| |
llvm-svn: 328818
|
| |
|
|
|
|
|
|
|
|
|
| |
This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981.
It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.
Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
llvm-svn: 328695
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).
This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.
Reviewers: kzhuravl, nhaehnle, timcorringham
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D44468
Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 328673
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
source value
Summary:
Rev 327580 "[CodeGen] Use MIR syntax for MachineMemOperand printing"
broke -print-machineinstrs for us on AMDGPU, because we have custom
pseudo source values, and MIR serialization does not implement that.
This commit at least restores the functionality of -print-machineinstrs,
even if it does not properly implement the missing MIR serialization
functionality.
Differential Revision: https://reviews.llvm.org/D44871
Change-Id: I44961c0b90bf6d48c01484ed7a4e466fd300db66
llvm-svn: 328668
|
| |
|
|
|
|
|
|
| |
Before this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of calls present.
llvm-svn: 328659
|
| |
|
|
|
|
|
|
| |
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.
llvm-svn: 328652
|
| |
|
|
|
|
|
| |
These were changed to match the asm output name a long time ago,
although I think the old tablegenerated names still work.
llvm-svn: 328651
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.
Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.
Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to preven this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.
llvm-svn: 328650
|
| |
|
|
|
|
|
|
| |
Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue.
Differential Revision: https://reviews.llvm.org/D44697
llvm-svn: 328351
|
| |
|
|
|
|
|
|
|
|
|
| |
attribute for AMDGPU
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.
Differential Revision: https://reviews.llvm.org/D43736
llvm-svn: 328349
|
| |
|
|
|
|
|
|
| |
We propagate the existing NaN value when possible.
Differential Revision: https://reviews.llvm.org/D44521
llvm-svn: 328140
|
| |
|
|
| |
llvm-svn: 327891
|
| |
|
|
|
|
|
|
| |
As suggested in D44521 - bitcast to integer for the math,
so we preserve the intent of these tests when NaN math
gets folded away.
llvm-svn: 327890
|
| |
|
|
| |
llvm-svn: 327843
|
| |
|
|
|
|
|
|
|
| |
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.
Replace with IMPLICIT_DEF to avoid iterator issues.
llvm-svn: 327842
|
| |
|
|
| |
llvm-svn: 327774
|
| |
|
|
| |
llvm-svn: 327773
|
| |
|
|
| |
llvm-svn: 327772
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This is a follow-on patch of https://reviews.llvm.org/D44210
Author: FarhanaAleen
Reviewed By: msearles
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44319
llvm-svn: 327726
|
| |
|
|
|
|
|
|
|
|
|
| |
opcodes
See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751
Differential Revision: https://reviews.llvm.org/D44529
Reviewers: artem.tamazov, arsenm
llvm-svn: 327723
|
| |
|
|
|
|
|
|
| |
of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed.
Differential Revision: https://reviews.llvm.org/D44434
llvm-svn: 327583
|
| |
|
|
|
|
|
|
|
|
| |
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".
rdar://38163529
Differential Revision: https://reviews.llvm.org/D42377
llvm-svn: 327580
|
| |
|
|
|
|
|
|
|
|
| |
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.
Differential Revision: https://reviews.llvm.org/D44322
llvm-svn: 327291
|
| |
|
|
|
|
|
|
|
| |
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252
Differential Revision: https://reviews.llvm.org/D43874
Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
|
| |
|
|
| |
llvm-svn: 327269
|
| |
|
|
| |
llvm-svn: 327268
|
| |
|
|
| |
llvm-svn: 327267
|
| |
|
|
| |
llvm-svn: 327211
|
| |
|
|
| |
llvm-svn: 327209
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
address-space.
Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64.
This patch supports ds_read_b128 instruction pattern and generation of this instruction.
In the vectorizer, this patch also widen the vector length so that vectorizer generates
128 bit loads for local address-space which gets translated to ds_read_b128.
Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128.
Author: FarhanaAleen
Reviewed By: rampitec, arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44210
llvm-svn: 327153
|
| |
|
|
| |
llvm-svn: 327147
|
| |
|
|
|
|
|
|
| |
GFX9 should select opsel version.
Differential Revision: https://reviews.llvm.org/D44279
llvm-svn: 327106
|
| |
|
|
| |
llvm-svn: 327066
|
| |
|
|
|
|
|
| |
This will likely need to be changed again for anything more than:
fmul undef, undef -> undef
llvm-svn: 327034
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.
Author: FarhanaAleen
Reviewed By: rampitec
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44179
llvm-svn: 326910
|
| |
|
|
|
|
| |
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.
llvm-svn: 326907
|
| |
|
|
| |
llvm-svn: 326904
|
| |
|
|
|
|
|
|
|
|
| |
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.
Differential Revision: https://reviews.llvm.org/D43785
llvm-svn: 326806
|
| |
|
|
| |
llvm-svn: 326715
|
| |
|
|
|
|
|
| |
As far as I can tell legalization of weird sizes for the
output type isn't implemented.
llvm-svn: 326714
|
| |
|
|
|
|
|
|
| |
dependent instruction selection.
Differential revision: https://reviews.llvm.org/D35267
llvm-svn: 326703
|
| |
|
|
| |
llvm-svn: 326589
|
| |
|
|
| |
llvm-svn: 326588
|
| |
|
|
|
|
| |
Patch by Tom Stellard
llvm-svn: 326587
|
| |
|
|
|
|
| |
Patch by Tom Stellard
llvm-svn: 326586
|
| |
|
|
|
|
| |
Patch by Tom Stellard
llvm-svn: 326567
|