| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.
Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+
Also updated the tests to check correct behaviour
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51933
Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
llvm-svn: 342222
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
I accidentally left this behind in D50306, and it causes a build warning
when I build with gcc7.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52022
Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c
llvm-svn: 342189
|
|
|
|
|
|
|
|
|
|
|
|
| |
If an argument was passed on the stack, this
was using the default alignment.
I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.
llvm-svn: 342133
|
|
|
|
|
|
|
|
| |
Differential revision: https://reviews.llvm.org/D51931
Reviewers: rampitec
llvm-svn: 342120
|
|
|
|
|
|
|
|
|
|
| |
Load offset inlining pattern changed.
Differential revision: https://reviews.llvm.org/D51975
Reviewers: rampitec
llvm-svn: 342115
|
|
|
|
|
|
|
|
|
|
| |
default values)
Change by Tony Tye
Differential Revision: https://reviews.llvm.org/D51954
llvm-svn: 342077
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
|
|
|
|
|
|
|
|
|
|
|
| |
TargetParser."
This reverts commit r341982.
The change introduced a layering violation. Reverting to unbreak
our integrate.
llvm-svn: 342023
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
|
|
|
|
|
|
|
|
|
| |
Immediate selection predicate changed
Differential revision: https://reviews.llvm.org/D51734
Reviewers: rampitec
llvm-svn: 341928
|
|
|
|
| |
llvm-svn: 341895
|
|
|
|
|
|
|
|
|
|
| |
Inline immediate move to V_MADAK_F32.
Differential revision: https://reviews.llvm.org/D51586
Reviewer: rampitec
llvm-svn: 341843
|
|
|
|
|
|
|
| |
Now the pointer size should always be correct and
we don't need to improperly inspect the pointee type.
llvm-svn: 341806
|
|
|
|
|
|
|
| |
This will require something to cast. Before this would eliminate
the cast, which would result in copies of $noreg.
llvm-svn: 341803
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.
Fixes not splitting 3i16/v3f16 into two registers for
AMDGPU.
This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.
llvm-svn: 341801
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GCNHazardRecognizer wait state counting
Summary:
This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted.
Reviewers: nhaehnle, arsenm
Reviewed By: arsenm
Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51726
Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b
llvm-svn: 341798
|
|
|
|
| |
llvm-svn: 341768
|
|
|
|
| |
llvm-svn: 341767
|
|
|
|
|
|
|
|
|
| |
immediate SMRD offset.
Differential revision: https://reviews.llvm.org/D51610
Reviewer: rampitec
llvm-svn: 341636
|
|
|
|
|
|
| |
Causes a regression in expensive checks.
llvm-svn: 341589
|
|
|
|
| |
llvm-svn: 341567
|
|
|
|
|
|
|
|
|
| |
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.
Differential Revision: https://reviews.llvm.org/D50982
llvm-svn: 341413
|
|
|
|
|
|
| |
Match behavior in DAG of r340343
llvm-svn: 341393
|
|
|
|
| |
llvm-svn: 341303
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D49737
llvm-svn: 341271
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D51555
llvm-svn: 341266
|
|
|
|
|
|
|
|
|
|
|
| |
The intention is to enable the extract_vector_elt load combine,
and doing this for other operations interferes with more
useful optimizations on vectors.
Handle any type of load since in principle we should do the
same combine for the various load intrinsics.
llvm-svn: 341219
|
|
|
|
|
|
|
| |
This doesn't really matter if clang is always emitting
the visibility as hidden by default.
llvm-svn: 341168
|
|
|
|
| |
llvm-svn: 341165
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).
The purpose of this patch is to free up the name DivergenceAnalysis for the new generic
implementation. The generic implementation class will be shared by specialized
divergence analysis classes.
Patch by: Simon Moll
Reviewed By: nhaehnle
Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50434
Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
llvm-svn: 341071
|
|
|
|
|
|
|
|
|
|
| |
Operands Folding 1.
Reviewers: rampitec
Differential revision: https://reviews/llvm/org/D51316
llvm-svn: 341068
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This fixes GPU hangs with OpenGL bindless handle arithmetic.
Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51203
llvm-svn: 340959
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: D.u32 = S0.u8[0] * S1.u8[0] +
S0.u8[1] * S1.u8[1] +
S0.u8[2] * S1.u8[2] +
S0.u8[3] * S1.u8[3] + S2.u32
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D50921
llvm-svn: 340936
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add some optional code to validate getInstSizeInBytes for emitted
instructions. This flushed out some issues which are fixed by this
patch:
- Streamline getInstSizeInBytes
- Properly define the VI readlane/writelane instruction as VOP3
- Fix the inline constant determination. Specifically, this change
fixes an issue where a 32-bit value of 0xffffffff was recorded
as unsigned. This is equal to -1 when restricting to a 32-bit
comparison, and an inline constant can be used.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50629
Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2
llvm-svn: 340903
|
|
|
|
| |
llvm-svn: 340868
|
|
|
|
|
|
|
|
| |
This can leave behind the uses with the defs removed.
Since this should only really happen in tests, it's not worth the
effort of trying to handle this.
llvm-svn: 340866
|
|
|
|
|
|
|
|
|
| |
The original motivating example uses a 64-bit add, so the carry
is used. Insert a copy from VCC. This may allow shrinking of
the used carry instruction. At worst, we are replacing a
mov to materialize the constant with a copy of vcc.
llvm-svn: 340862
|
|
|
|
|
|
|
|
|
| |
This needs to be done in the SSA fold operands
pass to be effective, so there is a bit of overlap
with SIShrinkInstructions but I don't think this
is practically avoidable.
llvm-svn: 340859
|
|
|
|
| |
llvm-svn: 340855
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adding support for a16 for gfx9. A16 bit replaces r128 bit for gfx9.
Change-Id: Ie8b881e4e6d2f023fb5e0150420893513e5f4841
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50575
llvm-svn: 340831
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.
This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.
Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.
This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.
V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51098
Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
llvm-svn: 340684
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.
Fixes "LLVM ERROR: Pointer address space out of range".
v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
v4: - fix compilation issues
- fix out of bounds access
v3: use static_assert()
v2: add a very simple test for 32-bit addr space
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
llvm-svn: 340417
|
|
|
|
|
|
|
|
|
|
| |
Constant and global may alias, also one rules table wasn't
ordered correctly.
Pinpointed by Matt.
v2: add a test with swapped parameters
llvm-svn: 340416
|
|
|
|
|
|
|
|
|
| |
This was hackily adding in the 4-bytes reserved for the callee's
emergency stack slot. Treat it like a normal stack allocation
so we get the correct alignment padding behavior. This fixes
an inconsistency between the caller and callee.
llvm-svn: 340396
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Extend BasicBlocksUtils to update MemorySSA.
Subscribers: sanjoy, arsenm, nhaehnle, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D45300
llvm-svn: 340365
|
|
|
|
|
|
|
|
|
| |
In general we can't assume flat loads are uniform, and cases where we can prove
they are should be handled through infer-address-spaces.
Differential Revision: https://reviews.llvm.org/D50991
llvm-svn: 340343
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Transform add (mul ((i32)S0.x, (i32)S1.x),
add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3)
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D50024
llvm-svn: 340295
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics
only allowed float types for the data to be loaded or stored, which
sometimes meant the frontend needed to generate a bitcast. In this, the
new intrinsics copied the old buffer intrinsics.
This commit extends the new intrinsics to allow int types as well.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50315
Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85
llvm-svn: 340270
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.buffer.load
llvm.amdgcn.raw.buffer.load.format
llvm.amdgcn.raw.buffer.load.format.d16
llvm.amdgcn.struct.buffer.load
llvm.amdgcn.struct.buffer.load.format
llvm.amdgcn.struct.buffer.load.format.d16
llvm.amdgcn.raw.buffer.store
llvm.amdgcn.raw.buffer.store.format
llvm.amdgcn.raw.buffer.store.format.d16
llvm.amdgcn.struct.buffer.store
llvm.amdgcn.struct.buffer.store.format
llvm.amdgcn.struct.buffer.store.format.d16
llvm.amdgcn.raw.buffer.atomic.*
llvm.amdgcn.struct.buffer.atomic.*
with the following changes from the llvm.amdgcn.buffer.*
intrinsics:
* there are separate raw and struct versions: raw does not have an
index arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.
The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50306
Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.tbuffer.load
llvm.amdgcn.struct.tbuffer.load
llvm.amdgcn.raw.tbuffer.store
llvm.amdgcn.struct.tbuffer.store
with the following changes from the llvm.amdgcn.tbuffer.* intrinsics:
* there are separate raw and struct versions: raw does not have an index
arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined format arg (dfmt+nfmt)
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::TBUFFER_* SD nodes always have an index operand, all three
offset operands, combined format operand, combined cachepolicy operand,
and an extra idxen operand.
The tbuffer pseudo- and real instructions now also have a combined
format operand.
The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store
intrinsics continue to work.
V2: Separate raw and struct intrinsics.
V3: Moved extract_glc and extract_slc defs to a more sensible place.
V4: Rebased on D49995.
V5: Only two separate offset args instead of three.
V6: Pseudo- and real instructions have joint format operand.
V7: Restored optionality of dfmt and nfmt in assembler.
V8: Addressed minor review comments.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D49026
Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4
llvm-svn: 340268
|