Diffstat (limited to 'llvm/docs/AMDGPUModifierSyntax.rst')
-rw-r--r-- llvm/docs/AMDGPUModifierSyntax.rst 349
1 files changed, 300 insertions, 49 deletions
diff --git a/llvm/docs/AMDGPUModifierSyntax.rst b/llvm/docs/AMDGPUModifierSyntax.rst
index 1a555b67832..d66e94dcb91 100644
--- a/llvm/docs/AMDGPUModifierSyntax.rst
+++ b/llvm/docs/AMDGPUModifierSyntax.rst
@@ -73,8 +73,8 @@ Examples:
.. _amdgpu_synid_sw_offset16:
-pattern
-~~~~~~~
+swizzle pattern
+~~~~~~~~~~~~~~~
This is a special modifier which may be used with *ds_swizzle_b32* instruction only.
It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
@@ -165,8 +165,8 @@ EXP Modifiers
done
~~~~
-Specifies if this is the last export from the shader to the target. By default, current
-instruction does not finish an export sequence.
+Specifies if this is the last export from the shader to the target. By default,
+an *exp* instruction does not finish an export sequence.
======================================== ================================================
Syntax Description
@@ -249,11 +249,71 @@ Examples:
offset:-4000
offset:0x10
+.. _amdgpu_synid_flat_offset12s:
+
+offset12s
+~~~~~~~~~
+
+Specifies an immediate signed 12-bit offset, in bytes. The default value is 0.
+
+Can be used with *global/scratch* opcodes only.
+
+GFX10 only.
+
+    ============================ =======================================================
+    Syntax                       Description
+    ============================ =======================================================
+    offset:{-2048..2047}         Specifies a 12-bit signed offset as an
+                                 :ref:`integer number <amdgpu_synid_integer_number>`.
+    ============================ =======================================================
+
+Examples:
+
+.. parsed-literal::
+
+ offset:-2000
+ offset:0x10
+
+.. _amdgpu_synid_flat_offset11:
+
+offset11
+~~~~~~~~
+
+Specifies an immediate unsigned 11-bit offset, in bytes. The default value is 0.
+
+Cannot be used with *global/scratch* opcodes.
+
+GFX10 only.
+
+    ================= ======================================================
+    Syntax            Description
+    ================= ======================================================
+    offset:{0..2047}  Specifies an 11-bit unsigned offset as a non-negative
+                      :ref:`integer number <amdgpu_synid_integer_number>`.
+    ================= ======================================================
+
+Examples:
+
+.. parsed-literal::
+
+ offset:2047
+ offset:0xff
+
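The two offset ranges added above can be sketched as a small range check. This is only an illustration of the documented limits; `check_offset` and its `signed12` parameter are hypothetical names and are not part of the assembler.

```python
def check_offset(value: int, signed12: bool) -> bool:
    """Return True if `value` lies in the documented range.

    signed12=True  models offset12s: -2048..2047 (global/scratch opcodes).
    signed12=False models offset11:  0..2047 (other opcodes).
    Hypothetical helper; the real check lives in the assembler.
    """
    low, high = (-2048, 2047) if signed12 else (0, 2047)
    return low <= value <= high


# Values taken from the examples in the text.
print(check_offset(-2000, signed12=True))
print(check_offset(0xff, signed12=False))
print(check_offset(-2000, signed12=False))
```

The first two calls accept the documented example values; the last one shows that a negative offset falls outside the unsigned 11-bit range.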
+dlc
+~~~
+
+See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
+
glc
~~~
See a description :ref:`here<amdgpu_synid_glc>`.
+lds
+~~~
+
+See a description :ref:`here<amdgpu_synid_lds>`. GFX10 only.
+
slc
~~~
@@ -345,7 +405,7 @@ r128
Specifies texture resource size. The default size is 256 bits.
-GFX7 and GFX8 only.
+GFX7, GFX8 and GFX10 only.
=================== ================================================
Syntax Description
@@ -407,7 +467,7 @@ Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
Note that GFX8.0 does not support data packing.
Each 16-bit data element occupies 1 VGPR.
- GFX8.1 and GFX9 support data packing.
+ GFX8.1, GFX9 and GFX10 support data packing.
Each pair of 16-bit data elements
occupies 1 VGPR.
======================================== ================================================
@@ -417,7 +477,8 @@ Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
a16
~~~
-Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
+Specifies size of image address components: 16 or 32 bits (32 bits by default).
+GFX9 and GFX10 only.
======================================== ================================================
Syntax Description
@@ -425,9 +486,69 @@ Specifies size of image address components: 16 or 32 bits (32 bits by default).
a16 Enables 16-bit image address components.
======================================== ================================================
+.. _amdgpu_synid_dim:
+
+dim
+~~~
+
+Specifies surface dimension. This is a mandatory modifier. There is no default value.
+
+GFX10 only.
+
+    =============================== =========================================================
+    Syntax                          Description
+    =============================== =========================================================
+    dim:1D                          One-dimensional image.
+    dim:2D                          Two-dimensional image.
+    dim:3D                          Three-dimensional image.
+    dim:CUBE                        Cubemap array.
+    dim:1D_ARRAY                    One-dimensional image array.
+    dim:2D_ARRAY                    Two-dimensional image array.
+    dim:2D_MSAA                     Two-dimensional multisample anti-aliasing image.
+    dim:2D_MSAA_ARRAY               Two-dimensional multisample anti-aliasing image array.
+    =============================== =========================================================
+
+The following table defines an alternative syntax which is supported
+for compatibility with SP3 assembler:
+
+    =============================== =========================================================
+    Syntax                          Description
+    =============================== =========================================================
+    dim:SQ_RSRC_IMG_1D              One-dimensional image.
+    dim:SQ_RSRC_IMG_2D              Two-dimensional image.
+    dim:SQ_RSRC_IMG_3D              Three-dimensional image.
+    dim:SQ_RSRC_IMG_CUBE            Cubemap array.
+    dim:SQ_RSRC_IMG_1D_ARRAY        One-dimensional image array.
+    dim:SQ_RSRC_IMG_2D_ARRAY        Two-dimensional image array.
+    dim:SQ_RSRC_IMG_2D_MSAA         Two-dimensional multisample anti-aliasing image.
+    dim:SQ_RSRC_IMG_2D_MSAA_ARRAY   Two-dimensional multisample anti-aliasing image array.
+    =============================== =========================================================
+
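The relation between the two *dim* spellings can be sketched as a simple prefix rule: the SP3-compatible form is the short form with `SQ_RSRC_IMG_` prepended. The helper name `to_short_dim` is hypothetical and only illustrates the two tables above; it does not reflect how the assembler parses the modifier.

```python
# Short names listed in the first table.
SHORT_DIMS = [
    "1D", "2D", "3D", "CUBE",
    "1D_ARRAY", "2D_ARRAY", "2D_MSAA", "2D_MSAA_ARRAY",
]
SP3_PREFIX = "SQ_RSRC_IMG_"


def to_short_dim(modifier: str) -> str:
    """Normalize e.g. 'dim:SQ_RSRC_IMG_2D' to 'dim:2D' (hypothetical helper)."""
    if not modifier.startswith("dim:"):
        raise ValueError("not a dim modifier: " + modifier)
    name = modifier[len("dim:"):]
    # Accept the SP3-compatible spelling by dropping its prefix.
    if name.startswith(SP3_PREFIX):
        name = name[len(SP3_PREFIX):]
    if name not in SHORT_DIMS:
        raise ValueError("unknown dimension: " + modifier)
    return "dim:" + name


print(to_short_dim("dim:SQ_RSRC_IMG_2D_MSAA_ARRAY"))
print(to_short_dim("dim:CUBE"))
```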
+dlc
+~~~
+
+See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
+
Miscellaneous Modifiers
-----------------------
+.. _amdgpu_synid_dlc:
+
+dlc
+~~~
+
+Controls device level cache policy for memory operations. Used for synchronization.
+When specified, forces the operation to bypass the device level cache, making the operation
+device level coherent. By default, instructions use the device level cache.
+
+GFX10 only.
+
+    ======================================== ================================================
+    Syntax                                   Description
+    ======================================== ================================================
+    dlc                                      Bypass device level cache.
+    ======================================== ================================================
+
.. _amdgpu_synid_glc:
glc
@@ -444,50 +565,63 @@ See AMD documentation for details.
glc Set glc bit to 1.
======================================== ================================================
-.. _amdgpu_synid_slc:
+.. _amdgpu_synid_lds:
-slc
+lds
~~~
-Specifies cache policy. The default value is off (0).
+Specifies where to store the result: VGPRs or LDS (VGPRs by default).
-See AMD documentation for details.
+    ======================================== ===========================
+    Syntax                                   Description
+    ======================================== ===========================
+    lds                                      Store result in LDS.
+    ======================================== ===========================
+
+.. _amdgpu_synid_nv:
+
+nv
+~~
+
+Specifies if instruction is operating on non-volatile memory. By default, memory is volatile.
+
+GFX9 only.
======================================== ================================================
Syntax Description
======================================== ================================================
- slc Set slc bit to 1.
+ nv Indicates that instruction operates on
+ non-volatile memory.
======================================== ================================================
-.. _amdgpu_synid_tfe:
+.. _amdgpu_synid_slc:
-tfe
+slc
~~~
-Controls access to partially resident textures. The default value is off (0).
+Specifies cache policy. The default value is off (0).
See AMD documentation for details.
======================================== ================================================
Syntax Description
======================================== ================================================
- tfe Set tfe bit to 1.
+ slc Set slc bit to 1.
======================================== ================================================
-.. _amdgpu_synid_nv:
+.. _amdgpu_synid_tfe:
-nv
-~~
+tfe
+~~~
-Specifies if instruction is operating on non-volatile memory. By default, memory is volatile.
+Controls access to partially resident textures. The default value is off (0).
-GFX9 only.
+See AMD documentation for details.
======================================== ================================================
Syntax Description
======================================== ================================================
- nv Indicates that instruction operates on
- non-volatile memory.
+ tfe Set tfe bit to 1.
======================================== ================================================
MUBUF/MTBUF Modifiers
@@ -574,18 +708,15 @@ slc
See a description :ref:`here<amdgpu_synid_slc>`.
-.. _amdgpu_synid_lds:
-
lds
~~~
-Specifies where to store the result: VGPRs or LDS (VGPRs by default).
+See a description :ref:`here<amdgpu_synid_lds>`.
- ======================================== ===========================
- Syntax Description
- ======================================== ===========================
- lds Store result in LDS.
- ======================================== ===========================
+dlc
+~~~
+
+See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
tfe
~~~
@@ -617,7 +748,12 @@ See a description :ref:`here<amdgpu_synid_glc>`.
nv
~~
-See a description :ref:`here<amdgpu_synid_nv>`.
+See a description :ref:`here<amdgpu_synid_nv>`. GFX9 only.
+
+dlc
+~~~
+
+See a description :ref:`here<amdgpu_synid_dlc>`. GFX10 only.
VINTRP Modifiers
----------------
@@ -628,7 +764,7 @@ high
~~~~
Specifies which half of the LDS word to use. Low half of LDS word is used by default.
-GFX9 only.
+GFX9 and GFX10 only.
======================================== ================================
Syntax Description
@@ -636,10 +772,60 @@ GFX9 only.
high Use high half of LDS word.
======================================== ================================
-VOP1/VOP2 DPP Modifiers
------------------------
+DPP8 Modifiers
+--------------
+
+GFX10 only.
+
+.. _amdgpu_synid_dpp8_sel:
+
+dpp8_sel
+~~~~~~~~
+
+Selects which lane to pull data from, within a group of 8 lanes. This is a mandatory modifier.
+There is no default value.
+
+GFX10 only.
+
+The *dpp8_sel* modifier must specify exactly 8 values, each ranging from 0 to 7.
+The first value selects the lane to read from to supply data to lane 0,
+the second value does the same for lane 1, and so on.
+
+    =============================================================== ===========================
+    Syntax                                                          Description
+    =============================================================== ===========================
+    dpp8:[{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7},{0..7}]  Select lanes to read from.
+    =============================================================== ===========================
+
+Examples:
+
+.. parsed-literal::
+
+ dpp8:[7,6,5,4,3,2,1,0]
+ dpp8:[0,1,0,1,0,1,0,1]
+
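The effect of the eight selectors can be sketched as a permutation applied independently to each group of 8 lanes. This is a simplified model that assumes all lanes are active (the *fi* modifier and the exec mask are ignored); `apply_dpp8` is a hypothetical name used only for illustration.

```python
def apply_dpp8(selectors, lanes):
    """Model dpp8:[s0,...,s7]: lane i of each 8-lane group reads lane s_i
    of the same group. Simplified: every lane is assumed to be active."""
    if len(selectors) != 8 or any(not 0 <= s <= 7 for s in selectors):
        raise ValueError("dpp8 needs exactly 8 selectors in the range 0..7")
    if len(lanes) % 8 != 0:
        raise ValueError("lane count must be a multiple of 8")
    result = []
    for base in range(0, len(lanes), 8):
        group = lanes[base:base + 8]
        result.extend(group[s] for s in selectors)
    return result


# The two examples from the text, applied to one group of 8 lanes.
print(apply_dpp8([7, 6, 5, 4, 3, 2, 1, 0], list(range(8))))
print(apply_dpp8([0, 1, 0, 1, 0, 1, 0, 1], list(range(10, 18))))
```

The first selector list reverses each group; the second repeats the values of lanes 0 and 1 across the group.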
+.. _amdgpu_synid_fi8:
+
+fi
+~~
+
+Controls interaction with inactive lanes for *dpp8* instructions. The default value is zero.
+
+Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
-GFX8 and GFX9 only.
+GFX10 only.
+
+    ==================================== =====================================================
+    Syntax                               Description
+    ==================================== =====================================================
+    fi:0                                 Fetch zero when accessing data from inactive lanes.
+    fi:1                                 Fetch pre-existing values from inactive lanes.
+    ==================================== =====================================================
+
+DPP/DPP16 Modifiers
+-------------------
+
+GFX8, GFX9 and GFX10 only.
.. _amdgpu_synid_dpp_ctrl:
@@ -649,7 +835,9 @@ dpp_ctrl
Specifies how data are shared between threads. This is a mandatory modifier.
There is no default value.
-Note. The lanes of a wavefront are organized in four banks and four rows.
+GFX8 and GFX9 only. Use :ref:`dpp16_ctrl<amdgpu_synid_dpp16_ctrl>` for GFX10.
+
+Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
======================================== ================================================
Syntax Description
@@ -679,6 +867,44 @@ Examples:
quad_perm:[0, 1, 2, 3]
row_shl:3
+.. _amdgpu_synid_dpp16_ctrl:
+
+dpp16_ctrl
+~~~~~~~~~~
+
+Specifies how data are shared between threads. This is a mandatory modifier.
+There is no default value.
+
+GFX10 only. Use :ref:`dpp_ctrl<amdgpu_synid_dpp_ctrl>` for GFX8 and GFX9.
+
+Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
+(There are only two rows in *wave32* mode.)
+
+    ======================================== ====================================================
+    Syntax                                   Description
+    ======================================== ====================================================
+    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
+    row_mirror                               Mirror threads within row.
+    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
+    row_share:{0..15}                        Share the value from the specified lane with other
+                                             lanes in the row.
+    row_xmask:{0..15}                        Fetch from XOR(current lane id, specified lane id).
+    row_shl:{1..15}                          Row shift left by 1-15 threads.
+    row_shr:{1..15}                          Row shift right by 1-15 threads.
+    row_ror:{1..15}                          Row rotate right by 1-15 threads.
+    ======================================== ====================================================
+
+Note: Numeric parameters may be specified as either
+:ref:`integer numbers<amdgpu_synid_integer_number>` or
+:ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+
+Examples:
+
+.. parsed-literal::
+
+ quad_perm:[0, 1, 2, 3]
+ row_shl:3
+
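Three of the row-level controls in the table above can be sketched as index arithmetic within a 16-lane row. This is a simplified model that assumes every lane is active and ignores *row_mask*, *bank_mask*, *bound_ctrl* and *fi*; all function names are hypothetical.

```python
ROW = 16  # lanes per row


def _per_row(lanes, pick):
    """Apply an index function to every 16-lane row independently."""
    if len(lanes) % ROW != 0:
        raise ValueError("lane count must be a multiple of 16")
    result = []
    for base in range(0, len(lanes), ROW):
        row = lanes[base:base + ROW]
        result.extend(row[pick(i)] for i in range(ROW))
    return result


def row_mirror(lanes):
    """row_mirror: lane i reads lane 15 - i of its row."""
    return _per_row(lanes, lambda i: ROW - 1 - i)


def row_share(lanes, n):
    """row_share:n: every lane reads lane n of its row."""
    return _per_row(lanes, lambda i: n)


def row_xmask(lanes, n):
    """row_xmask:n: lane i reads lane (i XOR n) of its row."""
    return _per_row(lanes, lambda i: i ^ n)


wave32 = list(range(32))  # two rows, as in wave32 mode
print(row_share(wave32, 3))
print(row_xmask(wave32, 1))
print(row_mirror(wave32))
```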
.. _amdgpu_synid_row_mask:
row_mask
@@ -686,7 +912,8 @@ row_mask
Controls which rows are enabled for data sharing. By default, all rows are enabled.
-Note. The lanes of a wavefront are organized in four banks and four rows.
+Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
+(There are only two rows in *wave32* mode.)
======================================== =====================================================
Syntax Description
@@ -696,6 +923,9 @@ Note. The lanes of a wavefront are organized in four banks and four rows.
Each of 4 bits in the mask controls one
row (0 - disabled, 1 - enabled).
+
+ In *wave32* mode the values should be limited to
+ {0..7}.
======================================== =====================================================
Examples:
@@ -713,7 +943,8 @@ bank_mask
Controls which banks are enabled for data sharing. By default, all banks are enabled.
-Note. The lanes of a wavefront are organized in four banks and four rows.
+Note. The lanes of a wavefront are organized in four *rows* and four *banks*.
+(There are only two rows in *wave32* mode.)
======================================== =======================================================
Syntax Description
@@ -750,10 +981,30 @@ invalid lanes is disabled.
return zero.
======================================== ================================================
-VOP1/VOP2/VOPC SDWA Modifiers
------------------------------
+.. _amdgpu_synid_fi16:
+
+fi
+~~
+
+Controls interaction with *inactive* lanes for *dpp16* instructions. The default value is zero.
-GFX8 and GFX9 only.
+Note. *Inactive* lanes are those whose :ref:`exec<amdgpu_synid_exec>` mask bit is zero.
+
+GFX10 only.
+
+    ======================================== ==================================================
+    Syntax                                   Description
+    ======================================== ==================================================
+    fi:0                                     Interaction with inactive lanes is controlled by
+                                             :ref:`bound_ctrl<amdgpu_synid_bound_ctrl>`.
+
+    fi:1                                     Fetch pre-existing values from inactive lanes.
+    ======================================== ==================================================
+
+SDWA Modifiers
+--------------
+
+GFX8, GFX9 and GFX10 only.
clamp
~~~~~
@@ -765,7 +1016,7 @@ omod
See a description :ref:`here<amdgpu_synid_omod>`.
-GFX9 only.
+GFX9 and GFX10 only.
.. _amdgpu_synid_dst_sel:
@@ -844,12 +1095,12 @@ Controls which bits in the src1 are used. By default, all bits are used.
.. _amdgpu_synid_sdwa_operand_modifiers:
-VOP1/VOP2/VOPC SDWA Operand Modifiers
--------------------------------------
+SDWA Operand Modifiers
+----------------------
Operand modifiers are not used separately. They are applied to source operands.
-GFX8 and GFX9 only.
+GFX8, GFX9 and GFX10 only.
abs
~~~
@@ -903,7 +1154,7 @@ The value 0 selects the low bits, while 1 selects the high bits.
Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
by op_sel must be 0.
-GFX9 only.
+GFX9 and GFX10 only.
======================================== ============================================================
Syntax Description
@@ -1029,7 +1280,7 @@ This section describes modifiers of *regular* VOP3P instructions.
*v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.
-GFX9 only.
+GFX9 and GFX10 only.
.. _amdgpu_synid_op_sel:
@@ -1173,7 +1424,7 @@ in a manner different from *regular* VOP3P instructions.
See a description below.
-GFX9 only.
+GFX9 and GFX10 only.
.. _amdgpu_synid_mad_mix_op_sel: