Diffstat (limited to 'llvm/docs/AMDGPUOperandSyntax.rst')
-rw-r--r-- llvm/docs/AMDGPUOperandSyntax.rst 1502
1 files changed, 756 insertions, 746 deletions
diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst
index 4f3536eed40..4fa2bb2c9eb 100644
--- a/llvm/docs/AMDGPUOperandSyntax.rst
+++ b/llvm/docs/AMDGPUOperandSyntax.rst
@@ -1,6 +1,6 @@
-=================================================
-Syntax of AMDGPU Assembler Operands and Modifiers
-=================================================
+=====================================
+Syntax of AMDGPU Instruction Operands
+=====================================
.. contents::
:local:
@@ -8,1048 +8,1058 @@ Syntax of AMDGPU Assembler Operands and Modifiers
Conventions
===========
-The following conventions are used in syntax description:
+The following notation is used throughout this document:
- =================== =============================================================
+ =================== =============================================================================
Notation Description
- =================== =============================================================
+ =================== =============================================================================
{0..N} Any integer value in the range from 0 to N (inclusive).
- Unless stated otherwise, this value may be specified as
- either a literal or an llvm expression.
- <x> Syntax and meaning of *<x>* is explained elsewhere.
- =================== =============================================================
+ <x> Syntax and meaning of *x* is explained elsewhere.
+ =================== =============================================================================
.. _amdgpu_syn_operands:
Operands
========
-TBD
+.. _amdgpu_synid_v:
-.. _amdgpu_syn_modifiers:
+v
+-
-Modifiers
-=========
+Vector registers. There are 256 32-bit vector registers.
-DS Modifiers
-------------
-
-.. _amdgpu_synid_ds_offset8:
-
-ds_offset8
-~~~~~~~~~~
-
-Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
-
-Used with DS instructions which have 2 addresses.
-
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- offset:{0..0xFF} Specifies a 8-bit offset.
- ======================================== ================================================
-
-.. _amdgpu_synid_ds_offset16:
-
-ds_offset16
-~~~~~~~~~~~
-
-Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
-
-Used with DS instructions which have 1 address.
-
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- offset:{0..0xFFFF} Specifies a 16-bit offset.
- ======================================== ================================================
-
-.. _amdgpu_synid_sw_offset16:
-
-sw_offset16
-~~~~~~~~~~~
-
-This is a special modifier which may be used with *ds_swizzle_b32* instruction only.
-Specifies a sizzle pattern in numeric or symbolic form. The default value is 0.
-
-See AMD documentation for more information.
-
- ======================================================= ===================================================
- Syntax Description
- ======================================================= ===================================================
- offset:{0..0xFFFF} Specifies a 16-bit swizzle pattern
- in a numeric form.
- offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3}) Specifies a quad permute mode pattern; each
- number is a lane id.
- offset:swizzle(BITMASK_PERM, "<mask>") Specifies a bitmask permute mode pattern
- which converts a 5-bit lane id to another
- lane id with which the lane interacts.
-
- <mask> is a 5 character sequence which
- specifies how to transform the bits of the
- lane id. The following characters are allowed:
-
- * "0" - set bit to 0.
+A sequence of *vector* registers may be used to operate on more than 32 bits of data.
- * "1" - set bit to 1.
+The assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.
- * "p" - preserve bit.
+ =================================================== ====================================================================
+ Syntax Description
+ =================================================== ====================================================================
+ **v**\ <N> A single 32-bit *vector* register.
- * "i" - inverse bit.
+ *N* must be a decimal integer number.
+ **v[**\ <N>\ **]** A single 32-bit *vector* register.
- offset:swizzle(BROADCAST,{2..32},{0..N}) Specifies a broadcast mode.
- Broadcasts the value of any particular lane to
- all lanes in its group.
+ *N* may be specified as an
+ :ref:`integer number<amdgpu_synid_integer_number>`
+ or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
+ **v[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers.
- The first numeric parameter is a group
- size and must be equal to 2, 4, 8, 16 or 32.
+ *N* and *K* may be specified as
+ :ref:`integer numbers<amdgpu_synid_integer_number>`
+ or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+ **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers.
- The second numeric parameter is an index of the
- lane being broadcasted. The index must not exceed
- group size.
- offset:swizzle(SWAP,{1..16}) Specifies a swap mode.
- Swaps the neighboring groups of
- 1, 2, 4, 8 or 16 lanes.
- offset:swizzle(REVERSE,{2..32}) Specifies a reverse mode. Reverses
- the lanes for groups of 2, 4, 8, 16 or 32 lanes.
- ======================================================= ===================================================
+ Register indices must be specified as decimal integer numbers.
+ =================================================== ====================================================================
-.. _amdgpu_synid_gds:
+Note. *N* and *K* must satisfy the following conditions:
-gds
-~~~
+* *N* <= *K*.
+* 0 <= *N* <= 255.
+* 0 <= *K* <= 255.
+* *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.
-Specifies whether to use GDS or LDS memory (LDS is the default).
+Examples:
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- gds Use GDS memory.
- ======================================== ================================================
+.. code-block:: nasm
+ v255
+ v[0]
+ v[0:1]
+ v[1:1]
+ v[0:3]
+ v[2*2]
+ v[1-1:2-1]
+ [v252]
+ [v252,v253,v254,v255]
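The conditions on *N* and *K* listed above can be checked mechanically. The following Python sketch is illustrative only: the helper name ``is_valid_vreg_range`` is hypothetical and is not part of the assembler; it simply restates the documented constraints for ``v[N:K]``.

```python
# Hypothetical helper (not an assembler API): restates the documented
# constraints on a vector register sequence v[N:K].
VALID_VECTOR_SEQ_SIZES = {1, 2, 3, 4, 8, 16}


def is_valid_vreg_range(n: int, k: int) -> bool:
    """Return True if v[n:k] satisfies the documented conditions."""
    # Both indices must name one of the 256 vector registers.
    if not (0 <= n <= 255 and 0 <= k <= 255):
        return False
    # The first index must not exceed the last one.
    if n > k:
        return False
    # Only sequences of 1, 2, 3, 4, 8 or 16 registers are supported.
    return (k - n + 1) in VALID_VECTOR_SEQ_SIZES
```

For instance, ``is_valid_vreg_range(0, 3)`` accepts ``v[0:3]``, while ``is_valid_vreg_range(0, 4)`` rejects a sequence of five registers.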
-EXP Modifiers
--------------
+.. _amdgpu_synid_s:
-.. _amdgpu_synid_done:
+s
+-
-done
-~~~~
+Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:
-Specifies if this is the last export from the shader to the target. By default, current
-instruction does not finish an export sequence.
+    ======= ============================
+    GPU     Number of *scalar* registers
+    ======= ============================
+    GFX7    104
+    GFX8    102
+    GFX9    102
+    ======= ============================
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- done Indicates the last export operation.
- ======================================== ================================================
+A sequence of *scalar* registers may be used to operate on more than 32 bits of data.
+The assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.
-.. _amdgpu_synid_compr:
+Pairs of *scalar* registers must be even-aligned (the first register must be even).
+Sequences of 4 or more *scalar* registers must be quad-aligned.
-compr
-~~~~~
+ ======================================================== ====================================================================
+ Syntax Description
+ ======================================================== ====================================================================
+ **s**\ <N> A single 32-bit *scalar* register.
-Indicates if the data are compressed (not compressed by default).
+ *N* must be a decimal integer number.
+ **s[**\ <N>\ **]** A single 32-bit *scalar* register.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- compr Data are compressed.
- ======================================== ================================================
+ *N* may be specified as an
+ :ref:`integer number<amdgpu_synid_integer_number>`
+ or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
+ **s[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers.
-.. _amdgpu_synid_vm:
+ *N* and *K* may be specified as
+ :ref:`integer numbers<amdgpu_synid_integer_number>`
+ or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+ **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers.
-vm
-~~
+ Register indices must be specified as decimal integer numbers.
+ ======================================================== ====================================================================
-Specifies valid mask flag state (off by default).
+Note. *N* and *K* must satisfy the following conditions:
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- vm Set valid mask flag.
- ======================================== ================================================
+* *N* must be properly aligned based on sequence size.
+* *N* <= *K*.
+* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
+* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
+* *K-N+1* must be equal to 1, 2, 4, 8 or 16.
-FLAT Modifiers
---------------
+Examples:
-.. _amdgpu_synid_flat_offset12:
+.. code-block:: nasm
-flat_offset12
-~~~~~~~~~~~~~
+ s0
+ s[0]
+ s[0:1]
+ s[1:1]
+ s[0:3]
+ s[2*2]
+ s[1-1:2-1]
+ [s4]
+ [s4,s5,s6,s7]
-Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+Examples of *scalar* registers with an invalid alignment:
-Cannot be used with *global/scratch* opcodes. GFX9 only.
+.. code-block:: nasm
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- offset:{0..4095} Specifies a 12-bit unsigned offset.
- ======================================== ================================================
+ s[1:2]
+ s[2:5]
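The alignment rules for *scalar* register sequences can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names ``SCALAR_REG_COUNT``, ``required_alignment`` and ``is_valid_sreg_range`` are hypothetical, the register counts are taken from the table above, and "quad-aligned" is interpreted as "the first index is a multiple of 4".

```python
# Hypothetical helpers (not an assembler API) restating the documented
# constraints on a scalar register sequence s[N:K].
SCALAR_REG_COUNT = {"GFX7": 104, "GFX8": 102, "GFX9": 102}
VALID_SCALAR_SEQ_SIZES = {1, 2, 4, 8, 16}


def required_alignment(size: int) -> int:
    """Alignment of the first register index for a sequence of `size`."""
    if size == 1:
        return 1  # single registers need no alignment
    if size == 2:
        return 2  # pairs must be even-aligned
    return 4      # assumption: quad-aligned means a multiple of 4


def is_valid_sreg_range(n: int, k: int, gpu: str = "GFX9") -> bool:
    """Return True if s[n:k] satisfies the documented conditions."""
    smax = SCALAR_REG_COUNT[gpu]
    if not (0 <= n < smax and 0 <= k < smax):
        return False
    if n > k:
        return False
    size = k - n + 1
    if size not in VALID_SCALAR_SEQ_SIZES:
        return False
    return n % required_alignment(size) == 0
```

With these definitions, ``s[0:3]`` and ``s[4:7]`` are accepted, while the two misaligned examples ``s[1:2]`` and ``s[2:5]`` are rejected.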
-.. _amdgpu_synid_flat_offset13:
+.. _amdgpu_synid_trap:
-flat_offset13
-~~~~~~~~~~~~~
+trap
+----
-Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
+A set of trap handler registers:
-Can be used with *global/scratch* opcodes only. GFX9 only.
+* :ref:`ttmp<amdgpu_synid_ttmp>`
+* :ref:`tba<amdgpu_synid_tba>`
+* :ref:`tma<amdgpu_synid_tma>`
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- offset:{-4096..+4095} Specifies a 13-bit signed offset.
- ======================================== ================================================
+.. _amdgpu_synid_ttmp:
-glc
-~~~
+ttmp
+----
-See a description :ref:`here<amdgpu_synid_glc>`.
+Trap handler temporary scalar registers, 32 bits wide.
+The number of available *ttmp* registers depends on the GPU:
-slc
-~~~
+    ======= ===========================
+    GPU     Number of *ttmp* registers
+    ======= ===========================
+    GFX7    12
+    GFX8    12
+    GFX9    16
+    ======= ===========================
-See a description :ref:`here<amdgpu_synid_slc>`.
+A sequence of *ttmp* registers may be used to operate on more than 32 bits of data.
+The assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.
-tfe
-~~~
+Pairs of *ttmp* registers must be even-aligned (the first register must be even).
+Sequences of 4 or more *ttmp* registers must be quad-aligned.
-See a description :ref:`here<amdgpu_synid_tfe>`.
+ ============================================================= ====================================================================
+ Syntax Description
+ ============================================================= ====================================================================
+ **ttmp**\ <N> A single 32-bit *ttmp* register.
-nv
-~~
+ *N* must be a decimal integer number.
+ **ttmp[**\ <N>\ **]** A single 32-bit *ttmp* register.
-See a description :ref:`here<amdgpu_synid_nv>`.
+ *N* may be specified as an
+ :ref:`integer number<amdgpu_synid_integer_number>`
+ or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
+ **ttmp[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers.
-MIMG Modifiers
---------------
+ *N* and *K* may be specified as
+ :ref:`integer numbers<amdgpu_synid_integer_number>`
+ or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+ **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers.
-.. _amdgpu_synid_dmask:
+ Register indices must be specified as decimal integer numbers.
+ ============================================================= ====================================================================
-dmask
-~~~~~
+Note. *N* and *K* must satisfy the following conditions:
-Specifies which channels (image components) are used by the operation. By default, no channels
-are used.
+* *N* must be properly aligned based on sequence size.
+* *N* <= *K*.
+* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
+* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
+* *K-N+1* must be equal to 1, 2, 4, 8 or 16.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- dmask:{0..15} Each bit corresponds to one of 4 image
- components (RGBA). If the specified bit value
- is 0, the component is not used, value 1 means
- that the component is used.
- ======================================== ================================================
+Examples:
-This modifier has some limitations depending on instruction kind:
+.. code-block:: nasm
- ======================================== ================================================
- Instruction Kind Valid dmask Values
- ======================================== ================================================
- 32-bit atomic cmpswap 0x3
- other 32-bit atomic instructions 0x1
- 64-bit atomic cmpswap 0xF
- other 64-bit atomic instructions 0x3
- GATHER4 0x1, 0x2, 0x4, 0x8
- Other instructions any value
- ======================================== ================================================
+ ttmp0
+ ttmp[0]
+ ttmp[0:1]
+ ttmp[1:1]
+ ttmp[0:3]
+ ttmp[2*2]
+ ttmp[1-1:2-1]
+ [ttmp4]
+ [ttmp4,ttmp5,ttmp6,ttmp7]
-.. _amdgpu_synid_unorm:
+Examples of *ttmp* registers with an invalid alignment:
-unorm
-~~~~~
+.. code-block:: nasm
-Specifies whether address is normalized or not (normalized by default).
+ ttmp[1:2]
+ ttmp[2:5]
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- unorm Force address to be un-normalized.
- ======================================== ================================================
+.. _amdgpu_synid_tba:
-glc
-~~~
+tba
+---
-See a description :ref:`here<amdgpu_synid_glc>`.
+Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.
-slc
-~~~
+    ================== ======================================================================= =============
+    Syntax             Description                                                             Availability
+    ================== ======================================================================= =============
+    tba                64-bit *trap base address* register.                                    GFX7, GFX8
+    [tba]              64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
+    [tba_lo,tba_hi]    64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
+    ================== ======================================================================= =============
-See a description :ref:`here<amdgpu_synid_slc>`.
+High and low 32 bits of *trap base address* may be accessed as separate registers:
-.. _amdgpu_synid_r128:
+    ================== ======================================================================= =============
+    Syntax             Description                                                             Availability
+    ================== ======================================================================= =============
+    tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
+    tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
+    [tba_lo]           Low 32 bits of *trap base address* register (an alternative syntax).    GFX7, GFX8
+    [tba_hi]           High 32 bits of *trap base address* register (an alternative syntax).   GFX7, GFX8
+    ================== ======================================================================= =============
-r128
-~~~~
+Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9,
+but *tba* is readable and writable with the help of the *s_get_reg* and *s_set_reg* instructions.
-Specifies texture resource size. The default size is 256 bits.
+.. _amdgpu_synid_tma:
-GFX7 and GFX8 only.
+tma
+---
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- r128 Specifies 128 bits texture resource size.
- ======================================== ================================================
+Trap memory address, 64 bits wide.
-tfe
-~~~
+    ================= ======================================================================= ==================
+    Syntax            Description                                                             Availability
+    ================= ======================================================================= ==================
+    tma               64-bit *trap memory address* register.                                  GFX7, GFX8
+    [tma]             64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
+    [tma_lo,tma_hi]   64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
+    ================= ======================================================================= ==================
-See a description :ref:`here<amdgpu_synid_tfe>`.
+High and low 32 bits of *trap memory address* may be accessed as separate registers:
-.. _amdgpu_synid_lwe:
+    ================= ======================================================================= ==================
+    Syntax            Description                                                             Availability
+    ================= ======================================================================= ==================
+    tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8
+    tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8
+    [tma_lo]          Low 32 bits of *trap memory address* register (an alternative syntax).  GFX7, GFX8
+    [tma_hi]          High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
+    ================= ======================================================================= ==================
-lwe
-~~~
+Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9,
+but *tma* is readable and writable with the help of the *s_get_reg* and *s_set_reg* instructions.
-Specifies LOD warning status (LOD warning is disabled by default).
+.. _amdgpu_synid_flat_scratch:
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- lwe Enables LOD warning.
- ======================================== ================================================
-
-.. _amdgpu_synid_da:
-
-da
-~~
-
-Specifies if an array index must be sent to TA. By default, array index is not sent.
-
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- da Send an array-index to TA.
- ======================================== ================================================
+flat_scratch
+------------
-.. _amdgpu_synid_d16:
+Flat scratch address, 64 bits wide. Holds the base address of scratch memory.
-d16
-~~~
+    ================================== ================================================================
+    Syntax                             Description
+    ================================== ================================================================
+    flat_scratch                       64-bit *flat scratch* address register.
+    [flat_scratch]                     64-bit *flat scratch* address register (an alternative syntax).
+    [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an alternative syntax).
+    ================================== ================================================================
+
+High and low 32 bits of *flat scratch* address may be accessed as separate registers:
+
+    ========================= =========================================================================
+    Syntax                    Description
+    ========================= =========================================================================
+    flat_scratch_lo           Low 32 bits of *flat scratch* address register.
+    flat_scratch_hi           High 32 bits of *flat scratch* address register.
+    [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an alternative syntax).
+    [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an alternative syntax).
+    ========================= =========================================================================
+
+.. _amdgpu_synid_xnack:
+
+xnack
+-----
+
+Xnack mask, 64 bits wide. Holds a 64-bit mask indicating which threads
+received an *XNACK* due to a vector memory operation.
-Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
+.. WARNING:: GFX7 does not support the *xnack* feature. Not all GFX8 and GFX9 :ref:`processors<amdgpu-processors>` support the *xnack* feature.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- d16 Enables 16-bits data mode.
+\
- On loads, convert data in memory to 16-bit
- format before storing it in VGPRs.
+    ============================== =====================================================
+    Syntax                         Description
+    ============================== =====================================================
+    xnack_mask                     64-bit *xnack mask* register.
+    [xnack_mask]                   64-bit *xnack mask* register (an alternative syntax).
+    [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an alternative syntax).
+    ============================== =====================================================
- For stores, convert 16-bit data in VGPRs to
- 32 bits before going to memory.
+High and low 32 bits of *xnack mask* may be accessed as separate registers:
- Note that 16-bit data are stored in VGPRs
- unpacked in GFX8.0. In GFX8.1 and GFX9 16-bit
- data are packed.
- ======================================== ================================================
+    ===================== ==============================================================
+    Syntax                Description
+    ===================== ==============================================================
+    xnack_mask_lo         Low 32 bits of *xnack mask* register.
+    xnack_mask_hi         High 32 bits of *xnack mask* register.
+    [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an alternative syntax).
+    [xnack_mask_hi]       High 32 bits of *xnack mask* register (an alternative syntax).
+    ===================== ==============================================================
-.. _amdgpu_synid_a16:
+.. _amdgpu_synid_vcc:
-a16
-~~~
+vcc
+---
-Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
+Vector condition code, 64 bits wide. A bit mask with one bit per thread;
+it holds the result of a vector compare operation.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- a16 Enables 16-bits image address components.
- ======================================== ================================================
+    ================ =========================================================================
+    Syntax           Description
+    ================ =========================================================================
+    vcc              64-bit *vector condition code* register.
+    [vcc]            64-bit *vector condition code* register (an alternative syntax).
+    [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an alternative syntax).
+    ================ =========================================================================
-Miscellaneous Modifiers
------------------------
+High and low 32 bits of *vector condition code* may be accessed as separate registers:
-.. _amdgpu_synid_glc:
+    ================ =========================================================================
+    Syntax           Description
+    ================ =========================================================================
+    vcc_lo           Low 32 bits of *vector condition code* register.
+    vcc_hi           High 32 bits of *vector condition code* register.
+    [vcc_lo]         Low 32 bits of *vector condition code* register (an alternative syntax).
+    [vcc_hi]         High 32 bits of *vector condition code* register (an alternative syntax).
+    ================ =========================================================================
-glc
-~~~
+.. _amdgpu_synid_m0:
-This modifier has different meaning for loads, stores, and atomic operations.
-The default value is off (0).
+m0
+--
-See AMD documentation for details.
+A 32-bit memory register. It has various uses,
+including register indexing and bounds checking.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- glc Set glc bit to 1.
- ======================================== ================================================
+    =========== ===================================================
+    Syntax      Description
+    =========== ===================================================
+    m0          A 32-bit *memory* register.
+    [m0]        A 32-bit *memory* register (an alternative syntax).
+    =========== ===================================================
-.. _amdgpu_synid_slc:
+.. _amdgpu_synid_exec:
-slc
-~~~
+exec
+----
-Specifies cache policy. The default value is off (0).
+Execute mask, 64 bits wide. A bit mask with one bit per thread,
+which is applied to vector instructions and controls which threads execute
+and which ignore the instruction.
-See AMD documentation for details.
+    ===================== =================================================================
+    Syntax                Description
+    ===================== =================================================================
+    exec                  64-bit *execute mask* register.
+    [exec]                64-bit *execute mask* register (an alternative syntax).
+    [exec_lo,exec_hi]     64-bit *execute mask* register (an alternative syntax).
+    ===================== =================================================================
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- slc Set slc bit to 1.
- ======================================== ================================================
+High and low 32 bits of *execute mask* may be accessed as separate registers:
-.. _amdgpu_synid_tfe:
+    ===================== =================================================================
+    Syntax                Description
+    ===================== =================================================================
+    exec_lo               Low 32 bits of *execute mask* register.
+    exec_hi               High 32 bits of *execute mask* register.
+    [exec_lo]             Low 32 bits of *execute mask* register (an alternative syntax).
+    [exec_hi]             High 32 bits of *execute mask* register (an alternative syntax).
+    ===================== =================================================================
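To illustrate how a 64-bit mask relates to its *exec_lo* and *exec_hi* halves, the Python sketch below combines two 32-bit values and lists the threads whose bit is set. It is not taken from the assembler: the function names are hypothetical, and the mapping of bit *i* to thread *i* is an assumption made for the example.

```python
# Hypothetical helpers for illustration only.
MASK32 = 0xFFFFFFFF


def combine_exec(exec_lo: int, exec_hi: int) -> int:
    """Build the 64-bit mask from its low and high 32-bit halves."""
    return ((exec_hi & MASK32) << 32) | (exec_lo & MASK32)


def split_exec(exec_mask: int) -> tuple:
    """Return (exec_lo, exec_hi) for a 64-bit mask."""
    return exec_mask & MASK32, (exec_mask >> 32) & MASK32


def active_threads(exec_mask: int) -> list:
    """List thread indices whose bit is set (assumes bit i <-> thread i)."""
    return [t for t in range(64) if (exec_mask >> t) & 1]
```

For example, ``combine_exec(0x5, 0x80000000)`` sets bits 0, 2 and 63, so ``active_threads`` reports those three threads.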
-tfe
-~~~
+.. _amdgpu_synid_vccz:
-Controls access to partially resident textures. The default value is off (0).
+vccz
+----
-See AMD documentation for details.
+A single-bit flag indicating that :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- tfe Set tfe bit to 1.
- ======================================== ================================================
+.. WARNING:: This operand is not currently supported by the AMDGPU assembler.
-.. _amdgpu_synid_nv:
+.. _amdgpu_synid_execz:
-nv
-~~
+execz
+-----
-Specifies if instruction is operating on non-volatile memory. By default, memory is volatile.
+A single-bit flag indicating that :ref:`exec<amdgpu_synid_exec>` is all zeros.
-GFX9 only.
+.. WARNING:: This operand is not currently supported by the AMDGPU assembler.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- nv Indicates that instruction operates on
- non-volatile memory.
- ======================================== ================================================
+.. _amdgpu_synid_scc:
-MUBUF/MTBUF Modifiers
----------------------
+scc
+---
-.. _amdgpu_synid_idxen:
+A single-bit flag indicating the result of a scalar compare operation.
-idxen
-~~~~~
+.. WARNING:: This operand is not currently supported by the AMDGPU assembler.
-Specifies whether address components include an index. By default, no components are used.
+.. _amdgpu_synid_ldsdirect:
-Can be used together with :ref:`offen<amdgpu_synid_offen>`.
+lds_direct
+----------
-Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+A special operand which supplies a 32-bit value
+fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- idxen Address components include an index.
- ======================================== ================================================
+.. WARNING:: This operand is not currently supported by the AMDGPU assembler.
-.. _amdgpu_synid_offen:
+.. _amdgpu_synid_constant:
-offen
-~~~~~
+constant
+--------
-Specifies whether address components include an offset. By default, no components are used.
+A set of integer and floating-point *inline constants*:
-Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
+* :ref:`iconst<amdgpu_synid_iconst>`
+* :ref:`fconst<amdgpu_synid_fconst>`
-Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+These operands are encoded as part of the instruction.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- offen Address components include an offset.
- ======================================== ================================================
+If a number may be encoded as either
+a :ref:`literal<amdgpu_synid_literal>` or
+an :ref:`inline constant<amdgpu_synid_constant>`,
+the assembler selects the latter encoding because it is more efficient.
-.. _amdgpu_synid_addr64:
+.. _amdgpu_synid_iconst:
-addr64
-~~~~~~
+iconst
+------
-Specifies whether a 64-bit address is used. By default, no address is used.
+An :ref:`integer number<amdgpu_synid_integer_number>`
+encoded as an *inline constant*.
-GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
-:ref:`idxen<amdgpu_synid_idxen>` modifiers.
+Only a small fraction of integer numbers may be encoded as *inline constants*.
+They are enumerated in the table below.
+Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- addr64 A 64-bit address is used.
- ======================================== ================================================
+Integer *inline constants* are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
+as described :ref:`here<amdgpu_synid_int_const_conv>`.
-.. _amdgpu_synid_buf_offset12:
+ ================================== ====================================
+ Value Note
+ ================================== ====================================
+ {0..64} Positive integer inline constants.
+ {-16..-1} Negative integer inline constants.
+ ================================== ====================================
-buf_offset12
-~~~~~~~~~~~~
+.. WARNING:: GFX7 does not support inline constants for *f16* operands.
-Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+There are also symbolic inline constants which provide read-only access to H/W registers.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- offset:{0..0xFFF} Specifies a 12-bit unsigned offset.
- ======================================== ================================================
+.. WARNING:: These inline constants are not currently supported by AMDGPU assembler.
-glc
-~~~
+\
-See a description :ref:`here<amdgpu_synid_glc>`.
+ ======================== ================================================ =============
+ Syntax Note Availability
+ ======================== ================================================ =============
+ shared_base Base address of shared memory region. GFX9
+ shared_limit Address of the end of shared memory region. GFX9
+ private_base Base address of private memory region. GFX9
+ private_limit Address of the end of private memory region. GFX9
+ pops_exiting_wave_id A dedicated counter for POPS. GFX9
+ ======================== ================================================ =============
-slc
-~~~
+.. _amdgpu_synid_fconst:
-See a description :ref:`here<amdgpu_synid_slc>`.
+fconst
+------
-.. _amdgpu_synid_lds:
+A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
+encoded as an *inline constant*.
-lds
-~~~
+Only a small fraction of floating-point numbers may be encoded as *inline constants*.
+They are enumerated in the table below.
+Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
-Specifies where to store the result: VGPRs or LDS (VGPRs by default).
+Floating-point *inline constants* are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
+as described :ref:`here<amdgpu_synid_fp_const_conv>`.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- lds Store result in LDS.
- ======================================== ================================================
+ ================================== ===================================================== ==================
+ Value Note Availability
+ ================================== ===================================================== ==================
+ 0.0 The same as integer constant 0. All GPUs
+ 0.5 Floating-point constant 0.5 All GPUs
+ 1.0 Floating-point constant 1.0 All GPUs
+ 2.0 Floating-point constant 2.0 All GPUs
+ 4.0 Floating-point constant 4.0 All GPUs
+ -0.5 Floating-point constant -0.5 All GPUs
+ -1.0 Floating-point constant -1.0 All GPUs
+ -2.0 Floating-point constant -2.0 All GPUs
+ -4.0 Floating-point constant -4.0 All GPUs
+ 0.1592 1.0/(2.0*pi). Use only for 16-bit operands. GFX8, GFX9
+ 0.15915494 1.0/(2.0*pi). Use only for 16- and 32-bit operands. GFX8, GFX9
+ 0.159154943091895317852646485335 1.0/(2.0*pi). GFX8, GFX9
+ ================================== ===================================================== ==================
-tfe
-~~~
+.. WARNING:: GFX7 does not support inline constants for *f16* operands.
-See a description :ref:`here<amdgpu_synid_tfe>`.
+.. _amdgpu_synid_literal:
-.. _amdgpu_synid_dfmt:
+literal
+-------
-dfmt
-~~~~
+A literal is specified as a 64-bit value; after conversion it is encoded as a separate 32-bit dword in the instruction stream.
-TBD
+If a number may be encoded as either
+a :ref:`literal<amdgpu_synid_literal>` or
+an :ref:`inline constant<amdgpu_synid_constant>`,
+the assembler selects the latter encoding because it is more efficient.
-.. _amdgpu_synid_nfmt:
+Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
+:ref:`expressions<amdgpu_synid_expression>`
+(expressions are currently supported for 32-bit operands only).
-nfmt
-~~~~
+A 64-bit literal value is converted by assembler
+to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
+as described :ref:`here<amdgpu_synid_lit_conv>`.
-TBD
+An instruction may use only one literal, but several operands may refer to the same literal.
-SMRD/SMEM Modifiers
--------------------
+.. _amdgpu_synid_uimm8:
-glc
-~~~
+uimm8
+-----
-See a description :ref:`here<amdgpu_synid_glc>`.
+An 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
+The value is encoded as part of the opcode so it is free to use.
-nv
-~~
+.. _amdgpu_synid_uimm32:
-See a description :ref:`here<amdgpu_synid_nv>`.
+uimm32
+------
-VINTRP Modifiers
-----------------
+A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
+The value is stored as a separate 32-bit dword in the instruction stream.
-.. _amdgpu_synid_high:
+.. _amdgpu_synid_uimm20:
-high
-~~~~
+uimm20
+------
-Specifies which half of the LDS word to use. Low half of LDS word is used by default.
-GFX9 only.
+A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- high Use high half of LDS word.
- ======================================== ================================================
+.. _amdgpu_synid_uimm21:
-VOP1/VOP2 DPP Modifiers
------------------------
+uimm21
+------
-GFX8 and GFX9 only.
+A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
-.. _amdgpu_synid_dpp_ctrl:
+.. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
-dpp_ctrl
-~~~~~~~~
+.. _amdgpu_synid_simm21:
-Specifies how data are shared between threads. This is a mandatory modifier.
-There is no default value.
+simm21
+------
-Note. The lanes of a wavefront are organized in four banks and four rows.
+A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- quad_perm:[{0..3},{0..3},{0..3},{0..3}] Full permute of 4 threads.
- row_mirror Mirror threads within row.
- row_half_mirror Mirror threads within 1/2 row (8 threads).
- row_bcast:15 Broadcast 15th thread of each row to next row.
- row_bcast:31 Broadcast thread 31 to rows 2 and 3.
- wave_shl:1 Wavefront left shift by 1 thread.
- wave_rol:1 Wavefront left rotate by 1 thread.
- wave_shr:1 Wavefront right shift by 1 thread.
- wave_ror:1 Wavefront right rotate by 1 thread.
- row_shl:{1..15} Row shift left by 1-15 threads.
- row_shr:{1..15} Row shift right by 1-15 threads.
- row_ror:{1..15} Row rotate right by 1-15 threads.
- ======================================== ================================================
+.. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
-.. _amdgpu_synid_row_mask:
+.. _amdgpu_synid_off:
-row_mask
-~~~~~~~~
+off
+---
-Controls which rows are enabled for data sharing. By default, all rows are enabled.
+A special entity which indicates that the value of this operand is not used.
-Note. The lanes of a wavefront are organized in four banks and four rows.
+ ================================== ===================================================
+ Syntax Description
+ ================================== ===================================================
+ off Indicates an unused operand.
+ ================================== ===================================================
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- row_mask:{0..15} Each of 4 bits in the mask controls one
- row (0 - disabled, 1 - enabled).
- ======================================== ================================================
-.. _amdgpu_synid_bank_mask:
+.. _amdgpu_synid_number:
-bank_mask
-~~~~~~~~~
+Numbers
+=======
-Controls which banks are enabled for data sharing. By default, all banks are enabled.
+.. _amdgpu_synid_integer_number:
-Note. The lanes of a wavefront are organized in four banks and four rows.
+Integer Numbers
+---------------
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- bank_mask:{0..15} Each of 4 bits in the mask controls one
- bank (0 - disabled, 1 - enabled).
- ======================================== ================================================
+Integer numbers are 64 bits wide.
+They may be specified in binary, octal, hexadecimal and decimal formats:
-.. _amdgpu_synid_bound_ctrl:
+ ============== ====================================
+ Format Syntax
+ ============== ====================================
+ Decimal [-]?[1-9][0-9]*
+ Binary [-]?0b[01]+
+ Octal [-]?0[0-7]+
+ Hexadecimal [-]?0x[0-9a-fA-F]+
+ \ [-]?[0x]?[0-9][0-9a-fA-F]*[hH]
+ ============== ====================================
-bound_ctrl
-~~~~~~~~~~
+Examples:
-Controls data sharing when accessing an invalid lane. By default, data sharing with
-invalid lanes is disabled.
+.. code-block:: nasm
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- bound_ctrl:0 Enables data sharing with invalid lanes.
- Accessing data from an invalid lane will
- return zero.
- ======================================== ================================================
+ -1234
+ 0b1010
+ 010
+ 0xff
+ 0ffh
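As a rough illustration (not part of the patch), the syntax table above can be transcribed into Python regular expressions and checked against the listed examples. The names ``INTEGER_FORMATS`` and ``classify_integer`` are hypothetical, and the assembler's real lexer may accept or reject inputs differently from this sketch.

```python
import re

# Patterns transcribed from the table above; the names are illustrative only.
INTEGER_FORMATS = [
    ("decimal", r"[-]?[1-9][0-9]*"),
    ("binary", r"[-]?0b[01]+"),
    ("octal", r"[-]?0[0-7]+"),
    ("hexadecimal", r"[-]?0x[0-9a-fA-F]+"),
    ("hexadecimal", r"[-]?[0x]?[0-9][0-9a-fA-F]*[hH]"),
]


def classify_integer(text):
    """Return the name of the first table format matching text, else None."""
    for name, pattern in INTEGER_FORMATS:
        if re.fullmatch(pattern, text):
            return name
    return None


# The five examples listed in the document.
for example in ["-1234", "0b1010", "010", "0xff", "0ffh"]:
    print(example, "->", classify_integer(example))
```

Trying the formats in table order keeps the sketch simple; it only classifies the text and does not compute the numeric value.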
-VOP1/VOP2/VOPC SDWA Modifiers
------------------------------
+.. _amdgpu_synid_floating-point_number:
-GFX8 and GFX9 only.
+Floating-Point Numbers
+----------------------
-clamp
-~~~~~
+All floating-point numbers are handled as double (64 bits wide).
-See a description :ref:`here<amdgpu_synid_clamp>`.
+Floating-point numbers may be specified in hexadecimal and decimal formats:
-omod
-~~~~
+    ============== ========================================================== ========================================================
+    Format         Syntax                                                     Note
+    ============== ========================================================== ========================================================
+    Decimal        [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)?                      Must include either a decimal separator or an exponent.
+    Hexadecimal    [-]?0x[0-9a-fA-F]*([.][0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
+    ============== ========================================================== ========================================================
-See a description :ref:`here<amdgpu_synid_omod>`.
+Examples:
-GFX9 only.
+.. code-block:: nasm
-.. _amdgpu_synid_dst_sel:
+ -1.234
+ 234e2
+ -0x1afp-10
+ 0x.1afp10
-dst_sel
-~~~~~~~
+.. _amdgpu_synid_expression:
-Selects which bits in the destination are affected. By default, all bits are affected.
+Expressions
+===========
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- dst_sel:DWORD Use bits 31:0.
- dst_sel:BYTE_0 Use bits 7:0.
- dst_sel:BYTE_1 Use bits 15:8.
- dst_sel:BYTE_2 Use bits 23:16.
- dst_sel:BYTE_3 Use bits 31:24.
- dst_sel:WORD_0 Use bits 15:0.
- dst_sel:WORD_1 Use bits 31:16.
- ======================================== ================================================
+An expression specifies an address or a numeric value.
+There are two kinds of expressions:
+* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
+* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.
-.. _amdgpu_synid_dst_unused:
+.. _amdgpu_synid_absolute_expression:
-dst_unused
-~~~~~~~~~~
+Absolute Expressions
+--------------------
-Controls what to do with the bits in the destination which are not selected
-by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
-By default, unused bits are preserved.
+The value of an absolute expression remains the same after program relocation.
+Absolute expressions must not include unassigned or relocatable values
+such as labels.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- dst_unused:UNUSED_PAD Pad with zeros.
- dst_unused:UNUSED_SEXT Sign-extend upper bits, zero lower bits.
- dst_unused:UNUSED_PRESERVE Preserve bits.
- ======================================== ================================================
+Examples:
-.. _amdgpu_synid_src0_sel:
+.. code-block:: nasm
-src0_sel
-~~~~~~~~
+ x = -1
+ y = x + 10
-Controls which bits in the src0 are used. By default, all bits are used.
+.. _amdgpu_synid_relocatable_expression:
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- src0_sel:DWORD Use bits 31:0.
- src0_sel:BYTE_0 Use bits 7:0.
- src0_sel:BYTE_1 Use bits 15:8.
- src0_sel:BYTE_2 Use bits 23:16.
- src0_sel:BYTE_3 Use bits 31:24.
- src0_sel:WORD_0 Use bits 15:0.
- src0_sel:WORD_1 Use bits 31:16.
- ======================================== ================================================
+Relocatable Expressions
+-----------------------
-.. _amdgpu_synid_src1_sel:
+The value of a relocatable expression depends on program relocation.
-src1_sel
-~~~~~~~~
+Note that use of relocatable expressions is limited to branch targets
+and 32-bit :ref:`literals<amdgpu_synid_literal>`.
-Controls which bits in the src1 are used. By default, all bits are used.
+Additional information about relocation may be found :ref:`here<amdgpu-relocation-records>`.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- src1_sel:DWORD Use bits 31:0.
- src1_sel:BYTE_0 Use bits 7:0.
- src1_sel:BYTE_1 Use bits 15:8.
- src1_sel:BYTE_2 Use bits 23:16.
- src1_sel:BYTE_3 Use bits 31:24.
- src1_sel:WORD_0 Use bits 15:0.
- src1_sel:WORD_1 Use bits 31:16.
- ======================================== ================================================
+Examples:
-VOP1/VOP2/VOPC SDWA Operand Modifiers
--------------------------------------
+.. code-block:: nasm
-Operand modifiers are not used separately. They are applied to source operands.
+ y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
+ z = .
-GFX8 and GFX9 only.
+Expression Data Type
+--------------------
-abs
-~~~
+Expressions and operands of expressions are interpreted as 64-bit integers.
-See a description :ref:`here<amdgpu_synid_abs>`.
+Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
+However, these operands are also handled as 64-bit integers
+using the binary representation of the specified floating-point numbers.
+No conversion from floating-point to integer is performed.
-neg
-~~~
+Examples:
-See a description :ref:`here<amdgpu_synid_neg>`.
+.. code-block:: nasm
-.. _amdgpu_synid_sext:
+ x = 0.1 // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1.
+ y = x + x // y is a sum of two integer values; it is not equal to 0.2!
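The reinterpretation described above can be reproduced outside the assembler. The following Python sketch (an illustration only, with a hypothetical helper ``double_bits``) reads the IEEE 754 binary64 bit pattern of a number as an unsigned 64-bit integer and shows why adding two such patterns does not give the pattern of the floating-point sum.

```python
import struct


def double_bits(value):
    """Return the IEEE 754 binary64 bit pattern of value as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


x = double_bits(0.1)         # 4591870180066957722, as in the example above
y = (x + x) & (2**64 - 1)    # integer sum of the two patterns, wrapped to 64 bits

print(x)
print(y == double_bits(0.2))  # False: y is not the bit pattern of 0.2
```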
-sext
-~~~~
+Syntax
+------
-Sign-extends value of a (sub-dword) operand to fill all 32 bits.
-Has no effect for 32-bit operands.
+Expressions are composed of
+:ref:`symbols<amdgpu_synid_symbol>`,
+:ref:`integer numbers<amdgpu_synid_integer_number>`,
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
+:ref:`binary operators<amdgpu_synid_expression_bin_op>`,
+:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.
-Valid for integer operands only.
+Expressions may also use "." which is a reference to the current PC (program counter).
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- sext(<operand>) Sign-extend operand value.
- ======================================== ================================================
+The syntax of expressions is shown below::
-VOP3 Modifiers
---------------
+ expr ::= expr binop expr | primaryexpr ;
-.. _amdgpu_synid_vop3_op_sel:
+ primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;
-vop3_op_sel
-~~~~~~~~~~~
+ binop ::= '&&'
+ | '||'
+ | '|'
+ | '^'
+ | '&'
+ | '!'
+ | '=='
+ | '!='
+ | '<>'
+ | '<'
+ | '<='
+ | '>'
+ | '>='
+ | '<<'
+ | '>>'
+ | '+'
+ | '-'
+ | '*'
+ | '/'
+ | '%' ;
-Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
-By default, low bits are used for all operands.
+ unop ::= '~'
+ | '+'
+ | '-'
+ | '!' ;
-The number of values specified with the op_sel modifier must match the number of instruction
-operands (both source and destination). First value controls src0, second value controls src1
-and so on, except that the last value controls destination.
-The value 0 selects the low bits, while 1 selects the high bits.
+.. _amdgpu_synid_expression_bin_op:
-Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
-by op_sel must be 0.
+Binary Operators
+----------------
-GFX9 only.
+Binary operators are described in the following table.
+They operate on and produce 64-bit integers.
+Operators with higher priority are performed first.
+
+ ========== ========= ===============================================
+ Operator Priority Meaning
+ ========== ========= ===============================================
+ \* 5 Integer multiplication.
+ / 5 Integer division.
+ % 5 Integer signed remainder.
+ \+ 4 Integer addition.
+ \- 4 Integer subtraction.
+ << 3 Integer shift left.
+ >> 3 Logical shift right.
+ == 2 Equality comparison.
+ != 2 Inequality comparison.
+ <> 2 Inequality comparison.
+ < 2 Signed less than comparison.
+ <= 2 Signed less than or equal comparison.
+ > 2 Signed greater than comparison.
+ >= 2 Signed greater than or equal comparison.
+ \| 1 Bitwise or.
+ ^ 1 Bitwise xor.
+ & 1 Bitwise and.
+ && 0 Logical and.
+ || 0 Logical or.
+ ========== ========= ===============================================
+
+.. _amdgpu_synid_expression_un_op:
+
+Unary Operators
+---------------
- ======================================== ============================================================
- Syntax Description
- ======================================== ============================================================
- op_sel:[{0..1},{0..1}] Select operand bits for instructions with 1 source operand.
- op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
- op_sel:[{0..1},{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
- ======================================== ============================================================
+Unary operators are described in the following table.
+They operate on and produce 64-bit integers.
-.. _amdgpu_synid_clamp:
+ ========== ===============================================
+ Operator Meaning
+ ========== ===============================================
+ ! Logical negation.
+ ~ Bitwise negation.
+ \+ Integer unary plus.
+ \- Integer unary minus.
+ ========== ===============================================
-clamp
-~~~~~
+.. _amdgpu_synid_symbol:
-Clamp meaning depends on instruction.
+Symbols
+-------
-For *v_cmp* instructions, clamp modifier indicates that the compare signals
-if a floating point exception occurs. By default, signaling is disabled.
-Not supported by GFX7.
+A symbol is a named 64-bit value, representing a relocatable
+address or an absolute (non-relocatable) number.
-For integer operations, clamp modifier indicates that the result must be clamped
-to the largest and smallest representable value. By default, there is no clamping.
-Integer clamping is not supported by GFX7.
+Symbol names have the following syntax:
+ ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``
-For floating point operations, clamp modifier indicates that the result must be clamped
-to the range [0.0, 1.0]. By default, there is no clamping.
+The table below provides several examples of syntax used for symbol definition.
-Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
+ ================ ==========================================================
+ Syntax Meaning
+ ================ ==========================================================
+ .globl <S> Declares a global symbol S without assigning it a value.
+ .set <S>, <E> Assigns the value of an expression E to a symbol S.
+ <S> = <E> Assigns the value of an expression E to a symbol S.
+ <S>: Declares a label S and assigns it the current PC value.
+ ================ ==========================================================
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- clamp Enables clamping (or signaling).
- ======================================== ================================================
+A symbol may be used before it is declared or assigned;
+unassigned symbols are assumed to be PC-relative.
-.. _amdgpu_synid_omod:
+Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
-omod
-~~~~
+.. _amdgpu_synid_conv:
-Specifies if an output modifier must be applied to the result.
-By default, no output modifiers are applied.
+Conversions
+===========
-Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
+This section describes what happens when a 64-bit
+:ref:`integer number<amdgpu_synid_integer_number>`, a
+:ref:`floating-point number<amdgpu_synid_floating-point_number>` or a
+:ref:`symbol<amdgpu_synid_symbol>`
+is used for an operand which has a different type or size.
-Output modifiers are valid for f32 and f64 floating point results only.
-They must not be used with f16.
+Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W:
-Note. *v_cvt_f16_f32* is an exception. This instruction produces f16 result
-but accepts output modifiers.
+* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W.
+* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by assembler.
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- mul:2 Multiply the result by 2.
- mul:4 Multiply the result by 4.
- div:2 Multiply the result by 0.5.
- ======================================== ================================================
+.. _amdgpu_synid_const_conv:
-VOP3 Operand Modifiers
-----------------------
+Inline Constants
+----------------
-Operand modifiers are not used separately. They are applied to source operands.
+.. _amdgpu_synid_int_const_conv:
-.. _amdgpu_synid_abs:
+Integer Inline Constants
+~~~~~~~~~~~~~~~~~~~~~~~~
-abs
-~~~
+Integer :ref:`inline constants<amdgpu_synid_constant>`
+may be thought of as 64-bit
+:ref:`integer numbers<amdgpu_synid_integer_number>`;
+when used as operands they are truncated to the size of
+:ref:`expected operand type<amdgpu_syn_instruction_type>`.
+No data type conversions are performed.
-Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
-Valid for floating point operands only.
+Examples:
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- abs(<operand>) Get absolute value of operand.
- \|<operand>| The same as above.
- ======================================== ================================================
+.. code-block:: nasm
-.. _amdgpu_synid_neg:
+ // GFX9
-neg
-~~~
+ v_add_u16 v0, -1, 0 // v0 = 0xFFFF
+ v_add_f16 v0, -1, 0 // v0 = 0xFFFF (NaN)
-Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
-Valid for floating point operands only.
+ v_add_u32 v0, -1, 0 // v0 = 0xFFFFFFFF
+ v_add_f32 v0, -1, 0 // v0 = 0xFFFFFFFF (NaN)
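The operand values in the comments above follow from plain truncation. As a minimal sketch (the helpers ``truncate``, ``as_f16`` and ``as_f32`` are hypothetical and model only the operand bits, not the instruction itself), Python can show the truncated patterns and confirm that they decode to NaN when read as floating-point values.

```python
import math
import struct


def truncate(value, bits):
    """Keep the low `bits` bits of an integer (two's complement for negatives)."""
    return value & ((1 << bits) - 1)


def as_f16(pattern):
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", pattern))[0]


def as_f32(pattern):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", pattern))[0]


print(hex(truncate(-1, 16)))                  # 0xffff
print(hex(truncate(-1, 32)))                  # 0xffffffff
print(math.isnan(as_f16(truncate(-1, 16))))   # True
print(math.isnan(as_f32(truncate(-1, 32))))   # True
```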
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- neg(<operand>) Get negative value of operand.
- -<operand> The same as above.
- ======================================== ================================================
+.. _amdgpu_synid_fp_const_conv:
-VOP3P Modifiers
----------------
+Floating-Point Inline Constants
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-This section describes modifiers of regular VOP3P instructions.
-*v_mad_mix* modifiers are described :ref:`in a separate section<amdgpu_synid_mad_mix>`.
+Floating-point :ref:`inline constants<amdgpu_synid_constant>`
+may be thought of as 64-bit
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
+when used as operands they are converted to a floating-point number of
+:ref:`expected operand size<amdgpu_syn_instruction_type>`.
-GFX9 only.
+Examples:
-.. _amdgpu_synid_op_sel:
+.. code-block:: nasm
-op_sel
-~~~~~~
+ // GFX9
-Selects the low [15:0] or high [31:16] operand bits as input to the operation
-which results in the lower-half of the destination.
-By default, low bits are used for all operands.
+ v_add_f16 v0, 1.0, 0 // v0 = 0x3C00 (1.0)
+ v_add_u16 v0, 1.0, 0 // v0 = 0x3C00
-The number of values specified with the op_sel modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
-The value 0 selects the low bits, while 1 selects the high bits.
+ v_add_f32 v0, 1.0, 0 // v0 = 0x3F800000 (1.0)
+ v_add_u32 v0, 1.0, 0 // v0 = 0x3F800000
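The encodings quoted in the comments above can be checked with Python's ``struct`` module, which supports half precision through the ``e`` format character. This is only a sketch of the size conversion; the helper names ``f16_bits`` and ``f32_bits`` are hypothetical.

```python
import struct


def f16_bits(value):
    """Convert value to IEEE 754 half precision and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def f32_bits(value):
    """Convert value to IEEE 754 single precision and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


print(hex(f16_bits(1.0)))  # 0x3c00
print(hex(f32_bits(1.0)))  # 0x3f800000
```

The integer instructions in the examples receive the same bit patterns; only the interpretation of the bits differs.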
- ======================================== =============================================================
- Syntax Description
- ======================================== =============================================================
- op_sel:[{0..1}] Select operand bits for instructions with 1 source operand.
- op_sel:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
- op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
- ======================================== =============================================================
-.. _amdgpu_synid_op_sel_hi:
+.. _amdgpu_synid_lit_conv:
-op_sel_hi
-~~~~~~~~~
+Literals
+--------
-Selects the low [15:0] or high [31:16] operand bits as input to the operation
-which results in the upper-half of the destination.
-By default, high bits are used for all operands.
+.. _amdgpu_synid_int_lit_conv:
-The number of values specified with the op_sel_hi modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
-The value 0 selects the low bits, while 1 selects the high bits.
+Integer Literals
+~~~~~~~~~~~~~~~~
- ======================================== =============================================================
- Syntax Description
- ======================================== =============================================================
- op_sel_hi:[{0..1}] Select operand bits for instructions with 1 source operand.
- op_sel_hi:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands.
- op_sel_hi:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands.
- ======================================== =============================================================
+Integer :ref:`literals<amdgpu_synid_literal>`
+are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.
-.. _amdgpu_synid_neg_lo:
+When used as operands they are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
-neg_lo
-~~~~~~
+ ============== ============== =============== ====================================================================
+ Expected type Condition Result Note
+ ============== ============== =============== ====================================================================
+ i16, u16, b16 cond(num, 16) num.u16 Truncate to 16 bits.
+ i32, u32, b32 cond(num, 32) num.u32 Truncate to 32 bits.
+ i64 cond(num, 32) {-1, num.i32} Truncate to 32 bits and then sign-extend the result to 64 bits.
+ u64, b64 cond(num, 32) { 0, num.u32} Truncate to 32 bits and then zero-extend the result to 64 bits.
+ f16 cond(num, 16) num.u16 Use low 16 bits as an f16 value.
+ f32 cond(num, 32) num.u32 Use low 32 bits as an f32 value.
+ f64 cond(num, 32) {num.u32, 0} Use low 32 bits of the number as high 32 bits
+ of the result; low 32 bits of the result are zeroed.
+ ============== ============== =============== ====================================================================
-Specifies whether to change sign of operand values selected by
-:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
-as input to the operation which results in the upper-half of the destination.
+The condition *cond(X,S)* indicates if a 64-bit number *X*
+can be converted to a smaller size *S* by truncation of upper bits.
+There are two cases when the conversion is possible:
-The number of values specified with this modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
+* The truncated bits are all 0.
+* The truncated bits are all 1 and the value after truncation has its MSB set.
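The two cases above and the conversion table can be sketched in Python. This is only an illustrative model, not the assembler's implementation; the helper names ``cond``, ``convert_int_literal``, ``to_i64``, ``to_u64`` and ``to_f64_bits`` are hypothetical and chosen for this example.

```python
MASK64 = (1 << 64) - 1


def cond(num: int, size: int) -> bool:
    """Check whether a 64-bit number can be truncated to `size` bits."""
    num &= MASK64                        # view the literal as a 64-bit pattern
    upper = num >> size                  # bits that truncation would drop
    all_ones = (1 << (64 - size)) - 1    # dropped bits when they are all 1
    msb_set = bool((num >> (size - 1)) & 1)
    return upper == 0 or (upper == all_ones and msb_set)


def convert_int_literal(num: int, size: int) -> int:
    """Truncate a literal to `size` bits, rejecting invalid literals."""
    if not cond(num, size):
        raise ValueError(f"literal {num:#x} does not fit in {size} bits")
    return num & ((1 << size) - 1)


def to_i64(num: int) -> int:
    """i64 operand: truncate to 32 bits, then sign-extend to 64 bits."""
    low = convert_int_literal(num, 32)
    return low | (0xFFFFFFFF00000000 if low & 0x80000000 else 0)


def to_u64(num: int) -> int:
    """u64/b64 operand: truncate to 32 bits, then zero-extend to 64 bits."""
    return convert_int_literal(num, 32)


def to_f64_bits(num: int) -> int:
    """f64 operand: low 32 bits become the high 32 bits; low bits are zero."""
    return convert_int_literal(num, 32) << 32
```

For instance, ``convert_int_literal(-256, 16)`` yields ``0xff00``, while ``cond(0x1ff00, 16)`` is false because the dropped bits are neither all 0 nor all 1.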
-The value 0 indicates that the corresponding operand value is used unmodified,
-the value 1 indicates that negative value of the operand must be used.
+Examples of valid literals:
-By default, operand values are used unmodified.
+.. code-block:: nasm
-This modifier is valid for floating point operands only.
+ // GFX9
- ======================================== ==================================================================
- Syntax Description
- ======================================== ==================================================================
- neg_lo:[{0..1}] Select affected operands for instructions with 1 source operand.
- neg_lo:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
- neg_lo:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
- ======================================== ==================================================================
+ v_add_u16 v0, 0xff00, v0 // value after conversion: 0xff00
+ v_add_u16 v0, 0xffffffffffffff00, v0 // value after conversion: 0xff00
+ v_add_u16 v0, -256, v0 // value after conversion: 0xff00
-.. _amdgpu_synid_neg_hi:
+ s_bfe_i64 s[0:1], 0xffefffff, s3 // value after conversion: 0xffffffffffefffff
+ s_bfe_u64 s[0:1], 0xffefffff, s3 // value after conversion: 0x00000000ffefffff
+ v_ceil_f64_e32 v[0:1], 0xffefffff // value after conversion: 0xffefffff00000000 (-1.7976922776554302e308)
-neg_hi
-~~~~~~
+Examples of invalid literals:
-Specifies whether to change sign of operand values selected by
-:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
-as input to the operation which results in the upper-half of the destination.
+.. code-block:: nasm
-The number of values specified with this modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
+ // GFX9
-The value 0 indicates that the corresponding operand value is used unmodified,
-the value 1 indicates that negative value of the operand must be used.
+ v_add_u16 v0, 0x1ff00, v0 // conversion is not possible as truncated bits are not all 0 or all 1
+ v_add_u16 v0, 0xffffffffffff00ff, v0 // conversion is not possible as truncated bits do not match MSB of the result
-By default, operand values are used unmodified.
+.. _amdgpu_synid_fp_lit_conv:
-This modifier is valid for floating point operands only.
+Floating-Point Literals
+~~~~~~~~~~~~~~~~~~~~~~~
- ======================================== ==================================================================
- Syntax Description
- ======================================== ==================================================================
- neg_hi:[{0..1}] Select affected operands for instructions with 1 source operand.
- neg_hi:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands.
- neg_hi:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands.
- ======================================== ==================================================================
+Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
-clamp
-~~~~~
+When used as operands they are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
-See a description :ref:`here<amdgpu_synid_clamp>`.
+    ============== ============== ================= =================================================================
+    Expected type  Condition      Result            Note
+    ============== ============== ================= =================================================================
+    i16, u16, b16  cond(num, 16)  f16(num)          Convert to f16 and use bits of the result as an integer value.
+    i32, u32, b32  cond(num, 32)  f32(num)          Convert to f32 and use bits of the result as an integer value.
+    i64, u64, b64  false          \-                Conversion disabled because of unclear semantics.
+    f16            cond(num, 16)  f16(num)          Convert to f16.
+    f32            cond(num, 32)  f32(num)          Convert to f32.
+    f64            true           {num.u32.hi, 0}   Use high 32 bits of the number as high 32 bits of the result;
+                                                    zero-fill low 32 bits of the result.
-.. _amdgpu_synid_mad_mix:
+                                                    Note that the result may differ from the original number.
+    ============== ============== ================= =================================================================
-VOP3P V_MAD_MIX Modifiers
--------------------------
+The condition *cond(X,S)* indicates whether an f64 number *X* can be converted
+to a smaller *S*-bit floating-point type without overflow or underflow.
+Loss of precision is allowed.
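A rough Python sketch of two of the rules above, using only the standard ``struct`` module. It covers only the overflow half of *cond* (the underflow check is omitted because its exact definition is not given here) and the f64 rule that keeps the high 32 bits. The helper names ``overflows`` and ``f64_operand_bits`` are hypothetical.

```python
import struct


def overflows(num: float, fmt: str) -> bool:
    """Return True if `num` is too large for the format ("<e" f16, "<f" f32)."""
    try:
        struct.pack(fmt, num)    # raises OverflowError when num does not fit
    except OverflowError:
        return True
    return False


def f64_operand_bits(num: float) -> int:
    """f64 operand: keep the high 32 bits of the double, zero the low 32."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", num))
    return bits & 0xFFFFFFFF00000000
```

Under this model ``65500.0`` fits in f16 (it rounds to the largest finite f16 value) while ``65600.0`` does not, and ``f64_operand_bits(1.7976931348623157e308)`` gives ``0x7fefffff00000000``.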
-These instructions use VOP3P format but have different modifiers.
+Examples of valid literals:
-GFX9 only.
+.. code-block:: nasm
-.. _amdgpu_synid_mad_mix_op_sel:
+ // GFX9
-mad_mix_op_sel
-~~~~~~~~~~~~~~
+ v_add_f16 v1, 65500.0, v2
+ v_add_f32 v1, 65600.0, v2
-This operand has meaning only for 16-bit source operands as indicated by
-:ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
-It specifies to select either the low [15:0] or high [31:16] operand bits
-as input to the operation.
+ // value before conversion: 0x7fefffffffffffff (1.7976931348623157e308)
+ v_ceil_f64 v[0:1], 1.7976931348623157e308 // value after conversion: 0x7fefffff00000000 (1.7976922776554302e308)
-The value 0 indicates the low bits, the value 1 indicates the high 16 bits.
-By default, low bits are used for all operands.
+Examples of invalid literals:
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- op_sel:[{0..1},{0..1},{0..1}] Select location of each 16-bit source operand.
- ======================================== ================================================
+.. code-block:: nasm
-.. _amdgpu_synid_mad_mix_op_sel_hi:
+ // GFX9
-mad_mix_op_sel_hi
-~~~~~~~~~~~~~~~~~
+ v_add_f16 v1, 65600.0, v2 // cannot be converted to f16 because of overflow
-Selects the size of source operands: either 32 bits or 16 bits.
-By default, 32 bits are used for all source operands.
+.. _amdgpu_synid_exp_conv:
-The value 0 indicates 32 bits, the value 1 indicates 16 bits.
-The location of 16 bits in the operand may be specified by
-:ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>`.
+Expressions
+~~~~~~~~~~~
- ======================================== ================================================
- Syntax Description
- ======================================== ================================================
- op_sel_hi:[{0..1},{0..1},{0..1}] Select size of each source operand.
- ======================================== ================================================
+Expressions operate on 64-bit integers and produce 64-bit integer results.
-abs
-~~~
+When used as operands they are truncated to
+:ref:`expected operand size<amdgpu_syn_instruction_type>`.
+No data type conversions are performed.
-See a description :ref:`here<amdgpu_synid_abs>`.
+Examples:
-neg
-~~~
+.. code-block:: nasm
-See a description :ref:`here<amdgpu_synid_neg>`.
+ // GFX9
-clamp
-~~~~~
+ x = 0.1
+ v_sqrt_f32 v0, x // v0 = [low 32 bits of 0.1 (double)]
+ v_sqrt_f32 v0, (0.1 + 0) // the same as above
+ v_sqrt_f32 v0, 0.1 // v0 = [0.1 (double) converted to float]
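The difference between the expression and literal forms of ``0.1`` can be reproduced with a short Python sketch. It assumes, as the comments in the example above state, that a floating-point value inside an expression is carried as its 64-bit double bit pattern; the helper names ``f64_bits`` and ``f32_bits`` are hypothetical.

```python
import struct


def f64_bits(x: float) -> int:
    """Bit pattern of x encoded as a 64-bit double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def f32_bits(x: float) -> int:
    """Bit pattern of x after conversion to a 32-bit float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# Expression operand: the 64-bit value is truncated to 32 bits, no conversion.
expr_operand = f64_bits(0.1) & 0xFFFFFFFF

# Literal operand: the double is converted to f32.
literal_operand = f32_bits(0.1)

print(hex(expr_operand), hex(literal_operand))
```

The two operand encodings differ: the expression form yields the low word of the double (``0x9999999a``), whereas the literal form yields the f32 encoding of 0.1 (``0x3dcccccd``).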
-See a description :ref:`here<amdgpu_synid_clamp>`.