Diffstat (limited to 'llvm/docs/AMDGPUOperandSyntax.rst')
-rw-r--r-- | llvm/docs/AMDGPUOperandSyntax.rst | 1502 |
1 files changed, 756 insertions, 746 deletions
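The new text in this patch states several numeric rules: how many registers exist per GPU, which sequence lengths are allowed, how *scalar* and *ttmp* sequences must be aligned, and which integers fit an inline constant. These rules come from the `v`, `s`, `ttmp` and `iconst` sections.

Below is a minimal, hypothetical Python sketch that checks those rules. It is not LLVM code. The function names `check_range` and `integer_operand_encoding` and the table layout are assumptions. Only the limits come from the documentation text:

* 256 *vector* registers.
* 104/102/102 *scalar* registers on GFX7/GFX8/GFX9.
* 12/12/16 *ttmp* registers on GFX7/GFX8/GFX9.
* Integer inline constants in {-16..64}.

The sketch does not evaluate absolute expressions and does not model floating-point inline constants.

```python
"""Hypothetical checker for the register-range rules described in the patch.

Illustrative sketch only, not part of LLVM: names and data layout are
assumptions; the numeric limits are taken from the documentation text.
"""

# Number of available registers per GPU, as listed in the documentation.
SCALAR_REGS = {"GFX7": 104, "GFX8": 102, "GFX9": 102}
TTMP_REGS = {"GFX7": 12, "GFX8": 12, "GFX9": 16}
VECTOR_REGS = 256

# Allowed sequence lengths (K - N + 1) for each register kind.
VECTOR_SIZES = {1, 2, 3, 4, 8, 16}
SCALAR_SIZES = {1, 2, 4, 8, 16}  # the same set applies to ttmp registers


def required_alignment(size):
    """Pairs must be even-aligned; sequences of 4 or more must be quad-aligned."""
    if size >= 4:
        return 4
    if size == 2:
        return 2
    return 1


def check_range(kind, n, k, gpu="GFX9"):
    """Return None if kind[n:k] is valid, otherwise a short error message."""
    if n > k:
        return "N must not exceed K"
    size = k - n + 1
    if kind == "v":
        limit, sizes, aligned = VECTOR_REGS, VECTOR_SIZES, False
    elif kind == "s":
        limit, sizes, aligned = SCALAR_REGS[gpu], SCALAR_SIZES, True
    elif kind == "ttmp":
        limit, sizes, aligned = TTMP_REGS[gpu], SCALAR_SIZES, True
    else:
        raise ValueError(f"unknown register kind: {kind}")
    # Both ends of the range must name existing registers.
    if not (0 <= n < limit and 0 <= k < limit):
        return f"index out of range 0..{limit - 1}"
    if size not in sizes:
        return f"unsupported sequence length {size}"
    # Vector registers have no alignment requirement; scalar and ttmp do.
    if aligned and n % required_alignment(size) != 0:
        return f"first register must be a multiple of {required_alignment(size)}"
    return None


def integer_operand_encoding(value):
    """Integers in {-16..64} fit an inline constant; others need a literal."""
    return "inline" if -16 <= value <= 64 else "literal"
```

Under this sketch, the invalid examples from the patch are rejected:

* `check_range("s", 1, 2)` and `check_range("s", 2, 5)` report alignment errors, matching `s[1:2]` and `s[2:5]`.

Other ranges are accepted or rejected as the documentation describes:

* `check_range("v", 0, 2)` is accepted, because three-register *vector* sequences are allowed.
* `check_range("s", 0, 2)` is rejected, because a sequence of three *scalar* registers is not supported.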
diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst index 4f3536eed40..4fa2bb2c9eb 100644 --- a/llvm/docs/AMDGPUOperandSyntax.rst +++ b/llvm/docs/AMDGPUOperandSyntax.rst @@ -1,6 +1,6 @@ -================================================= -Syntax of AMDGPU Assembler Operands and Modifiers -================================================= +===================================== +Syntax of AMDGPU Instruction Operands +===================================== .. contents:: :local: @@ -8,1048 +8,1058 @@ Syntax of AMDGPU Assembler Operands and Modifiers Conventions =========== -The following conventions are used in syntax description: +The following notation is used throughout this document: - =================== ============================================================= + =================== ============================================================================= Notation Description - =================== ============================================================= + =================== ============================================================================= {0..N} Any integer value in the range from 0 to N (inclusive). - Unless stated otherwise, this value may be specified as - either a literal or an llvm expression. - <x> Syntax and meaning of *<x>* is explained elsewhere. - =================== ============================================================= + <x> Syntax and meaning of *x* is explained elsewhere. + =================== ============================================================================= .. _amdgpu_syn_operands: Operands ======== -TBD +.. _amdgpu_synid_v: -.. _amdgpu_syn_modifiers: +v +- -Modifiers -========= +Vector registers. There are 256 32-bit vector registers. -DS Modifiers ------------- - -.. _amdgpu_synid_ds_offset8: - -ds_offset8 -~~~~~~~~~~ - -Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0. - -Used with DS instructions which have 2 addresses. 
- - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - offset:{0..0xFF} Specifies a 8-bit offset. - ======================================== ================================================ - -.. _amdgpu_synid_ds_offset16: - -ds_offset16 -~~~~~~~~~~~ - -Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0. - -Used with DS instructions which have 1 address. - - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - offset:{0..0xFFFF} Specifies a 16-bit offset. - ======================================== ================================================ - -.. _amdgpu_synid_sw_offset16: - -sw_offset16 -~~~~~~~~~~~ - -This is a special modifier which may be used with *ds_swizzle_b32* instruction only. -Specifies a sizzle pattern in numeric or symbolic form. The default value is 0. - -See AMD documentation for more information. - - ======================================================= =================================================== - Syntax Description - ======================================================= =================================================== - offset:{0..0xFFFF} Specifies a 16-bit swizzle pattern - in a numeric form. - offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3}) Specifies a quad permute mode pattern; each - number is a lane id. - offset:swizzle(BITMASK_PERM, "<mask>") Specifies a bitmask permute mode pattern - which converts a 5-bit lane id to another - lane id with which the lane interacts. - - <mask> is a 5 character sequence which - specifies how to transform the bits of the - lane id. The following characters are allowed: - - * "0" - set bit to 0. 
+A sequence of *vector* registers may be used to operate with more than 32 bits of data. - * "1" - set bit to 1. +Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers. - * "p" - preserve bit. + =================================================== ==================================================================== + Syntax Description + =================================================== ==================================================================== + **v**\<N> A single 32-bit *vector* register. - * "i" - inverse bit. + *N* must be a decimal integer number. + **v[**\ <N>\ **]** A single 32-bit *vector* register. - offset:swizzle(BROADCAST,{2..32},{0..N}) Specifies a broadcast mode. - Broadcasts the value of any particular lane to - all lanes in its group. + *N* may be specified as an + :ref:`integer number<amdgpu_synid_integer_number>` + or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. + **v[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers. - The first numeric parameter is a group - size and must be equal to 2, 4, 8, 16 or 32. + *N* and *K* may be specified as + :ref:`integer numbers<amdgpu_synid_integer_number>` + or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. + **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers. - The second numeric parameter is an index of the - lane being broadcasted. The index must not exceed - group size. - offset:swizzle(SWAP,{1..16}) Specifies a swap mode. - Swaps the neighboring groups of - 1, 2, 4, 8 or 16 lanes. - offset:swizzle(REVERSE,{2..32}) Specifies a reverse mode. Reverses - the lanes for groups of 2, 4, 8, 16 or 32 lanes. - ======================================================= =================================================== + Register indices must be specified as decimal integer numbers. 
+ =================================================== ==================================================================== -.. _amdgpu_synid_gds: +Note. *N* and *K* must satisfy the following conditions: -gds -~~~ +* *N* <= *K*. +* 0 <= *N* <= 255. +* 0 <= *K* <= 255. +* *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16. -Specifies whether to use GDS or LDS memory (LDS is the default). +Examples: - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - gds Use GDS memory. - ======================================== ================================================ +.. code-block:: nasm + v255 + v[0] + v[0:1] + v[1:1] + v[0:3] + v[2*2] + v[1-1:2-1] + [v252] + [v252,v253,v254,v255] -EXP Modifiers -------------- +.. _amdgpu_synid_s: -.. _amdgpu_synid_done: +s +- -done -~~~~ +Scalar 32-bit registers. The number of available *scalar* registers depends on GPU: -Specifies if this is the last export from the shader to the target. By default, current -instruction does not finish an export sequence. + ======= ============================ + GPU Number of *scalar* registers + ======= ============================ + GFX7 104 + GFX8 102 + GFX9 102 + ======= ============================ - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - done Indicates the last export operation. - ======================================== ================================================ +A sequence of *scalar* registers may be used to operate with more than 32 bits of data. +Assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers. -.. _amdgpu_synid_compr: +Pairs of *scalar* registers must be even-aligned (the first register must be even). 
+Sequences of 4 and more *scalar* registers must be quad-aligned. -compr -~~~~~ + ======================================================== ==================================================================== + Syntax Description + ======================================================== ==================================================================== + **s**\ <N> A single 32-bit *scalar* register. -Indicates if the data are compressed (not compressed by default). + *N* must be a decimal integer number. + **s[**\ <N>\ **]** A single 32-bit *scalar* register. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - compr Data are compressed. - ======================================== ================================================ + *N* may be specified as an + :ref:`integer number<amdgpu_synid_integer_number>` + or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. + **s[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers. -.. _amdgpu_synid_vm: + *N* and *K* may be specified as + :ref:`integer numbers<amdgpu_synid_integer_number>` + or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. + **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers. -vm -~~ + Register indices must be specified as decimal integer numbers. + ======================================================== ==================================================================== -Specifies valid mask flag state (off by default). +Note. *N* and *K* must satisfy the following conditions: - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - vm Set valid mask flag. 
- ======================================== ================================================ +* *N* must be properly aligned based on sequence size. +* *N* <= *K*. +* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers. +* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers. +* *K-N+1* must be equal to 1, 2, 4, 8 or 16. -FLAT Modifiers --------------- +Examples: -.. _amdgpu_synid_flat_offset12: +.. code-block:: nasm -flat_offset12 -~~~~~~~~~~~~~ + s0 + s[0] + s[0:1] + s[1:1] + s[0:3] + s[2*2] + s[1-1:2-1] + [s4] + [s4,s5,s6,s7] -Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0. +Examples of *scalar* registers with an invalid alignment: -Cannot be used with *global/scratch* opcodes. GFX9 only. +.. code-block:: nasm - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - offset:{0..4095} Specifies a 12-bit unsigned offset. - ======================================== ================================================ + s[1:2] + s[2:5] -.. _amdgpu_synid_flat_offset13: +.. _amdgpu_synid_trap: -flat_offset13 -~~~~~~~~~~~~~ +trap +---- -Specifies an immediate signed 13-bit offset, in bytes. The default value is 0. +A set of trap handler registers: -Can be used with *global/scratch* opcodes only. GFX9 only. +* :ref:`ttmp<amdgpu_synid_ttmp>` +* :ref:`tba<amdgpu_synid_tba>` +* :ref:`tma<amdgpu_synid_tma>` - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - offset:{-4096..+4095} Specifies a 13-bit signed offset. - ======================================== ================================================ +.. 
_amdgpu_synid_ttmp: -glc -~~~ +ttmp +---- -See a description :ref:`here<amdgpu_synid_glc>`. +Trap handler temporary scalar registers, 32-bits wide. +The number of available *ttmp* registers depends on GPU: -slc -~~~ + ======= =========================== + GPU Number of *ttmp* registers + ======= =========================== + GFX7 12 + GFX8 12 + GFX9 16 + ======= =========================== -See a description :ref:`here<amdgpu_synid_slc>`. +A sequence of *ttmp* registers may be used to operate with more than 32 bits of data. +Assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers. -tfe -~~~ +Pairs of *ttmp* registers must be even-aligned (the first register must be even). +Sequences of 4 and more *ttmp* registers must be quad-aligned. -See a description :ref:`here<amdgpu_synid_tfe>`. + ============================================================= ==================================================================== + Syntax Description + ============================================================= ==================================================================== + **ttmp**\ <N> A single 32-bit *ttmp* register. -nv -~~ + *N* must be a decimal integer number. + **ttmp[**\ <N>\ **]** A single 32-bit *ttmp* register. -See a description :ref:`here<amdgpu_synid_nv>`. + *N* may be specified as an + :ref:`integer number<amdgpu_synid_integer_number>` + or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. + **ttmp[**\ <N>:<K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers. -MIMG Modifiers --------------- + *N* and *K* may be specified as + :ref:`integer numbers<amdgpu_synid_integer_number>` + or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. + **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers. -.. _amdgpu_synid_dmask: + Register indices must be specified as decimal integer numbers. 
+ ============================================================= ==================================================================== -dmask -~~~~~ +Note. *N* and *K* must satisfy the following conditions: -Specifies which channels (image components) are used by the operation. By default, no channels -are used. +* *N* must be properly aligned based on sequence size. +* *N* <= *K*. +* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers. +* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers. +* *K-N+1* must be equal to 1, 2, 4, 8 or 16. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - dmask:{0..15} Each bit corresponds to one of 4 image - components (RGBA). If the specified bit value - is 0, the component is not used, value 1 means - that the component is used. - ======================================== ================================================ +Examples: -This modifier has some limitations depending on instruction kind: +.. code-block:: nasm - ======================================== ================================================ - Instruction Kind Valid dmask Values - ======================================== ================================================ - 32-bit atomic cmpswap 0x3 - other 32-bit atomic instructions 0x1 - 64-bit atomic cmpswap 0xF - other 64-bit atomic instructions 0x3 - GATHER4 0x1, 0x2, 0x4, 0x8 - Other instructions any value - ======================================== ================================================ + ttmp0 + ttmp[0] + ttmp[0:1] + ttmp[1:1] + ttmp[0:3] + ttmp[2*2] + ttmp[1-1:2-1] + [ttmp4] + [ttmp4,ttmp5,ttmp6,ttmp7] -.. _amdgpu_synid_unorm: +Examples of *ttmp* registers with an invalid alignment: -unorm -~~~~~ +.. code-block:: nasm -Specifies whether address is normalized or not (normalized by default). 
+ ttmp[1:2] + ttmp[2:5] - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - unorm Force address to be un-normalized. - ======================================== ================================================ +.. _amdgpu_synid_tba: -glc -~~~ +tba +--- -See a description :ref:`here<amdgpu_synid_glc>`. +Trap base address, 64-bits wide. Holds the pointer to the current trap handler program. -slc -~~~ + ================== ======================================================================= ============= + Syntax Description Availability + ================== ======================================================================= ============= + tba 64-bit *trap base address* register. GFX7, GFX8 + [tba] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8 + [tba_lo,tba_hi] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8 + ================== ======================================================================= ============= -See a description :ref:`here<amdgpu_synid_slc>`. +High and low 32 bits of *trap base address* may be accessed as separate registers: -.. _amdgpu_synid_r128: + ================== ======================================================================= ============= + Syntax Description Availability + ================== ======================================================================= ============= + tba_lo Low 32 bits of *trap base address* register. GFX7, GFX8 + tba_hi High 32 bits of *trap base address* register. GFX7, GFX8 + [tba_lo] Low 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8 + [tba_hi] High 32 bits of *trap base address* register (an alternative syntax). 
GFX7, GFX8 + ================== ======================================================================= ============= -r128 -~~~~ +Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9, +but *tba* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions. -Specifies texture resource size. The default size is 256 bits. +.. _amdgpu_synid_tma: -GFX7 and GFX8 only. +tma +--- - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - r128 Specifies 128 bits texture resource size. - ======================================== ================================================ +Trap memory address, 64-bits wide. -tfe -~~~ + ================= ======================================================================= ================== + Syntax Description Availability + ================= ======================================================================= ================== + tma 64-bit *trap memory address* register. GFX7, GFX8 + [tma] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8 + [tma_lo,tma_hi] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8 + ================= ======================================================================= ================== -See a description :ref:`here<amdgpu_synid_tfe>`. +High and low 32 bits of *trap memory address* may be accessed as separate registers: -.. _amdgpu_synid_lwe: + ================= ======================================================================= ================== + Syntax Description Availability + ================= ======================================================================= ================== + tma_lo Low 32 bits of *trap memory address* register. GFX7, GFX8 + tma_hi High 32 bits of *trap memory address* register. 
GFX7, GFX8 + [tma_lo] Low 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8 + [tma_hi] High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8 + ================= ======================================================================= ================== -lwe -~~~ +Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9, +but *tma* is readable/writable with the help of *s_get_reg* and *s_set_reg* instructions. -Specifies LOD warning status (LOD warning is disabled by default). +.. _amdgpu_synid_flat_scratch: - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - lwe Enables LOD warning. - ======================================== ================================================ - -.. _amdgpu_synid_da: - -da -~~ - -Specifies if an array index must be sent to TA. By default, array index is not sent. - - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - da Send an array-index to TA. - ======================================== ================================================ +flat_scratch +------------ -.. _amdgpu_synid_d16: +Flat scratch address, 64-bits wide. Holds the base address of scratch memory. -d16 -~~~ + ================================== ================================================================ + Syntax Description + ================================== ================================================================ + flat_scratch 64-bit *flat scratch* address register. + [flat_scratch] 64-bit *flat scratch* address register (an alternative syntax). + [flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an alternative syntax). 
+ ================================== ================================================================ + +High and low 32 bits of *flat scratch* address may be accessed as separate registers: + + ========================= ========================================================================= + Syntax Description + ========================= ========================================================================= + flat_scratch_lo Low 32 bits of *flat scratch* address register. + flat_scratch_hi High 32 bits of *flat scratch* address register. + [flat_scratch_lo] Low 32 bits of *flat scratch* address register (an alternative syntax). + [flat_scratch_hi] High 32 bits of *flat scratch* address register (an alternative syntax). + ========================= ========================================================================= + +.. _amdgpu_synid_xnack: + +xnack +----- + +Xnack mask, 64-bits wide. Holds a 64-bit mask of which threads +received an *XNACK* due to a vector memory operation. -Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7. +.. WARNING:: GFX7 does not support *xnack* feature. Not all GFX8 and GFX9 :ref:`processors<amdgpu-processors>` support *xnack* feature. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - d16 Enables 16-bits data mode. +\ - On loads, convert data in memory to 16-bit - format before storing it in VGPRs. + ============================== ===================================================== + Syntax Description + ============================== ===================================================== + xnack_mask 64-bit *xnack mask* register. + [xnack_mask] 64-bit *xnack mask* register (an alternative syntax). + [xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an alternative syntax). 
+ ============================== ===================================================== - For stores, convert 16-bit data in VGPRs to - 32 bits before going to memory. +High and low 32 bits of *xnack mask* may be accessed as separate registers: - Note that 16-bit data are stored in VGPRs - unpacked in GFX8.0. In GFX8.1 and GFX9 16-bit - data are packed. - ======================================== ================================================ + ===================== ============================================================== + Syntax Description + ===================== ============================================================== + xnack_mask_lo Low 32 bits of *xnack mask* register. + xnack_mask_hi High 32 bits of *xnack mask* register. + [xnack_mask_lo] Low 32 bits of *xnack mask* register (an alternative syntax). + [xnack_mask_hi] High 32 bits of *xnack mask* register (an alternative syntax). + ===================== ============================================================== -.. _amdgpu_synid_a16: +.. _amdgpu_synid_vcc: -a16 -~~~ +vcc +--- -Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only. +Vector condition code, 64-bits wide. A bit mask with one bit per thread; +it holds the result of a vector compare operation. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - a16 Enables 16-bits image address components. - ======================================== ================================================ + ================ ========================================================================= + Syntax Description + ================ ========================================================================= + vcc 64-bit *vector condition code* register. + [vcc] 64-bit *vector condition code* register (an alternative syntax). 
+ [vcc_lo,vcc_hi] 64-bit *vector condition code* register (an alternative syntax). + ================ ========================================================================= -Miscellaneous Modifiers ------------------------ +High and low 32 bits of *vector condition code* may be accessed as separate registers: -.. _amdgpu_synid_glc: + ================ ========================================================================= + Syntax Description + ================ ========================================================================= + vcc_lo Low 32 bits of *vector condition code* register. + vcc_hi High 32 bits of *vector condition code* register. + [vcc_lo] Low 32 bits of *vector condition code* register (an alternative syntax). + [vcc_hi] High 32 bits of *vector condition code* register (an alternative syntax). + ================ ========================================================================= -glc -~~~ +.. _amdgpu_synid_m0: -This modifier has different meaning for loads, stores, and atomic operations. -The default value is off (0). +m0 +-- -See AMD documentation for details. +A 32-bit memory register. It has various uses, +including register indexing and bounds checking. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - glc Set glc bit to 1. - ======================================== ================================================ + =========== =================================================== + Syntax Description + =========== =================================================== + m0 A 32-bit *memory* register. + [m0] A 32-bit *memory* register (an alternative syntax). + =========== =================================================== -.. _amdgpu_synid_slc: +.. _amdgpu_synid_exec: -slc -~~~ +exec +---- -Specifies cache policy. The default value is off (0). +Execute mask, 64-bits wide. 
A bit mask with one bit per thread, +which is applied to vector instructions and controls which threads execute +and which ignore the instruction. -See AMD documentation for details. + ===================== ================================================================= + Syntax Description + ===================== ================================================================= + exec 64-bit *execute mask* register. + [exec] 64-bit *execute mask* register (an alternative syntax). + [exec_lo,exec_hi] 64-bit *execute mask* register (an alternative syntax). + ===================== ================================================================= - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - slc Set slc bit to 1. - ======================================== ================================================ +High and low 32 bits of *execute mask* may be accessed as separate registers: -.. _amdgpu_synid_tfe: + ===================== ================================================================= + Syntax Description + ===================== ================================================================= + exec_lo Low 32 bits of *execute mask* register. + exec_hi High 32 bits of *execute mask* register. + [exec_lo] Low 32 bits of *execute mask* register (an alternative syntax). + [exec_hi] High 32 bits of *execute mask* register (an alternative syntax). + ===================== ================================================================= -tfe -~~~ +.. _amdgpu_synid_vccz: -Controls access to partially resident textures. The default value is off (0). +vccz +---- -See AMD documentation for details. +A single bit-flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros. 
- ======================================== ================================================ - Syntax Description - ======================================== ================================================ - tfe Set tfe bit to 1. - ======================================== ================================================ +.. WARNING:: This operand is not currently supported by AMDGPU assembler. -.. _amdgpu_synid_nv: +.. _amdgpu_synid_execz: -nv -~~ +execz +----- -Specifies if instruction is operating on non-volatile memory. By default, memory is volatile. +A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros. -GFX9 only. +.. WARNING:: This operand is not currently supported by AMDGPU assembler. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - nv Indicates that instruction operates on - non-volatile memory. - ======================================== ================================================ +.. _amdgpu_synid_scc: -MUBUF/MTBUF Modifiers ---------------------- +scc +--- -.. _amdgpu_synid_idxen: +A single bit flag indicating the result of a scalar compare operation. -idxen -~~~~~ +.. WARNING:: This operand is not currently supported by AMDGPU assembler. -Specifies whether address components include an index. By default, no components are used. +.. _amdgpu_synid_ldsdirect: -Can be used together with :ref:`offen<amdgpu_synid_offen>`. +lds_direct +---------- -Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`. +A special operand which supplies a 32-bit value +fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - idxen Address components include an index. 
- ======================================== ================================================ +.. WARNING:: This operand is not currently supported by AMDGPU assembler. -.. _amdgpu_synid_offen: +.. _amdgpu_synid_constant: -offen -~~~~~ +constant +-------- -Specifies whether address components include an offset. By default, no components are used. +A set of integer and floating-point *inline constants*: -Can be used together with :ref:`idxen<amdgpu_synid_idxen>`. +* :ref:`iconst<amdgpu_synid_iconst>` +* :ref:`fconst<amdgpu_synid_fconst>` -Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`. +These operands are encoded as a part of instruction. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - offen Address components include an offset. - ======================================== ================================================ +If a number may be encoded as either +a :ref:`literal<amdgpu_synid_literal>` or +an :ref:`inline constant<amdgpu_synid_constant>`, +assembler selects the latter encoding as more efficient. -.. _amdgpu_synid_addr64: +.. _amdgpu_synid_iconst: -addr64 -~~~~~~ +iconst +------ -Specifies whether a 64-bit address is used. By default, no address is used. +An :ref:`integer number<amdgpu_synid_integer_number>` +encoded as an *inline constant*. -GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and -:ref:`idxen<amdgpu_synid_idxen>` modifiers. +Only a small fraction of integer numbers may be encoded as *inline constants*. +They are enumerated in the table below. +Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - addr64 A 64-bit address is used. 
- ======================================== ================================================ +Integer *inline constants* are converted to +:ref:`expected operand type<amdgpu_syn_instruction_type>` +as described :ref:`here<amdgpu_synid_int_const_conv>`. -.. _amdgpu_synid_buf_offset12: + ================================== ==================================== + Value Note + ================================== ==================================== + {0..64} Positive integer inline constants. + {-16..-1} Negative integer inline constants. + ================================== ==================================== -buf_offset12 -~~~~~~~~~~~~ +.. WARNING:: GFX7 does not support inline constants for *f16* operands. -Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0. +There are also symbolic inline constants which provide read-only access to H/W registers. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - offset:{0..0xFFF} Specifies a 12-bit unsigned offset. - ======================================== ================================================ +.. WARNING:: These inline constants are not currently supported by AMDGPU assembler. -glc -~~~ +\ -See a description :ref:`here<amdgpu_synid_glc>`. + ======================== ================================================ ============= + Syntax Note Availability + ======================== ================================================ ============= + shared_base Base address of shared memory region. GFX9 + shared_limit Address of the end of shared memory region. GFX9 + private_base Base address of private memory region. GFX9 + private_limit Address of the end of private memory region. GFX9 + pops_exiting_wave_id A dedicated counter for POPS. 
GFX9 + ======================== ================================================ ============= -slc -~~~ +.. _amdgpu_synid_fconst: -See a description :ref:`here<amdgpu_synid_slc>`. +fconst +------ -.. _amdgpu_synid_lds: +A :ref:`floating-point number<amdgpu_synid_floating-point_number>` +encoded as an *inline constant*. -lds -~~~ +Only a small fraction of floating-point numbers may be encoded as *inline constants*. +They are enumerated in the table below. +Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`. -Specifies where to store the result: VGPRs or LDS (VGPRs by default). +Floating-point *inline constants* are converted to +:ref:`expected operand type<amdgpu_syn_instruction_type>` +as described :ref:`here<amdgpu_synid_fp_const_conv>`. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - lds Store result in LDS. - ======================================== ================================================ + ================================== ===================================================== ================== + Value Note Availability + ================================== ===================================================== ================== + 0.0 The same as integer constant 0. All GPUs + 0.5 Floating-point constant 0.5 All GPUs + 1.0 Floating-point constant 1.0 All GPUs + 2.0 Floating-point constant 2.0 All GPUs + 4.0 Floating-point constant 4.0 All GPUs + -0.5 Floating-point constant -0.5 All GPUs + -1.0 Floating-point constant -1.0 All GPUs + -2.0 Floating-point constant -2.0 All GPUs + -4.0 Floating-point constant -4.0 All GPUs + 0.1592 1.0/(2.0*pi). Use only for 16-bit operands. GFX8, GFX9 + 0.15915494 1.0/(2.0*pi). Use only for 16- and 32-bit operands. GFX8, GFX9 + 0.159154943091895317852646485335 1.0/(2.0*pi). 
GFX8, GFX9 + ================================== ===================================================== ================== -tfe -~~~ +.. WARNING:: GFX7 does not support inline constants for *f16* operands. -See a description :ref:`here<amdgpu_synid_tfe>`. +.. _amdgpu_synid_literal: -.. _amdgpu_synid_dfmt: +literal +------- -dfmt -~~~~ +A literal is a 64-bit value which is encoded as a separate 32-bit dword in the instruction stream. -TBD +If a number may be encoded as either +a :ref:`literal<amdgpu_synid_literal>` or +an :ref:`inline constant<amdgpu_synid_constant>`, +the assembler selects the latter encoding as it is more efficient. -.. _amdgpu_synid_nfmt: +Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`, +:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or +:ref:`expressions<amdgpu_synid_expression>` +(expressions are currently supported for 32-bit operands only). -nfmt -~~~~ +A 64-bit literal value is converted by the assembler +to an :ref:`expected operand type<amdgpu_syn_instruction_type>` +as described :ref:`here<amdgpu_synid_lit_conv>`. -TBD +An instruction may use only one literal, but several operands may refer to the same literal. -SMRD/SMEM Modifiers -------------------- +.. _amdgpu_synid_uimm8: -glc -~~~ +uimm8 +----- -See a description :ref:`here<amdgpu_synid_glc>`. +An 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. +The value is encoded as part of the opcode so it is free to use. -nv -~~ +.. _amdgpu_synid_uimm32: -See a description :ref:`here<amdgpu_synid_nv>`. +uimm32 +------ -VINTRP Modifiers ----------------- +A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. +The value is stored as a separate 32-bit dword in the instruction stream. -.. _amdgpu_synid_high: +.. _amdgpu_synid_uimm20: -high -~~~~ +uimm20 +------ -Specifies which half of the LDS word to use. Low half of LDS word is used by default. -GFX9 only. +A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
- ======================================== ================================================ - Syntax Description - ======================================== ================================================ - high Use high half of LDS word. - ======================================== ================================================ +.. _amdgpu_synid_uimm21: -VOP1/VOP2 DPP Modifiers ------------------------ +uimm21 +------ -GFX8 and GFX9 only. +A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. -.. _amdgpu_synid_dpp_ctrl: +.. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement. -dpp_ctrl -~~~~~~~~ +.. _amdgpu_synid_simm21: -Specifies how data are shared between threads. This is a mandatory modifier. -There is no default value. +simm21 +------ -Note. The lanes of a wavefront are organized in four banks and four rows. +A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - quad_perm:[{0..3},{0..3},{0..3},{0..3}] Full permute of 4 threads. - row_mirror Mirror threads within row. - row_half_mirror Mirror threads within 1/2 row (8 threads). - row_bcast:15 Broadcast 15th thread of each row to next row. - row_bcast:31 Broadcast thread 31 to rows 2 and 3. - wave_shl:1 Wavefront left shift by 1 thread. - wave_rol:1 Wavefront left rotate by 1 thread. - wave_shr:1 Wavefront right shift by 1 thread. - wave_ror:1 Wavefront right rotate by 1 thread. - row_shl:{1..15} Row shift left by 1-15 threads. - row_shr:{1..15} Row shift right by 1-15 threads. - row_ror:{1..15} Row rotate right by 1-15 threads. - ======================================== ================================================ +.. 
WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement. -.. _amdgpu_synid_row_mask: +.. _amdgpu_synid_off: -row_mask -~~~~~~~~ +off +--- -Controls which rows are enabled for data sharing. By default, all rows are enabled. +A special entity which indicates that the value of this operand is not used. -Note. The lanes of a wavefront are organized in four banks and four rows. + ================================== =================================================== + Syntax Description + ================================== =================================================== + off Indicates an unused operand. + ================================== =================================================== - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - row_mask:{0..15} Each of 4 bits in the mask controls one - row (0 - disabled, 1 - enabled). - ======================================== ================================================ -.. _amdgpu_synid_bank_mask: +.. _amdgpu_synid_number: -bank_mask -~~~~~~~~~ +Numbers +======= -Controls which banks are enabled for data sharing. By default, all banks are enabled. +.. _amdgpu_synid_integer_number: -Note. The lanes of a wavefront are organized in four banks and four rows. +Integer Numbers +--------------- - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - bank_mask:{0..15} Each of 4 bits in the mask controls one - bank (0 - disabled, 1 - enabled). - ======================================== ================================================ +Integer numbers are 64 bits wide. +They may be specified in binary, octal, hexadecimal and decimal formats: -.. 
_amdgpu_synid_bound_ctrl: + ============== ==================================== + Format Syntax + ============== ==================================== + Decimal [-]?[1-9][0-9]* + Binary [-]?0b[01]+ + Octal [-]?0[0-7]+ + Hexadecimal [-]?0x[0-9a-fA-F]+ + \ [-]?[0x]?[0-9][0-9a-fA-F]*[hH] + ============== ==================================== -bound_ctrl -~~~~~~~~~~ +Examples: -Controls data sharing when accessing an invalid lane. By default, data sharing with -invalid lanes is disabled. +.. code-block:: nasm - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - bound_ctrl:0 Enables data sharing with invalid lanes. - Accessing data from an invalid lane will - return zero. - ======================================== ================================================ + -1234 + 0b1010 + 010 + 0xff + 0ffh -VOP1/VOP2/VOPC SDWA Modifiers ------------------------------ +.. _amdgpu_synid_floating-point_number: -GFX8 and GFX9 only. +Floating-Point Numbers +---------------------- -clamp -~~~~~ +All floating-point numbers are handled as double (64 bits wide). -See a description :ref:`here<amdgpu_synid_clamp>`. +Floating-point numbers may be specified in hexadecimal and decimal formats: -omod -~~~~ + ============== ======================================================== ======================================================== + Format Syntax Note + ============== ======================================================== ======================================================== + Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? Must include either a decimal separator or an exponent. + Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+ + ============== ======================================================== ======================================================== -See a description :ref:`here<amdgpu_synid_omod>`. 
+Examples: -GFX9 only. +.. code-block:: nasm -.. _amdgpu_synid_dst_sel: + -1.234 + 234e2 + -0x1afp-10 + 0x.1afp10 -dst_sel -~~~~~~~ +.. _amdgpu_synid_expression: -Selects which bits in the destination are affected. By default, all bits are affected. +Expressions +=========== - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - dst_sel:DWORD Use bits 31:0. - dst_sel:BYTE_0 Use bits 7:0. - dst_sel:BYTE_1 Use bits 15:8. - dst_sel:BYTE_2 Use bits 23:16. - dst_sel:BYTE_3 Use bits 31:24. - dst_sel:WORD_0 Use bits 15:0. - dst_sel:WORD_1 Use bits 31:16. - ======================================== ================================================ +An expression specifies an address or a numeric value. +There are two kinds of expressions: +* :ref:`Absolute<amdgpu_synid_absolute_expression>`. +* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`. -.. _amdgpu_synid_dst_unused: +.. _amdgpu_synid_absolute_expression: -dst_unused -~~~~~~~~~~ +Absolute Expressions +-------------------- -Controls what to do with the bits in the destination which are not selected -by :ref:`dst_sel<amdgpu_synid_dst_sel>`. -By default, unused bits are preserved. +The value of an absolute expression remains the same after program relocation. +Absolute expressions must not include unassigned and relocatable values +such as labels. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - dst_unused:UNUSED_PAD Pad with zeros. - dst_unused:UNUSED_SEXT Sign-extend upper bits, zero lower bits. - dst_unused:UNUSED_PRESERVE Preserve bits. - ======================================== ================================================ +Examples: -.. _amdgpu_synid_src0_sel: +.. 
code-block:: nasm -src0_sel -~~~~~~~~ + x = -1 + y = x + 10 -Controls which bits in the src0 are used. By default, all bits are used. +.. _amdgpu_synid_relocatable_expression: - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - src0_sel:DWORD Use bits 31:0. - src0_sel:BYTE_0 Use bits 7:0. - src0_sel:BYTE_1 Use bits 15:8. - src0_sel:BYTE_2 Use bits 23:16. - src0_sel:BYTE_3 Use bits 31:24. - src0_sel:WORD_0 Use bits 15:0. - src0_sel:WORD_1 Use bits 31:16. - ======================================== ================================================ +Relocatable Expressions +----------------------- -.. _amdgpu_synid_src1_sel: +The value of a relocatable expression depends on program relocation. -src1_sel -~~~~~~~~ +Note that use of relocatable expressions is limited to branch targets +and 32-bit :ref:`literals<amdgpu_synid_literal>`. -Controls which bits in the src1 are used. By default, all bits are used. +Additional information about relocation may be found :ref:`here<amdgpu-relocation-records>`. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - src1_sel:DWORD Use bits 31:0. - src1_sel:BYTE_0 Use bits 7:0. - src1_sel:BYTE_1 Use bits 15:8. - src1_sel:BYTE_2 Use bits 23:16. - src1_sel:BYTE_3 Use bits 31:24. - src1_sel:WORD_0 Use bits 15:0. - src1_sel:WORD_1 Use bits 31:16. - ======================================== ================================================ +Examples: -VOP1/VOP2/VOPC SDWA Operand Modifiers -------------------------------------- +.. code-block:: nasm -Operand modifiers are not used separately. They are applied to source operands. + y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative. + z = . -GFX8 and GFX9 only.
+Expression Data Type +-------------------- -abs -~~~ +Expressions and operands of expressions are interpreted as 64-bit integers. -See a description :ref:`here<amdgpu_synid_abs>`. +Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double). +However these operands are also handled as 64-bit integers +using binary representation of specified floating-point numbers. +No conversion from floating-point to integer is performed. -neg -~~~ +Examples: -See a description :ref:`here<amdgpu_synid_neg>`. +.. code-block:: nasm -.. _amdgpu_synid_sext: + x = 0.1 // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1. + y = x + x // y is a sum of two integer values; it is not equal to 0.2! -sext -~~~~ +Syntax +------ -Sign-extends value of a (sub-dword) operand to fill all 32 bits. -Has no effect for 32-bit operands. +Expressions are composed of +:ref:`symbols<amdgpu_synid_symbol>`, +:ref:`integer numbers<amdgpu_synid_integer_number>`, +:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`, +:ref:`binary operators<amdgpu_synid_expression_bin_op>`, +:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions. -Valid for integer operands only. +Expressions may also use "." which is a reference to the current PC (program counter). - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - sext(<operand>) Sign-extend operand value. - ======================================== ================================================ +The syntax of expressions is shown below:: -VOP3 Modifiers --------------- + expr ::= expr binop expr | primaryexpr ; -.. _amdgpu_synid_vop3_op_sel: + primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ; -vop3_op_sel -~~~~~~~~~~~ + binop ::= '&&' + | '||' + | '|' + | '^' + | '&' + | '!' 
+ | '==' + | '!=' + | '<>' + | '<' + | '<=' + | '>' + | '>=' + | '<<' + | '>>' + | '+' + | '-' + | '*' + | '/' + | '%' ; -Selects the low [15:0] or high [31:16] operand bits for source and destination operands. -By default, low bits are used for all operands. + unop ::= '~' + | '+' + | '-' + | '!' ; -The number of values specified with the op_sel modifier must match the number of instruction -operands (both source and destination). First value controls src0, second value controls src1 -and so on, except that the last value controls destination. -The value 0 selects the low bits, while 1 selects the high bits. +.. _amdgpu_synid_expression_bin_op: -Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified -by op_sel must be 0. +Binary Operators +---------------- -GFX9 only. +Binary operators are described in the following table. +They operate on and produce 64-bit integers. +Operators with higher priority are performed first. + + ========== ========= =============================================== + Operator Priority Meaning + ========== ========= =============================================== + \* 5 Integer multiplication. + / 5 Integer division. + % 5 Integer signed remainder. + \+ 4 Integer addition. + \- 4 Integer subtraction. + << 3 Integer shift left. + >> 3 Logical shift right. + == 2 Equality comparison. + != 2 Inequality comparison. + <> 2 Inequality comparison. + < 2 Signed less than comparison. + <= 2 Signed less than or equal comparison. + > 2 Signed greater than comparison. + >= 2 Signed greater than or equal comparison. + \| 1 Bitwise or. + ^ 1 Bitwise xor. + & 1 Bitwise and. + && 0 Logical and. + || 0 Logical or. + ========== ========= =============================================== + +.. 
_amdgpu_synid_expression_un_op: + +Unary Operators +--------------- - ======================================== ============================================================ - Syntax Description - ======================================== ============================================================ - op_sel:[{0..1},{0..1}] Select operand bits for instructions with 1 source operand. - op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 2 source operands. - op_sel:[{0..1},{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands. - ======================================== ============================================================ +Unary operators are described in the following table. +They operate on and produce 64-bit integers. -.. _amdgpu_synid_clamp: + ========== =============================================== + Operator Meaning + ========== =============================================== + ! Logical negation. + ~ Bitwise negation. + \+ Integer unary plus. + \- Integer unary minus. + ========== =============================================== -clamp -~~~~~ +.. _amdgpu_synid_symbol: -Clamp meaning depends on instruction. +Symbols +------- -For *v_cmp* instructions, clamp modifier indicates that the compare signals -if a floating point exception occurs. By default, signaling is disabled. -Not supported by GFX7. +A symbol is a named 64-bit value, representing a relocatable +address or an absolute (non-relocatable) number. -For integer operations, clamp modifier indicates that the result must be clamped -to the largest and smallest representable value. By default, there is no clamping. -Integer clamping is not supported by GFX7. +Symbol names have the following syntax: + ``[a-zA-Z_.][a-zA-Z0-9_$.@]*`` -For floating point operations, clamp modifier indicates that the result must be clamped -to the range [0.0, 1.0]. By default, there is no clamping. +The table below provides several examples of syntax used for symbol definition. 
-Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any). + ================ ========================================================== + Syntax Meaning + ================ ========================================================== + .globl <S> Declares a global symbol S without assigning it a value. + .set <S>, <E> Assigns the value of an expression E to a symbol S. + <S> = <E> Assigns the value of an expression E to a symbol S. + <S>: Declares a label S and assigns it the current PC value. + ================ ========================================================== - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - clamp Enables clamping (or signaling). - ======================================== ================================================ +A symbol may be used before it is declared or assigned; +unassigned symbols are assumed to be PC-relative. -.. _amdgpu_synid_omod: +Additional information about symbols may be found :ref:`here<amdgpu-symbols>`. -omod -~~~~ +.. _amdgpu_synid_conv: -Specifies if an output modifier must be applied to the result. -By default, no output modifiers are applied. +Conversions +=========== -Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any). +This section describes what happens when a 64-bit +:ref:`integer number<amdgpu_synid_integer_number>`, a +:ref:`floating-point number<amdgpu_synid_floating-point_number>` or a +:ref:`symbol<amdgpu_synid_symbol>` +is used for an operand which has a different type or size. -Output modifiers are valid for f32 and f64 floating point results only. -They must not be used with f16. +Depending on the operand kind, this conversion is performed by either the assembler or the AMDGPU H/W: -Note. *v_cvt_f16_f32* is an exception. This instruction produces f16 result -but accepts output modifiers.
+* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W. +* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by assembler. - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - mul:2 Multiply the result by 2. - mul:4 Multiply the result by 4. - div:2 Multiply the result by 0.5. - ======================================== ================================================ +.. _amdgpu_synid_const_conv: -VOP3 Operand Modifiers ----------------------- +Inline Constants +---------------- -Operand modifiers are not used separately. They are applied to source operands. +.. _amdgpu_synid_int_const_conv: -.. _amdgpu_synid_abs: +Integer Inline Constants +~~~~~~~~~~~~~~~~~~~~~~~~ -abs -~~~ +Integer :ref:`inline constants<amdgpu_synid_constant>` +may be thought of as 64-bit +:ref:`integer numbers<amdgpu_synid_integer_number>`; +when used as operands they are truncated to the size of +:ref:`expected operand type<amdgpu_syn_instruction_type>`. +No data type conversions are performed. -Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any). -Valid for floating point operands only. +Examples: - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - abs(<operand>) Get absolute value of operand. - \|<operand>| The same as above. - ======================================== ================================================ +.. code-block:: nasm -.. _amdgpu_synid_neg: + // GFX9 -neg -~~~ + v_add_u16 v0, -1, 0 // v0 = 0xFFFF + v_add_f16 v0, -1, 0 // v0 = 0xFFFF (NaN) -Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any). -Valid for floating point operands only. 
+ v_add_u32 v0, -1, 0 // v0 = 0xFFFFFFFF + v_add_f32 v0, -1, 0 // v0 = 0xFFFFFFFF (NaN) - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - neg(<operand>) Get negative value of operand. - -<operand> The same as above. - ======================================== ================================================ +.. _amdgpu_synid_fp_const_conv: -VOP3P Modifiers ---------------- +Floating-Point Inline Constants +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This section describes modifiers of regular VOP3P instructions. -*v_mad_mix* modifiers are described :ref:`in a separate section<amdgpu_synid_mad_mix>`. +Floating-point :ref:`inline constants<amdgpu_synid_constant>` +may be thought of as 64-bit +:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`; +when used as operands they are converted to a floating-point number of +:ref:`expected operand size<amdgpu_syn_instruction_type>`. -GFX9 only. +Examples: -.. _amdgpu_synid_op_sel: +.. code-block:: nasm -op_sel -~~~~~~ + // GFX9 -Selects the low [15:0] or high [31:16] operand bits as input to the operation -which results in the lower-half of the destination. -By default, low bits are used for all operands. + v_add_f16 v0, 1.0, 0 // v0 = 0x3C00 (1.0) + v_add_u16 v0, 1.0, 0 // v0 = 0x3C00 -The number of values specified with the op_sel modifier must match the number of source -operands. First value controls src0, second value controls src1 and so on. -The value 0 selects the low bits, while 1 selects the high bits. 
+ v_add_f32 v0, 1.0, 0 // v0 = 0x3F800000 (1.0) + v_add_u32 v0, 1.0, 0 // v0 = 0x3F800000 - ======================================== ============================================================= - Syntax Description - ======================================== ============================================================= - op_sel:[{0..1}] Select operand bits for instructions with 1 source operand. - op_sel:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands. - op_sel:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands. - ======================================== ============================================================= -.. _amdgpu_synid_op_sel_hi: +.. _amdgpu_synid_lit_conv: -op_sel_hi -~~~~~~~~~ +Literals +-------- -Selects the low [15:0] or high [31:16] operand bits as input to the operation -which results in the upper-half of the destination. -By default, high bits are used for all operands. +.. _amdgpu_synid_int_lit_conv: -The number of values specified with the op_sel_hi modifier must match the number of source -operands. First value controls src0, second value controls src1 and so on. -The value 0 selects the low bits, while 1 selects the high bits. +Integer Literals +~~~~~~~~~~~~~~~~ - ======================================== ============================================================= - Syntax Description - ======================================== ============================================================= - op_sel_hi:[{0..1}] Select operand bits for instructions with 1 source operand. - op_sel_hi:[{0..1},{0..1}] Select operand bits for instructions with 2 source operands. - op_sel_hi:[{0..1},{0..1},{0..1}] Select operand bits for instructions with 3 source operands. - ======================================== ============================================================= +Integer :ref:`literals<amdgpu_synid_literal>` +are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`. -.. 
_amdgpu_synid_neg_lo: +When used as operands they are converted to +:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below. -neg_lo -~~~~~~ + ============== ============== =============== ==================================================================== + Expected type Condition Result Note + ============== ============== =============== ==================================================================== + i16, u16, b16 cond(num, 16) num.u16 Truncate to 16 bits. + i32, u32, b32 cond(num, 32) num.u32 Truncate to 32 bits. + i64 cond(num, 32) {-1, num.i32} Truncate to 32 bits and then sign-extend the result to 64 bits. + u64, b64 cond(num, 32) { 0, num.u32} Truncate to 32 bits and then zero-extend the result to 64 bits. + f16 cond(num, 16) num.u16 Use low 16 bits as an f16 value. + f32 cond(num, 32) num.u32 Use low 32 bits as an f32 value. + f64 cond(num, 32) {num.u32, 0} Use low 32 bits of the number as high 32 bits + of the result; low 32 bits of the result are zeroed. + ============== ============== =============== ==================================================================== -Specifies whether to change sign of operand values selected by -:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used -as input to the operation which results in the upper-half of the destination. +The condition *cond(X,S)* indicates if a 64-bit number *X* +can be converted to a smaller size *S* by truncation of upper bits. +There are two cases when the conversion is possible: -The number of values specified with this modifier must match the number of source -operands. First value controls src0, second value controls src1 and so on. +* The truncated bits are all 0. +* The truncated bits are all 1 and the value after truncation has its MSB bit set. -The value 0 indicates that the corresponding operand value is used unmodified, -the value 1 indicates that negative value of the operand must be used. 
+Examples of valid literals: -By default, operand values are used unmodified. +.. code-block:: nasm -This modifier is valid for floating point operands only. + // GFX9 - ======================================== ================================================================== - Syntax Description - ======================================== ================================================================== - neg_lo:[{0..1}] Select affected operands for instructions with 1 source operand. - neg_lo:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands. - neg_lo:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands. - ======================================== ================================================================== + v_add_u16 v0, 0xff00, v0 // value after conversion: 0xff00 + v_add_u16 v0, 0xffffffffffffff00, v0 // value after conversion: 0xff00 + v_add_u16 v0, -256, v0 // value after conversion: 0xff00 -.. _amdgpu_synid_neg_hi: + s_bfe_i64 s[0:1], 0xffefffff, s3 // value after conversion: 0xffffffffffefffff + s_bfe_u64 s[0:1], 0xffefffff, s3 // value after conversion: 0x00000000ffefffff + v_ceil_f64_e32 v[0:1], 0xffefffff // value after conversion: 0xffefffff00000000 (-1.7976922776554302e308) -neg_hi -~~~~~~ +Examples of invalid literals: -Specifies whether to change sign of operand values selected by -:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used -as input to the operation which results in the upper-half of the destination. +.. code-block:: nasm -The number of values specified with this modifier must match the number of source -operands. First value controls src0, second value controls src1 and so on. + // GFX9 -The value 0 indicates that the corresponding operand value is used unmodified, -the value 1 indicates that negative value of the operand must be used. 
+ v_add_u16 v0, 0x1ff00, v0 // conversion is not possible as truncated bits are not all 0 or 1 + v_add_u16 v0, 0xffffffffffff00ff, v0 // conversion is not possible as truncated bits do not match the MSB of the result -By default, operand values are used unmodified. +.. _amdgpu_synid_fp_lit_conv: -This modifier is valid for floating point operands only. +Floating-Point Literals +~~~~~~~~~~~~~~~~~~~~~~~ - ======================================== ================================================================== - Syntax Description - ======================================== ================================================================== - neg_hi:[{0..1}] Select affected operands for instructions with 1 source operand. - neg_hi:[{0..1},{0..1}] Select affected operands for instructions with 2 source operands. - neg_hi:[{0..1},{0..1},{0..1}] Select affected operands for instructions with 3 source operands. - ======================================== ================================================================== +Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit +:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`. -clamp -~~~~~ +When used as operands they are converted to +:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below. -See a description :ref:`here<amdgpu_synid_clamp>`. + ============== ============== ================= ================================================================= + Expected type Condition Result Note + ============== ============== ================= ================================================================= + i16, u16, b16 cond(num, 16) f16(num) Convert to f16 and use bits of the result as an integer value. + i32, u32, b32 cond(num, 32) f32(num) Convert to f32 and use bits of the result as an integer value. + i64, u64, b64 false \- Conversion disabled because of unclear semantics. + f16 cond(num, 16) f16(num) Convert to f16.
+ f32 cond(num, 32) f32(num) Convert to f32. + f64 true {num.u32.hi, 0} Use high 32 bits of the number as high 32 bits of the result; + zero-fill low 32 bits of the result. -.. _amdgpu_synid_mad_mix: + Note that the result may differ from the original number. + ============== ============== ================= ================================================================= -VOP3P V_MAD_MIX Modifiers -------------------------- +The condition *cond(X,S)* indicates whether an f64 number *X* can be converted +to a smaller *S*-bit floating-point type without overflow or underflow. +Loss of precision is allowed. -These instructions use VOP3P format but have different modifiers. +Examples of valid literals: -GFX9 only. +.. code-block:: nasm -.. _amdgpu_synid_mad_mix_op_sel: + // GFX9 -mad_mix_op_sel -~~~~~~~~~~~~~~ + v_add_f16 v1, 65500.0, v2 + v_add_f32 v1, 65600.0, v2 -This operand has meaning only for 16-bit source operands as indicated by -:ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`. -It specifies to select either the low [15:0] or high [31:16] operand bits -as input to the operation. + // value before conversion: 0x7fefffffffffffff (1.7976931348623157e308) + v_ceil_f64 v[0:1], 1.7976931348623157e308 // value after conversion: 0x7fefffff00000000 (1.7976922776554302e308) -The value 0 indicates the low bits, the value 1 indicates the high 16 bits. -By default, low bits are used for all operands. +Examples of invalid literals: - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - op_sel:[{0..1},{0..1},{0..1}] Select location of each 16-bit source operand. - ======================================== ================================================ +.. code-block:: nasm -..
_amdgpu_synid_mad_mix_op_sel_hi: + // GFX9 -mad_mix_op_sel_hi -~~~~~~~~~~~~~~~~~ + v_add_f16 v1, 65600.0, v2 // cannot be converted to f16 because of overflow -Selects the size of source operands: either 32 bits or 16 bits. -By default, 32 bits are used for all source operands. +.. _amdgpu_synid_exp_conv: -The value 0 indicates 32 bits, the value 1 indicates 16 bits. -The location of 16 bits in the operand may be specified by -:ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>`. +Expressions +~~~~~~~~~~~ - ======================================== ================================================ - Syntax Description - ======================================== ================================================ - op_sel_hi:[{0..1},{0..1},{0..1}] Select size of each source operand. - ======================================== ================================================ +Expressions operate on and evaluate to 64-bit integers. -abs -~~~ +When used as operands they are truncated to +:ref:`expected operand size<amdgpu_syn_instruction_type>`. +No data type conversions are performed. -See a description :ref:`here<amdgpu_synid_abs>`. +Examples: -neg -~~~ +.. code-block:: nasm -See a description :ref:`here<amdgpu_synid_neg>`. + // GFX9 -clamp -~~~~~ + x = 0.1 + v_sqrt_f32 v0, x // v0 = [low 32 bits of 0.1 (double)] + v_sqrt_f32 v0, (0.1 + 0) // the same as above + v_sqrt_f32 v0, 0.1 // v0 = [0.1 (double) converted to float] -See a description :ref:`here<amdgpu_synid_clamp>`.
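The conversion tables added by this patch can be checked mechanically. The Python sketch below models them under several assumptions. It is an illustration, not the assembler's implementation. The function names (`cond_int`, `int_literal`, `fp_literal`, `expr_operand`) are hypothetical. Python's `struct` module stands in for the f16/f32 rounding. Underflow is approximated as "a nonzero value that rounds to zero".

```python
import struct

# Hypothetical model of the literal/expression conversion rules described above.
# Not LLVM's AsmParser code; names and structure are illustrative assumptions.

MASK64 = (1 << 64) - 1


def cond_int(num: int, size: int) -> bool:
    """cond(num, size) for integer literals: may the value be truncated to `size` bits?"""
    num &= MASK64                         # treat the literal as a 64-bit pattern
    dropped = num >> size                 # upper bits removed by truncation
    if dropped == 0:
        return True                       # truncated bits are all 0
    all_ones = (1 << (64 - size)) - 1
    msb = (num >> (size - 1)) & 1         # MSB of the truncated value
    return dropped == all_ones and msb == 1


def int_literal(num: int, expected: str) -> int:
    """Bit pattern an integer literal produces for the expected operand type."""
    size = 16 if expected in ("i16", "u16", "b16", "f16") else 32
    if not cond_int(num, size):
        raise ValueError(f"cannot convert {num:#x} to {expected}")
    low = num & ((1 << size) - 1)
    if expected == "i64" and low & 0x80000000:
        return low | 0xFFFFFFFF00000000   # sign-extend the 32-bit value to 64 bits
    if expected == "f64":
        return low << 32                  # low 32 bits become the high half; low half zeroed
    return low                            # 16/32-bit types, u64/b64 (zero-extend), non-negative i64


def fp_literal(value: float, expected: str) -> int:
    """Bit pattern a floating-point literal produces (overflow and underflow rejected)."""
    if expected == "f64":
        bits = struct.unpack("<Q", struct.pack("<d", value))[0]
        return bits & 0xFFFFFFFF00000000  # keep the high 32 bits, zero-fill the low 32
    if expected in ("i64", "u64", "b64"):
        raise ValueError("conversion to 64-bit integer types is disabled")
    if expected in ("i16", "u16", "b16", "f16"):
        fmt, uint_fmt = "<e", "<H"        # IEEE half precision
    else:
        fmt, uint_fmt = "<f", "<I"        # IEEE single precision
    try:
        packed = struct.pack(fmt, value)  # struct raises OverflowError if out of range
    except OverflowError:
        raise ValueError(f"{value} overflows {expected}") from None
    if value != 0.0 and struct.unpack(fmt, packed)[0] == 0.0:
        raise ValueError(f"{value} underflows {expected}")  # simplified underflow test
    return struct.unpack(uint_fmt, packed)[0]  # reinterpret the rounded bits as an integer


def expr_operand(value: int, size: int) -> int:
    """Expressions are 64-bit integers truncated to the operand size; no type conversion."""
    return value & ((1 << size) - 1)
```

Applied to the GFX9 examples in the patch, it reproduces the listed outcomes. For example, `int_literal(-256, "u16")` yields `0xff00`, `int_literal(0xffefffff, "f64")` yields `0xffefffff00000000`, and `fp_literal(65600.0, "f16")` raises `ValueError`. For expressions, `expr_operand(0x3FB999999999999A, 32)` yields `0x9999999A`, which is the low 32 bits of 0.1 stored as a double.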