Diffstat (limited to 'llvm/docs/AMDGPUOperandSyntax.rst')
| -rw-r--r-- | llvm/docs/AMDGPUOperandSyntax.rst | 543 |
1 files changed, 255 insertions, 288 deletions
diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst index 523c5ac7179..c20da004729 100644 --- a/llvm/docs/AMDGPUOperandSyntax.rst +++ b/llvm/docs/AMDGPUOperandSyntax.rst @@ -38,7 +38,8 @@ Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* register =================================================== ==================================================================== **v**\<N> A single 32-bit *vector* register. - *N* must be a decimal integer number. + *N* must be a decimal + :ref:`integer number<amdgpu_synid_integer_number>`. **v[**\ <N>\ **]** A single 32-bit *vector* register. *N* may be specified as an @@ -51,10 +52,11 @@ Assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* register or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *vector* registers. - Register indices must be specified as decimal integer numbers. + Register indices must be specified as decimal + :ref:`integer numbers<amdgpu_synid_integer_number>`. =================================================== ==================================================================== -Note. *N* and *K* must satisfy the following conditions: +Note: *N* and *K* must satisfy the following conditions: * *N* <= *K*. * 0 <= *N* <= 255. @@ -77,26 +79,27 @@ Examples: .. _amdgpu_synid_nsa: -*Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*: +GFX10 *Image* instructions may use special *NSA* (Non-Sequential Address) syntax for *image addresses*: - =================================================== ==================================================================== - Syntax Description - =================================================== ==================================================================== - **[v**\ <A>, \ **v**\ <B>, ... **v**\ <X>\ **]** A sequence of *vector* registers. 
At least one register - must be specified. + ===================================== ================================================= + Syntax Description + ===================================== ================================================= + **[Vm**, \ **Vn**, ... **Vk**\ **]** A sequence of 32-bit *vector* registers. + Each register may be specified using a syntax + defined :ref:`above<amdgpu_synid_v>`. - In contrast with standard syntax described above, registers in - this sequence are not required to have consecutive indices. - Moreover, the same register may appear in the list more than once. - =================================================== ==================================================================== - -Note. Reqister indices must be in the range 0..255. They must be specified as decimal integer numbers. + In contrast with standard syntax, registers + in *NSA* sequence are not required to have + consecutive indices. Moreover, the same register + may appear in the list more than once. + ===================================== ================================================= Examples: .. parsed-literal:: - [v32,v1,v2] + [v32,v1,v[2]] + [v[32],v[1:1],[v2]] [v4,v4,v4,v4] .. _amdgpu_synid_s: @@ -126,7 +129,9 @@ Sequences of 4 and more *scalar* registers must be quad-aligned. ======================================================== ==================================================================== **s**\ <N> A single 32-bit *scalar* register. - *N* must be a decimal integer number. + *N* must be a decimal + :ref:`integer number<amdgpu_synid_integer_number>`. + **s[**\ <N>\ **]** A single 32-bit *scalar* register. *N* may be specified as an @@ -137,12 +142,14 @@ Sequences of 4 and more *scalar* registers must be quad-aligned. *N* and *K* may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>` or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. + **[s**\ <N>, \ **s**\ <N+1>, ... 
**s**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *scalar* registers. - Register indices must be specified as decimal integer numbers. + Register indices must be specified as decimal + :ref:`integer numbers<amdgpu_synid_integer_number>`. ======================================================== ==================================================================== -Note. *N* and *K* must satisfy the following conditions: +Note: *N* and *K* must satisfy the following conditions: * *N* must be properly aligned based on sequence size. * *N* <= *K*. @@ -210,7 +217,8 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned. ============================================================= ==================================================================== **ttmp**\ <N> A single 32-bit *ttmp* register. - *N* must be a decimal integer number. + *N* must be a decimal + :ref:`integer number<amdgpu_synid_integer_number>`. **ttmp[**\ <N>\ **]** A single 32-bit *ttmp* register. *N* may be specified as an @@ -223,10 +231,11 @@ Sequences of 4 and more *ttmp* registers must be quad-aligned. or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`. **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]** A sequence of (\ *K-N+1*\ ) *ttmp* registers. - Register indices must be specified as decimal integer numbers. + Register indices must be specified as decimal + :ref:`integer numbers<amdgpu_synid_integer_number>`. ============================================================= ==================================================================== -Note. *N* and *K* must satisfy the following conditions: +Note: *N* and *K* must satisfy the following conditions: * *N* must be properly aligned based on sequence size. * *N* <= *K*. @@ -266,8 +275,8 @@ Trap base address, 64-bits wide. 
Holds the pointer to the current trap handler p Syntax Description Availability ================== ======================================================================= ============= tba 64-bit *trap base address* register. GFX7, GFX8 - [tba] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8 - [tba_lo,tba_hi] 64-bit *trap base address* register (an alternative syntax). GFX7, GFX8 + [tba] 64-bit *trap base address* register (an SP3 syntax). GFX7, GFX8 + [tba_lo,tba_hi] 64-bit *trap base address* register (an SP3 syntax). GFX7, GFX8 ================== ======================================================================= ============= High and low 32 bits of *trap base address* may be accessed as separate registers: @@ -277,8 +286,8 @@ High and low 32 bits of *trap base address* may be accessed as separate register ================== ======================================================================= ============= tba_lo Low 32 bits of *trap base address* register. GFX7, GFX8 tba_hi High 32 bits of *trap base address* register. GFX7, GFX8 - [tba_lo] Low 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8 - [tba_hi] High 32 bits of *trap base address* register (an alternative syntax). GFX7, GFX8 + [tba_lo] Low 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8 + [tba_hi] High 32 bits of *trap base address* register (an SP3 syntax). GFX7, GFX8 ================== ======================================================================= ============= Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9 and GFX10, @@ -295,8 +304,8 @@ Trap memory address, 64-bits wide. Syntax Description Availability ================= ======================================================================= ================== tma 64-bit *trap memory address* register. GFX7, GFX8 - [tma] 64-bit *trap memory address* register (an alternative syntax). 
GFX7, GFX8 - [tma_lo,tma_hi] 64-bit *trap memory address* register (an alternative syntax). GFX7, GFX8 + [tma] 64-bit *trap memory address* register (an SP3 syntax). GFX7, GFX8 + [tma_lo,tma_hi] 64-bit *trap memory address* register (an SP3 syntax). GFX7, GFX8 ================= ======================================================================= ================== High and low 32 bits of *trap memory address* may be accessed as separate registers: @@ -306,8 +315,8 @@ High and low 32 bits of *trap memory address* may be accessed as separate regist ================= ======================================================================= ================== tma_lo Low 32 bits of *trap memory address* register. GFX7, GFX8 tma_hi High 32 bits of *trap memory address* register. GFX7, GFX8 - [tma_lo] Low 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8 - [tma_hi] High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8 + [tma_lo] Low 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8 + [tma_hi] High 32 bits of *trap memory address* register (an SP3 syntax). GFX7, GFX8 ================= ======================================================================= ================== Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9 and GFX10, @@ -324,8 +333,8 @@ Flat scratch address, 64-bits wide. Holds the base address of scratch memory. Syntax Description ================================== ================================================================ flat_scratch 64-bit *flat scratch* address register. - [flat_scratch] 64-bit *flat scratch* address register (an alternative syntax). - [flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an alternative syntax). + [flat_scratch] 64-bit *flat scratch* address register (an SP3 syntax). + [flat_scratch_lo,flat_scratch_hi] 64-bit *flat scratch* address register (an SP3 syntax). 
================================== ================================================================ High and low 32 bits of *flat scratch* address may be accessed as separate registers: @@ -335,8 +344,8 @@ High and low 32 bits of *flat scratch* address may be accessed as separate regis ========================= ========================================================================= flat_scratch_lo Low 32 bits of *flat scratch* address register. flat_scratch_hi High 32 bits of *flat scratch* address register. - [flat_scratch_lo] Low 32 bits of *flat scratch* address register (an alternative syntax). - [flat_scratch_hi] High 32 bits of *flat scratch* address register (an alternative syntax). + [flat_scratch_lo] Low 32 bits of *flat scratch* address register (an SP3 syntax). + [flat_scratch_hi] High 32 bits of *flat scratch* address register (an SP3 syntax). ========================= ========================================================================= .. _amdgpu_synid_xnack: @@ -355,8 +364,8 @@ received an *XNACK* due to a vector memory operation. Syntax Description ============================== ===================================================== xnack_mask 64-bit *xnack mask* register. - [xnack_mask] 64-bit *xnack mask* register (an alternative syntax). - [xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an alternative syntax). + [xnack_mask] 64-bit *xnack mask* register (an SP3 syntax). + [xnack_mask_lo,xnack_mask_hi] 64-bit *xnack mask* register (an SP3 syntax). ============================== ===================================================== High and low 32 bits of *xnack mask* may be accessed as separate registers: @@ -366,8 +375,8 @@ High and low 32 bits of *xnack mask* may be accessed as separate registers: ===================== ============================================================== xnack_mask_lo Low 32 bits of *xnack mask* register. xnack_mask_hi High 32 bits of *xnack mask* register. 
- [xnack_mask_lo] Low 32 bits of *xnack mask* register (an alternative syntax). - [xnack_mask_hi] High 32 bits of *xnack mask* register (an alternative syntax). + [xnack_mask_lo] Low 32 bits of *xnack mask* register (an SP3 syntax). + [xnack_mask_hi] High 32 bits of *xnack mask* register (an SP3 syntax). ===================== ============================================================== .. _amdgpu_synid_vcc: @@ -385,8 +394,8 @@ Note that GFX10 H/W does not use high 32 bits of *vcc* in *wave32* mode. Syntax Description ================ ========================================================================= vcc 64-bit *vector condition code* register. - [vcc] 64-bit *vector condition code* register (an alternative syntax). - [vcc_lo,vcc_hi] 64-bit *vector condition code* register (an alternative syntax). + [vcc] 64-bit *vector condition code* register (an SP3 syntax). + [vcc_lo,vcc_hi] 64-bit *vector condition code* register (an SP3 syntax). ================ ========================================================================= High and low 32 bits of *vector condition code* may be accessed as separate registers: @@ -396,8 +405,8 @@ High and low 32 bits of *vector condition code* may be accessed as separate regi ================ ========================================================================= vcc_lo Low 32 bits of *vector condition code* register. vcc_hi High 32 bits of *vector condition code* register. - [vcc_lo] Low 32 bits of *vector condition code* register (an alternative syntax). - [vcc_hi] High 32 bits of *vector condition code* register (an alternative syntax). + [vcc_lo] Low 32 bits of *vector condition code* register (an SP3 syntax). + [vcc_hi] High 32 bits of *vector condition code* register (an SP3 syntax). ================ ========================================================================= .. _amdgpu_synid_m0: @@ -412,7 +421,7 @@ including register indexing and bounds checking. 
Syntax Description =========== =================================================== m0 A 32-bit *memory* register. - [m0] A 32-bit *memory* register (an alternative syntax). + [m0] A 32-bit *memory* register (an SP3 syntax). =========== =================================================== .. _amdgpu_synid_exec: @@ -430,8 +439,8 @@ Note that GFX10 H/W does not use high 32 bits of *exec* in *wave32* mode. Syntax Description ===================== ================================================================= exec 64-bit *execute mask* register. - [exec] 64-bit *execute mask* register (an alternative syntax). - [exec_lo,exec_hi] 64-bit *execute mask* register (an alternative syntax). + [exec] 64-bit *execute mask* register (an SP3 syntax). + [exec_lo,exec_hi] 64-bit *execute mask* register (an SP3 syntax). ===================== ================================================================= High and low 32 bits of *execute mask* may be accessed as separate registers: @@ -441,8 +450,8 @@ High and low 32 bits of *execute mask* may be accessed as separate registers: ===================== ================================================================= exec_lo Low 32 bits of *execute mask* register. exec_hi High 32 bits of *execute mask* register. - [exec_lo] Low 32 bits of *execute mask* register (an alternative syntax). - [exec_hi] High 32 bits of *execute mask* register (an alternative syntax). + [exec_lo] Low 32 bits of *execute mask* register (an SP3 syntax). + [exec_hi] High 32 bits of *execute mask* register (an SP3 syntax). ===================== ================================================================= .. _amdgpu_synid_vccz: @@ -452,7 +461,7 @@ vccz A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros. -Note. When GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`. +Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`vcc_lo<amdgpu_synid_vcc_lo>`. .. 
_amdgpu_synid_execz: @@ -461,7 +470,7 @@ execz A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros. -Note. When GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`. +Note: when GFX10 operates in *wave32* mode, this register reflects state of :ref:`exec_lo<amdgpu_synid_exec>`. .. _amdgpu_synid_scc: @@ -495,19 +504,20 @@ GFX10 only. .. _amdgpu_synid_constant: -constant --------- +inline constant +--------------- + +An *inline constant* is an integer or a floating-point value encoded as a part of an instruction. +Compare *inline constants* with :ref:`literals<amdgpu_synid_literal>`. -A set of integer and floating-point *inline* constants and values: +Inline constants include: * :ref:`iconst<amdgpu_synid_iconst>` * :ref:`fconst<amdgpu_synid_fconst>` * :ref:`ival<amdgpu_synid_ival>` -In contrast with :ref:`literals<amdgpu_synid_literal>`, these operands are encoded as a part of instruction. - If a number may be encoded as either -a :ref:`literal<amdgpu_synid_literal>` or +a :ref:`literal<amdgpu_synid_literal>` or a :ref:`constant<amdgpu_synid_constant>`, assembler selects the latter encoding as more efficient. @@ -516,17 +526,14 @@ assembler selects the latter encoding as more efficient. iconst ~~~~~~ -An :ref:`integer number<amdgpu_synid_integer_number>` +An :ref:`integer number<amdgpu_synid_integer_number>` or +an :ref:`absolute expression<amdgpu_synid_absolute_expression>` encoded as an *inline constant*. Only a small fraction of integer numbers may be encoded as *inline constants*. They are enumerated in the table below. Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`. -Integer *inline constants* are converted to -:ref:`expected operand type<amdgpu_syn_instruction_type>` -as described :ref:`here<amdgpu_synid_int_const_conv>`. 
- ================================== ==================================== Value Note ================================== ==================================== @@ -548,10 +555,6 @@ Only a small fraction of floating-point numbers may be encoded as *inline consta They are enumerated in the table below. Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`. -Floating-point *inline constants* are converted to -:ref:`expected operand type<amdgpu_syn_instruction_type>` -as described :ref:`here<amdgpu_synid_fp_const_conv>`. - ===================== ===================================================== ================== Value Note Availability ===================== ===================================================== ================== @@ -594,21 +597,18 @@ These operands provide read-only access to H/W registers. literal ------- -A literal is a 64-bit value which is encoded as a separate 32-bit dword in the instruction stream. +A *literal* is a 64-bit value encoded as a separate 32-bit dword in the instruction stream. +Compare *literals* with :ref:`inline constants<amdgpu_synid_constant>`. If a number may be encoded as either -a :ref:`literal<amdgpu_synid_literal>` or +a :ref:`literal<amdgpu_synid_literal>` or an :ref:`inline constant<amdgpu_synid_constant>`, assembler selects the latter encoding as more efficient. Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`, -:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or -:ref:`expressions<amdgpu_synid_expression>` -(expressions are currently supported for 32-bit operands only). - -A 64-bit literal value is converted by assembler -to an :ref:`expected operand type<amdgpu_syn_instruction_type>` -as described :ref:`here<amdgpu_synid_lit_conv>`. +:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`, +:ref:`absolute expressions<amdgpu_synid_absolute_expression>` or +:ref:`relocatable expressions<amdgpu_synid_relocatable_expression>`. 
An instruction may use only one literal but several operands may refer the same literal. @@ -617,30 +617,38 @@ An instruction may use only one literal but several operands may refer the same uimm8 ----- -A 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. -The value is encoded as part of the opcode so it is free to use. +A 8-bit :ref:`integer number<amdgpu_synid_integer_number>` +or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. +The value must be in the range 0..0xFF. .. _amdgpu_synid_uimm32: uimm32 ------ -A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. -The value is stored as a separate 32-bit dword in the instruction stream. +A 32-bit :ref:`integer number<amdgpu_synid_integer_number>` +or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. +The value must be in the range 0..0xFFFFFFFF. .. _amdgpu_synid_uimm20: uimm20 ------ -A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. +A 20-bit :ref:`integer number<amdgpu_synid_integer_number>` +or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. + +The value must be in the range 0..0xFFFFF. .. _amdgpu_synid_uimm21: uimm21 ------ -A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. +A 21-bit :ref:`integer number<amdgpu_synid_integer_number>` +or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. + +The value must be in the range 0..0x1FFFFF. .. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement. @@ -649,7 +657,10 @@ A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`. simm21 ------ -A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`. +A 21-bit :ref:`integer number<amdgpu_synid_integer_number>` +or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`. + +The value must be in the range -0x100000..0x0FFFFF. .. WARNING:: Assembler currently supports 20-bit unsigned offsets only. 
Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement. @@ -678,27 +689,20 @@ Integer Numbers --------------- Integer numbers are 64 bits wide. -They may be specified in binary, octal, hexadecimal and decimal formats: - - ============== ==================================== - Format Syntax - ============== ==================================== - Decimal [-]?[1-9][0-9]* - Binary [-]?0b[01]+ - Octal [-]?0[0-7]+ - Hexadecimal [-]?0x[0-9a-fA-F]+ - \ [-]?[0x]?[0-9][0-9a-fA-F]*[hH] - ============== ==================================== +They are converted to :ref:`expected operand type<amdgpu_syn_instruction_type>` +as described :ref:`here<amdgpu_synid_int_conv>`. -Examples: +Integer numbers may be specified in binary, octal, hexadecimal and decimal formats: -.. parsed-literal:: - - -1234 - 0b1010 - 010 - 0xff - 0ffh + ============ =============================== ======== + Format Syntax Example + ============ =============================== ======== + Decimal [-]?[1-9][0-9]* -1234 + Binary [-]?0b[01]+ 0b1010 + Octal [-]?0[0-7]+ 010 + Hexadecimal [-]?0x[0-9a-fA-F]+ 0xff + \ [-]?[0x]?[0-9][0-9a-fA-F]*[hH] 0ffh + ============ =============================== ======== .. _amdgpu_synid_floating-point_number: @@ -706,31 +710,29 @@ Floating-Point Numbers ---------------------- All floating-point numbers are handled as double (64 bits wide). +They are converted to +:ref:`expected operand type<amdgpu_syn_instruction_type>` +as described :ref:`here<amdgpu_synid_fp_conv>`. Floating-point numbers may be specified in hexadecimal and decimal formats: - ============== ======================================================== ======================================================== - Format Syntax Note - ============== ======================================================== ======================================================== - Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? Must include either a decimal separator or an exponent. 
- Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+ - ============== ======================================================== ======================================================== - -Examples: - -.. parsed-literal:: - - -1.234 - 234e2 - -0x1afp-10 - 0x.1afp10 + ============ ======================================================== ====================== ==================== + Format Syntax Examples Note + ============ ======================================================== ====================== ==================== + Decimal [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)? -1.234, 234e2 Must include either + a decimal separator + or an exponent. + Hexadecimal [-]0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+ -0x1afp-10, 0x.1afp10 + ============ ======================================================== ====================== ==================== .. _amdgpu_synid_expression: Expressions =========== -An expression specifies an address or a numeric value. +An expression is evaluated to a 64-bit integer. +Note that floating-point expressions are not supported. + There are two kinds of expressions: * :ref:`Absolute<amdgpu_synid_absolute_expression>`. @@ -741,10 +743,14 @@ There are two kinds of expressions: Absolute Expressions -------------------- -The value of an absolute expression remains the same after program relocation. +The value of an absolute expression does not change after program relocation. Absolute expressions must not include unassigned and relocatable values such as labels. +Absolute expressions are evaluated to 64-bit integer values and converted to +:ref:`expected operand type<amdgpu_syn_instruction_type>` +as described :ref:`here<amdgpu_synid_int_conv>`. + Examples: .. parsed-literal:: @@ -760,45 +766,38 @@ Relocatable Expressions The value of a relocatable expression depends on program relocation. Note that use of relocatable expressions is limited with branch targets -and 32-bit :ref:`literals<amdgpu_synid_literal>`. 
+and 32-bit integer operands. -Addition information about relocation may be found :ref:`here<amdgpu-relocation-records>`. - -Examples: +A relocatable expression is evaluated to a 64-bit integer value +which depends on operand kind and :ref:`relocation type<amdgpu-relocation-records>` +of symbol(s) used in the expression. For example, if an instruction refers a label, +this reference is evaluated to an offset from the address after the instruction +to the label address: .. parsed-literal:: - y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative. - z = . - -Expression Data Type --------------------- - -Expressions and operands of expressions are interpreted as 64-bit integers. + label: + v_add_co_u32_e32 v0, vcc, label, v1 // 'label' operand is evaluated to -4 -Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double). -However these operands are also handled as 64-bit integers -using binary representation of specified floating-point numbers. -No conversion from floating-point to integer is performed. - -Examples: +Note that values of relocatable expressions are usually unknown at assembly time; +they are resolved later by a linker and converted to +:ref:`expected operand type<amdgpu_syn_instruction_type>` +as described :ref:`here<amdgpu_synid_rl_conv>`. -.. parsed-literal:: +Operands and Operations +----------------------- - x = 0.1 // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1. - y = x + x // y is a sum of two integer values; it is not equal to 0.2! +Expressions are composed of 64-bit integer operands and operations. +Operands include :ref:`integer numbers<amdgpu_synid_integer_number>` +and :ref:`symbols<amdgpu_synid_symbol>`. -Syntax ------- +Expressions may also use "." which is a reference to the current PC (program counter). 
-Expressions are composed of -:ref:`symbols<amdgpu_synid_symbol>`, -:ref:`integer numbers<amdgpu_synid_integer_number>`, -:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`, -:ref:`binary operators<amdgpu_synid_expression_bin_op>`, -:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions. +:ref:`Unary<amdgpu_synid_expression_un_op>` and :ref:`binary<amdgpu_synid_expression_bin_op>` +operations produce 64-bit integer results. -Expressions may also use "." which is a reference to the current PC (program counter). +Syntax of Expressions +--------------------- The syntax of expressions is shown below:: @@ -887,7 +886,7 @@ They operate on and produce 64-bit integers. Symbols ------- -A symbol is a named 64-bit value, representing a relocatable +A symbol is a named 64-bit integer value, representing a relocatable address or an absolute (non-relocatable) number. Symbol names have the following syntax: @@ -907,128 +906,78 @@ The table below provides several examples of syntax used for symbol definition. A symbol may be used before it is declared or assigned; unassigned symbols are assumed to be PC-relative. -Addition information about symbols may be found :ref:`here<amdgpu-symbols>`. +Additional information about symbols may be found :ref:`here<amdgpu-symbols>`. .. _amdgpu_synid_conv: -Conversions -=========== +Type and Size Conversion +======================== This section describes what happens when a 64-bit :ref:`integer number<amdgpu_synid_integer_number>`, a -:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or a -:ref:`symbol<amdgpu_synid_symbol>` +:ref:`floating-point number<amdgpu_synid_floating-point_number>` or an +:ref:`expression<amdgpu_synid_expression>` is used for an operand which has a different type or size. -Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W: - -* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W. 
-* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by assembler.
-
-.. _amdgpu_synid_const_conv:
-
-Inline Constants
-----------------
-
-.. _amdgpu_synid_int_const_conv:
-
-Integer Inline Constants
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-Integer :ref:`inline constants<amdgpu_synid_constant>`
-may be thought of as 64-bit
-:ref:`integer numbers<amdgpu_synid_integer_number>`;
-when used as operands they are truncated to the size of
-:ref:`expected operand type<amdgpu_syn_instruction_type>`.
-No data type conversions are performed.
-
-Examples:
-
-.. parsed-literal::
-
-    // GFX9
-
-    v_add_u16 v0, -1, 0                   // v0 = 0xFFFF
-    v_add_f16 v0, -1, 0                   // v0 = 0xFFFF (NaN)
-
-    v_add_u32 v0, -1, 0                   // v0 = 0xFFFFFFFF
-    v_add_f32 v0, -1, 0                   // v0 = 0xFFFFFFFF (NaN)
+.. _amdgpu_synid_int_conv:
 
-.. _amdgpu_synid_fp_const_conv:
+Conversion of Integer Values
+----------------------------
 
-Floating-Point Inline Constants
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Instruction operands may be specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>` or
+:ref:`absolute expressions<amdgpu_synid_absolute_expression>`. These values are converted to
+the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
 
-Floating-point :ref:`inline constants<amdgpu_synid_constant>`
-may be thought of as 64-bit
-:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
-when used as operands they are converted to a floating-point number of
-:ref:`expected operand size<amdgpu_syn_instruction_type>`.
+1. *Validation*. Assembler checks if the input value may be truncated without loss to the required *truncation width*
+(see the table below). There are two cases when this operation is enabled:
 
-Examples:
-
-.. parsed-literal::
-
-    // GFX9
-
-    v_add_f16 v0, 1.0, 0                  // v0 = 0x3C00 (1.0)
-    v_add_u16 v0, 1.0, 0                  // v0 = 0x3C00
-
-    v_add_f32 v0, 1.0, 0                  // v0 = 0x3F800000 (1.0)
-    v_add_u32 v0, 1.0, 0                  // v0 = 0x3F800000
-
-
-.. _amdgpu_synid_lit_conv:
-
-Literals
---------
+    * The truncated bits are all 0.
+    * The truncated bits are all 1 and the value after truncation has its MSB bit set.
 
-.. _amdgpu_synid_int_lit_conv:
+In all other cases assembler triggers an error.
 
-Integer Literals
-~~~~~~~~~~~~~~~~
+2. *Conversion*. The input value is converted to the expected type as described in the table below.
+Depending on operand kind, this conversion is performed by either assembler or AMDGPU H/W (or both).
 
-Integer :ref:`literals<amdgpu_synid_literal>`
-are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.
+    ============== ================= =============== ====================================================================
+    Expected type  Truncation Width  Conversion      Description
+    ============== ================= =============== ====================================================================
+    i16, u16, b16  16                num.u16         Truncate to 16 bits.
+    i32, u32, b32  32                num.u32         Truncate to 32 bits.
+    i64            32                {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
+    u64, b64       32                {0,num.u32}     Truncate to 32 bits and then zero-extend the result to 64 bits.
+    f16            16                num.u16         Use low 16 bits as an f16 value.
+    f32            32                num.u32         Use low 32 bits as an f32 value.
+    f64            32                {num.u32,0}     Use low 32 bits of the number as high 32 bits
+                                                     of the result; low 32 bits of the result are zeroed.
+    ============== ================= =============== ====================================================================
 
-When used as operands they are converted to
-:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
-
-    ============== ============== =============== ====================================================================
-    Expected type  Condition      Result          Note
-    ============== ============== =============== ====================================================================
-    i16, u16, b16  cond(num,16)   num.u16         Truncate to 16 bits.
-    i32, u32, b32  cond(num,32)   num.u32         Truncate to 32 bits.
-    i64            cond(num,32)   {-1,num.i32}    Truncate to 32 bits and then sign-extend the result to 64 bits.
-    u64, b64       cond(num,32)   { 0,num.u32}    Truncate to 32 bits and then zero-extend the result to 64 bits.
-    f16            cond(num,16)   num.u16         Use low 16 bits as an f16 value.
-    f32            cond(num,32)   num.u32         Use low 32 bits as an f32 value.
-    f64            cond(num,32)   {num.u32,0}     Use low 32 bits of the number as high 32 bits
-                                                  of the result; low 32 bits of the result are zeroed.
-    ============== ============== =============== ====================================================================
-
-The condition *cond(X,S)* indicates if a 64-bit number *X*
-can be converted to a smaller size *S* by truncation of upper bits.
-There are two cases when the conversion is possible:
-
-* The truncated bits are all 0.
-* The truncated bits are all 1 and the value after truncation has its MSB bit set.
-
-Examples of valid literals:
+Examples of enabled conversions:
 
 .. parsed-literal::
 
     // GFX9
 
-                                          // Literal value after conversion:
-    v_add_u16 v0, 0xff00, v0              // 0xff00
-    v_add_u16 v0, 0xffffffffffffff00, v0  // 0xff00
-    v_add_u16 v0, -256, v0                // 0xff00
-                                          // Literal value after conversion:
-    s_bfe_i64 s[0:1], 0xffefffff, s3      // 0xffffffffffefffff
-    s_bfe_u64 s[0:1], 0xffefffff, s3      // 0x00000000ffefffff
-    v_ceil_f64_e32 v[0:1], 0xffefffff     // 0xffefffff00000000 (-1.7976922776554302e308)
 
-Examples of invalid literals:
+    v_add_u16 v0, -1, 0                   // src0 = 0xFFFF
+    v_add_f16 v0, -1, 0                   // src0 = 0xFFFF (NaN)
+                                          //
+    v_add_u32 v0, -1, 0                   // src0 = 0xFFFFFFFF
+    v_add_f32 v0, -1, 0                   // src0 = 0xFFFFFFFF (NaN)
+                                          //
+    v_add_u16 v0, 0xff00, v0              // src0 = 0xff00
+    v_add_u16 v0, 0xffffffffffffff00, v0  // src0 = 0xff00
+    v_add_u16 v0, -256, v0                // src0 = 0xff00
+                                          //
+    s_bfe_i64 s[0:1], 0xffefffff, s3      // src0 = 0xffffffffffefffff
+    s_bfe_u64 s[0:1], 0xffefffff, s3      // src0 = 0x00000000ffefffff
+    v_ceil_f64_e32 v[0:1], 0xffefffff     // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
+                                          //
+    x = 0xffefffff                        //
+    s_bfe_i64 s[0:1], x, s3               // src0 = 0xffffffffffefffff
+    s_bfe_u64 s[0:1], x, s3               // src0 = 0x00000000ffefffff
+    v_ceil_f64_e32 v[0:1], x              // src0 = 0xffefffff00000000 (-1.7976922776554302e308)
+
+Examples of disabled conversions:
 
 .. parsed-literal::
 
@@ -1037,49 +986,57 @@ Examples of invalid literals:
     v_add_u16 v0, 0x1ff00, v0             // truncated bits are not all 0 or 1
     v_add_u16 v0, 0xffffffffffff00ff, v0  // truncated bits do not match MSB of the result
 
-.. _amdgpu_synid_fp_lit_conv:
+.. _amdgpu_synid_fp_conv:
 
-Floating-Point Literals
-~~~~~~~~~~~~~~~~~~~~~~~
+Conversion of Floating-Point Values
+-----------------------------------
 
-Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
-:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
+Instruction operands may be specified as 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
+These values are converted to the :ref:`expected operand type<amdgpu_syn_instruction_type>` using the following steps:
 
-When used as operands they are converted to
-:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
+1. *Validation*. Assembler checks if the input f64 number can be converted
+to the *required floating-point type* (see the table below) without overflow or underflow.
+Precision lost is allowed. If this conversion is not possible, assembler triggers an error.
 
-    ============== ============== ================= =================================================================
-    Expected type  Condition      Result            Note
-    ============== ============== ================= =================================================================
-    i16, u16, b16  cond(num,16)   f16(num)          Convert to f16 and use bits of the result as an integer value.
-    i32, u32, b32  cond(num,32)   f32(num)          Convert to f32 and use bits of the result as an integer value.
-    i64, u64, b64  false          \-                Conversion disabled because of an unclear semantics.
-    f16            cond(num,16)   f16(num)          Convert to f16.
-    f32            cond(num,32)   f32(num)          Convert to f32.
-    f64            true           {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
-                                                    zero-fill low 32 bits of the result.
+2. *Conversion*. The input value is converted to the expected type as described in the table below.
+Depending on operand kind, this is performed by either assembler or AMDGPU H/W (or both).
 
-                                                    Note that the result may differ from the original number.
-    ============== ============== ================= =================================================================
+    ============== ================ ================= =================================================================
+    Expected type  Required FP Type Conversion        Description
+    ============== ================ ================= =================================================================
+    i16, u16, b16  f16              f16(num)          Convert to f16 and use bits of the result as an integer value.
+    i32, u32, b32  f32              f32(num)          Convert to f32 and use bits of the result as an integer value.
+    i64, u64, b64  \-               \-                Conversion disabled.
+    f16            f16              f16(num)          Convert to f16.
+    f32            f32              f32(num)          Convert to f32.
+    f64            f64              {num.u32.hi,0}    Use high 32 bits of the number as high 32 bits of the result;
+                                                      zero-fill low 32 bits of the result.
 
-The condition *cond(X,S)* indicates if an f64 number *X* can be converted
-to a smaller *S*-bit floating-point type without overflow or underflow.
-Precision lost is allowed.
+                                                      Note that the result may differ from the original number.
+    ============== ================ ================= =================================================================
 
-Examples of valid literals:
+Examples of enabled conversions:
 
 .. parsed-literal::
 
     // GFX9
 
-    v_add_f16 v1, 65500.0, v2
-    v_add_f32 v1, 65600.0, v2
+    v_add_f16 v0, 1.0, 0                  // src0 = 0x3C00 (1.0)
+    v_add_u16 v0, 1.0, 0                  // src0 = 0x3C00
+                                          //
+    v_add_f32 v0, 1.0, 0                  // src0 = 0x3F800000 (1.0)
+    v_add_u32 v0, 1.0, 0                  // src0 = 0x3F800000
 
-    // Literal value before conversion: 1.7976931348623157e308 (0x7fefffffffffffff)
-    // Literal value after conversion: 1.7976922776554302e308 (0x7fefffff00000000)
+                                          // src0 before conversion:
+                                          // 1.7976931348623157e308 = 0x7fefffffffffffff
+                                          // src0 after conversion:
+                                          // 1.7976922776554302e308 = 0x7fefffff00000000
     v_ceil_f64 v[0:1], 1.7976931348623157e308
 
-Examples of invalid literals:
+    v_add_f16 v1, 65500.0, v2             // ok for f16.
+    v_add_f32 v1, 65600.0, v2             // ok for f32, but would result in overflow for f16.
+
+Examples of disabled conversions:
 
 .. parsed-literal::
 
@@ -1087,25 +1044,35 @@ Examples of invalid literals:
 
     v_add_f16 v1, 65600.0, v2             // overflow
 
-.. _amdgpu_synid_exp_conv:
+.. _amdgpu_synid_rl_conv:
 
-Expressions
-~~~~~~~~~~~
+Conversion of Relocatable Values
+--------------------------------
 
-Expressions operate with and result in 64-bit integers.
+:ref:`Relocatable expressions<amdgpu_synid_relocatable_expression>`
+may be used with 32-bit integer operands and jump targets.
 
-When used as operands they are truncated to
-:ref:`expected operand size<amdgpu_syn_instruction_type>`.
-No data type conversions are performed.
+When the value of a relocatable expression is resolved by a linker, it is
+converted as needed and truncated to the operand size. The conversion depends
+on :ref:`relocation type<amdgpu-relocation-records>` and operand kind.
 
-Examples:
+For example, when a 32-bit operand of an instruction refers a relocatable expression *expr*,
+this reference is evaluated to a 64-bit offset from the address after the
+instruction to the address being referenced, *counted in bytes*.
+Then the value is truncated to 32 bits and encoded as a literal:
 
 .. parsed-literal::
 
-    // GFX9
+    expr = .
+    v_add_co_u32_e32 v0, vcc, expr, v1    // 'expr' operand is evaluated to -4
+                                          // and then truncated to 0xFFFFFFFC
 
-    x = 0.1
-    v_sqrt_f32 v0, x                      // v0 = [low 32 bits of 0.1 (double)]
-    v_sqrt_f32 v0, (0.1 + 0)              // the same as above
-    v_sqrt_f32 v0, 0.1                    // v0 = [0.1 (double) converted to float]
+As another example, when a branch instruction refers a label,
+this reference is evaluated to an offset from the address after the
+instruction to the label address, *counted in dwords*.
+Then the value is truncated to 16 bits:
+
+.. parsed-literal::
 
+    label:
+    s_branch label                        // 'label' operand is evaluated to -1 and truncated to 0xFFFF
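The integer validation and conversion steps that the patch documents can be modelled in a few lines of Python. This is only a sketch for checking the documented examples by hand; it is not taken from the LLVM assembler, and the names `truncates_without_loss` and `convert_integer` are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Truncation width per expected operand type, as listed in the new table.
TRUNCATION_WIDTH = {
    "i16": 16, "u16": 16, "b16": 16, "f16": 16,
    "i32": 32, "u32": 32, "b32": 32, "f32": 32,
    "i64": 32, "u64": 32, "b64": 32, "f64": 32,
}


def truncates_without_loss(value: int, width: int) -> bool:
    """Validation step: may a 64-bit value be truncated to `width` bits?"""
    bits = value & MASK64                # view the input as a 64-bit pattern
    upper = bits >> width                # bits that truncation would drop
    all_ones = (1 << (64 - width)) - 1
    msb_set = (bits >> (width - 1)) & 1 == 1
    # Enabled when the dropped bits are all 0, or all 1 with the result's MSB set.
    return upper == 0 or (upper == all_ones and msb_set)


def convert_integer(value: int, expected_type: str) -> int:
    """Conversion step: return the operand bit pattern for `expected_type`."""
    if expected_type not in TRUNCATION_WIDTH:
        raise ValueError(f"unknown operand type: {expected_type}")
    width = TRUNCATION_WIDTH[expected_type]
    if not truncates_without_loss(value, width):
        raise ValueError(f"{value:#x} cannot be truncated to {width} bits")
    low = value & ((1 << width) - 1)
    if expected_type == "i64":           # sign-extend the low 32 bits
        return low | 0xFFFFFFFF00000000 if low & 0x80000000 else low
    if expected_type == "f64":           # low 32 bits become the high half
        return low << 32
    return low                           # plain truncation (u64/b64 zero-extend)


if __name__ == "__main__":
    print(hex(convert_integer(-256, "u16")))
    print(hex(convert_integer(0xFFEFFFFF, "i64")))
    print(hex(convert_integer(0xFFEFFFFF, "f64")))
```

Values such as `0x1ff00` for a 16-bit operand raise `ValueError` in this model, mirroring the "disabled conversions" examples in the patch.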

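The floating-point conversion table in the patch can be approximated with Python's `struct` module. This is a rough sketch under stated assumptions: the helper name `convert_fp_literal` is hypothetical, only overflow is detected (the underflow check mentioned in the validation step is not modelled), and `struct`'s rounding is assumed, not verified, to match the assembler's.

```python
import struct

HALF_TYPES = {"i16", "u16", "b16", "f16"}
SINGLE_TYPES = {"i32", "u32", "b32", "f32"}


def convert_fp_literal(value: float, expected_type: str) -> int:
    """Return the operand bit pattern for an f64 literal (overflow check only)."""
    if expected_type in {"i64", "u64", "b64"}:
        raise ValueError("conversion is disabled for 64-bit integer operands")
    if expected_type == "f64":
        bits = struct.unpack("<Q", struct.pack("<d", value))[0]
        return bits & 0xFFFFFFFF00000000      # keep high 32 bits, zero the low 32
    if expected_type in HALF_TYPES:
        fp_fmt, int_fmt = "<e", "<H"          # IEEE half precision
    elif expected_type in SINGLE_TYPES:
        fp_fmt, int_fmt = "<f", "<I"          # IEEE single precision
    else:
        raise ValueError(f"unknown operand type: {expected_type}")
    try:
        packed = struct.pack(fp_fmt, value)   # raises if the value is too large
    except (OverflowError, struct.error) as exc:
        raise ValueError(f"{value} overflows {expected_type}") from exc
    return struct.unpack(int_fmt, packed)[0]


if __name__ == "__main__":
    print(hex(convert_fp_literal(1.0, "f16")))
    print(hex(convert_fp_literal(1.0, "u32")))
    print(hex(convert_fp_literal(1.7976931348623157e308, "f64")))
```

In this model `65600.0` is accepted for an `f32` operand but rejected for `f16`, which is the distinction the patch's examples draw.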
