1 files changed, 756 insertions, 746 deletions
diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst
index 4f3536eed40..4fa2bb2c9eb 100644
--- a/llvm/docs/AMDGPUOperandSyntax.rst
+++ b/llvm/docs/AMDGPUOperandSyntax.rst
@@ -1,6 +1,6 @@
-=================================================
-Syntax of AMDGPU Assembler Operands and Modifiers
-=================================================
+=====================================
+Syntax of AMDGPU Instruction Operands
+=====================================
 
 .. contents::
    :local:
@@ -8,1048 +8,1058 @@ Syntax of AMDGPU Assembler Operands and Modifiers
 Conventions
 ===========
 
-The following conventions are used in syntax description:
+The following notation is used throughout this document:
 
-    =================== =============================================================
+    =================== =============================================================================
     Notation            Description
-    =================== =============================================================
+    =================== =============================================================================
     {0..N}              Any integer value in the range from 0 to N (inclusive).
-                        Unless stated otherwise, this value may be specified as
-                        either a literal or an llvm expression.
-    <x>                 Syntax and meaning of *<x>* is explained elsewhere.
-    =================== =============================================================
+    <x>                 The syntax and meaning of *x* are explained elsewhere.
+    =================== =============================================================================
 
 .. _amdgpu_syn_operands:
 
 Operands
 ========
 
-TBD
+.. _amdgpu_synid_v:
 
-.. _amdgpu_syn_modifiers:
+v
+-
 
-Modifiers
-=========
+Vector registers. There are 256 32-bit vector registers.
 
-DS Modifiers
-------------
-
-.. _amdgpu_synid_ds_offset8:
-
-ds_offset8
-~~~~~~~~~~
-
-Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
-
-Used with DS instructions which have 2 addresses.
-
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    offset:{0..0xFF}                         Specifies a 8-bit offset.
-    ======================================== ================================================
-
-.. _amdgpu_synid_ds_offset16:
-
-ds_offset16
-~~~~~~~~~~~
-
-Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
-
-Used with DS instructions which have 1 address.
-
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    offset:{0..0xFFFF}                       Specifies a 16-bit offset.
-    ======================================== ================================================
-
-.. _amdgpu_synid_sw_offset16:
-
-sw_offset16
-~~~~~~~~~~~
-
-This is a special modifier which may be used with *ds_swizzle_b32* instruction only.
-Specifies a sizzle pattern in numeric or symbolic form. The default value is 0.
-
-See AMD documentation for more information.
-
-    ======================================================= ===================================================
-    Syntax                                                  Description
-    ======================================================= ===================================================
-    offset:{0..0xFFFF}                                      Specifies a 16-bit swizzle pattern
-                                                            in a numeric form.
-    offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3})   Specifies a quad permute mode pattern; each
-                                                            number is a lane id.
-    offset:swizzle(BITMASK_PERM, "<mask>")                  Specifies a bitmask permute mode pattern
-                                                            which converts a 5-bit lane id to another
-                                                            lane id with which the lane interacts.
-
-                                                            <mask> is a 5 character sequence which
-                                                            specifies how to transform the bits of the
-                                                            lane id. The following characters are allowed:
-
-                                                              * "0" - set bit to 0.
+A sequence of *vector* registers may be used to operate on more than 32 bits of data.
 
-                                                              * "1" - set bit to 1.
+The assembler currently supports sequences of 1, 2, 3, 4, 8 and 16 *vector* registers.
 
-                                                              * "p" - preserve bit.
+    =================================================== ====================================================================
+    Syntax                                              Description
+    =================================================== ====================================================================
+    **v**\<N>                                           A single 32-bit *vector* register.
 
-                                                              * "i" - inverse bit.
+                                                        *N* must be a decimal integer number.
+    **v[**\ <N>\ **]**                                  A single 32-bit *vector* register.
 
-    offset:swizzle(BROADCAST,{2..32},{0..N})                Specifies a broadcast mode.
-                                                            Broadcasts the value of any particular lane to
-                                                            all lanes in its group.
+                                                        *N* may be specified as an
+                                                        :ref:`integer number<amdgpu_synid_integer_number>`
+                                                        or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
+    **v[**\ <N>:<K>\ **]**                              A sequence of (\ *K-N+1*\ ) *vector* registers.
 
-                                                            The first numeric parameter is a group
-                                                            size and must be equal to 2, 4, 8, 16 or 32.
+                                                        *N* and *K* may be specified as
+                                                        :ref:`integer numbers<amdgpu_synid_integer_number>`
+                                                        or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+    **[v**\ <N>, \ **v**\ <N+1>, ... **v**\ <K>\ **]**  A sequence of (\ *K-N+1*\ ) *vector* registers.
 
-                                                            The second numeric parameter is an index of the
-                                                            lane being broadcasted. The index must not exceed
-                                                            group size.
-    offset:swizzle(SWAP,{1..16})                            Specifies a swap mode.
-                                                            Swaps the neighboring groups of
-                                                            1, 2, 4, 8 or 16 lanes.
-    offset:swizzle(REVERSE,{2..32})                         Specifies a reverse mode. Reverses
-                                                            the lanes for groups of 2, 4, 8, 16 or 32 lanes.
-    ======================================================= ===================================================
+                                                        Register indices must be specified as decimal integer numbers.
+    =================================================== ====================================================================
 
-.. _amdgpu_synid_gds:
+Note. *N* and *K* must satisfy the following conditions:
 
-gds
-~~~
+* *N* <= *K*.
+* 0 <= *N* <= 255.
+* 0 <= *K* <= 255.
+* *K-N+1* must be equal to 1, 2, 3, 4, 8 or 16.
 
-Specifies whether to use GDS or LDS memory (LDS is the default).
+Examples:
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    gds                                      Use GDS memory.
-    ======================================== ================================================
+.. code-block:: nasm
 
+  v255
+  v[0]
+  v[0:1]
+  v[1:1]
+  v[0:3]
+  v[2*2]
+  v[1-1:2-1]
+  [v252]
+  [v252,v253,v254,v255]
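+
+For illustration only, and derived from the conditions above rather than from assembler output,
+the following sequences would be invalid: the first has 5 registers, which is not a supported size;
+the second has *N* greater than *K*; the third has *K* greater than 255.
+
+.. code-block:: nasm
+
+  v[0:4]
+  v[3:2]
+  v[255:256]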
 
-EXP Modifiers
--------------
+.. _amdgpu_synid_s:
 
-.. _amdgpu_synid_done:
+s
+-
 
-done
-~~~~
+Scalar 32-bit registers. The number of available *scalar* registers depends on the GPU:
 
-Specifies if this is the last export from the shader to the target. By default, current
-instruction does not finish an export sequence.
+    ======= ============================
+    GPU     Number of *scalar* registers
+    ======= ============================
+    GFX7    104
+    GFX8    102
+    GFX9    102
+    ======= ============================
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    done                                     Indicates the last export operation.
-    ======================================== ================================================
+A sequence of *scalar* registers may be used to operate on more than 32 bits of data.
+The assembler currently supports sequences of 1, 2, 4, 8 and 16 *scalar* registers.
 
-.. _amdgpu_synid_compr:
+Pairs of *scalar* registers must be even-aligned (the first register must be even).
+Sequences of 4 or more *scalar* registers must be quad-aligned (the index of the first register must be a multiple of 4).
 
-compr
-~~~~~
+    ======================================================== ====================================================================
+    Syntax                                                   Description
+    ======================================================== ====================================================================
+    **s**\ <N>                                               A single 32-bit *scalar* register.
 
-Indicates if the data are compressed (not compressed by default).
+                                                             *N* must be a decimal integer number.
+    **s[**\ <N>\ **]**                                       A single 32-bit *scalar* register.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    compr                                    Data are compressed.
-    ======================================== ================================================
+                                                             *N* may be specified as an
+                                                             :ref:`integer number<amdgpu_synid_integer_number>`
+                                                             or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
+    **s[**\ <N>:<K>\ **]**                                   A sequence of (\ *K-N+1*\ ) *scalar* registers.
 
-.. _amdgpu_synid_vm:
+                                                             *N* and *K* may be specified as
+                                                             :ref:`integer numbers<amdgpu_synid_integer_number>`
+                                                             or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+    **[s**\ <N>, \ **s**\ <N+1>, ... **s**\ <K>\ **]**       A sequence of (\ *K-N+1*\ ) *scalar* registers.
 
-vm
-~~
+                                                             Register indices must be specified as decimal integer numbers.
+    ======================================================== ====================================================================
 
-Specifies valid mask flag state (off by default).
+Note. *N* and *K* must satisfy the following conditions:
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    vm                                       Set valid mask flag.
-    ======================================== ================================================
+* *N* must be properly aligned based on sequence size.
+* *N* <= *K*.
+* 0 <= *N* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
+* 0 <= *K* < *SMAX*\ , where *SMAX* is the number of available *scalar* registers.
+* *K-N+1* must be equal to 1, 2, 4, 8 or 16.
 
-FLAT Modifiers
---------------
+Examples:
 
-.. _amdgpu_synid_flat_offset12:
+.. code-block:: nasm
 
-flat_offset12
-~~~~~~~~~~~~~
+  s0
+  s[0]
+  s[0:1]
+  s[1:1]
+  s[0:3]
+  s[2*2]
+  s[1-1:2-1]
+  [s4]
+  [s4,s5,s6,s7]
 
-Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+Examples of *scalar* registers with an invalid alignment:
 
-Cannot be used with *global/scratch* opcodes. GFX9 only.
+.. code-block:: nasm
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    offset:{0..4095}                         Specifies a 12-bit unsigned offset.
-    ======================================== ================================================
+  s[1:2]
+  s[2:5]
 
-.. _amdgpu_synid_flat_offset13:
+.. _amdgpu_synid_trap:
 
-flat_offset13
-~~~~~~~~~~~~~
+trap
+----
 
-Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
+A set of trap handler registers:
 
-Can be used with *global/scratch* opcodes only. GFX9 only.
+* :ref:`ttmp<amdgpu_synid_ttmp>`
+* :ref:`tba<amdgpu_synid_tba>`
+* :ref:`tma<amdgpu_synid_tma>`
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    offset:{-4096..+4095}                    Specifies a 13-bit signed offset.
-    ======================================== ================================================
+.. _amdgpu_synid_ttmp:
 
-glc
-~~~
+ttmp
+----
 
-See a description :ref:`here<amdgpu_synid_glc>`.
+Trap handler temporary scalar registers, 32 bits wide.
+The number of available *ttmp* registers depends on the GPU:
 
-slc
-~~~
+    ======= ===========================
+    GPU     Number of *ttmp* registers
+    ======= ===========================
+    GFX7    12
+    GFX8    12
+    GFX9    16
+    ======= ===========================
 
-See a description :ref:`here<amdgpu_synid_slc>`.
+A sequence of *ttmp* registers may be used to operate on more than 32 bits of data.
+The assembler currently supports sequences of 1, 2, 4, 8 and 16 *ttmp* registers.
 
-tfe
-~~~
+Pairs of *ttmp* registers must be even-aligned (the first register must be even).
+Sequences of 4 or more *ttmp* registers must be quad-aligned (the index of the first register must be a multiple of 4).
 
-See a description :ref:`here<amdgpu_synid_tfe>`.
+    ============================================================= ====================================================================
+    Syntax                                                        Description
+    ============================================================= ====================================================================
+    **ttmp**\ <N>                                                 A single 32-bit *ttmp* register.
 
-nv
-~~
+                                                                  *N* must be a decimal integer number.
+    **ttmp[**\ <N>\ **]**                                         A single 32-bit *ttmp* register.
 
-See a description :ref:`here<amdgpu_synid_nv>`.
+                                                                  *N* may be specified as an
+                                                                  :ref:`integer number<amdgpu_synid_integer_number>`
+                                                                  or an :ref:`absolute expression<amdgpu_synid_absolute_expression>`.
+    **ttmp[**\ <N>:<K>\ **]**                                     A sequence of (\ *K-N+1*\ ) *ttmp* registers.
 
-MIMG Modifiers
---------------
+                                                                  *N* and *K* may be specified as
+                                                                  :ref:`integer numbers<amdgpu_synid_integer_number>`
+                                                                  or :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
+    **[ttmp**\ <N>, \ **ttmp**\ <N+1>, ... **ttmp**\ <K>\ **]**   A sequence of (\ *K-N+1*\ ) *ttmp* registers.
 
-.. _amdgpu_synid_dmask:
+                                                                  Register indices must be specified as decimal integer numbers.
+    ============================================================= ====================================================================
 
-dmask
-~~~~~
+Note. *N* and *K* must satisfy the following conditions:
 
-Specifies which channels (image components) are used by the operation. By default, no channels
-are used.
+* *N* must be properly aligned based on sequence size.
+* *N* <= *K*.
+* 0 <= *N* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
+* 0 <= *K* < *TMAX*, where *TMAX* is the number of available *ttmp* registers.
+* *K-N+1* must be equal to 1, 2, 4, 8 or 16.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    dmask:{0..15}                            Each bit corresponds to one of 4 image
-                                             components (RGBA). If the specified bit value
-                                             is 0, the component is not used, value 1 means
-                                             that the component is used.
-    ======================================== ================================================
+Examples:
 
-This modifier has some limitations depending on instruction kind:
+.. code-block:: nasm
 
-    ======================================== ================================================
-    Instruction Kind                         Valid dmask Values
-    ======================================== ================================================
-    32-bit atomic cmpswap                    0x3
-    other 32-bit atomic instructions         0x1
-    64-bit atomic cmpswap                    0xF
-    other 64-bit atomic instructions         0x3
-    GATHER4                                  0x1, 0x2, 0x4, 0x8
-    Other instructions                       any value
-    ======================================== ================================================
+  ttmp0
+  ttmp[0]
+  ttmp[0:1]
+  ttmp[1:1]
+  ttmp[0:3]
+  ttmp[2*2]
+  ttmp[1-1:2-1]
+  [ttmp4]
+  [ttmp4,ttmp5,ttmp6,ttmp7]
 
-.. _amdgpu_synid_unorm:
+Examples of *ttmp* registers with an invalid alignment:
 
-unorm
-~~~~~
+.. code-block:: nasm
 
-Specifies whether address is normalized or not (normalized by default).
+  ttmp[1:2]
+  ttmp[2:5]
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    unorm                                    Force address to be un-normalized.
-    ======================================== ================================================
+.. _amdgpu_synid_tba:
 
-glc
-~~~
+tba
+---
 
-See a description :ref:`here<amdgpu_synid_glc>`.
+Trap base address, 64 bits wide. Holds the pointer to the current trap handler program.
 
-slc
-~~~
+    ================== ======================================================================= =============
+    Syntax             Description                                                             Availability
+    ================== ======================================================================= =============
+    tba                64-bit *trap base address* register.                                    GFX7, GFX8
+    [tba]              64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
+    [tba_lo,tba_hi]    64-bit *trap base address* register (an alternative syntax).            GFX7, GFX8
+    ================== ======================================================================= =============
 
-See a description :ref:`here<amdgpu_synid_slc>`.
+High and low 32 bits of *trap base address* may be accessed as separate registers:
 
-.. _amdgpu_synid_r128:
+    ================== ======================================================================= =============
+    Syntax             Description                                                             Availability
+    ================== ======================================================================= =============
+    tba_lo             Low 32 bits of *trap base address* register.                            GFX7, GFX8
+    tba_hi             High 32 bits of *trap base address* register.                           GFX7, GFX8
+    [tba_lo]           Low 32 bits of *trap base address* register (an alternative syntax).    GFX7, GFX8
+    [tba_hi]           High 32 bits of *trap base address* register (an alternative syntax).   GFX7, GFX8
+    ================== ======================================================================= =============
 
-r128
-~~~~
+Note that *tba*, *tba_lo* and *tba_hi* are not accessible as assembler registers in GFX9,
+but *tba* can still be read and written using the *s_get_reg* and *s_set_reg* instructions.
 
-Specifies texture resource size. The default size is 256 bits.
+.. _amdgpu_synid_tma:
 
-GFX7 and GFX8 only.
+tma
+---
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    r128                                     Specifies 128 bits texture resource size.
-    ======================================== ================================================
+Trap memory address, 64 bits wide.
 
-tfe
-~~~
+    ================= ======================================================================= ==================
+    Syntax            Description                                                             Availability
+    ================= ======================================================================= ==================
+    tma               64-bit *trap memory address* register.                                  GFX7, GFX8
+    [tma]             64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
+    [tma_lo,tma_hi]   64-bit *trap memory address* register (an alternative syntax).          GFX7, GFX8
+    ================= ======================================================================= ==================
 
-See a description :ref:`here<amdgpu_synid_tfe>`.
+High and low 32 bits of *trap memory address* may be accessed as separate registers:
 
-.. _amdgpu_synid_lwe:
+    ================= ======================================================================= ==================
+    Syntax            Description                                                             Availability
+    ================= ======================================================================= ==================
+    tma_lo            Low 32 bits of *trap memory address* register.                          GFX7, GFX8
+    tma_hi            High 32 bits of *trap memory address* register.                         GFX7, GFX8
+    [tma_lo]          Low 32 bits of *trap memory address* register (an alternative syntax).  GFX7, GFX8
+    [tma_hi]          High 32 bits of *trap memory address* register (an alternative syntax). GFX7, GFX8
+    ================= ======================================================================= ==================
 
-lwe
-~~~
+Note that *tma*, *tma_lo* and *tma_hi* are not accessible as assembler registers in GFX9,
+but *tma* can still be read and written using the *s_get_reg* and *s_set_reg* instructions.
 
-Specifies LOD warning status (LOD warning is disabled by default).
+.. _amdgpu_synid_flat_scratch:
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    lwe                                      Enables LOD warning.
-    ======================================== ================================================
-
-.. _amdgpu_synid_da:
-
-da
-~~
-
-Specifies if an array index must be sent to TA. By default, array index is not sent.
-
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    da                                       Send an array-index to TA.
-    ======================================== ================================================
+flat_scratch
+------------
 
-.. _amdgpu_synid_d16:
+Flat scratch address, 64 bits wide. Holds the base address of scratch memory.
 
-d16
-~~~
+    ================================== ================================================================
+    Syntax                             Description
+    ================================== ================================================================
+    flat_scratch                       64-bit *flat scratch* address register.
+    [flat_scratch]                     64-bit *flat scratch* address register (an alternative syntax).
+    [flat_scratch_lo,flat_scratch_hi]  64-bit *flat scratch* address register (an alternative syntax).
+    ================================== ================================================================
+
+High and low 32 bits of *flat scratch* address may be accessed as separate registers:
+
+    ========================= =========================================================================
+    Syntax                    Description
+    ========================= =========================================================================
+    flat_scratch_lo           Low 32 bits of *flat scratch* address register.
+    flat_scratch_hi           High 32 bits of *flat scratch* address register.
+    [flat_scratch_lo]         Low 32 bits of *flat scratch* address register (an alternative syntax).
+    [flat_scratch_hi]         High 32 bits of *flat scratch* address register (an alternative syntax).
+    ========================= =========================================================================
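+
+Examples (shown for illustration; each form simply restates an entry from the tables above):
+
+.. code-block:: nasm
+
+  flat_scratch
+  [flat_scratch]
+  [flat_scratch_lo,flat_scratch_hi]
+  flat_scratch_lo
+  [flat_scratch_hi]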
+
+.. _amdgpu_synid_xnack:
+
+xnack
+-----
+
+Xnack mask, 64 bits wide. Holds a 64-bit mask indicating which threads
+received an *XNACK* due to a vector memory operation.
 
-Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
+.. WARNING:: GFX7 does not support the *xnack* feature. Not all GFX8 and GFX9 :ref:`processors<amdgpu-processors>` support the *xnack* feature.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    d16                                      Enables 16-bits data mode.
+\
 
-                                             On loads, convert data in memory to 16-bit
-                                             format before storing it in VGPRs.
+    ============================== =====================================================
+    Syntax                         Description
+    ============================== =====================================================
+    xnack_mask                     64-bit *xnack mask* register.
+    [xnack_mask]                   64-bit *xnack mask* register (an alternative syntax).
+    [xnack_mask_lo,xnack_mask_hi]  64-bit *xnack mask* register (an alternative syntax).
+    ============================== =====================================================
 
-                                             For stores, convert 16-bit data in VGPRs to
-                                             32 bits before going to memory.
+High and low 32 bits of *xnack mask* may be accessed as separate registers:
 
-                                             Note that 16-bit data are stored in VGPRs
-                                             unpacked in GFX8.0. In GFX8.1 and GFX9 16-bit
-                                             data are packed.
-    ======================================== ================================================
+    ===================== ==============================================================
+    Syntax                Description
+    ===================== ==============================================================
+    xnack_mask_lo         Low 32 bits of *xnack mask* register.
+    xnack_mask_hi         High 32 bits of *xnack mask* register.
+    [xnack_mask_lo]       Low 32 bits of *xnack mask* register (an alternative syntax).
+    [xnack_mask_hi]       High 32 bits of *xnack mask* register (an alternative syntax).
+    ===================== ==============================================================
 
-.. _amdgpu_synid_a16:
+.. _amdgpu_synid_vcc:
 
-a16
-~~~
+vcc
+---
 
-Specifies size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
+Vector condition code, 64 bits wide. A bit mask with one bit per thread;
+it holds the result of a vector compare operation.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    a16                                      Enables 16-bits image address components.
-    ======================================== ================================================
+    ================ =========================================================================
+    Syntax           Description
+    ================ =========================================================================
+    vcc              64-bit *vector condition code* register.
+    [vcc]            64-bit *vector condition code* register (an alternative syntax).
+    [vcc_lo,vcc_hi]  64-bit *vector condition code* register (an alternative syntax).
+    ================ =========================================================================
 
-Miscellaneous Modifiers
------------------------
+High and low 32 bits of *vector condition code* may be accessed as separate registers:
 
-.. _amdgpu_synid_glc:
+    ================ =========================================================================
+    Syntax           Description
+    ================ =========================================================================
+    vcc_lo           Low 32 bits of *vector condition code* register.
+    vcc_hi           High 32 bits of *vector condition code* register.
+    [vcc_lo]         Low 32 bits of *vector condition code* register (an alternative syntax).
+    [vcc_hi]         High 32 bits of *vector condition code* register (an alternative syntax).
+    ================ =========================================================================
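+
+A brief sketch of how these forms might be used. The instructions and
+register numbers below are chosen only for illustration:
+
+.. code-block:: nasm
+
+    v_cmp_eq_u32 vcc, v0, v1      // the compare result is written to vcc
+    s_mov_b64 s[0:1], vcc         // copy all 64 bits of vcc
+    s_mov_b32 s2, vcc_lo          // copy only the low 32 bits
+    s_mov_b32 s3, [vcc_hi]        // alternative syntax for the high 32 bits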
 
-glc
-~~~
+.. _amdgpu_synid_m0:
 
-This modifier has different meaning for loads, stores, and atomic operations.
-The default value is off (0).
+m0
+--
 
-See AMD documentation for details.
+A 32-bit memory register. It has various uses,
+including register indexing and bounds checking.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    glc                                      Set glc bit to 1.
-    ======================================== ================================================
+    =========== ===================================================
+    Syntax      Description
+    =========== ===================================================
+    m0          A 32-bit *memory* register.
+    [m0]        A 32-bit *memory* register (an alternative syntax).
+    =========== ===================================================
 
-.. _amdgpu_synid_slc:
+.. _amdgpu_synid_exec:
 
-slc
-~~~
+exec
+----
 
-Specifies cache policy. The default value is off (0).
+Execute mask, 64 bits wide. A bit mask with one bit per thread,
+which is applied to vector instructions and controls which threads execute
+and which ignore the instruction.
 
-See AMD documentation for details.
+    ===================== =================================================================
+    Syntax                Description
+    ===================== =================================================================
+    exec                  64-bit *execute mask* register.
+    [exec]                64-bit *execute mask* register (an alternative syntax).
+    [exec_lo,exec_hi]     64-bit *execute mask* register (an alternative syntax).
+    ===================== =================================================================
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    slc                                      Set slc bit to 1.
-    ======================================== ================================================
+High and low 32 bits of *execute mask* may be accessed as separate registers:
 
-.. _amdgpu_synid_tfe:
+    ===================== =================================================================
+    Syntax                Description
+    ===================== =================================================================
+    exec_lo               Low 32 bits of *execute mask* register.
+    exec_hi               High 32 bits of *execute mask* register.
+    [exec_lo]             Low 32 bits of *execute mask* register (an alternative syntax).
+    [exec_hi]             High 32 bits of *execute mask* register (an alternative syntax).
+    ===================== =================================================================
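+
+A minimal sketch of saving, narrowing and restoring the execute mask.
+The choice of instructions and scalar registers is an assumption made
+for illustration only:
+
+.. code-block:: nasm
+
+    s_mov_b64 s[0:1], exec                 // save the 64-bit execute mask
+    s_and_b64 exec, exec, vcc              // keep only threads whose vcc bit is set
+    s_mov_b32 s2, exec_lo                  // read the low 32 bits separately
+    s_mov_b64 [exec_lo,exec_hi], s[0:1]    // restore the mask (alternative syntax)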
 
-tfe
-~~~
+.. _amdgpu_synid_vccz:
 
-Controls access to partially resident textures. The default value is off (0).
+vccz
+----
 
-See AMD documentation for details.
+A single bit flag indicating that the :ref:`vcc<amdgpu_synid_vcc>` is all zeros.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    tfe                                      Set tfe bit to 1.
-    ======================================== ================================================
+.. WARNING:: This operand is not currently supported by AMDGPU assembler.
 
-.. _amdgpu_synid_nv:
+.. _amdgpu_synid_execz:
 
-nv
-~~
+execz
+-----
 
-Specifies if instruction is operating on non-volatile memory. By default, memory is volatile.
+A single bit flag indicating that the :ref:`exec<amdgpu_synid_exec>` is all zeros.
 
-GFX9 only.
+.. WARNING:: This operand is not currently supported by AMDGPU assembler.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    nv                                       Indicates that instruction operates on
-                                             non-volatile memory.
-    ======================================== ================================================
+.. _amdgpu_synid_scc:
 
-MUBUF/MTBUF Modifiers
----------------------
+scc
+---
 
-.. _amdgpu_synid_idxen:
+A single bit flag indicating the result of a scalar compare operation.
 
-idxen
-~~~~~
+.. WARNING:: This operand is not currently supported by AMDGPU assembler.
 
-Specifies whether address components include an index. By default, no components are used.
+.. _amdgpu_synid_ldsdirect:
 
-Can be used together with :ref:`offen<amdgpu_synid_offen>`.
+lds_direct
+----------
 
-Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+A special operand which supplies a 32-bit value
+fetched from *LDS* memory using :ref:`m0<amdgpu_synid_m0>` as an address.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    idxen                                    Address components include an index.
-    ======================================== ================================================
+.. WARNING:: This operand is not currently supported by AMDGPU assembler.
 
-.. _amdgpu_synid_offen:
+.. _amdgpu_synid_constant:
 
-offen
-~~~~~
+constant
+--------
 
-Specifies whether address components include an offset. By default, no components are used.
+A set of integer and floating-point *inline constants*:
 
-Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
+* :ref:`iconst<amdgpu_synid_iconst>`
+* :ref:`fconst<amdgpu_synid_fconst>`
 
-Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
+These operands are encoded as part of the instruction.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    offen                                    Address components include an offset.
-    ======================================== ================================================
+If a number may be encoded as either
+a :ref:`literal<amdgpu_synid_literal>` or
+an :ref:`inline constant<amdgpu_synid_constant>`,
+the assembler selects the latter encoding because it is more efficient.
 
-.. _amdgpu_synid_addr64:
+.. _amdgpu_synid_iconst:
 
-addr64
-~~~~~~
+iconst
+------
 
-Specifies whether a 64-bit address is used. By default, no address is used.
+An :ref:`integer number<amdgpu_synid_integer_number>`
+encoded as an *inline constant*.
 
-GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
-:ref:`idxen<amdgpu_synid_idxen>` modifiers.
+Only a small fraction of integer numbers may be encoded as *inline constants*.
+They are enumerated in the table below.
+Other integer numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    addr64                                   A 64-bit address is used.
-    ======================================== ================================================
+Integer *inline constants* are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
+as described :ref:`here<amdgpu_synid_int_const_conv>`.
 
-.. _amdgpu_synid_buf_offset12:
+    ================================== ====================================
+    Value                              Note
+    ================================== ====================================
+    {0..64}                            Positive integer inline constants.
+    {-16..-1}                          Negative integer inline constants.
+    ================================== ====================================
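+
+The ranges above can be illustrated with a short sketch. The instruction
+and registers are arbitrary; only the operand values matter:
+
+.. code-block:: nasm
+
+    v_mov_b32 v0, 64      // within {0..64}: encoded as an inline constant
+    v_mov_b32 v0, 65      // outside the range: encoded as a literal
+    v_mov_b32 v0, -16     // within {-16..-1}: encoded as an inline constant
+    v_mov_b32 v0, -17     // outside the range: encoded as a literal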
 
-buf_offset12
-~~~~~~~~~~~~
+.. WARNING:: GFX7 does not support inline constants for *f16* operands.
 
-Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
+There are also symbolic inline constants which provide read-only access to hardware registers.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    offset:{0..0xFFF}                        Specifies a 12-bit unsigned offset.
-    ======================================== ================================================
+.. WARNING:: These inline constants are not currently supported by AMDGPU assembler.
 
-glc
-~~~
+\
 
-See a description :ref:`here<amdgpu_synid_glc>`.
+    ======================== ================================================ =============
+    Syntax                   Note                                             Availability
+    ======================== ================================================ =============
+    shared_base              Base address of shared memory region.            GFX9
+    shared_limit             Address of the end of shared memory region.      GFX9
+    private_base             Base address of private memory region.           GFX9
+    private_limit            Address of the end of private memory region.     GFX9
+    pops_exiting_wave_id     A dedicated counter for POPS.                    GFX9
+    ======================== ================================================ =============
 
-slc
-~~~
+.. _amdgpu_synid_fconst:
 
-See a description :ref:`here<amdgpu_synid_slc>`.
+fconst
+------
 
-.. _amdgpu_synid_lds:
+A :ref:`floating-point number<amdgpu_synid_floating-point_number>`
+encoded as an *inline constant*.
 
-lds
-~~~
+Only a small fraction of floating-point numbers may be encoded as *inline constants*.
+They are enumerated in the table below.
+Other floating-point numbers have to be encoded as :ref:`literals<amdgpu_synid_literal>`.
 
-Specifies where to store the result: VGPRs or LDS (VGPRs by default).
+Floating-point *inline constants* are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>`
+as described :ref:`here<amdgpu_synid_fp_const_conv>`.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    lds                                      Store result in LDS.
-    ======================================== ================================================
+    ================================== ===================================================== ==================
+    Value                              Note                                                  Availability
+    ================================== ===================================================== ==================
+    0.0                                The same as integer constant 0.                       All GPUs
+    0.5                                Floating-point constant 0.5                           All GPUs
+    1.0                                Floating-point constant 1.0                           All GPUs
+    2.0                                Floating-point constant 2.0                           All GPUs
+    4.0                                Floating-point constant 4.0                           All GPUs
+    -0.5                               Floating-point constant -0.5                          All GPUs
+    -1.0                               Floating-point constant -1.0                          All GPUs
+    -2.0                               Floating-point constant -2.0                          All GPUs
+    -4.0                               Floating-point constant -4.0                          All GPUs
+    0.1592                             1.0/(2.0*pi). Use only for 16-bit operands.           GFX8, GFX9
+    0.15915494                         1.0/(2.0*pi). Use only for 16- and 32-bit operands.   GFX8, GFX9
+    0.159154943091895317852646485335   1.0/(2.0*pi).                                         GFX8, GFX9
+    ================================== ===================================================== ==================
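+
+As an illustration of the table above (the instruction and registers are
+arbitrary), values listed in the table are encoded inline and other values
+fall back to a literal:
+
+.. code-block:: nasm
+
+    v_add_f32 v0, 1.0, v1     // 1.0 is listed above: inline constant
+    v_add_f32 v0, -4.0, v1    // -4.0 is listed above: inline constant
+    v_add_f32 v0, 1.5, v1     // 1.5 is not listed: encoded as a literal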
 
-tfe
-~~~
+.. WARNING:: GFX7 does not support inline constants for *f16* operands.
 
-See a description :ref:`here<amdgpu_synid_tfe>`.
+.. _amdgpu_synid_literal:
 
-.. _amdgpu_synid_dfmt:
+literal
+-------
 
-dfmt
-~~~~
+A literal is a 64-bit value which is encoded as a separate 32-bit dword in the instruction stream.
 
-TBD
+If a number may be encoded as either
+a :ref:`literal<amdgpu_synid_literal>` or
+an :ref:`inline constant<amdgpu_synid_constant>`,
+the assembler selects the latter encoding because it is more efficient.
 
-.. _amdgpu_synid_nfmt:
+Literals may be specified as :ref:`integer numbers<amdgpu_synid_integer_number>`,
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>` or
+:ref:`expressions<amdgpu_synid_expression>`
+(expressions are currently supported for 32-bit operands only).
 
-nfmt
-~~~~
+A 64-bit literal value is converted by assembler
+to an :ref:`expected operand type<amdgpu_syn_instruction_type>`
+as described :ref:`here<amdgpu_synid_lit_conv>`.
 
-TBD
+An instruction may use only one literal, but several operands may refer to the same literal.
 
-SMRD/SMEM Modifiers
--------------------
+.. _amdgpu_synid_uimm8:
 
-glc
-~~~
+uimm8
+-----
 
-See a description :ref:`here<amdgpu_synid_glc>`.
+An 8-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
+The value is encoded as part of the opcode so it is free to use.
 
-nv
-~~
+.. _amdgpu_synid_uimm32:
 
-See a description :ref:`here<amdgpu_synid_nv>`.
+uimm32
+------
 
-VINTRP Modifiers
-----------------
+A 32-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
+The value is stored as a separate 32-bit dword in the instruction stream.
 
-.. _amdgpu_synid_high:
+.. _amdgpu_synid_uimm20:
 
-high
-~~~~
+uimm20
+------
 
-Specifies which half of the LDS word to use. Low half of LDS word is used by default.
-GFX9 only.
+A 20-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    high                                     Use high half of LDS word.
-    ======================================== ================================================
+.. _amdgpu_synid_uimm21:
 
-VOP1/VOP2 DPP Modifiers
------------------------
+uimm21
+------
 
-GFX8 and GFX9 only.
+A 21-bit positive :ref:`integer number<amdgpu_synid_integer_number>`.
 
-.. _amdgpu_synid_dpp_ctrl:
+.. WARNING:: Assembler currently supports 20-bit offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
 
-dpp_ctrl
-~~~~~~~~
+.. _amdgpu_synid_simm21:
 
-Specifies how data are shared between threads. This is a mandatory modifier.
-There is no default value.
+simm21
+------
 
-Note. The lanes of a wavefront are organized in four banks and four rows.
+A 21-bit :ref:`integer number<amdgpu_synid_integer_number>`.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
-    row_mirror                               Mirror threads within row.
-    row_half_mirror                          Mirror threads within 1/2 row (8 threads).
-    row_bcast:15                             Broadcast 15th thread of each row to next row.
-    row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
-    wave_shl:1                               Wavefront left shift by 1 thread.
-    wave_rol:1                               Wavefront left rotate by 1 thread.
-    wave_shr:1                               Wavefront right shift by 1 thread.
-    wave_ror:1                               Wavefront right rotate by 1 thread.
-    row_shl:{1..15}                          Row shift left by 1-15 threads.
-    row_shr:{1..15}                          Row shift right by 1-15 threads.
-    row_ror:{1..15}                          Row rotate right by 1-15 threads.
-    ======================================== ================================================
+.. WARNING:: Assembler currently supports 20-bit unsigned offsets only. Use :ref:`uimm20<amdgpu_synid_uimm20>` as a replacement.
 
-.. _amdgpu_synid_row_mask:
+.. _amdgpu_synid_off:
 
-row_mask
-~~~~~~~~
+off
+---
 
-Controls which rows are enabled for data sharing. By default, all rows are enabled.
+A special entity which indicates that the value of this operand is not used.
 
-Note. The lanes of a wavefront are organized in four banks and four rows.
+    ================================== ===================================================
+    Syntax                             Description
+    ================================== ===================================================
+    off                                Indicates an unused operand.
+    ================================== ===================================================
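+
+A hedged sketch of *off* in context. It assumes memory instructions that
+accept *off* in place of an address component; the registers are arbitrary:
+
+.. code-block:: nasm
+
+    buffer_load_dword v1, off, s[4:7], s1    // no vector address component is used
+    global_load_dword v1, v[3:4], off        // GFX9: no scalar base address is used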
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    row_mask:{0..15}                         Each of 4 bits in the mask controls one
-                                             row (0 - disabled, 1 - enabled).
-    ======================================== ================================================
 
-.. _amdgpu_synid_bank_mask:
+.. _amdgpu_synid_number:
 
-bank_mask
-~~~~~~~~~
+Numbers
+=======
 
-Controls which banks are enabled for data sharing. By default, all banks are enabled.
+.. _amdgpu_synid_integer_number:
 
-Note. The lanes of a wavefront are organized in four banks and four rows.
+Integer Numbers
+---------------
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    bank_mask:{0..15}                        Each of 4 bits in the mask controls one
-                                             bank (0 - disabled, 1 - enabled).
-    ======================================== ================================================
+Integer numbers are 64 bits wide.
+They may be specified in binary, octal, hexadecimal and decimal formats:
 
-.. _amdgpu_synid_bound_ctrl:
+    ============== ====================================
+    Format         Syntax
+    ============== ====================================
+    Decimal        [-]?[1-9][0-9]*
+    Binary         [-]?0b[01]+
+    Octal          [-]?0[0-7]+
+    Hexadecimal    [-]?0x[0-9a-fA-F]+
+    \              [-]?[0x]?[0-9][0-9a-fA-F]*[hH]
+    ============== ====================================
 
-bound_ctrl
-~~~~~~~~~~
+Examples:
 
-Controls data sharing when accessing an invalid lane. By default, data sharing with
-invalid lanes is disabled.
+.. code-block:: nasm
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    bound_ctrl:0                             Enables data sharing with invalid lanes.
-                                             Accessing data from an invalid lane will
-                                             return zero.
-    ======================================== ================================================
+  -1234
+  0b1010
+  010
+  0xff
+  0ffh
 
-VOP1/VOP2/VOPC SDWA Modifiers
------------------------------
+.. _amdgpu_synid_floating-point_number:
 
-GFX8 and GFX9 only.
+Floating-Point Numbers
+----------------------
 
-clamp
-~~~~~
+All floating-point numbers are handled as double (64 bits wide).
 
-See a description :ref:`here<amdgpu_synid_clamp>`.
+Floating-point numbers may be specified in hexadecimal and decimal formats:
 
-omod
-~~~~
+    ============== ======================================================== ========================================================
+    Format         Syntax                                                   Note
+    ============== ======================================================== ========================================================
+    Decimal        [-]?[0-9]*[.][0-9]*([eE][+-]?[0-9]*)?                    Must include either a decimal separator or an exponent.
+    Hexadecimal    [-]?0x[0-9a-fA-F]*(.[0-9a-fA-F]*)?[pP][+-]?[0-9a-fA-F]+
+    ============== ======================================================== ========================================================
 
-See a description :ref:`here<amdgpu_synid_omod>`.
+Examples:
 
-GFX9 only.
+.. code-block:: nasm
 
-.. _amdgpu_synid_dst_sel:
+ -1.234
+ 234e2
+ -0x1afp-10
+ 0x.1afp10
 
-dst_sel
-~~~~~~~
+.. _amdgpu_synid_expression:
 
-Selects which bits in the destination are affected. By default, all bits are affected.
+Expressions
+===========
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    dst_sel:DWORD                            Use bits 31:0.
-    dst_sel:BYTE_0                           Use bits 7:0.
-    dst_sel:BYTE_1                           Use bits 15:8.
-    dst_sel:BYTE_2                           Use bits 23:16.
-    dst_sel:BYTE_3                           Use bits 31:24.
-    dst_sel:WORD_0                           Use bits 15:0.
-    dst_sel:WORD_1                           Use bits 31:16.
-    ======================================== ================================================
+An expression specifies an address or a numeric value.
+There are two kinds of expressions:
 
+* :ref:`Absolute<amdgpu_synid_absolute_expression>`.
+* :ref:`Relocatable<amdgpu_synid_relocatable_expression>`.
 
-.. _amdgpu_synid_dst_unused:
+.. _amdgpu_synid_absolute_expression:
 
-dst_unused
-~~~~~~~~~~
+Absolute Expressions
+--------------------
 
-Controls what to do with the bits in the destination which are not selected
-by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
-By default, unused bits are preserved.
+The value of an absolute expression remains the same after program relocation.
+Absolute expressions must not include unassigned and relocatable values
+such as labels.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    dst_unused:UNUSED_PAD                    Pad with zeros.
-    dst_unused:UNUSED_SEXT                   Sign-extend upper bits, zero lower bits.
-    dst_unused:UNUSED_PRESERVE               Preserve bits.
-    ======================================== ================================================
+Examples:
 
-.. _amdgpu_synid_src0_sel:
+.. code-block:: nasm
 
-src0_sel
-~~~~~~~~
+    x = -1
+    y = x + 10
 
-Controls which bits in the src0 are used. By default, all bits are used.
+.. _amdgpu_synid_relocatable_expression:
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    src0_sel:DWORD                           Use bits 31:0.
-    src0_sel:BYTE_0                          Use bits 7:0.
-    src0_sel:BYTE_1                          Use bits 15:8.
-    src0_sel:BYTE_2                          Use bits 23:16.
-    src0_sel:BYTE_3                          Use bits 31:24.
-    src0_sel:WORD_0                          Use bits 15:0.
-    src0_sel:WORD_1                          Use bits 31:16.
-    ======================================== ================================================
+Relocatable Expressions
+-----------------------
 
-.. _amdgpu_synid_src1_sel:
+The value of a relocatable expression depends on program relocation.
 
-src1_sel
-~~~~~~~~
+Note that use of relocatable expressions is limited to branch targets
+and 32-bit :ref:`literals<amdgpu_synid_literal>`.
 
-Controls which bits in the src1 are used. By default, all bits are used.
+Additional information about relocation may be found :ref:`here<amdgpu-relocation-records>`.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    src1_sel:DWORD                           Use bits 31:0.
-    src1_sel:BYTE_0                          Use bits 7:0.
-    src1_sel:BYTE_1                          Use bits 15:8.
-    src1_sel:BYTE_2                          Use bits 23:16.
-    src1_sel:BYTE_3                          Use bits 31:24.
-    src1_sel:WORD_0                          Use bits 15:0.
-    src1_sel:WORD_1                          Use bits 31:16.
-    ======================================== ================================================
+Examples:
 
-VOP1/VOP2/VOPC SDWA Operand Modifiers
--------------------------------------
+.. code-block:: nasm
 
-Operand modifiers are not used separately. They are applied to source operands.
+    y = x + 10 // x is not yet defined. Undefined symbols are assumed to be PC-relative.
+    z = .
 
-GFX8 and GFX9 only.
+Expression Data Type
+--------------------
 
-abs
-~~~
+Expressions and operands of expressions are interpreted as 64-bit integers.
 
-See a description :ref:`here<amdgpu_synid_abs>`.
+Expressions may include 64-bit :ref:`floating-point numbers<amdgpu_synid_floating-point_number>` (double).
+However, these operands are also handled as 64-bit integers
+using binary representation of specified floating-point numbers.
+No conversion from floating-point to integer is performed.
 
-neg
-~~~
+Examples:
 
-See a description :ref:`here<amdgpu_synid_neg>`.
+.. code-block:: nasm
 
-.. _amdgpu_synid_sext:
+    x = 0.1    // x is assigned an integer 4591870180066957722 which is a binary representation of 0.1.
+    y = x + x  // y is a sum of two integer values; it is not equal to 0.2!
 
-sext
-~~~~
+Syntax
+------
 
-Sign-extends value of a (sub-dword) operand to fill all 32 bits.
-Has no effect for 32-bit operands.
+Expressions are composed of
+:ref:`symbols<amdgpu_synid_symbol>`,
+:ref:`integer numbers<amdgpu_synid_integer_number>`,
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`,
+:ref:`binary operators<amdgpu_synid_expression_bin_op>`,
+:ref:`unary operators<amdgpu_synid_expression_un_op>` and subexpressions.
 
-Valid for integer operands only.
+Expressions may also use "." which is a reference to the current PC (program counter).
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    sext(<operand>)                          Sign-extend operand value.
-    ======================================== ================================================
+The syntax of expressions is shown below::
 
-VOP3 Modifiers
---------------
+    expr ::= expr binop expr | primaryexpr ;
 
-.. _amdgpu_synid_vop3_op_sel:
+    primaryexpr ::= '(' expr ')' | symbol | number | '.' | unop primaryexpr ;
 
-vop3_op_sel
-~~~~~~~~~~~
+    binop ::= '&&'
+            | '||'
+            | '|'
+            | '^'
+            | '&'
+            | '!'
+            | '=='
+            | '!='
+            | '<>'
+            | '<'
+            | '<='
+            | '>'
+            | '>='
+            | '<<'
+            | '>>'
+            | '+'
+            | '-'
+            | '*'
+            | '/'
+            | '%' ;
 
-Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
-By default, low bits are used for all operands.
+    unop ::= '~'
+           | '+'
+           | '-'
+           | '!' ;
 
-The number of values specified with the op_sel modifier must match the number of instruction
-operands (both source and destination). First value controls src0, second value controls src1
-and so on, except that the last value controls destination.
-The value 0 selects the low bits, while 1 selects the high bits.
+.. _amdgpu_synid_expression_bin_op:
 
-Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
-by op_sel must be 0.
+Binary Operators
+----------------
 
-GFX9 only.
+Binary operators are described in the following table.
+They operate on and produce 64-bit integers.
+Operators with higher priority are performed first.
+
+    ========== ========= ===============================================
+    Operator   Priority  Meaning
+    ========== ========= ===============================================
+       \*         5      Integer multiplication.
+       /          5      Integer division.
+       %          5      Integer signed remainder.
+       \+         4      Integer addition.
+       \-         4      Integer subtraction.
+       <<         3      Integer shift left.
+       >>         3      Logical shift right.
+       ==         2      Equality comparison.
+       !=         2      Inequality comparison.
+       <>         2      Inequality comparison.
+       <          2      Signed less than comparison.
+       <=         2      Signed less than or equal comparison.
+       >          2      Signed greater than comparison.
+       >=         2      Signed greater than or equal comparison.
+      \|          1      Bitwise or.
+       ^          1      Bitwise xor.
+       &          1      Bitwise and.
+       &&         0      Logical and.
+       ||         0      Logical or.
+    ========== ========= ===============================================
+
+.. _amdgpu_synid_expression_un_op:
+
+Unary Operators
+---------------
 
-    ======================================== ============================================================
-    Syntax                                   Description
-    ======================================== ============================================================
-    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
-    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
-    op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
-    ======================================== ============================================================
+Unary operators are described in the following table.
+They operate on and produce 64-bit integers.
 
-.. _amdgpu_synid_clamp:
+    ========== ===============================================
+    Operator   Meaning
+    ========== ===============================================
+       !       Logical negation.
+       ~       Bitwise negation.
+       \+      Integer unary plus.
+       \-      Integer unary minus.
+    ========== ===============================================
 
-clamp
-~~~~~
+.. _amdgpu_synid_symbol:
 
-Clamp meaning depends on instruction.
+Symbols
+-------
 
-For *v_cmp* instructions, clamp modifier indicates that the compare signals
-if a floating point exception occurs. By default, signaling is disabled.
-Not supported by GFX7.
+A symbol is a named 64-bit value, representing a relocatable
+address or an absolute (non-relocatable) number.
 
-For integer operations, clamp modifier indicates that the result must be clamped
-to the largest and smallest representable value. By default, there is no clamping.
-Integer clamping is not supported by GFX7.
+Symbol names have the following syntax:
+    ``[a-zA-Z_.][a-zA-Z0-9_$.@]*``
 
-For floating point operations, clamp modifier indicates that the result must be clamped
-to the range [0.0, 1.0]. By default, there is no clamping.
+The table below provides several examples of syntax used for symbol definition.
 
-Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
+    ================ ==========================================================
+    Syntax           Meaning
+    ================ ==========================================================
+    .globl <S>       Declares a global symbol S without assigning it a value.
+    .set <S>, <E>    Assigns the value of an expression E to a symbol S.
+    <S> = <E>        Assigns the value of an expression E to a symbol S.
+    <S>:             Declares a label S and assigns it the current PC value.
+    ================ ==========================================================
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    clamp                                    Enables clamping (or signaling).
-    ======================================== ================================================
+A symbol may be used before it is declared or assigned;
+unassigned symbols are assumed to be PC-relative.
 
-.. _amdgpu_synid_omod:
+Additional information about symbols may be found :ref:`here<amdgpu-symbols>`.
 
-omod
-~~~~
+.. _amdgpu_synid_conv:
 
-Specifies if an output modifier must be applied to the result.
-By default, no output modifiers are applied.
+Conversions
+===========
 
-Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
+This section describes what happens when a 64-bit
+:ref:`integer number<amdgpu_synid_integer_number>`, a
+:ref:`floating-point number<amdgpu_synid_floating-point_number>` or a
+:ref:`symbol<amdgpu_synid_symbol>`
+is used for an operand which has a different type or size.
 
-Output modifiers are valid for f32 and f64 floating point results only.
-They must not be used with f16.
+Depending on the operand kind, this conversion is performed by either the assembler or AMDGPU H/W:
 
-Note. *v_cvt_f16_f32* is an exception. This instruction produces f16 result
-but accepts output modifiers.
+* Values encoded as :ref:`inline constants<amdgpu_synid_constant>` are handled by H/W.
+* Values encoded as :ref:`literals<amdgpu_synid_literal>` are converted by the assembler.
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    mul:2                                    Multiply the result by 2.
-    mul:4                                    Multiply the result by 4.
-    div:2                                    Multiply the result by 0.5.
-    ======================================== ================================================
+.. _amdgpu_synid_const_conv:
 
-VOP3 Operand Modifiers
-----------------------
+Inline Constants
+----------------
 
-Operand modifiers are not used separately. They are applied to source operands.
+.. _amdgpu_synid_int_const_conv:
 
-.. _amdgpu_synid_abs:
+Integer Inline Constants
+~~~~~~~~~~~~~~~~~~~~~~~~
 
-abs
-~~~
+Integer :ref:`inline constants<amdgpu_synid_constant>`
+may be thought of as 64-bit
+:ref:`integer numbers<amdgpu_synid_integer_number>`;
+when used as operands they are truncated to the size of
+:ref:`expected operand type<amdgpu_syn_instruction_type>`.
+No data type conversions are performed.
 
-Computes absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
-Valid for floating point operands only.
+Examples:
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    abs(<operand>)                           Get absolute value of operand.
-    \|<operand>|                             The same as above.
-    ======================================== ================================================
+.. code-block:: nasm
 
-.. _amdgpu_synid_neg:
+    // GFX9
 
-neg
-~~~
+    v_add_u16 v0, -1, 0    // v0 = 0xFFFF
+    v_add_f16 v0, -1, 0    // v0 = 0xFFFF (NaN)
 
-Computes negative value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
-Valid for floating point operands only.
+    v_add_u32 v0, -1, 0    // v0 = 0xFFFFFFFF
+    v_add_f32 v0, -1, 0    // v0 = 0xFFFFFFFF (NaN)
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    neg(<operand>)                           Get negative value of operand.
-    -<operand>                               The same as above.
-    ======================================== ================================================
+.. _amdgpu_synid_fp_const_conv:
 
-VOP3P Modifiers
----------------
+Floating-Point Inline Constants
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-This section describes modifiers of regular VOP3P instructions.
-*v_mad_mix* modifiers are described :ref:`in a separate section<amdgpu_synid_mad_mix>`.
+Floating-point :ref:`inline constants<amdgpu_synid_constant>`
+may be thought of as 64-bit
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`;
+when used as operands they are converted to a floating-point number of
+:ref:`expected operand size<amdgpu_syn_instruction_type>`.
 
-GFX9 only.
+Examples:
 
-.. _amdgpu_synid_op_sel:
+.. code-block:: nasm
 
-op_sel
-~~~~~~
+    // GFX9
 
-Selects the low [15:0] or high [31:16] operand bits as input to the operation
-which results in the lower-half of the destination.
-By default, low bits are used for all operands.
+    v_add_f16 v0, 1.0, 0    // v0 = 0x3C00 (1.0)
+    v_add_u16 v0, 1.0, 0    // v0 = 0x3C00
 
-The number of values specified with the op_sel modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
-The value 0 selects the low bits, while 1 selects the high bits.
+    v_add_f32 v0, 1.0, 0    // v0 = 0x3F800000 (1.0)
+    v_add_u32 v0, 1.0, 0    // v0 = 0x3F800000
 
-    ======================================== =============================================================
-    Syntax                                   Description
-    ======================================== =============================================================
-    op_sel:[{0..1}]                          Select operand bits for instructions with 1 source operand.
-    op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 2 source operands.
-    op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 3 source operands.
-    ======================================== =============================================================
 
-.. _amdgpu_synid_op_sel_hi:
+.. _amdgpu_synid_lit_conv:
 
-op_sel_hi
-~~~~~~~~~
+Literals
+--------
 
-Selects the low [15:0] or high [31:16] operand bits as input to the operation
-which results in the upper-half of the destination.
-By default, high bits are used for all operands.
+.. _amdgpu_synid_int_lit_conv:
 
-The number of values specified with the op_sel_hi modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
-The value 0 selects the low bits, while 1 selects the high bits.
+Integer Literals
+~~~~~~~~~~~~~~~~
 
-    ======================================== =============================================================
-    Syntax                                   Description
-    ======================================== =============================================================
-    op_sel_hi:[{0..1}]                       Select operand bits for instructions with 1 source operand.
-    op_sel_hi:[{0..1},{0..1}]                Select operand bits for instructions with 2 source operands.
-    op_sel_hi:[{0..1},{0..1},{0..1}]         Select operand bits for instructions with 3 source operands.
-    ======================================== =============================================================
+Integer :ref:`literals<amdgpu_synid_literal>`
+are specified as 64-bit :ref:`integer numbers<amdgpu_synid_integer_number>`.
 
-.. _amdgpu_synid_neg_lo:
+When used as operands they are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
 
-neg_lo
-~~~~~~
+    ============== ============== =============== ====================================================================
+    Expected type  Condition      Result          Note
+    ============== ============== =============== ====================================================================
+    i16, u16, b16  cond(num, 16)  num.u16         Truncate to 16 bits.
+    i32, u32, b32  cond(num, 32)  num.u32         Truncate to 32 bits.
+    i64            cond(num, 32)  {-1, num.i32}   Truncate to 32 bits and then sign-extend the result to 64 bits.
+    u64, b64       cond(num, 32)  { 0, num.u32}   Truncate to 32 bits and then zero-extend the result to 64 bits.
+    f16            cond(num, 16)  num.u16         Use low 16 bits as an f16 value.
+    f32            cond(num, 32)  num.u32         Use low 32 bits as an f32 value.
+    f64            cond(num, 32)  {num.u32, 0}    Use low 32 bits of the number as high 32 bits
+                                                  of the result; low 32 bits of the result are zeroed.
+    ============== ============== =============== ====================================================================
 
-Specifies whether to change sign of operand values selected by
-:ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
-as input to the operation which results in the upper-half of the destination.
+The condition *cond(X,S)* indicates if a 64-bit number *X*
+can be converted to a smaller size *S* by truncation of upper bits.
+There are two cases when the conversion is possible:
 
-The number of values specified with this modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
+* The truncated bits are all 0.
+* The truncated bits are all 1 and the value after truncation has its MSB set.
 
-The value 0 indicates that the corresponding operand value is used unmodified,
-the value 1 indicates that negative value of the operand must be used.
+Examples of valid literals:
 
-By default, operand values are used unmodified.
+.. code-block:: nasm
 
-This modifier is valid for floating point operands only.
+    // GFX9
 
-    ======================================== ==================================================================
-    Syntax                                   Description
-    ======================================== ==================================================================
-    neg_lo:[{0..1}]                          Select affected operands for instructions with 1 source operand.
-    neg_lo:[{0..1},{0..1}]                   Select affected operands for instructions with 2 source operands.
-    neg_lo:[{0..1},{0..1},{0..1}]            Select affected operands for instructions with 3 source operands.
-    ======================================== ==================================================================
+    v_add_u16 v0, 0xff00, v0                     // value after conversion: 0xff00
+    v_add_u16 v0, 0xffffffffffffff00, v0         // value after conversion: 0xff00
+    v_add_u16 v0, -256, v0                       // value after conversion: 0xff00
 
-.. _amdgpu_synid_neg_hi:
+    s_bfe_i64 s[0:1], 0xffefffff, s3             // value after conversion: 0xffffffffffefffff
+    s_bfe_u64 s[0:1], 0xffefffff, s3             // value after conversion: 0x00000000ffefffff
+    v_ceil_f64_e32 v[0:1], 0xffefffff            // value after conversion: 0xffefffff00000000 (-1.7976922776554302e308)
 
-neg_hi
-~~~~~~
+Examples of invalid literals:
 
-Specifies whether to change sign of operand values selected by
-:ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
-as input to the operation which results in the upper-half of the destination.
+.. code-block:: nasm
 
-The number of values specified with this modifier must match the number of source
-operands. First value controls src0, second value controls src1 and so on.
+    // GFX9
 
-The value 0 indicates that the corresponding operand value is used unmodified,
-the value 1 indicates that negative value of the operand must be used.
+    v_add_u16 v0, 0x1ff00, v0               // conversion is not possible as truncated bits are neither all 0 nor all 1
+    v_add_u16 v0, 0xffffffffffff00ff, v0    // conversion is not possible as truncated bits do not match MSB of the result
 
-By default, operand values are used unmodified.
+.. _amdgpu_synid_fp_lit_conv:
 
-This modifier is valid for floating point operands only.
+Floating-Point Literals
+~~~~~~~~~~~~~~~~~~~~~~~
 
-    ======================================== ==================================================================
-    Syntax                                   Description
-    ======================================== ==================================================================
-    neg_hi:[{0..1}]                          Select affected operands for instructions with 1 source operand.
-    neg_hi:[{0..1},{0..1}]                   Select affected operands for instructions with 2 source operands.
-    neg_hi:[{0..1},{0..1},{0..1}]            Select affected operands for instructions with 3 source operands.
-    ======================================== ==================================================================
+Floating-point :ref:`literals<amdgpu_synid_literal>` are specified as 64-bit
+:ref:`floating-point numbers<amdgpu_synid_floating-point_number>`.
 
-clamp
-~~~~~
+When used as operands they are converted to
+:ref:`expected operand type<amdgpu_syn_instruction_type>` as described below.
 
-See a description :ref:`here<amdgpu_synid_clamp>`.
+    ============== ============== ================= =================================================================
+    Expected type  Condition      Result            Note
+    ============== ============== ================= =================================================================
+    i16, u16, b16  cond(num, 16)  f16(num)          Convert to f16 and use bits of the result as an integer value.
+    i32, u32, b32  cond(num, 32)  f32(num)          Convert to f32 and use bits of the result as an integer value.
+    i64, u64, b64  false          \-                Conversion disabled because of unclear semantics.
+    f16            cond(num, 16)  f16(num)          Convert to f16.
+    f32            cond(num, 32)  f32(num)          Convert to f32.
+    f64            true           {num.u32.hi, 0}   Use high 32 bits of the number as high 32 bits of the result;
+                                                    zero-fill low 32 bits of the result.
 
-.. _amdgpu_synid_mad_mix:
+                                                    Note that the result may differ from the original number.
+    ============== ============== ================= =================================================================
 
-VOP3P V_MAD_MIX Modifiers
--------------------------
+The condition *cond(X,S)* indicates if an f64 number *X* can be converted
+to a smaller *S*-bit floating-point type without overflow or underflow.
+Loss of precision is allowed.
 
-These instructions use VOP3P format but have different modifiers.
+Examples of valid literals:
 
-GFX9 only.
+.. code-block:: nasm
 
-.. _amdgpu_synid_mad_mix_op_sel:
+    // GFX9
 
-mad_mix_op_sel
-~~~~~~~~~~~~~~
+    v_add_f16 v1, 65500.0, v2
+    v_add_f32 v1, 65600.0, v2
 
-This operand has meaning only for 16-bit source operands as indicated by
-:ref:`mad_mix_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
-It specifies to select either the low [15:0] or high [31:16] operand bits
-as input to the operation.
+                                                 // value before conversion: 0x7fefffffffffffff (1.7976931348623157e308)
+    v_ceil_f64 v[0:1], 1.7976931348623157e308    // value after conversion:  0x7fefffff00000000 (1.7976922776554302e308)
 
-The value 0 indicates the low bits, the value 1 indicates the high 16 bits.
-By default, low bits are used for all operands.
+Examples of invalid literals:
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    op_sel:[{0..1},{0..1},{0..1}]            Select location of each 16-bit source operand.
-    ======================================== ================================================
+.. code-block:: nasm
 
-.. _amdgpu_synid_mad_mix_op_sel_hi:
+    // GFX9
 
-mad_mix_op_sel_hi
-~~~~~~~~~~~~~~~~~
+    v_add_f16 v1, 65600.0, v2                    // cannot be converted to f16 because of overflow
 
-Selects the size of source operands: either 32 bits or 16 bits.
-By default, 32 bits are used for all source operands.
+.. _amdgpu_synid_exp_conv:
 
-The value 0 indicates 32 bits, the value 1 indicates 16 bits.
-The location of 16 bits in the operand may be specified by
-:ref:`mad_mix_op_sel<amdgpu_synid_mad_mix_op_sel>`.
+Expressions
+~~~~~~~~~~~
 
-    ======================================== ================================================
-    Syntax                                   Description
-    ======================================== ================================================
-    op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
-    ======================================== ================================================
+Expressions operate on and produce 64-bit integers.
 
-abs
-~~~
+When used as operands they are truncated to
+:ref:`expected operand size<amdgpu_syn_instruction_type>`.
+No data type conversions are performed.
 
-See a description :ref:`here<amdgpu_synid_abs>`.
+Examples:
 
-neg
-~~~
+.. code-block:: nasm
 
-See a description :ref:`here<amdgpu_synid_neg>`.
+    // GFX9
 
-clamp
-~~~~~
+    x = 0.1
+    v_sqrt_f32 v0, x           // v0 = [low 32 bits of 0.1 (double)]
+    v_sqrt_f32 v0, (0.1 + 0)   // the same as above
+    v_sqrt_f32 v0, 0.1         // v0 = [0.1 (double) converted to float]
 
-See a description :ref:`here<amdgpu_synid_clamp>`.