SPIR-V Programming Guide¶
SPIR-V is an open, royalty-free, standard intermediate language capable of representing parallel compute kernels. SPIR-V is adaptable to multiple execution environments: a SPIR-V module is consumed by an execution environment, as specified by a client API. This document describes the SPIR-V execution environment for the ‘oneAPI’ Level-Zero API. The SPIR-V execution environment describes required support for some SPIR-V capabilities, additional semantics for some SPIR-V instructions, and additional validation rules that a SPIR-V binary module must adhere to in order to be considered valid.
This document is written for compiler developers who are generating SPIR-V modules intended to be consumed by the ‘oneAPI’ Level-Zero API, for implementors of the ‘oneAPI’ Level-Zero API, and for software developers who are using SPIR-V modules with the ‘oneAPI’ Level-Zero API.
This section describes common properties of all ‘oneAPI’ Level-Zero environments that consume SPIR-V modules.
A SPIR-V module is interpreted as a series of 32-bit words in host endianness, with literal strings packed as described in the SPIR-V specification. The first few words of the SPIR-V module must be a magic number and a SPIR-V version number, as described in the SPIR-V specification.
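As an illustration of the layout described above, the following minimal sketch decodes the five-word header of a module held in memory. The magic number and the version word layout (0x00MMmm00) come from the SPIR-V specification; the function name `read_spirv_header` and the returned dictionary keys are hypothetical and chosen only for this example.

```python
import struct

SPIRV_MAGIC = 0x07230203  # magic number defined by the SPIR-V specification


def read_spirv_header(blob: bytes) -> dict:
    """Decode the five-word SPIR-V header, reading words in host byte order."""
    if len(blob) < 20 or len(blob) % 4 != 0:
        raise ValueError("expected a whole number of 32-bit words, at least five")
    # "=" selects native byte order with standard sizes (4-byte unsigned ints).
    magic, version, generator, bound, schema = struct.unpack_from("=5I", blob, 0)
    if magic != SPIRV_MAGIC:
        raise ValueError(f"bad magic number 0x{magic:08x}")
    return {
        "major": (version >> 16) & 0xFF,  # version word layout: 0x00MMmm00
        "minor": (version >> 8) & 0xFF,
        "generator": generator,
        "bound": bound,
        "schema": schema,
    }


# Build a header for a SPIR-V 1.2 module in host byte order and decode it.
header = struct.pack("=5I", SPIRV_MAGIC, 0x00010200, 0, 1, 0)
info = read_spirv_header(header)
print(info["major"], info["minor"])
```

A module whose first word does not match the magic number in host byte order is rejected by this sketch rather than byte-swapped.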
Supported SPIR-V Versions¶
The maximum SPIR-V version supported by a device is described by ze_device_module_properties_t.spirvVersionSupported.
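A host application might compare a module's version word against this field before attempting to create the module. The sketch below assumes that spirvVersionSupported packs the major version in the upper 16 bits and the minor version in the lower 16 bits, and that a value of zero means SPIR-V is not supported; both assumptions should be confirmed against ze_api.h. All function names are hypothetical.

```python
def ze_make_version(major: int, minor: int) -> int:
    # Assumed packing: major in the upper 16 bits, minor in the lower 16 bits.
    return (major << 16) | (minor & 0xFFFF)


def spirv_word_to_ze_version(version_word: int) -> int:
    # SPIR-V version word layout: 0x00MMmm00 (major byte, then minor byte).
    major = (version_word >> 16) & 0xFF
    minor = (version_word >> 8) & 0xFF
    return ze_make_version(major, minor)


def module_version_supported(version_word: int, spirv_version_supported: int) -> bool:
    # Assumption: zero means the device does not consume SPIR-V at all.
    if spirv_version_supported == 0:
        return False
    return spirv_word_to_ze_version(version_word) <= spirv_version_supported


device_max = ze_make_version(1, 2)  # stand-in for spirvVersionSupported
print(module_version_supported(0x00010200, device_max))
print(module_version_supported(0x00010300, device_max))
```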
Source Language Encoding¶
The source language version is purely informational and has no semantic meaning.
Numerical Type Formats¶
Floating-point types are represented and stored using IEEE-754 semantics. All integer formats are represented and stored using 2’s-complement format.
The following types are supported. Note that some types may require additional capabilities, and may not be supported by all environments.
Basic Scalar and Vector Types¶
OpTypeVoid is supported.
The following scalar types are supported:
OpTypeInt, with Width equal to 8, 16, 32, or 64, and with Signedness equal to zero, indicating no signedness semantics.
OpTypeFloat, with Width equal to 16, 32, or 64.
OpTypeVector vector types are supported. The vector Component Type may be any of the scalar types described above. Supported vector Component Counts are 2, 3, 4, 8, or 16.
OpTypeArray array types are supported, OpTypeStruct struct types are supported, OpTypeFunction functions are supported, and OpTypePointer pointer types are supported.
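The scalar and vector rules above can be expressed as a pair of small predicates. This is only a sketch for a hypothetical front-end check; the function names and the "int"/"float" kind strings are inventions of this example, not part of any API.

```python
SUPPORTED_INT_WIDTHS = {8, 16, 32, 64}
SUPPORTED_FLOAT_WIDTHS = {16, 32, 64}
SUPPORTED_VECTOR_COUNTS = {2, 3, 4, 8, 16}


def scalar_supported(kind: str, width: int, signedness: int = 0) -> bool:
    """Check an OpTypeInt or OpTypeFloat declaration against the listed widths."""
    if kind == "int":
        # Signedness must be zero: integers carry no signedness semantics.
        return width in SUPPORTED_INT_WIDTHS and signedness == 0
    if kind == "float":
        return width in SUPPORTED_FLOAT_WIDTHS
    return False


def vector_supported(kind: str, width: int, count: int) -> bool:
    """Check an OpTypeVector whose Component Type is a supported scalar."""
    return scalar_supported(kind, width) and count in SUPPORTED_VECTOR_COUNTS


print(vector_supported("float", 32, 3))
print(vector_supported("int", 32, 5))
```

Some of these types still depend on optional capabilities such as Float16 or Float64, which this sketch does not model.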
An OpFunction in a SPIR-V module that is identified with OpEntryPoint defines a kernel that may be launched using host API interfaces.
Kernel Return Types¶
The Result Type for an OpFunction identified with OpEntryPoint must be OpTypeVoid.
An OpFunctionParameter for an OpFunction that is identified with OpEntryPoint defines a kernel argument. Allowed types for kernel arguments are:
For OpTypeInt parameters, supported Widths are 8, 16, 32, and 64, and the type must have no signedness semantics.
For OpTypeFloat parameters, supported Widths are 16 and 32.
For OpTypeStruct parameters, supported structure Member Types are:
For OpTypePointer parameters, supported Storage Classes are:
Environments that support extensions or optional features may allow additional types in an entry point’s parameter list.
An environment that supports SPIR-V 1.0 must support SPIR-V 1.0 modules that declare the following capabilities:
ImageBasic (for devices supporting ze_device_image_properties_t.supported)
Float16 (for devices supporting ZE_DEVICE_MODULE_FLAG_FP16)
Float64 (for devices supporting ZE_DEVICE_MODULE_FLAG_FP64)
Int64Atomics (for devices supporting ZE_DEVICE_MODULE_FLAG_INT64_ATOMICS)
If the ‘oneAPI’ environment supports the ImageBasic capability, then the following capabilities must also be supported:
An environment supporting SPIR-V 1.1 must support SPIR-V 1.1 modules that declare the capabilities required for SPIR-V 1.0 modules, above.
SPIR-V 1.1 does not add any new required capabilities.
An environment supporting SPIR-V 1.2 must support SPIR-V 1.2 modules that declare the capabilities required for SPIR-V 1.1 modules, above.
SPIR-V 1.2 does not add any new required capabilities.
The following is a list of validation rules that apply to SPIR-V modules executing in all ‘oneAPI’ Level-Zero environments:
The Execution Model declared in OpEntryPoint must be Kernel.
The Addressing Model declared in OpMemoryModel must be Physical64, indicating that device pointers are 64 bits wide.
The Memory Model declared in OpMemoryModel must be OpenCL.
For all OpTypeInt integer type-declaration instructions:
Signedness must be 0, indicating no signedness semantics.
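The module-level rules listed so far can be checked with a single pass over the instruction stream. The sketch below uses opcode and enumerant values from the SPIR-V specification (OpMemoryModel = 14, OpEntryPoint = 15, OpTypeInt = 21, Physical64 = 2, OpenCL = 2, Kernel = 6). The helper names `instr` and `check_module` are hypothetical, the sketch assumes each instruction carries its required operands, and the sample word lists are fragments rather than complete valid modules.

```python
OP_MEMORY_MODEL, OP_ENTRY_POINT, OP_TYPE_INT = 14, 15, 21
ADDRESSING_PHYSICAL64, MEMORY_MODEL_OPENCL, EXEC_MODEL_KERNEL = 2, 2, 6


def instr(opcode: int, *operands: int) -> list:
    """Encode one instruction: high 16 bits hold the word count, low 16 the opcode."""
    return [((len(operands) + 1) << 16) | opcode, *operands]


def check_module(words: list) -> list:
    """Return the rule violations found in a module given as 32-bit words."""
    errors = []
    i = 5  # skip the five-word header
    while i < len(words):
        word_count, opcode = words[i] >> 16, words[i] & 0xFFFF
        if word_count == 0 or i + word_count > len(words):
            errors.append(f"malformed instruction at word {i}")
            break
        operands = words[i + 1 : i + word_count]
        if opcode == OP_MEMORY_MODEL:
            if operands[0] != ADDRESSING_PHYSICAL64:
                errors.append("Addressing Model must be Physical64")
            if operands[1] != MEMORY_MODEL_OPENCL:
                errors.append("Memory Model must be OpenCL")
        elif opcode == OP_ENTRY_POINT and operands[0] != EXEC_MODEL_KERNEL:
            errors.append("Execution Model must be Kernel")
        elif opcode == OP_TYPE_INT and operands[2] != 0:
            # Operands are: Result <id>, Width, Signedness.
            errors.append("OpTypeInt Signedness must be 0")
        i += word_count
    return errors


header = [0x07230203, 0x00010000, 0, 8, 0]
good = header + instr(OP_MEMORY_MODEL, 2, 2) + instr(OP_TYPE_INT, 1, 32, 0)
bad = header + instr(OP_MEMORY_MODEL, 0, 2) + instr(OP_TYPE_INT, 1, 32, 1)
print(check_module(good))
print(check_module(bad))
```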
For all OpTypeImage type-declaration instructions:
Sampled Type must be OpTypeVoid.
Sampled must be 0, indicating that the image usage will be known at run time, not at compile time.
MS must be 0, indicating single-sampled content.
Arrayed may only be set to 1, indicating arrayed content, when Dim is set to 1D or 2D.
Image Format must be Unknown, indicating that the image does not have a specified format.
The optional image Access Qualifier must be present.
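The OpTypeImage rules can be collected into one checker over the decoded operands. In this sketch the `ImageType` record and `image_type_errors` function are hypothetical; the Dim values (1D = 0, 2D = 1, 3D = 2) and the Unknown image format value (0) follow the SPIR-V specification.

```python
from dataclasses import dataclass
from typing import Optional

DIM_1D, DIM_2D, DIM_3D = 0, 1, 2  # SPIR-V Dim enumerants
FORMAT_UNKNOWN = 0                # SPIR-V Image Format "Unknown"


@dataclass
class ImageType:
    sampled_type_is_void: bool
    dim: int
    arrayed: int
    ms: int
    sampled: int
    image_format: int
    access_qualifier: Optional[int]  # None when the optional operand is absent


def image_type_errors(t: ImageType) -> list:
    """Return the violated rules for one OpTypeImage declaration."""
    errors = []
    if not t.sampled_type_is_void:
        errors.append("Sampled Type must be OpTypeVoid")
    if t.sampled != 0:
        errors.append("Sampled must be 0")
    if t.ms != 0:
        errors.append("MS must be 0")
    if t.arrayed == 1 and t.dim not in (DIM_1D, DIM_2D):
        errors.append("Arrayed may be 1 only for 1D or 2D images")
    if t.image_format != FORMAT_UNKNOWN:
        errors.append("Image Format must be Unknown")
    if t.access_qualifier is None:
        errors.append("Access Qualifier must be present")
    return errors


ok = ImageType(True, DIM_2D, 1, 0, 0, FORMAT_UNKNOWN, 0)
bad = ImageType(True, DIM_3D, 1, 0, 0, FORMAT_UNKNOWN, None)
print(image_type_errors(ok))
print(image_type_errors(bad))
```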
The image write instruction OpImageWrite must not include any optional Image Operands.
The image read instructions OpImageRead and OpImageSampleExplicitLod must not include the optional Image Operand ConstOffset.
For all Atomic Instructions:
32-bit integer types are supported for the Result Type and/or type of Value. 64-bit integer types are optionally supported for the Result Type and/or type of Value for devices supporting ZE_DEVICE_MODULE_FLAG_INT64_ATOMICS.
The Pointer operand must be a pointer to the Function, Workgroup, CrossWorkgroup, or Generic Storage Classes.
Recursion is not supported. The static function call graph for an entry point must not contain cycles.
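A compiler or validator could enforce the recursion rule with a depth-first search over the static call graph. The sketch below assumes the call graph has already been extracted into a dictionary mapping each function name to the functions it calls; that representation and the name `has_recursion` are hypothetical.

```python
def has_recursion(call_graph: dict, entry: str) -> bool:
    """Return True if a cycle is reachable from the entry point."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(fn) -> bool:
        color[fn] = GRAY
        for callee in call_graph.get(fn, ()):
            state = color.get(callee, WHITE)
            if state == GRAY:
                return True  # back edge: callee is already on the call path
            if state == WHITE and visit(callee):
                return True
        color[fn] = BLACK
        return False

    return visit(entry)


acyclic = {"kernel": ["a", "b"], "a": ["b"], "b": []}
cyclic = {"kernel": ["a"], "a": ["b"], "b": ["a"]}
print(has_recursion(acyclic, "kernel"))
print(has_recursion(cyclic, "kernel"))
```

A function reached along two different paths, as `b` is in the first graph, is not a cycle and is accepted.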
Whether irreducible control flow is legal is implementation defined.
For the instructions OpGroupAsyncCopy and OpGroupWaitEvents, Scope for Execution must be:
For all other instructions, Scope for Execution must be one of:
Scope for Memory must be one of:
‘oneAPI’ Level-Zero API environments must accept SPIR-V modules that declare use of the SPV_INTEL_subgroups extension via OpExtension.
When use of the SPV_INTEL_subgroups extension is declared in the module via OpExtension, the environment must accept modules that declare the following SPIR-V capabilities:
The environment must accept the following types for Data for the SubgroupShuffleINTEL instructions:
Scalars and OpTypeVectors with 2, 4, 8, or 16 Component Count components of the following Component Type types:
OpTypeFloat with a Width of 32 bits
TBD: char types?
OpTypeInt with a Width of 16 bits and Signedness of 0
OpTypeInt with a Width of 32 bits and Signedness of 0
Scalars of OpTypeInt with a Width of 64 bits and Signedness of 0
TBD: vectors of long types?
Additionally, if the Float16 capability is declared and supported:
Scalars of OpTypeFloat with a Width of 16 bits
Additionally, if the Float64 capability is declared and supported:
Scalars of OpTypeFloat with a Width of 64 bits
The environment must accept the following types for Result and Data for the SubgroupBufferBlockIOINTEL and SubgroupImageBlockIOINTEL instructions:
Scalars and OpTypeVectors with 2, 4, or 8 Component Count components of the following Component Type types:
OpTypeInt with a Width of 32 bits and Signedness of 0
OpTypeInt with a Width of 16 bits and Signedness of 0
For Ptr, valid Storage Classes are:
Dim must be 2D
Depth must be 0 (not a depth image)
Arrayed must be 0 (non-arrayed content)
MS must be 0 (single-sampled content)
For Coordinate, the following types are supported:
OpTypeVectors with two Component Count components of Component Type OpTypeInt with a Width of 32 bits and Signedness of 0
Notes and Restrictions¶
The SubgroupShuffleINTEL instructions may be placed within non-uniform control flow and hence do not have to be encountered by all invocations in the subgroup; however, Data may only be shuffled among invocations encountering the SubgroupShuffleINTEL instruction. Shuffling Data from an invocation that does not encounter the SubgroupShuffleINTEL instruction will produce undefined results.
There is no defined behavior for out-of-range shuffle indices for the SubgroupShuffleINTEL instructions.
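The two shuffle notes above can be made concrete with a small host-side model. This is a sketch for reasoning about the rules, not a description of hardware behavior: `model_shuffle` and the `UNDEFINED` sentinel are hypothetical, and a real device may return any value where the model reports `UNDEFINED`.

```python
UNDEFINED = object()  # sentinel for results the rules above leave undefined


def model_shuffle(data: list, indices: list, active: list) -> list:
    """Model a subgroup shuffle: invocation i reads Data from invocation indices[i]."""
    size = len(data)
    results = []
    for i in range(size):
        if not active[i]:
            results.append(None)  # this invocation never reached the shuffle
            continue
        src = indices[i]
        if 0 <= src < size and active[src]:
            results.append(data[src])
        else:
            # Out-of-range index, or the source did not encounter the shuffle.
            results.append(UNDEFINED)
    return results


# Invocation 3 is inactive, so invocation 2 (which reads from 3) gets UNDEFINED.
r = model_shuffle([10, 11, 12, 13], [1, 0, 3, 9], [True, True, True, False])
print(r[0], r[1], r[2] is UNDEFINED, r[3])
```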
The SubgroupBufferBlockIOINTEL and SubgroupImageBlockIOINTEL instructions are only guaranteed to work correctly if placed strictly within uniform control flow within the subgroup. This ensures that if any invocation executes it, all invocations will execute it. If placed elsewhere, behavior is undefined.
There is no defined out-of-range behavior for the SubgroupBufferBlockIOINTEL instructions.
The SubgroupImageBlockIOINTEL instructions do support bounds checking; however, they bounds-check to the image width in units of uints, not in units of image elements. This means:
If the image has an Image Format size equal to the size of a uint (four bytes, for example Rgba8), the image will be correctly bounds-checked. In this case, out-of-bounds reads will return the edge image element (the equivalent of ClampToEdge), and out-of-bounds writes will be ignored.
If the image has an Image Format size less than the size of a uint (such as R8), the entire image is addressable; however, bounds checking will occur too late. For this reason, extra care should be taken to avoid out-of-bounds reads and writes, since out-of-bounds reads may return invalid data and out-of-bounds writes may corrupt other images or buffers unpredictably.
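One way to read the bounds-checking behavior is as simple arithmetic on row sizes. The sketch below assumes the check treats the image width (a count of elements) as if it were a count of four-byte uints; that interpretation and both function names are assumptions of this example, not statements about a particular device.

```python
UINT_SIZE = 4  # bytes


def block_io_row_limits(width_in_elements: int, element_size: int) -> tuple:
    """Return (actual_row_bytes, checked_row_bytes) under the assumed model."""
    actual = width_in_elements * element_size
    checked = width_in_elements * UINT_SIZE  # width interpreted as a uint count
    return actual, checked


def unchecked_bytes(width_in_elements: int, element_size: int) -> int:
    """Bytes past the end of a row that the bounds check would not catch."""
    actual, checked = block_io_row_limits(width_in_elements, element_size)
    return max(checked - actual, 0)


print(block_io_row_limits(64, 4), unchecked_bytes(64, 4))  # Rgba8-sized elements
print(block_io_row_limits(64, 1), unchecked_bytes(64, 1))  # R8-sized elements
```

Under this model a 64-element row of four-byte elements is checked exactly, while a 64-element row of one-byte elements leaves 192 bytes beyond the row unchecked.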
The following restrictions apply to the SubgroupBufferBlockIOINTEL instructions:
The pointer Ptr must be 32-bit (4-byte) aligned for reads, and must be 128-bit (16-byte) aligned for writes.
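A debug build of a host-side tool could assert these alignment requirements before dispatch. The helper below is a minimal sketch with a hypothetical name; it only checks the numeric address.

```python
def block_io_pointer_ok(address: int, is_write: bool) -> bool:
    """Reads need 4-byte alignment; writes need 16-byte alignment."""
    required = 16 if is_write else 4
    return address % required == 0


print(block_io_pointer_ok(0x1004, is_write=False))
print(block_io_pointer_ok(0x1004, is_write=True))
print(block_io_pointer_ok(0x1010, is_write=True))
```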
The following restrictions apply to the SubgroupImageBlockIOINTEL instructions:
The behavior of the SubgroupImageBlockIOINTEL instructions is undefined for images with an element size greater than four bytes (such as Rgba32f).
The following restrictions apply to the OpSubgroupImageBlockWriteINTEL instruction:
Unlike the image block read instruction, which may read from any arbitrary byte offset, the x-component of the byte coordinate for the image block write instruction must be a multiple of four; in other words, the write must begin at a 32-bit boundary. There is no restriction on the y-component of the coordinate.
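The image block restrictions above can be combined into one predicate. This is a sketch with a hypothetical function name; `x_byte` is the x-component of the byte coordinate and `element_size` is the image element size in bytes.

```python
def image_block_access_ok(x_byte: int, element_size: int, is_write: bool) -> bool:
    """Check the documented restrictions for an image block read or write."""
    if element_size > 4:
        return False  # behavior is undefined for elements larger than four bytes
    if is_write and x_byte % 4 != 0:
        return False  # writes must begin at a 32-bit boundary
    return True       # reads may start at any byte offset


print(image_block_access_ok(6, 1, is_write=False))
print(image_block_access_ok(6, 1, is_write=True))
print(image_block_access_ok(8, 4, is_write=True))
```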
‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_float_atomics must support additional atomic instructions, capabilities, and types.
Atomic Load, Store, and Exchange¶
If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_LOAD_STORE or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_LOAD_STORE, then for the Atomic Instructions OpAtomicLoad, OpAtomicStore, and OpAtomicExchange:
16-bit floating-point types are supported for the Result Type and type of Value.
Atomic Add and Subtract¶
If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and ze_device_fp_atomic_ext_flags_t.fp16Flags, ze_device_fp_atomic_ext_flags_t.fp32Flags, or ze_device_fp_atomic_ext_flags_t.fp64Flags include ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, then the environment must accept modules that declare use of the extensions
For the Atomic Instruction OpAtomicFAddEXT added by these extensions:
When ze_device_fp_atomic_ext_flags_t.fp32Flags, ze_device_fp_atomic_ext_flags_t.fp64Flags, or ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD, the Pointer operand may be a pointer to the CrossWorkgroup Storage Class.
When ze_device_fp_atomic_ext_flags_t.fp32Flags, ze_device_fp_atomic_ext_flags_t.fp64Flags, or ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the Pointer operand may be a pointer to the Workgroup Storage Class.
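The mapping from capability flags to permitted Storage Classes for OpAtomicFAddEXT can be sketched as a bit test. The `FpAtomicFlag` enumeration below is a stand-in for the ZE_DEVICE_FP_ATOMIC_EXT_FLAG_* values; its bit positions are placeholders assumed for illustration and should be taken from ze_api.h in real code. The function name is hypothetical.

```python
from enum import IntFlag


class FpAtomicFlag(IntFlag):
    # Placeholder bit positions for illustration; use the values in ze_api.h.
    GLOBAL_LOAD_STORE = 1 << 0
    GLOBAL_ADD = 1 << 1
    GLOBAL_MIN_MAX = 1 << 2
    LOCAL_LOAD_STORE = 1 << 16
    LOCAL_ADD = 1 << 17
    LOCAL_MIN_MAX = 1 << 18


def fadd_storage_classes(flags: FpAtomicFlag) -> set:
    """Storage Classes the Pointer operand of OpAtomicFAddEXT may use."""
    classes = set()
    if flags & FpAtomicFlag.GLOBAL_ADD:
        classes.add("CrossWorkgroup")
    if flags & FpAtomicFlag.LOCAL_ADD:
        classes.add("Workgroup")
    return classes


fp32_flags = FpAtomicFlag.GLOBAL_ADD | FpAtomicFlag.LOCAL_ADD
print(sorted(fadd_storage_classes(fp32_flags)))
```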
Atomic Min and Max¶
If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and the ze_device_fp_atomic_ext_flags_t.fp32Flags, ze_device_fp_atomic_ext_flags_t.fp64Flags, or ze_device_fp_atomic_ext_flags_t.fp16Flags bitfields include ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, then the environment must accept modules that declare use of the extension
When ze_device_fp_atomic_ext_flags_t.fp32Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat32MinMaxEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat64MinMaxEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat16MinMaxEXT capability must be supported.
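The three capability rules above follow one pattern, which the sketch below captures. The two mask constants stand in for ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX and ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX; their bit positions are placeholders assumed for illustration, and the function name is hypothetical.

```python
# Placeholder bit positions for illustration; use the values in ze_api.h.
GLOBAL_MIN_MAX = 1 << 2
LOCAL_MIN_MAX = 1 << 18
MIN_MAX_MASK = GLOBAL_MIN_MAX | LOCAL_MIN_MAX


def required_min_max_capabilities(fp16_flags: int, fp32_flags: int, fp64_flags: int) -> set:
    """Capabilities the environment must support, given the three flag fields."""
    by_width = {
        "AtomicFloat16MinMaxEXT": fp16_flags,
        "AtomicFloat32MinMaxEXT": fp32_flags,
        "AtomicFloat64MinMaxEXT": fp64_flags,
    }
    # A capability is required when either the global or the local bit is set.
    return {cap for cap, flags in by_width.items() if flags & MIN_MAX_MASK}


caps = required_min_max_capabilities(0, GLOBAL_MIN_MAX, LOCAL_MIN_MAX)
print(sorted(caps))
```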
For the Atomic Instructions OpAtomicFMinEXT and OpAtomicFMaxEXT added by this extension:
When ze_device_fp_atomic_ext_flags_t.fp16Flags, ze_device_fp_atomic_ext_flags_t.fp32Flags, or ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX, the Pointer operand may be a pointer to the CrossWorkgroup Storage Class.
When ze_device_fp_atomic_ext_flags_t.fp16Flags, ze_device_fp_atomic_ext_flags_t.fp32Flags, or ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the Pointer operand may be a pointer to the Workgroup Storage Class.
The ‘oneAPI’ Level-Zero environment will meet or exceed the numerical compliance requirements defined in the OpenCL SPIR-V Environment Specification. See: Numerical Compliance.