SPIR-V Programming Guide
Introduction
SPIR-V is an open, royalty-free, standard intermediate language capable of representing parallel compute kernels. SPIR-V is adaptable to multiple execution environments: a SPIR-V module is consumed by an execution environment, as specified by a client API. This document describes the SPIR-V execution environment for the ‘oneAPI’ Level-Zero API. The SPIR-V execution environment describes required support for some SPIR-V capabilities, additional semantics for some SPIR-V instructions, and additional validation rules that a SPIR-V binary module must adhere to in order to be considered valid.
This document is written for compiler developers who are generating SPIR-V modules intended to be consumed by the ‘oneAPI’ Level-Zero API, for implementors of the ‘oneAPI’ Level-Zero API, and for software developers who are using SPIR-V modules with the ‘oneAPI’ Level-Zero API.
Common Properties
This section describes common properties of all ‘oneAPI’ Level-Zero environments that consume SPIR-V modules.
A SPIR-V module is interpreted as a series of 32-bit words in host endianness, with literal strings packed as described in the SPIR-V specification. The first few words of the SPIR-V module must be a magic number and a SPIR-V version number, as described in the SPIR-V specification.
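As a minimal illustration of this header layout, the C sketch below checks the magic number (0x07230203 in the SPIR-V specification) and decodes the version word, whose bytes from high to low are 0 | Major | Minor | 0. The function name is hypothetical, and the sketch assumes the module has already been loaded as an array of host-endian 32-bit words.

```c
#include <stddef.h>
#include <stdint.h>

#define SPIRV_MAGIC 0x07230203u  /* magic number from the SPIR-V specification */

/* Hypothetical helper: returns 1 and writes the decoded version if `words`
 * starts with a valid SPIR-V header, 0 otherwise. `count` is in words. */
int spirv_read_version(const uint32_t *words, size_t count,
                       unsigned *major, unsigned *minor)
{
    /* The header is five words: magic, version, generator, bound, schema. */
    if (words == NULL || count < 5 || words[0] != SPIRV_MAGIC)
        return 0;

    /* Version word layout, high byte to low byte: 0 | Major | Minor | 0. */
    *major = (words[1] >> 16) & 0xFFu;
    *minor = (words[1] >> 8) & 0xFFu;
    return 1;
}
```

A first word of 0x03022307 is the byte-swapped magic number, which usually means the module was written with the opposite endianness; this sketch simply rejects such a module.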
Supported SPIR-V Versions
The maximum SPIR-V version supported by a device is described by ze_device_module_properties_t.spirvVersionSupported.
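A loader might compare a module's header version against this device limit before creating a module. The sketch below assumes spirvVersionSupported packs the major version in the upper 16 bits and the minor version in the lower 16 bits (the ZE_MAJOR_VERSION / ZE_MINOR_VERSION convention), and treats zero as "no SPIR-V support"; both are assumptions, so the helpers are re-derived locally and the function name is hypothetical.

```c
#include <stdint.h>

/* Local stand-ins assumed to match ZE_MAJOR_VERSION / ZE_MINOR_VERSION. */
static unsigned ze_ver_major(uint32_t v) { return (v >> 16) & 0xFFFFu; }
static unsigned ze_ver_minor(uint32_t v) { return v & 0xFFFFu; }

/* Hypothetical helper: returns 1 if a module with SPIR-V header version word
 * `module_version_word` (layout 0 | Major | Minor | 0) does not exceed
 * `device_spirv_version`, the value read from spirvVersionSupported. */
int spirv_version_supported(uint32_t module_version_word,
                            uint32_t device_spirv_version)
{
    if (device_spirv_version == 0)
        return 0;  /* assumed to mean that SPIR-V is not supported */

    unsigned mod_major = (module_version_word >> 16) & 0xFFu;
    unsigned mod_minor = (module_version_word >> 8) & 0xFFu;
    unsigned dev_major = ze_ver_major(device_spirv_version);
    unsigned dev_minor = ze_ver_minor(device_spirv_version);

    if (mod_major != dev_major)
        return mod_major < dev_major;
    return mod_minor <= dev_minor;
}
```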
Extended Instruction Sets
The OpenCL.std extended instruction set for OpenCL is supported.
Source Language Encoding
The source language version is purely informational and has no semantic meaning.
Numerical Type Formats
Floating-point types are represented and stored using IEEE-754 semantics. All integer types are represented and stored in two’s-complement format.
Supported Types
The following types are supported. Note that some types may require additional capabilities, and may not be supported by all environments.
Basic Scalar and Vector Types
OpTypeVoid is supported.
The following scalar types are supported:
OpTypeBool
OpTypeInt, with Width equal to 8, 16, 32, or 64, and with Signedness equal to zero, indicating no signedness semantics.
OpTypeFloat, with Width equal to 16, 32, or 64.
OpTypeVector vector types are supported. The vector Component Type may be any of the scalar types described above. Supported vector Component Counts are 2, 3, 4, 8, or 16.
OpTypeArray array types are supported, OpTypeStruct struct types are supported, OpTypeFunction functions are supported, and OpTypePointer pointer types are supported.
Kernels
An OpFunction in a SPIR-V module that is identified with OpEntryPoint defines a kernel that may be launched using host API interfaces.
Kernel Return Types
The Result Type for an OpFunction identified with OpEntryPoint must be OpTypeVoid.
Kernel Arguments
An OpFunctionParameter for an OpFunction that is identified with OpEntryPoint defines a kernel argument. Allowed types for kernel arguments are:
OpTypeInt
OpTypeFloat
OpTypeStruct
OpTypeVector
OpTypePointer
OpTypeSampler
OpTypeImage
For OpTypeInt parameters, supported Widths are 8, 16, 32, and 64, and Signedness must be 0, indicating no signedness semantics.
For OpTypeFloat parameters, supported Widths are 16 and 32.
For OpTypeStruct parameters, supported structure Member Types are:
OpTypeInt
OpTypeFloat
OpTypeStruct
OpTypeVector
OpTypePointer
For OpTypePointer parameters, supported Storage Classes are:
CrossWorkgroup
Workgroup
UniformConstant
Environments that support extensions or optional features may allow additional types in an entry point’s parameter list.
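The base rules above can be expressed as a small validator. This is a sketch over a hypothetical, simplified type descriptor (a real compiler would walk its own IR); the storage class values are the SPIR-V enumerants, and vector component types and image or sampler details are deliberately not checked.

```c
#include <stddef.h>

/* Hypothetical type descriptor used only for this sketch. */
enum arg_kind { ARG_INT, ARG_FLOAT, ARG_STRUCT, ARG_VECTOR, ARG_POINTER,
                ARG_SAMPLER, ARG_IMAGE, ARG_OTHER };

/* SPIR-V Storage Class enumerants. */
enum { SC_UNIFORM_CONSTANT = 0, SC_WORKGROUP = 4, SC_CROSS_WORKGROUP = 5 };

struct arg_type {
    enum arg_kind kind;
    unsigned width;                  /* OpTypeInt / OpTypeFloat Width */
    unsigned signedness;             /* OpTypeInt Signedness */
    unsigned storage;                /* OpTypePointer Storage Class */
    const struct arg_type *members;  /* OpTypeStruct member types */
    size_t member_count;
};

/* Struct members may be OpTypeInt, OpTypeFloat, OpTypeStruct,
 * OpTypeVector, or OpTypePointer; nested structs are checked recursively. */
static int member_ok(const struct arg_type *t)
{
    switch (t->kind) {
    case ARG_INT: case ARG_FLOAT: case ARG_VECTOR: case ARG_POINTER:
        return 1;
    case ARG_STRUCT:
        for (size_t i = 0; i < t->member_count; ++i)
            if (!member_ok(&t->members[i]))
                return 0;
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 if `t` is an allowed kernel argument under the base rules. */
int kernel_arg_ok(const struct arg_type *t)
{
    switch (t->kind) {
    case ARG_INT:
        return (t->width == 8 || t->width == 16 || t->width == 32 ||
                t->width == 64) && t->signedness == 0;
    case ARG_FLOAT:
        return t->width == 16 || t->width == 32;
    case ARG_STRUCT:
        return member_ok(t);
    case ARG_VECTOR: case ARG_SAMPLER: case ARG_IMAGE:
        return 1;
    case ARG_POINTER:
        return t->storage == SC_CROSS_WORKGROUP || t->storage == SC_WORKGROUP ||
               t->storage == SC_UNIFORM_CONSTANT;
    default:
        return 0;
    }
}
```

An environment that advertises extra argument types through an extension would widen these checks rather than replace them.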
Required Capabilities
SPIR-V 1.0
An environment that supports SPIR-V 1.0 must support SPIR-V 1.0 modules that declare the following capabilities:
Addresses
Float16Buffer
Int64
Int16
Int8
Kernel
Linkage
Vector16
GenericPointer
Groups
ImageBasic (for devices supporting ze_device_image_properties_t.supported)
Float16 (for devices supporting ZE_DEVICE_MODULE_FLAG_FP16)
Float64 (for devices supporting ZE_DEVICE_MODULE_FLAG_FP64)
Int64Atomics (for devices supporting ZE_DEVICE_MODULE_FLAG_INT64_ATOMICS)
If the ‘oneAPI’ environment supports the ImageBasic capability, then the following capabilities must also be supported:
LiteralSampler
Sampled1D
Image1D
SampledBuffer
ImageBuffer
ImageReadWrite
SPIR-V 1.1
An environment supporting SPIR-V 1.1 must support SPIR-V 1.1 modules that declare the capabilities required for SPIR-V 1.0 modules, above.
SPIR-V 1.1 does not add any new required capabilities.
SPIR-V 1.2
An environment supporting SPIR-V 1.2 must support SPIR-V 1.2 modules that declare the capabilities required for SPIR-V 1.1 modules, above.
SPIR-V 1.2 does not add any new required capabilities.
Validation Rules
The following validation rules apply to SPIR-V modules executing in all ‘oneAPI’ Level-Zero environments:
The Execution Model declared in OpEntryPoint must be Kernel.
The Addressing Model declared in OpMemoryModel must be Physical64, indicating that device pointers are 64 bits.
The Memory Model declared in OpMemoryModel must be OpenCL.
For all OpTypeInt integer type-declaration instructions:
Signedness must be 0, indicating no signedness semantics.
For all OpTypeImage type-declaration instructions:
Sampled Type must be OpTypeVoid.
Sampled must be 0, indicating that the image usage will be known at run time, not at compile time.
MS must be 0, indicating single-sampled content.
Arrayed may only be set to 1, indicating arrayed content, when Dim is set to 1D or 2D.
Image Format must be Unknown, indicating that the image does not have a specified format.
The optional image Access Qualifier must be present.
The image write instruction OpImageWrite must not include any optional Image Operands.
The image read instructions OpImageRead and OpImageSampleExplicitLod must not include the optional Image Operand ConstOffset.
For all Atomic Instructions:
32-bit integer types are supported for the Result Type and/or type of Value. 64-bit integer types are optionally supported for the Result Type and/or type of Value for devices supporting ZE_DEVICE_MODULE_FLAG_INT64_ATOMICS.
The Pointer operand must be a pointer to the Function, Workgroup, CrossWorkgroup, or Generic Storage Classes.
Recursion is not supported. The static function call graph for an entry point must not contain cycles.
Whether irreducible control flow is legal is implementation defined.
For the instructions OpGroupAsyncCopy and OpGroupWaitEvents, Scope for Execution must be:
Workgroup
For all other instructions, Scope for Execution must be one of:
Workgroup
Subgroup
Scope for Memory must be one of:
CrossDevice
Device
Workgroup
Invocation
Subgroup
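The recursion rule above amounts to requiring that the static call graph reachable from each entry point is acyclic. A common way to check this is a depth-first search that reports a back edge to a function still on the current path. The sketch below assumes the graph has already been built from OpFunctionCall instructions into a hypothetical adjacency matrix indexed by function number; the size limit is also an assumption.

```c
#include <stddef.h>

#define MAX_FUNCS 64  /* hypothetical limit for this sketch */

/* calls[f][g] != 0 means function f contains an OpFunctionCall to g. */
enum { WHITE, GRAY, BLACK };  /* unvisited, on current path, finished */

static int visit(size_t f, size_t n, unsigned char calls[][MAX_FUNCS],
                 unsigned char *color)
{
    color[f] = GRAY;
    for (size_t g = 0; g < n; ++g) {
        if (!calls[f][g])
            continue;
        if (color[g] == GRAY)
            return 1;  /* back edge: g is already on the path, so recursion */
        if (color[g] == WHITE && visit(g, n, calls, color))
            return 1;
    }
    color[f] = BLACK;  /* no cycle passes through f */
    return 0;
}

/* Returns 1 if the call graph reachable from `entry` contains a cycle,
 * 0 if it does not, and -1 for invalid arguments. */
int has_recursion(size_t entry, size_t n, unsigned char calls[][MAX_FUNCS])
{
    unsigned char color[MAX_FUNCS] = {0};  /* all WHITE */
    if (n > MAX_FUNCS || entry >= n)
        return -1;
    return visit(entry, n, calls, color);
}
```

The checker itself recurses on the host, which is fine there; only the device code is restricted.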
Extensions
Intel Subgroups
‘oneAPI’ Level-Zero API environments must accept SPIR-V modules that declare use of the SPV_INTEL_subgroups extension via OpExtension.
When use of the SPV_INTEL_subgroups extension is declared in the module via OpExtension, the environment must accept modules that declare the following SPIR-V capabilities:
SubgroupShuffleINTEL
SubgroupBufferBlockIOINTEL
SubgroupImageBlockIOINTEL
The environment must accept the following types for Data for the SubgroupShuffleINTEL instructions:
Scalars and OpTypeVectors with 2, 4, 8, or 16 Component Count components of the following Component Type types:
OpTypeFloat with a Width of 32 bits (float)
OpTypeInt with a Width of 8 bits and Signedness of 0 (char and uchar)
OpTypeInt with a Width of 16 bits and Signedness of 0 (short and ushort)
OpTypeInt with a Width of 32 bits and Signedness of 0 (int and uint)
Scalars of OpTypeInt with a Width of 64 bits and Signedness of 0 (long and ulong)
Additionally, if the Float16 capability is declared and supported:
Scalars of OpTypeFloat with a Width of 16 bits (half)
Additionally, if the Float64 capability is declared and supported:
Scalars of OpTypeFloat with a Width of 64 bits (double)
The environment must accept the following types for Result and Data for the SubgroupBufferBlockIOINTEL and SubgroupImageBlockIOINTEL instructions:
Scalars and OpTypeVectors with 2, 4, or 8 Component Count components of the following Component Type types:
OpTypeInt with a Width of 32 bits and Signedness of 0 (int and uint)
OpTypeInt with a Width of 16 bits and Signedness of 0 (short and ushort)
For Ptr, valid Storage Classes are:
CrossWorkgroup (global)
For Image:
Dim must be 2D
Depth must be 0 (not a depth image)
Arrayed must be 0 (non-arrayed content)
MS must be 0 (single-sampled content)
For Coordinate, the following types are supported:
OpTypeVectors with two Component Count components of Component Type OpTypeInt with a Width of 32 bits and Signedness of 0 (int2)
Notes and Restrictions
The SubgroupShuffleINTEL instructions may be placed within non-uniform control flow and hence do not have to be encountered by all invocations in the subgroup; however, Data may only be shuffled among invocations that encounter the SubgroupShuffleINTEL instruction. Shuffling Data from an invocation that does not encounter the SubgroupShuffleINTEL instruction produces undefined results.
There is no defined behavior for out-of-range shuffle indices for the SubgroupShuffleINTEL instructions.
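The two rules above can be modeled on the host to reason about which shuffle results are meaningful. This is an illustrative model only, not device code: the subgroup size of 8 and the function name are hypothetical, `active[i]` marks invocations that execute the shuffle, and `defined[i]` is cleared wherever the text leaves the result undefined.

```c
#include <stddef.h>

#define SG_SIZE 8  /* hypothetical subgroup size for this host-side model */

/* Invocation i reads `data` from invocation index[i]. defined[i] is 0 when
 * invocation i did not execute the shuffle, when its source index is out of
 * range, or when the source invocation did not execute the shuffle. */
void model_shuffle(const int data[SG_SIZE], const unsigned index[SG_SIZE],
                   const int active[SG_SIZE], int result[SG_SIZE],
                   int defined[SG_SIZE])
{
    for (size_t i = 0; i < SG_SIZE; ++i) {
        defined[i] = 0;
        if (!active[i])
            continue;  /* this invocation skipped the shuffle */
        unsigned src = index[i];
        if (src < SG_SIZE && active[src]) {
            result[i] = data[src];
            defined[i] = 1;
        }
    }
}
```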
The SubgroupBufferBlockIOINTEL and SubgroupImageBlockIOINTEL instructions are only guaranteed to work correctly if placed strictly within uniform control flow within the subgroup. This ensures that if any invocation executes it, all invocations will execute it. If placed elsewhere, behavior is undefined.
There is no defined out-of-range behavior for the SubgroupBufferBlockIOINTEL instructions.
The SubgroupImageBlockIOINTEL instructions do support bounds checking; however, they bounds-check to the image width in units of uints, not in units of image elements. This means:
If the image has an Image Format size equal to the size of a uint (four bytes, for example Rgba8), the image will be correctly bounds-checked. In this case, out-of-bounds reads will return the edge image element (the equivalent of ClampToEdge), and out-of-bounds writes will be ignored.
If the image has an Image Format size less than the size of a uint (such as R8), the entire image is addressable; however, bounds checking will occur too late. For this reason, take extra care to avoid out-of-bounds reads and writes, since out-of-bounds reads may return invalid data and out-of-bounds writes may corrupt other images or buffers unpredictably.
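One reading of this rule, stated here as an interpretation rather than documented hardware behavior, is that the per-row limit is the image width in elements multiplied by four bytes, while the row actually occupies the width multiplied by the element size. The hypothetical helper below computes how far past the real end of a row an access could reach before the check applies.

```c
#include <stdint.h>

/* Assumes the checked limit is `width_elems` uints (4 bytes each) per row,
 * while the row really holds `width_elems * elem_bytes` bytes. Returns the
 * number of bytes beyond the real row end that are not bounds-checked. */
uint32_t unchecked_overrun_bytes(uint32_t width_elems, uint32_t elem_bytes)
{
    uint64_t checked_limit = (uint64_t)width_elems * 4u;        /* bytes */
    uint64_t actual_row = (uint64_t)width_elems * elem_bytes;   /* bytes */
    return checked_limit > actual_row
               ? (uint32_t)(checked_limit - actual_row)
               : 0u;
}
```

For a 64-element-wide R8 image, this gives 192 unchecked bytes per row; for Rgba8 (four bytes per element) it gives 0, matching the "correctly bounds-checked" case.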
The following restrictions apply to the SubgroupBufferBlockIOINTEL instructions:
The pointer Ptr must be 32-bit (4-byte) aligned for reads, and must be 128-bit (16-byte) aligned for writes.
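A host-side pre-flight check for these alignment requirements might look like the sketch below. The function name is hypothetical, and it takes the device address as an integer; a caller holding a `void *` would pass `(uint64_t)(uintptr_t)ptr`.

```c
#include <stdint.h>

/* Returns 1 if `address` meets the SubgroupBufferBlockIOINTEL alignment
 * rule: 4 bytes for reads, 16 bytes for writes. */
int block_io_ptr_ok(uint64_t address, int is_write)
{
    uint64_t align = is_write ? 16u : 4u;
    return (address & (align - 1u)) == 0;
}
```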
The following restrictions apply to the SubgroupImageBlockIOINTEL instructions:
The behavior of the SubgroupImageBlockIOINTEL instructions is undefined for images with an element size greater than four bytes (such as Rgba32f).
The following restrictions apply to the OpSubgroupImageBlockWriteINTEL instruction:
Unlike the image block read instruction, which may read from any arbitrary byte offset, the x-component of the byte coordinate for the image block write instruction must be a multiple of four; in other words, the write must begin at a 32-bit boundary. There is no restriction on the y-component of the coordinate.
Floating-Point Atomics
‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_float_atomics must support additional atomic instructions, capabilities, and types.
Atomic Load, Store, and Exchange
If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_LOAD_STORE or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_LOAD_STORE, then for the Atomic Instructions OpAtomicLoad, OpAtomicStore, and OpAtomicExchange:
16-bit floating-point types are supported for the Result Type and type of Value.
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_LOAD_STORE, the Pointer operand may be a pointer to the CrossWorkgroup Storage Class.
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_LOAD_STORE, the Pointer operand may be a pointer to the Workgroup Storage Class.
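A compiler targeting a specific device could translate the fp16 flags into the storage classes it may use for these instructions. In the sketch below, the flag macros are local placeholders for the ZE_DEVICE_FP_ATOMIC_EXT_FLAG_* bits named above (use the definitions from the Level-Zero headers in real code), the storage class values are SPIR-V enumerants, and the function name is hypothetical.

```c
#include <stdint.h>

/* Placeholder bits; take the real values from the Level-Zero headers. */
#define FP_ATOMIC_GLOBAL_LOAD_STORE (1u << 0)
#define FP_ATOMIC_LOCAL_LOAD_STORE  (1u << 16)

/* SPIR-V Storage Class enumerants. */
enum { SC_WORKGROUP = 4, SC_CROSS_WORKGROUP = 5 };

/* May an fp16 OpAtomicLoad/OpAtomicStore/OpAtomicExchange use a pointer in
 * `storage_class`, given the device's fp16Flags? */
int fp16_load_store_allowed(uint32_t fp16_flags, unsigned storage_class)
{
    switch (storage_class) {
    case SC_CROSS_WORKGROUP:
        return (fp16_flags & FP_ATOMIC_GLOBAL_LOAD_STORE) != 0;
    case SC_WORKGROUP:
        return (fp16_flags & FP_ATOMIC_LOCAL_LOAD_STORE) != 0;
    default:
        return 0;  /* this sketch grants only what the two flags state */
    }
}
```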
Atomic Add and Subtract
If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and ze_device_fp_atomic_ext_flags_t.fp16Flags, ze_device_fp_atomic_ext_flags_t.fp32Flags, or ze_device_fp_atomic_ext_flags_t.fp64Flags include ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, then the environment must accept modules that declare use of the extensions SPV_EXT_shader_atomic_float_add and SPV_EXT_shader_atomic_float16_add.
Additionally:
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the AtomicFloat16AddEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp32Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the AtomicFloat32AddEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the AtomicFloat64AddEXT capability must be supported.
For the Atomic Instruction OpAtomicFAddEXT added by these extensions:
When ze_device_fp_atomic_ext_flags_t.fp32Flags, ze_device_fp_atomic_ext_flags_t.fp64Flags, or ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD, the Pointer operand may be a pointer to the CrossWorkgroup Storage Class.
When ze_device_fp_atomic_ext_flags_t.fp32Flags, ze_device_fp_atomic_ext_flags_t.fp64Flags, or ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, the Pointer operand may be a pointer to the Workgroup Storage Class.
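The capability rules above map each per-width flag field to one capability. The sketch below shows that mapping; the flag macros are placeholders for ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_ADD and ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_ADD, and the capability bitmask is a hypothetical encoding chosen for illustration, not a SPIR-V or Level-Zero type.

```c
#include <stdint.h>

/* Placeholder bits; take the real values from the Level-Zero headers. */
#define FP_ATOMIC_GLOBAL_ADD (1u << 1)
#define FP_ATOMIC_LOCAL_ADD  (1u << 17)

/* Hypothetical bitmask of capabilities a compiler may declare. */
enum {
    CAP_ATOMIC_FLOAT16_ADD = 1 << 0,  /* AtomicFloat16AddEXT */
    CAP_ATOMIC_FLOAT32_ADD = 1 << 1,  /* AtomicFloat32AddEXT */
    CAP_ATOMIC_FLOAT64_ADD = 1 << 2   /* AtomicFloat64AddEXT */
};

static int has_add(uint32_t flags)
{
    return (flags & (FP_ATOMIC_GLOBAL_ADD | FP_ATOMIC_LOCAL_ADD)) != 0;
}

/* Returns the AtomicFloat*AddEXT capabilities the environment must accept. */
unsigned fadd_capabilities(uint32_t fp16_flags, uint32_t fp32_flags,
                           uint32_t fp64_flags)
{
    unsigned caps = 0;
    if (has_add(fp16_flags)) caps |= CAP_ATOMIC_FLOAT16_ADD;
    if (has_add(fp32_flags)) caps |= CAP_ATOMIC_FLOAT32_ADD;
    if (has_add(fp64_flags)) caps |= CAP_ATOMIC_FLOAT64_ADD;
    return caps;
}
```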
Atomic Min and Max
If the ‘oneAPI’ Level-Zero API environment supports the extension ZE_extension_float_atomics and the ze_device_fp_atomic_ext_flags_t.fp32Flags, ze_device_fp_atomic_ext_flags_t.fp64Flags, or ze_device_fp_atomic_ext_flags_t.fp16Flags bitfields include ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, then the environment must accept modules that declare use of the extension SPV_EXT_shader_atomic_float_min_max.
Additionally:
When ze_device_fp_atomic_ext_flags_t.fp32Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat32MinMaxEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat64MinMaxEXT capability must be supported.
When ze_device_fp_atomic_ext_flags_t.fp16Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX or ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the AtomicFloat16MinMaxEXT capability must be supported.
For the Atomic Instructions OpAtomicFMinEXT and OpAtomicFMaxEXT added by this extension:
When ze_device_fp_atomic_ext_flags_t.fp16Flags, ze_device_fp_atomic_ext_flags_t.fp32Flags, or ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_GLOBAL_MIN_MAX, the Pointer operand may be a pointer to the CrossWorkgroup Storage Class.
When ze_device_fp_atomic_ext_flags_t.fp16Flags, ze_device_fp_atomic_ext_flags_t.fp32Flags, or ze_device_fp_atomic_ext_flags_t.fp64Flags includes ZE_DEVICE_FP_ATOMIC_EXT_FLAG_LOCAL_MIN_MAX, the Pointer operand may be a pointer to the Workgroup Storage Class.
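A compiler might decide whether a given OpAtomicFMinEXT or OpAtomicFMaxEXT is legal on the target device with a check like the one below. This sketch reads the flags of the matching element width only, which is a conservative interpretation of the rules above; the flag macros are placeholders for the ZE_DEVICE_FP_ATOMIC_EXT_FLAG_*_MIN_MAX bits and the function name is hypothetical.

```c
#include <stdint.h>

/* Placeholder bits; take the real values from the Level-Zero headers. */
#define FP_ATOMIC_GLOBAL_MIN_MAX (1u << 2)
#define FP_ATOMIC_LOCAL_MIN_MAX  (1u << 18)

enum { SC_WORKGROUP = 4, SC_CROSS_WORKGROUP = 5 };  /* SPIR-V enumerants */

/* Is a float min/max atomic on a `width`-bit float through a pointer in
 * `storage_class` allowed, given the device's per-width flags? */
int fminmax_allowed(unsigned width, unsigned storage_class,
                    uint32_t fp16_flags, uint32_t fp32_flags,
                    uint32_t fp64_flags)
{
    uint32_t flags;
    switch (width) {
    case 16: flags = fp16_flags; break;
    case 32: flags = fp32_flags; break;
    case 64: flags = fp64_flags; break;
    default: return 0;
    }
    if (storage_class == SC_CROSS_WORKGROUP)
        return (flags & FP_ATOMIC_GLOBAL_MIN_MAX) != 0;
    if (storage_class == SC_WORKGROUP)
        return (flags & FP_ATOMIC_LOCAL_MIN_MAX) != 0;
    return 0;
}
```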
Extended Subgroups
‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_subgroups must support additional subgroup instructions, capabilities, and types.
Extended Types
The following Groups instructions must be supported with Scope for Execution equal to Subgroup:
OpGroupBroadcast
OpGroupIAdd, OpGroupFAdd
OpGroupSMin, OpGroupUMin, OpGroupFMin
OpGroupSMax, OpGroupUMax, OpGroupFMax
For these instructions, valid types for Value are:
Scalars of supported types:
OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)
Additionally, for OpGroupBroadcast, valid types for Value are:
OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components of supported types:
OpTypeInt (equivalent to charn, ucharn, shortn, ushortn, intn, uintn, longn, and ulongn)
OpTypeFloat (equivalent to halfn, floatn, and doublen)
Vote
The following capabilities must be supported:
GroupNonUniform
GroupNonUniformVote
For instructions requiring these capabilities, Scope for Execution may be:
Subgroup
For the instruction OpGroupNonUniformAllEqual, valid types for Value are:
Scalars of supported types:
OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)
Ballot
The following capabilities must be supported:
GroupNonUniformBallot
For instructions requiring these capabilities, Scope for Execution may be:
Subgroup
For the non-uniform broadcast instruction OpGroupNonUniformBroadcast, valid types for Value are:
Scalars of supported types:
OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)
OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components of supported types:
OpTypeInt (equivalent to charn, ucharn, shortn, ushortn, intn, uintn, longn, and ulongn)
OpTypeFloat (equivalent to halfn, floatn, and doublen)
For the instruction OpGroupNonUniformBroadcastFirst, valid types for Value are:
Scalars of supported types:
OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)
For the instruction OpGroupNonUniformBallot, the valid Result Type is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).
For the instructions OpGroupNonUniformInverseBallot, OpGroupNonUniformBallotBitExtract, OpGroupNonUniformBallotBitCount, OpGroupNonUniformBallotFindLSB, and OpGroupNonUniformBallotFindMSB, the valid type for Value is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).
For built-in variables decorated with SubgroupEqMask, SubgroupGeMask, SubgroupGtMask, SubgroupLeMask, or SubgroupLtMask, the supported variable type is an OpTypeVector with four Component Count components of OpTypeInt, with Width equal to 32 and Signedness equal to 0 (equivalent to uint4).
Non-Uniform Arithmetic
The following capabilities must be supported:
GroupNonUniformArithmetic
For instructions requiring these capabilities, Scope for Execution may be:
Subgroup
For the instructions OpGroupNonUniformLogicalAnd, OpGroupNonUniformLogicalOr, and OpGroupNonUniformLogicalXor, the valid type for Value is OpTypeBool.
Otherwise, for the GroupNonUniformArithmetic scan and reduction instructions, valid types for Value are:
Scalars of supported types:
OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)
For the GroupNonUniformArithmetic scan and reduction instructions, the optional ClusterSize operand must not be present.
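To make the scan and reduction group operations concrete, the host-side model below emulates OpGroupNonUniformIAdd with the Reduce, InclusiveScan, and ExclusiveScan operations: only active invocations contribute, and an exclusive scan starts from the additive identity 0. The subgroup size and names are hypothetical, and the model is illustrative rather than a description of any device implementation.

```c
#include <stddef.h>

#define SG_SIZE 8  /* hypothetical subgroup size for this host-side model */

enum group_op { OP_REDUCE, OP_INCLUSIVE_SCAN, OP_EXCLUSIVE_SCAN };

/* Only invocations with active[i] != 0 participate and receive a result. */
void model_iadd(enum group_op op, const int value[SG_SIZE],
                const int active[SG_SIZE], int result[SG_SIZE])
{
    int total = 0;
    for (size_t i = 0; i < SG_SIZE; ++i)
        if (active[i])
            total += value[i];

    int running = 0;  /* sum over active invocations with a lower id */
    for (size_t i = 0; i < SG_SIZE; ++i) {
        if (!active[i])
            continue;
        switch (op) {
        case OP_REDUCE:         result[i] = total; break;
        case OP_INCLUSIVE_SCAN: result[i] = running + value[i]; break;
        case OP_EXCLUSIVE_SCAN: result[i] = running; break;
        }
        running += value[i];
    }
}
```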
Shuffles
The following capabilities must be supported:
GroupNonUniformShuffle
For instructions requiring these capabilities, Scope for Execution may be:
Subgroup
For the instructions OpGroupNonUniformShuffle and OpGroupNonUniformShuffleXor requiring these capabilities, valid types for Value are:
Scalars of supported types:
OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)
Relative Shuffles
The following capabilities must be supported:
GroupNonUniformShuffleRelative
For instructions requiring these capabilities, Scope for Execution may be:
Subgroup
For the GroupNonUniformShuffleRelative instructions, valid types for Value are:
Scalars of supported types:
OpTypeInt (equivalent to char, uchar, short, ushort, int, uint, long, and ulong)
OpTypeFloat (equivalent to half, float, and double)
Clustered Reductions
The following capabilities must be supported:
GroupNonUniformClustered
For instructions requiring these capabilities, Scope for Execution may be:
Subgroup
When the GroupNonUniformClustered capability is declared, the GroupNonUniformArithmetic scan and reduction instructions may include the optional ClusterSize operand.
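With ClusterSize present, the reduction is performed independently within consecutive clusters of ClusterSize invocations. The host-side model below illustrates this for an integer add; it assumes a hypothetical subgroup size of 8 and rejects cluster sizes that are not a power of two or exceed the subgroup size (SPIR-V requires a power of two and leaves larger sizes undefined). Names are hypothetical.

```c
#include <stddef.h>

#define SG_SIZE 8  /* hypothetical subgroup size */

/* Each active invocation receives the sum over the active invocations of
 * its own cluster. Returns 0 for an invalid cluster_size, 1 otherwise. */
int model_clustered_iadd(unsigned cluster_size, const int value[SG_SIZE],
                         const int active[SG_SIZE], int result[SG_SIZE])
{
    if (cluster_size == 0 || cluster_size > SG_SIZE ||
        (cluster_size & (cluster_size - 1)) != 0)
        return 0;

    for (size_t base = 0; base < SG_SIZE; base += cluster_size) {
        int sum = 0;
        for (size_t i = base; i < base + cluster_size; ++i)
            if (active[i])
                sum += value[i];
        for (size_t i = base; i < base + cluster_size; ++i)
            if (active[i])
                result[i] = sum;
    }
    return 1;
}
```

A ClusterSize equal to the subgroup size gives the same result as a plain Reduce.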
Linkonce ODR
‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_linkonce_odr must accept SPIR-V modules that declare use of the SPV_KHR_linkonce_odr extension via OpExtension.
When use of the SPV_KHR_linkonce_odr extension is declared in the module via OpExtension, the environment must accept modules that include the LinkOnceODR linkage type.
Bfloat16 Conversions
‘oneAPI’ Level-Zero API environments supporting the extension ZE_extension_bfloat16_conversions must accept SPIR-V modules that declare use of the SPV_INTEL_bfloat16_conversion extension via OpExtension.
When use of the SPV_INTEL_bfloat16_conversion extension is declared in the module via OpExtension, the environment must accept modules that declare the Bfloat16ConversionINTEL capability.
For the instructions OpConvertFToBF16INTEL and OpConvertBF16ToFINTEL added by the extension:
Valid types for Result Type, Float Value, and Bfloat16 Value are scalars and OpTypeVectors with 2, 3, 4, 8, or 16 Component Count components.
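For host-side testing of kernels that use these instructions, a reference conversion can help. bfloat16 keeps the upper 16 bits of an IEEE-754 binary32 value, so the conversion back to float is exact. The float-to-bfloat16 direction below rounds to nearest even, which is an assumption for this sketch; check the SPV_INTEL_bfloat16_conversion specification for the rounding behavior of OpConvertFToBF16INTEL. Function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* float -> bfloat16 bits, assuming round-to-nearest-even. */
uint16_t float_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    /* NaN: truncate and set the quiet bit so the result stays a NaN. */
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0)
        return (uint16_t)((bits >> 16) | 0x0040u);

    uint32_t lsb = (bits >> 16) & 1u;  /* ties round to the even result */
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}

/* bfloat16 bits -> float; exact because the dropped bits are zero. */
float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 1.00390625f (exactly halfway between two bfloat16 values) rounds down to the even encoding 0x3F80, that is, 1.0.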
Numerical Compliance
The ‘oneAPI’ Level-Zero environment will meet or exceed the numerical compliance requirements defined in the OpenCL SPIR-V Environment Specification. See: Numerical Compliance.
Image Addressing and Filtering
The ‘oneAPI’ Level-Zero environment image addressing and filtering behavior is compatible with the behavior defined in the OpenCL SPIR-V Environment Specification. See: Image Addressing and Filtering.