This section describes the execution environment: how data is provided to computational routines (Use of Queues), support for multiple devices (Device Usage), synchronous and asynchronous execution models (Asynchronous Execution), and host thread safety (Host Thread Safety).
Use of Queues
A sycl::queue, as defined in the oneAPI DPC++ specification, specifies the device on which a task will be enqueued and the features enabled on that device.
Each oneDTL non-member computational routine must take a sycl::queue reference as its first parameter.
All functions must also take the data type and the data dimension as template parameters. A routine typically has the following template parameters:
DataT - type of the source or destination data (integer or floating point)
Dimension - dimension of the data array
Example: the ZFP compression method.
template <typename DataT, std::uint32_t Dimension>
sycl::event encode(sycl::queue& queue,
                   zfp_field<DataT, Dimension>& source,
                   zfp_compressed_stream<DataT, Dimension>& stream);
All computation performed by the routine must be done on the hardware device(s) associated with this queue, with possible aid from the host, unless otherwise specified. In the case of an in-order queue, all computation must also be ordered with respect to other kernels as if enqueued on that queue.
A particular oneDTL implementation may not support the execution of a given oneDTL routine on the specified device(s). In this case, the implementation must either perform the computation on the host or throw an exception. See Error Handling for the possible exceptions.
Device Usage
Device usage control (e.g. controlling the number of cores used on a CPU, or the number of execution units on a GPU) can be done by partitioning a sycl::device instance into subdevices, when the device supports it.
When given a queue associated with such a subdevice, a oneDTL implementation that supports the device must perform computation only on that subdevice.
Asynchronous Execution
The oneDTL API is designed to allow asynchronous execution of computational routines, to facilitate concurrent use of multiple devices in the system. Each computational routine enqueues a job to be performed on the selected device and may return before execution completes, including when an error occurs.
Calling applications must ensure that all inputs remain valid until computation is complete, and must wait for computation to complete before reading any outputs. DPC++ buffers provide this behavior automatically.
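The lifetime rule above can be illustrated without a SYCL runtime. The following host-only sketch is an analogy, not oneDTL code: std::async stands in for a routine that enqueues work, the returned std::future plays the role of a sycl::event, and sum_async and sum_and_wait are hypothetical names chosen for this illustration. The caller keeps the input alive until completion and reads the output only after waiting.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for an asynchronous routine: it starts the work and
// returns without waiting for it to finish. The future plays the role of a
// sycl::event. The routine only borrows `input` and `output`; nothing is copied.
std::future<void> sum_async(const std::vector<int>& input, long long& output) {
    return std::async(std::launch::async, [&input, &output] {
        output = std::accumulate(input.begin(), input.end(), 0LL);
    });
}

// Caller side: keep inputs valid, wait, then read outputs.
long long sum_and_wait(const std::vector<int>& input) {
    long long result = 0;
    std::future<void> done = sum_async(input, result);
    // `input` and `result` must stay alive and untouched here,
    // because the work may still be running.
    done.wait();    // analogous to waiting on the returned event
    return result;  // safe to read only after completion
}
```

With a real queue, the same pattern would presumably wait on the sycl::event returned by the routine, or rely on buffer accessors to order the read after the computation.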
Unless otherwise specified, asynchronous execution is allowed, but not guaranteed, by any oneDTL computational routine, and may vary between implementations and/or versions. oneDTL implementations must clearly document whether execution is guaranteed to be asynchronous for each supported routine. Regardless, calling applications must not launch any oneDTL computational routine with a dependency on a future oneDTL API call, even if this computational routine executes asynchronously (i.e., a oneDTL implementation may assume that no anti-dependencies are present). This guarantee allows oneDTL implementations to reserve resources for execution without risking deadlock.
Host Thread Safety
All oneDTL member and non-member functions must be host thread safe. That is, they may be safely called simultaneously from concurrent host threads. However, oneDTL objects should not be shared between concurrent host threads unless otherwise specified.
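A minimal host-only sketch of this usage pattern, assuming a hypothetical thread-safe routine checksum and a hypothetical per-thread state object scratch (neither is part of the oneDTL API): every host thread calls the routine concurrently, but each thread owns its own object, so no object is shared between threads.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-call state, standing in for a library object that
// must not be shared between concurrent host threads.
struct scratch {
    std::uint64_t accumulator = 0;
};

// Hypothetical routine: safe to call concurrently because it touches only
// its arguments and keeps no hidden global state.
std::uint64_t checksum(scratch& state, const std::vector<std::uint32_t>& data) {
    state.accumulator = 0;
    for (std::uint32_t value : data) {
        state.accumulator += value;
    }
    return state.accumulator;
}

// Launches one host thread per input; each thread writes to its own result slot.
std::vector<std::uint64_t> checksum_in_parallel(
        const std::vector<std::vector<std::uint32_t>>& inputs) {
    std::vector<std::uint64_t> results(inputs.size());
    std::vector<std::thread> workers;
    workers.reserve(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        workers.emplace_back([&inputs, &results, i] {
            scratch local;  // one object per thread, never shared
            results[i] = checksum(local, inputs[i]);
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
    return results;
}
```

Sharing a single scratch instance across the threads would introduce a data race on accumulator, which is the situation the rule on shared objects is meant to prevent.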