Execution Model#

This section describes the oneIPL execution environment. It covers how data is provided to computational functions in Use of Queues, support for multiple devices in Device Usage, synchronous and asynchronous execution models in Asynchronous Execution, and thread safety guarantees in Host Thread Safety.

Use of Queues#

The sycl::queue defined by the SYCL 2020 specification is used to specify the device, and the features enabled on that device, to which a task may be enqueued. oneIPL has two forms of computational functions: class-based Member Functions and stand-alone Non-Member Functions. Because these forms may interact with the sycl::queue in different ways, the assumptions for each are described in a separate section below.

Member Functions#

The oneIPL class-based API, used for ipl::image, requires a sycl::queue as an argument to the constructor. Implementations shall submit explicit copy operations and shall check the USM pointer type when a USM pointer is passed as a constructor argument. The check may be disabled for performance reasons; it is recommended that such checks be controllable through build options.
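
For illustration, the kind of check involved might resemble the sketch below; sycl::get_pointer_type is part of SYCL 2020, while the function name and the IPL_DISABLE_USM_CHECKS build option are hypothetical and not part of this specification.

#include <stdexcept>
#include <sycl/sycl.hpp>

// Sketch of the check an implementation might perform when a USM pointer is
// passed to the ipl::image constructor. The function name and the build
// option below are illustrative only.
void check_usm_pointer(const sycl::queue& q, const void* ptr) {
#ifndef IPL_DISABLE_USM_CHECKS  // hypothetical build option to disable the check
    sycl::usm::alloc kind = sycl::get_pointer_type(ptr, q.get_context());
    if (kind == sycl::usm::alloc::unknown) {
        throw std::invalid_argument("pointer is not a USM allocation for this context");
    }
#endif
}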

Non-Member Functions#

The oneIPL non-member computational function takes a sycl::queue reference as its first parameter and a std::vector<sycl::event> with an empty default value as its last parameter. Typically, a function is implemented with the following template parameters:

  • ComputeT - the type in which computations are performed

  • SrcImageT or SrcBatchT - the source image/batch type

  • DstImageT or DstBatchT - the destination image/batch type

An API may have multiple input or output template type parameters and corresponding function arguments.

In addition to the input and output images, the algorithmic parameters must be defined in a separate class with a _spec suffix. If a <function> requires additional arguments, they are provided through the <function>_spec class and passed via the corresponding function argument (see the usage sketch after the signature below). The spec class is parameterized by the required template arguments.

For example, here is the Gaussian filter function API:

template <typename ComputeT = float,
          typename SrcImageT,
          typename DstImageT>
sycl::event gaussian(
    sycl::queue& queue,
    SrcImageT& src,
    DstImageT& dst,
    const gaussian_spec_t<SrcImageT::format_v, typename SrcImageT::data_t, ComputeT>& spec,
    const std::vector<sycl::event>& dependencies = {});

The ComputeT template argument defines the data type used for internal computations and defaults to float. If an implementation is available for a specific type, it shall be provided as a specialization.
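
For illustration, a call to this function might look like the following sketch; only the gaussian signature above is given by this section, while the enclosing ipl namespace and the spec constructor arguments are assumptions.

// Usage sketch: explicit ComputeT = float and the default (empty) dependency
// list. The spec constructor arguments are hypothetical; only the gaussian
// signature shown above is normative.
template <typename SrcImageT, typename DstImageT>
void blur(sycl::queue& queue, SrcImageT& src, DstImageT& dst) {
    ipl::gaussian_spec_t<SrcImageT::format_v, typename SrcImageT::data_t, float> spec{
        /* algorithm parameters, e.g. kernel size and sigma (hypothetical) */};

    sycl::event done = ipl::gaussian<float>(queue, src, dst, spec);
    done.wait();  // wait for completion before reading dst on the host
}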

The generic algorithm follows this pseudocode template:

template <typename ComputeT = float,
          typename SrcImageT,
          typename DstImageT>
sycl::event algorithm(...)
{
  // ... some implementation code; the device kernel part:
  // 1. Cast source data from SrcImageT::data_t to ComputeT
  // 2. Perform the computations in the ComputeT data type
  // 3. Cast the results from ComputeT to DstImageT::data_t
}
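
A minimal, self-contained SYCL sketch of the three steps above, with std::uint8_t pixel data and ComputeT = float; the scaling operation is only a stand-in for real filter arithmetic.

#include <cstddef>
#include <cstdint>
#include <sycl/sycl.hpp>

int main() {
    sycl::queue q;
    constexpr std::size_t n = 1024;

    std::uint8_t* src = sycl::malloc_shared<std::uint8_t>(n, q);
    std::uint8_t* dst = sycl::malloc_shared<std::uint8_t>(n, q);
    for (std::size_t i = 0; i < n; ++i) src[i] = static_cast<std::uint8_t>(i);

    q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
         float v = static_cast<float>(src[i]);         // 1. SrcImageT::data_t -> ComputeT
         v = v * 0.5f;                                  // 2. compute in ComputeT (stand-in math)
         dst[i] = static_cast<std::uint8_t>(v + 0.5f);  // 3. ComputeT -> DstImageT::data_t
     }).wait();

    sycl::free(src, q);
    sycl::free(dst, q);
    return 0;
}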

If ComputeT is not available on the device, the implementation shall handle it as an error.
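
For instance, double-precision computations require device support; the sketch below shows the kind of capability check an implementation might perform before launching kernels. The function name and exception type are illustrative; see Error Handling for the exceptions an implementation shall use.

#include <stdexcept>
#include <type_traits>
#include <sycl/sycl.hpp>

// Sketch: reject ComputeT types the device cannot execute.
template <typename ComputeT>
void verify_compute_type(const sycl::queue& q) {
    const sycl::device dev = q.get_device();
    if constexpr (std::is_same_v<ComputeT, double>) {
        if (!dev.has(sycl::aspect::fp64))
            throw std::runtime_error("ComputeT=double is not supported on this device");
    } else if constexpr (std::is_same_v<ComputeT, sycl::half>) {
        if (!dev.has(sycl::aspect::fp16))
            throw std::runtime_error("ComputeT=sycl::half is not supported on this device");
    }
}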

All computations performed by the function are done on the hardware device(s) associated with the queue, with possible aid from the host, unless otherwise specified. In the case of an ordered queue, all computations are also ordered with respect to other kernels as if enqueued on that queue.
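
For example, an in-order queue can be created as follows; every command submitted to it, including those enqueued by oneIPL functions, executes in submission order.

#include <sycl/sycl.hpp>

int main() {
    // An in-order (ordered) queue: kernels enqueued by oneIPL functions are
    // ordered with respect to all other kernels submitted to this queue.
    sycl::queue q{sycl::default_selector_v, sycl::property::queue::in_order{}};
    return 0;
}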

A particular oneIPL implementation may not support the execution of a given oneIPL function on the specified device(s). In this case, the implementation shall either perform the computation on the host or throw an exception. See Error Handling for possible exceptions.

Device Usage#

oneIPL does not provide any interfaces for controlling device usage, e.g., controlling the number of cores used on a CPU or the number of execution units used on a GPU. However, such functionality may be available by partitioning a sycl::device instance into subdevices, when supported by the device.

When given a queue associated with such a subdevice, oneIPL implementations shall perform computation only on that subdevice.
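
As a sketch, a CPU device that supports partitioning by NUMA affinity can be split as follows; a queue created on one of the resulting subdevices restricts oneIPL computation to that subdevice.

#include <vector>
#include <sycl/sycl.hpp>

int main() {
    sycl::device cpu{sycl::cpu_selector_v};

    // Partition the device into subdevices along NUMA domains (supported
    // devices only; otherwise create_sub_devices throws).
    std::vector<sycl::device> subdevices =
        cpu.create_sub_devices<sycl::info::partition_property::partition_by_affinity_domain>(
            sycl::info::partition_affinity_domain::numa);

    // oneIPL functions given this queue shall compute only on subdevices[0].
    sycl::queue q{subdevices[0]};
    return 0;
}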

Asynchronous Execution#

oneIPL is designed to allow asynchronous execution of computational functions, facilitating concurrent usage of multiple devices in the system. Each computational function enqueues work to be performed on the selected device, and may, but is not required to, return before execution completes.

Hence, it is the calling application’s responsibility to ensure that any inputs are valid until the computation is completed, and to wait for computation completion before reading any outputs. This behavior can be achieved automatically using DPC++ buffers, or manually when using Unified Shared Memory (USM) pointers, as described in the sections below.
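
With USM, this means keeping the allocations alive and waiting on the returned event (or the queue) before touching the results on the host. A generic caller-side sketch, where the callable run stands in for any oneIPL computational function returning a sycl::event:

#include <cstddef>
#include <vector>
#include <sycl/sycl.hpp>

// The allocations that `run` reads and writes must stay valid until its event
// completes, and the USM output must not be read on the host before then.
template <typename RunIplFunction>
void run_and_read(sycl::queue& q, RunIplFunction run, const float* usm_dst, std::size_t n) {
    sycl::event done = run();   // enqueue work; may return before it finishes
    done.wait();                // or q.wait(); required before reading results

    std::vector<float> host(n);
    q.memcpy(host.data(), usm_dst, n * sizeof(float)).wait();  // safe after completion
}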

Unless otherwise specified, asynchronous execution is allowed, but not guaranteed, by any oneIPL computational function, and may vary between implementations and/or versions. oneIPL implementations must document whether execution is guaranteed to be asynchronous for each supported function. Regardless, calling applications shall not launch any oneIPL computational function with a dependency on a subsequent oneIPL function call, even if that function executes asynchronously (i.e., a oneIPL implementation may assume no antidependencies are present). This guarantee allows oneIPL implementations to reserve resources for execution without risking deadlock.

Synchronization#

ipl::image objects automatically manage synchronization between kernel launches linked by a data dependency (read-after-write, write-after-write, or write-after-read). oneIPL functions are not required to perform any additional synchronization of ipl::image arguments. However, all functions return a sycl::event and accept a std::vector<sycl::event>, so user-provided kernels and library calls can be synchronized explicitly.
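
For example, the event returned by a user-provided kernel can be passed to a subsequent library call as a dependency. A sketch using the gaussian signature above, with the image and spec types left generic:

// The event from a user kernel that writes the source image is passed as a
// dependency, so gaussian does not start until that kernel has completed.
template <typename SrcImageT, typename DstImageT, typename SpecT>
sycl::event preprocess_then_blur(sycl::queue& q, SrcImageT& src, DstImageT& dst,
                                 const SpecT& spec, sycl::event preprocess_done) {
    return ipl::gaussian<float>(q, src, dst, spec, {preprocess_done});
}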

Host Thread Safety#

All oneIPL member and non-member functions are host thread-safe; that is, they may be safely called simultaneously from concurrent host threads. However, oneIPL objects may not be shared between concurrent host threads unless otherwise specified.
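
For example, two host threads may call oneIPL functions concurrently as long as each works on its own objects. In the sketch below, process_images is a placeholder for code that constructs its own ipl::image objects bound to the given queue and invokes oneIPL computational functions on them.

#include <sycl/sycl.hpp>
#include <thread>

// Placeholder: construct ipl::image objects bound to q and call oneIPL
// computational functions on them; no oneIPL objects are shared between threads.
void process_images(sycl::queue q) { /* ... */ }

int main() {
    sycl::queue q1{sycl::default_selector_v};
    sycl::queue q2{sycl::default_selector_v};

    // Concurrent calls into oneIPL from two host threads are safe because
    // each thread uses only its own queue and images.
    std::thread t1(process_images, q1);
    std::thread t2(process_images, q2);
    t1.join();
    t2.join();
    return 0;
}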