Execution Model#

This section describes the execution environment common to all oneIPL functionality. The execution environment includes how data is provided to computational routines in Use of Queues, support for multiple devices in Device Usage, the synchronous and asynchronous execution models in Asynchronous Execution, and thread safety guarantees in Host Thread Safety.

Use of Queues#

The sycl::queue defined in the SYCL 2020 specification is used to specify the device, and the features enabled on that device, on which a task will be enqueued. oneIPL shall have two forms of computational routines: class-based Member Functions and stand-alone Non-Member Functions. As these forms may interact with the sycl::queue in different ways, a section for each is provided to describe the assumptions.
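
For illustration, the minimal sketch below (not part of the oneIPL API) constructs a sycl::queue bound to a particular device; any oneIPL routine given this queue enqueues its work on that device.

#include <sycl/sycl.hpp>
#include <iostream>

int main()
{
    // Build a queue on the default device; a GPU-specific queue could be
    // created with sycl::gpu_selector_v instead.
    sycl::queue queue{sycl::default_selector_v};

    // oneIPL computational routines given this queue enqueue their work on
    // the device (and with the features) the queue was constructed with.
    std::cout << "Selected device: "
              << queue.get_device().get_info<sycl::info::device::name>()
              << std::endl;
    return 0;
}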

Non-Member Functions#

Each oneIPL non-member computational routine takes a sycl::queue reference as its first parameter and a std::vector<sycl::event> as its last parameter, which shall have an empty default value. The routine typically has the following template parameters:

  • ComputeT - the type in which computations are performed

  • SrcImageT or SrcBatchT - the source image/batch type

  • DstImageT or DstBatchT - the destination image/batch type

An API may have multiple input/output type template arguments and function arguments.

In addition to the input and output images, algorithmic parameters must be defined in a separate class with the _spec postfix. If a <function> requires additional arguments, they shall be provided in the <function>_spec class and passed through a corresponding argument of the function. The spec class shall be parameterized by the required template arguments. For example, the Gaussian filter API:

template <typename ComputeT = float,
          typename SrcImageT,
          typename DstImageT>
sycl::event gaussian(sycl::queue&                                                                      queue,
                     SrcImageT&                                                                        src,
                     DstImageT&                                                                        dst,
                     const gaussian_spec_t<SrcImageT::format_v, typename SrcImageT::data_t, ComputeT>& spec,
                     const std::vector<sycl::event>& dependencies = {})

The ComputeT template argument shall define the datatype used for internal computations and have a default value of float. If an implementation for a specific type is available, it shall be implemented as a specialization. The generic algorithm shall follow this pseudocode template:

template <typename ComputeT = float,
          typename SrcImageT,
          typename DstImageT>
sycl::event algorithm(...)
{
  // ... some implementation code, device kernel part:
  // 1. Cast some data from SrcImageT::data_t type to ComputeT
  // 2. Computations using ComputeT datatype
  // 3. Cast some data from ComputeT to DstImageT::data_t type
}
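
The following self-contained sketch shows the same cast/compute/cast pattern in a runnable form. It is not a oneIPL routine: the element types, the trivial per-pixel scaling, and the raw USM pointers are assumptions made only to keep the example short.

#include <sycl/sycl.hpp>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: SrcT/DstT stand in for SrcImageT::data_t/DstImageT::data_t,
// and src/dst are assumed to be device-accessible USM pointers.
template <typename ComputeT = float,
          typename SrcT     = std::uint8_t,
          typename DstT     = std::uint8_t>
sycl::event scale_pixels(sycl::queue& queue,
                         const SrcT* src,
                         DstT* dst,
                         std::size_t count,
                         ComputeT factor,
                         const std::vector<sycl::event>& dependencies = {})
{
    return queue.parallel_for(sycl::range<1>{count}, dependencies,
                              [=](sycl::id<1> i) {
        // 1. Cast from the source data type to ComputeT.
        ComputeT value = static_cast<ComputeT>(src[i]);
        // 2. Perform the computation using the ComputeT datatype.
        value *= factor;
        // 3. Cast the result back to the destination data type.
        dst[i] = static_cast<DstT>(value);
    });
}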

If ComputeT is not available on the device, the implementation shall handle it as an error.
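
One way an implementation might detect an unsupported ComputeT is to query the device aspects before launching any kernels, as in the sketch below. The std::runtime_error is a placeholder; the exceptions actually required are listed in Error Handling.

#include <sycl/sycl.hpp>
#include <stdexcept>
#include <type_traits>

// Sketch: reject a ComputeT that the queue's device cannot execute.
// The std::runtime_error used here is illustrative, not the exception
// mandated by the Error Handling section.
template <typename ComputeT>
void check_compute_type(const sycl::queue& queue)
{
    const sycl::device device = queue.get_device();
    if constexpr (std::is_same_v<ComputeT, double>) {
        if (!device.has(sycl::aspect::fp64))
            throw std::runtime_error("ComputeT=double is not supported on this device");
    } else if constexpr (std::is_same_v<ComputeT, sycl::half>) {
        if (!device.has(sycl::aspect::fp16))
            throw std::runtime_error("ComputeT=half is not supported on this device");
    }
}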

All computation performed by the routine shall be done on the hardware device(s) associated with this queue, with possible aid from the host, unless otherwise specified. In the case of an ordered queue, all computation shall also be ordered with respect to other kernels as if enqueued on that queue.
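
For example, when the caller constructs an in-order queue with the standard SYCL 2020 property, oneIPL kernels are serialized with all other work submitted to that queue (minimal sketch):

#include <sycl/sycl.hpp>

int main()
{
    // An in-order ("ordered") queue: commands execute in submission order,
    // so oneIPL kernels are ordered with respect to any other kernels
    // submitted to this queue without explicit event dependencies.
    sycl::queue ordered_queue{sycl::default_selector_v,
                              sycl::property::queue::in_order{}};
    return 0;
}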

A particular oneIPL implementation may not support the execution of a given oneIPL routine on the specified device(s). In this case, the implementation shall either perform the computation on the host or throw an exception. See Error Handling for the possible exceptions.

Member Functions#

oneIPL class-based APIs, such as ipl::image, require a sycl::queue as an argument to the constructor. The implementation shall submit explicit copy operations and shall check the USM pointer type when a USM pointer is passed to the constructor. The check can be disabled for performance reasons; the recommended implementation is to control the checks through build options.
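
A minimal sketch of such a check is shown below. sycl::get_pointer_type is the standard SYCL 2020 query; ONEIPL_DISABLE_USM_CHECKS is a hypothetical macro used only to show how the check could be compiled out via a build option.

#include <sycl/sycl.hpp>
#include <stdexcept>

// Sketch of the USM validation an ipl::image constructor might perform on a
// user-supplied pointer. ONEIPL_DISABLE_USM_CHECKS is a hypothetical macro.
void check_usm_pointer(const sycl::queue& queue, const void* ptr)
{
#ifndef ONEIPL_DISABLE_USM_CHECKS
    const sycl::usm::alloc kind = sycl::get_pointer_type(ptr, queue.get_context());
    if (kind == sycl::usm::alloc::unknown)
        throw std::invalid_argument("pointer is not a USM allocation for this context");
#endif
}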

Device Usage#

oneIPL itself may not provide any interfaces for controlling device usage: for instance, controlling the number of cores used on the CPU, or the number of execution units on a GPU. However, such functionality may be available by partitioning a sycl::device instance into subdevices, when supported by the device.

When given a queue associated with such a subdevice, a oneIPL implementation shall only perform computation on that subdevice.
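
For example, a CPU device can be partitioned by NUMA domain with the standard SYCL 2020 partitioning API, and a queue built on one sub-device then restricts oneIPL computation to it (minimal sketch; create_sub_devices throws if the device does not support this partition property):

#include <sycl/sycl.hpp>
#include <vector>

int main()
{
    sycl::device cpu{sycl::cpu_selector_v};

    // Partition the CPU by NUMA affinity domain (throws a sycl::exception if
    // the device does not support this partition property).
    std::vector<sycl::device> sub_devices =
        cpu.create_sub_devices<sycl::info::partition_property::partition_by_affinity_domain>(
            sycl::info::partition_affinity_domain::numa);

    // oneIPL computation enqueued on this queue is limited to the first
    // sub-device, i.e. a subset of the CPU's cores.
    sycl::queue sub_queue{sub_devices.at(0)};
    return 0;
}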

Asynchronous Execution#

The oneIPL API shall be designed to allow asynchronous execution of computational routines, to facilitate concurrent usage of multiple devices in the system. Each computational routine enqueues work to be performed on the selected device, and may (but is not required to) return before execution completes.

Hence, it is the calling application’s responsibility to ensure that any inputs are valid until computation is complete, and likewise to wait for computation completion before reading any outputs. This behavior can be achieved automatically when using DPC++ buffers, or manually when using Unified Shared Memory (USM) pointers, as described in the sections below.
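
The sketch below illustrates this responsibility for the USM case: a stand-in kernel represents an asynchronous oneIPL routine, and the host waits on the returned event before reading the output. With buffers, the SYCL runtime inserts the corresponding dependencies automatically.

#include <sycl/sycl.hpp>
#include <cstddef>

// Sketch of the caller's responsibility with USM: keep inputs valid until the
// returned event completes, and wait on it before reading outputs on the host.
void run_and_read(sycl::queue& queue, const float* usm_src, float* usm_dst,
                  std::size_t count)
{
    // Stand-in for an asynchronous oneIPL computational routine that reads
    // usm_src and writes usm_dst; a real routine returns its own sycl::event.
    sycl::event done = queue.parallel_for(sycl::range<1>{count},
                                          [=](sycl::id<1> i) {
        usm_dst[i] = usm_src[i] + 1.0f;
    });

    // usm_src must remain valid until `done` completes; usm_dst must not be
    // read on the host before this wait returns.
    done.wait();
}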

Unless otherwise specified, asynchronous execution is allowed, but not guaranteed, by any oneIPL computational routine, and may vary between implementations and/or versions. oneIPL implementations must clearly document whether execution is guaranteed to be asynchronous for each supported routine. Regardless, calling applications shall not launch any oneIPL computational routine with a dependency on a future oneIPL API call, even if this computational routine executes asynchronously (i.e. a oneIPL implementation may assume no antidependencies are present). This guarantee allows oneIPL implementations to reserve resources for execution without risking deadlock.

Synchronization#

ipl::image objects automatically manage synchronization between kernel launches linked by a data dependency (read-after-write, write-after-write, or write-after-read). oneIPL routines are not required to perform any additional synchronization of ipl::image arguments. However, all routines return a sycl::event and accept a std::vector<sycl::event>, so explicit synchronization between user-provided kernels and library calls can be performed.
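
As an illustration, the sketch below orders a user-provided kernel before the gaussian routine shown earlier via the trailing dependency vector, and returns the library's event so that later work can depend on it. The ipl namespace for gaussian, the omitted oneIPL header, and the pre-constructed src/dst/spec objects are assumptions, not part of this section.

// Sketch only: the oneIPL header, the ipl namespace for gaussian(), and the
// way src/dst/spec are obtained are assumptions made for illustration.
#include <sycl/sycl.hpp>
#include <cstddef>
#include <vector>

template <typename SrcImageT, typename DstImageT, typename SpecT>
sycl::event preprocess_then_filter(sycl::queue& queue,
                                   SrcImageT&   src,
                                   DstImageT&   dst,
                                   const SpecT& spec,
                                   float*       usm_data,
                                   std::size_t  count)
{
    // 1. A user-provided kernel that the library call must wait for.
    sycl::event pre = queue.parallel_for(sycl::range<1>{count},
                                         [=](sycl::id<1> i) {
        usm_data[i] *= 0.5f;
    });

    // 2. Pass the user event through the trailing dependency vector so the
    //    oneIPL call is ordered after the user kernel.
    sycl::event filtered = ipl::gaussian(queue, src, dst, spec, {pre});

    // 3. The returned event can order later user kernels or a host wait.
    return filtered;
}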

Host Thread Safety#

All oneIPL member and non-member functions shall be host thread safe. That is, they may be safely called simultaneously from concurrent host threads. However, oneIPL objects may not be shared between concurrent host threads unless otherwise specified.