Execution Model

This section describes the execution environment common to all oneIPL functionality. It covers how data is provided to computational routines (Use of Queues), support for multiple devices (Device Usage), synchronous and asynchronous execution models (Asynchronous Execution), and Host Thread Safety.

Use of Queues

The sycl::queue defined in the SYCL 2020 specification specifies the device on which a task will be enqueued and the features enabled on that device. oneIPL shall have two forms of computational routines: class-based Member Functions and stand-alone Non-Member Functions. Because these forms may interact with the sycl::queue in different ways, a separate section describes the assumptions for each.

Non-Member Functions

Each oneIPL non-member computational routine takes a sycl::queue reference as its first parameter and a std::vector<sycl::event> of dependencies as its last parameter, which shall default to an empty vector. The image memory type and format that an API works with shall be specified by template parameters for the input and output images. Typically, the following template parameters must be implemented:

  • ComputeT - type in which computations are performed

  • Format - image format

  • DataT - image pixel component datatype

  • SrcAllocatorT - allocator for the src image

  • DstAllocatorT - allocator for the dst image

In addition to the input and output images, algorithmic parameters must be defined in a separate class with the _spec suffix. If a <function> requires additional arguments, they shall be provided in a <function>_spec class and passed through the corresponding argument of the function.

Example. Gaussian filter supporting only floating-point computations:

template <typename ComputeT = float,
        formats Format,
        typename DataT,
        typename SrcAllocatorT,
        typename DstAllocatorT>
sycl::event gaussian(sycl::queue&                                  queue,
                     image<Format, DataT, SrcAllocatorT>&          src,
                     image<Format, DataT, DstAllocatorT>&          dst,
                     const gaussian_spec<DataT, Format, ComputeT>& spec,
                     const std::vector<sycl::event>&               dependencies = {});
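
To make the _spec convention and the ComputeT/DataT split more concrete, here is a minimal host-only sketch. The names box_blur_spec and box_blur_row are hypothetical and are not part of the oneIPL API. Plain std::vector rows stand in for ipl::image, and the queue and dependency parameters are left out so that the sketch compiles without a SYCL toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical parameter class following the <function>_spec naming rule.
// It holds algorithmic parameters only, never the images themselves.
template <typename ComputeT = float>
class box_blur_spec {
public:
    explicit box_blur_spec(std::size_t radius) : radius_(radius) {
        if (radius == 0) throw std::invalid_argument("radius must be positive");
    }
    std::size_t radius() const { return radius_; }
    // Weight of each tap, expressed in the computation type.
    ComputeT weight() const {
        return ComputeT(1) / static_cast<ComputeT>(2 * radius_ + 1);
    }
private:
    std::size_t radius_;
};

// Hypothetical routine: DataT is the stored pixel type, ComputeT is the
// type used for intermediate arithmetic. Rows stand in for images here.
template <typename ComputeT = float, typename DataT>
void box_blur_row(const std::vector<DataT>& src, std::vector<DataT>& dst,
                  const box_blur_spec<ComputeT>& spec) {
    assert(&src != &dst && "in-place operation is not supported in this sketch");
    const auto n = static_cast<std::ptrdiff_t>(src.size());
    const auto r = static_cast<std::ptrdiff_t>(spec.radius());
    dst.resize(src.size());
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        ComputeT acc = 0;
        for (std::ptrdiff_t k = -r; k <= r; ++k) {
            // Replicate edge pixels outside the row.
            std::ptrdiff_t j = i + k;
            if (j < 0) j = 0;
            if (j >= n) j = n - 1;
            acc += static_cast<ComputeT>(src[static_cast<std::size_t>(j)]) * spec.weight();
        }
        // Round to nearest; this sketch assumes a non-negative integer DataT.
        dst[static_cast<std::size_t>(i)] = static_cast<DataT>(acc + ComputeT(0.5));
    }
}
```

With a std::uint8_t row and the default ComputeT of float, the pixels are stored as 8-bit integers while the accumulation is carried out in single precision, which is the separation that the ComputeT and DataT template parameters are meant to express.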

All computation performed by the routine shall be done on the hardware device(s) associated with this queue, with possible aid from the host, unless otherwise specified. In the case of an in-order queue, all computation shall also be ordered with respect to other kernels as if enqueued on that queue.

A particular oneIPL implementation may not support the execution of a given oneIPL routine on the specified device(s). In this case, the implementation shall either perform the computation on the host or throw an exception. See Error Handling for the possible exceptions.
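
The fall-back-or-throw rule above can be sketched as a small dispatch function. Everything in this sketch is hypothetical: device_kind, routine_support, select_execution and unsupported_device_error are illustrative stand-ins, not oneIPL or SYCL names. A real implementation would inspect the sycl::device behind the queue and throw one of the exceptions listed under Error Handling.

```cpp
#include <stdexcept>

// Hypothetical stand-ins. A real implementation would query the sycl::device
// behind the sycl::queue passed to the routine.
enum class device_kind { cpu, gpu, accelerator };
enum class executed_on { device, host };

// Hypothetical exception type; the exceptions actually defined by oneIPL are
// listed under Error Handling.
struct unsupported_device_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical per-routine capability table.
struct routine_support {
    bool cpu;
    bool gpu;
    bool accelerator;
    bool host_fallback;  // true if the routine has a host implementation
};

// Decide where a routine runs: on the requested device when supported,
// otherwise on the host when a fallback exists, otherwise throw.
executed_on select_execution(device_kind requested, const routine_support& s) {
    bool supported = false;
    switch (requested) {
        case device_kind::cpu:         supported = s.cpu; break;
        case device_kind::gpu:         supported = s.gpu; break;
        case device_kind::accelerator: supported = s.accelerator; break;
    }
    if (supported) return executed_on::device;
    if (s.host_fallback) return executed_on::host;
    throw unsupported_device_error("routine is not supported on this device");
}
```

The point of the sketch is that the caller never observes a silent no-op: the computation either runs on the requested device, runs on the host, or fails with an exception.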

Member Functions

oneIPL class-based APIs, which are used for ipl::image, require a sycl::queue as an argument to the constructor. When a USM pointer is passed to the constructor, the implementation shall check the USM pointer type and submit explicit copy operations.

Device Usage

oneIPL itself may not provide any interfaces for controlling device usage: for instance, controlling the number of cores used on the CPU, or the number of execution units on a GPU. However, such functionality may be available by partitioning a sycl::device instance into subdevices, when supported by the device.

When given a queue associated with such a subdevice, a oneIPL implementation shall only perform computation on that subdevice.

Asynchronous Execution

The oneIPL API shall be designed to allow asynchronous execution of computational routines, to facilitate concurrent usage of multiple devices in the system. Each computational routine enqueues work to be performed on the selected device, and may (but is not required to) return before execution completes.

Hence, it is the calling application’s responsibility to ensure that any inputs are valid until computation is complete, and likewise to wait for computation completion before reading any outputs. This behavior can be achieved automatically when using DPC++ buffers, or manually when using Unified Shared Memory (USM) pointers, as described in the sections below.

Unless otherwise specified, asynchronous execution is allowed, but not guaranteed, by any oneIPL computational routine, and may vary between implementations and/or versions. oneIPL implementations must clearly document whether execution is guaranteed to be asynchronous for each supported routine. Regardless, calling applications shall not launch any oneIPL computational routine with a dependency on a future oneIPL API call, even if this computational routine executes asynchronously (i.e. a oneIPL implementation may assume no antidependencies are present). This guarantee allows oneIPL implementations to reserve resources for execution without risking deadlock.

Synchronization

ipl::image objects automatically manage synchronization between kernel launches linked by a data dependency (read-after-write, write-after-write, or write-after-read). oneIPL routines are not required to perform any additional synchronization of ipl::image arguments. However, all routines return a sycl::event and accept a std::vector<sycl::event>, so user-provided kernels and library calls can also be synchronized explicitly.
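
The event-in, event-out contract can be modeled on the host alone. In the sketch below, std::shared_future<void> plays the role of sycl::event, and the hypothetical helper enqueue plays the role of a computational routine that returns before its work completes. This is a model of the calling pattern only, not SYCL or oneIPL code. In SYCL 2020 the corresponding operations are sycl::event::wait() and sycl::handler::depends_on().

```cpp
#include <future>
#include <vector>

// Stand-in for sycl::event in this host-only model.
using event_t = std::shared_future<void>;

// Models a routine that enqueues work and returns immediately. The work
// starts only after every dependency has completed.
template <typename Work>
event_t enqueue(Work work, const std::vector<event_t>& dependencies = {}) {
    return std::async(std::launch::async,
                      [work, dependencies]() mutable {
                          for (const event_t& e : dependencies) e.wait();
                          work();
                      }).share();
}

// Two dependent stages: the second reads what the first wrote.
int run_pipeline() {
    std::vector<int> stage1(4, 0);
    int sum = 0;
    event_t produced = enqueue([&stage1] {
        for (int& v : stage1) v = 10;
    });
    event_t consumed = enqueue([&stage1, &sum] {
        for (int v : stage1) sum += v;
    }, {produced});
    // The caller waits for completion before reading the output, and the
    // inputs stay alive until that point.
    consumed.wait();
    return sum;
}
```

Passing produced as a dependency of the second call is what orders the two stages. Without it, the model would allow the second stage to read stage1 before it has been written.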

Host Thread Safety

All oneIPL member and non-member functions shall be host thread safe. That is, they may be safely called simultaneously from concurrent host threads. However, oneIPL objects may not be shared between concurrent host threads unless otherwise specified.
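
A minimal host-only sketch of the permitted pattern follows: several threads call the same routine concurrently, and each thread works on an object that it alone owns. The names image_stub, invert and invert_concurrently are hypothetical stand-ins, not oneIPL types or functions.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a oneIPL object such as an image.
struct image_stub {
    std::vector<int> pixels;
};

// Stand-in for a host-thread-safe routine: it touches only its argument
// and no shared mutable state.
void invert(image_stub& img) {
    for (int& p : img.pixels) p = 255 - p;
}

// Each host thread owns one image; only the routine is shared.
std::vector<image_stub> invert_concurrently(std::size_t thread_count) {
    std::vector<image_stub> images(thread_count, image_stub{{0, 100, 255}});
    std::vector<std::thread> workers;
    workers.reserve(thread_count);
    for (std::size_t t = 0; t < thread_count; ++t) {
        // Thread t receives a reference to images[t] and to no other image.
        workers.emplace_back(invert, std::ref(images[t]));
    }
    for (std::thread& w : workers) w.join();
    return images;
}
```

Handing the same image_stub to two threads at once would be the sharing that the paragraph above rules out unless a specific object is documented as safe to share.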