Batch Processing

This section documents the batch processing API of oneIPL. Batch processing improves performance on specific hardware (e.g., GPUs or CPUs with many cores) when multiple images are processed in a single function call.

Batch processing shall be supported for a subset of the functions defined in the spec. It provides the following transformations:

Batch processing for uniform input batch and uniform output batch.
Batch processing for non-uniform input batch and uniform output batch (e.g., resizing to a required size).

A uniform batch contains images/ROIs of the same size. A non-uniform batch contains images/ROIs of different sizes. A non-uniform input batch with a uniform output batch contains images/ROIs of different input sizes and the same output size, which is common in tasks such as image pre-processing for AI pipelines.
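The distinction between uniform and non-uniform batches can be sketched in plain C++. This is a minimal illustration, not part of the oneIPL API: `roi_size`, `is_uniform`, `resized_sizes`, and `uniformity_example` are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the size of one image or ROI in a batch.
struct roi_size {
    std::size_t width;
    std::size_t height;
};

// Returns true when every entry has the same width and height.
// An empty batch is treated as uniform in this sketch.
inline bool is_uniform(const std::vector<roi_size>& sizes) {
    for (const roi_size& s : sizes) {
        if (s.width != sizes.front().width || s.height != sizes.front().height) {
            return false;
        }
    }
    return true;
}

// Output sizes after resizing every input to one required size:
// the result is uniform regardless of the input sizes.
inline std::vector<roi_size> resized_sizes(const std::vector<roi_size>& input,
                                           roi_size required) {
    return std::vector<roi_size>(input.size(), required);
}

// Usage: a mixed-size input batch becomes uniform after resizing to 224x224.
inline void uniformity_example() {
    const std::vector<roi_size> input{{640, 480}, {1920, 1080}, {224, 224}};
    assert(!is_uniform(input));
    assert(is_uniform(resized_sizes(input, roi_size{224, 224})));
}
```

Only sizes are modeled here; a real resize would also resample the pixel data.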

An illustration of non-uniform input and uniform output batch processing is shown below:

Batches can be constructed from arbitrary images and ROI locations. For example, batches can consist of multiple ROIs of one or several images. The figure below illustrates a batch consisting of multiple ROIs from two images:
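In code, a batch of ROIs drawn from several images might be modeled as a list of non-owning views. The sketch below is an assumption made for illustration and does not reflect the oneIPL types: `image`, `roi_view`, `roi_batch`, `make_ramp`, and `roi_batch_example` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical owning image: single channel, row-major pixels.
struct image {
    std::size_t width;
    std::size_t height;
    std::vector<unsigned char> pixels;
};

// Hypothetical non-owning ROI: a rectangle inside a parent image.
struct roi_view {
    const image* parent;
    std::size_t x, y;           // top-left corner in the parent image
    std::size_t width, height;  // ROI extent

    // Reads the pixel at (col, row), relative to the ROI origin.
    unsigned char at(std::size_t col, std::size_t row) const {
        return parent->pixels[(y + row) * parent->width + (x + col)];
    }
};

// A batch is an ordered list of ROIs; entries may share a parent image.
using roi_batch = std::vector<roi_view>;

// Builds an image whose pixel values equal their linear index modulo 256.
inline image make_ramp(std::size_t width, std::size_t height) {
    image img{width, height, std::vector<unsigned char>(width * height)};
    for (std::size_t i = 0; i < img.pixels.size(); ++i) {
        img.pixels[i] = static_cast<unsigned char>(i % 256);
    }
    return img;
}

// Usage: three ROIs taken from two images form one non-uniform batch.
inline void roi_batch_example() {
    const image a = make_ramp(8, 4);
    const image b = make_ramp(4, 4);
    const roi_batch batch{{&a, 0, 0, 2, 2}, {&a, 4, 1, 3, 2}, {&b, 1, 1, 2, 3}};
    assert(batch.size() == 3);
    assert(batch[1].at(0, 0) == 12);  // row 1, column 4 of the 8-wide image
}
```

Because the views do not own pixel data, the parent images must outlive the batch.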

The batch processing API follows these conventions:

For functions that perform computations, ComputeT is a required template parameter. For functions that do not perform computations (e.g., functions that only copy data), ComputeT is not required. The default type for ComputeT is float.
Compile-time checks restrict APIs as much as possible using SFINAE constructs.
Functions targeting a device must take a device queue_t, source data, and destination data.
Functions may optionally take a spec with additional algorithmic parameters and dependencies on other asynchronous calls (a generic list of sycl::event objects).
Default values for spec and dependencies must be provided in the function declaration so that callers can omit both parameters.
The spec parameter is independent of metadata. Parameters that depend on metadata are provided as separate parameters. An example of a metadata-dependent parameter, a pixel value of type pixel_t, is shown below.
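The conventions above can be tried out without a SYCL toolchain using stand-in types. Everything in this sketch is hypothetical and chosen for illustration: `mock_queue` and `mock_event` stand in for the SYCL queue and event, and `example_spec`, `example_batch`, `example_algorithm`, `accepts_compute_type`, and `conventions_example` are not oneIPL names. The floating-point restriction on ComputeT is an assumed example of a SFINAE check, not a rule stated by the spec.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the SYCL queue and event types.
struct mock_queue {};
struct mock_event {};

// Hypothetical spec: algorithmic parameters only, independent of metadata.
template <typename ComputeT>
struct example_spec {
    ComputeT scale = ComputeT(1);
};

// Hypothetical batch type that exposes pixel_t as metadata.
struct example_batch {
    using pixel_t = unsigned char;
    std::vector<pixel_t> data;
};

// Scales src into dst using ComputeT arithmetic; dst elements past the end
// of src receive `fill`. SFINAE removes this overload unless ComputeT is a
// floating-point type (an assumption made for this sketch).
template <typename ComputeT = float,
          typename SrcBatchT,
          typename DstBatchT,
          typename = typename std::enable_if<
              std::is_floating_point<ComputeT>::value>::type>
mock_event example_algorithm(mock_queue& /*queue*/,
                             SrcBatchT& src,
                             DstBatchT& dst,
                             const example_spec<ComputeT>& spec = {},
                             const typename SrcBatchT::pixel_t& fill = {},
                             const std::vector<mock_event>& /*dependencies*/ = {}) {
    for (std::size_t i = 0; i < dst.data.size(); ++i) {
        if (i < src.data.size()) {
            const ComputeT scaled = static_cast<ComputeT>(src.data[i]) * spec.scale;
            dst.data[i] = static_cast<typename DstBatchT::pixel_t>(scaled);
        } else {
            dst.data[i] = fill;
        }
    }
    return mock_event{};
}

// Detects at compile time whether example_algorithm<ComputeT> is callable.
template <typename ComputeT, typename = void>
struct accepts_compute_type : std::false_type {};

template <typename ComputeT>
struct accepts_compute_type<
    ComputeT,
    decltype(void(example_algorithm<ComputeT>(std::declval<mock_queue&>(),
                                              std::declval<example_batch&>(),
                                              std::declval<example_batch&>())))>
    : std::true_type {};

static_assert(accepts_compute_type<float>::value, "float is accepted");
static_assert(!accepts_compute_type<int>::value, "int is rejected by SFINAE");

// Usage: spec, fill, and dependencies are all omitted; ComputeT is float.
inline void conventions_example() {
    mock_queue queue;
    example_batch src{{10, 20}};
    example_batch dst{{0, 0, 0}};
    example_algorithm(queue, src, dst);
    assert(dst.data[0] == 10 && dst.data[1] == 20 && dst.data[2] == 0);
}
```

Passing an `example_spec<double>` deduces ComputeT as double, so the compute type can follow the spec argument instead of being spelled out at the call site.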

Batch processing has the following syntax:

namespace oneapi {
namespace ipl {

  // No parameters are dependent on metadata. Dependency on compute type.
  template <typename ComputeT = float,
            typename SrcBatchT,
            typename DstBatchT>
  sycl::event algorithm1(sycl::queue&                         queue,
                         SrcBatchT&                           src,
                         DstBatchT&                           dst,
                         const algorithm1_spec<ComputeT>&     spec         = {},
                         const std::vector<sycl::event>&      dependencies = {});

  // Parameters are dependent on metadata. Dependency on compute type.
  template <typename ComputeT = float,
            typename SrcBatchT,
            typename DstBatchT>
  sycl::event algorithm2(sycl::queue&                         queue,
                         SrcBatchT&                           src,
                         DstBatchT&                           dst,
                         const algorithm2_spec<ComputeT>&     spec            = {},
                         const typename SrcBatchT::pixel_t&   dependent_param = {},
                         const std::vector<sycl::event>&      dependencies    = {});

  // No parameters are dependent on metadata. No dependency on compute type.
  template <typename SrcBatchT,
            typename DstBatchT>
  sycl::event algorithm3(sycl::queue&                    queue,
                         SrcBatchT&                      src,
                         DstBatchT&                      dst,
                         const algorithm3_spec&          spec         = {},
                         const std::vector<sycl::event>& dependencies = {});
} // namespace ipl
} // namespace oneapi